Souichi Oka, Yoshiyasu Takefuji
European Journal of Radiology 191, 112308. October 2025. Peer-reviewed.
This correspondence critically examines the methodology of Schindele et al. (2025) on thyroid cancer recurrence prediction. Although their interpretable XGBoost model achieved high predictive accuracy (95.8%) and an AUROC of 0.947, this predictive power does not by itself establish the reliability of the derived feature importance rankings. As widely acknowledged in the literature, high predictive accuracy does not guarantee unbiased or reliable feature attribution. We underscore that gradient boosting decision tree (GBDT) models, including XGBoost, are prone to inherent biases in feature importance estimation, often exacerbated by overfitting. Furthermore, SHapley Additive exPlanations (SHAP), a widely adopted explainable AI (XAI) technique, can inherit and even amplify these biases because of its model-dependent nature, raising concerns about the interpretive validity of the identified risk factors. To mitigate these methodological limitations, we advocate integrative analytical frameworks that combine machine learning with robust statistical and non-parametric approaches, such as Highly Variable Feature Selection (HVFS) and Independent Component Analysis (ICA). Such multi-faceted strategies are indispensable for obtaining robust, interpretable insights into feature importance and should be prioritized in future research.
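The core claim—that a GBDT's built-in feature importance can credit uninformative features while out-of-sample checks do not—can be illustrated with a minimal sketch. This is not the analysis from the correspondence or from Schindele et al.; it uses scikit-learn's GradientBoostingClassifier as a stand-in GBDT (rather than XGBoost) on synthetic data, with one informative feature and one pure-noise feature:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
informative = rng.normal(size=n)          # genuinely predictive feature
noise = rng.normal(size=n)                # unrelated to the label
y = (informative + 0.5 * rng.normal(size=n) > 0).astype(int)
X = np.column_stack([informative, noise])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Deliberately deep trees and many rounds to encourage overfitting,
# the failure mode the correspondence highlights for GBDT importances.
model = GradientBoostingClassifier(max_depth=8, n_estimators=200, random_state=0)
model.fit(X_tr, y_tr)

# Impurity-based importance is computed from training-set splits, so the
# noise feature can still receive nonzero credit.
print("impurity-based importances:", model.feature_importances_)

# Permutation importance on held-out data is an out-of-sample check:
# shuffling the noise feature barely hurts, shuffling the informative one does.
perm = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
print("permutation importances:", perm.importances_mean)
```

Under these (illustrative) settings, the overfit model assigns the noise feature a nonzero impurity-based importance, while held-out permutation importance concentrates on the informative feature—consistent with the argument that training-set-derived attributions should be corroborated by independent checks.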