Thymoma habitat segmentation and risk prediction model using CT imaging and K-means clustering.
Liang Z, Li J, He S, Li S, Cai R, Chen C, Zhang Y, Deng B, Wu Y
•papers•May 19 2025Thymomas, though rare, present a wide range of clinical behaviors, from indolent to aggressive forms, making accurate risk stratification crucial for treatment planning. Traditional methods such as histopathology and radiological assessments often lack the ability to capture tumor heterogeneity, which can impact prognosis. Radiomics, combined with machine learning, provides a method to extract and analyze quantitative imaging features, offering the potential to improve tumor classification and risk prediction. By segmenting tumors into distinct habitat zones, it becomes possible to assess intratumoral heterogeneity more effectively. This study employs radiomics and machine learning techniques to enhance thymoma risk prediction, aiming to improve diagnostic consistency and reduce variability in radiologists' assessments. This study aims to identify different habitat zones within thymomas through CT imaging feature analysis and to establish a predictive model to differentiate between high and low-risk thymomas. Additionally, the study explores how this model can assist radiologists. We obtained CT imaging data from 133 patients with thymoma who were treated at the Affiliated Hospital of Guangdong Medical University from 2015 to 2023. Images from the plain scan phase, venous phase, arterial phase, and their differential images (subtracted images) were used. Tumor regions were segmented into three habitat zones using K-Means clustering. Imaging features from each habitat zone were extracted using the PyRadiomics (van Griethuysen, 2017) library. The 28 most distinguishing features were selected through Mann-Whitney U tests (Mann, 1947) and Spearman's correlation analysis (Spearman, 1904). Five predictive models were built using the same machine learning algorithm (Support Vector Machine [SVM]): Habitat1, Habitat2, Habitat3 (trained on features from individual tumor habitat regions), Habitat All (trained on combined features from all regions), and Intra (trained on intratumoral features), and their performances were evaluated for comparison. The models' diagnostic outcomes were compared with the diagnoses of four radiologists (two junior and two experienced physicians). The AUC (area under curve) for habitat zone 1 was 0.818, for habitat zone 2 was 0.732, and for habitat zone 3 was 0.763. The comprehensive model, which combined data from all habitat zones, achieved an AUC of 0.960, outperforming the model based on traditional radiomic features (AUC of 0.720). The model significantly improved the diagnostic accuracy of all four radiologists. The AUCs for junior radiologists 1 and 2 increased from 0.747 and 0.775 to 0.932 and 0.972, respectively, while for experienced radiologists 1 and 2, the AUCs increased from 0.932 and 0.859 to 0.977 and 0.972, respectively. This study successfully identified distinct habitat zones within thymomas through CT imaging feature analysis and developed an efficient predictive model that significantly improved diagnostic accuracy. This model offers a novel tool for risk assessment of thymomas and can aid in guiding clinical decision-making.