Towards interpretable AI in personalized medicine through a radiological-biological radiomics dictionary linking semantic Lung-RADS and imaging radiomics features.
Authors
Affiliations (5)
- Technological Virtual Collaboration (TECVICO Corp.), Vancouver, BC, Canada; NAIRG, Department of Neuroscience, Hamadan University of Medical Sciences, Hamadan, Iran.
- Department of Radiology, School of Paramedical Sciences, Guilan University of Medical Sciences, Rasht, Iran.
- Technological Virtual Collaboration (TECVICO Corp.), Vancouver, BC, Canada.
- Department of Basic and Translation Research, BC Cancer Research Institute, Vancouver, BC, Canada; Department of Radiology, University of British Columbia, Vancouver, BC, Canada.
- Technological Virtual Collaboration (TECVICO Corp.), Vancouver, BC, Canada; Department of Basic and Translation Research, BC Cancer Research Institute, Vancouver, BC, Canada; Department of Radiology, University of British Columbia, Vancouver, BC, Canada. Electronic address: [email protected].
Abstract
Lung cancer remains the leading cause of cancer-related mortality worldwide, with survival largely dependent on early detection. Standard-dose computed tomography (CT) screening, guided by the Lung Imaging Reporting and Data System (Lung-RADS), provides standardized criteria for nodule evaluation. However, interpretation is limited by inter-reader variability and reliance on qualitative descriptors. Radiomics offers quantitative biomarkers but faces challenges of clinical interpretability. In this work, we introduce a radiological-biological dictionary of radiomic features (RFs) that aligns quantitative metrics with Lung-RADS semantic categories, thereby bridging computational and clinical reasoning. We developed a clinically informed dictionary translating Lung-RADS semantic features into RFs through literature curation and expert review. As a proof-of-concept, imaging and clinical data from 977 patients across 12 collections in The Cancer Imaging Archive (TCIA) were analyzed. Following preprocessing and manual segmentation, 110 RFs per nodule were extracted using PyRadiomics in compliance with the Image Biomarker Standardization Initiative (IBSI). A semi-supervised learning (SSL) framework incorporating 499 labeled and 478 unlabeled cases was employed to enhance model generalizability. Seven feature selection techniques and ten interpretable classification models were evaluated. SHapley Additive exPlanations (SHAP) analysis was used to assess correspondence between feature importance and Lung-RADS descriptors. A clinically informed dictionary was developed through literature curation, translating ten Lung-RADS semantic features into corresponding RFs (4-24 RFs per descriptor), and was validated by eight expert reviewers. The best SSL pipeline (ANOVA + support vector machine) achieved a mean validation accuracy of 0.79 ± 0.13. 
SHAP analysis identified key radiomic proxies corresponding to Lung-RADS descriptors (e.g., attenuation, margin irregularity, spiculation), supporting the dictionary approach. The proposed dictionary provides an interpretable framework linking radiomics and Lung-RADS semantics, advancing explainable artificial intelligence for CT-based lung cancer screening.
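As an illustration of the ANOVA feature selection step named in the abstract, the following minimal sketch ranks features by their one-way ANOVA F-statistic using only the Python standard library. It is not the authors' pipeline: the function names (anova_f, rank_features), the feature names, and the toy values are all hypothetical, and the abstract does not state how many features were retained or how the scores were thresholded.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic for a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand = mean(values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(table, labels):
    """Return feature names sorted by descending F-statistic."""
    scores = {name: anova_f(col, labels) for name, col in table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: two made-up features for six labeled nodules.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "toy_margin_feature": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
    "toy_noise_feature": [1.0, 5.0, 9.0, 2.0, 5.0, 8.0],
}
ranking = rank_features(features, labels)
```

In this toy case the first feature separates the two classes and ranks first, while the second has identical class means and scores zero. In a pipeline like the one described, the top-ranked features would then be passed to the classifier (a support vector machine in the best-performing configuration).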