
A computed tomography-based radiomics prediction model for BRAF mutation status in colorectal cancer.

Zhou B, Tan H, Wang Y, Huang B, Wang Z, Zhang S, Zhu X, Wang Z, Zhou J, Cao Y

PubMed · May 15 2025
The aim of this study was to develop and validate a CT venous-phase radiomics model to predict BRAF gene mutation status in preoperative colorectal cancer patients. In this study, 301 patients with pathologically confirmed colorectal cancer were retrospectively enrolled, comprising 225 from Centre I (73 mutant and 152 wild-type) and 76 from Centre II (36 mutant and 40 wild-type). The Centre I cohort was randomly divided into a training set (n = 158) and an internal validation set (n = 67) in a 7:3 ratio, while Centre II served as an independent external validation set (n = 76). The whole tumor region of interest was segmented, and radiomics features were extracted. To explore whether peritumoral information could improve predictive performance, a second set of contours was generated by dilating the tumor contour by 3 mm. Finally, a t-test, Pearson correlation, and LASSO regression were used to screen out features strongly associated with BRAF mutations. Based on these features, six classifiers were constructed: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Extreme Gradient Boosting (XGBoost). Model performance and clinical utility were evaluated using receiver operating characteristic (ROC) curves, decision curve analysis, accuracy, sensitivity, and specificity. Gender was an independent predictor of BRAF mutations. The unexpanded RF model, constructed from 11 radiomics features, demonstrated the best predictive performance. In the training cohort, it achieved an AUC of 0.814 (95% CI 0.732-0.895), an accuracy of 0.810, and a sensitivity of 0.620; in the internal validation cohort, an AUC of 0.798 (95% CI 0.690-0.907), an accuracy of 0.761, and a sensitivity of 0.609; and in the external validation cohort, an AUC of 0.737 (95% CI 0.616-0.847), an accuracy of 0.658, and a sensitivity of 0.667.
A machine learning model based on CT radiomics can effectively predict BRAF mutations in patients with colorectal cancer. The unexpanded RF model demonstrated optimal predictive performance.
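The feature-screening pipeline described above (t-test, then Pearson correlation pruning, then LASSO) can be sketched in miniature. The snippet below implements only the first two filters; the thresholds `t_thresh` and `corr_thresh` are illustrative assumptions, not values from the paper:

```python
import numpy as np

def screen_features(X, y, t_thresh=2.0, corr_thresh=0.9):
    """Univariate t-test screen followed by Pearson redundancy pruning.

    X: (n_samples, n_features) radiomics matrix; y: binary labels (0/1).
    Returns indices of retained feature columns.
    """
    X, y = np.asarray(X, float), np.asarray(y)
    g0, g1 = X[y == 0], X[y == 1]
    n0, n1 = len(g0), len(g1)
    # Welch t-statistic per feature
    t = (g0.mean(0) - g1.mean(0)) / np.sqrt(
        g0.var(0, ddof=1) / n0 + g1.var(0, ddof=1) / n1
    )
    # keep discriminative features, strongest first
    keep = [i for i in np.argsort(-np.abs(t)) if abs(t[i]) >= t_thresh]
    # greedily drop features highly correlated with an already-kept one
    selected = []
    for i in keep:
        r = [abs(np.corrcoef(X[:, i], X[:, j])[0, 1]) for j in selected]
        if all(v < corr_thresh for v in r):
            selected.append(i)
    return selected
```

In the study, the survivors of this screen would then go to LASSO for the final selection before classifier training.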

External Validation of a CT-Based Radiogenomics Model for the Detection of EGFR Mutation in NSCLC and the Impact of Prevalence in Model Building by Using Synthetic Minority Over Sampling (SMOTE): Lessons Learned.

Kohan AA, Mirshahvalad SA, Hinzpeter R, Kulanthaivelu R, Avery L, Ortega C, Metser U, Hope A, Veit-Haibach P

PubMed · May 15 2025
Radiogenomics holds promise in identifying molecular alterations in non-small cell lung cancer (NSCLC) using imaging features. Previously, we developed a radiogenomics model to predict epidermal growth factor receptor (EGFR) mutations based on contrast-enhanced computed tomography (CECT) in NSCLC patients. The current study aimed to externally validate this model using a publicly available National Institutes of Health (NIH)-based NSCLC dataset and assess the effect of EGFR mutation prevalence on model performance through the synthetic minority oversampling technique (SMOTE). The original radiogenomics model was validated on an independent NIH cohort (n=140). For assessing the influence of disease prevalence, six SMOTE-augmented datasets were created, simulating EGFR mutation prevalence from 25% to 50%. Seven models were developed (one from original data, six SMOTE-augmented), each undergoing rigorous cross-validation, feature selection, and logistic regression modeling. Models were tested against the NIH cohort. Performance was compared using the area under the receiver operating characteristic curve (AUC), and differences between radiomic-only, clinical-only, and combined models were statistically assessed. External validation revealed poor diagnostic performance for both our model and a previously published EGFR radiomics model (AUC ∼0.5). The clinical model alone achieved higher diagnostic accuracy (AUC 0.74). SMOTE-augmented models showed increased sensitivity but did not improve overall AUC compared to the clinical-only model. Changing EGFR mutation prevalence had minimal impact on AUC, challenging previous assumptions about the influence of sample imbalance on model performance. External validation failed to reproduce prior radiogenomics model performance, while clinical variables alone retained strong predictive value.
SMOTE-based oversampling did not improve diagnostic accuracy, suggesting that, in EGFR prediction, radiomics may offer limited value beyond clinical data. Emphasis on robust external validation and data-sharing is essential for future clinical implementation of radiogenomic models.
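SMOTE's core idea (synthesizing minority-class samples by interpolating between a minority point and one of its nearest minority-class neighbours) can be sketched as follows. This is a simplified stand-in for imblearn's implementation; the neighbour count `k` and the sampling loop are assumptions:

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: synthesize n_new minority samples by linear
    interpolation between each seed point and a random one of its k
    nearest minority-class neighbours."""
    if rng is None:
        rng = np.random.default_rng()
    X_min = np.asarray(X_min, float)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]
    out = []
    for _ in range(n_new):
        i = rng.integers(n)                       # random seed point
        j = nbrs[i, rng.integers(min(k, n - 1))]  # random neighbour of it
        lam = rng.random()                        # interpolation weight
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)
```

Because every synthetic point is a convex combination of two real minority samples, the augmented set never leaves the minority class's bounding region, which is consistent with the finding here that oversampling changed sensitivity but not the underlying decision information (AUC).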

Comparative performance of large language models in structuring head CT radiology reports: multi-institutional validation study in Japan.

Takita H, Walston SL, Mitsuyama Y, Watanabe K, Ishimaru S, Ueda D

PubMed · May 14 2025
To compare the diagnostic performance of three proprietary large language models (LLMs)-Claude, GPT, and Gemini-in structuring free-text Japanese radiology reports for intracranial hemorrhage and skull fractures, and to assess the impact of three different prompting approaches on model accuracy. In this retrospective study, head CT reports from the Japan Medical Imaging Database between 2018 and 2023 were collected. Two board-certified radiologists established the ground truth regarding intracranial hemorrhage and skull fractures through independent review and consensus. Each radiology report was analyzed by three LLMs using three prompting strategies: Standard, Chain-of-Thought, and Self-Consistency prompting. Diagnostic performance (accuracy, precision, recall, and F1-score) was calculated for each LLM-prompt combination and compared using McNemar's tests with Bonferroni correction. Misclassified cases underwent qualitative error analysis. A total of 3949 head CT reports from 3949 patients (mean age 59 ± 25 years, 56.2% male) were enrolled. Across all institutions, 856 patients (21.6%) had intracranial hemorrhage and 264 patients (6.6%) had skull fractures. All nine LLM-prompt combinations achieved very high accuracy. Claude demonstrated significantly higher accuracy for intracranial hemorrhage than GPT and Gemini, and also outperformed Gemini for skull fractures (p < 0.0001). Gemini's performance improved notably with Chain-of-Thought prompting. Error analysis revealed common challenges including ambiguous phrases and findings unrelated to intracranial hemorrhage or skull fractures, underscoring the importance of careful prompt design. All three proprietary LLMs exhibited strong performance in structuring free-text head CT reports for intracranial hemorrhage and skull fractures. While the choice of prompting method influenced accuracy, all models demonstrated robust potential for clinical and research applications.
Future work should refine the prompts and validate these approaches in prospective, multilingual settings.
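Of the three strategies, Self-Consistency prompting is the only one that changes the aggregation step: the model is queried several times and the most frequent structured answer wins. A minimal sketch (the tie-breaking convention, first answer seen, is an assumption):

```python
from collections import Counter

def self_consistency(answers):
    """Aggregate repeated LLM samples by majority vote, as in
    Self-Consistency prompting. Counter.most_common sorts stably,
    so ties fall to the answer that appeared first."""
    return Counter(answers).most_common(1)[0][0]
```

For a binary extraction task like "intracranial hemorrhage present?", three to five samples per report are typically enough to make the vote meaningful.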

CT-based AI framework leveraging multi-scale features for predicting pathological grade and Ki67 index in clear cell renal cell carcinoma: a multicenter study.

Yang H, Zhang Y, Li F, Liu W, Zeng H, Yuan H, Ye Z, Huang Z, Yuan Y, Xiang Y, Wu K, Liu H

PubMed · May 14 2025
To explore whether a CT-based AI framework, leveraging multi-scale features, can offer a non-invasive approach to accurately predict pathological grade and Ki67 index in clear cell renal cell carcinoma (ccRCC). In this multicenter retrospective study, a total of 1073 pathologically confirmed ccRCC patients from seven cohorts were split into internal cohorts (training and validation sets) and an external test set. The AI framework comprised an image processor, a 3D kidney-and-tumor segmentation model based on 3D-UNet, a multi-scale feature extractor built upon unsupervised learning, and a multi-task classifier utilizing XGBoost. A quantitative model interpretation technique, SHapley Additive exPlanations (SHAP), was employed to explore the contribution of the multi-scale features. The 3D-UNet model showed excellent performance in segmenting both the kidney and tumor regions, with Dice coefficients exceeding 0.92. The proposed multi-scale features model exhibited strong predictive capability for pathological grading and Ki67 index, with AUROC values of 0.84 and 0.87, respectively, in the internal validation set, and 0.82 and 0.82, respectively, in the external test set. The SHAP results demonstrated that features from radiomics, the 3D Auto-Encoder, and dimensionality reduction all made significant contributions to both prediction tasks. The proposed AI framework, leveraging multi-scale features, accurately predicts the pathological grade and Ki67 index of ccRCC, offering a promising avenue for non-invasive preoperative assessment. Non-invasively determining pathological grade and Ki67 index in ccRCC could guide treatment decisions. The AI framework integrates segmentation, classification, and model interpretation, enabling fully automated analysis.
The AI framework enables non-invasive preoperative detection of high-risk tumors, assisting clinical decision-making.
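The Dice coefficient used to report segmentation quality above (exceeding 0.92 for kidney and tumor) is straightforward to compute from binary masks:

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient between two binary masks:
    2 * |intersection| / (|pred| + |gt|). Returns 1.0 for two
    empty masks by convention."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0
```

The same formula applies unchanged to 3D volumes, since the masks are flattened by the sums.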

Application of artificial intelligence medical imaging aided diagnosis system in the diagnosis of pulmonary nodules.

Yang Y, Wang P, Yu C, Zhu J, Sheng J

PubMed · May 14 2025
Artificial intelligence (AI) has transformed many areas of daily life and is increasingly applied in medicine. Using advanced AI methods, this paper aims to integrate a medical imaging-aided diagnosis system with AI in order to analyze and address the gaps and errors of traditional manual diagnosis of pulmonary nodules. Drawing on the principles of image segmentation, a medical image-aided diagnosis system was constructed and optimized to improve the precision of pulmonary nodule diagnosis. Both manual reading and the imaging-aided diagnosis system were evaluated on 200 cases containing 231 nodules confirmed by pathology or by no change at follow-up of more than two years. The results showed that the AI software detected 881 true nodules, for a sensitivity of 99.10% (881/889), while the radiologists detected 385 true nodules, for a sensitivity of 43.31% (385/889). The sensitivity of the AI software in detecting non-calcified nodules was significantly higher than that of the radiologists (99.01% vs 43.30%, P < 0.001), and the difference was statistically significant.

Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping

Yinuo Wang, Yue Zeng, Kai Chen, Cai Meng, Chao Pan, Zhouping Tang

arXiv preprint · May 14 2025
Introduction: Timely identification of intracranial hemorrhage (ICH) subtypes on non-contrast computed tomography is critical for prognosis prediction and therapeutic decision-making, yet remains challenging due to low contrast and blurred boundaries. This study evaluates the performance of zero-shot multi-modal large language models (MLLMs) compared to traditional deep learning methods in ICH binary classification and subtyping. Methods: We utilized a dataset provided by RSNA, comprising 192 NCCT volumes. The study compares various MLLMs, including GPT-4o, Gemini 2.0 Flash, and Claude 3.5 Sonnet V2, with conventional deep learning models, including ResNet50 and Vision Transformer. Carefully crafted prompts were used to guide the MLLMs in tasks such as ICH presence, subtype classification, localization, and volume estimation. Results: The results indicate that in the ICH binary classification task, traditional deep learning models comprehensively outperform MLLMs. For subtype classification, MLLMs also exhibit inferior performance compared to traditional deep learning models, with Gemini 2.0 Flash achieving a macro-averaged precision of 0.41 and a macro-averaged F1 score of 0.31. Conclusion: While MLLMs excel in interactive capabilities, their overall accuracy in ICH subtyping is inferior to deep networks. However, MLLMs enhance interpretability through language interactions, indicating potential in medical imaging analysis. Future efforts will focus on model refinement and developing more precise MLLMs to improve performance in three-dimensional medical image processing.
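The macro-averaged metrics reported for Gemini 2.0 Flash average the per-class scores with equal weight, so rare subtypes count as much as common ones. A minimal sketch (treating a class with no predictions or no true cases as scoring 0 is an assumed convention):

```python
def macro_metrics(y_true, y_pred, classes):
    """Macro-averaged precision and F1: compute each metric per class,
    then take the unweighted mean over classes."""
    precs, f1s = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precs.append(prec)
        f1s.append(f1)
    return sum(precs) / len(classes), sum(f1s) / len(classes)
```

With five ICH subtypes, a macro F1 of 0.31 therefore implies that at least some subtypes are scored very poorly, even if one or two are classified reasonably well.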

Using Foundation Models as Pseudo-Label Generators for Pre-Clinical 4D Cardiac CT Segmentation

Anne-Marie Rickmann, Stephanie L. Thorn, Shawn S. Ahn, Supum Lee, Selen Uman, Taras Lysyy, Rachel Burns, Nicole Guerrera, Francis G. Spinale, Jason A. Burdick, Albert J. Sinusas, James S. Duncan

arXiv preprint · May 14 2025
Cardiac image segmentation is an important step in many cardiac image analysis and modeling tasks such as motion tracking or simulations of cardiac mechanics. While deep learning has greatly advanced segmentation in clinical settings, there is limited work on pre-clinical imaging, notably in porcine models, which are often used due to their anatomical and physiological similarity to humans. However, differences between species create a domain shift that complicates direct model transfer from human to pig data. Recently, foundation models trained on large human datasets have shown promise for robust medical image segmentation; yet their applicability to porcine data remains largely unexplored. In this work, we investigate whether foundation models can generate sufficiently accurate pseudo-labels for pig cardiac CT and propose a simple self-training approach to iteratively refine these labels. Our method requires no manually annotated pig data, relying instead on iterative updates to improve segmentation quality. We demonstrate that this self-training process not only enhances segmentation accuracy but also smooths out temporal inconsistencies across consecutive frames. Although our results are encouraging, there remains room for improvement, for example by incorporating more sophisticated self-training strategies and by exploring additional foundation models and other cardiac imaging technologies.
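The iterative pseudo-label refinement described above can be illustrated with a toy self-training loop. Here a nearest-centroid classifier stands in for the foundation model, and the softmax-over-negative-distance confidence rule is an assumption for the sketch:

```python
import numpy as np

def self_train(X_lab, y_lab, X_unlab, rounds=5, tau=0.8):
    """Toy self-training: each round, fit per-class centroids on the
    current (pseudo-)labeled set, pseudo-label unlabeled points the
    'model' is confident about, and refit on the enlarged set."""
    X_lab = np.asarray(X_lab, float)
    y_lab = np.asarray(y_lab)
    X_unlab = np.asarray(X_unlab, float)
    X, y = X_lab.copy(), y_lab.copy()
    for _ in range(rounds):
        cents = np.stack([X[y == c].mean(0) for c in (0, 1)])
        d = np.linalg.norm(X_unlab[:, None] - cents[None], axis=-1)
        p = np.exp(-d) / np.exp(-d).sum(1, keepdims=True)  # assumed confidence
        conf, lab = p.max(1), p.argmax(1)
        take = conf >= tau
        if not take.any():
            break
        # refit on the originally labeled data plus confident pseudo-labels
        X = np.vstack([X_lab, X_unlab[take]])
        y = np.concatenate([y_lab, lab[take]])
    return np.stack([X[y == c].mean(0) for c in (0, 1)])
```

In the paper's setting the "labels" are segmentation masks rather than class indices, but the loop is the same: keep only confident pseudo-labels, retrain, repeat until the labels stabilize.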

AI-based metal artefact correction algorithm for radiotherapy patients with dental hardware in head and neck CT: Towards precise imaging.

Yu X, Zhong S, Zhang G, Du J, Wang G, Hu J

PubMed · May 14 2025
To investigate the clinical efficiency of an AI-based metal artefact correction algorithm (AI-MAC) for reducing dental metal artefacts in head and neck CT, compared to conventional interpolation-based MAC. We retrospectively collected 41 patients with non-removable dental hardware who underwent non-contrast head and neck CT prior to radiotherapy. All images were reconstructed with the standard reconstruction algorithm (SRA) and were additionally processed with both conventional MAC and AI-MAC. The image quality of SRA, MAC and AI-MAC was compared by qualitative scoring on a 5-point scale, with scores ≥ 3 considered interpretable. This was followed by a quantitative evaluation, including signal-to-noise ratio (SNR) and artefact index (Idx_artefact). Organ contouring accuracy was quantified by calculating the Dice similarity coefficient (DSC) and Hausdorff distance (HD) for the oral cavity and teeth, using the clinically accepted contouring as reference. Moreover, the treatment planning dose distribution for the oral cavity was assessed. AI-MAC yielded superior qualitative image quality as well as quantitative metrics, including SNR and Idx_artefact, compared to SRA and MAC. Image interpretability significantly improved from 41.46% for SRA and 56.10% for MAC to 92.68% for AI-MAC (p < 0.05). Compared to SRA and MAC, the best DSC and HD for both oral cavity and teeth were obtained with AI-MAC (all p < 0.05). No significant differences in dose distribution were found among the three image sets. AI-MAC outperforms conventional MAC in metal artefact reduction, achieving superior image quality with high image interpretability for patients with dental hardware undergoing head and neck CT. Furthermore, the use of AI-MAC improves the accuracy of organ contouring while providing consistent dose calculation against metal artefacts in radiotherapy. AI-MAC is a novel deep learning-based technique for reducing metal artefacts on CT.
This in-vivo study first demonstrated its capability of reducing metal artefacts while preserving organ visualization, as compared with conventional MAC.
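The quantitative metrics can be sketched as follows; the abstract does not spell out its exact formulas, so both definitions below are assumptions drawn from common practice in MAC studies:

```python
import numpy as np

def snr(roi):
    """Signal-to-noise ratio of a ROI: mean attenuation divided by its
    sample standard deviation (one common definition)."""
    roi = np.asarray(roi, float)
    return roi.mean() / roi.std(ddof=1)

def artefact_index(roi_artefact, roi_reference):
    """Artefact index as often defined in metal-artefact studies:
    sqrt(max(SD_artefact^2 - SD_reference^2, 0)), i.e. the excess noise
    in the artefact-affected ROI over a clean reference ROI."""
    sa = np.asarray(roi_artefact, float).std(ddof=1)
    sr = np.asarray(roi_reference, float).std(ddof=1)
    return float(np.sqrt(max(sa**2 - sr**2, 0.0)))
```

Under these definitions, a successful MAC algorithm raises SNR in artefact-affected regions and drives Idx_artefact toward zero.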

Individual thigh muscle and proximal femoral features predict displacement in femoral neck fractures: an AI-driven CT analysis.

Yoo JI, Kim HS, Kim DY, Byun DW, Ha YC, Lee YK

PubMed · May 13 2025
Hip fractures, particularly among the elderly, impose a significant public health burden due to increased morbidity and mortality. Femoral neck fractures, commonly resulting from low-energy falls, can lead to severe complications such as avascular necrosis, and often necessitate total hip arthroplasty. This study harnesses AI to enhance musculoskeletal assessment by performing automatic muscle segmentation on whole-thigh CT scans and detailed cortical measurements using the StradView program. The primary aim is to improve the prediction and prevention of severe femoral neck fractures, ultimately supporting more effective rehabilitation and treatment strategies. Anatomical features were measured from whole-thigh CT scans of 60 femoral neck fracture patients. An AI-driven muscle segmentation model (Dice score 0.84) segmented 27 muscles in the thigh region to calculate muscle volumes. Proximal femoral bone parameters were measured using StradView, including average cortical thickness, inner density, and FWHM at four regions. Correlation analysis evaluated relationships between muscle features, cortical parameters, and fracture displacement. Machine learning models (Random Forest, SVM, and Multi-layer Perceptron) predicted displacement using these variables. Correlation analysis showed significant associations between femoral neck displacement and trabecular density at the femoral neck/intertrochanter, as well as volumes of specific thigh muscles such as the tensor fasciae latae. Machine learning models using a combined feature set of thigh muscle volumes and proximal femoral parameters performed best in predicting displacement, with the Random Forest model achieving an F1 score of 0.91 and the SVM model 0.93. Decreased volumes of the tensor fasciae latae, rectus femoris, and semimembranosus muscles, coupled with reduced trabecular density at the femoral neck and intertrochanter, were significantly associated with increased fracture displacement.
Notably, the SVM model, which integrated both muscle and femoral features, achieved the highest predictive performance. These findings underscore the critical importance of muscle strength and bone density in rehabilitation planning and highlight the potential of AI-driven predictive models for improving clinical outcomes in femoral neck fractures.

The automatic pelvic screw corridor planning for intact pelvises based on deep learning deformable registration.

Ju F, Chai X, Zhao J, Dong M

PubMed · May 13 2025
Percutaneous screw fixation in pelvic trauma surgery is an extremely challenging operation that typically requires a trial-and-error insertion process under the guidance of continuous intraoperative X-ray. This process can be simplified by utilizing surgical navigation systems. Understanding the complexity of the intraosseous pelvic corridor is essential for establishing the optimal screw corridor, which further facilitates preoperative planning and intraoperative application. Traditional screw corridor search algorithms necessitate traversing the entrance and exit areas of the screw and calculating the distance from the corridor axis to the bone surface to ascertain the location of the screw. This process is computationally complex, and manual measurement by the physician is time-consuming, labor-intensive, and experience-dependent. In this study, we propose an automated planning algorithm for pelvic screw corridors based on deep learning deformable registration, which can efficiently and accurately identify the optimal screw corridors. Compared to traditional methods, the innovations of this study include: (1) the introduction of corridor safety range constraints on screw positioning, which enhances search efficiency; (2) the application of deep learning deformable registration to facilitate the automatic annotation of the screw entrance and exit areas, as well as the safety range of the corridor; and (3) the development of a highly efficient algorithm for optimal corridor searching, which quickly determines the corridor without traversing the entrance and exit areas and enhances efficiency via a vector-based diameter calculation method. The framework consists of three key components: an atlas generation module, deformable registration, and an optimal corridor searching strategy.
In the experiments, we tested the performance of the proposed algorithm on 198 intact pelvises by calculating the optimal corridor for anterior column and S1 sacroiliac screws. The results show that the new algorithm increases the corridor diameter by 2.1%-3.3% compared to manual measurements, while reducing the average planning time from 1038 s to 18.9 s for the anterior column corridor and from 3398 s to 26.7 s for the S1 sacroiliac corridor, compared to the traditional screw searching algorithm. This demonstrates the advantages of the algorithm in terms of efficiency and accuracy. However, the current method is validated only on intact pelvises; further research is required for pelvic fracture scenarios.
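The "vector-based diameter calculation" plausibly reduces to point-to-line distances: the largest screw that fits a given axis has a radius equal to the minimum perpendicular distance from the axis line to the surrounding bone-surface point cloud. A simplified sketch under that assumption (the function name and interface are illustrative, not from the paper):

```python
import numpy as np

def corridor_radius(p0, direction, surface_pts):
    """Maximal screw radius along an axis: the minimum perpendicular
    distance from the axis line (point p0, direction vector) to the
    bone-surface point cloud. Corridor diameter is twice this value."""
    d = np.asarray(direction, float)
    d = d / np.linalg.norm(d)
    v = np.asarray(surface_pts, float) - np.asarray(p0, float)
    # remove each point's along-axis component, keep the perpendicular part
    perp = v - np.outer(v @ d, d)
    return np.linalg.norm(perp, axis=1).min()
```

Because this is a single vectorized pass over the surface points per candidate axis, it avoids the per-voxel traversal of entrance and exit areas that makes the traditional search slow.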
