Page 4 of 139 · 1390 results

Accuracy and reproducibility of large language model measurements of liver metastases: comparison with radiologist measurements.

Sugawara H, Takada A, Kato S

PubMed · Oct 4, 2025
To compare the accuracy and reproducibility of lesion-diameter measurements performed by three state-of-the-art LLMs with those obtained by radiologists. In this retrospective study using a public database, 83 patients with solitary colorectal-cancer liver metastases were identified. From each CT series, a radiologist extracted the single axial slice showing the maximal tumor diameter and converted it to a 512 × 512-pixel PNG image (window level 50 HU, window width 400 HU) with pixel size encoded in the filename. Three LLMs, ChatGPT-o3 (OpenAI), Gemini 2.5 Pro (Google), and Claude 4 Opus (Anthropic), were prompted to estimate the longest lesion diameter twice, ≥ 1 week apart. Two board-certified radiologists (12 years' experience each) independently measured the same single-slice images, and one radiologist repeated the measurements after ≥ 1 week. Agreement was assessed with intraclass correlation coefficients (ICC); 95% confidence intervals were obtained by bootstrap resampling (5000 iterations). Radiologist inter-observer agreement was excellent (ICC = 0.95, 95% CI 0.86-0.99); intra-observer agreement was 0.98 (95% CI 0.94-0.99). Gemini achieved good model-to-radiologist agreement (ICC = 0.81, 95% CI 0.68-0.89) and intra-model reproducibility (ICC = 0.78, 95% CI 0.65-0.87). GPT-o3 showed moderate agreement (ICC = 0.52) and poor reproducibility (ICC = 0.25); Claude showed poor agreement (ICC = 0.07) and reproducibility (ICC = 0.47). LLMs do not yet match radiologists in measuring colorectal cancer liver metastases; however, Gemini's good agreement and reproducibility highlight the rapid progress of LLM image-interpretation capability.
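The agreement analysis described above is straightforward to sketch. Below is a minimal, illustrative implementation of ICC(2,1) (two-way random effects, absolute agreement, single measurement) with a percentile bootstrap CI obtained by resampling subjects, as in the paper's 5000-iteration bootstrap; the function names and simulated data are mine, not the authors'.

```python
import numpy as np

def icc2_1(y):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    y is an (n_subjects, n_raters) array of diameter measurements."""
    y = np.asarray(y, dtype=float)
    n, k = y.shape
    grand = y.mean()
    msr = k * ((y.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # between-subject MS
    msc = n * ((y.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # between-rater MS
    sse = ((y - grand) ** 2).sum() - msr * (n - 1) - msc * (k - 1)
    mse = sse / ((n - 1) * (k - 1))                            # residual MS
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def bootstrap_ci(y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample subjects (rows) with replacement."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    iccs = [icc2_1(y[rng.integers(0, len(y), len(y))]) for _ in range(n_boot)]
    return np.quantile(iccs, [alpha / 2, 1 - alpha / 2])
```

With 83 lesions and two readers per comparison, this mirrors the study's setup; `n_boot=5000` matches the stated number of bootstrap iterations.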

AI-Assisted Pleural Effusion Volume Estimation from Contrast-Enhanced CT Images

Sanhita Basu, Tomas Fröding, Ali Teymur Kahraman, Dimitris Toumpanakis, Tobias Sjöblom

arXiv preprint · Oct 4, 2025
Background: Pleural effusion (PE) is a common finding in many different clinical conditions, but accurately measuring its volume from CT scans is challenging. Purpose: To improve PE segmentation and quantification for enhanced clinical management, we have developed and trained a semi-supervised deep learning framework on contrast-enhanced CT volumes. Materials and Methods: This retrospective study collected CT Pulmonary Angiogram (CTPA) data from internal and external datasets. A subset of 100 cases was manually annotated for model training, while the remaining cases were used for testing and validation. A novel semi-supervised deep learning framework, Teacher-Teaching Assistant-Student (TTAS), was developed and used to enable efficient training on non-segmented examinations. Segmentation performance was compared to that of state-of-the-art models. Results: 100 patients (mean age, 72 years [SD, 28]; 55 men) were included in the study. The TTAS model demonstrated superior segmentation performance compared to state-of-the-art models, achieving a mean Dice score of 0.82 (95% CI, 0.79 - 0.84) versus 0.73 for nnU-Net (p < 0.0001, Student's t test). Additionally, TTAS exhibited a four-fold lower mean Absolute Volume Difference (AbVD) of 6.49 mL (95% CI, 4.80 - 8.20) compared to nnU-Net's AbVD of 23.16 mL (p < 0.0001). Conclusion: The developed TTAS framework offered superior PE segmentation, aiding accurate volume determination from CT scans.
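The two metrics reported here, Dice score and absolute volume difference, reduce to simple voxel arithmetic. A minimal sketch (my own helper names; the voxel volume would come from the CT header in practice):

```python
import numpy as np

def dice_score(pred, gt):
    """Dice similarity coefficient between two binary masks."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

def abs_volume_diff_ml(pred, gt, voxel_volume_mm3):
    """Absolute volume difference (AbVD) in millilitres from voxel counts."""
    dv = abs(int(np.asarray(pred, bool).sum()) - int(np.asarray(gt, bool).sum()))
    return dv * voxel_volume_mm3 / 1000.0
```

For example, masks that overlap on half their voxels give a Dice of 0.5, and a 5-voxel count difference at 500 mm³ per voxel gives an AbVD of 2.5 mL.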

A Multimodal Classification Method for Nasal Obstruction Severity Based on Computed Tomography and Nasal Resistance.

Wang Q, Li S, Sun H, Cui S, Song W

PubMed · Oct 3, 2025
The assessment of the degree of nasal obstruction is valuable in disease diagnosis, quality of life assessment, and epidemiological studies. To this end, this article proposes a multimodal nasal obstruction degree classification model based on cone beam computed tomography (CBCT) images and nasal resistance measurements. The model consists of four modules: image feature extraction, tabular feature extraction, feature fusion, and classification. In the image feature extraction module, this article proposes a strategy of using the pretrained MedicalNet model to obtain initialization parameters and then transferring them to a three-dimensional convolutional neural network (3D CNN) feature extractor. For the tabular nasal resistance measurement data, a method based on extreme gradient boosting (XGBoost) feature importance analysis is proposed to filter key features and reduce the data dimension. To fuse the two modalities, a feature fusion method based on local and global features was designed. Finally, the fused features are classified using the tabular network (TabNet) model. To verify the effectiveness of the proposed method, comparison experiments and ablation experiments were designed, and the experimental results show that the accuracy and recall of the proposed multimodal classification model reach 0.93 and 0.9, respectively, significantly higher than other methods.
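The importance-based feature filtering step can be sketched as follows. The paper uses XGBoost; the sketch below substitutes scikit-learn's `GradientBoostingClassifier` as a stand-in (an assumption on my part, to stay dependency-light), since the mechanics, fit a boosted model, rank by `feature_importances_`, keep the top fraction, are the same.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def select_by_importance(X, y, keep_frac=0.5, seed=0):
    """Rank features by gradient-boosting importance and keep the top fraction.
    Returns the sorted indices of the retained features."""
    model = GradientBoostingClassifier(random_state=seed).fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]  # most important first
    k = max(1, int(round(len(order) * keep_frac)))
    return np.sort(order[:k])
```

The retained columns would then feed the feature-fusion module alongside the 3D CNN image features.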

Diagnostic Accuracy and Robustness of AI-based Fully Automated CT-FFR for the Detection of Significant CAD in Patients With Transcatheter Aortic Valve Replacement.

Fu Y, Zhou L, Zhang X, Xie G, Zhang T, Gong Y, Pan T, Kang W, Lv L, Xu H, Chen Q

PubMed · Oct 3, 2025
To explore the diagnostic accuracy and robustness of artificial intelligence (AI)-based fully automated CT-derived fractional flow reserve (CT-FFR) in detecting significant coronary artery disease (CAD) in patients with transcatheter aortic valve replacement (TAVR). This single-center retrospective study included consecutive patients who underwent TAVR between January 2020 and June 2023. All patients received preoperative coronary CT angiography (CCTA) and invasive coronary angiography (ICA). CT-FFR was evaluated with fully automated AI-based software. The diagnostic performance of CCTA and CT-FFR for the identification of significant CAD was compared using ICA (≥70% diameter stenosis) as the reference standard. Patients who underwent post-TAVR CCTA within 3 months were used to calculate CT-FFR values. The post-TAVR CT-FFR calculations were compared with pre-TAVR CT-FFR to evaluate the robustness of the AI-based software. A total of 77 pre-TAVR patients and 164 vessels were included. Significant CAD was identified by ICA in 18 patients (23.4%). In per-patient analysis, the sensitivity, specificity, positive predictive value, negative predictive value, and diagnostic accuracy were 44.4%, 91.5%, 61.5%, 84.4%, and 80.5% for CCTA and 94.4%, 83.1%, 64.0%, 98.0%, and 85.7% for CT-FFR. The area under the receiver operating characteristic curve of CT-FFR was superior to CCTA (0.83 vs. 0.63, P = 0.001). Thirty-five (45.5%) patients underwent CT-FFR calculations before and after TAVR. There was good agreement between pre- and post-TAVR CT-FFR values (intraclass correlation coefficient 0.85). AI-based fully automated CT-FFR improves the diagnostic performance of CCTA for the detection of significant CAD pre-TAVR and demonstrates robust stability post-TAVR.
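The five per-patient figures all derive from a 2×2 confusion table. A minimal sketch, using counts I back-calculated from the CCTA row (18 CAD-positive and 59 CAD-negative patients; the counts are my inference, not stated in the abstract), reproduces the reported percentages:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, NPV, and accuracy from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Counts inferred from the CCTA per-patient results (illustrative):
ccta = diagnostic_metrics(tp=8, fp=5, tn=54, fn=10)
```

These counts give 44.4% sensitivity, 91.5% specificity, 61.5% PPV, 84.4% NPV, and 80.5% accuracy, matching the CCTA row above.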

The Iodine Opportunity for Sustainable Radiology: Quantifying Supply Chain Strategies to Cut Contrast's Carbon and Costs.

Nghiem DX, Yahyavi-Firouz-Abadi N, Hwang GL, Zafari Z, Moy L, Carlos RC, Doo FX

PubMed · Oct 3, 2025
To estimate the economic and environmental reduction potential of iodinated contrast media (ICM) saving strategies, by examining supply chain data (from iodine extraction through administration) to inform a decision-making framework that can be tailored to local institutional priorities. A 100 mL polymer vial of ICM was set as the standard reference case (SRC) for baseline comparison. To evaluate cost and emissions impacts, four ICM reduction strategies were modeled relative to this SRC baseline: vial optimization, hardware dose reduction, software (AI-enabled) dose reduction, and multi-dose vial/injector systems. This analysis was then translated into a decision-making framework for radiologists to compare ICM strategies by cost, emissions, and operational feasibility. The supply chain life cycle of a 100 mL iodinated contrast vial produces 1,029 g CO2e, primarily from iodine extraction and clinical use. ICM-saving strategies varied widely in emissions reduction, ranging from 12%-50% nationally. Economically, a 125% tariff could inflate national ICM-related costs to $11.9B; the AI-enhanced dose-reduction strategy could lower this expenditure to $2.7B. Institutional analysis reveals that the ICM savings from high-capital upfront investment strategies can offset their initial investment, highlighting important trade-offs for implementation decision-making. ICM is a major and modifiable contributor to healthcare carbon emissions. Depending on the utilized ICM-reduction strategy, emissions can be reduced by up to 53% and ICM-related costs by up to 50%. To guide implementation, we developed a decision-making framework that categorizes strategies based on environmental benefit, cost, and operational feasibility, enabling radiology leaders to align sustainability goals with institutional priorities.
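The fleet-level arithmetic behind such estimates is simple to sketch. The per-vial footprint below is the abstract's 1,029 g CO2e figure; the vial counts, unit cost, and reduction fractions in the usage are illustrative placeholders, not values from the study.

```python
VIAL_CO2E_G = 1029.0  # g CO2e per 100 mL ICM vial (supply-chain total, per the abstract)

def annual_impact(n_vials, emission_reduction, cost_per_vial, cost_reduction):
    """Annual savings for one ICM-reduction strategy (all inputs illustrative)."""
    base_tonnes = n_vials * VIAL_CO2E_G / 1e6  # g -> tonnes
    base_cost = n_vials * cost_per_vial
    return {
        "co2e_saved_tonnes": base_tonnes * emission_reduction,
        "cost_saved_usd": base_cost * cost_reduction,
    }
```

For instance, 100,000 vials a year at a hypothetical $100/vial with a 50% cut on both axes would save about 51 tonnes CO2e and $5M.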

SAHDAI-XAI (Subarachnoid Hemorrhage Detection Artificial Intelligence - eXplainable AI): Testing explainability in SAH imaging data and AI modeling

Morgan S, Salman S, Walker J, Freeman WD

medRxiv preprint · Oct 3, 2025
Introduction: Subarachnoid hemorrhage (SAH) is a life-threatening neurological emergency. SAHDAI-XAI (Subarachnoid Hemorrhage Detection Artificial Intelligence - eXplainable AI) is a cloud-based machine learning model built as a binary (positive/negative) classifier to detect SAH bleeding in any of eight potential hemorrhage spaces. It aims to address the lack of transparency in AI-based detection of subarachnoid hemorrhage. Methods: This project is divided into two phases, integrating AutoML and BLAST, combining the statistical assessment of hemorrhage detection accuracy using a low-code approach with simultaneous colour-based visualization of bleeding areas to enhance transparency. In phase 1, an AutoML model was trained on Google Cloud Vertex AI after preprocessing. The model completed four runs, progressively increasing the dataset size. The dataset was split into 80% for training, 10% for validation, and 10% for testing, with explainability (XRAI) applied to the testing images. We started with 20 non-contrast head CT images, followed by 40, 200, and then 300 images; in each AutoML run, the dataset was evenly divided into one half manually labeled as positive for hemorrhage and the other half labeled as negative controls. The fourth AutoML run evaluated the model's ability to differentiate between a hemorrhage and other pathologies, such as tumors and calcifications. In phase 2, the goal is to increase explainability by visualizing predictive image features and showing the detection of hemorrhage locations using the Brain Lesion Analysis and Segmentation Tool for Computed Tomography (BLAST), which segments and quantifies four different hemorrhage and edema locations. Results: In phase 1, the first two AutoML runs demonstrated 100% average precision due to the small dataset size. In the third run, after increasing the dataset size, the average precision was 97.9%, with one false negative (FN) image detected.
In the fourth run, which evaluated the model's differentiation abilities, the average precision dropped to 94.4%, with two false positive (FP) images in the testing set. After extensive preprocessing using the BLAST model's public Python code in the second phase, topographic images of the bleeding were produced with mixed outcomes: some accurately covered a significant percentage of the bleeding, whereas others did not. Conclusion: The SAHDAI-XAI model is a new image-based explainable AI model for SAH that enhances the transparency of AI hemorrhage detection in daily clinical practice, aiming to overcome AI's opaque nature and accelerate time to diagnosis, thereby helping to decrease mortality rates. BLAST utilization facilitates a better understanding of AI outcomes and supports visually demonstrated XAI in SAH detection and prediction of hemorrhage coverage. The goal is to open AI's black box, making ML model outcomes increasingly transparent and explainable. Keywords: SAH, explainable AI, GCP, AutoML, BLAST, black-box.
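The 80/10/10 split described in the methods can be sketched in a few lines; the helper name and seed are illustrative, and Vertex AI would normally handle this internally when the split fractions are configured.

```python
import random

def split_80_10_10(items, seed=42):
    """Shuffle and split a dataset into 80% train / 10% validation / 10% test."""
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle for reproducibility
    n = len(items)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

For the final 300-image run this yields 240/30/30 images, with the positive/negative halves balanced before splitting.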

Machine learning and quantitative computed tomography radiomics prediction of postoperative functional recovery in paraplegic dogs.

Low D, Rutherford S

PubMed · Oct 2, 2025
To develop a computed tomography (CT)-radiomics-based machine-learning algorithm for prediction of functional recovery in paraplegic dogs with acute intervertebral disc extrusion (IVDE). Multivariable prediction model development. Paraplegic dogs with acute IVDE: 128 deep-pain positive and 86 deep-pain negative (DPN). Radiomics features from noncontrast CT were combined with deep-pain perception in an extreme gradient boosting algorithm using an 80:20 train-test split. Model performance was assessed on the independent test set (Test<sub>full</sub>) and on the test set of DPN dogs (Test<sub>DPN</sub>). Deep-pain perception alone served as the control. Recovery of ambulation was recorded in 165/214 dogs (77.1%) after decompressive surgery. The model had an area under the receiver operating characteristic curve (AUC) of .9118 (95% CI: .8366-.9872), accuracy of 86.1% (95% CI: 74.4%-95.4%), sensitivity of 82.4% (95% CI: 68.6%-93.9%), and specificity of 100.0% (95% CI: 100.0%-100.0%) on Test<sub>full</sub>, and an AUC of .7692 (95% CI: .6250-.9000), accuracy of 72.7% (95% CI: 50.0%-90.9%), sensitivity of 53.8% (95% CI: 25.0%-80.0%), and specificity of 100.0% (95% CI: 100.0%-100.0%) on Test<sub>DPN</sub>. Deep-pain perception had an AUC of .8088 (95% CI: .7273-.8871), accuracy of 69.8% (95% CI: 55.8%-83.7%), sensitivity of 61.8% (95% CI: 45.5%-77.4%), and specificity of 100.0% (95% CI: 100.0%-100.0%), which was different from that of the model (p = .02). Noncontrast CT-based radiomics provided prognostic information in dogs with severe spinal cord injury secondary to acute intervertebral disc extrusion. The model outperformed deep-pain perception alone in identifying dogs that recovered ambulation following decompressive surgery. Radiomics features from noncontrast CT, when integrated into a multimodal machine-learning algorithm, may be useful as an assistive tool for surgical decision making.
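The AUC figures quoted above have a simple rank-based interpretation: the probability that a randomly chosen recovered dog scores higher than a randomly chosen non-recovered dog. A minimal sketch via the Mann-Whitney U formulation (my own helper, not the study's code):

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as P(score_pos > score_neg), counting ties as half a win."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))
```

Perfectly separated scores give an AUC of 1.0; fully overlapping scores give 0.5, the chance level against which the reported .9118 and .8088 are judged.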

Evaluating GPT-4o for emergency disposition of complex respiratory cases with pulmonology consultation: a diagnostic accuracy study.

Yıldırım C, Aykut A, Günsoy E, Öncül MV

PubMed · Oct 2, 2025
Large Language Models (LLMs), such as GPT-4o, are increasingly investigated for clinical decision support in emergency medicine. However, their real-world performance in disposition prediction remains insufficiently studied. This study evaluated the diagnostic accuracy of GPT-4o in predicting ED disposition (discharge, ward admission, or ICU admission) in complex emergency respiratory cases requiring pulmonology consultation and chest CT, representing a selective high-acuity subgroup of ED patients. We conducted a retrospective observational study in a tertiary ED between November 2024 and February 2025, including patients with complex respiratory presentations who underwent pulmonology consultation and chest CT (a selective high-acuity subgroup rather than the general ED respiratory population). GPT-4o was prompted to predict the most appropriate ED disposition using three progressively enriched input models: Model 1 (age, sex, oxygen saturation, home oxygen therapy, and venous blood gas parameters); Model 2 (Model 1 plus laboratory data); and Model 3 (Model 2 plus chest CT findings). Model performance was assessed using accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score. Among the 221 patients included, 69.2% were admitted to the ward, 9.0% to the intensive care unit (ICU), and 21.7% were discharged. For hospital admission prediction, Model 3 demonstrated the highest sensitivity (91.9%) and overall accuracy (76.5%), but the lowest specificity (20.8%). In contrast, for discharge prediction, Model 3 achieved the highest specificity (91.9%) but the lowest sensitivity (20.8%). Numerical improvements were observed across models, but none reached statistical significance (all p > 0.22). Model 1 therefore performed comparably to Models 2-3 while being less complex.
Among patients who were discharged despite GPT-4o predicting admission, the 14-day ED re-presentation rates were 23.8% (5/21) for Model 1, 30.0% (9/30) for Model 2, and 28.9% (11/38) for Model 3. GPT-4o demonstrated high sensitivity in identifying ED patients requiring hospital admission, particularly those needing intensive care, when provided with progressively enriched clinical input. However, its low sensitivity for discharge prediction resulted in frequent overtriage, limiting its utility for autonomous decision-making. This proof-of-concept study demonstrates GPT-4o's capacity to stratify disposition decisions in complex respiratory cases under varying levels of limited input data. However, these findings should be interpreted in light of key limitations, including the selective high-acuity cohort and the absence of vital signs, and require prospective validation before clinical implementation.
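The progressive-enrichment design (Model 1 → 2 → 3) amounts to building the LLM's structured input at three levels of detail. A minimal sketch; the field names below are illustrative placeholders, not the study's actual schema or prompt:

```python
def build_input(patient, level):
    """Assemble the structured input for the LLM at enrichment level 1-3.
    Level 1: demographics, oxygenation, venous blood gas.
    Level 2: + laboratory data.  Level 3: + chest CT findings."""
    payload = {k: patient[k] for k in ("age", "sex", "spo2", "home_oxygen", "vbg")}
    if level >= 2:
        payload["labs"] = patient["labs"]
    if level >= 3:
        payload["ct_findings"] = patient["ct_findings"]
    return payload
```

Serializing each payload into the prompt lets the same case be scored under all three input conditions, as the study did.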

Dosiomic and radiomic features within radiotherapy target volume for predicting the treatment response in patients with glioma after radiotherapy.

Wang Y, Zhang Y, Lin L, Hu Z, Wang H

PubMed · Oct 2, 2025
This study aimed to develop interpretable machine learning models using radiomic and dosiomic features from radiotherapy target volumes to predict treatment response in glioma patients. A retrospective analysis was conducted on 176 glioma patients. Treatment response was categorized into disease control rate (DCR) and non-DCR groups (training cohort: 71 vs. 44; validation cohort: 34 vs. 27). Five regions of interest (ROIs) were identified: gross tumor volume (GTV), gross tumor volume with tumor bed (GTVtb), clinical target volume (CTV), GTV-GTV and CTV-GTVtb. For each ROI, radiomic features and dosiomic features were separately extracted from CT images and dose maps. Feature selection was performed. Six dosimetric parameters and six clinical variables were also included in model development. Five predictive models were constructed using four machine learning algorithms: Radiomic, Dosiomic, Dose-Volume Histogram (DVH), Combined (integrating clinical, radiomic, dosiomic, and DVH features), and Clinical models. Model performance was evaluated using accuracy, precision, recall, F1-score, and area under the curve (AUC). SHAP analysis was applied to explain model predictions. The CTV_combined support vector machine (SVM) model achieved the best performance, with an AUC of 0.728 in the validation cohort. SHAP summary plots showed that dosiomic features contributed significantly to prediction. Force plots further illustrated how individual features affected classification outcomes. The SHAP-interpretable CTV_combined SVM model demonstrated strong predictive ability for treatment response in glioma patients. This approach may support radiation oncologists in identifying the underlying pathological mechanisms of poor treatment response and adjusting dose distribution accordingly, thereby aiding the development of personalized radiotherapy strategies. Trial registration: Not applicable.
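The "combined" model idea, concatenating radiomic and dosiomic feature vectors and training an SVM, can be sketched with synthetic stand-in features. Everything below (feature counts, the 115/61 split echoing the cohort sizes, the simulated signal) is illustrative, not the authors' pipeline:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Synthetic stand-ins: 20 "radiomic" + 20 "dosiomic" features per patient.
X = rng.normal(size=(176, 40))
y = (X[:, 0] + X[:, 20] + rng.normal(0, 1.0, 176) > 0).astype(int)

train, test = slice(0, 115), slice(115, 176)  # mirrors the 115/61 cohort split
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
clf.fit(X[train], y[train])
auc = roc_auc_score(y[test], clf.decision_function(X[test]))
```

In the study, SHAP values computed on the fitted model would then attribute each prediction to individual radiomic and dosiomic features.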

Transformer-enhanced vertebrae segmentation and anatomical variation recognition from CT images.

Yang C, Huang L, Sucharit W, Xie H, Huang X, Li Y

PubMed · Oct 2, 2025
Accurate segmentation and anatomical classification of vertebrae in spinal CT scans are crucial for clinical diagnosis, surgical planning, and disease monitoring. However, the task is complicated by anatomical variability, degenerative changes, and the presence of rare vertebral anomalies. In this study, we propose a hybrid framework that combines a high-resolution WNet segmentation backbone with a Vision Transformer (ViT)-based classification module to perform vertebral identification and anomaly detection. Our model incorporates an attention-based anatomical variation module and leverages patient-specific metadata (age, sex, vertebral distribution) to improve the accuracy and personalization of vertebrae typing. Extensive experiments on the VerSe 2019 and 2020 datasets demonstrate that our approach outperforms state-of-the-art baselines such as nnUNet and SwinUNet, especially in detecting transitional vertebrae (e.g., T13, L6) and modeling morphological diversity. The system maintains high robustness under slice skipping, noise perturbation, and scanner variations, while offering interpretability through attention heatmaps and case-specific alerts. Our findings suggest that integrating anatomical priors and demographic context into transformer-based pipelines is a promising direction for personalized, intelligent spinal image analysis.
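The metadata conditioning described above (age, sex, vertebral distribution) is, at its simplest, a late-fusion step: normalized demographic features appended to the image embedding before the classification head. A minimal sketch of that idea; the normalization constants and field choices are my assumptions, not the paper's implementation:

```python
import numpy as np

def fuse_metadata(img_embedding, age_years, sex, n_vertebrae):
    """Late fusion: append normalized patient metadata to an image embedding
    before classification (illustrative of the idea, not the paper's code)."""
    meta = np.array([age_years / 100.0,            # age scaled to ~[0, 1]
                     1.0 if sex == "M" else 0.0,   # binary sex encoding
                     n_vertebrae / 30.0])          # vertebral count scaled
    return np.concatenate([np.asarray(img_embedding, float), meta])
```

Richer schemes (e.g., cross-attention over metadata tokens) follow the same principle of injecting patient context alongside image features.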
