Georgii Kolokolnikov, Marie-Lena Schmalhofer, Sophie Goetz, Lennart Well, Said Farschtschi, Victor-Felix Mautner, Inka Ristow, Rene Werner

arxiv logopreprintSep 23 2025
Background and Objectives: Neurofibromatosis type 1 is a genetic disorder characterized by the development of numerous neurofibromas (NFs) throughout the body. Whole-body MRI (WB-MRI) is the clinical standard for detection and longitudinal surveillance of NF tumor growth. Existing interactive segmentation methods fail to combine high lesion-wise precision with scalability to hundreds of lesions. This study proposes a novel interactive segmentation model tailored to this challenge. Methods: We introduce MOIS-SAM2, a multi-object interactive segmentation model that extends the state-of-the-art, transformer-based, promptable Segment Anything Model 2 (SAM2) with exemplar-based semantic propagation. MOIS-SAM2 was trained and evaluated on 119 WB-MRI scans from 84 NF1 patients acquired using T2-weighted fat-suppressed sequences. The dataset was split at the patient level into a training set and four test sets (one in-domain and three reflecting different domain shift scenarios, e.g., MRI field strength variation, low tumor burden, differences in clinical site and scanner vendor). Results: On the in-domain test set, MOIS-SAM2 achieved a scan-wise DSC of 0.60 against expert manual annotations, outperforming baseline 3D nnU-Net (DSC: 0.54) and SAM2 (DSC: 0.35). Performance of the proposed model was maintained under MRI field strength shift (DSC: 0.53) and scanner vendor variation (DSC: 0.50), and improved in low tumor burden cases (DSC: 0.61). Lesion detection F1 scores ranged from 0.62 to 0.78 across test sets. Preliminary inter-reader variability analysis showed model-to-expert agreement (DSC: 0.62-0.68), comparable to inter-expert agreement (DSC: 0.57-0.69). Conclusions: The proposed MOIS-SAM2 enables efficient and scalable interactive segmentation of NFs in WB-MRI with minimal user input and strong generalization, supporting integration into clinical workflows.
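
The scan-wise Dice similarity coefficient (DSC) used throughout these results has a standard definition; as a quick reference, a minimal NumPy sketch is shown below (the toy masks and variable names are illustrative, not from the paper).

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Scan-wise Dice similarity coefficient between two binary segmentation masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return float(2.0 * intersection / (pred.sum() + gt.sum() + eps))

# Toy example: two partially overlapping 3D masks
pred = np.zeros((4, 8, 8), dtype=bool); pred[1:3, 2:6, 2:6] = True
gt = np.zeros((4, 8, 8), dtype=bool);   gt[1:3, 3:7, 3:7] = True
print(f"DSC = {dice_coefficient(pred, gt):.3f}")
```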

Kaya HE

pubmed logopapersSep 23 2025
To assess whether few-shot prompting improves the performance of two popular large language models (LLMs), ChatGPT o1 and DeepSeek-R1, in assigning Coronary Artery Disease Reporting and Data System (CAD-RADS™ 2.0) categories. A detailed few-shot prompt based on the CAD-RADS™ 2.0 framework was developed using 20 reports from the MIMIC-IV database. Subsequently, 100 modified reports from the same database were categorized using zero-shot and few-shot prompts through the models' user interface. Model accuracy was evaluated by comparing assignments to a reference radiologist's classifications, including stenosis categories and modifiers. To assess reproducibility, 50 reports were reclassified using the same few-shot prompt. McNemar tests and Cohen's kappa were used for statistical analysis. Using zero-shot prompting, accuracy was low for both models (ChatGPT: 14%, DeepSeek: 8%), with correct assignments occurring almost exclusively in CAD-RADS 0 cases. Hallucinations occurred frequently (ChatGPT: 19%, DeepSeek: 54%). Few-shot prompting significantly improved accuracy to 98% for ChatGPT and 93% for DeepSeek (both P<0.001) and eliminated hallucinations. Kappa values for agreement between model-generated and radiologist-assigned classifications were 0.979 (0.950, 1.000) (P<0.001) for ChatGPT and 0.916 (0.859, 0.973) (P<0.001) for DeepSeek, indicating almost perfect agreement for both models without a significant difference between the models (P=0.180). Reproducibility analysis yielded kappa values of 0.957 (0.900, 1.000) (P<0.001) for ChatGPT and 0.873 (0.779, 0.967) (P<0.001) for DeepSeek, indicating almost perfect and strong agreement between repeated assignments, respectively, with no significant difference between the models (P=0.125). Few-shot prompting substantially enhances LLMs' accuracy in assigning CAD-RADS™ 2.0 categories, suggesting potential for clinical application and facilitating system adoption.
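
To illustrate the zero-shot versus few-shot setup described above, the sketch below assembles a few-shot prompt from worked examples and scores model output against a reference reading with Cohen's kappa; the example reports, the categories, and the `query_llm` stub are hypothetical placeholders rather than the study's actual prompt or interface.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical worked examples; the study used 20 MIMIC-IV reports.
FEW_SHOT_EXAMPLES = [
    ("No coronary plaque or stenosis.", "CAD-RADS 0"),
    ("Maximal stenosis 60% in the proximal LAD.", "CAD-RADS 3"),
]

def build_few_shot_prompt(report: str) -> str:
    """Prepend worked examples (few-shot) before the report to be classified."""
    parts = ["Assign a CAD-RADS 2.0 category to each coronary CTA report."]
    for example_report, label in FEW_SHOT_EXAMPLES:
        parts.append(f"Report: {example_report}\nCategory: {label}")
    parts.append(f"Report: {report}\nCategory:")
    return "\n\n".join(parts)

def query_llm(prompt: str) -> str:
    """Placeholder for a call to the model's user interface or API."""
    raise NotImplementedError

# Agreement with the reference radiologist on hypothetical labels
# (values above 0.8 are conventionally read as almost perfect agreement).
model_labels = ["CAD-RADS 0", "CAD-RADS 3", "CAD-RADS 2"]
reference_labels = ["CAD-RADS 0", "CAD-RADS 3", "CAD-RADS 1"]
print(cohen_kappa_score(model_labels, reference_labels))
```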

Chang PD, Chu E, Floriolli D, Soun J, Fussell D

pubmed logopapersSep 23 2025
To validate a deep learning foundation model for automated head computed tomography (CT) reformatting and to quantify the quality, speed, and variability of conventional manual reformats in a real-world dataset. A foundation artificial intelligence (AI) model was used to create automated reformats for 1,763 consecutive non-contrast head CT examinations. Model accuracy was first validated on a 100-exam subset by assessing landmark detection as well as rotational, centering, and zoom error against expert manual annotations. The validated model was subsequently used as a reference standard to evaluate the quality and speed of the original technician-generated reformats from the full dataset. The AI model demonstrated high concordance with expert annotations, with a mean landmark localization error of 0.6-0.9 mm. Compared to expert-defined planes, AI-generated reformats exhibited a mean rotational error of 0.7 degrees, a mean centering error of 0.3%, and a mean zoom error of 0.4%. By contrast, technician-generated reformats demonstrated a mean rotational error of 11.2 degrees, a mean centering error of 6.4%, and a mean zoom error of 6.2%. Significant variability in manual reformat quality was observed across different factors including patient age, scanner location, report findings, and individual technician operators. Manual head CT reformatting is subject to substantial variability in both quality and speed. A single-shot deep learning foundation model can generate reformats with high accuracy and consistency. The implementation of such an automated method offers the potential to improve standardization, increase workflow efficiency, and reduce operational costs in clinical practice.
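
The reported reformat-quality metrics can be made concrete with a small geometric sketch; how the study parameterized planes, centering, and zoom is not detailed in the abstract, so the representation below (unit plane normals, center offsets normalized to the field of view, relative zoom factors) is an assumption for illustration only.

```python
import numpy as np

def rotational_error_deg(normal_a: np.ndarray, normal_b: np.ndarray) -> float:
    """Angle in degrees between two reformat-plane normal vectors."""
    a = normal_a / np.linalg.norm(normal_a)
    b = normal_b / np.linalg.norm(normal_b)
    return float(np.degrees(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))))

def centering_error_pct(center_a, center_b, fov_mm: float) -> float:
    """Center offset expressed as a percentage of the field of view."""
    offset = np.linalg.norm(np.asarray(center_a) - np.asarray(center_b))
    return float(100.0 * offset / fov_mm)

def zoom_error_pct(zoom_a: float, zoom_b: float) -> float:
    """Relative zoom difference as a percentage."""
    return float(100.0 * abs(zoom_a - zoom_b) / zoom_b)

# Toy comparison of an AI-proposed reformat against an expert-defined one
print(rotational_error_deg(np.array([0.0, 0.01, 1.0]), np.array([0.0, 0.0, 1.0])))
print(centering_error_pct([1.0, 0.5, 0.0], [0.0, 0.0, 0.0], fov_mm=250.0))
print(zoom_error_pct(1.004, 1.0))
```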

Dai D, Dong C, Huang H, Liu F, Li Z, Xu S

pubmed logopapersSep 23 2025
Although deep learning models have greatly automated medical image segmentation, they still struggle with complex samples, especially those with irregular shapes, notable scale variations, or blurred boundaries. One key reason for this is that existing methods often overlook the importance of identifying and enhancing the instructive features tailored to various targets, thereby impeding optimal feature extraction and transmission. To address these issues, we propose two innovative modules: an Instructive Feature Enhancement Module (IFEM) and an Instructive Feature Integration Module (IFIM). IFEM synergistically captures rich detailed information and local contextual cues within a unified convolutional module through flexible resolution scaling and extensive information interplay, thereby enhancing the network's feature extraction capabilities. Meanwhile, IFIM explicitly guides the fusion of encoding-decoding features to create more discriminative representations through sensitive intermediate predictions and omnipresent attention operations, thus refining contextual feature transmission. These two modules can be seamlessly integrated into existing segmentation frameworks, significantly boosting their performance. Furthermore, to achieve superior performance with substantially reduced computational demands, we develop an effective and efficient segmentation framework (EESF). Unlike traditional U-Nets, EESF adopts a shallower and wider asymmetric architecture, achieving a better balance between fine-grained information retention and high-order semantic abstraction with minimal learning parameters. Ultimately, by incorporating IFEM and IFIM into EESF, we construct EE-Net, a high-performance and low-resource segmentation network. Extensive experiments across six diverse segmentation tasks consistently demonstrate that EE-Net outperforms a wide range of competing methods in terms of segmentation performance, computational efficiency, and learning ability. The code is available at https://github.com/duweidai/EE-Net.
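
The actual module definitions are in the linked repository; purely as a loose illustration of the "flexible resolution scaling and information interplay" idea, a dual-resolution convolutional block could be sketched as follows (an assumption-laden toy example, not the authors' IFEM).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualResolutionBlock(nn.Module):
    """Illustrative only: a full-resolution detail branch and a downscaled
    context branch are computed in one block and fused."""
    def __init__(self, channels: int, scale: float = 0.5):
        super().__init__()
        self.scale = scale
        self.detail = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.context = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        detail = F.relu(self.detail(x))                                   # fine detail
        low = F.interpolate(x, scale_factor=self.scale, mode="bilinear",
                            align_corners=False)                          # local context
        context = F.relu(self.context(low))
        context = F.interpolate(context, size=x.shape[2:], mode="bilinear",
                                align_corners=False)
        return F.relu(self.fuse(torch.cat([detail, context], dim=1)))     # interplay/fusion

block = DualResolutionBlock(channels=16)
print(block(torch.randn(1, 16, 64, 64)).shape)  # torch.Size([1, 16, 64, 64])
```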

Zhou Y, Zhao J, Tan Y, Zou F, Fang L, Wei P, Zeng W, Gong L, Liu L, Zhong L

pubmed logopapersSep 23 2025
Preoperative identification of high-risk locally advanced colorectal cancer (LACRC) patients is vital for optimizing treatment and minimizing toxicity. This study aims to develop and validate a combined model of CT-based images and clinical laboratory parameters to noninvasively predict postoperative early recurrence (ER) in LACRC patients. A retrospective cohort of 560 pathologically confirmed LACRC patients, collected from three centers between July 2018 and March 2022, was analyzed together with the Gene Expression Omnibus (GEO) dataset. We extracted radiomics and deep learning signatures (RDs) using eight machine learning techniques, integrated them with clinical-laboratory parameters to construct a preoperative combined model, and validated it in two external datasets. Its predictive performance was compared with postoperative pathological and TNM staging models. Kaplan-Meier analysis was used to evaluate preoperative risk stratification, and molecular correlations with ER were explored using GEO RNA-sequencing data. The model included five independent prognostic factors: RDs, lymphocyte-to-monocyte ratio, neutrophil-to-lymphocyte ratio, lymphocyte-albumin, and prognostic nutritional index. It outperformed pathological and TNM models in two external datasets (AUC for test set 1: 0.865 vs. 0.766, 0.665; AUC for test set 2: 0.848 vs. 0.754, 0.694). Preoperative risk stratification identified significantly better disease-free survival in low-risk vs. high-risk patients across all subgroups (p < 0.01). High enrichment scores were associated with upregulated tumor proliferation pathways (epithelial-mesenchymal transition [EMT] and inflammatory response pathways) and altered immune cell infiltration patterns in the tumor microenvironment. The preoperative model enables treatment strategy optimization and reduces unnecessary drug toxicity by noninvasively predicting ER in LACRC.
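
The laboratory parameters in the model are standard composite indices; the sketch below shows their commonly used definitions (conventional units and the conventional PNI formula are assumed; the exact lymphocyte-albumin definition used in the study is not given in the abstract and is therefore omitted).

```python
def lymphocyte_monocyte_ratio(lymphocytes: float, monocytes: float) -> float:
    """LMR: absolute lymphocyte count / absolute monocyte count (10^9/L)."""
    return lymphocytes / monocytes

def neutrophil_lymphocyte_ratio(neutrophils: float, lymphocytes: float) -> float:
    """NLR: absolute neutrophil count / absolute lymphocyte count (10^9/L)."""
    return neutrophils / lymphocytes

def prognostic_nutritional_index(albumin_g_per_l: float, lymphocytes: float) -> float:
    """PNI (conventional formula): albumin [g/L] + 5 x lymphocyte count [10^9/L]."""
    return albumin_g_per_l + 5.0 * lymphocytes

# Example blood panel: neutrophils 4.2, lymphocytes 1.5, monocytes 0.5 (10^9/L), albumin 40 g/L
print(lymphocyte_monocyte_ratio(1.5, 0.5))      # 3.0
print(neutrophil_lymphocyte_ratio(4.2, 1.5))    # 2.8
print(prognostic_nutritional_index(40.0, 1.5))  # 47.5
```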

Manel Rakez, Thomas Louis, Julien Guillaumin, Foucauld Chamming's, Pierre Fillard, Brice Amadeo, Virginie Rondeau

arxiv logopreprintSep 23 2025
Risk-adapted breast cancer screening requires robust models that leverage longitudinal imaging data. Most current deep learning models use a single or limited number of prior mammograms and lack adaptation for real-world settings marked by imbalanced outcome distribution and heterogeneous follow-up. We developed LongiMam, an end-to-end deep learning model that integrates both current and up to four prior mammograms. LongiMam combines a convolutional and a recurrent neural network to capture spatial and temporal patterns predictive of breast cancer. The model was trained and evaluated using a large, population-based screening dataset with a disproportionate case-to-control ratio typical of clinical screening. Across several scenarios that varied in the number and composition of prior exams, LongiMam consistently improved prediction when prior mammograms were included. The addition of prior and current visits outperformed single-visit models, while priors alone performed less well, highlighting the importance of combining historical and recent information. Subgroup analyses confirmed the model's efficacy across key risk groups, including women with dense breasts and those aged 55 years or older. Moreover, the model performed best in women with observed changes in mammographic density over time. These findings demonstrate that longitudinal modeling enhances breast cancer prediction and support the use of repeated mammograms to refine risk stratification in screening programs. LongiMam is publicly available as open-source software.
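
As a schematic of the convolutional-plus-recurrent design described above, per-visit image features can be encoded by a CNN and passed through a recurrent unit in temporal order; the backbone, feature sizes, and head below are illustrative assumptions, not the released LongiMam architecture.

```python
import torch
import torch.nn as nn

class LongitudinalRiskModel(nn.Module):
    """Illustrative CNN + GRU model over a sequence of mammographic visits."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(                      # per-image CNN encoder
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)  # across visits
        self.head = nn.Linear(feat_dim, 1)                 # cancer-risk logit

    def forward(self, exams: torch.Tensor) -> torch.Tensor:
        # exams: (batch, visits, 1, H, W), ordered oldest prior -> current visit
        b, v = exams.shape[:2]
        feats = self.encoder(exams.flatten(0, 1)).view(b, v, -1)
        _, last_hidden = self.temporal(feats)
        return torch.sigmoid(self.head(last_hidden[-1]))

model = LongitudinalRiskModel()
print(model(torch.randn(2, 5, 1, 128, 128)).shape)  # torch.Size([2, 1])
```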

Levy, C., Dalton, E. J., Ferris, J. K., Campbell, B. C. V., Brodtmann, A., Brauer, S., Churilov, L., Hayward, K. S.

medrxiv logopreprintSep 23 2025
Background: Accurate prognostication of mobility outcomes is essential to guide rehabilitation and manage patient expectations. The prognostic utility of neuroimaging and neurophysiological biomarkers remains uncertain when measured early post-stroke. This systematic review aimed to examine the prognostic capacity of early neuroimaging and neurophysiological biomarkers of mobility outcomes up to 24 months post-stroke. Methods: MEDLINE and EMBASE were searched from inception to June 2025. Cohort studies that reported neuroimaging or neurophysiological biomarkers measured ≤14 days post-stroke and mobility outcome(s) assessed >14 days and ≤24 months post-stroke were included. Biomarker analyses were classified by statistical analysis approach (association, discrimination/classification or validation). Magnitude of relevant statistical measures was used as the primary indicator of prognostic capacity. Risk of bias was assessed using the Quality in Prognostic Studies tool. Meta-analysis was not performed due to heterogeneity. Results: Twenty reports from 18 independent study samples (n=2,160 participants) were included. Biomarkers were measured a median of 7.5 days post-stroke, and outcomes were assessed between 1 and 12 months. Eighty-six biomarker analyses were identified (61 neuroimaging, 25 neurophysiological), and the majority used an association approach (88%). Few used discrimination/classification methods (11%), and only one conducted internal validation (1%): an MRI-based machine learning model which demonstrated excellent discrimination but still requires external validation. Structural and functional corticospinal tract integrity were frequently investigated, and most associations were small or non-significant. Lesion location and size were also commonly examined, but findings were inconsistent and often lacked magnitude reporting. Methodological limitations were common, including small sample sizes, moderate to high risk of bias, poor reporting of magnitudes, and heterogeneous outcome measures and follow-up time points. Conclusions: Current evidence provides limited support for early neuroimaging and neurophysiological biomarkers to prognosticate post-stroke mobility outcomes. Most analyses remain at the association stage, with minimal progress toward validation and clinical implementation. Advancing the field requires international collaboration using harmonized methodologies, standardised statistical reporting, and consistent outcome measures and timepoints. Registration: URL: https://www.crd.york.ac.uk/prospero/; Unique identifier: CRD42022350771.

Guo J, Wang K, Tan G, Li G, Zhang X, Chen J, Hu J, Liang Y, Jiang B

pubmed logopapersSep 22 2025
Automated analysis of neonatal fundus images presents a uniquely intricate challenge in medical imaging. Existing methodologies predominantly focus on diagnosing abnormalities from individual images, often leading to inaccuracies due to the diverse and subtle nature of neonatal retinal features. Consequently, clinical standards frequently mandate the acquisition of retinal images from multiple angles to ensure the detection of minute lesions. To accommodate this, we propose leveraging multiple fundus images captured from various regions of the retina to comprehensively screen for a wide range of neonatal ocular pathologies. We employ Multiple Instance Learning (MIL) for this task, and introduce a simple yet effective learnable structure on the existing MIL method, called Learnable Dense to Global (LD2G-MIL). Different from other methods that focus on instance-to-bag feature aggregation, the proposed method focuses on generating better instance-level representations that are co-optimized with downstream MIL targets in a learnable way. Additionally, it incorporates a bag prior-based similarity loss (BP loss) mechanism, leveraging prior knowledge to enhance performance in neonatal retinal screening. To validate the efficacy of our LD2G-MIL method, we compiled the Neonatal Fundus Images (NFI) dataset, an extensive collection comprising 115,621 retinal images from 8,886 neonatal clinical episodes. Empirical evaluations on this dataset demonstrate that our approach consistently outperforms state-of-the-art (SOTA) generic and specialized methods. The code and trained models are publicly available at https://github.com/CVIU-CSU/LD2G-MIL.
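
For readers unfamiliar with the MIL framing, the sketch below shows a generic attention-based instance-to-bag aggregation over a bag of per-image features, i.e., the baseline setting the paper contrasts with; it is not the authors' LD2G structure or BP loss, which are available in the linked repository.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Generic attention-based MIL: pool per-image (instance) features into a
    single bag-level prediction for one clinical episode."""
    def __init__(self, feat_dim: int = 128, hidden: int = 64, n_classes: int = 2):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, instance_feats: torch.Tensor) -> torch.Tensor:
        # instance_feats: (n_instances, feat_dim), one row per fundus image
        weights = torch.softmax(self.attention(instance_feats), dim=0)  # (n, 1)
        bag_feat = (weights * instance_feats).sum(dim=0)                # (feat_dim,)
        return self.classifier(bag_feat)                                # class logits

bag = torch.randn(7, 128)          # e.g., 7 fundus images from one neonatal episode
print(AttentionMIL()(bag).shape)   # torch.Size([2])
```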

Miura S

pubmed logopapersSep 22 2025
Cone-beam computed tomography (CBCT) is commonly utilized in radiation therapy to visualize soft tissues and bone structures. This study aims to develop a machine learning model that predicts optimal, patient-specific CBCT doses that minimize radiation exposure while maintaining soft tissue image quality in prostate radiation therapy. Phantom studies evaluated the relationship between dose and two image quality metrics: image standard deviation (SD) and contrast-to-noise ratio (CNR). In a prostate-simulating phantom, CNR did not significantly decrease at doses above 40% compared to the 100% dose. Based on low-contrast resolution, this value was selected as the minimum clinical dose level. In clinical image analysis, both SD and CNR degraded with decreasing dose, consistent with the phantom findings. The structural similarity index between CBCT and planning computed tomography (CT) significantly decreased at doses below 60%, with a mean value of 0.69 at 40%. Previous studies suggest that this level may correspond to acceptable registration accuracy within the typical planning target volume margins applied in image-guided radiotherapy. A machine learning model was developed to predict CBCT doses using patient-specific metrics from planning CT scans and CBCT image quality parameters. Among the tested models, support vector regression achieved the highest accuracy, with an R² value of 0.833 and a root mean squared error of 0.0876, and was therefore adopted for dose prediction. These results support the feasibility of patient-specific CBCT imaging protocols that reduce radiation dose while maintaining clinically acceptable image quality for soft tissue registration.
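
The image-quality metric and regression model referenced above are standard; in the minimal sketch below, contrast-to-noise ratio is computed from two regions of interest using one common definition, and a support vector regression is fit on placeholder patient features (the feature choices and data are assumptions, not the study's inputs).

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def contrast_to_noise_ratio(target_roi: np.ndarray, background_roi: np.ndarray) -> float:
    """One common CNR definition: |mean difference| / background noise SD."""
    return float(abs(target_roi.mean() - background_roi.mean()) / background_roi.std())

# Toy ROIs (HU values); real ROIs would come from phantom or patient CBCT images.
rng = np.random.default_rng(0)
target = rng.normal(60.0, 12.0, size=500)
background = rng.normal(40.0, 12.0, size=500)
print(f"CNR = {contrast_to_noise_ratio(target, background):.2f}")

# Support vector regression mapping patient-specific metrics to a relative CBCT dose.
# The two synthetic features (e.g., effective diameter, planning-CT noise SD) are assumptions.
X = rng.normal(size=(40, 2))
y = 0.4 + 0.1 * X[:, 0] - 0.05 * X[:, 1] + rng.normal(0.0, 0.02, size=40)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.01))
model.fit(X, y)
print("Predicted relative dose:", model.predict(X[:3]))
```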

Rudolph J, Huemmer C, Preuhs A, Buizza G, Dinkel J, Koliogiannis V, Fink N, Goller SS, Schwarze V, Heimer M, Hoppe BF, Liebig T, Ricke J, Sabel BO, Rueckel J

pubmed logopapersSep 22 2025
Manufacturer-defined AI thresholds for chest x-ray (CXR) often lack customization options. Threshold optimization strategies utilizing users' clinical real-world data along with pathology-enriched validation data may better address subgroup-specific and user-specific needs. A pathology-enriched dataset (study cohort, 563 CXRs) with pleural effusions, consolidations, pneumothoraces, nodules, and unremarkable findings was analysed by an AI system and six reference radiologists. The same AI model was applied to a routine dataset (clinical cohort, 15,786 consecutive routine CXRs). Iterative receiver operating characteristic analysis linked achievable sensitivities (study cohort) to resulting AI alert rates in clinical routine inpatient or outpatient subgroups. "Optimized" thresholds (OTs) were defined by a 1% sensitivity increase leading to more than a 1% rise in AI alert rates. Threshold comparisons (OTs versus AI vendor's default thresholds (AIDT) versus Youden's thresholds) were based on 400 clinical cohort cases with expert radiologists' reference. AIDTs, OTs, and Youden's thresholds varied across scenarios, with OTs differing based on tailoring for inpatient or outpatient CXRs. AIDT lowering most reasonably improved sensitivity for pleural effusion, with increases from 46.8% (AIDT) to 87.2% (OT) for outpatients and from 76.3% (AIDT) to 93.5% (OT) for inpatients; similar trends appeared for consolidations. Conversely, regarding inpatient nodule detection, increasing the threshold improved accuracy from 69.5% (AIDT) to 82.5% (OT) without compromising sensitivity. Graphical analysis supports threshold selection by illustrating estimated sensitivities and clinical routine AI alert rates. An innovative, subgroup-specific AI threshold optimization is proposed, automatically implemented and transferable to other AI algorithms and varying clinical subgroup settings. Individually customizing thresholds tailored to specific medical experts' needs and patient subgroup characteristics is promising and may enhance diagnostic accuracy and the clinical acceptance of diagnostic AI algorithms. Customizing AI thresholds individually addresses specific user/patient subgroup needs. The presented approach utilizes pathology-enriched and real-world subgroup data for optimization. Potential is shown by comparing individualized thresholds with vendor defaults. Distinct thresholds for in- and outpatient CXR AI analysis may improve perception. The automated pipeline methodology is transferable to other AI models or subgroups.
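
The stated threshold rule can be made concrete with a small sketch; the synthetic scores, sweep granularity, and stopping logic below are one illustrative reading of the "1% sensitivity gain for more than 1% alert-rate rise" criterion, not the authors' automated pipeline.

```python
import numpy as np

def alert_rate(scores: np.ndarray, threshold: float) -> float:
    """Fraction of routine exams that would trigger an AI alert at a threshold."""
    return float((scores >= threshold).mean())

def sensitivity(scores: np.ndarray, labels: np.ndarray, threshold: float) -> float:
    """Sensitivity on the pathology-enriched study cohort at a threshold."""
    return float((scores[labels == 1] >= threshold).mean())

def optimized_threshold(study_scores, study_labels, routine_scores, step=0.01):
    """Illustrative OT rule: raise sensitivity in ~1% steps and stop once a
    1% sensitivity gain costs more than a 1% rise in routine alert rate."""
    thresholds = np.unique(study_scores[study_labels == 1])[::-1]  # high -> low
    best = thresholds[0]
    prev_sens = sensitivity(study_scores, study_labels, best)
    prev_alert = alert_rate(routine_scores, best)
    for t in thresholds[1:]:
        sens = sensitivity(study_scores, study_labels, t)
        alert = alert_rate(routine_scores, t)
        if sens - prev_sens >= step and alert - prev_alert > step:
            break                      # marginal alert-rate cost now exceeds the gain
        if sens - prev_sens >= step:
            best, prev_sens, prev_alert = t, sens, alert
    return best

# Synthetic scores standing in for the enriched study cohort and the routine cohort
rng = np.random.default_rng(1)
study_scores = np.concatenate([rng.beta(5, 2, 300), rng.beta(2, 5, 263)])
study_labels = np.concatenate([np.ones(300, int), np.zeros(263, int)])
routine_scores = rng.beta(2, 8, 15786)   # mostly unremarkable routine CXRs
print("Optimized threshold:", round(optimized_threshold(study_scores, study_labels, routine_scores), 3))
```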