FusionFM: Fusing Eye-specific Foundational Models for Optimized Ophthalmic Diagnosis

Ke Zou, Jocelyn Hui Lin Goh, Yukun Zhou, Tian Lin, Samantha Min Er Yew, Sahana Srinivasan, Meng Wang, Rui Santos, Gabor M. Somfai, Huazhu Fu, Haoyu Chen, Pearse A. Keane, Ching-Yu Cheng, Yih Chung Tham

arXiv preprint · Aug 15, 2025
Foundation models (FMs) have shown great promise in medical image analysis by improving generalization across diverse downstream tasks. In ophthalmology, several FMs have recently emerged, but there is still no clear answer to fundamental questions: Which FM performs the best? Are they equally good across different tasks? What if we combine all FMs together? To address these questions, we propose FusionFM, a comprehensive evaluation suite, along with two fusion approaches to integrate different ophthalmic FMs; to our knowledge, this is the first study to systematically evaluate both single and fused ophthalmic FMs. Our framework covers both ophthalmic disease detection (glaucoma, diabetic retinopathy, and age-related macular degeneration) and systemic disease prediction (diabetes and hypertension) based on retinal imaging. We benchmarked four state-of-the-art FMs (RETFound, VisionFM, RetiZero, and DINORET) using standardized datasets from multiple countries and evaluated their performance using AUC and F1 metrics. Our results show that DINORET and RetiZero achieve superior performance in both ophthalmic and systemic disease tasks, with RetiZero exhibiting stronger generalization on external datasets. Regarding fusion strategies, the gating-based approach provides modest improvements in predicting glaucoma, AMD, and hypertension. Despite these advances, predicting systemic diseases, especially hypertension in external cohorts, remains challenging. These findings provide an evidence-based evaluation of ophthalmic FMs, highlight the benefits of model fusion, and point to strategies for enhancing their clinical applicability.
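The abstract does not detail the fusion architecture, but a gating-based fusion of per-model embeddings is commonly implemented along the following lines. This is a minimal PyTorch sketch under assumed embedding sizes and class counts, not the authors' code.

```python
# Minimal sketch (not the authors' code) of a gating-based fusion head over
# pre-computed foundation-model embeddings; dimensions and names are assumptions.
import torch
import torch.nn as nn

class GatedFusionHead(nn.Module):
    def __init__(self, embed_dims, hidden_dim=256, num_classes=2):
        super().__init__()
        # Project each FM embedding (possibly of different size) to a shared space.
        self.proj = nn.ModuleList([nn.Linear(d, hidden_dim) for d in embed_dims])
        # Gating network assigns a softmax weight to each model.
        self.gate = nn.Sequential(
            nn.Linear(sum(embed_dims), len(embed_dims)),
            nn.Softmax(dim=-1),
        )
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, embeddings):  # list of (batch, d_i) tensors, one per FM
        weights = self.gate(torch.cat(embeddings, dim=-1))         # (batch, n_models)
        projected = torch.stack([p(e) for p, e in zip(self.proj, embeddings)], dim=1)
        fused = (weights.unsqueeze(-1) * projected).sum(dim=1)     # weighted sum of models
        return self.classifier(fused)

# Example with four FMs of assumed embedding sizes.
head = GatedFusionHead(embed_dims=[1024, 768, 768, 1024])
logits = head([torch.randn(8, d) for d in [1024, 768, 768, 1024]])
```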

Spatio-temporal deep learning with temporal attention for indeterminate lung nodule classification.

Farina B, Carbajo Benito R, Montalvo-García D, Bermejo-Peláez D, Maceiras LS, Ledesma-Carbayo MJ

PubMed · Aug 15, 2025
Lung cancer is the leading cause of cancer-related death worldwide. Deep learning-based computer-aided diagnosis (CAD) systems in screening programs enhance malignancy prediction, assist radiologists in decision-making, and reduce inter-reader variability. However, limited research has explored the analysis of repeated annual exams of indeterminate lung nodules to improve accuracy. We introduced a novel spatio-temporal deep learning framework, the global attention convolutional recurrent neural network (globAttCRNN), to predict indeterminate lung nodule malignancy using serial screening computed tomography (CT) images from the National Lung Screening Trial (NLST) dataset. The model comprises a lightweight 2D convolutional neural network for spatial feature extraction and a recurrent neural network with a global attention module to capture the temporal evolution of lung nodules. Additionally, we proposed new strategies to handle missing data in the temporal dimension to mitigate potential biases arising from missing time steps, including temporal augmentation and temporal dropout. Our model achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.954 in an independent test set of 175 lung nodules, each detected in multiple CT scans over patient follow-up, outperforming baseline single-time and multiple-time architectures. The temporal global attention module prioritizes informative time points, enabling the model to capture key spatial and temporal features while ignoring irrelevant or redundant information. Our evaluation emphasizes its potential as a valuable tool for the diagnosis and stratification of patients at risk of lung cancer.
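As a rough illustration of the ingredients the abstract names (a lightweight 2D CNN per time point, a recurrent network over serial scans, global attention over time, and temporal dropout), here is a minimal PyTorch sketch. All layer sizes and the exact dropout scheme are assumptions, not the published globAttCRNN.

```python
# Minimal sketch (assumptions throughout): CNN encoder per time point, GRU over the
# nodule's serial scans, and soft global attention pooling over the time steps.
import torch
import torch.nn as nn

class TemporalAttentionCRNN(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=64, p_time_drop=0.2):
        super().__init__()
        self.cnn = nn.Sequential(                       # lightweight 2D encoder
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(32, feat_dim),
        )
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)            # scores each time step
        self.fc = nn.Linear(hidden_dim, 1)
        self.p_time_drop = p_time_drop

    def forward(self, x):                               # x: (batch, time, 1, H, W)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)
        if self.training and self.p_time_drop > 0:      # "temporal dropout": zero whole steps
            keep = (torch.rand(b, t, 1, device=x.device) > self.p_time_drop).float()
            feats = feats * keep
        h, _ = self.rnn(feats)                          # (b, t, hidden)
        alpha = torch.softmax(self.attn(h), dim=1)      # attention weights over time
        context = (alpha * h).sum(dim=1)
        return self.fc(context).squeeze(-1)             # malignancy logit

model = TemporalAttentionCRNN()
logit = model(torch.randn(4, 3, 1, 64, 64))             # 4 nodules, 3 annual scans
```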

SMAS: Structural MRI-based AD Score using Bayesian supervised VAE.

Nemali A, Bernal J, Yakupov R, D S, Dyrba M, Incesoy EI, Mukherjee S, Peters O, Ersözlü E, Hellmann-Regen J, Preis L, Priller J, Spruth E, Altenstein S, Lohse A, Schneider A, Fliessbach K, Kimmich O, Wiltfang J, Hansen N, Schott B, Rostamzadeh A, Glanz W, Butryn M, Buerger K, Janowitz D, Ewers M, Perneczky R, Rauchmann B, Teipel S, Kilimann I, Goerss D, Laske C, Sodenkamp S, Spottke A, Coenjaerts M, Brosseron F, Lüsebrink F, Dechent P, Scheffler K, Hetzer S, Kleineidam L, Stark M, Jessen F, Duzel E, Ziegler G

PubMed · Aug 15, 2025
This study introduces the Structural MRI-based Alzheimer's Disease Score (SMAS), a novel index intended to quantify Alzheimer's Disease (AD)-related morphometric patterns using a deep learning Bayesian-supervised Variational Autoencoder (Bayesian-SVAE). The SMAS index was constructed using baseline structural MRI data from the DELCODE study and evaluated longitudinally in two independent cohorts: DELCODE (n=415) and ADNI (n=190). Our findings indicate that SMAS has strong associations with cognitive performance (DELCODE: r=-0.83; ADNI: r=-0.62), age (DELCODE: r=0.50; ADNI: r=0.28), hippocampal volume (DELCODE: r=-0.44; ADNI: r=-0.66), and total gray matter volume (DELCODE: r=-0.42; ADNI: r=-0.47), suggesting its potential as a biomarker for AD-related brain atrophy. Moreover, our longitudinal studies indicated that SMAS may be useful for the early identification and tracking of AD. The model demonstrated significant predictive accuracy in distinguishing cognitively healthy individuals from those with AD (DELCODE: AUC=0.971 at baseline, 0.833 at 36 months; ADNI: AUC=0.817 at baseline, improving to 0.903 at 24 months). Notably, over 36 months, the SMAS index outperformed existing measures such as SPARE-AD and hippocampal volume. The relevance map analysis revealed significant morphological changes in key AD-related brain regions, including the hippocampus, posterior cingulate cortex, precuneus, and lateral parietal cortex, highlighting that SMAS is a sensitive and interpretable biomarker of brain atrophy, suitable for early AD detection and longitudinal monitoring of disease progression.
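For readers unfamiliar with supervised VAEs, the sketch below shows one common way to tie a latent dimension to a diagnostic label so it can be read out as a scalar score. It is an assumption-laden simplification operating on flattened features, not the Bayesian-SVAE used to derive SMAS.

```python
# Minimal sketch of a supervised VAE whose first latent dimension is regressed
# against a diagnostic label and can then be read out as a scalar disease score.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupervisedVAE(nn.Module):
    def __init__(self, in_dim, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))

    def forward(self, x, y=None):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        recon = self.dec(z)
        score = mu[:, 0]                                          # supervised latent = score
        loss = None
        if y is not None:
            kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
            loss = F.mse_loss(recon, x) + kl + F.mse_loss(score, y)
        return score, loss

# Toy usage on flattened morphometric features (placeholder data):
model = SupervisedVAE(in_dim=500)
x, y = torch.randn(16, 500), torch.randn(16)      # y: e.g. a standardized cognitive score
score, loss = model(x, y)
```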

Radiomics in pediatric brain tumors: from images to insights.

Rai P, Ahmed S, Mahajan A

PubMed · Aug 15, 2025
Radiomics has emerged as a promising non-invasive imaging approach in pediatric neuro-oncology, offering the ability to extract high-dimensional quantitative features from routine MRI to support diagnosis, risk stratification, molecular characterization, and outcome prediction. Pediatric brain tumors, which differ significantly from adult tumors in biology and imaging appearance, present unique diagnostic and prognostic challenges. Studies combining radiomics with machine learning algorithms, including support vector machines, random forests, and deep learning CNNs, have demonstrated promising performance in classifying tumor types such as medulloblastoma, ependymoma, and glioma, and in predicting molecular subgroups and mutations such as H3K27M and BRAF. Reported AUCs range from 0.75 to 0.98 for tumor classification and 0.77 to 0.88 for molecular subgroup prediction, across cohorts of 50 to over 450 patients, with internal cross-validation and, in some cases, external validation. In resource-limited settings or regions with limited radiologist manpower, radiomics-based tools could help augment diagnostic accuracy and consistency, serving as decision support to prioritize patients for further evaluation or biopsy. Emerging applications such as radio-immunomics and radio-pathomics may further enhance understanding of tumor biology but remain investigational. Despite its potential, clinical translation faces notable barriers, including limited pediatric-specific datasets, variable imaging protocols, and the lack of standardized, reproducible workflows. Multi-institutional collaboration, harmonized pipelines, and prospective validation are essential next steps. Radiomics should be viewed as a supplementary tool that complements existing clinical and pathological frameworks, supporting more informed and equitable care in pediatric brain tumor management.
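As a concrete, generic illustration of the radiomics-plus-classifier workflow these studies share, the scikit-learn sketch below cross-validates an SVM on a placeholder feature matrix. The data, feature counts, and class labels are dummies, not taken from any cited study.

```python
# Illustrative sketch only: radiomics features feeding an SVM for three-way
# tumor-type classification, scored with cross-validated one-vs-rest AUC.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 200))                 # placeholder radiomics matrix
y = rng.integers(0, 3, size=150)                # dummy labels for three tumor types

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True, random_state=0))
scores = cross_val_score(clf, X, y, cv=StratifiedKFold(5, shuffle=True, random_state=0),
                         scoring="roc_auc_ovr")
print(f"mean one-vs-rest AUC across folds: {scores.mean():.3f}")
```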

End-to-end deep learning for the diagnosis of pelvic and sacral tumors using non-enhanced MRI: a multi-center study.

Yin P, Liu K, Chen R, Liu Y, Lu L, Sun C, Liu Y, Zhang T, Zhong J, Chen W, Yu R, Wang D, Liu X, Hong N

PubMed · Aug 15, 2025
This study developed an end-to-end deep learning (DL) model using non-enhanced MRI to diagnose benign and malignant pelvic and sacral tumors (PSTs). Retrospective data from 835 patients across four hospitals were employed to train, validate, and test the models. Six diagnostic models with varied input sources were compared, and their performance (AUC and accuracy, ACC) and the reading times of three radiologists were assessed. The proposed Model SEG-CL-NC achieved AUC/ACC of 0.823/0.776 (Internal Test Set 1) and 0.836/0.781 (Internal Test Set 2). In External Dataset Centers 2, 3, and 4, its ACC was 0.714, 0.740, and 0.756, respectively, comparable to contrast-enhanced models and radiologists (P > 0.05), while its diagnosis time was significantly shorter than that of the radiologists (P < 0.01). Our results suggest that the proposed Model SEG-CL-NC achieves performance comparable to contrast-enhanced models and radiologists in diagnosing benign and malignant PSTs, offering an accurate, efficient, and cost-effective tool for clinical practice.

Benchmarking GPT-5 for Zero-Shot Multimodal Medical Reasoning in Radiology and Radiation Oncology

Mingzhe Hu, Zach Eidex, Shansong Wang, Mojtaba Safari, Qiang Li, Xiaofeng Yang

arXiv preprint · Aug 15, 2025
Radiology, radiation oncology, and medical physics require decision-making that integrates medical images, textual reports, and quantitative data under high-stakes conditions. With the introduction of GPT-5, it is critical to assess whether recent advances in large multimodal models translate into measurable gains in these safety-critical domains. We present a targeted zero-shot evaluation of GPT-5 and its smaller variants (GPT-5-mini, GPT-5-nano) against GPT-4o across three representative tasks: (1) VQA-RAD, a benchmark for visual question answering in radiology; (2) SLAKE, a semantically annotated, multilingual VQA dataset testing cross-modal grounding; and (3) a curated Medical Physics Board Examination-style dataset of 150 multiple-choice questions spanning treatment planning, dosimetry, imaging, and quality assurance. Across all datasets, GPT-5 achieved the highest accuracy, with substantial gains over GPT-4o: up to +20.00% in challenging anatomical regions such as the chest-mediastinal area, +13.60% on lung-focused questions, and +11.44% in brain-tissue interpretation. On the board-style physics questions, GPT-5 attained 90.7% accuracy (136/150), exceeding the estimated human passing threshold, while GPT-4o trailed at 78.0%. These results demonstrate that GPT-5 delivers consistent and often pronounced performance improvements over GPT-4o in both image-grounded reasoning and domain-specific numerical problem-solving, highlighting its potential to augment expert workflows in medical imaging and therapeutic physics.
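The scoring loop implied by such a zero-shot evaluation can be sketched as follows. `ask_model` is a hypothetical stand-in for a call to the multimodal model under test, and the dataset fields are assumptions rather than the actual VQA-RAD schema.

```python
# Sketch of a zero-shot VQA scoring loop with per-region accuracy breakdown.
# `ask_model` and the item fields ("image", "question", "answer", "region") are
# hypothetical placeholders, not a real benchmark or vendor API.
from collections import defaultdict

def evaluate_zero_shot(dataset, ask_model):
    correct, per_region = 0, defaultdict(lambda: [0, 0])      # region -> [correct, total]
    for item in dataset:
        prediction = ask_model(item["image"], item["question"])  # no examples given: zero-shot
        hit = prediction.strip().lower() == item["answer"].strip().lower()
        correct += hit
        per_region[item.get("region", "all")][0] += hit
        per_region[item.get("region", "all")][1] += 1
    overall = correct / len(dataset)
    breakdown = {region: c / t for region, (c, t) in per_region.items()}
    return overall, breakdown

# Example with a stubbed model that always answers "yes":
demo = [{"image": None, "question": "Is there a pneumothorax?", "answer": "no", "region": "chest"}]
print(evaluate_zero_shot(demo, lambda img, q: "yes"))
```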

A Case Study on Colposcopy-Based Cervical Cancer Staging Reveals an Alarming Lack of Data Sharing Hindering the Adoption of Machine Learning in Clinical Practice

Schulz, M., Leha, A.

medRxiv preprint · Aug 15, 2025
Background: The inbuilt ability to adapt existing models to new applications has been one of the key drivers of the success of deep learning models. Thereby, sharing trained models is crucial for their adaptation to different populations and domains. Not sharing models prohibits validation and potential subsequent translation into clinical practice, and hinders scientific progress. In this paper we examine the current state of data and model sharing in the medical field using cervical cancer staging on colposcopy images as a case example. Methods: We conducted a comprehensive literature search in PubMed to identify studies employing machine learning techniques in the analysis of colposcopy images. For studies where raw data was not directly accessible, we systematically inquired about access to the pre-trained model weights and/or raw colposcopy image data by contacting the authors through various channels. Results: We included 46 studies and one publicly available dataset in our study. We retrieved the data of the latter and inquired about data access for the 46 studies by contacting a total of 92 authors. We received 15 responses related to 14 studies (30%); the remaining 32 studies remained unresponsive (70%). Of the 15 responses received, two redirected our inquiry to other authors, two were initially pending, and 11 declined data sharing. Despite our follow-up efforts on all responses received, none of the inquiries led to actual data sharing (0%). The only available data source remained the publicly available dataset. Conclusions: Despite the long-standing demands for reproducible research and efforts to incentivize data sharing, such as the requirement of data availability statements, our case study reveals a persistent lack of a data sharing culture. Reasons identified in this case study include a lack of resources to provide the data, data privacy concerns, ongoing trial registrations, and low response rates to inquiries. Potential routes for improvement could include comprehensive data availability statements required by journals, data preparation and deposition in a repository as part of the publication process, an automatic maximal embargo time after which data become openly accessible, and data sharing rules set by funders.

Noninvasive prediction of microsatellite instability in stage II/III rectal cancer using dynamic contrast-enhanced magnetic resonance imaging radiomics.

Zheng CY, Zhang JM, Lin QS, Lian T, Shi LP, Chen JY, Cai YL

PubMed · Aug 15, 2025
Colorectal cancer stands among the most prevalent digestive system malignancies. The microsatellite instability (MSI) profile plays a crucial role in determining patient outcomes and therapy responsiveness. Traditional MSI evaluation methods require invasive tissue sampling, are lengthy, and can be compromised by intratumoral heterogeneity. This study aimed to establish a non-invasive technique utilizing dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) radiomics and machine learning algorithms to determine MSI status in patients with intermediate-stage rectal cancer. This retrospective analysis examined 120 individuals diagnosed with stage II/III rectal cancer [30 MSI-high (MSI-H) and 90 microsatellite stability (MSS)/MSI-low (MSI-L) cases]. We extracted comprehensive radiomics signatures from DCE-MRI scans, encompassing textural parameters that reflect tumor heterogeneity, shape-based metrics, and histogram-derived statistical values. Least absolute shrinkage and selection operator (LASSO) regression facilitated feature selection, while predictive frameworks were developed using various classification algorithms (logistic regression, support vector machine, and random forest). Performance assessment utilized separate training and validation cohorts. Our investigation uncovered distinctive imaging characteristics between MSI-H and MSS/MSI-L neoplasms. MSI-H tumors exhibited significantly elevated entropy values (7.84 ± 0.92 vs 6.39 ± 0.83, P = 0.004), enhanced surface-to-volume ratios (0.72 ± 0.14 vs 0.58 ± 0.11, P = 0.008), and heightened signal intensity variation (3642 ± 782 vs 2815 ± 645, P = 0.007). The random forest model demonstrated superior classification capability, with areas under the curve (AUCs) of 0.891 and 0.896 in the training and validation datasets, respectively. An integrated approach combining radiomics with clinical parameters further enhanced performance (AUC 0.923 and 0.914), achieving 88.5% sensitivity alongside 87.2% specificity. DCE-MRI radiomics features interpreted through machine learning frameworks offer an effective strategy for MSI status assessment in intermediate-stage rectal cancer.
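A minimal sketch of the pipeline the abstract describes, LASSO-based feature selection followed by a random-forest classifier, is shown below using scikit-learn on placeholder data. Feature counts, hyperparameters, and the forced selection of 30 features are assumptions, not the study's configuration.

```python
# Sketch (assumed layout, not the study's code): LASSO-driven feature selection over
# DCE-MRI radiomics features, followed by a random-forest classifier for MSI status.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoCV
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
X = rng.normal(size=(120, 300))            # 120 patients, 300 radiomic features (placeholder)
y = rng.integers(0, 2, size=120)           # placeholder MSI-H vs MSS/MSI-L labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = Pipeline([
    ("scale", StandardScaler()),
    # Keep the 30 features with the largest LASSO coefficients.
    ("lasso_select", SelectFromModel(LassoCV(cv=5, random_state=0),
                                     threshold=-np.inf, max_features=30)),
    ("rf", RandomForestClassifier(n_estimators=500, class_weight="balanced", random_state=0)),
])
model.fit(X_tr, y_tr)
print("validation AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```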

A software ecosystem for brain tractometry processing, analysis, and insight.

Kruper J, Richie-Halford A, Qiao J, Gilmore A, Chang K, Grotheer M, Roy E, Caffarra S, Gomez T, Chou S, Cieslak M, Koudoro S, Garyfallidis E, Satthertwaite TD, Yeatman JD, Rokem A

PubMed · Aug 14, 2025
Tractometry uses diffusion-weighted magnetic resonance imaging (dMRI) to assess physical properties of brain connections. Here, we present an integrative ecosystem of software that performs all steps of tractometry: post-processing of dMRI data, delineation of major white matter pathways, and modeling of the tissue properties within them. This ecosystem also provides a set of interoperable and extensible tools for visualizing and interpreting the results and for extracting insights from these measurements, including novel machine learning and statistical analysis methods adapted to the characteristic structure of tract-based data. We benchmark the performance of these statistical analysis methods on different datasets and analysis tasks, including hypothesis testing on group differences and predictive analysis of subject age. We also demonstrate that computational advances implemented in the software offer orders-of-magnitude acceleration. Taken together, these open-source software tools, freely available at https://tractometry.org, provide a transformative environment for the analysis of dMRI data.
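As a generic example of the point-wise group comparisons such tractometry tools support (not the ecosystem's own API), the sketch below t-tests a tract profile node by node and applies FDR correction; the profile length, metric, and group sizes are arbitrary placeholders.

```python
# Generic sketch: two-sample comparison of fractional-anisotropy tract profiles
# between groups, tested at each node along the bundle with FDR correction.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_nodes = 100                                              # points sampled along one bundle
fa_group_a = rng.normal(0.45, 0.05, size=(30, n_nodes))    # e.g., 30 controls
fa_group_b = rng.normal(0.43, 0.05, size=(25, n_nodes))    # e.g., 25 patients

# Two-sample t-test at every node along the tract profile.
t_vals, p_vals = stats.ttest_ind(fa_group_a, fa_group_b, axis=0)
# Correct for testing all positions along the bundle.
reject, p_adj, _, _ = multipletests(p_vals, alpha=0.05, method="fdr_bh")
print(f"nodes with a significant group difference after FDR: {reject.sum()} / {n_nodes}")
```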

A novel hybrid convolutional and recurrent neural network model for automatic pituitary adenoma classification using dynamic contrast-enhanced MRI.

Motamed M, Bastam M, Tabatabaie SM, Elhaie M, Shahbazi-Gahrouei D

PubMed · Aug 14, 2025
Pituitary adenomas, ranging from subtle microadenomas to mass-effect macroadenomas, pose diagnostic challenges for radiologists due to increasing scan volumes and the complexity of dynamic contrast-enhanced MRI interpretation. A hybrid CNN-LSTM model was trained and validated on a multi-center dataset of 2,163 samples from Tehran and Babolsar, Iran. Transfer learning and preprocessing techniques (e.g., Wiener filters) were utilized to improve classification performance for microadenomas (< 10 mm) and macroadenomas (> 10 mm). The model achieved 90.5% accuracy, an area under the receiver operating characteristic curve (AUROC) of 0.92, and 89.6% sensitivity (93.5% for microadenomas, 88.3% for macroadenomas), outperforming standard CNNs by 5-18% across metrics. With a processing time of 0.17 s per scan, the model demonstrated robustness to variations in imaging conditions, including scanner differences and contrast variations, excelling in real-time detection and differentiation of adenoma subtypes. This dual-path approach, the first to synergize spatial and temporal MRI features for pituitary diagnostics, offers high precision and efficiency. Supported by comparisons with existing models, it provides a scalable, reproducible tool to improve patient outcomes, with potential adaptability to broader neuroimaging challenges.
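A minimal sketch of a generic CNN-LSTM of this kind is shown below: a ResNet-18 backbone encodes each contrast phase and an LSTM aggregates the phases in order. The backbone choice, dimensions, and two-class head are assumptions, not the published architecture.

```python
# Sketch (assumptions throughout): ResNet-18 features per DCE phase, an LSTM over the
# phase sequence, and a linear head for microadenoma vs. macroadenoma classification.
import torch
import torch.nn as nn
from torchvision import models

class CNNLSTMClassifier(nn.Module):
    def __init__(self, hidden_dim=128, num_classes=2):
        super().__init__()
        backbone = models.resnet18()            # transfer learning would load weights here
        backbone.fc = nn.Identity()             # keep the 512-d pooled feature
        self.backbone = backbone
        self.lstm = nn.LSTM(512, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                       # x: (batch, phases, H, W), single-channel MRI
        b, t, h, w = x.shape
        frames = x.view(b * t, 1, h, w).repeat(1, 3, 1, 1)   # ResNet expects 3 channels
        feats = self.backbone(frames).view(b, t, 512)
        _, (h_n, _) = self.lstm(feats)          # final hidden state summarizes the DCE curve
        return self.head(h_n[-1])

model = CNNLSTMClassifier()
logits = model(torch.randn(2, 5, 224, 224))     # 2 patients, 5 contrast phases each
```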