Page 6 of 2332330 results

Non-Invasive Detection of PROState Cancer with Novel Time-Dependent Diffusion MRI and AI-Enhanced Quantitative Radiological Interpretation: PROS-TD-AI

Baltasar Ramos, Cristian Garrido, Paulette Narváez, Santiago Gelerstein Claro, Haotian Li, Rafael Salvador, Constanza Vásquez-Venegas, Iván Gallegos, Yi Zhang, Víctor Castañeda, Cristian Acevedo, Dan Wu, Gonzalo Cárdenas, Camilo G. Sotomayor

arXiv preprint · Sep 29 2025
Prostate cancer (PCa) is the most frequently diagnosed malignancy in men and the eighth leading cause of cancer death worldwide. Multiparametric MRI (mpMRI) has become central to the diagnostic pathway for men at intermediate risk, improving detection of clinically significant PCa (csPCa) while reducing unnecessary biopsies and over-diagnosis. However, mpMRI remains limited by false positives, false negatives, and only moderate to substantial interobserver agreement. Time-dependent diffusion (TDD) MRI, a novel sequence that enables tissue microstructure characterization, has shown encouraging preclinical performance in distinguishing clinically significant from insignificant PCa. Combining TDD-derived metrics with machine learning may provide robust, zone-specific risk prediction with less dependence on reader training and improved accuracy compared to the current standard of care. This study protocol outlines the rationale for and describes the prospective evaluation of a home-developed AI-enhanced TDD-MRI software (PROSTDAI) in routine diagnostic care, assessing its added value against PI-RADS v2.1 and validating results against MRI-guided prostate biopsy.

Convolutional neural network models of structural MRI for discriminating categories of cognitive impairment: a systematic review and meta-analysis.

Dong X, Li Y, Hao J, Zhou P, Yang C, Ai Y, He M, Zhang W, Hu H

PubMed · Sep 29 2025
Alzheimer's disease (AD) and mild cognitive impairment (MCI) pose significant challenges to public health and underscore the need for accurate and early diagnostic tools. Structural magnetic resonance imaging (sMRI) combined with advanced analytical techniques such as convolutional neural networks (CNNs) offers a promising avenue for diagnosing these conditions. This systematic review and meta-analysis aimed to evaluate the diagnostic performance of CNN algorithms applied to sMRI data in differentiating between AD, MCI, and normal cognition (NC). Following the PRISMA-DTA guidelines, a comprehensive literature search was carried out in the PubMed and Web of Science databases for studies published between 2018 and 2024. Studies were included if they employed CNNs for the diagnostic classification of sMRI data from participants with AD, MCI, or NC. The methodological quality of the included studies was assessed using the QUADAS-2 and METRICS tools. Data extraction and statistical analysis were performed to calculate pooled diagnostic accuracy metrics. A total of 21 studies comprising 16,139 participants were included in the analysis. The pooled sensitivity and specificity of CNN algorithms for differentiating AD from NC were 0.92 and 0.91, respectively. For distinguishing MCI from NC, the pooled sensitivity and specificity were 0.74 and 0.79, respectively. The algorithms also showed a moderate ability to differentiate AD from MCI, with a pooled sensitivity and specificity of 0.73 and 0.79, respectively. For classifying progressive MCI (pMCI) versus stable MCI (sMCI), the pooled sensitivity was 0.69 and the specificity 0.81. Heterogeneity across studies was significant, as indicated by meta-regression results. CNN algorithms demonstrated promising diagnostic performance in differentiating AD, MCI, and NC using sMRI data. The highest accuracy was observed in distinguishing AD from NC and the lowest in distinguishing pMCI from sMCI.
These findings suggest that CNN-based radiomics has the potential to serve as a valuable tool in the diagnostic armamentarium for neurodegenerative diseases. However, the heterogeneity among studies indicates a need for further methodological refinement and validation. This systematic review was registered in PROSPERO (Registration ID: CRD42022295408).
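The pooled sensitivity and specificity figures reported above reduce to simple ratios of confusion-matrix counts. A minimal sketch (the counts below are illustrative only, not data from the review):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity (true-positive rate) and specificity (true-negative rate)
    from the four confusion-matrix counts."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts: 92 of 100 AD cases detected,
# 91 of 100 NC cases correctly ruled out.
sens, spec = sensitivity_specificity(tp=92, fn=8, tn=91, fp=9)
print(sens, spec)  # 0.92 0.91
```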

Predictive Value of MRI Radiomics for the Efficacy of High-Intensity Focused Ultrasound (HIFU) Ablation in Uterine Fibroids: A Systematic Review and Meta-Analysis.

Salimi M, Abdolizadeh A, Fayedeh F, Vadipour P

PubMed · Sep 29 2025
High-Intensity Focused Ultrasound (HIFU) ablation has emerged as a non-invasive treatment option for uterine fibroids that preserves fertility and offers faster recovery. Pre-intervention prediction of HIFU efficacy can augment clinical decision-making and patient management. This systematic review and meta-analysis aims to evaluate the performance of MRI-based radiomics machine learning (ML) models in predicting the efficacy of HIFU ablation in uterine fibroids. Studies were retrieved through a thorough literature search across databases including PubMed, Scopus, Embase, and Web of Science, up to June 2025. The quality of the included studies was assessed using the QUADAS-2 and METRICS tools. A meta-analysis of the radiomics models was conducted to pool sensitivity, specificity, and AUC using a bivariate random-effects model. A total of 13 studies were incorporated in the systematic review and meta-analysis. Meta-analysis of 608 patients from 7 internal and 6 external validation cohorts showed a pooled AUC, sensitivity, and specificity of 0.84, 77%, and 78%, respectively. The QUADAS-2 assessment was notable for significant methodological biases in the index test and flow-and-timing domains. Across all studies, the mean METRICS score was 76.93% (range 54.9%-90.3%), denoting good overall quality and performance in most domains, but with notable gaps in the open science domain. MRI-based radiomics models show promise in predicting the effectiveness of HIFU ablation for uterine fibroids. However, limitations such as limited geographic diversity, inconsistent reporting standards, and poor open science practices hinder broader application. Therefore, future research should focus on standardizing imaging protocols, using multi-center designs with external validation, and integrating diverse data sources.

MetaChest: Generalized few-shot learning of pathologies from chest X-rays

Berenice Montalvo-Lezama, Gibran Fuentes-Pineda

arXiv preprint · Sep 29 2025
The limited availability of annotated data presents a major challenge for applying deep learning methods to medical image analysis. Few-shot learning methods aim to recognize new classes from only a small number of labeled examples. These methods are typically studied under the standard few-shot learning setting, where all classes in a task are new. However, medical applications such as pathology classification from chest X-rays often require learning new classes while simultaneously leveraging knowledge of previously known ones, a scenario more closely aligned with generalized few-shot classification. Despite its practical relevance, few-shot learning has been scarcely studied in this context. In this work, we present MetaChest, a large-scale dataset of 479,215 chest X-rays collected from four public databases. MetaChest includes a meta-set partition specifically designed for standard few-shot classification, as well as an algorithm for generating multi-label episodes. We conduct extensive experiments evaluating both a standard transfer learning approach and an extension of ProtoNet across a wide range of few-shot multi-label classification tasks. Our results demonstrate that increasing the number of classes per episode and the number of training examples per class improves classification performance. Notably, the transfer learning approach consistently outperforms the ProtoNet extension, despite not being tailored for few-shot learning. We also show that higher-resolution images improve accuracy at the cost of additional computation, while efficient model architectures achieve comparable performance to larger models with significantly reduced resource requirements.
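The ProtoNet extension evaluated above builds on prototypical classification: each class prototype is the mean of that class's support embeddings, and a query takes the label of its nearest prototype. A minimal sketch of that core step, using toy 2-D embeddings rather than the paper's actual model:

```python
import numpy as np

def prototype_classify(support_emb, support_lbl, query_emb):
    """Prototypical-network-style classification: prototypes are per-class
    means of support embeddings; queries get the label of the nearest
    prototype under Euclidean distance."""
    classes = np.unique(support_lbl)
    protos = np.stack([support_emb[support_lbl == c].mean(axis=0) for c in classes])
    # Pairwise distances: (n_query, n_class)
    d = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=-1)
    return classes[d.argmin(axis=1)]

# Toy 2-way, 2-shot episode in a 2-D embedding space
support = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[0.0, 0.5], [5.0, 5.5]])
print(prototype_classify(support, labels, queries))  # [0 1]
```

A multi-label episode, as generated by MetaChest's episode algorithm, would instead score each query against every class prototype independently; the nearest-prototype step above is the single-label special case.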

Radiology's Last Exam (RadLE): Benchmarking Frontier Multimodal AI Against Human Experts and a Taxonomy of Visual Reasoning Errors in Radiology

Suvrankar Datta, Divya Buchireddygari, Lakshmi Vennela Chowdary Kaza, Mrudula Bhalke, Kautik Singh, Ayush Pandey, Sonit Sai Vasipalli, Upasana Karnwal, Hakikat Bir Singh Bhatti, Bhavya Ratan Maroo, Sanjana Hebbar, Rahul Joseph, Gurkawal Kaur, Devyani Singh, Akhil V, Dheeksha Devasya Shama Prasad, Nishtha Mahajan, Ayinaparthi Arisha, Rajesh Vanagundi, Reet Nandy, Kartik Vuthoo, Snigdhaa Rajvanshi, Nikhileswar Kondaveeti, Suyash Gunjal, Rishabh Jain, Rajat Jain, Anurag Agrawal

arXiv preprint · Sep 29 2025
Generalist multimodal AI systems such as large language models (LLMs) and vision language models (VLMs) are increasingly accessed by clinicians and patients alike for medical image interpretation through widely available consumer-facing chatbots. Most evaluations claiming expert level performance are on public datasets containing common pathologies. Rigorous evaluation of frontier models on difficult diagnostic cases remains limited. We developed a pilot benchmark of 50 expert-level "spot diagnosis" cases across multiple imaging modalities to evaluate the performance of frontier AI models against board-certified radiologists and radiology trainees. To mirror real-world usage, the reasoning modes of five popular frontier AI models were tested through their native web interfaces, viz. OpenAI o3, OpenAI GPT-5, Gemini 2.5 Pro, Grok-4, and Claude Opus 4.1. Accuracy was scored by blinded experts, and reproducibility was assessed across three independent runs. GPT-5 was additionally evaluated across various reasoning modes. Reasoning quality errors were assessed and a taxonomy of visual reasoning errors was defined. Board-certified radiologists achieved the highest diagnostic accuracy (83%), outperforming trainees (45%) and all AI models (best performance shown by GPT-5: 30%). Reliability was substantial for GPT-5 and o3, moderate for Gemini 2.5 Pro and Grok-4, and poor for Claude Opus 4.1. These findings demonstrate that advanced frontier models fall far short of radiologists in challenging diagnostic cases. Our benchmark highlights the present limitations of generalist AI in medical imaging and cautions against unsupervised clinical use. We also provide a qualitative analysis of reasoning traces and propose a practical taxonomy of visual reasoning errors by AI models for better understanding their failure modes, informing evaluation standards and guiding more robust model development.

Evaluating Temperature Scaling Calibration Effectiveness for CNNs under Varying Noise Levels in Brain Tumour Detection

Ankur Chanda, Kushan Choudhury, Shubhrodeep Roy, Shubhajit Biswas, Somenath Kuiry

arXiv preprint · Sep 29 2025
Precise confidence estimation in deep learning is vital for high-stakes fields like medical imaging, where overconfident misclassifications can have serious consequences. This work evaluates the effectiveness of Temperature Scaling (TS), a post-hoc calibration technique, in improving the reliability of convolutional neural networks (CNNs) for brain tumor classification. We develop a custom CNN and train it on a merged brain MRI dataset. To simulate real-world uncertainty, five types of image noise are introduced: Gaussian, Poisson, salt-and-pepper, speckle, and uniform. Model performance is evaluated using precision, recall, F1-score, accuracy, negative log-likelihood (NLL), and expected calibration error (ECE), both before and after calibration. Results demonstrate that TS significantly reduces ECE and NLL under all noise conditions without degrading classification accuracy. This underscores TS as an effective and computationally efficient approach to enhancing the decision confidence of medical AI systems, making model outputs more reliable in noisy or uncertain settings.
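Temperature scaling itself is a one-parameter post-hoc method: logits are divided by a scalar T fitted on held-out data to minimize NLL, which leaves argmax predictions (and hence accuracy) unchanged while softening or sharpening confidences. A minimal sketch, using a simple grid search in place of the gradient-based fitting a real implementation would likely use:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(logits, labels, grid=np.linspace(0.5, 15.0, 146)):
    """Pick the scalar T minimising NLL of softmax(logits / T) on a
    held-out set. Dividing by T rescales confidences but never changes
    the argmax prediction."""
    def nll(T):
        p = softmax(logits / T)[np.arange(len(labels)), labels]
        return -np.log(p).mean()
    return min(grid, key=nll)

# Hypothetical overconfident model: extreme logits but only 75% accuracy.
T = fit_temperature(np.array([[10.0, 0.0]] * 4), np.array([0, 0, 0, 1]))
print(T > 1.0)  # True: calibration softens the overconfident probabilities
```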

Precision medicine in prostate cancer: individualized treatment through radiomics, genomics, and biomarkers.

Min K, Lin Q, Qiu D

PubMed · Sep 29 2025
Prostate cancer (PCa) is one of the most common malignancies threatening men's health globally. A comprehensive and integrated approach is essential for its early screening, diagnosis, risk stratification, treatment guidance, and efficacy assessment. Radiomics, leveraging multi-parametric magnetic resonance imaging (mpMRI) and positron emission tomography/computed tomography (PET/CT), has demonstrated significant clinical value in the non-invasive diagnosis, aggressiveness assessment, and prognosis prediction of PCa, with substantial potential when combined with artificial intelligence. In genomics, mutations or deletions in genes such as TMPRSS2-ERG, PTEN, RB1, TP53, and DNA damage repair genes (e.g., BRCA1/2) are closely associated with disease development and progression, holding profound implications for diagnosis, treatment, and prognosis. Concurrently, biomarkers like prostate-specific antigen (PSA), novel urinary markers (e.g., PCA3), and circulating tumor cells (CTCs) are widely utilized in PCa research and management. Integrating these technologies into personalized treatment plans and the broader framework of precision medicine allows for an in-depth exploration of the relationship between specific biomarkers and disease pathogenesis. This review summarizes the current research on radiomics, genomics, and biomarkers in PCa, and discusses their future potential and applications in advancing individualized patient care.

Evaluation of Context-Aware Prompting Techniques for Classification of Tumor Response Categories in Radiology Reports Using Large Language Model.

Park J, Sim WS, Yu JY, Park YR, Lee YH

PubMed · Sep 29 2025
Radiology reports are essential for medical decision-making, providing crucial data for diagnosing diseases, devising treatment plans, and monitoring disease progression. While large language models (LLMs) have shown promise in processing free-text reports, research on effective prompting techniques for radiologic applications remains limited. To evaluate LLM-driven classification of tumor response category (TRC) from radiology reports, and to optimize the model by comparing four prompt engineering techniques for this classification task in clinical applications, we included 3062 whole-spine contrast-enhanced magnetic resonance imaging (MRI) radiology reports for prompt engineering and validation. TRCs were labeled by two radiologists based on criteria modified from the Response Evaluation Criteria in Solid Tumors (RECIST) guidelines. The Llama3 instruct model was used to classify TRCs through four different prompts: General, In-Context Learning (ICL), Chain-of-Thought (CoT), and ICL with CoT. AUROC, accuracy, precision, recall, and F1-score were calculated for each prompt and model size (8B, 70B) on the test report dataset. The average AUROC for ICL (0.96 internally, 0.93 externally) and for ICL with CoT (0.97 internally, 0.94 externally) outperformed the other prompts. Error rates increased with prompt complexity, including 0.8% incomplete-sentence errors and 11.3% probability-classification inconsistencies. This study demonstrates that context-aware LLM prompts substantially improved the efficiency and effectiveness of classifying TRCs from radiology reports, despite potential intrinsic hallucinations. While further improvements are required for real-world application, our findings suggest that context-aware prompts have significant potential for segmenting complex radiology reports and enhancing oncology clinical workflows.
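As an illustration of how an ICL-with-CoT prompt of the kind compared above can be assembled (the report text, reasoning trace, and TRC label below are hypothetical, not taken from the study's data):

```python
def build_icl_cot_prompt(report, examples):
    """Sketch of an in-context-learning prompt with chain-of-thought:
    each worked example pairs a report excerpt with a short reasoning
    trace and its tumor response category (TRC) label, and the target
    report is left open at 'Reasoning:' for the model to complete."""
    parts = [
        "Classify the tumor response category (TRC) of the radiology report.",
        "Think step by step, then answer with a single TRC label.\n",
    ]
    for ex in examples:
        parts.append(
            f"Report: {ex['report']}\nReasoning: {ex['reasoning']}\nTRC: {ex['label']}\n"
        )
    parts.append(f"Report: {report}\nReasoning:")
    return "\n".join(parts)

# Hypothetical worked example using a RECIST-style label
demo = [{"report": "Target lesion enlarged from 2.1 cm to 3.4 cm.",
         "reasoning": "Clear interval growth of the target lesion.",
         "label": "PD"}]
prompt = build_icl_cot_prompt("No change in lesion size since prior study.", demo)
print(prompt.endswith("Reasoning:"))  # True
```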

MMRQA: Signal-Enhanced Multimodal Large Language Models for MRI Quality Assessment

Fankai Jia, Daisong Gan, Zhe Zhang, Zhaochi Wen, Chenchen Dan, Dong Liang, Haifeng Wang

arXiv preprint · Sep 29 2025
Magnetic resonance imaging (MRI) quality assessment is crucial for clinical decision-making, yet remains challenging due to data scarcity and protocol variability. Traditional approaches face fundamental trade-offs: signal-based methods like MRIQC provide quantitative metrics but lack semantic understanding, while deep learning approaches achieve high accuracy but sacrifice interpretability. To address these limitations, we introduce the Multimodal MRI Quality Assessment (MMRQA) framework, pioneering the integration of multimodal large language models (MLLMs) with acquisition-aware signal processing. MMRQA combines three key innovations: robust metric extraction via MRQy augmented with simulated artifacts, structured transformation of metrics into question-answer pairs using Qwen, and parameter-efficient fusion through Low-Rank Adaptation (LoRA) of LLaVA-OneVision. Evaluated on MR-ART, FastMRI, and MyConnectome benchmarks, MMRQA achieves state-of-the-art performance with strong zero-shot generalization, as validated by comprehensive ablation studies. By bridging quantitative analysis with semantic reasoning, our framework generates clinically interpretable outputs that enhance quality control in dynamic medical settings.
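The LoRA component mentioned above adapts a frozen weight matrix with a trainable low-rank update scaled by alpha/r, so only a small fraction of parameters is trained. A minimal NumPy sketch of the forward pass (dimensions and initialization are illustrative, not MMRQA's configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 8, 8, 2, 4

W = rng.normal(size=(d_in, d_out))       # frozen pretrained weight
A = rng.normal(size=(d_in, r)) * 0.01    # trainable low-rank factor
B = np.zeros((r, d_out))                 # zero-init: adapter starts as identity

def lora_forward(x):
    """LoRA forward pass: frozen base projection plus a scaled low-rank
    update. Only A and B (d_in*r + r*d_out parameters) would be trained,
    versus d_in*d_out for full fine-tuning."""
    return x @ W + (alpha / r) * (x @ A @ B)

x = rng.normal(size=(1, d_in))
# With B = 0 the adapted layer matches the frozen layer exactly.
print(np.allclose(lora_forward(x), x @ W))  # True
```

The zero initialization of B is what makes LoRA safe to attach to a pretrained model: training starts from the base model's behavior and only gradually departs from it.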

Accelerating Cerebral Diagnostics with BrainFusion: A Comprehensive MRI Tumor Framework

Walid Houmaidi, Youssef Sabiri, Salmane El Mansour Billah, Amine Abouaomar

arXiv preprint · Sep 29 2025
The early and accurate classification of brain tumors is crucial for guiding effective treatment strategies and improving patient outcomes. This study presents BrainFusion, a significant advancement in brain tumor analysis using magnetic resonance imaging (MRI), combining fine-tuned convolutional neural networks (CNNs) for tumor classification (VGG16, ResNet50, and Xception) with YOLOv8 for precise tumor localization via bounding boxes. Leveraging the Brain Tumor MRI Dataset, our experiments reveal that the fine-tuned VGG16 model achieves a test accuracy of 99.86%, substantially exceeding previous benchmarks. Beyond setting a new accuracy standard, the integration of bounding-box localization and explainable AI techniques further enhances both the clinical interpretability and trustworthiness of the system's outputs. Overall, this approach underscores the transformative potential of deep learning in delivering faster, more reliable diagnoses, ultimately contributing to improved patient care and survival rates.