
Using a Large Language Model for Breast Imaging Reporting and Data System Classification and Malignancy Prediction to Enhance Breast Ultrasound Diagnosis: Retrospective Study.

Miaojiao S, Xia L, Xian Tao Z, Zhi Liang H, Sheng C, Songsong W

pubmed · Jun 11 2025
Breast ultrasound is essential for evaluating breast nodules, with the Breast Imaging Reporting and Data System (BI-RADS) providing standardized classification. However, interobserver variability among radiologists can affect diagnostic accuracy. Large language models (LLMs) such as ChatGPT-4 have shown potential in medical imaging interpretation; this study explores their feasibility for improving BI-RADS classification consistency and malignancy prediction relative to radiologists. Specifically, it aims to evaluate the consistency and diagnostic accuracy of ChatGPT-4 on standardized breast ultrasound imaging reports, using pathology as the reference standard. This retrospective study analyzed breast nodule ultrasound data from 671 female patients (mean age 45.82 years, SD 9.20; range 26-75 years) who underwent biopsy or surgical excision at our hospital between June 2019 and June 2024. ChatGPT-4 was used to interpret BI-RADS classifications and predict benign versus malignant nodules. The model's performance was compared with that of two senior radiologists (≥15 years of experience) and two junior radiologists (<5 years of experience) using key diagnostic metrics: accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), P values, and odds ratios with 95% CIs. Two diagnostic models were evaluated: (1) an image interpretation model, in which ChatGPT-4 classified nodules based on BI-RADS features, and (2) an image-to-text-LLM model, in which radiologists provided textual descriptions and ChatGPT-4 determined malignancy probability based on keywords. Radiologists were blinded to pathological outcomes, and BI-RADS classifications were finalized through consensus. ChatGPT-4 achieved an overall BI-RADS classification accuracy of 96.87%, outperforming the junior radiologists (617/671, 91.95% and 604/671, 90.01%; P<.01). For malignancy prediction, ChatGPT-4 achieved an AUC of 0.82 (95% CI 0.79-0.85), an accuracy of 80.63% (541/671 cases), a sensitivity of 90.56% (259/286 cases), and a specificity of 73.51% (283/385 cases). The image interpretation model performed comparably to the senior radiologists, while the image-to-text-LLM model further improved diagnostic accuracy for all radiologists, significantly increasing their sensitivity and specificity (P<.001). Statistical analyses, including the McNemar and DeLong tests, confirmed that ChatGPT-4 outperformed the junior radiologists (P<.01) and was noninferior to the senior radiologists (P>.05). Pathological diagnoses served as the reference standard, ensuring robust evaluation reliability. Integrating ChatGPT-4 into an image-to-text-LLM workflow improves BI-RADS classification accuracy and supports radiologists in breast ultrasound diagnostics. These results demonstrate its potential as a decision-support tool to enhance diagnostic consistency and reduce variability.
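
To make the image-to-text-LLM workflow concrete, here is a minimal sketch of the second model's core step, assuming the OpenAI chat-completions API: a radiologist's free-text description of a nodule goes in, and the LLM returns a BI-RADS category and a benign/malignant call. The prompt wording and model name are illustrative, not the study's actual protocol.

```python
# Minimal sketch of an image-to-text-LLM step: a radiologist's free-text
# description of a nodule is sent to an LLM, which returns a BI-RADS
# category and a benign/malignant call. Prompt wording is illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def classify_nodule(description: str) -> str:
    prompt = (
        "You are assisting with breast ultrasound reporting. "
        "Given the following sonographic description of a nodule, "
        "assign a BI-RADS category (2-5) and state whether the nodule "
        "is more likely benign or malignant, with a brief rationale.\n\n"
        f"Description: {description}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output for reproducibility
    )
    return response.choices[0].message.content

print(classify_nodule(
    "Irregular hypoechoic mass, 14 mm, angular margins, "
    "posterior acoustic shadowing, nonparallel orientation."
))
```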

Non-invasive prediction of nuclear grade in renal cell carcinoma using CT-Based radiomics: a systematic review and meta-analysis.

Salimi M, Hajikarimloo B, Vadipour P, Abdolizadeh A, Fayedeh F, Seifi S

pubmed · Jun 11 2025
Renal cell carcinoma (RCC) is the most prevalent malignant neoplasm of the kidney, with a rising global incidence. Tumor nuclear grade is a crucial prognostic factor that guides treatment decisions, but current histopathological grading via biopsy is invasive and prone to sampling error. This study assesses the diagnostic performance and methodological quality of CT-based radiomics for preoperative prediction of RCC nuclear grade. A comprehensive search of PubMed, Scopus, Embase, and Web of Science was conducted to identify relevant studies published up to 19 April 2025. Study quality was assessed with the QUADAS-2 and METRICS tools. A bivariate random-effects meta-analysis was performed to evaluate model performance, including sensitivity, specificity, and area under the curve (AUC). Results from separate validation cohorts were pooled, and clinical and combined (clinical-radiomics) models were analyzed in distinct analyses. A total of 26 studies comprising 1993 individuals across 10 external and 16 internal validation cohorts were included. Meta-analysis of the radiomics models yielded a pooled AUC of 0.88, sensitivity of 0.78, and specificity of 0.82; clinical and combined models yielded AUCs of 0.73 and 0.86, respectively. QUADAS-2 revealed significant risk of bias in the Index Test and Flow and Timing domains. METRICS scores ranged from 49.7% to 88.4%, with an average of 66.65%, indicating overall good quality, though gaps in some aspects of study methodology were identified. These findings suggest that radiomics models have substantial potential and diagnostic accuracy for noninvasive preoperative nuclear grading of RCC. However, challenges to generalizability and clinical applicability remain, and further research with standardized methodology, external validation, and larger cohorts is needed to enhance reliability and support integration into routine clinical practice.
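
For intuition on the pooling step, the sketch below implements a simplified univariate random-effects pooling of logit-transformed sensitivities (DerSimonian-Laird); the review itself used a bivariate model that pools sensitivity and specificity jointly. The cohort counts are made up.

```python
# Simplified illustration of random-effects pooling of sensitivities on the
# logit scale (DerSimonian-Laird). The paper used a full bivariate model that
# pools sensitivity and specificity jointly; this univariate sketch only
# shows the pooling mechanics. The counts below are hypothetical.
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

# (true positives, false negatives) per validation cohort -- made-up data
tp = np.array([45, 60, 30, 80])
fn = np.array([12, 15, 10, 18])

sens = tp / (tp + fn)
y = np.log(sens / (1 - sens))          # logit-transformed sensitivity
var = 1.0 / tp + 1.0 / fn              # approximate within-study variance

w = 1.0 / var                          # fixed-effect weights
y_fixed = np.sum(w * y) / np.sum(w)
Q = np.sum(w * (y - y_fixed) ** 2)     # heterogeneity statistic
k = len(y)
tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

w_re = 1.0 / (var + tau2)              # random-effects weights
pooled = np.sum(w_re * y) / np.sum(w_re)
print(f"pooled sensitivity ~ {expit(pooled):.3f}, tau^2 = {tau2:.3f}")
```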

Towards Practical Alzheimer's Disease Diagnosis: A Lightweight and Interpretable Spiking Neural Model

Changwei Wu, Yifei Chen, Yuxin Du, Jinying Zong, Jie Dong, Mingxuan Liu, Yong Peng, Jin Fan, Feiwei Qin, Changmiao Wang

arxiv preprint · Jun 11 2025
Early diagnosis of Alzheimer's Disease (AD), especially at the mild cognitive impairment (MCI) stage, is vital yet hindered by subjective assessments and the high cost of multimodal imaging. Although deep learning methods offer automated alternatives, their energy inefficiency and computational demands limit real-world deployment, particularly in resource-constrained settings. As a brain-inspired paradigm, spiking neural networks (SNNs) are inherently well-suited for modeling the sparse, event-driven patterns of neural degeneration in AD, offering a promising foundation for interpretable and low-power medical diagnostics. However, existing SNNs often suffer from weak expressiveness and unstable training, which restrict their effectiveness in complex medical tasks. To address these limitations, we propose FasterSNN, a hybrid neural architecture that integrates biologically inspired LIF neurons with region-adaptive convolution and multi-scale spiking attention. This design enables sparse, efficient processing of 3D MRI while preserving diagnostic accuracy. Experiments on benchmark datasets demonstrate that FasterSNN achieves competitive performance with substantially improved efficiency and stability, supporting its potential for practical AD screening. Our source code is available at https://github.com/wuchangw/FasterSNN.
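
For readers unfamiliar with the spiking primitive involved, the following is a minimal leaky integrate-and-fire (LIF) neuron simulation; the parameters are generic textbook values, not FasterSNN's.

```python
# Minimal leaky integrate-and-fire (LIF) neuron simulation, the spiking
# primitive FasterSNN builds on. Parameters are generic textbook values.
import numpy as np

def lif_simulate(current, dt=1.0, tau=20.0, v_rest=0.0, v_th=1.0, v_reset=0.0):
    """Integrate an input current trace and emit binary spikes."""
    v = v_rest
    spikes = np.zeros_like(current)
    for t, i_t in enumerate(current):
        # leaky integration: membrane potential decays toward rest
        v += (dt / tau) * (-(v - v_rest) + i_t)
        if v >= v_th:          # threshold crossing -> spike, then reset
            spikes[t] = 1.0
            v = v_reset
    return spikes

rng = np.random.default_rng(0)
input_current = rng.uniform(0.0, 2.0, size=100)
print("spike count:", int(lif_simulate(input_current).sum()))
```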

Towards more reliable prostate cancer detection: Incorporating clinical data and uncertainty in MRI deep learning.

Taguelmimt K, Andrade-Miranda G, Harb H, Thanh TT, Dang HP, Malavaud B, Bert J

pubmed · Jun 11 2025
Prostate cancer (PCa) is one of the most common cancers among men, and artificial intelligence (AI) is emerging as a promising tool to enhance its diagnosis. This work proposes a classification approach for PCa cases using deep learning techniques. We compared unimodal models based either on biparametric magnetic resonance imaging (bpMRI) or on clinical data (prostate-specific antigen level, prostate volume, and age), and introduced a bimodal model that integrates imaging and clinical data simultaneously to address the limitations of unimodal approaches. Furthermore, we propose a framework that not only detects the presence of PCa but also evaluates the uncertainty associated with its predictions, making it possible to separate highly confident predictions from uncertain ones and thereby enhancing the reliability and applicability of automated medical decisions in clinical practice. The results show that the bimodal model significantly improves performance, with an area under the curve (AUC) of 0.82±0.03 and a sensitivity of 0.73±0.04, while maintaining high specificity. Uncertainty analysis revealed that the bimodal model produces more confident predictions, with an uncertainty accuracy of 0.85, surpassing the imaging-only model (0.71). This increase in reliability is crucial in a clinical context, where precise and dependable diagnostic decisions are essential for patient care. Integrating clinical data with imaging data in a bimodal model not only improves diagnostic performance but also strengthens the reliability of predictions, making this approach particularly suitable for clinical use.
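
As a rough illustration of the bimodal design, here is a sketch of a network that fuses an imaging feature vector with the three clinical variables and estimates uncertainty with Monte Carlo dropout; the architecture and the uncertainty estimator are assumptions for illustration, not the paper's exact design.

```python
# Sketch of a bimodal classifier that fuses imaging features with clinical
# variables (PSA, volume, age) and estimates predictive uncertainty via
# Monte Carlo dropout. Architecture and estimator are illustrative choices.
import torch
import torch.nn as nn

class BimodalNet(nn.Module):
    def __init__(self, img_dim=512, clin_dim=3):
        super().__init__()
        self.img_branch = nn.Sequential(nn.Linear(img_dim, 64), nn.ReLU(), nn.Dropout(0.3))
        self.clin_branch = nn.Sequential(nn.Linear(clin_dim, 16), nn.ReLU(), nn.Dropout(0.3))
        self.head = nn.Linear(64 + 16, 1)  # fused logit for PCa vs. no PCa

    def forward(self, img_feat, clin):
        z = torch.cat([self.img_branch(img_feat), self.clin_branch(clin)], dim=-1)
        return torch.sigmoid(self.head(z))

@torch.no_grad()
def mc_dropout_predict(model, img_feat, clin, n_samples=50):
    model.train()  # keep dropout active at inference time
    probs = torch.stack([model(img_feat, clin) for _ in range(n_samples)])
    return probs.mean(0), probs.std(0)  # mean prediction and its spread

model = BimodalNet()
mean_p, std_p = mc_dropout_predict(model, torch.randn(4, 512), torch.randn(4, 3))
print(mean_p.squeeze(), std_p.squeeze())  # high std flags uncertain cases
```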

AI-based radiomic features predict outcomes and the added benefit of chemoimmunotherapy over chemotherapy in extensive stage small cell lung cancer: A Multi-institutional study.

Khorrami M, Mutha P, Barrera C, Viswanathan VS, Ardeshir-Larijani F, Jain P, Higgins K, Madabhushi A

pubmed · Jun 11 2025
Small cell lung cancer (SCLC) is aggressive with poor survival outcomes, and most patients develop resistance to chemotherapy; no predictive biomarkers currently guide therapy. This study evaluates radiomic features to predict progression-free survival (PFS) and overall survival (OS) in limited-stage SCLC (LS-SCLC) and to assess PFS, OS, and the added benefit of chemoimmunotherapy (CHIO) in extensive-stage SCLC (ES-SCLC). A total of 660 SCLC patients (470 ES-SCLC, 190 LS-SCLC) from three sites were analyzed. LS-SCLC patients received chemotherapy and radiation, while ES-SCLC patients received either chemotherapy alone or chemoimmunotherapy. Radiomic and quantitative vasculature tortuosity features were extracted from CT scans, and a LASSO-Cox regression model was used to construct the ES-Risk-Score (ESRS) and the LS-Risk-Score (LSRS). In ES-SCLC patients treated with chemotherapy, ESRS was associated with PFS in the training set (HR = 1.54, adj. P = .0013) and validation sets (HR = 1.32, adj. P = .0001; HR = 2.4, adj. P = .0073), and with OS in the training set (HR = 1.37, adj. P = .0054) and validation sets (HR = 1.35, adj. P < .0006; HR = 1.6, adj. P < .0085). High-risk patients had improved PFS (HR = 0.68, adj. P < .001) and OS (HR = 0.78, adj. P = .026) with chemoimmunotherapy. In LS-SCLC patients receiving chemoradiation, LSRS was associated with PFS in the training and validation sets (HR = 1.9, adj. P = .007; HR = 1.4, adj. P = .0098; HR = 2.1, adj. P = .028). Radiomics is thus prognostic for PFS and OS and predicts chemoimmunotherapy benefit in high-risk ES-SCLC patients.
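
The risk-score construction follows a common recipe that can be sketched as below, using lifelines with placeholder feature names, synthetic data, and an arbitrary penalty strength; setting l1_ratio=1.0 makes the Cox penalty pure LASSO, so weak features are shrunk toward zero and the surviving coefficients define the signature.

```python
# Sketch of building a LASSO-penalized Cox risk score from radiomic
# features, in the spirit of the ESRS/LSRS construction. Feature names,
# data, and penalty strength are placeholders.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame(rng.normal(size=(n, 5)),
                  columns=[f"radiomic_{i}" for i in range(5)])
df["pfs_months"] = rng.exponential(12, size=n)
df["event"] = rng.integers(0, 2, size=n)  # 1 = progression observed

# l1_ratio=1.0 gives a pure LASSO penalty on the Cox coefficients
cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
cph.fit(df, duration_col="pfs_months", event_col="event")

risk_score = cph.predict_partial_hazard(df)   # continuous risk score
high_risk = risk_score > risk_score.median()  # dichotomize into risk groups
print(cph.params_)  # surviving coefficients define the signature
```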

RCMIX model based on pre-treatment MRI imaging predicts T-downstage in MRI-cT4 stage rectal cancer.

Bai F, Liao L, Tang Y, Wu Y, Wang Z, Zhao H, Huang J, Wang X, Ding P, Wu X, Cai Z

pubmed · Jun 11 2025
Neoadjuvant therapy (NAT) is the standard treatment strategy for MRI-defined cT4 rectal cancer, and predicting tumor regression can help guide the resection plane. Here, we collected pre-treatment MRI scans of 363 cT4 rectal cancer patients who received NAT and radical surgery at three hospitals: Center 1 (n = 205), Center 2 (n = 109), and Center 3 (n = 52). We propose a machine learning model named RCMIX, which incorporates a multilayer perceptron built on 19 pre-treatment MRI radiomic features and 2 clinical features of cT4 rectal cancer patients receiving NAT. Trained on 205 cT4 rectal cancer patients, the model achieved an AUC of 0.903 (95% confidence interval, 0.861-0.944) for predicting T-downstage, and AUCs of 0.787 (0.699-0.874) and 0.773 (0.646-0.901) in the two independent test cohorts, respectively. Patients predicted as "well T-downstage" by the RCMIX model had significantly better disease-free survival than those predicted as "poor T-downstage." Our study suggests that the RCMIX model performs satisfactorily in predicting T-downstage after NAT for cT4 rectal cancer patients, which may provide critical insights for improving surgical strategies.
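
A minimal stand-in for the RCMIX setup might look like the sketch below, assuming a generic scikit-learn multilayer perceptron over 21 placeholder features; the abstract does not report the exact architecture or preprocessing, so both are assumptions here.

```python
# Sketch of an RCMIX-style setup: a multilayer perceptron over 19 radiomic
# and 2 clinical features predicting T-downstage. Layer sizes and data are
# placeholders, not the paper's reported configuration.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(205, 21))    # 19 radiomic + 2 clinical features (toy)
y = rng.integers(0, 2, size=205)  # 1 = T-downstage achieved (toy labels)

model = make_pipeline(
    StandardScaler(),  # radiomic features vary widely in scale
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
)
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {auc.mean():.3f} +/- {auc.std():.3f}")
```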

Towards a general-purpose foundation model for fMRI analysis

Cheng Wang, Yu Jiang, Zhihao Peng, Chenxin Li, Changbae Bang, Lin Zhao, Jinglei Lv, Jorge Sepulcre, Carl Yang, Lifang He, Tianming Liu, Daniel Barron, Quanzheng Li, Randy Hirschtick, Byung-Hoon Kim, Xiang Li, Yixuan Yuan

arxiv preprint · Jun 11 2025
Functional Magnetic Resonance Imaging (fMRI) is essential for studying brain function and diagnosing neurological disorders, but current analysis methods face reproducibility and transferability issues due to complex pre-processing and task-specific models. We introduce NeuroSTORM (Neuroimaging Foundation Model with Spatial-Temporal Optimized Representation Modeling), a generalizable framework that directly learns from 4D fMRI volumes and enables efficient knowledge transfer across diverse applications. NeuroSTORM is pre-trained on 28.65 million fMRI frames (>9,000 hours) from over 50,000 subjects across multiple centers and ages 5 to 100. Using a Mamba backbone and a shifted scanning strategy, it efficiently processes full 4D volumes. We also propose a spatial-temporal optimized pre-training approach and task-specific prompt tuning to improve transferability. NeuroSTORM outperforms existing methods across five tasks: age/gender prediction, phenotype prediction, disease diagnosis, fMRI-to-image retrieval, and task-based fMRI classification. It demonstrates strong clinical utility on datasets from hospitals in the U.S., South Korea, and Australia, achieving top performance in disease diagnosis and cognitive phenotype prediction. NeuroSTORM provides a standardized, open-source foundation model to improve reproducibility and transferability in fMRI-based clinical research.
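
One plausible reading of the front end is sketched below: a 4D fMRI volume is cut into spatio-temporal patch tokens that a sequence model such as a Mamba backbone can consume. The patching scheme here is a generic one; the paper's actual shifted scanning strategy may differ.

```python
# Illustrative sketch of turning a 4D fMRI volume (T, X, Y, Z) into a
# sequence of spatio-temporal patch tokens for a sequence backbone.
# Generic patching, not the paper's exact shifted scanning strategy.
import numpy as np

def patchify_4d(volume, p=8, t=4):
    """Split a (T, X, Y, Z) volume into flattened (t, p, p, p) patches."""
    T, X, Y, Z = volume.shape
    tokens = []
    for ti in range(0, T - t + 1, t):
        for xi in range(0, X - p + 1, p):
            for yi in range(0, Y - p + 1, p):
                for zi in range(0, Z - p + 1, p):
                    patch = volume[ti:ti+t, xi:xi+p, yi:yi+p, zi:zi+p]
                    tokens.append(patch.ravel())
    return np.stack(tokens)  # (num_tokens, t*p*p*p)

fmri = np.random.rand(16, 64, 64, 48)  # toy volume: 16 frames of 64x64x48
tokens = patchify_4d(fmri)
print(tokens.shape)  # token sequence fed to the backbone
```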

Test-Time-Scaling for Zero-Shot Diagnosis with Visual-Language Reasoning

Ji Young Byun, Young-Jin Park, Navid Azizan, Rama Chellappa

arxiv preprint · Jun 11 2025
As a cornerstone of patient care, clinical decision-making significantly influences patient outcomes and can be enhanced by large language models (LLMs). Although LLMs have demonstrated remarkable performance, their application to visual question answering in medical imaging, particularly for reasoning-based diagnosis, remains largely unexplored. Furthermore, supervised fine-tuning for reasoning tasks is largely impractical due to limited data availability and high annotation costs. In this work, we introduce a zero-shot framework for reliable medical image diagnosis that enhances the reasoning capabilities of LLMs in clinical settings through test-time scaling. Given a medical image and a corresponding textual prompt, a vision-language model generates multiple descriptions or interpretations of visual features. These interpretations are then fed to an LLM, where a test-time scaling strategy consolidates multiple candidate outputs into a reliable final diagnosis. We evaluate our approach across various medical imaging modalities -- including radiology, ophthalmology, and histopathology -- and demonstrate that the proposed test-time scaling strategy enhances diagnostic accuracy for both our and baseline methods. Additionally, we provide an empirical analysis showing that the proposed approach, which allows unbiased prompting in the first stage, improves the reliability of LLM-generated diagnoses and enhances classification accuracy.
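
The consolidation step can be as simple as sampling several candidate diagnoses and voting, as in this minimal sketch; the paper's actual aggregation strategy may be more sophisticated, and the sample outputs below are invented.

```python
# Minimal sketch of the consolidation step: sample several candidate
# diagnoses from repeated LLM calls and take a majority vote.
from collections import Counter

def consolidate(candidates: list[str]) -> str:
    """Majority vote over candidate diagnoses, ignoring case/whitespace."""
    counts = Counter(c.strip().lower() for c in candidates)
    diagnosis, _ = counts.most_common(1)[0]
    return diagnosis

# e.g. outputs from N independent LLM calls over the VLM's image descriptions
samples = ["pneumonia", "Pneumonia", "pleural effusion", "pneumonia"]
print(consolidate(samples))  # -> "pneumonia"
```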

ADAgent: LLM Agent for Alzheimer's Disease Analysis with Collaborative Coordinator

Wenlong Hou, Gangqian Yang, Ye Du, Yeung Lau, Lihao Liu, Junjun He, Ling Long, Shujun Wang

arxiv preprint · Jun 11 2025
Alzheimer's disease (AD) is a progressive and irreversible neurodegenerative disease. Early and precise diagnosis of AD is crucial for timely intervention and treatment planning to alleviate the progressive neurodegeneration. However, most existing methods rely on single-modality data, which contrasts with the multifaceted approach used by medical experts. While some deep learning approaches process multi-modal data, they are limited to specific tasks with a small set of input modalities and cannot handle arbitrary combinations. This highlights the need for a system that can address diverse AD-related tasks, process multi-modal or missing input, and integrate multiple advanced methods for improved performance. In this paper, we propose ADAgent, the first specialized AI agent for AD analysis, built on a large language model (LLM) to address user queries and support decision-making. ADAgent integrates a reasoning engine, specialized medical tools, and a collaborative outcome coordinator to facilitate multi-modal diagnosis and prognosis tasks in AD. Extensive experiments demonstrate that ADAgent outperforms SOTA methods, achieving significant improvements in accuracy, including a 2.7% increase in multi-modal diagnosis, a 0.7% improvement in multi-modal prognosis, and enhancements in MRI and PET diagnosis tasks.
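
A hypothetical skeleton of the route-and-aggregate pattern described here might look as follows; every name in it is illustrative, and a real coordinator would weigh and reconcile tool outputs rather than concatenate them.

```python
# Hypothetical skeleton of an LLM agent that routes AD-related queries to
# specialized tools and aggregates their outputs, in the spirit of ADAgent's
# reasoning engine + tools + coordinator design. All names are illustrative.
from typing import Callable

TOOLS: dict[str, Callable[[dict], str]] = {
    "mri_diagnosis": lambda inputs: f"MRI model output for {inputs['subject']}",
    "pet_diagnosis": lambda inputs: f"PET model output for {inputs['subject']}",
}

def reasoning_engine(query: str) -> list[str]:
    """Pick tools by which modalities the query mentions (toy routing)."""
    selected = [name for name in TOOLS if name.split("_")[0] in query.lower()]
    return selected or list(TOOLS)  # fall back to all tools if none match

def coordinator(results: list[str]) -> str:
    """Combine tool outputs into one answer (a real system would reconcile)."""
    return " | ".join(results)

def adagent(query: str, subject: str) -> str:
    tools = reasoning_engine(query)
    results = [TOOLS[t]({"subject": subject}) for t in tools]
    return coordinator(results)

print(adagent("Diagnose AD from MRI and PET", subject="sub-001"))
```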
