Sort by:
Page 1 of 18172 results
Next

Performance of GPT-5 in Brain Tumor MRI Reasoning

Mojtaba Safari, Shansong Wang, Mingzhe Hu, Zach Eidex, Qiang Li, Xiaofeng Yang

arxiv logopreprintAug 14 2025
Accurate differentiation of brain tumor types on magnetic resonance imaging (MRI) is critical for guiding treatment planning in neuro-oncology. Recent advances in large language models (LLMs) have enabled visual question answering (VQA) approaches that integrate image interpretation with natural language reasoning. In this study, we evaluated GPT-4o, GPT-5-nano, GPT-5-mini, and GPT-5 on a curated brain tumor VQA benchmark derived from 3 Brain Tumor Segmentation (BraTS) datasets - glioblastoma (GLI), meningioma (MEN), and brain metastases (MET). Each case included multi-sequence MRI triplanar mosaics and structured clinical features transformed into standardized VQA items. Models were assessed in a zero-shot chain-of-thought setting for accuracy on both visual and reasoning tasks. Results showed that GPT-5-mini achieved the highest macro-average accuracy (44.19%), followed by GPT-5 (43.71%), GPT-4o (41.49%), and GPT-5-nano (35.85%). Performance varied by tumor subtype, with no single model dominating across all cohorts. These findings suggest that GPT-5 family models can achieve moderate accuracy in structured neuro-oncological VQA tasks, but not at a level acceptable for clinical use.

Explainable AI Technique in Lung Cancer Detection Using Convolutional Neural Networks

Nishan Rai, Sujan Khatri, Devendra Risal

arxiv logopreprintAug 13 2025
Early detection of lung cancer is critical to improving survival outcomes. We present a deep learning framework for automated lung cancer screening from chest computed tomography (CT) images with integrated explainability. Using the IQ-OTH/NCCD dataset (1,197 scans across Normal, Benign, and Malignant classes), we evaluate a custom convolutional neural network (CNN) and three fine-tuned transfer learning backbones: DenseNet121, ResNet152, and VGG19. Models are trained with cost-sensitive learning to mitigate class imbalance and evaluated via accuracy, precision, recall, F1-score, and ROC-AUC. While ResNet152 achieved the highest accuracy (97.3%), DenseNet121 provided the best overall balance in precision, recall, and F1 (up to 92%, 90%, 91%, respectively). We further apply Shapley Additive Explanations (SHAP) to visualize evidence contributing to predictions, improving clinical transparency. Results indicate that CNN-based approaches augmented with explainability can provide fast, accurate, and interpretable support for lung cancer screening, particularly in resource-limited settings.

A Chain of Diagnosis Framework for Accurate and Explainable Radiology Report Generation

Haibo Jin, Haoxuan Che, Sunan He, Hao Chen

arxiv logopreprintAug 13 2025
Despite the progress of radiology report generation (RRG), existing works face two challenges: 1) The performances in clinical efficacy are unsatisfactory, especially for lesion attributes description; 2) the generated text lacks explainability, making it difficult for radiologists to trust the results. To address the challenges, we focus on a trustworthy RRG model, which not only generates accurate descriptions of abnormalities, but also provides basis of its predictions. To this end, we propose a framework named chain of diagnosis (CoD), which maintains a chain of diagnostic process for clinically accurate and explainable RRG. It first generates question-answer (QA) pairs via diagnostic conversation to extract key findings, then prompts a large language model with QA diagnoses for accurate generation. To enhance explainability, a diagnosis grounding module is designed to match QA diagnoses and generated sentences, where the diagnoses act as a reference. Moreover, a lesion grounding module is designed to locate abnormalities in the image, further improving the working efficiency of radiologists. To facilitate label-efficient training, we propose an omni-supervised learning strategy with clinical consistency to leverage various types of annotations from different datasets. Our efforts lead to 1) an omni-labeled RRG dataset with QA pairs and lesion boxes; 2) a evaluation tool for assessing the accuracy of reports in describing lesion location and severity; 3) extensive experiments to demonstrate the effectiveness of CoD, where it outperforms both specialist and generalist models consistently on two RRG benchmarks and shows promising explainability by accurately grounding generated sentences to QA diagnoses and images.

Safeguarding Generative AI Applications in Preclinical Imaging through Hybrid Anomaly Detection

Jakub Binda, Valentina Paneta, Vasileios Eleftheriadis, Hongkyou Chung, Panagiotis Papadimitroulas, Neo Christopher Chung

arxiv logopreprintAug 11 2025
Generative AI holds great potentials to automate and enhance data synthesis in nuclear medicine. However, the high-stakes nature of biomedical imaging necessitates robust mechanisms to detect and manage unexpected or erroneous model behavior. We introduce development and implementation of a hybrid anomaly detection framework to safeguard GenAI models in BIOEMTECH's eyes(TM) systems. Two applications are demonstrated: Pose2Xray, which generates synthetic X-rays from photographic mouse images, and DosimetrEYE, which estimates 3D radiation dose maps from 2D SPECT/CT scans. In both cases, our outlier detection (OD) enhances reliability, reduces manual oversight, and supports real-time quality control. This approach strengthens the industrial viability of GenAI in preclinical settings by increasing robustness, scalability, and regulatory compliance.

Perceptual Evaluation of GANs and Diffusion Models for Generating X-rays

Gregory Schuit, Denis Parra, Cecilia Besa

arxiv logopreprintAug 10 2025
Generative image models have achieved remarkable progress in both natural and medical imaging. In the medical context, these techniques offer a potential solution to data scarcity-especially for low-prevalence anomalies that impair the performance of AI-driven diagnostic and segmentation tools. However, questions remain regarding the fidelity and clinical utility of synthetic images, since poor generation quality can undermine model generalizability and trust. In this study, we evaluate the effectiveness of state-of-the-art generative models-Generative Adversarial Networks (GANs) and Diffusion Models (DMs)-for synthesizing chest X-rays conditioned on four abnormalities: Atelectasis (AT), Lung Opacity (LO), Pleural Effusion (PE), and Enlarged Cardiac Silhouette (ECS). Using a benchmark composed of real images from the MIMIC-CXR dataset and synthetic images from both GANs and DMs, we conducted a reader study with three radiologists of varied experience. Participants were asked to distinguish real from synthetic images and assess the consistency between visual features and the target abnormality. Our results show that while DMs generate more visually realistic images overall, GANs can report better accuracy for specific conditions, such as absence of ECS. We further identify visual cues radiologists use to detect synthetic images, offering insights into the perceptual gaps in current models. These findings underscore the complementary strengths of GANs and DMs and point to the need for further refinement to ensure generative models can reliably augment training datasets for AI diagnostic systems.

Large-scale Multi-sequence Pretraining for Generalizable MRI Analysis in Versatile Clinical Applications

Zelin Qiu, Xi Wang, Zhuoyao Xie, Juan Zhou, Yu Wang, Lingjie Yang, Xinrui Jiang, Juyoung Bae, Moo Hyun Son, Qiang Ye, Dexuan Chen, Rui Zhang, Tao Li, Neeraj Ramesh Mahboobani, Varut Vardhanabhuti, Xiaohui Duan, Yinghua Zhao, Hao Chen

arxiv logopreprintAug 10 2025
Multi-sequence Magnetic Resonance Imaging (MRI) offers remarkable versatility, enabling the distinct visualization of different tissue types. Nevertheless, the inherent heterogeneity among MRI sequences poses significant challenges to the generalization capability of deep learning models. These challenges undermine model performance when faced with varying acquisition parameters, thereby severely restricting their clinical utility. In this study, we present PRISM, a foundation model PRe-trained with large-scale multI-Sequence MRI. We collected a total of 64 datasets from both public and private sources, encompassing a wide range of whole-body anatomical structures, with scans spanning diverse MRI sequences. Among them, 336,476 volumetric MRI scans from 34 datasets (8 public and 26 private) were curated to construct the largest multi-organ multi-sequence MRI pretraining corpus to date. We propose a novel pretraining paradigm that disentangles anatomically invariant features from sequence-specific variations in MRI, while preserving high-level semantic representations. We established a benchmark comprising 44 downstream tasks, including disease diagnosis, image segmentation, registration, progression prediction, and report generation. These tasks were evaluated on 32 public datasets and 5 private cohorts. PRISM consistently outperformed both non-pretrained models and existing foundation models, achieving first-rank results in 39 out of 44 downstream benchmarks with statistical significance improvements. These results underscore its ability to learn robust and generalizable representations across unseen data acquired under diverse MRI protocols. PRISM provides a scalable framework for multi-sequence MRI analysis, thereby enhancing the translational potential of AI in radiology. It delivers consistent performance across diverse imaging protocols, reinforcing its clinical applicability.

Fusion-Based Brain Tumor Classification Using Deep Learning and Explainable AI, and Rule-Based Reasoning

Melika Filvantorkaman, Mohsen Piri, Maral Filvan Torkaman, Ashkan Zabihi, Hamidreza Moradi

arxiv logopreprintAug 9 2025
Accurate and interpretable classification of brain tumors from magnetic resonance imaging (MRI) is critical for effective diagnosis and treatment planning. This study presents an ensemble-based deep learning framework that combines MobileNetV2 and DenseNet121 convolutional neural networks (CNNs) using a soft voting strategy to classify three common brain tumor types: glioma, meningioma, and pituitary adenoma. The models were trained and evaluated on the Figshare dataset using a stratified 5-fold cross-validation protocol. To enhance transparency and clinical trust, the framework integrates an Explainable AI (XAI) module employing Grad-CAM++ for class-specific saliency visualization, alongside a symbolic Clinical Decision Rule Overlay (CDRO) that maps predictions to established radiological heuristics. The ensemble classifier achieved superior performance compared to individual CNNs, with an accuracy of 91.7%, precision of 91.9%, recall of 91.7%, and F1-score of 91.6%. Grad-CAM++ visualizations revealed strong spatial alignment between model attention and expert-annotated tumor regions, supported by Dice coefficients up to 0.88 and IoU scores up to 0.78. Clinical rule activation further validated model predictions in cases with distinct morphological features. A human-centered interpretability assessment involving five board-certified radiologists yielded high Likert-scale scores for both explanation usefulness (mean = 4.4) and heatmap-region correspondence (mean = 4.0), reinforcing the framework's clinical relevance. Overall, the proposed approach offers a robust, interpretable, and generalizable solution for automated brain tumor classification, advancing the integration of deep learning into clinical neurodiagnostics.

Trustworthy Medical Imaging with Large Language Models: A Study of Hallucinations Across Modalities

Anindya Bijoy Das, Shahnewaz Karim Sakib, Shibbir Ahmed

arxiv logopreprintAug 9 2025
Large Language Models (LLMs) are increasingly applied to medical imaging tasks, including image interpretation and synthetic image generation. However, these models often produce hallucinations, which are confident but incorrect outputs that can mislead clinical decisions. This study examines hallucinations in two directions: image to text, where LLMs generate reports from X-ray, CT, or MRI scans, and text to image, where models create medical images from clinical prompts. We analyze errors such as factual inconsistencies and anatomical inaccuracies, evaluating outputs using expert informed criteria across imaging modalities. Our findings reveal common patterns of hallucination in both interpretive and generative tasks, with implications for clinical reliability. We also discuss factors contributing to these failures, including model architecture and training data. By systematically studying both image understanding and generation, this work provides insights into improving the safety and trustworthiness of LLM driven medical imaging systems.

Spatio-Temporal Conditional Diffusion Models for Forecasting Future Multiple Sclerosis Lesion Masks Conditioned on Treatments

Gian Mario Favero, Ge Ya Luo, Nima Fathi, Justin Szeto, Douglas L. Arnold, Brennan Nichyporuk, Chris Pal, Tal Arbel

arxiv logopreprintAug 9 2025
Image-based personalized medicine has the potential to transform healthcare, particularly for diseases that exhibit heterogeneous progression such as Multiple Sclerosis (MS). In this work, we introduce the first treatment-aware spatio-temporal diffusion model that is able to generate future masks demonstrating lesion evolution in MS. Our voxel-space approach incorporates multi-modal patient data, including MRI and treatment information, to forecast new and enlarging T2 (NET2) lesion masks at a future time point. Extensive experiments on a multi-centre dataset of 2131 patient 3D MRIs from randomized clinical trials for relapsing-remitting MS demonstrate that our generative model is able to accurately predict NET2 lesion masks for patients across six different treatments. Moreover, we demonstrate our model has the potential for real-world clinical applications through downstream tasks such as future lesion count and location estimation, binary lesion activity classification, and generating counterfactual future NET2 masks for several treatments with different efficacies. This work highlights the potential of causal, image-based generative models as powerful tools for advancing data-driven prognostics in MS.

Text Embedded Swin-UMamba for DeepLesion Segmentation

Ruida Cheng, Tejas Sudharshan Mathai, Pritam Mukherjee, Benjamin Hou, Qingqing Zhu, Zhiyong Lu, Matthew McAuliffe, Ronald M. Summers

arxiv logopreprintAug 8 2025
Segmentation of lesions on CT enables automatic measurement for clinical assessment of chronic diseases (e.g., lymphoma). Integrating large language models (LLMs) into the lesion segmentation workflow offers the potential to combine imaging features with descriptions of lesion characteristics from the radiology reports. In this study, we investigate the feasibility of integrating text into the Swin-UMamba architecture for the task of lesion segmentation. The publicly available ULS23 DeepLesion dataset was used along with short-form descriptions of the findings from the reports. On the test dataset, a high Dice Score of 82% and low Hausdorff distance of 6.58 (pixels) was obtained for lesion segmentation. The proposed Text-Swin-UMamba model outperformed prior approaches: 37% improvement over the LLM-driven LanGuideMedSeg model (p < 0.001),and surpassed the purely image-based xLSTM-UNet and nnUNet models by 1.74% and 0.22%, respectively. The dataset and code can be accessed at https://github.com/ruida/LLM-Swin-UMamba
Page 1 of 18172 results
Show
per page

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.