Page 3 of 58571 results

Spatio-Temporal Conditional Diffusion Models for Forecasting Future Multiple Sclerosis Lesion Masks Conditioned on Treatments

Gian Mario Favero, Ge Ya Luo, Nima Fathi, Justin Szeto, Douglas L. Arnold, Brennan Nichyporuk, Chris Pal, Tal Arbel

arXiv preprint, Aug 9, 2025
Image-based personalized medicine has the potential to transform healthcare, particularly for diseases that exhibit heterogeneous progression such as Multiple Sclerosis (MS). In this work, we introduce the first treatment-aware spatio-temporal diffusion model that is able to generate future masks demonstrating lesion evolution in MS. Our voxel-space approach incorporates multi-modal patient data, including MRI and treatment information, to forecast new and enlarging T2 (NET2) lesion masks at a future time point. Extensive experiments on a multi-centre dataset of 2131 patient 3D MRIs from randomized clinical trials for relapsing-remitting MS demonstrate that our generative model is able to accurately predict NET2 lesion masks for patients across six different treatments. Moreover, we demonstrate our model has the potential for real-world clinical applications through downstream tasks such as future lesion count and location estimation, binary lesion activity classification, and generating counterfactual future NET2 masks for several treatments with different efficacies. This work highlights the potential of causal, image-based generative models as powerful tools for advancing data-driven prognostics in MS.
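The treatment-conditioning idea in the abstract above can be sketched in a few lines. This is an illustrative assumption, not the authors' implementation: one common way to condition a voxel-space model is to broadcast a one-hot treatment code over the spatial grid and concatenate it to the input volume as extra channels. Only the six-treatment count comes from the abstract; every shape and name below is hypothetical.

```python
import numpy as np

def treatment_conditioned_input(mri, treatment_id, n_treatments=6):
    """Hypothetical helper: mri is a (C, D, H, W) volume; returns a
    (C + n_treatments, D, H, W) array with the treatment code appended
    as constant spatial channels."""
    c, d, h, w = mri.shape
    onehot = np.zeros((n_treatments, d, h, w), dtype=mri.dtype)
    onehot[treatment_id] = 1.0  # broadcast the treatment code over the grid
    return np.concatenate([mri, onehot], axis=0)

# Toy 2-channel volume, conditioned on treatment index 3
x = treatment_conditioned_input(np.random.rand(2, 4, 8, 8).astype(np.float32),
                                treatment_id=3)
print(x.shape)  # (8, 4, 8, 8)
```

A denoising network would then consume this stacked tensor, letting the same weights produce treatment-specific predictions.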

Trustworthy Medical Imaging with Large Language Models: A Study of Hallucinations Across Modalities

Anindya Bijoy Das, Shahnewaz Karim Sakib, Shibbir Ahmed

arXiv preprint, Aug 9, 2025
Large Language Models (LLMs) are increasingly applied to medical imaging tasks, including image interpretation and synthetic image generation. However, these models often produce hallucinations, which are confident but incorrect outputs that can mislead clinical decisions. This study examines hallucinations in two directions: image to text, where LLMs generate reports from X-ray, CT, or MRI scans, and text to image, where models create medical images from clinical prompts. We analyze errors such as factual inconsistencies and anatomical inaccuracies, evaluating outputs using expert informed criteria across imaging modalities. Our findings reveal common patterns of hallucination in both interpretive and generative tasks, with implications for clinical reliability. We also discuss factors contributing to these failures, including model architecture and training data. By systematically studying both image understanding and generation, this work provides insights into improving the safety and trustworthiness of LLM driven medical imaging systems.

Fusion-Based Brain Tumor Classification Using Deep Learning, Explainable AI, and Rule-Based Reasoning

Melika Filvantorkaman, Mohsen Piri, Maral Filvan Torkaman, Ashkan Zabihi, Hamidreza Moradi

arXiv preprint, Aug 9, 2025
Accurate and interpretable classification of brain tumors from magnetic resonance imaging (MRI) is critical for effective diagnosis and treatment planning. This study presents an ensemble-based deep learning framework that combines MobileNetV2 and DenseNet121 convolutional neural networks (CNNs) using a soft voting strategy to classify three common brain tumor types: glioma, meningioma, and pituitary adenoma. The models were trained and evaluated on the Figshare dataset using a stratified 5-fold cross-validation protocol. To enhance transparency and clinical trust, the framework integrates an Explainable AI (XAI) module employing Grad-CAM++ for class-specific saliency visualization, alongside a symbolic Clinical Decision Rule Overlay (CDRO) that maps predictions to established radiological heuristics. The ensemble classifier achieved superior performance compared to individual CNNs, with an accuracy of 91.7%, precision of 91.9%, recall of 91.7%, and F1-score of 91.6%. Grad-CAM++ visualizations revealed strong spatial alignment between model attention and expert-annotated tumor regions, supported by Dice coefficients up to 0.88 and IoU scores up to 0.78. Clinical rule activation further validated model predictions in cases with distinct morphological features. A human-centered interpretability assessment involving five board-certified radiologists yielded high Likert-scale scores for both explanation usefulness (mean = 4.4) and heatmap-region correspondence (mean = 4.0), reinforcing the framework's clinical relevance. Overall, the proposed approach offers a robust, interpretable, and generalizable solution for automated brain tumor classification, advancing the integration of deep learning into clinical neurodiagnostics.
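The soft-voting step described above is straightforward to sketch. This is a minimal illustration with toy probabilities, not the paper's code: average the two CNNs' per-class softmax outputs and take the argmax.

```python
import numpy as np

def soft_vote(probs_a, probs_b, weights=(0.5, 0.5)):
    """probs_*: (n_samples, n_classes) softmax outputs of the two models;
    returns the predicted class index per sample."""
    avg = weights[0] * probs_a + weights[1] * probs_b
    return avg.argmax(axis=1)

# Toy outputs for 2 samples over 3 classes (glioma, meningioma, pituitary)
p_mobilenet = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p_densenet  = np.array([[0.4, 0.4, 0.2], [0.1, 0.3, 0.6]])
print(soft_vote(p_mobilenet, p_densenet))  # [0 2]
```

Equal weights correspond to the plain soft-voting ensemble; unequal weights would favor whichever backbone validates better.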

Ultrasound-Based Machine Learning and SHapley Additive exPlanations Method Evaluating Risk of Gallbladder Cancer: A Bicentric and Validation Study.

Chen B, Zhong H, Lin J, Lyu G, Su S

PubMed paper, Aug 9, 2025
This study aims to construct and evaluate 8 machine learning models by integrating ultrasound imaging features, clinical characteristics, and serological features to assess the risk of gallbladder cancer (GBC) occurrence in patients. A retrospective analysis was conducted on ultrasound and clinical data of 300 suspected GBC patients who visited the Second Affiliated Hospital of Fujian Medical University from January 2020 to January 2024 and 69 patients who visited the Zhongshan Hospital Affiliated to Xiamen University from January 2024 to January 2025. Key relevant features were selected using Least Absolute Shrinkage and Selection Operator (LASSO) regression. Predictive models were constructed using XGBoost, logistic regression, support vector machine, k-nearest neighbors, random forest, decision tree, naive Bayes, and neural network, with the SHapley Additive exPlanations (SHAP) method employed to explain model interpretability. The LASSO regression demonstrated that gender, age, alkaline phosphatase (ALP), clarity of interface with liver, stratification of the gallbladder wall, intracapsular anechoic lesions, and intracapsular punctiform strong lesions were key features for GBC. The XGBoost model demonstrated an area under the receiver operating characteristic curve (AUC) of 0.934, 0.916, and 0.813 in the training, validation, and test sets, respectively. SHAP analysis ranked the factors by importance as clarity of interface with liver, stratification of the gallbladder wall, intracapsular anechoic lesions, intracapsular punctiform strong lesions, ALP, gender, and age. Personalized prediction explanations through SHAP values demonstrated the contribution of each feature to the final prediction, enhancing result interpretability. Furthermore, decision plots were generated to display the influence trajectory of each feature on model predictions, aiding in analyzing which features had the greatest impact on mispredictions and thereby facilitating further model optimization or feature adjustment. This study proposed a GBC machine learning model based on ultrasound, clinical, and serological characteristics, demonstrating the superior performance of the XGBoost model and enhancing the interpretability of the model through the SHAP method.
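The LASSO screening step can be sketched as follows, a minimal illustration on synthetic data assuming scikit-learn. The feature names mirror the abstract, but the data, the regularization strength `alpha`, and the selection threshold are ours:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic cohort: only the 1st and 3rd features actually drive the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))
y = (X[:, 0] + 0.8 * X[:, 2] + 0.1 * rng.normal(size=200) > 0).astype(float)

names = ["gender", "age", "ALP", "liver_interface", "wall_stratification",
         "anechoic_lesion", "punctiform_lesion"]

# Standardize, then keep features whose LASSO coefficient survives shrinkage.
Xs = StandardScaler().fit_transform(X)
coef = Lasso(alpha=0.05).fit(Xs, y).coef_
selected = [n for n, c in zip(names, coef) if abs(c) > 1e-6]
print(selected)
```

The surviving feature subset would then feed the eight downstream classifiers.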

Prediction of Early Recurrence After Bronchial Arterial Chemoembolization in Non-small Cell Lung Cancer Patients Using Dual-energy CT: An Interpretable Model Based on SHAP Methodology.

Feng Y, Xu Y, Wang J, Cao Z, Liu B, Du Z, Zhou L, Hua H, Wang W, Mei J, Lai L, Tu J

pubmed logopapersAug 9 2025
Bronchial artery chemoembolization (BACE) is a new treatment method for lung cancer. This study aimed to investigate the ability of dual-energy computed tomography (DECT) to predict early recurrence (ER) after BACE among patients with non-small cell lung cancer (NSCLC) who failed first-line therapy. Clinical and imaging data from NSCLC patients undergoing BACE at Wenzhou Medical University Affiliated Fifth *** Hospital (10/2023-06/2024) were retrospectively analyzed. Logistic regression (LR) machine learning models were developed using 5 arterial-phase (AP) virtual monoenergetic images (VMIs; 40, 70, 100, 120, and 150 keV), while deep learning models utilized ResNet50/101/152 architectures with iodine maps. A combined model integrating the optimal Rad-score, DL-score, and clinical features was established. Model performance was assessed via area under the receiver operating characteristic curve (AUC) analysis, with the SHapley Additive exPlanations (SHAP) framework applied for interpretability. A total of 196 patients were enrolled in this study (training cohort: n=158; testing cohort: n=38). The 100 keV machine learning model demonstrated superior performance (AUC=0.751) compared to the other VMIs. The deep learning model based on the ResNet101 method (AUC=0.791) performed better than the other approaches. The hybrid model combining Rad-score-100keV-A, Rad-score-100keV-V, DL-score-ResNet101-A, DL-score-ResNet101-V, and clinical features exhibited the best performance (AUC=0.798) among all models. DECT holds promise for predicting ER after BACE among NSCLC patients who have failed first-line therapy, offering valuable guidance for clinical treatment planning.
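The hybrid model's final stage, combining per-patient scores with a logistic regression and reporting AUC, can be sketched like this. The stand-in scores are synthetic and the stacking design is our assumption, not the study's data or code (assuming scikit-learn):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 160
y = rng.integers(0, 2, n)                 # early recurrence label (synthetic)
rad  = y + rng.normal(scale=1.0, size=n)  # stand-in radiomics score
dl   = y + rng.normal(scale=0.8, size=n)  # stand-in deep-learning score
clin = rng.normal(size=n)                 # uninformative clinical feature
X = np.column_stack([rad, dl, clin])

# Stack the scores and fit a simple logistic-regression combiner.
clf = LogisticRegression().fit(X, y)
auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])
print(round(auc, 3))
```

In practice the combiner would be fit on the training cohort and the AUC reported on the held-out testing cohort.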

Enhancing B-mode-based breast cancer diagnosis via cross-attention fusion of H-scan and Nakagami imaging with multi-CAM-QUS-driven XAI.

Mondol SS, Hasan MK

PubMed paper, Aug 8, 2025
B-mode ultrasound is widely employed for breast lesion diagnosis due to its affordability, widespread availability, and effectiveness, particularly in cases of dense breast tissue where mammography may be less sensitive. However, it disregards critical tissue information embedded in raw radiofrequency (RF) data. While both modalities have demonstrated promise in Computer-Aided Diagnosis (CAD), their combined potential remains largely unexplored.
Approach. This paper presents an automated breast lesion classification network that utilizes H-scan and Nakagami parametric images derived from RF ultrasound signals, combined with machine-generated B-mode images, seamlessly integrated through a Multi-Modal Cross-Attention Fusion (MM-CAF) mechanism to extract complementary information. The proposed architecture also incorporates an attention-guided modified InceptionV3 for feature extraction, a Knowledge-Guided Cross-Modality Learning (KGCML) module for inter-modal knowledge sharing, and Attention-Driven Context Enhancement (ADCE) modules to improve contextual understanding and fusion with the classification network. The network employs categorical cross-entropy loss, a Multi-CAM-based loss to guide learning toward accurate lesion-specific features, and a Multi-QUS-based loss to embed clinically meaningful domain knowledge and effectively distinguish between benign and malignant lesions, all while supporting explainable AI (XAI) principles.
Main results. Experiments conducted on multi-center breast ultrasound datasets (BUET-BUSD, ATL, and OASBUD), characterized by demographic diversity, demonstrate the effectiveness of the proposed approach, achieving classification accuracies of 92.54%, 89.93%, and 90.0%, respectively, along with high interpretability and trustworthiness. These results surpass those of existing methods based on B-mode and/or RF data, highlighting the superior performance and robustness of the proposed technique. By integrating complementary RF-derived information with B-mode imaging, pseudo-segmentation, and domain-informed loss functions, our method significantly boosts lesion classification accuracy, enabling fully automated, explainable CAD and paving the way for widespread clinical adoption of AI-driven breast screening.
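Cross-attention fusion of two modalities' feature tokens, the general mechanism behind a module like MM-CAF, can be sketched in NumPy. The token counts, dimensions, and single-head form are our simplifications for illustration, not the paper's architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats, Wq, Wk, Wv):
    """Queries from one modality attend to keys/values from another."""
    q = q_feats @ Wq                                 # (Tq, d)
    k = kv_feats @ Wk                                # (Tk, d)
    v = kv_feats @ Wv                                # (Tk, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (Tq, Tk) weights
    return attn @ v                                  # (Tq, d) fused features

rng = np.random.default_rng(0)
d = 8
bmode_tokens = rng.normal(size=(5, d))   # e.g. B-mode feature tokens
hscan_tokens = rng.normal(size=(7, d))   # e.g. H-scan/Nakagami tokens
out = cross_attention(bmode_tokens, hscan_tokens,
                      *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)  # (5, 8)
```

Each B-mode token is thus re-expressed as a weighted mixture of the RF-derived tokens, which is how complementary information from the second modality enters the classifier.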

Text Embedded Swin-UMamba for DeepLesion Segmentation

Ruida Cheng, Tejas Sudharshan Mathai, Pritam Mukherjee, Benjamin Hou, Qingqing Zhu, Zhiyong Lu, Matthew McAuliffe, Ronald M. Summers

arXiv preprint, Aug 8, 2025
Segmentation of lesions on CT enables automatic measurement for clinical assessment of chronic diseases (e.g., lymphoma). Integrating large language models (LLMs) into the lesion segmentation workflow offers the potential to combine imaging features with descriptions of lesion characteristics from the radiology reports. In this study, we investigate the feasibility of integrating text into the Swin-UMamba architecture for the task of lesion segmentation. The publicly available ULS23 DeepLesion dataset was used along with short-form descriptions of the findings from the reports. On the test dataset, a high Dice score of 82% and a low Hausdorff distance of 6.58 pixels were obtained for lesion segmentation. The proposed Text-Swin-UMamba model outperformed prior approaches: it achieved a 37% improvement over the LLM-driven LanGuideMedSeg model (p < 0.001) and surpassed the purely image-based xLSTM-UNet and nnUNet models by 1.74% and 0.22%, respectively. The dataset and code can be accessed at https://github.com/ruida/LLM-Swin-UMamba
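The two reported metrics, Dice score and Hausdorff distance in pixels, can be computed on binary masks as follows (toy masks, not DeepLesion data):

```python
import numpy as np

def dice(a, b):
    """Dice overlap between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def hausdorff(a, b):
    """Symmetric Hausdorff distance between foreground pixel sets, in pixels.
    Brute-force pairwise distances; fine for small toy masks."""
    pa, pb = np.argwhere(a), np.argwhere(b)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Two overlapping 6x6 squares, offset by one pixel in each direction
pred = np.zeros((16, 16), dtype=bool); pred[4:10, 4:10] = True
gt   = np.zeros((16, 16), dtype=bool); gt[5:11, 5:11] = True
print(round(dice(pred, gt), 3), round(hausdorff(pred, gt), 3))  # 0.694 1.414
```

Production pipelines typically use `scipy.spatial.distance.directed_hausdorff` or a 95th-percentile variant, but the definition is the one above.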

GPT-4 vs. Radiologists: who advances mediastinal tumor classification better across report quality levels? A cohort study.

Wen R, Li X, Chen K, Sun M, Zhu C, Xu P, Chen F, Ji C, Mi P, Li X, Deng X, Yang Q, Song W, Shang Y, Huang S, Zhou M, Wang J, Zhou C, Chen W, Liu C

PubMed paper, Aug 8, 2025
Accurate mediastinal tumor classification is crucial for treatment planning, but diagnostic performance varies with radiologists' experience and report quality. To evaluate GPT-4's diagnostic accuracy in classifying mediastinal tumors from radiological reports compared to radiologists of different experience levels, using radiological reports of varying quality. We conducted a retrospective study of 1,494 patients from five tertiary hospitals with mediastinal tumors diagnosed via chest CT and pathology. Radiological reports were categorized into low-, medium-, and high-quality based on predefined criteria assessed by experienced radiologists. Six radiologists (two residents, two attending radiologists, and two associate senior radiologists) and GPT-4 evaluated the chest CT reports. Diagnostic performance was analyzed overall, by report quality, and by tumor type using Wald χ2 tests, with 95% CIs calculated via the Wilson method. GPT-4 achieved an overall diagnostic accuracy of 73.3% (95% CI: 71.0-75.5), comparable to associate senior radiologists (74.3%, 95% CI: 72.0-76.5; p > 0.05). For low-quality reports, GPT-4 outperformed associate senior radiologists (60.8% vs. 51.1%, p < 0.001). In high-quality reports, GPT-4 was comparable to attending radiologists (80.6% vs. 79.4%, p > 0.05). Diagnostic performance varied by tumor type: GPT-4 was comparable to radiology residents for neurogenic tumors (44.9% vs. 50.3%, p > 0.05), similar to associate senior radiologists for teratomas (68.1% vs. 65.9%, p > 0.05), and superior in diagnosing lymphoma (75.4% vs. 60.4%, p < 0.001). GPT-4 demonstrated interpretation accuracy comparable to associate senior radiologists, excelling in low-quality reports and outperforming them in diagnosing lymphoma. These findings underscore GPT-4's potential to enhance diagnostic performance in challenging diagnostic scenarios.
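The Wilson method used for the 95% CIs above is easy to reproduce. Assuming 1095/1494 correct classifications (our back-calculation, which matches the reported 73.3%), it recovers the stated interval:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

lo, hi = wilson_ci(1095, 1494)
print(f"{lo:.3f}-{hi:.3f}")  # prints "0.710-0.755"
```

Unlike the normal-approximation (Wald) interval, the Wilson interval stays inside [0, 1] and behaves well for proportions near the boundaries, which is why it is preferred for accuracy figures like these.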

GPT-4 for automated sequence-level determination of MRI protocols based on radiology request forms from clinical routine.

Terzis R, Kaya K, Schömig T, Janssen JP, Iuga AI, Kottlors J, Lennartz S, Gietzen C, Gözdas C, Müller L, Hahnfeldt R, Maintz D, Dratsch T, Pennig L

PubMed paper, Aug 8, 2025
This study evaluated GPT-4's accuracy in MRI sequence selection based on radiology request forms (RRFs), comparing its performance to that of radiology residents. This retrospective study included 100 RRFs across four subspecialties (cardiac imaging, neuroradiology, musculoskeletal, and oncology). GPT-4 and two radiology residents (R1: 2 years, R2: 5 years of MRI experience) selected sequences based on each patient's medical history and clinical questions. Considering imaging society guidelines, five board-certified specialized radiologists assessed protocols in consensus for completeness, quality, and utility, using 5-point Likert scales. Clinical applicability was rated binarily by the institution's lead radiographer. GPT-4 achieved median scores of 3 (1-5) for completeness, 4 (1-5) for quality, and 4 (1-5) for utility, comparable to R1 (3 (1-5), 4 (1-5), 4 (1-5); each p > 0.05) but inferior to R2 (4 (1-5), 5 (1-5); p < 0.01, respectively, and 5 (1-5); p < 0.001). Subspecialty protocol quality varied: GPT-4 matched R1 (4 (2-4) vs. 4 (2-5), p = 0.20) and R2 (4 (2-5); p = 0.47) in cardiac imaging; showed no differences in neuroradiology (all 5 (1-5), p > 0.05); scored lower than R1 and R2 in musculoskeletal imaging (3 (2-5) vs. 4 (3-5); p < 0.01, and 5 (3-5); p < 0.001); and matched R1 (4 (1-5) vs. 2 (1-4), p = 0.12) as well as R2 (5 (2-5); p = 0.20) in oncology. GPT-4-based protocols were clinically applicable in 95% of cases, comparable to R1 (95%) and R2 (96%). GPT-4 generated MRI protocols with notable completeness, quality, utility, and clinical applicability, excelling in standardized subspecialties such as cardiac imaging and neuroradiology while yielding lower accuracy in musculoskeletal examinations. Question: Long MRI acquisition times limit patient access, making accurate protocol selection crucial for efficient diagnostics, yet protocol selection is time-consuming and error-prone, especially for inexperienced residents.
Findings: GPT-4 generated MRI protocols of remarkable yet inconsistent quality, performing on par with an experienced resident in standardized fields but only moderately in musculoskeletal examinations. Clinical relevance: The large language model can assist less experienced radiologists in determining detailed MRI protocols and counteract increasing workloads. The model could function as a semi-automatic tool, generating MRI protocols for radiologists' confirmation, optimizing resource allocation, and improving diagnostics and cost-effectiveness.

Can Diffusion Models Bridge the Domain Gap in Cardiac MR Imaging?

Xin Ci Wong, Duygu Sarikaya, Kieran Zucker, Marc De Kamps, Nishant Ravikumar

arXiv preprint, Aug 8, 2025
Magnetic resonance (MR) imaging, including cardiac MR, is prone to domain shift due to variations in imaging devices and acquisition protocols. This challenge limits the deployment of trained AI models in real-world scenarios, where performance degrades on unseen domains. Traditional solutions involve increasing the size of the dataset through ad-hoc image augmentation or additional online training/transfer learning, which have several limitations. Synthetic data offers a promising alternative, but anatomical/structural consistency constraints limit the effectiveness of generative models in creating image-label pairs. To address this, we propose a diffusion model (DM) trained on a source domain that generates synthetic cardiac MR images resembling a given reference. The synthetic data maintains spatial and structural fidelity, ensuring similarity to the source domain and compatibility with the segmentation mask. We assess the utility of our generative approach in multi-centre cardiac MR segmentation, using the 2D nnU-Net, 3D nnU-Net, and vanilla U-Net segmentation networks. We explore domain generalisation, where domain-invariant segmentation models are trained on synthetic source-domain data, and domain adaptation, where we shift target-domain data towards the source domain using the DM. Both strategies significantly improved segmentation performance on data from an unseen target domain in terms of surface-based metrics (Welch's t-test, p < 0.01), compared to training segmentation models on real data alone. The proposed method reduces the need for transfer learning or online training to address domain-shift challenges in cardiac MR image analysis, which is especially useful in data-scarce settings.
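Welch's t-test, used for the surface-metric comparison above, does not assume equal variances between the two groups. A minimal implementation on synthetic surface-distance scores (the numbers are ours, not the paper's):

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    df = se2**2 / ((va / na)**2 / (na - 1) + (vb / nb)**2 / (nb - 1))
    return t, df

baseline = [2.1, 2.4, 2.0, 2.6, 2.3, 2.5]   # e.g. surface distance, real data only
with_dm  = [1.6, 1.5, 1.8, 1.4, 1.7, 1.6]   # e.g. with DM-generated synthetic data
t, df = welch_t(baseline, with_dm)
print(round(t, 2), round(df, 1))
```

The p-value then comes from the t distribution with the (generally non-integer) Welch degrees of freedom, e.g. via `scipy.stats.t.sf`.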