Latest Papers on Radiology AI. Tags: GenAI

The HeartMagic prospective observational study protocol - characterizing subtypes of heart failure with preserved ejection fraction

Meyer, P., Rocca, A., Banus, J., Ogier, A. C., Georgantas, C., Calarnou, P., Fatima, A., Vallee, J.-P., Deux, J.-F., Thomas, A., Marquis, J., Monney, P., Lu, H., Ledoux, J.-B., Tillier, C., Crowe, L. A., Abdurashidova, T., Richiardi, J., Hullin, R., van Heeswijk, R. B.

•preprint•Sep 16 2025

Introduction: Heart failure (HF) is a life-threatening syndrome with significant morbidity and mortality. While evidence-based drug treatments have effectively reduced morbidity and mortality in HF with reduced ejection fraction (HFrEF), few therapies have been demonstrated to improve outcomes in HF with preserved ejection fraction (HFpEF). The multifaceted clinical presentation is one of the main reasons why the current understanding of HFpEF remains limited. This may be caused by the existence of several HFpEF disease subtypes that each need different treatments. There is therefore an unmet need for a holistic approach that combines comprehensive imaging with metabolomic, transcriptomic and genomic mapping to subtype HFpEF patients. This protocol details the approach employed in the HeartMagic study to address this gap in understanding. Methods: This prospective multi-center observational cohort study will include 500 consecutive patients with current or recent hospitalization for treatment of HFpEF at two Swiss university hospitals, along with 50 age-matched HFrEF patients and 50 age-matched healthy controls. Diagnosis of heart failure is based on clinical signs and symptoms, and subgrouping of HF patients is based on the left-ventricular ejection fraction. In addition to routine clinical workup, participants undergo genomic, transcriptomic, and metabolomic analyses, while the anatomy, composition, and function of the heart are quantified by comprehensive echocardiography and magnetic resonance imaging (MRI). Quantitative MRI is also applied to characterize the kidney. The primary outcome is a composite of one-year cardiovascular mortality or rehospitalization. Machine learning (ML) based multi-modal clustering will be employed to identify distinct HFpEF subtypes in the holistic data. The clinical importance of these subtypes will be evaluated based on their association with the primary outcome. Statistical analysis will include group comparisons across modalities, survival analysis for the primary outcome, and integrative multi-modal clustering combining clinical, imaging, ECG, genomic, transcriptomic, and metabolomic data to identify and validate HFpEF subtypes. Discussion: The integration of comprehensive MRI with extensive genomic and metabolomic profiling in this study will result in an unprecedented panoramic view of HFpEF and should enable us to distinguish functional subgroups of HFpEF patients. This approach has the potential to provide unprecedented insights into HFpEF and should provide a basis for personalized therapies. Beyond this, identifying HFpEF subtypes with specific molecular and structural characteristics could lead to new targeted pharmacological interventions, with the potential to improve patient outcomes.

MRI Classification Cardiac Prospective Clinical Pilot Academic Lab GenAI

Data Scaling Laws for Radiology Foundation Models

Maximilian Ilse, Harshita Sharma, Anton Schwaighofer, Sam Bond-Taylor, Fernando Pérez-García, Olesya Melnichenko, Anne-Marie G. Sykes, Kelly K. Horst, Ashish Khandelwal, Maxwell Reynolds, Maria T. Wetscherek, Noel C. F. Codella, Javier Alvarez-Valle, Korfiatis Panagiotis, Valentina Salvatelli

•preprint•Sep 16 2025

Foundation vision encoders such as CLIP and DINOv2, trained on web-scale data, exhibit strong transfer performance across tasks and datasets. However, medical imaging foundation models remain constrained by smaller datasets, limiting our understanding of how data scale and pretraining paradigms affect performance in this setting. In this work, we systematically study continual pretraining of two vision encoders, MedImageInsight (MI2) and RAD-DINO, representing the two major encoder paradigms CLIP and DINOv2, on up to 3.5M chest x-rays from a single institution, holding compute and evaluation protocols constant. We evaluate on classification (radiology findings, lines and tubes), segmentation (lines and tubes), and radiology report generation. While prior work has primarily focused on tasks related to radiology findings, we include lines and tubes tasks to counterbalance this bias and evaluate a model's ability to extract features that preserve continuity along elongated structures. Our experiments show that MI2 scales more effectively for finding-related tasks, while RAD-DINO is stronger on tube-related tasks. Surprisingly, continually pretraining MI2 with both reports and structured labels using UniCL improves performance, underscoring the value of structured supervision at scale. We further show that for some tasks, as few as 30k in-domain samples are sufficient to surpass open-weights foundation models. These results highlight the utility of center-specific continual pretraining, enabling medical institutions to derive significant performance gains by utilizing in-domain data.

X-Ray Classification Chest Methodology In Silico Academic Lab Benchmark SOTA GenAI

Prediction and Causality of functional MRI and synthetic signal using a Zero-Shot Time-Series Foundation Model

Alessandro Crimi, Andrea Brovelli

•preprint•Sep 15 2025

Time-series forecasting and causal discovery are central in neuroscience, as predicting brain activity and identifying causal relationships between neural populations and circuits can shed light on the mechanisms underlying cognition and disease. With the rise of foundation models, an open question is how they compare to traditional methods for brain signal forecasting and causality analysis, and whether they can be applied in a zero-shot setting. In this work, we evaluate a foundation model against classical methods for inferring directional interactions from spontaneous brain activity measured with functional magnetic resonance imaging (fMRI) in humans. Traditional approaches often rely on Wiener-Granger causality. We tested the forecasting ability of the foundation model in both zero-shot and fine-tuned settings, and assessed causality by comparing Granger-like estimates from the model with standard Granger causality. We validated the approach using synthetic time series generated from ground-truth causal models, including logistic map coupling and Ornstein-Uhlenbeck processes. The foundation model achieved competitive zero-shot forecasting of fMRI time series (mean absolute percentage error of 0.55 in controls and 0.27 in patients). Although standard Granger causality did not show clear quantitative differences between models, the foundation model provided a more precise detection of causal interactions. Overall, these findings suggest that foundation models offer versatility, strong zero-shot performance, and potential utility for forecasting and causal discovery in time-series data.

MRI Classification Neurological Methodology In Silico GenAI
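The forecasting result in the abstract above is reported as a mean absolute percentage error (MAPE). As a rough illustration of that metric only, the sketch below computes MAPE as a fraction, so that 0.27 corresponds to 27%. The function name `mape` and the toy series are assumptions made for this example; they are not taken from the paper, and the authors' exact computation may differ.

```python
def mape(actual, forecast):
    """Return the mean absolute percentage error as a fraction (0.27 means 27%)."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # The percentage error divides by the observed value.
        raise ValueError("MAPE is undefined when an observed value is zero")
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical toy series, not data from the study.
observed = [2.0, 4.0]
predicted = [1.0, 5.0]
error = mape(observed, predicted)  # (0.5 + 0.25) / 2
```

Because each term divides by the observed value, MAPE becomes unstable when observed values are close to zero, which is worth keeping in mind when comparing scores across differently scaled signals.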

Challenging the Status Quo Regarding the Benefit of Chest Radiographic Screening.

Yankelevitz DF, Yip R, Henschke CI

•papers•Sep 15 2025

Chest radiographic (CXR) screening is currently not recommended in the United States by any major guideline organization. Multiple randomized controlled trials done in the United States and also in Europe, with the largest being the Prostate, Lung, Colorectal and Ovarian (PLCO) trial, all failed to show a benefit and are used as evidence to support the current recommendation. Nevertheless, there is renewed interest in CXR screening, especially in low- and middle-resourced countries around the world. Reasons for this are multi-factorial, including the continued concern that those trials still may have missed a benefit, but perhaps more importantly, it is now established conclusively that finding smaller cancers is better than finding larger ones. This was the key finding in those large randomized controlled trials for CT screening. So, while CT finds cancers smaller than CXR, both clearly perform better than waiting for cancers to be larger and detected by symptom prompting. Without it being well understood that treating cancers found in the asymptomatic state by CXR is beneficial, there would also be no basis for treating them when found incidentally. In addition, advances in artificial intelligence are allowing for nodules to be found earlier and more reliably with CXR than in those prior studies, and in many countries around the world, TB screening is already taking place on a large scale. This presents a major opportunity for integration with lung screening programs.

X-Ray Detection Chest GenAI

Large language models in radiology workflows: An exploratory study of generative AI for non-visual tasks in the German healthcare system.

Steinhauser S, Welsch S

•papers•Sep 15 2025

Large language models (LLMs) are gaining attention for their potential to enhance radiology workflows by addressing challenges such as increasing workloads and staff shortages. However, limited knowledge among radiologists and concerns about their practical implementation and ethical implications present challenges. This study investigates radiologists' perspectives on the use of LLMs, exploring their potential benefits, challenges, and impact on workflows and professional roles. An exploratory, qualitative study was conducted using 12 semi-structured interviews with radiology experts. Data were analyzed to assess participants' awareness, attitudes, and perceived applications of LLMs in radiology. LLMs were identified as promising tools for reducing workloads by streamlining tasks like summarizing clinical histories and generating standardized reports, improving communication and efficiency. Participants expressed openness to LLM integration but noted concerns about their impact on human interaction, ethical standards, and liability. The role of radiologists is expected to evolve with LLM adoption, with a shift toward data stewardship and interprofessional collaboration. Barriers to implementation included limited awareness, regulatory constraints, and outdated infrastructure. The integration of LLMs is hindered by regulatory challenges, outdated infrastructure, and limited awareness among radiologists. Policymakers should establish clear, practical regulations to address liability and ethical concerns while ensuring compliance with privacy standards. Investments in modernizing clinical infrastructure and expanding training programs are critical to enable radiologists to effectively use these tools. By addressing these barriers, LLMs can enhance efficiency, reduce workloads, and improve patient care, while preserving the central role of radiologists in diagnostic and therapeutic processes.

LLM Radiology Report Retrospective Clinical Concept Academic Lab Policy GenAI Ethics

Evaluating the role of LLMs in supporting patient education during the informed consent process for routine radiology procedures.

Einspänner E, Schwab R, Hupfeld S, Thormann M, Fuchs E, Gawlitza M, Borggrefe J, Behme D

•papers•Sep 15 2025

This study evaluated three LLM chatbots (GPT-3.5-turbo, GPT-4-turbo, and GPT-4o) on their effectiveness in supporting patient education by answering common patient questions for CT, MRI, and DSA informed consent, assessing their accuracy and clarity. Two radiologists formulated 90 questions categorized as general, clinical, or technical. Each LLM answered every question five times. Radiologists then rated the responses for medical accuracy and clarity, while medical physicists assessed technical accuracy using a Likert scale. Semantic similarity was analyzed with SBERT and cosine similarity. Ratings improved with newer model versions. Linear mixed-effects models revealed that GPT-4 models were rated significantly higher than GPT-3.5 (p < 0.001) by both physicians and physicists. However, physicians' ratings for GPT-4 models showed a significant performance decrease for complex modalities like DSA and MRI (p < 0.01), a pattern not observed in physicists' ratings. SBERT analysis revealed high internal consistency across all models. Variability in ratings revealed that while models effectively handled general and technical questions, they struggled with contextually complex medical inquiries requiring personalized responses and nuanced understanding. Statistical analysis confirms that while newer models are superior, their performance is modality-dependent and perceived differently by clinical and technical experts. This study evaluates the potential of LLMs to enhance informed consent in radiology, highlighting strengths in general and technical questions while noting limitations with complex clinical inquiries, with performance varying significantly by model type and imaging modality.

Mixed Modality LLM Radiology Report Retrospective Clinical In Silico Academic Lab GenAI
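The abstract above measures how consistent repeated answers are by embedding them with SBERT and comparing them with cosine similarity. A minimal sketch of the cosine-similarity step follows, assuming the sentence embeddings have already been computed elsewhere. The helper names and the two-dimensional toy vectors are hypothetical and stand in for real SBERT embeddings; averaging over all answer pairs is one plausible way to summarize consistency, not necessarily the authors' procedure.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Hypothetical stand-ins for embeddings of three repeated answers.
toy_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
consistency = mean_pairwise_similarity(toy_embeddings)  # pairs score 1, 0, 0
```

A value near 1 would indicate that repeated answers to the same question are semantically close; lower values indicate that the answers drift between runs.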

Adapting and Evaluating Multimodal Large Language Models for Adolescent Idiopathic Scoliosis Self-Management: A Divide and Conquer Framework

Zhaolong Wu, Pu Luo, Jason Pui Yin Cheung, Teng Zhang

•preprint•Sep 15 2025

This study presents the first comprehensive evaluation of Multimodal Large Language Models (MLLMs) for Adolescent Idiopathic Scoliosis (AIS) self-management. We constructed a database of approximately 3,000 anteroposterior X-rays with diagnostic texts and evaluated five MLLMs through a 'Divide and Conquer' framework consisting of a visual question-answering task, a domain knowledge assessment task, and a patient education counseling assessment task. Our investigation revealed limitations in MLLMs' ability to interpret complex spinal radiographs and comprehend AIS care knowledge. To address these, we pioneered enhancing MLLMs with spinal keypoint prompting and compiled an AIS knowledge base for retrieval augmented generation (RAG), respectively. Results showed varying effectiveness of visual prompting across different architectures, while RAG substantially improved models' performance on the knowledge assessment task. Our findings indicate that current MLLMs are far from capable of serving as personalized assistants in AIS care. The greatest challenge lies in their ability to accurately detect spinal deformity locations (best accuracy: 0.55) and directions (best accuracy: 0.13).

X-Ray Detection Musculoskeletal Methodology In Silico Academic Lab GenAI Benchmark SOTA

PET-Computed Tomography in the Management of Sarcoma by Interventional Oncology.

Yazdanpanah F, Hunt SJ

•papers•Sep 13 2025

PET-computed tomography (CT) has become essential in sarcoma management, offering precise diagnosis, staging, and response assessment by combining metabolic and anatomic imaging. Its high accuracy in detecting primary, recurrent, and metastatic disease guides personalized treatment strategies and enhances interventional procedures like biopsies and ablations. Advances in novel radiotracers and hybrid imaging modalities further improve diagnostic specificity, especially in complex and pediatric cases. Integrating PET-CT with genomic data and artificial intelligence (AI)-driven tools promises to advance personalized medicine, enabling tailored therapies and better outcomes. As a cornerstone of multidisciplinary sarcoma care, PET-CT continues to transform diagnostic and therapeutic approaches in oncology.

Mixed Modality Classification Review Concept Academic Lab GenAI

Chat GPT-4 shows high agreement in MRI protocol selection compared to board-certified neuroradiologists.

Bendella Z, Wichtmann BD, Clauberg R, Keil VC, Lehnen NC, Haase R, Sáez LC, Wiest IC, Kather JN, Endler C, Radbruch A, Paech D, Deike K

•papers•Sep 13 2025

The aim of this study was to determine whether ChatGPT-4 can correctly suggest MRI protocols and additional MRI sequences based on real-world Radiology Request Forms (RRFs) as well as to investigate the ability of ChatGPT-4 to suggest time-saving protocols. Retrospectively, 1,001 RRFs of our Department of Neuroradiology (in-house dataset), 200 RRFs of an independent Department of General Radiology (independent dataset) and 300 RRFs from an external, foreign Department of Neuroradiology (external dataset) were included. Patients' age, sex, and clinical information were extracted from the RRFs and used to prompt ChatGPT-4 to choose an adequate MRI protocol from predefined institutional lists. Four independent raters then assessed its performance. Additionally, ChatGPT-4 was tasked with creating case-specific protocols aimed at saving time. Two and 7 of 1,001 protocol suggestions of ChatGPT-4 were rated "unacceptable" in the in-house dataset by readers 1 and 2, respectively. No protocol suggestions were rated "unacceptable" in either the independent or the external dataset. When assessing the inter-reader agreement, Cohen's weighted κ ranged from 0.88 to 0.98 (each p < 0.001). ChatGPT-4's freely composed protocols were approved in 766/1,001 (76.5%) and 140/300 (46.67%) cases of the in-house and external dataset, with mean time savings (standard deviation) of 3:51 (±2:40) and 2:59 (±3:42) minutes:seconds per adopted in-house and external MRI protocol, respectively. ChatGPT-4 demonstrated a very high agreement with board-certified (neuro-)radiologists in selecting MRI protocols and was able to suggest approved time-saving protocols from the set of available sequences.

MRI LLM Radiology Report Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA GenAI
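The inter-reader agreement in the abstract above is summarized with Cohen's weighted kappa. The sketch below shows one common form of that statistic for two raters using ordinal categories coded 0 to k-1. Linear disagreement weights are an assumption here, since the abstract does not state which weighting was used, and the function name and toy ratings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, num_categories):
    """Cohen's weighted kappa with linear weights for ordinal labels 0..k-1."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    k = num_categories
    # Contingency table of observed rating pairs.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Disagreement weight |i - j|; the 1/(k-1) scaling cancels in the ratio.
    observed_disagreement = sum(
        abs(i - j) * observed[i][j] for i in range(k) for j in range(k)
    ) / n
    expected_disagreement = sum(
        abs(i - j) * row_totals[i] * col_totals[j]
        for i in range(k) for j in range(k)
    ) / (n * n)
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical ratings on a 3-point scale, not data from the study.
reader_1 = [0, 1, 2, 1]
reader_2 = [0, 1, 2, 1]
kappa_identical = weighted_kappa(reader_1, reader_2, 3)
```

Kappa equals 1 when the two raters agree on every case and 0 when their agreement matches what chance alone would produce given each rater's category frequencies.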
