Privacy-Preserving Latent Diffusion-Based Synthetic Medical Image Generation.

Shi Y, Xia W, Niu C, Wiedeman C, Wang G

PubMed · Oct 6 2025
Deep learning methods have impacted almost every research field, demonstrating notable successes in medical imaging tasks such as denoising and super-resolution. However, deep learning requires data at scale, and sharing medical data is expensive and carries a risk of privacy leakage. Among cutting-edge generative AI models, diffusion models have become dominant because of their rigorous theoretical foundation and unprecedented results. Here we propose a latent diffusion approach for data synthesis without compromising patient privacy. In our exemplary case studies, we develop a latent diffusion model to generate medical CT, MRI, and PET images using publicly available datasets. We demonstrate that state-of-the-art deep learning-based denoising/super-resolution networks can be trained on our synthetic data to achieve image quality with no significant difference from what the same network achieves after being trained on the original data. In our advanced diffusion model, we specifically embed a safeguard mechanism to protect patient privacy effectively and efficiently. Our approach allows privacy-preserving public sharing of diverse big datasets for the development of deep models, potentially enabling federated learning at the level of input data instead of local network weights.
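The abstract describes the approach only at a high level. As a rough sketch of the latent-diffusion idea it names (denoise in a compressed latent space, then decode back to image space), the following toy DDPM-style sampling loop uses hypothetical stand-in networks, not the authors' model:

```python
# Minimal latent-diffusion sampling sketch. The "denoiser" and "decoder" are
# random stand-ins for a trained noise predictor and VAE decoder.
import torch

torch.manual_seed(0)
LATENT_DIM, T = 64, 1000

denoiser = torch.nn.Sequential(torch.nn.Linear(LATENT_DIM + 1, 128),
                               torch.nn.SiLU(),
                               torch.nn.Linear(128, LATENT_DIM))
decoder = torch.nn.Linear(LATENT_DIM, 256 * 256)

# Standard linear beta schedule and derived quantities.
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

with torch.no_grad():
    z = torch.randn(1, LATENT_DIM)                      # start from latent noise
    for t in reversed(range(T)):
        t_embed = torch.full((1, 1), t / T)
        eps = denoiser(torch.cat([z, t_embed], dim=1))  # predicted noise
        # DDPM posterior mean; noise is re-added except at the final step.
        z = (z - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            z = z + betas[t].sqrt() * torch.randn_like(z)
    synthetic_image = decoder(z).reshape(256, 256)      # decode latent to image
```

Because denoising runs in the latent space, per-step cost is independent of output resolution; the paper's privacy safeguard is an additional mechanism not sketched here.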

SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus

Ming Zhao, Wenhui Dong, Yang Zhang, Xiang Zheng, Zhonghao Zhang, Zian Zhou, Yunzhi Guan, Liukun Xu, Wei Peng, Zhaoyang Gong, Zhicheng Zhang, Dachuan Li, Xiaosheng Ma, Yuli Ma, Jianing Ni, Changjiang Jiang, Lixia Tian, Qixin Chen, Kaishun Xia, Pingping Liu, Tongshun Zhang, Zhiqiang Liu, Zhongan Bi, Chenyang Si, Tiansheng Sun, Caifeng Shan

arXiv preprint · Oct 3 2025
Spine disorders affect 619 million people globally and are a leading cause of disability, yet AI-assisted diagnosis remains limited by the lack of level-aware, multimodal datasets. Clinical decision-making for spine disorders requires sophisticated reasoning across X-ray, CT, and MRI at specific vertebral levels. However, progress has been constrained by the absence of traceable, clinically-grounded instruction data and standardized, spine-specific benchmarks. To address this, we introduce SpineMed, an ecosystem co-designed with practicing spine surgeons. It features SpineMed-450k, the first large-scale dataset explicitly designed for vertebral-level reasoning across imaging modalities with over 450,000 instruction instances, and SpineBench, a clinically-grounded evaluation framework. SpineMed-450k is curated from diverse sources, including textbooks, guidelines, open datasets, and ~1,000 de-identified hospital cases, using a clinician-in-the-loop pipeline with a two-stage LLM generation method (draft and revision) to ensure high-quality, traceable data for question-answering, multi-turn consultations, and report generation. SpineBench evaluates models on clinically salient axes, including level identification, pathology assessment, and surgical planning. Our comprehensive evaluation of several recently advanced large vision-language models (LVLMs) on SpineBench reveals systematic weaknesses in fine-grained, level-specific reasoning. In contrast, our model fine-tuned on SpineMed-450k demonstrates consistent and significant improvements across all tasks. Clinician assessments confirm the diagnostic clarity and practical utility of our model's outputs.
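The curation pipeline is described only in outline. A minimal sketch of the two-stage draft-and-revise generation step with a clinician-in-the-loop gate follows; `call_llm`, the prompts, and the approval check are placeholder assumptions, not the authors' pipeline:

```python
# Sketch of two-stage (draft, then revise) instruction generation with a
# clinician gate. `call_llm` stands in for any chat-completion client.
from typing import Callable, Optional

def generate_instruction(case_text: str,
                         call_llm: Callable[[str], str],
                         clinician_approves: Callable[[str], bool]) -> Optional[str]:
    # Stage 1: draft a vertebral-level-aware Q&A pair grounded in the case.
    draft = call_llm("Draft a vertebral-level-aware Q&A pair from this case. "
                     f"Quote the supporting passage verbatim.\n\n{case_text}")
    # Stage 2: revise for clinical accuracy and traceability.
    revised = call_llm("Revise this Q&A; delete any claim not supported by "
                       f"the quoted passage.\n\n{draft}")
    # Clinician-in-the-loop: only approved items enter the corpus.
    return revised if clinician_approves(revised) else None

# Toy usage with stubs standing in for the LLM and the reviewing surgeon.
stub_llm = lambda prompt: "Q: Which level shows stenosis? A: L4-L5 (per report)."
print(generate_instruction("MRI report: canal stenosis at L4-L5.", stub_llm,
                           lambda text: "per report" in text))
```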

SpurBreast: A Curated Dataset for Investigating Spurious Correlations in Real-world Breast MRI Classification

Jong Bum Won, Wesley De Neve, Joris Vankerschaver, Utku Ozbulak

arXiv preprint · Oct 2 2025
Deep neural networks (DNNs) have demonstrated remarkable success in medical imaging, yet their real-world deployment remains challenging due to spurious correlations, where models can learn non-clinical features instead of meaningful medical patterns. Existing medical imaging datasets are not designed to systematically study this issue, largely due to restrictive licensing and limited supplementary patient data. To address this gap, we introduce SpurBreast, a curated breast MRI dataset that intentionally incorporates spurious correlations to evaluate their impact on model performance. Analyzing over 100 features involving patient, device, and imaging protocol, we identify two dominant spurious signals: magnetic field strength (a global feature influencing the entire image) and image orientation (a local feature affecting spatial alignment). Through controlled dataset splits, we demonstrate that DNNs can exploit these non-clinical signals, achieving high validation accuracy while failing to generalize to unbiased test data. Alongside these two datasets containing spurious correlations, we also provide benchmark datasets without spurious correlations, allowing researchers to systematically investigate clinically relevant and irrelevant features, uncertainty estimation, adversarial robustness, and generalization strategies. Models and datasets are available at https://github.com/utkuozbulak/spurbreast.
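To make the split design concrete, here is a toy construction in which one of the spurious signals named above (field strength) tracks the label in train/validation but is uninformative at test time; all numbers and field names are illustrative:

```python
# Toy controlled split: a shortcut "classifier" reading only the spurious
# attribute looks excellent on the biased split and drops to chance on the
# decorrelated test split.
import random

random.seed(0)
records = [{"label": random.randint(0, 1),
            "field_strength": random.choice([1.5, 3.0])} for _ in range(1000)]

aligned = [r for r in records if (r["label"] == 1) == (r["field_strength"] == 3.0)]
unaligned = [r for r in records if (r["label"] == 1) != (r["field_strength"] == 3.0)]

train_val = aligned[:300]                  # spurious feature tracks the label
test = aligned[300:400] + unaligned[:100]  # spurious feature is uninformative

predict = lambda r: int(r["field_strength"] == 3.0)  # shortcut model
acc = lambda split: sum(predict(r) == r["label"] for r in split) / len(split)
print(f"biased train/val: {acc(train_val):.2f}, unbiased test: {acc(test):.2f}")
# ~1.00 on the biased split versus ~0.50 (chance) on the decorrelated one.
```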

MedQ-Bench: Evaluating and Exploring Medical Image Quality Assessment Abilities in MLLMs

Jiyao Liu, Jinjie Wei, Wanying Qu, Chenglong Ma, Junzhi Ning, Yunheng Li, Ying Chen, Xinzhe Luo, Pengcheng Chen, Xin Gao, Ming Hu, Huihui Xu, Xin Wang, Shujian Gao, Dingkang Yang, Zhongying Deng, Jin Ye, Lihao Liu, Junjun He, Ningsheng Xu

arXiv preprint · Oct 2 2025
Medical Image Quality Assessment (IQA) serves as the first-mile safety gate for clinical AI, yet existing approaches remain constrained by scalar, score-based metrics and fail to reflect the descriptive, human-like reasoning process central to expert evaluation. To address this gap, we introduce MedQ-Bench, a comprehensive benchmark that establishes a perception-reasoning paradigm for language-based evaluation of medical image quality with Multi-modal Large Language Models (MLLMs). MedQ-Bench defines two complementary tasks: (1) MedQ-Perception, which probes low-level perceptual capability via human-curated questions on fundamental visual attributes; and (2) MedQ-Reasoning, encompassing both no-reference and comparison reasoning tasks, aligning model evaluation with human-like reasoning on image quality. The benchmark spans five imaging modalities and over forty quality attributes, totaling 2,600 perceptual queries and 708 reasoning assessments, covering diverse image sources including authentic clinical acquisitions, images with simulated degradations via physics-based reconstructions, and AI-generated images. To evaluate reasoning ability, we propose a multi-dimensional judging protocol that assesses model outputs along four complementary axes. We further conduct rigorous human-AI alignment validation by comparing LLM-based judgments with those of radiologists. Our evaluation of 14 state-of-the-art MLLMs demonstrates that models exhibit preliminary but unstable perceptual and reasoning skills, with insufficient accuracy for reliable clinical use. These findings highlight the need for targeted optimization of MLLMs in medical IQA. We hope that MedQ-Bench will catalyze further exploration and unlock the untapped potential of MLLMs for medical image quality evaluation.
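The judging protocol is described only as "four complementary axes." A minimal sketch of multi-axis LLM-as-judge scoring follows; the axis names, the 0-2 scale, and `call_judge` are all placeholder assumptions:

```python
# Sketch of a multi-axis LLM-as-judge protocol for quality-reasoning outputs.
from typing import Callable, Dict

AXES = ("completeness", "preciseness", "consistency", "accuracy")

def judge_output(model_answer: str, reference: str,
                 call_judge: Callable[[str], int]) -> Dict[str, int]:
    scores = {}
    for axis in AXES:
        prompt = (f"Rate the {axis} of this image-quality assessment on a 0-2 "
                  f"scale against the reference.\nAnswer: {model_answer}\n"
                  f"Reference: {reference}\nReply with a single integer.")
        scores[axis] = max(0, min(2, call_judge(prompt)))  # clamp to the scale
    return scores

stub_judge = lambda prompt: 2  # stand-in for a real LLM call
print(judge_output("Mild motion artifact; diagnostic quality.",
                   "Motion artifact present; quality acceptable.", stub_judge))
```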

From 2D to 3D, Deep Learning-based Shape Reconstruction in Magnetic Resonance Imaging: A Review

Emma McMillian, Abhirup Banerjee, Alfonso Bueno-Orovio

arXiv preprint · Oct 1 2025
Deep learning-based 3-dimensional (3D) shape reconstruction from 2-dimensional (2D) magnetic resonance imaging (MRI) has become increasingly important in medical disease diagnosis, treatment planning, and computational modeling. This review surveys the methodological landscape of 3D MRI reconstruction, focusing on four primary approaches: point cloud, mesh-based, shape-aware, and volumetric models. For each category, we analyze the current state-of-the-art techniques, their methodological foundations, limitations, and applications across anatomical structures. We provide an extensive overview ranging from cardiac to neurological to lung imaging. We also focus on the clinical applicability of models to diseased anatomy, and the influence of their training and testing data. We examine publicly available datasets, computational demands, and evaluation metrics. Finally, we highlight emerging research directions, including multimodal integration and cross-modality frameworks. This review aims to provide researchers with a structured overview of current 3D reconstruction methodologies to identify opportunities for advancing deep learning towards more robust, generalizable, and clinically impactful solutions.

CardioBench: Do Echocardiography Foundation Models Generalize Beyond the Lab?

Darya Taratynova, Ahmed Aly, Numan Saeed, Mohammad Yaqub

arXiv preprint · Oct 1 2025
Foundation models (FMs) are reshaping medical imaging, yet their application in echocardiography remains limited. While several echocardiography-specific FMs have recently been introduced, no standardized benchmark exists to evaluate them. Echocardiography poses unique challenges, including noisy acquisitions, high frame redundancy, and limited public datasets. Most existing solutions evaluate on private data, restricting comparability. To address this, we introduce CardioBench, a comprehensive benchmark for echocardiography FMs. CardioBench unifies eight publicly available datasets into a standardized suite spanning four regression and five classification tasks, covering functional, structural, diagnostic, and view recognition endpoints. We evaluate several leading FMs, including cardiac-specific, biomedical, and general-purpose encoders, under consistent zero-shot, probing, and alignment protocols. Our results highlight complementary strengths across model families: temporal modeling is critical for functional regression, retrieval provides robustness under distribution shift, and domain-specific text encoders capture physiologically meaningful axes. General-purpose encoders transfer strongly and often close the gap with probing, but struggle with fine-grained distinctions like view classification and subtle pathology recognition. By releasing preprocessing, splits, and public evaluation pipelines, CardioBench establishes a reproducible reference point and offers actionable insights to guide the design of future echocardiography foundation models.
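As an illustration of the probing protocol mentioned above (frozen encoder, lightweight head), the sketch below uses a random stand-in encoder and toy labels rather than any actual echocardiography FM:

```python
# Linear probing sketch: embeddings come from a frozen "encoder"; only a
# logistic-regression head is fit on the downstream task.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
W = rng.normal(size=(2048, 256))            # frozen random "encoder" weights
encode = lambda x: np.tanh(x @ W)           # stand-in embedding function

clips = rng.normal(size=(500, 2048))        # pre-extracted clip features
labels = (clips[:, 0] > 0).astype(int)      # toy view-classification labels

emb = encode(clips)                         # encoder stays frozen throughout
probe = LogisticRegression(max_iter=1000).fit(emb[:400], labels[:400])
print("probe accuracy:", probe.score(emb[400:], labels[400:]))
```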

Automated Structured Radiology Report Generation with Rich Clinical Context

Seongjae Kang, Dong Bok Lee, Juho Jung, Dongseop Kim, Won Hwa Kim, Sunghoon Joo

arXiv preprint · Oct 1 2025
Automated structured radiology report generation (SRRG) from chest X-ray images offers significant potential to reduce the workload of radiologists by generating reports in structured formats that ensure clarity, consistency, and adherence to clinical reporting standards. While radiologists effectively utilize available clinical context in their diagnostic reasoning, existing SRRG systems overlook these essential elements. This fundamental gap leads to critical problems, including temporal hallucinations when referencing non-existent clinical contexts. To address these limitations, we propose contextualized SRRG (C-SRRG), which comprehensively incorporates rich clinical context for SRRG. We curate the C-SRRG dataset by integrating comprehensive clinical context encompassing 1) multi-view X-ray images, 2) clinical indication, 3) imaging techniques, and 4) prior studies with corresponding comparisons based on patient histories. Through extensive benchmarking with state-of-the-art multimodal large language models, we demonstrate that incorporating clinical context with the proposed C-SRRG significantly improves report generation quality. We publicly release the dataset, code, and checkpoints to facilitate future research on clinically-aligned automated radiology report generation at https://github.com/vuno/contextualized-srrg.
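A schematic of assembling the four context elements enumerated above into a single generation prompt is shown below; the field names are illustrative, not the released C-SRRG schema:

```python
# Sketch of packing multi-view images, indication, technique, and priors into
# one structured prompt, with an explicit guard against temporal hallucination.
from dataclasses import dataclass, field
from typing import List

@dataclass
class StudyContext:
    image_paths: List[str]                 # current multi-view X-rays
    indication: str                        # reason for the exam
    technique: str                         # imaging technique description
    priors: List[str] = field(default_factory=list)  # prior findings

def build_prompt(ctx: StudyContext) -> str:
    prior_block = "\n".join(f"- {p}" for p in ctx.priors) or "- none available"
    return (f"Views provided: {', '.join(ctx.image_paths)}\n"
            f"Indication: {ctx.indication}\n"
            f"Technique: {ctx.technique}\n"
            f"Prior studies:\n{prior_block}\n"
            "Generate a structured report; compare with priors only if they "
            "are listed above (never invent prior exams).")

print(build_prompt(StudyContext(["pa.png", "lateral.png"],
                                "Cough; rule out pneumonia",
                                "PA and lateral chest radiograph",
                                ["2024: no acute cardiopulmonary findings"])))
```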

Image-based reconstruction of anthropomorphic breast phantoms for synthetic mammogram generation.

Oria M, Ferrero R, Andreis C, Vicentini M, van Engen R, Roozemond C, Lamberti P, Remogna S, Manzin A

PubMed · Oct 1 2025
The aim of this work is to generate realistic synthetic mammograms, using digital anthropomorphic phantoms, reconstructed from sets of dedicated breast computed tomography (BCT) images from different patients, as input to the imaging acquisition simulation process. The voxel-based structure and the segmentation into fibroglandular, adipose, and skin tissues are obtained through trivariate tensor-product B-spline approximation and morphological operations. The resulting phantoms can be modified by geometrical transformations that replicate typical breast shape deformities, and by locally introducing virtual masses and calcifications. After simulating biomechanical compression of the 3D breast phantoms, we generate mammograms in both craniocaudal (CC) and mediolateral oblique (MLO) views, modelling the x-ray interaction with breast tissues via a Monte Carlo approach implemented in the in silico breast imaging pipeline VICTRE. The methodology proposed here can contribute to the creation of synthetic mammogram databases for in silico testing of diagnostic and therapeutic techniques, as well as for the validation of artificial intelligence (AI) systems in diagnostic imaging and cancer screening. The great advantage is that, from a single BCT scan, it is possible to generate multiple realistic mammograms with different anatomical features, in terms of breast shape and size, and type and location of lesions.
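As a toy illustration of the voxel-phantom editing step (tissue labels plus local insertion of a virtual mass), with made-up resolution and label codes rather than the authors' B-spline pipeline:

```python
# Toy voxel breast phantom: air/adipose/fibroglandular/skin labels, plus
# in-place insertion of a spherical mass, mirroring the editing step above.
import numpy as np

AIR, ADIPOSE, FIBROGLANDULAR, SKIN, MASS = 0, 1, 2, 3, 4
shape = (64, 64, 64)
phantom = np.full(shape, ADIPOSE, dtype=np.uint8)

zz, yy, xx = np.indices(shape)
r = np.sqrt((zz - 32) ** 2 + (yy - 32) ** 2 + (xx - 32) ** 2)
phantom[r > 30] = AIR                      # outside the breast volume
phantom[(r <= 30) & (r > 28)] = SKIN       # thin skin shell
phantom[r <= 15] = FIBROGLANDULAR          # central fibroglandular region

def insert_mass(vol, center, radius):
    """Overwrite voxels inside a sphere with the MASS label (in place)."""
    z, y, x = np.indices(vol.shape)
    d = np.sqrt((z - center[0]) ** 2 + (y - center[1]) ** 2 + (x - center[2]) ** 2)
    vol[d <= radius] = MASS

insert_mass(phantom, center=(32, 40, 30), radius=4)
print({lbl: int((phantom == lbl).sum()) for lbl in range(5)})  # voxel counts
```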

LMOD+: A Comprehensive Multimodal Dataset and Benchmark for Developing and Evaluating Multimodal Large Language Models in Ophthalmology

Zhenyue Qin, Yang Liu, Yu Yin, Jinyu Ding, Haoran Zhang, Anran Li, Dylan Campbell, Xuansheng Wu, Ke Zou, Tiarnan D. L. Keenan, Emily Y. Chew, Zhiyong Lu, Yih-Chung Tham, Ninghao Liu, Xiuzhen Zhang, Qingyu Chen

arXiv preprint · Sep 30 2025
Vision-threatening eye diseases pose a major global health burden, with timely diagnosis limited by workforce shortages and restricted access to specialized care. While multimodal large language models (MLLMs) show promise for medical image interpretation, advancing MLLMs for ophthalmology is hindered by the lack of comprehensive benchmark datasets suitable for evaluating generative models. We present a large-scale multimodal ophthalmology benchmark comprising 32,633 instances with multi-granular annotations across 12 common ophthalmic conditions and 5 imaging modalities. The dataset integrates imaging, anatomical structures, demographics, and free-text annotations, supporting anatomical structure recognition, disease screening, disease staging, and demographic prediction for bias evaluation. This work extends our preliminary LMOD benchmark with three major enhancements: (1) nearly 50% dataset expansion with substantial enlargement of color fundus photography; (2) broadened task coverage including binary disease diagnosis, multi-class diagnosis, severity classification with international grading standards, and demographic prediction; and (3) systematic evaluation of 24 state-of-the-art MLLMs. Our evaluations reveal both promise and limitations. Top-performing models achieved ~58% accuracy in disease screening under zero-shot settings, and performance remained suboptimal for challenging tasks like disease staging. We will publicly release the dataset, curation pipeline, and leaderboard to potentially advance ophthalmic AI applications and reduce the global burden of vision-threatening diseases.

Artificial Intelligence in Low-Dose Computed Tomography Screening of the Chest: Past, Present, and Future.

Yip R, Jirapatnakul A, Avila R, Gutierrez JG, Naghavi M, Yankelevitz DF, Henschke CI

PubMed · Sep 30 2025
The integration of artificial intelligence (AI) with low-dose computed tomography (LDCT) has the potential to transform lung cancer screening into a comprehensive approach to early detection of multiple diseases. Building on over 3 decades of research and global implementation by the International Early Lung Cancer Action Program (I-ELCAP), this paper reviews the development and clinical integration of AI for interpreting LDCT scans. We describe the historical milestones in AI-assisted lung nodule detection, emphysema quantification, and cardiovascular risk assessment using visual and quantitative imaging features. We also discuss challenges related to image acquisition variability, ground truth curation, and clinical integration, with a particular focus on the design and implementation of the open-source IELCAP-AIRS system and the ScreeningPLUS infrastructure, which enable AI training, validation, and deployment in real-world screening environments. AI algorithms for rule-out decisions, nodule tracking, and disease quantification have the potential to reduce radiologist workload and advance precision screening. With the ability to evaluate multiple diseases from a single LDCT scan, AI-enabled screening offers a powerful, scalable tool for improving population health. Ongoing collaboration, standardized protocols, and large annotated datasets are critical to advancing the future of integrated, AI-driven preventive care.