Latest Papers on Radiology AI. Tags: Open Dataset

Transformer Classification of Breast Lesions: The BreastDCEDL_AMBL Benchmark Dataset and 0.92 AUC Baseline

Naomi Fridman, Anat Goldstein

•preprint•Sep 30 2025

Breast magnetic resonance imaging is a critical tool for cancer detection and treatment planning, but its clinical utility is hindered by poor specificity, leading to high false-positive rates and unnecessary biopsies. This study introduces a transformer-based framework for automated classification of breast lesions in dynamic contrast-enhanced MRI, addressing the challenge of distinguishing benign from malignant findings. We implemented a SegFormer architecture that achieved an AUC of 0.92 for lesion-level classification, with 100% sensitivity and 67% specificity at the patient level - potentially eliminating one-third of unnecessary biopsies without missing malignancies. The model quantifies malignant pixel distribution via semantic segmentation, producing interpretable spatial predictions that support clinical decision-making. To establish reproducible benchmarks, we curated BreastDCEDL_AMBL by transforming The Cancer Imaging Archive's AMBL collection into a standardized deep learning dataset with 88 patients and 133 annotated lesions (89 benign, 44 malignant). This resource addresses a key infrastructure gap, as existing public datasets lack benign lesion annotations, limiting benign-malignant classification research. Training incorporated an expanded cohort of over 1,200 patients through integration with BreastDCEDL datasets, validating transfer learning approaches despite primary tumor-only annotations. Public release of the dataset, models, and evaluation protocols provides the first standardized benchmark for DCE-MRI lesion classification, enabling methodological advancement toward clinical deployment.

MRI Classification Breast Dataset Release In Silico Academic Lab Open Dataset Open Code Benchmark SOTA
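
For readers less familiar with the metrics quoted in the abstract above, the following is a minimal sketch of how sensitivity and specificity are computed from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: malignant cases flagged / missed.
    tn/fp: benign cases cleared / flagged.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign case")
    sensitivity = tp / (tp + fn)  # share of malignant cases that were flagged
    specificity = tn / (tn + fp)  # share of benign cases that were cleared
    return sensitivity, specificity


# Hypothetical counts for illustration only (not the paper's data).
sens, spec = sensitivity_specificity(tp=10, fn=0, tn=20, fp=10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these illustrative counts no malignant case is missed, and two of every three benign cases are cleared.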

A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI

Arvind Murari Vepa, Yannan Yu, Jingru Gan, Anthony Cuturrufo, Weikai Li, Wei Wang, Fabien Scalzo, Yizhou Sun

•preprint•Sep 30 2025

We introduce mpLLM, a prompt-conditioned hierarchical mixture-of-experts (MoE) architecture for visual question answering over multi-parametric 3D brain MRI (mpMRI). mpLLM routes across modality-level and token-level projection experts to fuse multiple interrelated 3D modalities, enabling efficient training without image-report pretraining. To address limited image-text paired supervision, mpLLM integrates a synthetic visual question answering (VQA) protocol that generates medically relevant VQA from segmentation annotations, and we collaborate with medical experts for clinical validation. mpLLM outperforms strong medical VLM baselines by 5.3% on average across multiple mpMRI datasets. Our study features three main contributions: (1) the first clinically validated VQA dataset for 3D brain mpMRI, (2) a novel multimodal LLM that handles multiple interrelated 3D modalities, and (3) strong empirical results that demonstrate the medical utility of our methodology. Ablations highlight the importance of modality-level and token-level experts and prompt-conditioned routing.

MRI LLM Radiology Report Neurological Methodology In Silico Academic Lab Breakthrough Open Dataset
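
As a rough illustration of what prompt-conditioned expert routing can look like, here is a generic single-level soft mixture-of-experts gate in plain Python. It is a sketch under stated assumptions, not the mpLLM architecture: the abstract describes a hierarchical design with modality-level and token-level experts, whereas this example has one level, and every name and number below is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def route(prompt_vec, gate_weights):
    """Score each expert against the prompt embedding and normalize.

    gate_weights holds one (hypothetical) gating vector per expert.
    """
    return softmax([dot(prompt_vec, w) for w in gate_weights])


def mix(expert_outputs, gates):
    """Gate-weighted sum of the experts' output vectors."""
    dim = len(expert_outputs[0])
    return [
        sum(g * out[i] for g, out in zip(gates, expert_outputs))
        for i in range(dim)
    ]


# Toy values, assumed for illustration only.
prompt_vec = [1.0, 0.0]
gate_weights = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]    # three experts
expert_outputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # one output per expert

gates = route(prompt_vec, gate_weights)
fused = mix(expert_outputs, gates)
print(gates, fused)
```

Because the gate depends on the prompt embedding, a different question would shift weight toward different experts without changing the experts themselves.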

Artificial Intelligence in Low-Dose Computed Tomography Screening of the Chest: Past, Present, and Future.

Yip R, Jirapatnakul A, Avila R, Gutierrez JG, Naghavi M, Yankelevitz DF, Henschke CI

•papers•Sep 30 2025

The integration of artificial intelligence (AI) with low-dose computed tomography (LDCT) has the potential to transform lung cancer screening into a comprehensive approach to early detection of multiple diseases. Building on over 3 decades of research and global implementation by the International Early Lung Cancer Action Program (I-ELCAP), this paper reviews the development and clinical integration of AI for interpreting LDCT scans. We describe the historical milestones in AI-assisted lung nodule detection, emphysema quantification, and cardiovascular risk assessment using visual and quantitative imaging features. We also discuss challenges related to image acquisition variability, ground truth curation, and clinical integration, with a particular focus on the design and implementation of the open-source IELCAP-AIRS system and the ScreeningPLUS infrastructure, which enable AI training, validation, and deployment in real-world screening environments. AI algorithms for rule-out decisions, nodule tracking, and disease quantification have the potential to reduce radiologist workload and advance precision screening. With the ability to evaluate multiple diseases from a single LDCT scan, AI-enabled screening offers a powerful, scalable tool for improving population health. Ongoing collaboration, standardized protocols, and large annotated datasets are critical to advancing the future of integrated, AI-driven preventive care.

CT Detection Chest Review Clinical Pilot Consortium Open Code Open Dataset

Dolphin v1.0 Technical Report

Taohan Weng, Chi Zhang, Chaoran Yan, Siya Liu, Xiaoyang Liu, Yalun Wu, Boyang Wang, Boyan Wang, Jiren Ren, Kaiwen Yan, Jinze Yu, Kaibing Hu, Henan Liu, Haoyun Zheng, Anjie Le, Hongcheng Guo

•preprint•Sep 30 2025

Ultrasound is crucial in modern medicine but faces challenges like operator dependence, image noise, and real-time scanning, hindering AI integration. While large multimodal models excel in other medical imaging areas, they struggle with ultrasound's complexities. To address this, we introduce Dolphin v1.0 (V1) and its reasoning-augmented version, Dolphin R1, the first large-scale multimodal ultrasound foundation models unifying diverse clinical tasks in a single vision-language framework. To tackle ultrasound variability and noise, we curated a 2-million-scale multimodal dataset, combining textbook knowledge, public data, synthetic samples, and general corpora. This ensures robust perception, generalization, and clinical adaptability. The Dolphin series employs a three-stage training strategy: domain-specialized pretraining, instruction-driven alignment, and reinforcement-based refinement. Dolphin v1.0 delivers reliable performance in classification, detection, regression, and report generation. Dolphin R1 enhances diagnostic inference, reasoning transparency, and interpretability through reinforcement learning with ultrasound-specific rewards. Evaluated on U2-Bench across eight ultrasound tasks, Dolphin R1 achieves a U2-score of 0.5835, nearly twice that of the second-best model (0.2968), setting a new state of the art. Dolphin v1.0 also performs competitively, validating the unified framework. Comparisons show reasoning-enhanced training significantly improves diagnostic accuracy, consistency, and interpretability, highlighting its importance for high-stakes medical AI.

Ultrasound LLM Radiology Report Methodology In Silico Academic Lab Benchmark SOTA Open Dataset GenAI

A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI

Arvind Murari Vepa, Yannan Yu, Jingru Gan, Anthony Cuturrufo, Weikai Li, Wei Wang, Fabien Scalzo, Yizhou Sun

•preprint•Sep 30 2025

We introduce mpLLM, a prompt-conditioned hierarchical mixture-of-experts (MoE) architecture for visual question answering over multi-parametric 3D brain MRI (mpMRI). mpLLM routes across modality-level and token-level projection experts to fuse multiple interrelated 3D modalities, enabling efficient training without image-report pretraining. To address limited image-text paired supervision, mpLLM integrates a synthetic visual question answering (VQA) protocol that generates medically relevant VQA from segmentation annotations, and we collaborate with medical experts for clinical validation. mpLLM outperforms strong medical VLM baselines by 5.3% on average across multiple mpMRI datasets. Our study features three main contributions: (1) the first clinically validated VQA dataset for 3D brain mpMRI, (2) a novel multimodal LLM that handles multiple interrelated 3D modalities, and (3) strong empirical results that demonstrate the medical utility of our methodology. Ablations highlight the importance of modality-level and token-level experts and prompt-conditioned routing. We have included our source code in the supplementary materials and will release our dataset upon publication.

MRI LLM Radiology Report Neurological Methodology In Silico Academic Lab Open Code Open Dataset GenAI

Dolphin v1.0 Technical Report

Taohan Weng, Chi Zhang, Chaoran Yan, Siya Liu, Xiaoyang Liu, Yalun Wu, Boyang Wang, Boyan Wang, Jiren Ren, Kaiwen Yan, Jinze Yu, Kaibing Hu, Henan Liu, Haoyun Zheng, Zhenyu Liu, Duo Zhang, Xiaoqing Guo, Anjie Le, Hongcheng Guo

•preprint•Sep 30 2025

Ultrasound Methodology In Silico Breakthrough Benchmark SOTA Open Dataset GenAI

MetaChest: Generalized few-shot learning of pathologies from chest X-rays

Berenice Montalvo-Lezama, Gibran Fuentes-Pineda

•preprint•Sep 29 2025

The limited availability of annotated data presents a major challenge for applying deep learning methods to medical image analysis. Few-shot learning methods aim to recognize new classes from only a small number of labeled examples. These methods are typically studied under the standard few-shot learning setting, where all classes in a task are new. However, medical applications such as pathology classification from chest X-rays often require learning new classes while simultaneously leveraging knowledge of previously known ones, a scenario more closely aligned with generalized few-shot classification. Despite its practical relevance, few-shot learning has been scarcely studied in this context. In this work, we present MetaChest, a large-scale dataset of 479,215 chest X-rays collected from four public databases. MetaChest includes a meta-set partition specifically designed for standard few-shot classification, as well as an algorithm for generating multi-label episodes. We conduct extensive experiments evaluating both a standard transfer learning approach and an extension of ProtoNet across a wide range of few-shot multi-label classification tasks. Our results demonstrate that increasing the number of classes per episode and the number of training examples per class improves classification performance. Notably, the transfer learning approach consistently outperforms the ProtoNet extension, despite not being tailored for few-shot learning. We also show that higher-resolution images improve accuracy at the cost of additional computation, while efficient model architectures achieve comparable performance to larger models with significantly reduced resource requirements.

X-Ray Classification Chest Dataset Release In Silico Open Dataset Benchmark SOTA
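
The abstract above evaluates an extension of ProtoNet. As background, the sketch below shows the standard single-label prototypical-network decision rule: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. This is not the paper's multi-label extension, and the embeddings, labels, and function names are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_prototypes(support):
    """Map each class label to the mean of its support embeddings."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(
        prototypes,
        key=lambda label: squared_distance(query, prototypes[label]),
    )


# Toy 2-D embeddings, assumed for illustration (a 2-way, 2-shot episode).
support = {
    "class_a": [[0.0, 0.0], [0.0, 2.0]],
    "class_b": [[4.0, 4.0], [6.0, 4.0]],
}
prototypes = build_prototypes(support)
print(classify([1.0, 1.0], prototypes))
```

In practice the embeddings would come from a trained image encoder; the toy vectors here stand in for that step.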

Democratizing AI in Healthcare with Open Medical Inference (OMI): Protocols, Data Exchange, and AI Integration.

Pelka O, Sigle S, Werner P, Schweizer ST, Iancu A, Scherer L, Kamzol NA, Eil JH, Apfelbacher T, Seletkov D, Susetzky T, May MS, Bucher AM, Fegeler C, Boeker M, Braren R, Prokosch HU, Nensa F

•papers•Sep 29 2025

The integration of artificial intelligence (AI) into healthcare is transforming clinical decision-making, patient outcomes, and workflows. AI inference, applying trained models to new data, is central to this evolution, with cloud-based infrastructures enabling scalable AI deployment. The Open Medical Inference (OMI) platform democratizes AI access through open protocols and standardized data formats for seamless, interoperable healthcare data exchange. By integrating standards like FHIR and DICOMweb, OMI ensures interoperability between healthcare institutions and AI services while fostering ethical AI use through a governance framework addressing privacy, transparency, and fairness. OMI's implementation is structured into work packages, each addressing technical and ethical aspects. These include expanding the Medical Informatics Initiative (MII) Core Dataset for medical imaging, developing infrastructure for AI inference, and creating an open-source DICOMweb adapter for legacy systems. Standardized data formats ensure interoperability, while the AI Governance Framework promotes trust and responsible AI use. The project aims to establish an interoperable AI network across healthcare institutions, connecting existing infrastructures and AI services to enhance clinical outcomes.

· OMI develops open protocols and standardized data formats for seamless healthcare data exchange.
· Integration with FHIR and DICOMweb ensures interoperability between healthcare systems and AI services.
· A governance framework addresses privacy, transparency, and fairness in AI usage.
· Work packages focus on expanding datasets, creating infrastructure, and enabling legacy system integration.
· The project aims to create a scalable, secure, and interoperable AI network in healthcare.

Pelka O, Sigle S, Werner P et al. Democratizing AI in Healthcare with Open Medical Inference (OMI): Protocols, Data Exchange, and AI Integration. Rofo 2025; DOI 10.1055/a-2651-6653.

Mixed Modality Classification Whole Body Methodology Concept Consortium Open Code Ethics Open Dataset
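
To make the DICOMweb reference in the abstract above more concrete, here is a small sketch of how request URLs for two standard DICOMweb services are typically formed: a QIDO-RS study search and a WADO-RS series retrieval. It does not describe OMI's own interfaces. The base URL, UIDs, and function names are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode


def qido_study_search_url(base_url, **filters):
    """Build a QIDO-RS study search URL: {base}/studies?Attribute=value..."""
    url = f"{base_url.rstrip('/')}/studies"
    query = urlencode(filters)
    return f"{url}?{query}" if query else url


def wado_series_url(base_url, study_uid, series_uid):
    """Build a WADO-RS URL addressing one series inside a study."""
    return f"{base_url.rstrip('/')}/studies/{study_uid}/series/{series_uid}"


# Hypothetical endpoint and identifiers, for illustration only.
BASE = "https://pacs.example.org/dicomweb"
search_url = qido_study_search_url(BASE, PatientID="12345", ModalitiesInStudy="CT")
series_url = wado_series_url(BASE, "1.2.3", "1.2.3.4")
print(search_url)
print(series_url)
```

An adapter for a legacy system would sit behind such URLs and translate them into whatever query interface the older archive supports.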

TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models

Junyi Zhang, Jia-Chen Gu, Wenbo Hu, Yu Zhou, Robinson Piramuthu, Nanyun Peng

•preprint•Sep 29 2025

Existing medical reasoning benchmarks for vision-language models primarily focus on analyzing a patient's condition based on an image from a single visit. However, this setting deviates significantly from real-world clinical practice, where doctors typically refer to a patient's historical conditions to provide a comprehensive assessment by tracking their changes over time. In this paper, we introduce TemMed-Bench, the first benchmark designed for analyzing changes in patients' conditions between different clinical visits, which challenges large vision-language models (LVLMs) to reason over temporal medical images. TemMed-Bench consists of a test set comprising three tasks - visual question-answering (VQA), report generation, and image-pair selection - and a supplementary knowledge corpus of over 17,000 instances. With TemMed-Bench, we conduct an evaluation of six proprietary and six open-source LVLMs. Our results show that most LVLMs lack the ability to analyze patients' condition changes over temporal medical images, and a large proportion perform only at a random-guessing level in the closed-book setting. In contrast, GPT o3, o4-mini and Claude 3.5 Sonnet demonstrate comparatively decent performance, though they have yet to reach the desired level. Furthermore, we explore augmenting the input with both retrieved visual and textual modalities in the medical domain. We also show that multi-modal retrieval augmentation yields notably higher performance gains than no retrieval and textual retrieval alone across most models on our benchmark, with the VQA task showing an average improvement of 2.59%. Overall, we compose a benchmark grounded on real-world clinical practice, and it reveals LVLMs' limitations in temporal medical image reasoning, as well as highlighting the use of multi-modal retrieval augmentation as a potentially promising direction worth exploring to address this challenge.

Mixed Modality LLM Radiology Report Dataset Release In Silico Academic Lab Open Dataset Benchmark SOTA

Toward a Vision-Language Foundation Model for Medical Data: Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation

Huu Tien Nguyen, Dac Thai Nguyen, The Minh Duc Nguyen, Trung Thanh Nguyen, Thao Nguyen Truong, Huy Hieu Pham, Johan Barthelemy, Minh Quan Tran, Thanh Tam Nguyen, Quoc Viet Hung Nguyen, Quynh Anh Chau, Hong Son Mai, Thanh Trung Nguyen, Phi Le Nguyen

•preprint•Sep 29 2025

Vision-Language Foundation Models (VLMs), trained on large-scale multimodal datasets, have driven significant advances in Artificial Intelligence by enabling rich cross-modal reasoning. Despite their success in general domains, applying these models to medical imaging remains challenging due to the limited availability of diverse imaging modalities and multilingual clinical data. Most existing medical VLMs are trained on a subset of imaging modalities and focus primarily on high-resource languages, thus limiting their generalizability and clinical utility. To address these limitations, we introduce a novel Vietnamese-language multimodal medical dataset comprising 1,567,062 paired CT-PET images and corresponding 2,757 full-length clinical reports. This dataset is designed to fill two pressing gaps in medical AI development: (1) the lack of PET/CT imaging data in existing VLMs training corpora, which hinders the development of models capable of handling functional imaging tasks; and (2) the underrepresentation of low-resource languages, particularly the Vietnamese language, in medical vision-language research. To the best of our knowledge, this is the first dataset to provide comprehensive PET/CT-report pairs in Vietnamese. We further introduce a training framework to enhance VLMs' learning, including data augmentation and expert-validated test sets. We conduct comprehensive experiments benchmarking state-of-the-art VLMs on downstream tasks, including medical report generation and visual question answering. The experimental results show that incorporating our dataset significantly improves the performance of existing VLMs. We believe this dataset and benchmark will serve as a pivotal step in advancing the development of more robust VLMs for medical imaging, particularly in low-resource languages, and improving their clinical relevance in Vietnamese healthcare.

Mixed Modality Report Generation Whole Body Dataset Release In Silico Academic Lab Open Dataset GenAI
