Latest Papers on Radiology AI. Tags: Open Code

Ultrasam: a foundation model for ultrasound using large open-access segmentation datasets.

Meyer A, Murali A, Zarin F, Mutter D, Padoy N

•papers•Sep 11 2025

Automated ultrasound (US) image analysis remains a longstanding challenge due to anatomical complexity and the scarcity of annotated data. Although large-scale pretraining has improved data efficiency in many visual domains, its impact in US is limited by a pronounced domain shift from other imaging modalities and high variability across clinical applications, such as chest, ovarian, and endoscopic imaging. To address this, we propose UltraSam, a SAM-style model trained on a heterogeneous collection of publicly available segmentation datasets, originally developed in isolation. UltraSam is trained under the prompt-conditioned segmentation paradigm, which eliminates the need for unified labels and enables generalization to a broad range of downstream tasks. We compile US-43d, a large-scale collection of 43 open-access US datasets comprising over 282,000 images with segmentation masks covering 58 anatomical structures. We explore adaptation and fine-tuning strategies for SAM and systematically evaluate transferability across downstream tasks, comparing against state-of-the-art pretraining methods. We further propose prompted classification, a new use case where object-specific prompts and image features are jointly decoded to improve classification performance. In experiments on three diverse public US datasets, UltraSam outperforms existing SAM variants on prompt-based segmentation and surpasses self-supervised US foundation models on downstream (prompted) classification and instance segmentation tasks. UltraSam demonstrates that SAM-style training on diverse, sparsely annotated US data enables effective generalization across tasks. By unlocking the value of fragmented public datasets, our approach lays the foundation for scalable, real-world US representation learning. We release our code and pretrained models at https://github.com/CAMMA-public/UltraSam and invite the community to further this effort by continuing to contribute high-quality datasets.

Ultrasound Segmentation Methodology In Silico Academic Lab Open Dataset Open Code Benchmark SOTA

Enhancing 3D Medical Image Understanding with Pretraining Aided by 2D Multimodal Large Language Models

Qiuhui Chen, Xuancheng Yao, Huping Ye, Yi Hong

•preprint•Sep 11 2025

Understanding 3D medical image volumes is critical in the medical field, yet existing 3D medical convolution and transformer-based self-supervised learning (SSL) methods often lack deep semantic comprehension. Recent advancements in multimodal large language models (MLLMs) provide a promising approach to enhance image understanding through text descriptions. To leverage these 2D MLLMs for improved 3D medical image understanding, we propose Med3DInsight, a novel pretraining framework that integrates 3D image encoders with 2D MLLMs via a specially designed plane-slice-aware transformer module. Additionally, our model employs a partial optimal transport based alignment, demonstrating greater tolerance to noise introduced by potential noises in LLM-generated content. Med3DInsight introduces a new paradigm for scalable multimodal 3D medical representation learning without requiring human annotations. Extensive experiments demonstrate our state-of-the-art performance on two downstream tasks, i.e., segmentation and classification, across various public datasets with CT and MRI modalities, outperforming current SSL methods. Med3DInsight can be seamlessly integrated into existing 3D medical image understanding networks, potentially enhancing their performance. Our source code, generated datasets, and pre-trained models will be available at https://github.com/Qybc/Med3DInsight.

Mixed Modality Segmentation Methodology In Silico Academic Lab Open Code Benchmark SOTA

MetaLLMix : An XAI Aided LLM-Meta-learning Based Approach for Hyper-parameters Optimization

Mohammed Tiouti, Mohamed Bal-Ghaoui

•preprint•Sep 11 2025

Effective model and hyperparameter selection remains a major challenge in deep learning, often requiring extensive expertise and computation. While AutoML and large language models (LLMs) promise automation, current LLM-based approaches rely on trial and error and expensive APIs, which provide limited interpretability and generalizability. We propose MetaLLMiX, a zero-shot hyperparameter optimization framework combining meta-learning, explainable AI, and efficient LLM reasoning. By leveraging historical experiment outcomes with SHAP explanations, MetaLLMiX recommends optimal hyperparameters and pretrained models without additional trials. We further employ an LLM-as-judge evaluation to control output format, accuracy, and completeness. Experiments on eight medical imaging datasets using nine open-source lightweight LLMs show that MetaLLMiX achieves competitive or superior performance to traditional HPO methods while drastically reducing computational cost. Our local deployment outperforms prior API-based approaches, achieving optimal results on 5 of 8 tasks, response time reductions of 99.6-99.9%, and the fastest training times on 6 datasets (2.4-15.7x faster), maintaining accuracy within 1-5% of best-performing baselines.

Mixed Modality Classification Methodology In Silico Academic Lab Benchmark SOTA Open Code

Invisible Attributes, Visible Biases: Exploring Demographic Shortcuts in MRI-based Alzheimer's Disease Classification

Akshit Achara, Esther Puyol Anton, Alexander Hammers, Andrew P. King

•preprint•Sep 11 2025

Magnetic resonance imaging (MRI) is the gold standard for brain imaging. Deep learning (DL) algorithms have been proposed to aid in the diagnosis of diseases such as Alzheimer's disease (AD) from MRI scans. However, DL algorithms can suffer from shortcut learning, in which spurious features, not directly related to the output label, are used for prediction. When these features are related to protected attributes, they can lead to performance bias against underrepresented protected groups, such as those defined by race and sex. In this work, we explore the potential for shortcut learning and demographic bias in DL based AD diagnosis from MRI. We first investigate if DL algorithms can identify race or sex from 3D brain MRI scans to establish the presence or otherwise of race and sex based distributional shifts. Next, we investigate whether training set imbalance by race or sex can cause a drop in model performance, indicating shortcut learning and bias. Finally, we conduct a quantitative and qualitative analysis of feature attributions in different brain regions for both the protected attribute and AD classification tasks. Through these experiments, and using multiple datasets and DL models (ResNet and SwinTransformer), we demonstrate the existence of both race and sex based shortcut learning and bias in DL based AD classification. Our work lays the foundation for fairer DL diagnostic tools in brain MRI. The code is provided at https://github.com/acharaakshit/ShortMR

MRI Classification Neurological Methodology In Silico Academic Lab Ethics Open Code

Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis

Jing Hao, Yuxuan Fan, Yanpeng Sun, Kaixin Guo, Lizhuo Lin, Jinrong Yang, Qi Yong H. Ai, Lun M. Wong, Hao Tang, Kuo Feng Hung

•preprint•Sep 11 2025

Recent advances in large vision-language models (LVLMs) have demonstrated strong performance on general-purpose medical tasks. However, their effectiveness in specialized domains such as dentistry remains underexplored. In particular, panoramic X-rays, a widely used imaging modality in oral radiology, pose interpretative challenges due to dense anatomical structures and subtle pathological cues, which are not captured by existing medical benchmarks or instruction datasets. To this end, we introduce MMOral, the first large-scale multimodal instruction dataset and benchmark tailored for panoramic X-ray interpretation. MMOral consists of 20,563 annotated images paired with 1.3 million instruction-following instances across diverse task types, including attribute extraction, report generation, visual question answering, and image-grounded dialogue. In addition, we present MMOral-Bench, a comprehensive evaluation suite covering five key diagnostic dimensions in dentistry. We evaluate 64 LVLMs on MMOral-Bench and find that even the best-performing model, i.e., GPT-4o, only achieves 41.45% accuracy, revealing significant limitations of current models in this domain. To promote the progress of this specific domain, we also propose OralGPT, which conducts supervised fine-tuning (SFT) upon Qwen2.5-VL-7B with our meticulously curated MMOral instruction dataset. Remarkably, a single epoch of SFT yields substantial performance enhancements for LVLMs, e.g., OralGPT demonstrates a 24.73% improvement. Both MMOral and OralGPT hold significant potential as a critical foundation for intelligent dentistry and enable more clinically impactful multimodal AI systems in the dental field. The dataset, model, benchmark, and evaluation suite are available at https://github.com/isbrycee/OralGPT.

X-Ray Report Generation Dataset Release In Silico Academic Lab Open Dataset Open Code Benchmark SOTA

MetaLLMix : An XAI Aided LLM-Meta-learning Based Approach for Hyper-parameters Optimization

Mohammed Tiouti, Mohamed Bal-Ghaoui

•preprint•Sep 11 2025

Mixed Modality Classification Methodology In Silico Academic Lab Open Code

Medverse: A Universal Model for Full-Resolution 3D Medical Image Segmentation, Transformation and Enhancement

Jiesi Hu, Jianfeng Cao, Yanwu Yang, Chenfei Ye, Yixuan Zhang, Hanyang Peng, Ting Ma

•preprint•Sep 11 2025

In-context learning (ICL) offers a promising paradigm for universal medical image analysis, enabling models to perform diverse image processing tasks without retraining. However, current ICL models for medical imaging remain limited in two critical aspects: they cannot simultaneously achieve high-fidelity predictions and global anatomical understanding, and there is no unified model trained across diverse medical imaging tasks (e.g., segmentation and enhancement) and anatomical regions. As a result, the full potential of ICL in medical imaging remains underexplored. Thus, we present \textbf{Medverse}, a universal ICL model for 3D medical imaging, trained on 22 datasets covering diverse tasks in universal image segmentation, transformation, and enhancement across multiple organs, imaging modalities, and clinical centers. Medverse employs a next-scale autoregressive in-context learning framework that progressively refines predictions from coarse to fine, generating consistent, full-resolution volumetric outputs and enabling multi-scale anatomical awareness. We further propose a blockwise cross-attention module that facilitates long-range interactions between context and target inputs while preserving computational efficiency through spatial sparsity. Medverse is extensively evaluated on a broad collection of held-out datasets covering previously unseen clinical centers, organs, species, and imaging modalities. Results demonstrate that Medverse substantially outperforms existing ICL baselines and establishes a novel paradigm for in-context learning. Code and model weights will be made publicly available. Our model are publicly available at https://github.com/jiesihu/Medverse.

Mixed Modality Segmentation Whole Body Methodology In Silico Academic Lab Open Code Benchmark SOTA

Implementing a Resource-Light and Low-Code Large Language Model System for Information Extraction from Mammography Reports: A Pilot Study.

Dennstädt F, Fauser S, Cihoric N, Schmerder M, Lombardo P, Cereghetti GM, von Däniken S, Minder T, Meyer J, Chiang L, Gaio R, Lerch L, Filchenko I, Reichenpfader D, Denecke K, Vojvodic C, Tatalovic I, Sander A, Hastings J, Aebersold DM, von Tengg-Kobligk H, Nairz K

•papers•Sep 10 2025

Large language models (LLMs) have been successfully used for data extraction from free-text radiology reports. Most current studies were conducted with LLMs accessed via an application programming interface (API). We evaluated the feasibility of using open-source LLMs, deployed on limited local hardware resources for data extraction from free-text mammography reports, using a common data element (CDE)-based structure. Seventy-nine CDEs were defined by an interdisciplinary expert panel, reflecting real-world reporting practice. Sixty-one reports were classified by two independent researchers to establish ground truth. Five different open-source LLMs deployable on a single GPU were used for data extraction using the general-classifier Python package. Extractions were performed for five different prompt approaches with calculation of overall accuracy, micro-recall and micro-F1. Additional analyses were conducted using thresholds for the relative probability of classifications. High inter-rater agreement was observed between manual classifiers (Cohen's kappa 0.83). Using default prompts, the LLMs achieved accuracies of 59.2-72.9%. Chain-of-thought prompting yielded mixed results, while few-shot prompting led to decreased accuracy. Adaptation of the default prompts to precisely define classification tasks improved performance for all models, with accuracies of 64.7-85.3%. Setting certainty thresholds further improved accuracies to > 90% but reduced the coverage rate to < 50%. Locally deployed open-source LLMs can effectively extract information from mammography reports, maintaining compatibility with limited computational resources. Selection and evaluation of the model and prompting strategy are critical. Clear, task-specific instructions appear crucial for high performance. Using a CDE-based framework provides clear semantics and structure for the data extraction.

Mammography LLM Radiology Report Breast Methodology Prototype Academic Lab Open Code

RoentMod: A Synthetic Chest X-Ray Modification Model to Identify and Correct Image Interpretation Model Shortcuts

Lauren H. Cooke, Matthias Jung, Jan M. Brendel, Nora M. Kerkovits, Borek Foldyna, Michael T. Lu, Vineet K. Raghu

•preprint•Sep 10 2025

Chest radiographs (CXRs) are among the most common tests in medicine. Automated image interpretation may reduce radiologists\' workload and expand access to diagnostic expertise. Deep learning multi-task and foundation models have shown strong performance for CXR interpretation but are vulnerable to shortcut learning, where models rely on spurious and off-target correlations rather than clinically relevant features to make decisions. We introduce RoentMod, a counterfactual image editing framework that generates anatomically realistic CXRs with user-specified, synthetic pathology while preserving unrelated anatomical features of the original scan. RoentMod combines an open-source medical image generator (RoentGen) with an image-to-image modification model without requiring retraining. In reader studies with board-certified radiologists and radiology residents, RoentMod-produced images appeared realistic in 93\% of cases, correctly incorporated the specified finding in 89-99\% of cases, and preserved native anatomy comparable to real follow-up CXRs. Using RoentMod, we demonstrate that state-of-the-art multi-task and foundation models frequently exploit off-target pathology as shortcuts, limiting their specificity. Incorporating RoentMod-generated counterfactual images during training mitigated this vulnerability, improving model discrimination across multiple pathologies by 3-19\% AUC in internal validation and by 1-11\% for 5 out of 6 tested pathologies in external testing. These findings establish RoentMod as a broadly applicable tool for probing and correcting shortcut learning in medical AI. By enabling controlled counterfactual interventions, RoentMod enhances the robustness and interpretability of CXR interpretation models and provides a generalizable strategy for improving foundation models in medical imaging.

X-Ray Image Synthesis Chest Methodology In Silico Academic Lab GenAI Benchmark SOTA Open Code

RetiGen: Framework leveraging domain generalization and test-time adaptation for multi-view retinal diagnostics.

Zhang G, Chen Z, Huo J, do Rio JN, Komninos C, Liu Y, Sparks R, Ourselin S, Bergeles C, Jackson TL

•papers•Sep 10 2025

Domain generalization techniques involve training a model on one set of domains and evaluating its performance on different, unseen domains. In contrast, test-time adaptation optimizes the model specifically for the target domain during inference. Both approaches improve diagnostic accuracy in medical imaging models. However, no research to date has leveraged the advantages of both approaches in an end-to-end fashion. Our paper introduces RetiGen, a test-time optimization framework designed to be integrated with existing domain generalization approaches. With an emphasis on the ophthalmic imaging domain, RetiGen leverages unlabeled multi-view color fundus photographs-a critical optical technology in retinal diagnostics. By utilizing information from multiple viewing angles, our approach significantly enhances the robustness and accuracy of machine learning models when applied across different domains. By integrating class balancing, test-time adaptation, and a multi-view optimization strategy, RetiGen effectively addresses the persistent issue of domain shift, which often hinders the performance of imaging models. Experimental results demonstrate that our method outperforms state-of-the-art techniques in both domain generalization and test-time optimization. Specifically, RetiGen increases the generalizability of the MFIDDR dataset, improving the AUC from 0.751 to 0.872, a 0.121 improvement. Similarly, for the DRTiD dataset, the AUC increased from 0.794 to 0.879, a 0.085 improvement. The code for RetiGen is publicly available at https://github.com/RViMLab/RetiGen.

OCT Classification Methodology In Silico Academic Lab Open Code

Filter Papers

Tags

Ultrasam: a foundation model for ultrasound using large open-access segmentation datasets.

Enhancing 3D Medical Image Understanding with Pretraining Aided by 2D Multimodal Large Language Models

MetaLLMix : An XAI Aided LLM-Meta-learning Based Approach for Hyper-parameters Optimization

Invisible Attributes, Visible Biases: Exploring Demographic Shortcuts in MRI-based Alzheimer's Disease Classification

Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis

MetaLLMix : An XAI Aided LLM-Meta-learning Based Approach for Hyper-parameters Optimization

Medverse: A Universal Model for Full-Resolution 3D Medical Image Segmentation, Transformation and Enhancement

Implementing a Resource-Light and Low-Code Large Language Model System for Information Extraction from Mammography Reports: A Pilot Study.

RoentMod: A Synthetic Chest X-Ray Modification Model to Identify and Correct Image Interpretation Model Shortcuts

RetiGen: Framework leveraging domain generalization and test-time adaptation for multi-view retinal diagnostics.

Ready to Sharpen Your Edge?