
SAMed-2: Selective Memory Enhanced Medical Segment Anything Model

Zhiling Yan, Sifan Song, Dingjie Song, Yiwei Li, Rong Zhou, Weixiang Sun, Zhennong Chen, Sekeun Kim, Hui Ren, Tianming Liu, Quanzheng Li, Xiang Li, Lifang He, Lichao Sun

arXiv preprint · Jul 4 2025
Recent "segment anything" efforts show promise by learning from large-scale data, but adapting such models directly to medical images remains challenging due to the complexity of medical data, noisy annotations, and continual learning requirements across diverse modalities and anatomical structures. In this work, we propose SAMed-2, a new foundation model for medical image segmentation built upon the SAM-2 architecture. Specifically, we introduce a temporal adapter into the image encoder to capture image correlations and a confidence-driven memory mechanism to store high-certainty features for later retrieval. This memory-based strategy counters the pervasive noise in large-scale medical datasets and mitigates catastrophic forgetting when encountering new tasks or modalities. To train and evaluate SAMed-2, we curate MedBank-100k, a comprehensive dataset spanning seven imaging modalities and 21 medical segmentation tasks. Our experiments on both internal benchmarks and 10 external datasets demonstrate superior performance over state-of-the-art baselines in multi-task scenarios. The code is available at: https://github.com/ZhilingYan/Medical-SAM-Bench.

A Chain of Diagnosis Framework for Accurate and Explainable Radiology Report Generation.

Jin H, Che H, He S, Chen H

PubMed paper · Jul 3 2025
Despite the progress of radiology report generation (RRG), existing works face two challenges: 1) performance in terms of clinical efficacy is unsatisfactory, especially for describing lesion attributes; 2) the generated text lacks explainability, making it difficult for radiologists to trust the results. To address these challenges, we focus on a trustworthy RRG model, which not only generates accurate descriptions of abnormalities, but also provides the basis for its predictions. To this end, we propose a framework named chain of diagnosis (CoD), which maintains a chain of diagnostic steps for clinically accurate and explainable RRG. It first generates question-answer (QA) pairs via diagnostic conversation to extract key findings, then prompts a large language model with QA diagnoses for accurate generation. To enhance explainability, a diagnosis grounding module is designed to match QA diagnoses and generated sentences, where the diagnoses act as a reference. Moreover, a lesion grounding module is designed to locate abnormalities in the image, further improving the working efficiency of radiologists. To facilitate label-efficient training, we propose an omni-supervised learning strategy with clinical consistency to leverage various types of annotations from different datasets. Our efforts lead to 1) an omni-labeled RRG dataset with QA pairs and lesion boxes; 2) an evaluation tool for assessing the accuracy of reports in describing lesion location and severity; 3) extensive experiments to demonstrate the effectiveness of CoD, where it outperforms both specialist and generalist models consistently on two RRG benchmarks and shows promising explainability by accurately grounding generated sentences to QA diagnoses and images.
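
One plausible reading of the diagnosis grounding step is a similarity match between generated sentences and QA diagnoses in an embedding space. The sketch below assumes precomputed sentence embeddings and a hand-picked threshold, neither of which is specified by the abstract:

```python
import torch
import torch.nn.functional as F

def ground_sentences(sent_emb: torch.Tensor, qa_emb: torch.Tensor,
                     min_sim: float = 0.5):
    """Toy diagnosis-grounding step: link each generated report sentence
    to its most similar QA diagnosis, if the match is confident enough."""
    sims = F.normalize(sent_emb, dim=-1) @ F.normalize(qa_emb, dim=-1).T
    best_sim, best_idx = sims.max(dim=1)  # per-sentence best QA match
    grounded = [(s, int(q)) for s, (v, q) in
                enumerate(zip(best_sim, best_idx)) if v >= min_sim]
    return grounded  # list of (sentence_index, qa_index) pairs
```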

Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs

Francesco Di Salvo, Hanh Huyen My Nguyen, Christian Ledig

arXiv preprint · Jul 3 2025
Deep Learning (DL) has revolutionized medical imaging, yet its adoption is constrained by data scarcity and privacy regulations, limiting access to diverse datasets. Federated Learning (FL) enables decentralized training but suffers from high communication costs and is often restricted to a single downstream task, reducing flexibility. We propose a data-sharing method via Differentially Private (DP) generative models. By adopting foundation models, we extract compact, informative embeddings, reducing redundancy and lowering computational overhead. Clients collaboratively train a Differentially Private Conditional Variational Autoencoder (DP-CVAE) to model a global, privacy-aware data distribution, supporting diverse downstream tasks. Our approach, validated across multiple feature extractors, enhances privacy, scalability, and efficiency, outperforming traditional FL classifiers while ensuring differential privacy. Additionally, DP-CVAE produces higher-fidelity embeddings than DP-CGAN while requiring 5× fewer parameters.
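
The core object here is a conditional VAE that models embeddings rather than pixels, which keeps the generator tiny. A minimal sketch is shown below, with illustrative layer sizes; the DP part (per-sample gradient clipping plus Gaussian noise during training, e.g. via DP-SGD) is omitted, and none of this reflects the authors' exact architecture:

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Minimal conditional VAE over feature embeddings (not pixels)."""

    def __init__(self, emb_dim=768, n_classes=10, latent_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(emb_dim + n_classes, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + n_classes, 256), nn.ReLU(),
            nn.Linear(256, emb_dim))

    def forward(self, x, y_onehot):
        # Condition both encoder and decoder on the class label.
        h = self.enc(torch.cat([x, y_onehot], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        x_hat = self.dec(torch.cat([z, y_onehot], dim=-1))
        return x_hat, mu, logvar
```

Operating on compact embeddings is what makes the reported parameter savings plausible: the decoder only has to reproduce a feature vector, not an image.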

MedFormer: Hierarchical Medical Vision Transformer with Content-Aware Dual Sparse Selection Attention

Zunhui Xia, Hongxing Li, Libin Lan

arXiv preprint · Jul 3 2025
Medical image recognition serves as a key way to aid in clinical diagnosis, enabling more accurate and timely identification of diseases and abnormalities. Vision transformer-based approaches have proven effective in handling various medical recognition tasks. However, these methods encounter two primary challenges. First, they are often task-specific and architecture-tailored, limiting their general applicability. Second, they usually either adopt full attention to model long-range dependencies, resulting in high computational costs, or rely on handcrafted sparse attention, potentially leading to suboptimal performance. To tackle these issues, we present MedFormer, an efficient medical vision transformer with two key ideas. First, it employs a pyramid scaling structure as a versatile backbone for various medical image recognition tasks, including image classification and dense prediction tasks such as semantic segmentation and lesion detection. This structure facilitates hierarchical feature representation while reducing the computation load of feature maps, highly beneficial for boosting performance. Second, it introduces a novel Dual Sparse Selection Attention (DSSA) with content awareness to improve computational efficiency and robustness against noise while maintaining high performance. As the core building technique of MedFormer, DSSA is explicitly designed to attend to the most relevant content. In addition, a detailed theoretical analysis has been conducted, demonstrating that MedFormer has superior generality and efficiency in comparison to existing medical vision transformers. Extensive experiments on a variety of imaging modality datasets consistently show that MedFormer is highly effective in enhancing performance across all three above-mentioned medical image recognition tasks. The code is available at https://github.com/XiaZunhui/MedFormer.
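
To make the sparse-selection idea concrete, here is a generic content-aware top-k attention sketch: each query scores all keys but attends only to its k best matches. This illustrates the mechanism family, not the paper's exact dual-selection design:

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, topk: int = 16):
    """Each query attends only to its top-k highest-scoring keys
    instead of the full sequence. Shapes: q (B, Nq, D), k/v (B, Nk, D)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (B, Nq, Nk)
    idx = scores.topk(topk, dim=-1).indices                # keys to keep
    mask = torch.full_like(scores, float('-inf'))
    mask.scatter_(-1, idx, 0.0)        # 0 on kept keys, -inf elsewhere
    attn = F.softmax(scores + mask, dim=-1)
    return attn @ v
```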

Transformer attention-based neural network for cognitive score estimation from sMRI data.

Li S, Zhang Y, Zou C, Zhang L, Li F, Liu Q

PubMed paper · Jul 3 2025
Accurately predicting cognitive scores from structural MRI holds significant clinical value for understanding the pathological stages of dementia and forecasting Alzheimer's disease (AD). Existing deep learning methods often depend on anatomical priors, overlooking individual-specific structural differences during AD progression. To address these limitations, this work proposes a deep neural network that incorporates Transformer attention to jointly predict multiple cognitive scores, including ADAS, CDRSB, and MMSE. The architecture first employs a 3D convolutional neural network backbone to encode sMRI, capturing preliminary local structural information. An improved Transformer attention block, integrated with 3D positional encoding and a 3D convolutional layer, then adaptively captures discriminative imaging features across the brain, focusing effectively on key cognition-related regions. Finally, an attention-aware regression network enables the joint prediction of multiple clinical scores. Experimental results demonstrate that our method outperforms existing traditional and deep learning methods on the ADNI dataset. Further qualitative analysis reveals that the dementia-related brain regions identified by the model hold important biological significance, effectively supporting cognitive score prediction. Our code is publicly available at: https://github.com/lshsx/CTA_MRI.
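
The joint-prediction part of such a design reduces to a shared feature trunk feeding one regression head per score. A minimal stand-in (layer sizes are assumptions; the paper's attention-aware regressor is more involved):

```python
import torch
import torch.nn as nn

class MultiScoreHead(nn.Module):
    """Joint regression of ADAS, CDRSB, and MMSE from one shared
    feature vector produced by the imaging backbone."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU())
        self.heads = nn.ModuleDict(
            {name: nn.Linear(128, 1) for name in ("ADAS", "CDRSB", "MMSE")})

    def forward(self, feat):
        h = self.shared(feat)  # shared representation across scores
        return {name: head(h).squeeze(-1) for name, head in self.heads.items()}
```

Sharing the trunk lets correlated scores regularize each other, which is the usual motivation for predicting them jointly rather than with three separate models.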

MvHo-IB: Multi-View Higher-Order Information Bottleneck for Brain Disorder Diagnosis

Kunyu Zhang, Qiang Li, Shujian Yu

arXiv preprint · Jul 3 2025
Recent evidence suggests that modeling higher-order interactions (HOIs) in functional magnetic resonance imaging (fMRI) data can enhance the diagnostic accuracy of machine learning systems. However, effectively extracting and utilizing HOIs remains a significant challenge. In this work, we propose MvHo-IB, a novel multi-view learning framework that integrates both pairwise interactions and HOIs for diagnostic decision-making, while automatically compressing task-irrelevant redundant information. MvHo-IB introduces several key innovations: (1) a principled method that combines O-information from information theory with a matrix-based Rényi α-order entropy estimator to quantify and extract HOIs, (2) a purpose-built Brain3DCNN encoder to effectively utilize these interactions, and (3) a new multi-view learning information bottleneck objective to enhance representation learning. Experiments on three benchmark fMRI datasets demonstrate that MvHo-IB achieves state-of-the-art performance, significantly outperforming previous methods, including recent hypergraph-based techniques. The implementation of MvHo-IB is available at https://github.com/zky04/MvHo-IB.
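
The matrix-based Rényi entropy referenced here has a standard closed form: build a Gram matrix over samples, normalize it to unit trace, and compute S_α(A) = (1/(1−α)) log₂ Σᵢ λᵢ(A)^α over its eigenvalues. A minimal sketch, assuming a Gaussian kernel with a hand-picked width σ (the paper's estimator settings are not given in the abstract):

```python
import torch

def renyi_entropy(x: torch.Tensor, alpha: float = 2.0, sigma: float = 1.0):
    """Matrix-based Rényi alpha-order entropy over samples x (N, D):
    Gaussian Gram matrix, trace-normalized, then
    (1/(1-alpha)) * log2(sum of eigenvalues**alpha). Requires alpha != 1."""
    d2 = torch.cdist(x, x).pow(2)          # pairwise squared distances
    K = torch.exp(-d2 / (2 * sigma ** 2))  # Gram matrix (K_ii = 1)
    A = K / K.trace()                      # normalize to unit trace
    eigvals = torch.linalg.eigvalsh(A).clamp(min=1e-12)
    return (1.0 / (1.0 - alpha)) * torch.log2((eigvals ** alpha).sum())
```

O-information itself is then assembled from such entropy terms over single variables, pairs, and the full set, with its sign indicating whether interactions are predominantly redundant or synergistic.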

Prompt learning with bounding box constraints for medical image segmentation

Mélanie Gaillochet, Mehrdad Noori, Sahar Dastani, Christian Desrosiers, Hervé Lombaert

arXiv preprint · Jul 3 2025
Pixel-wise annotations are notoriously laborious and costly to obtain in the medical domain. To mitigate this burden, weakly supervised approaches based on bounding box annotations, which are much easier to acquire, offer a practical alternative. Vision foundation models have recently shown noteworthy segmentation performance when provided with prompts such as points or bounding boxes. Prompt learning exploits these models by adapting them to downstream tasks and automating segmentation, thereby reducing user intervention. However, existing prompt learning approaches depend on fully annotated segmentation masks. This paper proposes a novel framework that combines the representational power of foundation models with the annotation efficiency of weakly supervised segmentation. More specifically, our approach automates prompt generation for foundation models using only bounding box annotations. Our proposed optimization scheme integrates multiple constraints derived from box annotations with pseudo-labels generated by the prompted foundation model. Extensive experiments across multimodal datasets reveal that our weakly supervised method achieves an average Dice score of 84.90% in a limited data setting, outperforming existing fully-supervised and weakly-supervised approaches. The code is available at https://github.com/Minimel/box-prompt-learning-VFM.git
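
Two constraints fall straight out of a bounding box: predicted foreground should not leak outside it, and the box should actually contain foreground. The sketch below illustrates that pair of penalties; the 50% fill prior is an assumption for illustration, and the paper's full optimization scheme includes more than this:

```python
import torch

def box_constraint_loss(prob: torch.Tensor, box_mask: torch.Tensor):
    """prob: predicted foreground probabilities in [0, 1], (B, 1, H, W).
    box_mask: binary mask that is 1 inside the annotated box."""
    # (1) Penalize foreground probability outside the box.
    outside = prob * (1 - box_mask)
    l_out = outside.sum() / (1 - box_mask).sum().clamp(min=1)
    # (2) Require the box to be reasonably filled with foreground.
    inside_frac = (prob * box_mask).sum() / box_mask.sum().clamp(min=1)
    l_in = torch.relu(0.5 - inside_frac)  # assumed minimum-fill prior
    return l_out + l_in
```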

Multi channel fusion diffusion models for brain tumor MRI data augmentation.

Zuo C, Xue J, Yuan C

PubMed paper · Jul 2 2025
The early diagnosis of brain tumors is crucial for patient prognosis, and medical imaging techniques such as MRI and CT scans are essential tools for diagnosing brain tumors. However, high-quality medical image data for brain tumors is often scarce and difficult to obtain, which hinders the development and application of medical image analysis models. With the advancement of artificial intelligence, particularly deep learning technologies in the field of medical imaging, new concepts and tools have been introduced for the early diagnosis, treatment planning, and prognosis evaluation of brain tumors. To address the challenge of imbalanced brain tumor datasets, we propose a novel data augmentation technique based on a diffusion model, referred to as the Multi-Channel Fusion Diffusion Model (MCFDiffusion). This method tackles the issue of data imbalance by converting healthy brain MRI images into images containing tumors, thereby enabling deep learning models to achieve better performance and assisting physicians in making more accurate diagnoses and treatment plans. In our experiments, we used a publicly available brain tumor dataset and compared the performance of image classification and segmentation tasks between the original data and the data enhanced by our method. The results show that the enhanced data improved the classification accuracy by approximately 3% and the Dice coefficient for segmentation tasks by 1.5%-2.5%. Our research builds upon previous work involving Denoising Diffusion Implicit Models (DDIMs) for image generation and further enhances the applicability of this model in medical imaging by introducing a multi-channel approach and fusing defective areas with healthy images. Future work will explore the application of this model to various types of medical images and further optimize the model to improve its generalization capabilities. We release our code at https://github.com/feiyueaaa/MCFDiffusion.
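
One plausible reading of the multi-channel fusion step is that the denoiser is conditioned on the healthy scan and a target tumor-region mask by stacking them with the noisy input along the channel axis. A tiny sketch of that input assembly; the channel layout is an assumption, not the paper's exact design:

```python
import torch

def fuse_channels(noisy: torch.Tensor, healthy: torch.Tensor,
                  tumor_mask: torch.Tensor) -> torch.Tensor:
    """Stack the noisy image, the healthy reference scan, and the
    region-to-modify mask as one multi-channel denoiser input.
    All tensors: (B, C, H, W); output: (B, 3C, H, W)."""
    return torch.cat([noisy, healthy, tumor_mask], dim=1)
```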

A Multi-Centric Anthropomorphic 3D CT Phantom-Based Benchmark Dataset for Harmonization

Mohammadreza Amirian, Michael Bach, Oscar Jimenez-del-Toro, Christoph Aberle, Roger Schaer, Vincent Andrearczyk, Jean-Félix Maestrati, Maria Martin Asiain, Kyriakos Flouris, Markus Obmann, Clarisse Dromain, Benoît Dufour, Pierre-Alexandre Alois Poletti, Hendrik von Tengg-Kobligk, Rolf Hügli, Martin Kretzschmar, Hatem Alkadhi, Ender Konukoglu, Henning Müller, Bram Stieltjes, Adrien Depeursinge

arXiv preprint · Jul 2 2025
Artificial intelligence (AI) has introduced numerous opportunities for human assistance and task automation in medicine. However, it suffers from poor generalization in the presence of shifts in the data distribution. In the context of AI-based computed tomography (CT) analysis, significant data distribution shifts can be caused by changes in scanner manufacturer, reconstruction technique, or dose. AI harmonization techniques can address this problem by reducing distribution shifts caused by various acquisition settings. This paper presents an open-source benchmark dataset containing CT scans of an anthropomorphic phantom acquired with various scanners and settings, whose purpose is to foster the development of AI harmonization techniques. Using a phantom eliminates variability attributable to inter- and intra-patient differences. The dataset includes 1378 image series acquired with 13 scanners from 4 manufacturers across 8 institutions using a harmonized protocol as well as several acquisition doses. Additionally, we present a methodology, baseline results and open-source code to assess image- and feature-level stability and liver tissue classification, promoting the development of AI harmonization strategies.
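
With a fixed phantom, feature-level stability can be summarized as simply as the coefficient of variation of a feature across repeated acquisitions of the same region. A sketch of that common metric (the benchmark's own protocol may measure stability differently):

```python
import numpy as np

def stability_cv(feature_values) -> float:
    """Coefficient of variation of one feature measured on the same
    phantom region across scanners/settings. Lower CV means the feature
    is more stable, i.e. better harmonized."""
    values = np.asarray(feature_values, dtype=float)
    return float(values.std(ddof=1) / (abs(values.mean()) + 1e-12))

# Example: one radiomics feature measured on 5 scanners (made-up values).
print(stability_cv([0.82, 0.79, 0.85, 0.81, 0.80]))
```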

Large language model trained on clinical oncology data predicts cancer progression.

Zhu M, Lin H, Jiang J, Jinia AJ, Jee J, Pichotta K, Waters M, Rose D, Schultz N, Chalise S, Valleru L, Morin O, Moran J, Deasy JO, Pilai S, Nichols C, Riely G, Braunstein LZ, Li A

PubMed paper · Jul 2 2025
Subspecialty knowledge barriers have limited the adoption of large language models (LLMs) in oncology. We introduce Woollie, an open-source, oncology-specific LLM trained on real-world data from Memorial Sloan Kettering Cancer Center (MSK) across lung, breast, prostate, pancreatic, and colorectal cancers, with external validation using University of California, San Francisco (UCSF) data. Woollie surpasses ChatGPT in medical benchmarks and excels in eight non-medical benchmarks. Analyzing 39,319 radiology impression notes from 4002 patients, it achieved an overall area under the receiver operating characteristic curve (AUROC) of 0.97 for cancer progression prediction on MSK data, including a notable 0.98 AUROC for pancreatic cancer. On UCSF data, it achieved an overall AUROC of 0.88, excelling in lung cancer detection with an AUROC of 0.95. As the first oncology-specific LLM validated across institutions, Woollie demonstrates high accuracy and consistency across cancer types, underscoring its potential to enhance cancer progression analysis.
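
As a reminder of what the reported AUROC numbers measure, they score how well predicted probabilities rank positive cases above negative ones. A toy computation with scikit-learn (the arrays below are hypothetical, not the study's data):

```python
from sklearn.metrics import roc_auc_score

# Binary progression labels per impression note and the model's
# predicted probabilities (both illustrative placeholders).
y_true = [0, 1, 1, 0, 1]
y_score = [0.1, 0.9, 0.7, 0.3, 0.8]
print(roc_auc_score(y_true, y_score))  # AUROC in [0.5, 1.0]; 1.0 = perfect ranking
```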