
Advancements in Medical Image Classification through Fine-Tuning Natural Domain Foundation Models

Mobina Mansoori, Sajjad Shahabodini, Farnoush Bayatmakou, Jamshid Abouei, Konstantinos N. Plataniotis, Arash Mohammadi

arXiv preprint · May 26, 2025
Foundation models are large-scale models, pre-trained on massive datasets, that can perform a wide range of tasks, and they have shown consistently improved results as new methods are introduced. It is crucial to analyze how these trends impact the medical field and to determine whether these advancements can drive meaningful change. This study investigates the application of recent state-of-the-art foundation models (DINOv2, MAE, VMamba, CoCa, SAM2, and AIMv2) to medical image classification. We explore their effectiveness on datasets including CBIS-DDSM for mammography, ISIC2019 for skin lesions, APTOS2019 for diabetic retinopathy, and CheXpert for chest radiographs. By fine-tuning these models and evaluating their configurations, we aim to understand the potential of these advancements in medical image classification. The results indicate that these advanced models significantly enhance classification outcomes, demonstrating robust performance despite limited labeled data. AIMv2, DINOv2, and SAM2 outperformed the others, demonstrating that progress in natural-domain training has carried over to the medical domain and improved classification outcomes. Our code is publicly available at: https://github.com/sajjad-sh33/Medical-Transfer-Learning.
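The recipe the study applies, freezing or lightly fine-tuning a natural-domain backbone and training a classification head on the medical dataset, can be sketched as follows. This is a minimal illustration using the official DINOv2 torch-hub entry with a frozen backbone and a linear head; the class count, input pipeline, and hyperparameters are placeholder assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Load a DINOv2 backbone from the official hub entry and freeze it;
# only a small linear head is trained, matching the limited-label regime.
# Full fine-tuning would instead unfreeze the backbone with a smaller LR.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

num_classes = 5  # placeholder; e.g. APTOS2019 grades retinopathy 0-4
head = nn.Linear(768, num_classes)  # ViT-B/14 embedding dim is 768

optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    # images: (B, 3, 224, 224), normalized with ImageNet statistics
    with torch.no_grad():
        feats = backbone(images)   # (B, 768) pooled CLS features
    logits = head(feats)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```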

DeepInverse: A Python package for solving imaging inverse problems with deep learning

Julián Tachella, Matthieu Terris, Samuel Hurault, Andrew Wang, Dongdong Chen, Minh-Hai Nguyen, Maxime Song, Thomas Davies, Leo Davy, Jonathan Dong, Paul Escande, Johannes Hertrich, Zhiyuan Hu, Tobías I. Liaudat, Nils Laurent, Brett Levac, Mathurin Massias, Thomas Moreau, Thibaut Modrzyk, Brayan Monroy, Sebastian Neumayer, Jérémy Scanvic, Florian Sarron, Victor Sechaud, Georg Schramm, Chao Tang, Romain Vo, Pierre Weiss

arXiv preprint · May 26, 2025
DeepInverse is an open-source PyTorch-based library for solving imaging inverse problems. The library covers all crucial steps in image reconstruction from the efficient implementation of forward operators (e.g., optics, MRI, tomography), to the definition and resolution of variational problems and the design and training of advanced neural network architectures. In this paper, we describe the main functionality of the library and discuss the main design choices.
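The library is organized around the variational view of reconstruction, minimizing a data-fidelity term plus a regularizer, min_x ||A(x) - y||^2 + lambda * R(x). As a hedged, self-contained illustration of that structure (deliberately written in plain PyTorch rather than DeepInverse's own API, for which see the repository documentation), here is a toy blur operator solved by gradient descent with a total-variation regularizer:

```python
import torch
import torch.nn.functional as F

def forward_op(x, kernel):
    # Toy forward operator A: convolution with a blur kernel
    # (standing in for optics, MRI, or tomography physics).
    return F.conv2d(x, kernel, padding=kernel.shape[-1] // 2)

def tv(x):
    # Anisotropic total-variation regularizer R(x).
    return (x[..., :, 1:] - x[..., :, :-1]).abs().sum() + \
           (x[..., 1:, :] - x[..., :-1, :]).abs().sum()

# Simulated measurement y = A(x_true) + noise
x_true = torch.rand(1, 1, 64, 64)
kernel = torch.ones(1, 1, 5, 5) / 25.0
y = forward_op(x_true, kernel) + 0.01 * torch.randn_like(x_true)

# Solve min_x ||A(x) - y||^2 + lam * TV(x) by gradient descent.
x = torch.zeros_like(y, requires_grad=True)
opt = torch.optim.Adam([x], lr=1e-2)
lam = 1e-4
for _ in range(200):
    loss = (forward_op(x, kernel) - y).pow(2).sum() + lam * tv(x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```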

Methodological Challenges in Deep Learning-Based Detection of Intracranial Aneurysms: A Scoping Review.

Joo B

PubMed paper · May 26, 2025
Artificial intelligence (AI), particularly deep learning, has demonstrated high diagnostic performance in detecting intracranial aneurysms on computed tomography angiography (CTA) and magnetic resonance angiography (MRA). However, the clinical translation of these technologies remains limited due to methodological limitations and concerns about generalizability. This scoping review comprehensively evaluates 36 studies that applied deep learning to intracranial aneurysm detection on CTA or MRA, focusing on study design, validation strategies, reporting practices, and reference standards. Key findings include inconsistent handling of ruptured and previously treated aneurysms, underreporting of coexisting brain or vascular abnormalities, limited use of external validation, and an almost complete absence of prospective study designs. Only a minority of studies employed diagnostic cohorts that reflect real-world aneurysm prevalence, and few reported all essential performance metrics, such as patient-wise and lesion-wise sensitivity, specificity, and false positives per case. These limitations suggest that current studies remain at the stage of technical validation, with high risks of bias and limited clinical applicability. To facilitate real-world implementation, future research must adopt more rigorous designs, representative and diverse validation cohorts, standardized reporting practices, and greater attention to human-AI interaction.
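For concreteness, the patient-wise metrics the review asks authors to report can be computed from per-case counts as in the sketch below; the dictionary fields are illustrative assumptions, not drawn from any reviewed study.

```python
def patient_wise_metrics(cases):
    # Each case: {"has_aneurysm": bool, "detections": int, "true_hits": int}
    # where detections counts all model findings and true_hits those
    # that match a real aneurysm.
    tp = sum(1 for c in cases if c["has_aneurysm"] and c["true_hits"] > 0)
    fn = sum(1 for c in cases if c["has_aneurysm"] and c["true_hits"] == 0)
    tn = sum(1 for c in cases if not c["has_aneurysm"] and c["detections"] == 0)
    fp = sum(1 for c in cases if not c["has_aneurysm"] and c["detections"] > 0)
    false_pos = sum(c["detections"] - c["true_hits"] for c in cases)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "fp_per_case": false_pos / len(cases),
    }
```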

AI in Orthopedic Research: A Comprehensive Review.

Misir A, Yuce A

PubMed paper · May 26, 2025
Artificial intelligence (AI) is revolutionizing orthopedic research and clinical practice by enhancing diagnostic accuracy, optimizing treatment strategies, and streamlining clinical workflows. Recent advances in deep learning have enabled the development of algorithms that detect fractures, grade osteoarthritis, and identify subtle pathologies in radiographic and magnetic resonance images with performance comparable to expert clinicians. These AI-driven systems reduce missed diagnoses and provide objective, reproducible assessments that facilitate early intervention and personalized treatment planning. Moreover, AI has made significant strides in predictive analytics by integrating diverse patient data, including gait and imaging features, to forecast surgical outcomes, implant survivorship, and rehabilitation trajectories. Emerging applications in robotics, augmented reality, digital twin technologies, and exoskeleton control promise to further transform preoperative planning and intraoperative guidance. Despite these promising developments, challenges such as data heterogeneity, algorithmic bias, the "black box" nature of many models, and issues with robust validation remain. This comprehensive review synthesizes current developments, critically examines limitations, and outlines future directions for integrating AI into musculoskeletal care.

Applications of artificial intelligence in abdominal imaging.

Gupta A, Rajamohan N, Bansal B, Chaudhri S, Chandarana H, Bagga B

PubMed paper · May 26, 2025
The rapid advancements in artificial intelligence (AI) promise to reshape abdominal imaging by offering transformative solutions to challenges in disease detection, classification, and personalized care. AI applications, particularly those leveraging deep learning and radiomics, have demonstrated remarkable accuracy in detecting a wide range of abdominal conditions, including but not limited to diffuse liver parenchymal disease, focal liver lesions, pancreatic ductal adenocarcinoma (PDAC), renal tumors, and bowel pathologies. These models excel in the automation of tasks such as segmentation, classification, and prognostication across modalities like ultrasound, CT, and MRI, often surpassing traditional diagnostic methods. Despite these advancements, widespread adoption remains limited by challenges such as data heterogeneity, lack of multicenter validation, reliance on retrospective single-center studies, and the "black box" nature of many AI models, which hinder interpretability and clinician trust. The absence of standardized imaging protocols and reference gold standards further complicates integration into clinical workflows. To address these barriers, future directions emphasize collaborative multi-center efforts to generate diverse, standardized datasets, integration of explainable AI frameworks into existing picture archiving and communication systems, and the development of automated, end-to-end pipelines capable of processing multi-source data. Targeted clinical applications, such as early detection of PDAC, more accurate segmentation of renal tumors, and improved risk stratification in liver diseases, show potential to refine diagnostic accuracy and therapeutic planning. Ethical and practical considerations, such as data privacy, regulatory compliance, and interdisciplinary collaboration, are essential for successful translation into clinical practice. AI's transformative potential in abdominal imaging lies not only in complementing radiologists but also in fostering precision medicine by enabling faster, more accurate, and patient-centered care. Overcoming current limitations through innovation and collaboration will be pivotal in realizing AI's full potential to improve patient outcomes and redefine the landscape of abdominal radiology.

MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation

Chenglong Ma, Yuanfeng Ji, Jin Ye, Zilong Li, Chenhui Wang, Junzhi Ning, Wei Li, Lihao Liu, Qiushan Guo, Tianbin Li, Junjun He, Hongming Shan

arXiv preprint · May 25, 2025
Advanced autoregressive models have reshaped multimodal AI. However, their transformative potential in medical imaging remains largely untapped due to the absence of a unified visual tokenizer: one capable of capturing fine-grained visual structures for faithful image reconstruction and realistic image synthesis, as well as rich semantics for accurate diagnosis and image interpretation. To this end, we present MedITok, the first unified tokenizer tailored for medical images, encoding both low-level structural details and high-level clinical semantics within a single latent space. To balance these competing objectives, we introduce a novel two-stage training framework: a visual representation alignment stage that cold-starts tokenizer reconstruction learning with a visual semantic constraint, followed by a textual semantic representation alignment stage that infuses detailed clinical semantics into the latent space. Trained on a meticulously collected large-scale dataset of over 30 million medical images and 2 million image-caption pairs, MedITok achieves state-of-the-art performance on more than 30 datasets across 9 imaging modalities and 4 different tasks. By providing a unified token space for autoregressive modeling, MedITok supports a wide range of tasks in clinical diagnostics and generative healthcare applications. Model and code will be made publicly available at: https://github.com/Masaaki-75/meditok.
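As a rough, hedged sketch of the two-stage recipe described above (not the authors' code; every module, dimension, and loss weight here is an illustrative assumption), stage one trains reconstruction with a visual-semantic constraint and stage two aligns the latent space to caption embeddings:

```python
import torch.nn as nn
import torch.nn.functional as F

# Illustrative stand-ins; the real MedITok architecture is in the
# authors' repository. Dimensions and loss weights are assumptions.
encoder = nn.Conv2d(1, 64, 4, stride=4)           # image -> latent tokens
decoder = nn.ConvTranspose2d(64, 1, 4, stride=4)  # latent tokens -> image
to_sem = nn.Linear(64, 512)                       # latent -> semantic space

def stage1_loss(img, teacher_feats):
    """Reconstruction plus visual-semantic alignment (the 'cold start')."""
    z = encoder(img)                     # (B, 64, H/4, W/4)
    rec = F.mse_loss(decoder(z), img)
    sem = to_sem(z.flatten(2).mean(-1))  # pooled latent -> (B, 512)
    align = 1 - F.cosine_similarity(sem, teacher_feats).mean()
    return rec + 0.5 * align             # 0.5 is an assumed weight

def stage2_loss(img, text_emb):
    """Infuse clinical semantics from paired caption embeddings (B, 512)."""
    z = encoder(img)
    sem = to_sem(z.flatten(2).mean(-1))
    return 1 - F.cosine_similarity(sem, text_emb).mean()
```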

Improving Medical Reasoning with Curriculum-Aware Reinforcement Learning

Shaohao Rui, Kaitao Chen, Weijie Ma, Xiaosong Wang

arXiv preprint · May 25, 2025
Recent advances in reinforcement learning with verifiable, rule-based rewards have greatly enhanced the reasoning capabilities and out-of-distribution generalization of VLMs/LLMs, obviating the need for manually crafted reasoning chains. Despite these promising developments in the general domain, their translation to medical imaging remains limited. Current medical reinforcement fine-tuning (RFT) methods predominantly focus on close-ended VQA, thereby restricting the model's ability to engage in world knowledge retrieval and flexible task adaptation. More critically, these methods fall short of addressing the clinical demand for open-ended, reasoning-intensive decision-making. To bridge this gap, we introduce MedCCO, the first multimodal reinforcement learning framework tailored for medical VQA that unifies close-ended and open-ended data within a curriculum-driven RFT paradigm. Specifically, MedCCO is initially fine-tuned on a diverse set of close-ended medical VQA tasks to establish domain-grounded reasoning capabilities, and is then progressively adapted to open-ended tasks to foster deeper knowledge enhancement and clinical interpretability. We validate MedCCO across eight challenging medical VQA benchmarks, spanning both close-ended and open-ended settings. Experimental results show that MedCCO consistently enhances performance and generalization, achieving an 11.4% accuracy gain across three in-domain tasks and a 5.7% improvement on five out-of-domain benchmarks. These findings highlight the promise of curriculum-guided RL in advancing robust, clinically relevant reasoning in medical multimodal language models.
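The curriculum itself reduces to a data-scheduling policy: train exclusively on close-ended VQA during a warm-up phase, then ramp up the share of open-ended items. A minimal sketch under assumed ratios (the paper's actual schedule may differ):

```python
import random

def curriculum_samples(closed, open_, steps, warmup_frac=0.4):
    """Yield training examples: close-ended only during warm-up, then a
    linearly growing share of open-ended items. Ratios are assumptions."""
    warmup = int(steps * warmup_frac)
    for step in range(steps):
        if step < warmup:
            p_open = 0.0  # close-ended grounding phase
        else:
            p_open = (step - warmup) / max(steps - warmup, 1)
        pool = open_ if random.random() < p_open else closed
        yield random.choice(pool)
```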

PolyPose: Localizing Deformable Anatomy in 3D from Sparse 2D X-ray Images using Polyrigid Transforms

Vivek Gopalakrishnan, Neel Dey, Polina Golland

arXiv preprint · May 25, 2025
Determining the 3D pose of a patient from a limited set of 2D X-ray images is a critical task in interventional settings. While preoperative volumetric imaging (e.g., CT and MRI) provides precise 3D localization and visualization of anatomical targets, these modalities cannot be acquired during procedures, where fast 2D imaging (X-ray) is used instead. To integrate volumetric guidance into intraoperative procedures, we present PolyPose, a simple and robust method for deformable 2D/3D registration. PolyPose parameterizes complex 3D deformation fields as a composition of rigid transforms, leveraging the biological constraint that individual bones do not bend in typical motion. Unlike existing methods that either assume no inter-joint movement or fail outright in this under-determined setting, our polyrigid formulation enforces anatomically plausible priors that respect the piecewise rigid nature of human movement. This approach eliminates the need for expensive deformation regularizers that require patient- and procedure-specific hyperparameter optimization. Across extensive experiments on diverse datasets from orthopedic surgery and radiotherapy, we show that this strong inductive bias enables PolyPose to successfully align the patient's preoperative volume to as few as two X-ray images, thereby providing crucial 3D guidance in challenging sparse-view and limited-angle settings where current registration methods fail.
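The polyrigid idea can be illustrated with a linear blend of rigid transforms: each point is moved by every bone's rigid transform and the results are averaged with per-point soft weights. The sketch below uses that simplest blending scheme; polyrigid registration methods often blend in the log-Euclidean domain instead, and PolyPose's exact formulation may differ.

```python
import torch

def polyrigid_warp(points, rotations, translations, weights):
    """Warp 3D points as a weighted blend of K rigid transforms.

    points:       (N, 3) coordinates
    rotations:    (K, 3, 3) rotation matrices, one per bone
    translations: (K, 3)
    weights:      (N, K) soft assignment of each point to each bone,
                  rows summing to 1 (e.g. from distance to each bone)
    """
    # Apply every rigid transform to every point: (K, N, 3)
    moved = torch.einsum("kij,nj->kni", rotations, points) \
            + translations[:, None, :]
    # Blend per point with the soft weights: (N, 3)
    return torch.einsum("nk,kni->ni", weights, moved)

# Toy usage: two "bones", identity and a unit translation along x.
pts = torch.rand(5, 3)
R = torch.eye(3).repeat(2, 1, 1)
t = torch.tensor([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
w = torch.tensor([[0.5, 0.5]]).repeat(5, 1)
warped = polyrigid_warp(pts, R, t, w)  # each point moves 0.5 along x
```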

CDPDNet: Integrating Text Guidance with Hybrid Vision Encoders for Medical Image Segmentation

Jiong Wu, Yang Xing, Boxiao Yu, Wei Shao, Kuang Gong

arXiv preprint · May 25, 2025
Most publicly available medical segmentation datasets are only partially labeled, with annotations provided for a subset of anatomical structures. When multiple datasets are combined for training, this incomplete annotation poses challenges, as it limits the model's ability to learn shared anatomical representations among datasets. Furthermore, vision-only frameworks often fail to capture complex anatomical relationships and task-specific distinctions, leading to reduced segmentation accuracy and poor generalizability to unseen datasets. In this study, we proposed a novel CLIP-DINO Prompt-Driven Segmentation Network (CDPDNet), which combined a self-supervised vision transformer with CLIP-based text embedding and introduced task-specific text prompts to tackle these challenges. Specifically, the framework was constructed upon a convolutional neural network (CNN) and incorporated DINOv2 to extract both fine-grained and global visual features, which were then fused using a multi-head cross-attention module to overcome the limited long-range modeling capability of CNNs. In addition, CLIP-derived text embeddings were projected into the visual space to help model complex relationships among organs and tumors. To further address the partial label challenge and enhance inter-task discriminative capability, a Text-based Task Prompt Generation (TTPG) module that generated task-specific prompts was designed to guide the segmentation. Extensive experiments on multiple medical imaging datasets demonstrated that CDPDNet consistently outperformed existing state-of-the-art segmentation methods. Code and pretrained model are available at: https://github.com/wujiong-hub/CDPDNet.git.
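The fusion step described, CNN features attending over DINOv2 tokens with a projected CLIP text embedding acting as a task prompt, might look roughly like the following; the dimensions, projection layers, and residual wiring are assumptions for illustration, not the published architecture.

```python
import torch
import torch.nn as nn

class CrossAttnFusion(nn.Module):
    """CNN queries attend over ViT tokens plus a projected text prompt."""
    def __init__(self, cnn_dim=256, vit_dim=768, text_dim=512, heads=8):
        super().__init__()
        self.proj_vit = nn.Linear(vit_dim, cnn_dim)   # align ViT tokens
        self.proj_text = nn.Linear(text_dim, cnn_dim) # text -> visual space
        self.attn = nn.MultiheadAttention(cnn_dim, heads, batch_first=True)

    def forward(self, cnn_tokens, vit_tokens, text_emb):
        # cnn_tokens: (B, Nc, 256); vit_tokens: (B, Nv, 768); text_emb: (B, 512)
        ctx = torch.cat([self.proj_vit(vit_tokens),
                         self.proj_text(text_emb).unsqueeze(1)], dim=1)
        fused, _ = self.attn(cnn_tokens, ctx, ctx)  # queries = CNN features
        return cnn_tokens + fused                   # residual fusion

# Toy shapes to check the wiring
m = CrossAttnFusion()
out = m(torch.rand(2, 196, 256), torch.rand(2, 256, 768), torch.rand(2, 512))
```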

MSLAU-Net: A Hybrid CNN-Transformer Network for Medical Image Segmentation

Libin Lan, Yanxin Li, Xiaojuan Liu, Juan Zhou, Jianxun Zhang, Nannan Huang, Yudong Zhang

arXiv preprint · May 24, 2025
Both CNN-based and Transformer-based methods have achieved remarkable success in medical image segmentation tasks. However, CNN-based methods struggle to effectively capture global contextual information due to the inherent limitations of convolution operations. Meanwhile, Transformer-based methods suffer from insufficient local feature modeling and face challenges related to the high computational complexity caused by the self-attention mechanism. To address these limitations, we propose a novel hybrid CNN-Transformer architecture, named MSLAU-Net, which integrates the strengths of both paradigms. The proposed MSLAU-Net incorporates two key ideas. First, it introduces Multi-Scale Linear Attention, designed to efficiently extract multi-scale features from medical images while modeling long-range dependencies with low computational complexity. Second, it adopts a top-down feature aggregation mechanism, which performs multi-level feature aggregation and restores spatial resolution using a lightweight structure. Extensive experiments conducted on benchmark datasets covering three imaging modalities demonstrate that the proposed MSLAU-Net outperforms other state-of-the-art methods on nearly all evaluation metrics, validating the superiority, effectiveness, and robustness of our approach. Our code is available at https://github.com/Monsoon49/MSLAU-Net.
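Generic linear attention, which MSLAU-Net's Multi-Scale Linear Attention presumably builds on, replaces the softmax with a positive feature map phi so that attention factorizes as phi(Q)(phi(K)^T V) and costs O(N) in sequence length rather than O(N^2). A sketch of that kernelized form (following Katharopoulos et al., 2020; the paper's multi-scale variant adds further structure):

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention in O(N): phi(Q) @ (phi(K)^T @ V).

    q, k, v: (B, N, D). Uses phi(x) = elu(x) + 1 as the positive
    feature map, the choice from Katharopoulos et al. (2020).
    """
    q = F.elu(q) + 1
    k = F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", k, v)  # (B, D, D), built in O(N*D^2)
    z = 1 / (torch.einsum("bnd,bd->bn", q, k.sum(1)) + eps)  # normalizer
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

# Cost grows linearly in sequence length, so N can be large:
out = linear_attention(torch.rand(2, 4096, 64),
                       torch.rand(2, 4096, 64),
                       torch.rand(2, 4096, 64))
```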