Latest Papers on Radiology AI. Tags: Mixed Modality, Order: Best Match, Limit: 10.

MedMambaLite: Hardware-Aware Mamba for Medical Image Classification

Romina Aalishah, Mozhgan Navardi, Tinoosh Mohsenin

•preprint•Aug 7 2025

AI-powered medical devices have driven the need for real-time, on-device inference such as biomedical image classification. Deployment of deep learning models at the edge is now used for applications such as anomaly detection and classification in medical images. However, achieving this level of performance on edge devices remains challenging due to limitations in model size and computational capacity. To address this, we present MedMambaLite, a hardware-aware Mamba-based model optimized through knowledge distillation for medical image classification. We start with a powerful MedMamba model, integrating a Mamba structure for efficient feature extraction in medical imaging. We make the model lighter and faster in training and inference by modifying and reducing the redundancies in the architecture. We then distill its knowledge into a smaller student model by reducing the embedding dimensions. The optimized model achieves 94.5% overall accuracy on 10 MedMNIST datasets. It also reduces parameters 22.8x compared to MedMamba. Deployment on an NVIDIA Jetson Orin Nano achieves 35.6 GOPS/J energy per inference. This outperforms MedMamba by 63% improvement in energy per inference.

Mixed Modality Classification Methodology In Silico Academic Lab Benchmark SOTA

Quantum annealing feature selection on light-weight medical image datasets.

Nau MA, Nutricati LA, Camino B, Warburton PA, Maier AK

•papers•Aug 7 2025

We investigate the use of quantum computing algorithms on real quantum hardware to tackle the computationally intensive task of feature selection for light-weight medical image datasets. Feature selection is often formulated as a k of n selection problem, where the complexity grows binomially with increasing k and n. Quantum computers, particularly quantum annealers, are well-suited for such problems, which may offer advantages under certain problem formulations. We present a method to solve larger feature selection instances than previously demonstrated on commercial quantum annealers. Our approach combines a linear Ising penalty mechanism with subsampling and thresholding techniques to enhance scalability. The method is tested in a toy problem where feature selection identifies pixel masks used to reconstruct small-scale medical images. We compare our approach against a range of feature selection strategies, including randomized baselines, classical supervised and unsupervised methods, combinatorial optimization via classical and quantum solvers, and learning-based feature representations. The results indicate that quantum annealing-based feature selection is effective for this simplified use case, demonstrating its potential in high-dimensional optimization tasks. However, its applicability to broader, real-world problems remains uncertain, given the current limitations of quantum computing hardware. While learned feature representations such as autoencoders achieve superior reconstruction performance, they do not offer the same level of interpretability or direct control over input feature selection as our approach.

Mixed Modality Classification Methodology In Silico Academic Lab Breakthrough

A Multimodal Deep Learning Ensemble Framework for Building a Spine Surgery Triage System.

Siavashpour M, McCabe E, Nataraj A, Pareek N, Zaiane O, Gross D

•papers•Aug 7 2025

Spinal radiology reports and physician-completed questionnaires serve as crucial resources for medical decision-making for patients experiencing low back and neck pain. However, due to the time-consuming nature of this process, individuals with severe conditions may experience a deterioration in their health before receiving professional care. In this work, we propose an ensemble framework built on top of pre-trained BERT-based models which can classify patients on their need for surgery given their different data modalities including radiology reports and questionnaires. Our results demonstrate that our approach exceeds previous studies, effectively integrating information from multiple data modalities and serving as a valuable tool to assist physicians in decision making.

Mixed Modality Triage Musculoskeletal Methodology In Silico

Structured Report Generation for Breast Cancer Imaging Based on Large Language Modeling: A Comparative Analysis of GPT-4 and DeepSeek.

Chen K, Hou X, Li X, Xu W, Yi H

•papers•Aug 7 2025

The purpose of this study is to compare the performance of GPT-4 and DeepSeek large language models in generating structured breast cancer multimodality imaging integrated reports from free-text radiology reports including mammography, ultrasound, MRI, and PET/CT. A retrospective analysis was conducted on 1358 free-text reports from 501 breast cancer patients across two institutions. The study design involved synthesizing multimodal imaging data into structured reports with three components: primary lesion characteristics, metastatic lesions, and TNM staging. Input prompts were standardized for both models, with GPT-4 using predesigned instructions and DeepSeek requiring manual input. Reports were evaluated based on physician satisfaction using a Likert scale, descriptive accuracy including lesion localization, size, SUV, and metastasis assessment, and TNM staging correctness according to NCCN guidelines. Statistical analysis included McNemar tests for binary outcomes and correlation analysis for multiclass comparisons with a significance threshold of P < .05. Physician satisfaction scores showed strong correlation between models with r-values of 0.665 and 0.558 and P-values below .001. Both models demonstrated high accuracy in data extraction and integration. The mean accuracy for primary lesion features was 91.7% for GPT-4% and 92.1% for DeepSeek, while feature synthesis accuracy was 93.4% for GPT4 and 93.9% for DeepSeek. Metastatic lesion identification showed comparable overall accuracy at 93.5% for GPT4 and 94.4% for DeepSeek. GPT-4 performed better in pleural lesion detection with 94.9% accuracy compared to 79.5% for DeepSeek, whereas DeepSeek achieved higher accuracy in mesenteric metastasis identification at 87.5% vs 43.8% for GPT4. TNM staging accuracy exceeded 92% for T-stage and 94% for M-stage, with N-stage accuracy improving beyond 90% when supplemented with physical exam data. Both GPT-4 and DeepSeek effectively generate structured breast cancer imaging reports with high accuracy in data mining, integration, and TNM staging. Integrating these models into clinical practice is expected to enhance report standardization and physician productivity.

Mixed Modality LLM Radiology Report Breast Retrospective Clinical In Silico Academic Lab GenAI

MLAgg-UNet: Advancing Medical Image Segmentation with Efficient Transformer and Mamba-Inspired Multi-Scale Sequence.

Jiang J, Lei S, Li H, Sun Y

•papers•Aug 7 2025

Transformers and state space sequence models (SSMs) have attracted interest in biomedical image segmentation for their ability to capture long-range dependency. However, traditional visual state space (VSS) methods suffer from the incompatibility of image tokens with autoregressive assumption. Although Transformer attention does not require this assumption, its high computational cost limits effective channel-wise information utilization. To overcome these limitations, we propose the Mamba-Like Aggregated UNet (MLAgg-UNet), which introduces Mamba-inspired mechanism to enrich Transformer channel representation and exploit implicit autoregressive characteristic within U-shaped architecture. For establishing dependencies among image tokens in single scale, the Mamba-Like Aggregated Attention (MLAgg) block is designed to balance representational ability and computational efficiency. Inspired by the human foveal vision system, Mamba macro-structure, and differential attention, MLAgg block can slide its focus over each image token, suppress irrelevant tokens, and simultaneously strengthen channel-wise information utilization. Moreover, leveraging causal relationships between consecutive low-level and high-level features in U-shaped architecture, we propose the Multi-Scale Mamba Module with Implicit Causality (MSMM) to optimize complementary information across scales. Embedded within skip connections, this module enhances semantic consistency between encoder and decoder features. Extensive experiments on four benchmark datasets, including AbdomenMRI, ACDC, BTCV, and EndoVis17, which cover MRI, CT, and endoscopy modalities, demonstrate that the proposed MLAgg-UNet consistently outperforms state-of-the-art CNN-based, Transformer-based, and Mamba-based methods. Specifically, it achieves improvements of at least 1.24%, 0.20%, 0.33%, and 0.39% in DSC scores on these datasets, respectively. These results highlight the model's ability to effectively capture feature correlations and integrate complementary multi-scale information, providing a robust solution for medical image segmentation. The implementation is publicly available at https://github.com/aticejiang/MLAgg-UNet.

Mixed Modality Segmentation Abdominal Methodology In Silico Academic Lab Open Code

Memory-enhanced and multi-domain learning-based deep unrolling network for medical image reconstruction.

Jiang H, Zhang Q, Hu Y, Jin Y, Liu H, Chen Z, Yumo Z, Fan W, Zheng HR, Liang D, Hu Z

•papers•Aug 7 2025

Reconstructing high-quality images from corrupted measurements remains a fundamental challenge in medical imaging. Recently, deep unrolling (DUN) methods have emerged as a promising solution, combining the interpretability of traditional iterative algorithms with the powerful representation capabilities of deep learning. However, their performance is often limited by weak information flow between iterative stages and a constrained ability to capture global features across stages-limitations that tend to worsen as the number of iterations increases.Approach: In this work, we propose a memory-enhanced and multi-domain learning-based deep unrolling network for interpretable, high-fidelity medical image reconstruction. First, a memory-enhanced module is designed to adaptively integrate historical outputs across stages, reducing information loss. Second, we introduce a cross-stage spatial-domain learning transformer (CS-SLFormer) to extract both local and non-local features within and across stages, improving reconstruction performance. Finally, a frequency-domain consistency learning (FDCL) module ensures alignment between reconstructed and ground truth images in the frequency domain, recovering fine image details.Main Results: Comprehensive experiments evaluated on three representative medical imaging modalities (PET, MRI, and CT) show that the proposed method consistently outperforms state-of-the-art (SOTA) approaches in both quantitative metrics and visual quality. Specifically, our method achieved a PSNR of 37.835 dB and an SSIM of 0.970 in 1 $\%$ dose PET reconstruction.Significance: This study expands the use of model-driven deep learning in medical imaging, demonstrating the potential of memory-enhanced deep unrolling frameworks for high-quality reconstructions.

Mixed Modality Reconstruction Methodology In Silico Academic Lab Benchmark SOTA

MoMA: A Mixture-of-Multimodal-Agents Architecture for Enhancing Clinical Prediction Modelling

Jifan Gao, Mahmudur Rahman, John Caskey, Madeline Oguss, Ann O'Rourke, Randy Brown, Anne Stey, Anoop Mayampurath, Matthew M. Churpek, Guanhua Chen, Majid Afshar

•preprint•Aug 7 2025

Multimodal electronic health record (EHR) data provide richer, complementary insights into patient health compared to single-modality data. However, effectively integrating diverse data modalities for clinical prediction modeling remains challenging due to the substantial data requirements. We introduce a novel architecture, Mixture-of-Multimodal-Agents (MoMA), designed to leverage multiple large language model (LLM) agents for clinical prediction tasks using multimodal EHR data. MoMA employs specialized LLM agents ("specialist agents") to convert non-textual modalities, such as medical images and laboratory results, into structured textual summaries. These summaries, together with clinical notes, are combined by another LLM ("aggregator agent") to generate a unified multimodal summary, which is then used by a third LLM ("predictor agent") to produce clinical predictions. Evaluating MoMA on three prediction tasks using real-world datasets with different modality combinations and prediction settings, MoMA outperforms current state-of-the-art methods, highlighting its enhanced accuracy and flexibility across various tasks.

Mixed Modality Classification Methodology In Silico Academic Lab GenAI

MM2CT: MR-to-CT translation for multi-modal image fusion with mamba

Chaohui Gong, Zhiying Wu, Zisheng Huang, Gaofeng Meng, Zhen Lei, Hongbin Liu

•preprint•Aug 7 2025

Magnetic resonance (MR)-to-computed tomography (CT) translation offers significant advantages, including the elimination of radiation exposure associated with CT scans and the mitigation of imaging artifacts caused by patient motion. The existing approaches are based on single-modality MR-to-CT translation, with limited research exploring multimodal fusion. To address this limitation, we introduce Multi-modal MR to CT (MM2CT) translation method by leveraging multimodal T1- and T2-weighted MRI data, an innovative Mamba-based framework for multi-modal medical image synthesis. Mamba effectively overcomes the limited local receptive field in CNNs and the high computational complexity issues in Transformers. MM2CT leverages this advantage to maintain long-range dependencies modeling capabilities while achieving multi-modal MR feature integration. Additionally, we incorporate a dynamic local convolution module and a dynamic enhancement module to improve MRI-to-CT synthesis. The experiments on a public pelvis dataset demonstrate that MM2CT achieves state-of-the-art performance in terms of Structural Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio (PSNR). Our code is publicly available at https://github.com/Gots-ch/MM2CT.

Mixed Modality Image Synthesis Abdominal Methodology In Silico Academic Lab Open Code Benchmark SOTA

MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-specific Contrastive Loss

Can Zhao, Pengfei Guo, Dong Yang, Yucheng Tang, Yufan He, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu

•preprint•Aug 7 2025

Medical image synthesis is an important topic for both clinical and research applications. Recently, diffusion models have become a leading approach in this area. Despite their strengths, many existing methods struggle with (1) limited generalizability that only work for specific body regions or voxel spacings, (2) slow inference, which is a common issue for diffusion models, and (3) weak alignment with input conditions, which is a critical issue for medical imaging. MAISI, a previously proposed framework, addresses generalizability issues but still suffers from slow inference and limited condition consistency. In this work, we present MAISI-v2, the first accelerated 3D medical image synthesis framework that integrates rectified flow to enable fast and high quality generation. To further enhance condition fidelity, we introduce a novel region-specific contrastive loss to enhance the sensitivity to region of interest. Our experiments show that MAISI-v2 can achieve SOTA image quality with $33 \times$ acceleration for latent diffusion model. We also conducted a downstream segmentation experiment to show that the synthetic images can be used for data augmentation. We release our code, training details, model weights, and a GUI demo to facilitate reproducibility and promote further development within the community.

Mixed Modality Image Synthesis Whole Body Methodology In Silico Academic Lab Open Code Reproducibility

Generative Artificial Intelligence in Medical Imaging: Foundations, Progress, and Clinical Translation

Xuanru Zhou, Cheng Li, Shuqiang Wang, Ye Li, Tao Tan, Hairong Zheng, Shanshan Wang

•preprint•Aug 7 2025

Generative artificial intelligence (AI) is rapidly transforming medical imaging by enabling capabilities such as data synthesis, image enhancement, modality translation, and spatiotemporal modeling. This review presents a comprehensive and forward-looking synthesis of recent advances in generative modeling including generative adversarial networks (GANs), variational autoencoders (VAEs), diffusion models, and emerging multimodal foundation architectures and evaluates their expanding roles across the clinical imaging continuum. We systematically examine how generative AI contributes to key stages of the imaging workflow, from acquisition and reconstruction to cross-modality synthesis, diagnostic support, and treatment planning. Emphasis is placed on both retrospective and prospective clinical scenarios, where generative models help address longstanding challenges such as data scarcity, standardization, and integration across modalities. To promote rigorous benchmarking and translational readiness, we propose a three-tiered evaluation framework encompassing pixel-level fidelity, feature-level realism, and task-level clinical relevance. We also identify critical obstacles to real-world deployment, including generalization under domain shift, hallucination risk, data privacy concerns, and regulatory hurdles. Finally, we explore the convergence of generative AI with large-scale foundation models, highlighting how this synergy may enable the next generation of scalable, reliable, and clinically integrated imaging systems. By charting technical progress and translational pathways, this review aims to guide future research and foster interdisciplinary collaboration at the intersection of AI, medicine, and biomedical engineering.

Mixed Modality Image Synthesis Whole Body Review Concept Academic Lab GenAI Benchmark SOTA Policy

MedMambaLite: Hardware-Aware Mamba for Medical Image Classification

Quantum annealing feature selection on light-weight medical image datasets.

A Multimodal Deep Learning Ensemble Framework for Building a Spine Surgery Triage System.

Structured Report Generation for Breast Cancer Imaging Based on Large Language Modeling: A Comparative Analysis of GPT-4 and DeepSeek.

MLAgg-UNet: Advancing Medical Image Segmentation with Efficient Transformer and Mamba-Inspired Multi-Scale Sequence.

Memory-enhanced and multi-domain learning-based deep unrolling network for medical image reconstruction.

MoMA: A Mixture-of-Multimodal-Agents Architecture for Enhancing Clinical Prediction Modelling

MM2CT: MR-to-CT translation for multi-modal image fusion with mamba

MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-specific Contrastive Loss

Generative Artificial Intelligence in Medical Imaging: Foundations, Progress, and Clinical Translation

Ready to Sharpen Your Edge?