MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-specific Contrastive Loss

Can Zhao, Pengfei Guo, Dong Yang, Yucheng Tang, Yufan He, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu

arXiv preprint, Aug 7 2025
Medical image synthesis is an important topic for both clinical and research applications. Recently, diffusion models have become a leading approach in this area. Despite their strengths, many existing methods struggle with (1) limited generalizability, working only for specific body regions or voxel spacings, (2) slow inference, which is a common issue for diffusion models, and (3) weak alignment with input conditions, which is a critical issue for medical imaging. MAISI, a previously proposed framework, addresses generalizability issues but still suffers from slow inference and limited condition consistency. In this work, we present MAISI-v2, the first accelerated 3D medical image synthesis framework that integrates rectified flow to enable fast, high-quality generation. To further enhance condition fidelity, we introduce a novel region-specific contrastive loss that increases sensitivity to the region of interest. Our experiments show that MAISI-v2 can achieve state-of-the-art (SOTA) image quality with a $33\times$ acceleration of the latent diffusion model. We also conducted a downstream segmentation experiment to show that the synthetic images can be used for data augmentation. We release our code, training details, model weights, and a GUI demo to facilitate reproducibility and promote further development within the community.
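As a rough illustration of why rectified flow can cut the number of sampling steps, the sketch below integrates a straight-line flow with plain Euler steps. It is a toy under stated assumptions, not the MAISI-v2 implementation: the velocity is a closed-form stand-in for a trained network, the "data" is a single hypothetical target point, and the names `ideal_velocity` and `euler_sample` are invented for this example.

```python
def ideal_velocity(x, t, target):
    """Velocity of the straight path x_t = (1 - t) * x0 + t * target.

    Assumption: a trained rectified-flow network would approximate this
    field; here it is written in closed form for a single target point.
    """
    return [(m - xi) / (1.0 - t) for xi, m in zip(x, target)]


def euler_sample(x0, target, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (sample)."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = ideal_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    noise = [0.3, -1.2, 0.8]    # stand-in for a Gaussian noise draw
    target = [1.0, -2.0, 0.5]   # hypothetical data point
    for steps in (1, 4, 30):
        print(steps, euler_sample(noise, target, steps))
```

Because the path in this toy is exactly straight, one Euler step reaches the target as well as thirty do (up to floating-point rounding). Learned flows are only approximately straight, so the achievable step reduction in practice depends on the model.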

Unsupervised learning for inverse problems in computed tomography

Laura Hellwege, Johann Christopher Engster, Moritz Schaar, Thorsten M. Buzug, Maik Stille

arXiv preprint, Aug 7 2025
This study presents an unsupervised deep learning approach for computed tomography (CT) image reconstruction, leveraging the inherent similarities between deep neural network training and conventional iterative reconstruction methods. By incorporating forward and backward projection layers within the deep learning framework, we demonstrate the feasibility of reconstructing images from projection data without relying on ground-truth images. Our method is evaluated on the two-dimensional 2DeteCT dataset, showcasing superior performance in terms of mean squared error (MSE) and structural similarity index (SSIM) compared to traditional filtered backprojection (FBP) and maximum likelihood (ML) reconstruction techniques. Additionally, our approach significantly reduces reconstruction time, making it a promising alternative for real-time medical imaging applications. Future work will focus on extending this methodology to three-dimensional reconstructions and enhancing the adaptability of the projection geometry.
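The link between network training and iterative reconstruction mentioned in this abstract can be sketched with a tiny data-consistency loop: a forward projection, a residual against the measured projections, and a back projection used as the gradient. This is only an illustrative sketch, not the authors' network. The 2x2 "image", the five-ray system matrix `A`, the step size, and all function names are assumptions made for the example.

```python
def forward_project(A, x):
    """Forward projection: simulate measurements A @ x from the image x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def back_project(A, r):
    """Back projection: apply the transpose A^T to a residual r."""
    return [sum(A[i][j] * r[i] for i in range(len(A)))
            for j in range(len(A[0]))]


def reconstruct(A, y, step, num_iters):
    """Gradient descent on 0.5 * ||A x - y||^2; no ground-truth image used."""
    x = [0.0] * len(A[0])
    for _ in range(num_iters):
        residual = [p - yi for p, yi in zip(forward_project(A, x), y)]
        grad = back_project(A, residual)
        x = [xi - step * g for xi, g in zip(x, grad)]
    return x


# Toy geometry for a flattened 2x2 image: two row sums, two column sums,
# and one diagonal ray (hypothetical, chosen so the image is identifiable).
A = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
]
y = [3.0, 7.0, 4.0, 6.0, 5.0]  # measured projections only

if __name__ == "__main__":
    print(reconstruct(A, y, step=0.2, num_iters=500))
```

The loss depends only on the measured projections `y`, which is what makes the setup unsupervised. In a learned version, the update rule would come from training network weights rather than from updating the image directly.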

UltimateSynth: MRI Physics for Pan-Contrast AI

Adams, R., Huynh, K. M., Zhao, W., Hu, S., Lyu, W., Ahmad, S., Ma, D., Yap, P.-T.

bioRxiv preprint, Aug 7 2025
Magnetic resonance imaging (MRI) is commonly used in healthcare for its ability to generate diverse tissue contrasts without ionizing radiation. However, this flexibility complicates downstream analysis, as computational tools are often tailored to specific types of MRI and lack generalizability across the full spectrum of scans used in healthcare. Here, we introduce a versatile framework for the development and validation of AI models that can robustly process and analyze the full spectrum of scans achievable with MRI, enabling model deployment across scanner models, scan sequences, and age groups. Core to our framework is UltimateSynth, a technology that combines tissue physiology and MR physics in synthesizing realistic images across a comprehensive range of meaningful contrasts. This pan-contrast capability bolsters the AI development life cycle through efficient data labeling, generalizable model training, and thorough performance benchmarking. We showcase the effectiveness of UltimateSynth by training an off-the-shelf U-Net to generalize anatomical segmentation across any MR contrast. The U-Net yields highly robust tissue volume estimates, with variability under 4% across 150,000 unique-contrast images, 3.8% across 2,000+ low-field 0.3T scans, and 3.5% across 8,000+ images spanning the human lifespan from ages 0 to 100.

MoMA: A Mixture-of-Multimodal-Agents Architecture for Enhancing Clinical Prediction Modelling

Jifan Gao, Mahmudur Rahman, John Caskey, Madeline Oguss, Ann O'Rourke, Randy Brown, Anne Stey, Anoop Mayampurath, Matthew M. Churpek, Guanhua Chen, Majid Afshar

arXiv preprint, Aug 7 2025
Multimodal electronic health record (EHR) data provide richer, complementary insights into patient health compared to single-modality data. However, effectively integrating diverse data modalities for clinical prediction modeling remains challenging due to the substantial data requirements. We introduce a novel architecture, Mixture-of-Multimodal-Agents (MoMA), designed to leverage multiple large language model (LLM) agents for clinical prediction tasks using multimodal EHR data. MoMA employs specialized LLM agents ("specialist agents") to convert non-textual modalities, such as medical images and laboratory results, into structured textual summaries. These summaries, together with clinical notes, are combined by another LLM ("aggregator agent") to generate a unified multimodal summary, which is then used by a third LLM ("predictor agent") to produce clinical predictions. Evaluated on three prediction tasks using real-world datasets with different modality combinations and prediction settings, MoMA outperforms current state-of-the-art methods, highlighting its enhanced accuracy and flexibility across various tasks.

MM2CT: MR-to-CT translation for multi-modal image fusion with mamba

Chaohui Gong, Zhiying Wu, Zisheng Huang, Gaofeng Meng, Zhen Lei, Hongbin Liu

arXiv preprint, Aug 7 2025
Magnetic resonance (MR)-to-computed tomography (CT) translation offers significant advantages, including the elimination of radiation exposure associated with CT scans and the mitigation of imaging artifacts caused by patient motion. Existing approaches are based on single-modality MR-to-CT translation, with limited research exploring multimodal fusion. To address this limitation, we introduce the Multi-modal MR-to-CT (MM2CT) translation method, an innovative Mamba-based framework for multi-modal medical image synthesis that leverages T1- and T2-weighted MRI data. Mamba effectively overcomes the limited local receptive field of CNNs and the high computational complexity of Transformers. MM2CT leverages this advantage to retain long-range dependency modeling while integrating multi-modal MR features. Additionally, we incorporate a dynamic local convolution module and a dynamic enhancement module to improve MRI-to-CT synthesis. Experiments on a public pelvis dataset demonstrate that MM2CT achieves state-of-the-art performance in terms of Structural Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio (PSNR). Our code is publicly available at https://github.com/Gots-ch/MM2CT.
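For readers unfamiliar with the PSNR metric cited above, here is a minimal sketch of its standard definition, 10 * log10(data_range^2 / MSE), over flattened intensity lists. It is not taken from the MM2CT code; the function name `psnr` and the `data_range` argument are choices made for this example, and real evaluations would typically use an imaging library on full volumes.

```python
import math


def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB between two equal-length signals.

    data_range is the span of valid intensities (for example 1.0 for
    images normalized to [0, 1]); it is an input, not inferred here.
    """
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(data_range ** 2 / mse)


if __name__ == "__main__":
    ct_reference = [0.0, 0.0, 0.0, 0.0]   # hypothetical normalized CT values
    ct_synthetic = [0.1, 0.1, 0.1, 0.1]   # hypothetical synthesized values
    print(psnr(ct_reference, ct_synthetic, data_range=1.0))
```

Higher PSNR means a smaller mean squared error relative to the intensity range, so it rewards voxel-wise agreement, whereas SSIM compares local structure.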

Beyond Pixels: Medical Image Quality Assessment with Implicit Neural Representations

Caner Özer, Patryk Rygiel, Bram de Wilde, İlkay Öksüz, Jelmer M. Wolterink

arXiv preprint, Aug 7 2025
Artifacts pose a significant challenge in medical imaging, impacting diagnostic accuracy and downstream analysis. While image-based approaches for detecting artifacts can be effective, they often rely on preprocessing that can cause information loss, and the high memory demand of medical images limits the scalability of classification models. In this work, we propose the use of implicit neural representations (INRs) for image quality assessment. INRs provide a compact and continuous representation of medical images, naturally handling variations in resolution and image size while reducing memory overhead. We develop deep weight space networks, graph neural networks, and relational attention transformers that operate on INRs to achieve image quality assessment. Our method is evaluated on the ACDC dataset with synthetically generated artifact patterns, demonstrating its effectiveness in assessing image quality while achieving similar performance with fewer parameters.

MedMambaLite: Hardware-Aware Mamba for Medical Image Classification

Romina Aalishah, Mozhgan Navardi, Tinoosh Mohsenin

arXiv preprint, Aug 7 2025
AI-powered medical devices have driven the need for real-time, on-device inference such as biomedical image classification. Deployment of deep learning models at the edge is now used for applications such as anomaly detection and classification in medical images. However, achieving this level of performance on edge devices remains challenging due to limitations in model size and computational capacity. To address this, we present MedMambaLite, a hardware-aware Mamba-based model optimized through knowledge distillation for medical image classification. We start with a powerful MedMamba model, integrating a Mamba structure for efficient feature extraction in medical imaging. We make the model lighter and faster in training and inference by modifying the architecture and reducing its redundancies. We then distill its knowledge into a smaller student model by reducing the embedding dimensions. The optimized model achieves 94.5% overall accuracy on 10 MedMNIST datasets and reduces the parameter count by 22.8x compared to MedMamba. Deployment on an NVIDIA Jetson Orin Nano achieves an energy efficiency of 35.6 GOPS/J, a 63% improvement in energy per inference over MedMamba.

CT-GRAPH: Hierarchical Graph Attention Network for Anatomy-Guided CT Report Generation

Hamza Kalisch, Fabian Hörst, Jens Kleesiek, Ken Herrmann, Constantin Seibold

arXiv preprint, Aug 7 2025
As medical imaging is central to diagnostic processes, automating the generation of radiology reports has become increasingly relevant to assist radiologists with their heavy workloads. Most current methods rely solely on global image features, failing to capture fine-grained organ relationships crucial for accurate reporting. To this end, we propose CT-GRAPH, a hierarchical graph attention network that explicitly models radiological knowledge by structuring anatomical regions into a graph, linking fine-grained organ features to coarser anatomical systems and a global patient context. Our method leverages pretrained 3D medical feature encoders to obtain global and organ-level features by utilizing anatomical masks. These features are further refined within the graph and then integrated into a large language model to generate detailed medical reports. We evaluate our approach for the task of report generation on the large-scale chest CT dataset CT-RATE. We provide an in-depth analysis of pretrained feature encoders for CT report generation and show that our method achieves a substantial absolute improvement of 7.9% in F1 score over current state-of-the-art methods. The code is publicly available at https://github.com/hakal104/CT-GRAPH.

RegionMed-CLIP: A Region-Aware Multimodal Contrastive Learning Pre-trained Model for Medical Image Understanding

Tianchen Fang, Guiru Liu

arXiv preprint, Aug 7 2025
Medical image understanding plays a crucial role in enabling automated diagnosis and data-driven clinical decision support. However, its progress is impeded by two primary challenges: the limited availability of high-quality annotated medical data and an overreliance on global image features, which often miss subtle but clinically significant pathological regions. To address these issues, we introduce RegionMed-CLIP, a region-aware multimodal contrastive learning framework that explicitly incorporates localized pathological signals along with holistic semantic representations. The core of our method is an innovative region-of-interest (ROI) processor that adaptively integrates fine-grained regional features with the global context, supported by a progressive training strategy that enhances hierarchical multimodal alignment. To enable large-scale region-level representation learning, we construct MedRegion-500k, a comprehensive medical image-text corpus that features extensive regional annotations and multilevel clinical descriptions. Extensive experiments on image-text retrieval, zero-shot classification, and visual question answering tasks demonstrate that RegionMed-CLIP consistently exceeds state-of-the-art vision language models by a wide margin. Our results highlight the critical importance of region-aware contrastive pre-training and position RegionMed-CLIP as a robust foundation for advancing multimodal medical image understanding.