Latest Papers on Radiology AI. Tags: Mixed Modality

Causal Representation Learning from Multimodal Clinical Records under Non-Random Modality Missingness

Zihan Liang, Ziwen Pan, Ruoxuan Xiong

•preprint•Sep 21 2025

Clinical notes contain rich patient information, such as diagnoses or medications, making them valuable for patient representation learning. Recent advances in large language models have further improved the ability to extract meaningful representations from clinical texts. However, clinical notes are often missing. For example, in our analysis of the MIMIC-IV dataset, 24.5% of patients have no available discharge summaries. In such cases, representations can be learned from other modalities such as structured data, chest X-rays, or radiology reports. Yet the availability of these modalities is influenced by clinical decision-making and varies across patients, resulting in modality missing-not-at-random (MMNAR) patterns. We propose a causal representation learning framework that leverages observed data and informative missingness in multimodal clinical records. It consists of: (1) an MMNAR-aware modality fusion component that integrates structured data, imaging, and text while conditioning on missingness patterns to capture patient health and clinician-driven assignment; (2) a modality reconstruction component with contrastive learning to ensure semantic sufficiency in representation learning; and (3) a multitask outcome prediction model with a rectifier that corrects for residual bias from specific modality observation patterns. Comprehensive evaluations across MIMIC-IV and eICU show consistent gains over the strongest baselines, achieving up to 13.8% AUC improvement for hospital readmission and 13.1% for ICU admission.

Mixed Modality Classification Chest Methodology In Silico Benchmark SOTA

MRI annotation using an inversion-based preprocessing for CT model adaptation.

Häntze H, Xu L, Rattunde MN, Donle L, Dorfner FJ, Hering A, Nawabi J, Adams LC, Bressem KK

•papers•Sep 19 2025

Annotating new classes in MRI images is time-consuming. Refining presegmented structures can accelerate this process. Many target classes lacking in MRI are supported by computed tomography (CT) models, but translating MRI to synthetic CT images is challenging. We demonstrate that CT segmentation models can create accurate MRI presegmentations, with or without image inversion. We retrospectively investigated the performance of two CT-trained models on MRI images: a general multiclass model (TotalSegmentator); and a specialized renal tumor model trained in-house. Both models were applied to 100 T1-weighted (T1w) and 100 T2-weighted fat-saturated (T2wfs) MRI sequences from 100 patients (50 male). Segmentation quality was evaluated on both raw and intensity-inverted sequences using Dice similarity coefficients (DSC), with reference annotations comprising manual kidney tumor annotations and automatically generated segmentations for 24 abdominal structures. Segmentation quality varied by MRI sequence and anatomical structure. Both models accurately segmented kidneys in T2wfs sequences without preprocessing (TotalSegmentator DSC 0.60), but TotalSegmentator failed to segment blood vessels and muscles. In T1w sequences, intensity inversion significantly improved TotalSegmentator performance, increasing the mean DSC across 24 structures from 0.04 to 0.56 (p < 0.001). Kidney tumor segmentation demonstrated poor performance in T2wfs sequences regardless of preprocessing. In T1w sequences, inversion improved tumor segmentation DSC from 0.04 to 0.42 (p < 0.001). CT-trained models can generalize to MRI when supported by image augmentation. Inversion preprocessing enabled segmentation of renal cell carcinoma in T1w MRI using a CT-trained model. CT models might be transferable to the MRI domain. CT-trained artificial intelligence models can be adapted for MRI segmentation using simple preprocessing, potentially reducing manual annotation efforts and accelerating the development of AI-assisted tools for MRI analysis in research and future clinical practice. CT segmentation models can create presegmentations for many structures in MRI scans. T1w MRI scans require an additional inversion step before segmenting with a CT model. Results were consistent for a large multiclass model (i.e., TotalSegmentator) and a smaller model for renal cell carcinoma.

Mixed Modality Segmentation Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Leveraging transfer learning from Acute Lymphoblastic Leukemia (ALL) pretraining to enhance Acute Myeloid Leukemia (AML) prediction

Duraiswamy, A., Harris-Birtill, D.

•preprint•Sep 19 2025

We overcome current limitations in Acute Myeloid Leukemia (AML) diagnosis by leveraging a transfer learning approach from Acute Lymphoblastic Leukemia (ALL) classification models, thus addressing the urgent need for more accurate and accessible AML diagnostic tools. AML has poorer prognosis than ALL, with a 5-year relative survival rate of only 17-19% compared to ALL survival rates of up to 75%, making early and accurate detection of AML paramount. Current diagnostic methods, rely heavily on manual microscopic examination, and are often subjective, time-consuming, and can suffer from inter-observer variability. While machine learning has shown promise in cancer classification, its application to AML detection, particularly leveraging the potential of transfer learning from related cancers like Acute Lymphoblastic Leukemia (ALL), remains underexplored. A comprehensive review of state-of-the-art advancements in acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) classification using deep learning algorithms is undertaken and key approaches are evaluated. The insights gained from this review inform the development of two novel machine learning pipelines designed to benchmark effectiveness of proposed transfer learning approaches. Five pre-trained models are fine-tuned using ALL training data (a novel approach in this context) to optimize their potential for AML classification. The result was the development of a best-in-class (BIC) model that surpasses current state-of-the-art (SOTA) performance in AML classification, advancing the accuracy of machine learning (ML)-driven cancer diagnostics. Author summaryAcute Myeloid Leukemia (AML) is an aggressive cancer with a poor prognosis. Early and accurate diagnosis is critical, but current methods are often subjective and time-consuming. We wanted to create a more accurate diagnostic tool by applying a technique called transfer learning from a similar cancer, Acute Lymphoblastic Leukemia (ALL). Two machine learning pipelines were developed. The first trained five different models on a large AML dataset to establish a baseline. The second pipeline first trained these models on an ALL dataset to "learn" from it before fine-tuning them on the AML data. Our experiments showed that the models that underwent transfer learning process consistently performed better than the models trained on AML data alone. The MobileNetV2 model, in particular, was the best-in-class, outperforming all other models and surpassing the best-reported metrics for AML classification in current literature. Our research demonstrates that transfer learning can enable highly accurate AML diagnostic models. The best-in-class model could potentially be used as a AML diagnostic tool, helping clinicians make faster and more accurate diagnoses, improving patient outcomes.

Mixed Modality Classification Methodology In Silico Benchmark SOTA

Toward Medical Deepfake Detection: A Comprehensive Dataset and Novel Method

Shuaibo Li, Zhaohu Xing, Hongqiu Wang, Pengfei Hao, Xingyu Li, Zekai Liu, Lei Zhu

•preprint•Sep 19 2025

The rapid advancement of generative AI in medical imaging has introduced both significant opportunities and serious challenges, especially the risk that fake medical images could undermine healthcare systems. These synthetic images pose serious risks, such as diagnostic deception, financial fraud, and misinformation. However, research on medical forensics to counter these threats remains limited, and there is a critical lack of comprehensive datasets specifically tailored for this field. Additionally, existing media forensic methods, which are primarily designed for natural or facial images, are inadequate for capturing the distinct characteristics and subtle artifacts of AI-generated medical images. To tackle these challenges, we introduce \textbf{MedForensics}, a large-scale medical forensics dataset encompassing six medical modalities and twelve state-of-the-art medical generative models. We also propose \textbf{DSKI}, a novel \textbf{D}ual-\textbf{S}tage \textbf{K}nowledge \textbf{I}nfusing detector that constructs a vision-language feature space tailored for the detection of AI-generated medical images. DSKI comprises two core components: 1) a cross-domain fine-trace adapter (CDFA) for extracting subtle forgery clues from both spatial and noise domains during training, and 2) a medical forensic retrieval module (MFRM) that boosts detection accuracy through few-shot retrieval during testing. Experimental results demonstrate that DSKI significantly outperforms both existing methods and human experts, achieving superior accuracy across multiple medical modalities.

Mixed Modality Classification Whole Body Dataset Release In Silico Academic Lab Open Dataset

AI-Driven Multimodality Fusion in Cardiac Imaging: Integrating CT, MRI, and Echocardiography for Precision.

Tran HH, Thu A, Twayana AR, Fuertes A, Gonzalez M, Basta M, James M, Mehta KA, Elias D, Figaro YM, Islek D, Frishman WH, Aronow WS

•papers•Sep 19 2025

Artificial intelligence (AI)-enabled multimodal cardiovascular imaging holds significant promise for improving diagnostic accuracy, enhancing risk stratification, and supporting clinical decision-making. However, its translation into routine practice remains limited by multiple technical, infrastructural, and clinical barriers. This review synthesizes current challenges, including variability in image quality, alignment, and acquisition protocols; scarcity of large, annotated multimodality datasets; interoperability limitations across vendors and institutions; clinical skepticism due to limited prospective validation; and substantial development and implementation costs. Drawing from recent advances, we outline future research priorities to bridge the gap between technical feasibility and clinical utility. Key strategies include developing unified, vendor-agnostic AI models resilient to inter-institutional variability; integrating diverse data types such as genomics, wearable biosensors, and longitudinal clinical records; leveraging reinforcement learning for adaptive decision-support systems; and employing longitudinal imaging fusion for disease tracking and predictive analytics. We emphasize the need for rigorous prospective clinical trials, harmonized imaging standards, and collaborative data-sharing frameworks to ensure robust, equitable, and scalable deployment. Addressing these challenges through coordinated multidisciplinary efforts will be essential to realize the full potential of AI-driven multimodal cardiovascular imaging in advancing precision cardiovascular care.

Mixed Modality Registration Cardiac Review Concept Academic Lab Benchmark SOTA

ENSAM: an efficient foundation model for interactive segmentation of 3D medical images

Elias Stenhede, Agnar Martin Bjørnstad, Arian Ranjbar

•preprint•Sep 19 2025

We present ENSAM (Equivariant, Normalized, Segment Anything Model), a lightweight and promptable model for universal 3D medical image segmentation. ENSAM combines a SegResNet-based encoder with a prompt encoder and mask decoder in a U-Net-style architecture, using latent cross-attention, relative positional encoding, normalized attention, and the Muon optimizer for training. ENSAM is designed to achieve good performance under limited data and computational budgets, and is trained from scratch on under 5,000 volumes from multiple modalities (CT, MRI, PET, ultrasound, microscopy) on a single 32 GB GPU in 6 hours. As part of the CVPR 2025 Foundation Models for Interactive 3D Biomedical Image Segmentation Challenge, ENSAM was evaluated on hidden test set with multimodal 3D medical images, obtaining a DSC AUC of 2.404, NSD AUC of 2.266, final DSC of 0.627, and final NSD of 0.597, outperforming two previously published baseline models (VISTA3D, SAM-Med3D) and matching the third (SegVol), surpassing its performance in final DSC but trailing behind in the other three metrics. In the coreset track of the challenge, ENSAM ranks 5th of 10 overall and best among the approaches not utilizing pretrained weights. Ablation studies confirm that our use of relative positional encodings and the Muon optimizer each substantially speed up convergence and improve segmentation quality.

Mixed Modality Segmentation Whole Body Retrospective Clinical In Silico Academic Lab Benchmark SOTA

HybridMamba: A Dual-domain Mamba for 3D Medical Image Segmentation

Weitong Wu, Zhaohu Xing, Jing Gong, Qin Peng, Lei Zhu

•preprint•Sep 18 2025

In the domain of 3D biomedical image segmentation, Mamba exhibits the superior performance for it addresses the limitations in modeling long-range dependencies inherent to CNNs and mitigates the abundant computational overhead associated with Transformer-based frameworks when processing high-resolution medical volumes. However, attaching undue importance to global context modeling may inadvertently compromise critical local structural information, thus leading to boundary ambiguity and regional distortion in segmentation outputs. Therefore, we propose the HybridMamba, an architecture employing dual complementary mechanisms: 1) a feature scanning strategy that progressively integrates representations both axial-traversal and local-adaptive pathways to harmonize the relationship between local and global representations, and 2) a gated module combining spatial-frequency analysis for comprehensive contextual modeling. Besides, we collect a multi-center CT dataset related to lung cancer. Experiments on MRI and CT datasets demonstrate that HybridMamba significantly outperforms the state-of-the-art methods in 3D medical image segmentation.

Mixed Modality Segmentation Chest Methodology In Silico

Deep Learning for Automated Measures of SUV and Molecular Tumor Volume in [68Ga]PSMA-11 or [18F]DCFPyL, [18F]FDG, and [177Lu]Lu-PSMA-617 Imaging with Global Threshold Regional Consensus Network.

Jackson P, Buteau JP, McIntosh L, Sun Y, Kashyap R, Casanueva S, Ravi Kumar AS, Sandhu S, Azad AA, Alipour R, Saghebi J, Kong G, Jewell K, Eifer M, Bollampally N, Hofman MS

•papers•Sep 18 2025

Metastatic castration-resistant prostate cancer has a high rate of mortality with a limited number of effective treatments after hormone therapy. Radiopharmaceutical therapy with [177Lu]Lu-prostate-specific membrane antigen-617 (LuPSMA) is one treatment option; however, response varies and is partly predicted by PSMA expression and metabolic activity, assessed on [68Ga]PSMA-11 or [18F]DCFPyL and [18F]FDG PET, respectively. Automated methods to measure these on PET imaging have previously yielded modest accuracy. Refining computational workflows and standardizing approaches may improve patient selection and prognostication for LuPSMA therapy. Methods: PET/CT and quantitative SPECT/CT images from an institutional cohort of patients staged for LuPSMA therapy were annotated for total disease burden. In total, 676 [68Ga]PSMA-11 or [18F]DCFPyL PET, 390 [18F]FDG PET, and 477 LuPSMA SPECT images were used for development of automated workflow and tested on 56 cases with externally referred PET/CT staging. A segmentation framework, the Global Threshold Regional Consensus Network, was developed based on nnU-Net, with processing refinements to improve boundary definition and overall label accuracy. Results: Using the model to contour disease extent, the mean volumetric Dice similarity coefficient for [68Ga]PSMA-11 or [18F]DCFPyL PET was 0.94, for [18F]FDG PET was 0.84, and for LuPSMA SPECT was 0.97. On external test cases, Dice accuracy was 0.95 and 0.84 on PSMA and FDG PET, respectively. The refined models yielded consistent improvements compared with nnU-Net, with an increase of 3%-5% in Dice accuracy and 10%-17% in surface agreement. Quantitative biomarkers were compared with a human-defined ground truth using the Pearson coefficient, with scores for [68Ga]PSMA-11 or [18F]DCFPyL, [18F]FDG, and LuPSMA, respectively, of 0.98, 0.94, and 0.99 for disease volume; 0.98, 0.88, and 0.99 for SUVmean; 0.96, 0.91, and 0.99 for SUVmax; and 0.97, 0.96, and 0.99 for volume intensity product. Conclusion: Delineation of disease extent and tracer avidity can be performed with a high degree of accuracy using automated deep learning methods. By incorporating threshold-based postprocessing, the tools can closely match the output of manual workflows. Pretrained models and scripts to adapt to institutional data are provided for open use.

Mixed Modality Segmentation Abdominal Methodology In Silico Academic Lab Open Code

A Compound-Eye-Inspired Multi-Scale Neural Architecture with Integrated Attention Mechanisms.

Neri F, Yang M, Xue Y

•papers•Sep 18 2025

In the context of neural system structure modeling and complex visual tasks, the effective integration of multi-scale features and contextual information is critical for enhancing model performance. This paper proposes a biologically inspired hybrid neural network architecture - CompEyeNet - which combines the global modeling capacity of transformers with the efficiency of lightweight convolutional structures. The backbone network, multi-attention transformer backbone network (MATBN), integrates multiple attention mechanisms to collaboratively model local details and long-range dependencies. The neck network, compound eye neck network (CENN), introduces high-resolution feature layers and efficient attention fusion modules to significantly enhance multi-scale information representation and reconstruction capability. CompEyeNet is evaluated on three authoritative medical image segmentation datasets: MICCAI-CVC-ClinicDB, ISIC2018, and MICCAI-tooth-segmentation, demonstrating its superior performance. Experimental results show that compared to models such as Deeplab, Unet, and the YOLO series, CompEyeNet achieves better performance with fewer parameters. Specifically, compared to the baseline model YOLOv11, CompEyeNet reduces the number of parameters by an average of 38.31%. On key performance metrics, the average Dice coefficient improves by 0.87%, the Jaccard index by 1.53%, Precision by 0.58%, and Recall by 1.11%. These findings verify the advantages of the proposed architecture in terms of parameter efficiency and accuracy, highlighting the broad application potential of bio-inspired attention-fusion hybrid neural networks in neural system modeling and image analysis.

Mixed Modality Segmentation Methodology In Silico Benchmark SOTA

ProtoMedX: Towards Explainable Multi-Modal Prototype Learning for Bone Health Classification

Alvaro Lopez Pellicer, Andre Mariucci, Plamen Angelov, Marwan Bukhari, Jemma G. Kerns

•preprint•Sep 18 2025

Bone health studies are crucial in medical practice for the early detection and treatment of Osteopenia and Osteoporosis. Clinicians usually make a diagnosis based on densitometry (DEXA scans) and patient history. The applications of AI in this field are ongoing research. Most successful methods rely on deep learning models that use vision alone (DEXA/X-ray imagery) and focus on prediction accuracy, while explainability is often disregarded and left to post hoc assessments of input contributions. We propose ProtoMedX, a multi-modal model that uses both DEXA scans of the lumbar spine and patient records. ProtoMedX's prototype-based architecture is explainable by design, which is crucial for medical applications, especially in the context of the upcoming EU AI Act, as it allows explicit analysis of model decisions, including incorrect ones. ProtoMedX demonstrates state-of-the-art performance in bone health classification while also providing explanations that can be visually understood by clinicians. Using a dataset of 4,160 real NHS patients, the proposed ProtoMedX achieves 87.58% accuracy in vision-only tasks and 89.8% in its multi-modal variant, both surpassing existing published methods.

Mixed Modality Classification Musculoskeletal Methodology In Silico GenAI

Filter Papers

Tags

Causal Representation Learning from Multimodal Clinical Records under Non-Random Modality Missingness

MRI annotation using an inversion-based preprocessing for CT model adaptation.

Leveraging transfer learning from Acute Lymphoblastic Leukemia (ALL) pretraining to enhance Acute Myeloid Leukemia (AML) prediction

Toward Medical Deepfake Detection: A Comprehensive Dataset and Novel Method

AI-Driven Multimodality Fusion in Cardiac Imaging: Integrating CT, MRI, and Echocardiography for Precision.

ENSAM: an efficient foundation model for interactive segmentation of 3D medical images

HybridMamba: A Dual-domain Mamba for 3D Medical Image Segmentation

Deep Learning for Automated Measures of SUV and Molecular Tumor Volume in [<sup>68</sup>Ga]PSMA-11 or [<sup>18</sup>F]DCFPyL, [<sup>18</sup>F]FDG, and [<sup>177</sup>Lu]Lu-PSMA-617 Imaging with Global Threshold Regional Consensus Network.

A Compound-Eye-Inspired Multi-Scale Neural Architecture with Integrated Attention Mechanisms.

ProtoMedX: Towards Explainable Multi-Modal Prototype Learning for Bone Health Classification

Ready to Sharpen Your Edge?