Mitigating Data Bias in Healthcare AI with Self-Supervised Standardization.

Lan G, Zhu Y, Xiao S, Iqbal M, Yang J

PubMed · Jul 23 2025
The rapid advancement of artificial intelligence (AI) in healthcare has accelerated innovations in medical algorithms, yet its broader adoption faces critical ethical and technical barriers. A key challenge lies in algorithmic bias stemming from heterogeneous medical data across institutions, equipment, and workflows, which may perpetuate disparities in AI-driven diagnoses and exacerbate inequities in patient care. While AI's ability to extract deep features from large-scale data offers transformative potential, its effectiveness heavily depends on standardized, high-quality datasets. Current standardization gaps not only limit model generalizability but also raise concerns about reliability and fairness in real-world clinical settings, particularly for marginalized populations. Addressing these urgent issues, this paper proposes an ethical AI framework centered on a novel self-supervised medical image standardization method. By integrating self-supervised image style conversion, channel attention mechanisms, and contrastive learning-based loss functions, our approach enhances structural and style consistency in diverse datasets while preserving patient privacy through decentralized learning paradigms. Experiments across multi-institutional medical image datasets demonstrate that our method significantly improves AI generalizability without requiring centralized data sharing. By bridging the data standardization gap, this work advances technical foundations for trustworthy AI in healthcare.
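
The abstract does not specify the contrastive loss; a common choice for enforcing consistency between an image and its style-converted counterpart is InfoNCE, where matched rows form positive pairs. A minimal PyTorch sketch under that assumption (function name and temperature are illustrative):

import torch
import torch.nn.functional as F

def info_nce_loss(feats_orig, feats_styled, temperature=0.1):
    # feats_orig, feats_styled: (N, D) embeddings of the same N images
    # before and after style conversion; row i of each is a positive pair.
    a = F.normalize(feats_orig, dim=1)
    b = F.normalize(feats_styled, dim=1)
    logits = a @ b.t() / temperature              # (N, N) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)       # pull positive pairs together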

Supervised versus unsupervised GAN for pseudo-CT synthesis in brain MR-guided radiotherapy.

Kermani MZ, Tavakoli MB, Khorasani A, Abedi I, Sadeghi V, Amouheidari A

pubmed logopapersJul 22 2025
Radiotherapy is a crucial treatment for brain tumor malignancies. To address the limitations of CT-based treatment planning, recent research has explored MR-only radiotherapy, which requires precise MR-to-CT synthesis. This study compares two deep learning approaches, supervised (Pix2Pix) and unsupervised (CycleGAN), for generating pseudo-CT (pCT) images from T1- and T2-weighted MR sequences. A total of 3270 paired T1- and T2-weighted MRI images were collected and registered with corresponding CT images. After preprocessing, a supervised pCT generative model was trained using the Pix2Pix framework, and an unsupervised generative network (CycleGAN) was trained to enable a comparative assessment of pCT quality. Differences between pCT and reference CT images were assessed with three metrics: structural similarity index (SSIM), peak signal-to-noise ratio (PSNR), and mean absolute error (MAE). Additionally, a dosimetric evaluation was performed on selected cases to assess clinical relevance. The average SSIM, PSNR, and MAE for Pix2Pix on T1 images were 0.964 ± 0.03, 32.812 ± 5.21 dB, and 79.681 ± 9.52 HU, respectively. Statistical analysis revealed that Pix2Pix significantly outperformed CycleGAN in generating high-fidelity pCT images (p < 0.05). There was no notable difference in the effectiveness of T1-weighted versus T2-weighted MR images for generating pCT (p > 0.05). Dosimetric evaluation confirmed comparable dose distributions between pCT and reference CT, supporting clinical feasibility. Both supervised and unsupervised methods generated accurate pCT images from conventional T1- and T2-weighted MR sequences. While supervised methods like Pix2Pix achieve higher accuracy, unsupervised approaches such as CycleGAN offer greater flexibility by eliminating the need for paired training data, making them suitable when paired data are unavailable.
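
For readers reproducing the evaluation, the three image-fidelity metrics can be computed with scikit-image and NumPy; the sketch below assumes 2D slices in Hounsfield units and is not the authors' evaluation code:

import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def pct_metrics(ct, pct):
    # ct, pct: 2D float arrays in HU; data_range spans the intensity window.
    data_range = float(ct.max() - ct.min())
    ssim = structural_similarity(ct, pct, data_range=data_range)
    psnr = peak_signal_noise_ratio(ct, pct, data_range=data_range)  # dB
    mae = float(np.mean(np.abs(ct - pct)))                          # HU
    return ssim, psnr, mae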

Re-identification of patients from imaging features extracted by foundation models.

Nebbia G, Kumar S, McNamara SM, Bridge C, Campbell JP, Chiang MF, Mandava N, Singh P, Kalpathy-Cramer J

PubMed · Jul 22 2025
Foundation models for medical imaging are a prominent research topic, but the risks associated with the imaging features they capture have not been explored. We aimed to assess whether imaging features from foundation models enable patient re-identification and to relate re-identification to the prediction of demographic features. Our data included colour fundus photos (CFP), optical coherence tomography (OCT) b-scans, and chest X-rays, for which we observed re-identification rates of 40.3%, 46.3%, and 25.9%, respectively. Performance on demographic feature prediction varied with re-identification status (e.g., the AUC-ROC for gender from CFP is 82.1% for re-identified images vs. 76.8% for non-re-identified ones). A deep learning model trained directly on the re-identification task reached 82.3%, 93.9%, and 63.7% at the image level on our internal CFP, OCT, and chest X-ray data. We showed that imaging features extracted from foundation models in ophthalmology and radiology contain information that can lead to patient re-identification.
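
The paper's exact matching protocol is not given in this abstract; the standard setup for measuring re-identification with fixed features is nearest-neighbour retrieval over embeddings. A hedged NumPy sketch of that generic setup:

import numpy as np

def reid_rate(query_feats, gallery_feats, query_ids, gallery_ids):
    # query_ids, gallery_ids: integer arrays of patient identifiers.
    # A query image counts as re-identified when its nearest gallery
    # embedding (by cosine similarity) belongs to the same patient.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    nearest = (q @ g.T).argmax(axis=1)            # index of best match
    return float(np.mean(gallery_ids[nearest] == query_ids))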

AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation

Nima Fathi, Amar Kumar, Tal Arbel

arXiv preprint · Jul 22 2025
Recent advancements in Large Language Models (LLMs) have catalyzed a paradigm shift from static prediction systems to agentic AI systems capable of reasoning, interacting with tools, and adapting to complex tasks. While LLM-based agentic systems have shown promise across many domains, their application to medical imaging remains in its infancy. In this work, we introduce AURA, the first visual-linguistic explainability agent designed specifically for comprehensive analysis, explanation, and evaluation of medical images. By enabling dynamic interactions, contextual explanations, and hypothesis testing, AURA represents a significant advancement toward more transparent, adaptable, and clinically aligned AI systems. We highlight the promise of agentic AI in transforming medical image analysis from static prediction to interactive decision support. Built on Qwen-32B, an LLM-based architecture, AURA integrates a modular toolbox comprising: (i) a segmentation suite with phrase grounding, pathology segmentation, and anatomy segmentation to localize clinically meaningful regions; (ii) a counterfactual image-generation module that supports reasoning through image-level explanations; and (iii) a set of evaluation tools including pixel-wise difference-map analysis, classification, and advanced state-of-the-art components to assess diagnostic relevance and visual interpretability.
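
AURA's toolbox code is not shown in the abstract; the registry-and-dispatch pattern below is a generic illustration of how an LLM agent can route tool calls to modules like those listed (all names are hypothetical):

from typing import Callable, Dict

class ToolBox:
    # Minimal tool registry: the agent's LLM emits a tool name plus
    # arguments, and the toolbox dispatches to the matching callable.
    def __init__(self):
        self._tools: Dict[str, Callable] = {}

    def register(self, name: str):
        def decorator(fn: Callable):
            self._tools[name] = fn
            return fn
        return decorator

    def call(self, name: str, **kwargs):
        return self._tools[name](**kwargs)

toolbox = ToolBox()

@toolbox.register("pathology_segmentation")
def segment_pathology(image):
    # Placeholder body: in a real agent this would invoke the
    # segmentation suite and return a mask for `image`.
    return None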

MLRU++: Multiscale Lightweight Residual UNETR++ with Attention for Efficient 3D Medical Image Segmentation

Nand Kumar Yadav, Rodrigue Rizk, William CW Chen, KC Santosh

arXiv preprint · Jul 22 2025
Accurate and efficient medical image segmentation is crucial but challenging due to anatomical variability and high computational demands on volumetric data. Recent hybrid CNN-Transformer architectures achieve state-of-the-art results but add significant complexity. In this paper, we propose MLRU++, a Multiscale Lightweight Residual UNETR++ architecture designed to balance segmentation accuracy and computational efficiency. It introduces two key innovations: a Lightweight Channel and Bottleneck Attention Module (LCBAM) that enhances contextual feature encoding with minimal overhead, and a Multiscale Bottleneck Block (M2B) in the decoder that captures fine-grained details via multi-resolution feature aggregation. Experiments on four publicly available benchmark datasets (Synapse, BTCV, ACDC, and Decathlon Lung) demonstrate that MLRU++ achieves state-of-the-art performance, with average Dice scores of 87.57% (Synapse), 93.00% (ACDC), and 81.12% (Lung). Compared to existing leading models, MLRU++ improves Dice scores by 5.38% and 2.12% on Synapse and ACDC, respectively, while significantly reducing parameter count and computational cost. Ablation studies evaluating LCBAM and M2B further confirm the effectiveness of the proposed architectural components. Results suggest that MLRU++ offers a practical and high-performing solution for 3D medical image segmentation tasks. Source code is available at: https://github.com/1027865/MLRUPP
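
The abstract does not detail LCBAM's internals; the closest widely used pattern for low-overhead channel attention is squeeze-and-excitation, sketched here for 3D volumes as an assumption, not the authors' module:

import torch
import torch.nn as nn

class ChannelAttention3D(nn.Module):
    # Squeeze-and-excitation-style gating: global pooling summarizes each
    # channel, then a small bottleneck MLP produces per-channel weights.
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),                        # (B, C, 1, 1, 1)
            nn.Conv3d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                   # gates in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (B, C, D, H, W)
        return x * self.gate(x)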

SarAdapter: Prioritizing Attention on Semantic-Aware Representative Tokens for Enhanced Medical Image Segmentation.

Jiang W, Li Y, Liu Z, An L, Quellec G, Ou C

PubMed · Jul 22 2025
Transformer-based segmentation methods exhibit considerable potential in medical image analysis. However, their improved performance often comes with increased computational complexity, limiting their application in resource-constrained medical settings. Prior methods follow two independent tracks: (i) accelerating existing networks via semantic-aware routing, and (ii) optimizing token adapter design to enhance network performance. Though direct, these approaches suffer from unavoidable drawbacks (e.g., inflexible acceleration techniques or non-discriminative processing) that limit further improvement of the quality-complexity trade-off. To address these shortcomings, we integrate the two schemes by proposing the semantic-aware adapter (SarAdapter), which employs a semantic-based routing strategy leveraging neural operators (ViT and CNN) of varying complexities. Specifically, it merges semantically similar tokens into low-resolution regions while preserving semantically distinct tokens as high-resolution regions. Additionally, we introduce a Mixed-adapter unit, which adaptively selects convolutional operators of varying complexities to better model regions at different scales. We evaluate our method on four medical datasets spanning three modalities and show that it achieves a superior balance between accuracy, model size, and efficiency. Notably, our method achieves state-of-the-art segmentation quality on the Synapse dataset while reducing the number of tokens by 65.6%, a substantial improvement in the efficiency of ViTs for the segmentation task.
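
SarAdapter's routing is not specified beyond merging semantically similar tokens; the greedy pooling below is only a rough stand-in for that idea (threshold and loop structure are illustrative, not the paper's algorithm):

import torch
import torch.nn.functional as F

def merge_similar_tokens(tokens: torch.Tensor, tau: float = 0.9):
    # tokens: (N, D). Each token joins the first group whose running
    # centroid it matches above cosine similarity tau; otherwise it
    # starts a new group. Semantically distinct tokens stay unmerged.
    groups = []                                   # each entry: [sum, count]
    for t in tokens:
        for g in groups:
            if F.cosine_similarity(t, g[0] / g[1], dim=0) > tau:
                g[0] = g[0] + t
                g[1] += 1
                break
        else:
            groups.append([t.clone(), 1])
    return torch.stack([s / n for s, n in groups])  # merged token set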

Pyramid Hierarchical Masked Diffusion Model for Imaging Synthesis

Xiaojiao Xiao, Qinmin Vivian Hu, Guanghui Wang

arXiv preprint · Jul 22 2025
Medical image synthesis plays a crucial role in clinical workflows, addressing the common problem of missing imaging modalities caused by extended scan times, scan corruption, artifacts, patient motion, and intolerance to contrast agents. This paper presents a novel image synthesis network, the Pyramid Hierarchical Masked Diffusion Model (PHMDiff), which employs a multi-scale hierarchical approach for finer control over synthesizing high-quality images across different resolutions and layers. Specifically, the model uses random multi-scale, high-proportion masks to speed up diffusion model training while balancing detail fidelity and overall structure. A Transformer-based diffusion process incorporates cross-granularity regularization, modeling mutual-information consistency across the latent spaces of each granularity and thereby enhancing pixel-level perceptual accuracy. Comprehensive experiments on two challenging datasets demonstrate that PHMDiff achieves superior Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM), highlighting its ability to produce high-quality synthesized images with excellent structural integrity. Ablation studies further confirm the contribution of each component. Furthermore, PHMDiff, a multi-scale synthesis framework that operates across and within medical imaging modalities, shows significant advantages over other methods. The source code is available at https://github.com/xiaojiao929/PHMDiff
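
The masking schedule is not detailed in the abstract; one plausible reading of "multi-scale high-proportion masks" is block masks drawn on a coarse grid and upsampled, sketched here under that assumption:

import torch
import torch.nn.functional as F

def multiscale_mask(h: int, w: int, scale: int = 8, keep_ratio: float = 0.25):
    # Draw a random binary mask on a coarse (h//scale, w//scale) grid and
    # upsample it, producing contiguous masked blocks; roughly keep_ratio
    # of pixels survive, so most of the image is masked (high proportion).
    coarse = (torch.rand(1, 1, h // scale, w // scale) < keep_ratio).float()
    return F.interpolate(coarse, size=(h, w), mode="nearest")   # (1, 1, h, w)

masks = [multiscale_mask(256, 256, scale=s) for s in (4, 8, 16)]  # three scales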

ChebMixer: Efficient Graph Representation Learning With MLP Mixer.

Kui X, Yan H, Li Q, Zhang M, Chen L, Zou B

PubMed · Jul 22 2025
Graph neural networks (GNNs) have achieved remarkable success in learning graph representations; graph Transformers in particular have recently shown superior performance on various graph mining tasks. However, a graph Transformer generally treats nodes as tokens, which incurs quadratic complexity in the number of nodes during self-attention computation. The graph multilayer perceptron (MLP) mixer addresses this challenge using the efficient MLP Mixer technique from computer vision, but its time-consuming graph-token extraction limits performance. In this article, we present ChebMixer, a graph MLP Mixer that uses fast Chebyshev-polynomial-based spectral filtering to extract a sequence of tokens. First, we produce multiscale representations of graph nodes via fast Chebyshev-polynomial-based spectral filtering. Next, we treat each node's multiscale representations as a sequence of tokens and refine the node representation with an effective MLP Mixer. Finally, we aggregate the multiscale representations of nodes through Chebyshev interpolation. Owing to the powerful representation capability and fast computation of the MLP Mixer, we can quickly extract more informative node representations to improve the performance of downstream tasks. Experimental results demonstrate significant improvements in scenarios ranging from homogeneous and heterophilic graph node classification to medical image segmentation. Compared with NAGphormer, average performance improves by 1.45% on homogeneous graphs and 4.15% on heterophilic graphs; compared with VM-UNet, average performance on medical image segmentation improves by 1.39%. We will release the source code after this article is accepted.
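
Chebyshev spectral filtering follows the standard recurrence T_0(x) = x, T_1(x) = Lx, T_k(x) = 2LT_{k-1}(x) - T_{k-2}(x) on a rescaled graph Laplacian; the sketch below shows that recurrence producing a token sequence per node (a dense Laplacian is used only for brevity):

import torch

def chebyshev_tokens(x: torch.Tensor, lap: torch.Tensor, k: int):
    # x: (N, D) node features; lap: (N, N) rescaled graph Laplacian
    # 2L/lambda_max - I, whose eigenvalues lie in [-1, 1]; k >= 1.
    # Returns (N, k+1, D): a sequence of k+1 multiscale tokens per node.
    tokens = [x, lap @ x]
    for _ in range(2, k + 1):
        tokens.append(2 * (lap @ tokens[-1]) - tokens[-2])
    return torch.stack(tokens, dim=1)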

EICSeg: Universal Medical Image Segmentation via Explicit In-Context Learning.

Xie S, Zhang L, Niu Z, Ye F, Zhong Q, Xie D, Chen YW, Lin L

PubMed · Jul 22 2025
Deep learning models for medical image segmentation often struggle with task-specific characteristics, limiting their generalization to unseen tasks with new anatomies, labels, or modalities. Retraining or fine-tuning these models requires substantial human effort and computational resources. To address this, in-context learning (ICL) has emerged as a promising paradigm, enabling query image segmentation by conditioning on example image-mask pairs provided as prompts. Unlike previous approaches that rely on implicit modeling or non-end-to-end pipelines, we redefine the core interaction mechanism in ICL as an explicit retrieval process, termed E-ICL, benefiting from the emergence of vision foundation models (VFMs). E-ICL captures dense correspondences between queries and prompts at minimal learning cost and leverages them to dynamically weight multi-class prompt masks. Built upon E-ICL, we propose EICSeg, the first end-to-end ICL framework that integrates complementary VFMs for universal medical image segmentation. Specifically, we introduce a lightweight SD-Adapter to bridge the distinct functionalities of the VFMs, enabling more accurate segmentation predictions. To fully exploit the potential of EICSeg, we further design a scalable self-prompt training strategy and an adaptive token-to-image prompt selection mechanism, facilitating both efficient training and inference. EICSeg is trained on 47 datasets covering diverse modalities and segmentation targets. Experiments on nine unseen datasets demonstrate strong few-shot generalization, with an average Dice score of 74.0%, outperforming existing in-context and few-shot methods by 4.5% and reducing the gap to task-specific models to 10.8%. Even with a single prompt, EICSeg achieves a competitive average Dice score of 60.1%. Notably, it performs automatic segmentation without manual prompt engineering, delivering results comparable to interactive models while requiring minimal labeled data. Source code will be available at https://github.com/zerone-fg/EICSeg.
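
The abstract frames E-ICL as explicit retrieval of dense correspondences that weight the prompt masks; below is a generic label-propagation sketch of that idea (temperature and shapes are illustrative, not EICSeg's implementation):

import torch
import torch.nn.functional as F

def transfer_prompt_mask(q_feats, p_feats, p_mask, temperature=0.07):
    # q_feats: (Nq, D) query patch features from a VFM; p_feats: (Np, D)
    # prompt patch features; p_mask: (Np, C) one-hot prompt mask labels.
    q = F.normalize(q_feats, dim=1)
    p = F.normalize(p_feats, dim=1)
    attn = F.softmax(q @ p.t() / temperature, dim=1)  # dense correspondences
    return attn @ p_mask                              # (Nq, C) soft query labels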