Bai J, Zhu J, Chen Z, Yang Z, Lu Y, Li L, Li Q, Wang W, Zhang H, Wang K, Gan J, Zhao J, Lu H, Li S, Huang J, Chen X, Zhang X, Xu X, Li L, Tian Y, Campello VM, Lekadir K

PubMed | Jul 22 2025
The right atrium (RA) is critical for cardiac hemodynamics but is often overlooked in clinical diagnostics. This study presents a benchmark framework for RA cavity segmentation from late gadolinium-enhanced magnetic resonance imaging (LGE-MRIs), leveraging a two-stage strategy and a novel 3D deep learning network, RASnet. The architecture addresses challenges in class imbalance and anatomical variability by incorporating multi-path input, multi-scale feature fusion modules, Vision Transformers, context interaction mechanisms, and deep supervision. Evaluated on datasets comprising 354 LGE-MRIs, RASnet achieves SOTA performance with a Dice score of 92.19% on a primary dataset and demonstrates robust generalizability on an independent dataset. The proposed framework establishes a benchmark for RA cavity segmentation, enabling accurate and efficient analysis for cardiac imaging applications. Open-source code (https://github.com/zjinw/RAS) and data (https://zenodo.org/records/15524472) are provided to facilitate further research and clinical adoption.
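For context on the headline metric: the Dice score rewards overlap between predicted and ground-truth masks. A minimal sketch with toy NumPy volumes (not the authors' evaluation code):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks (1 = RA cavity)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy 3D volumes standing in for a predicted segmentation and its ground truth.
pred = np.zeros((8, 64, 64), dtype=np.uint8); pred[2:6, 20:40, 20:40] = 1
gt = np.zeros((8, 64, 64), dtype=np.uint8); gt[2:6, 22:42, 20:40] = 1
print(f"Dice: {dice_score(pred, gt):.4f}")  # 0.9000 for this toy overlap
```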

Jiang W, Li Y, Liu Z, An L, Quellec G, Ou C

PubMed | Jul 22 2025
Transformer-based segmentation methods exhibit considerable potential in medical image analysis. However, their improved performance often comes with increased computational complexity, limiting their application in resource-constrained medical settings. Prior methods follow two independent tracks: (i) accelerating existing networks via semantic-aware routing, and (ii) optimizing token adapter design to enhance network performance. Despite their directness, both encounter unavoidable defects (e.g., inflexible acceleration techniques or non-discriminative processing) that limit further improvement of the quality-complexity trade-off. To address these shortcomings, we integrate the two schemes by proposing the semantic-aware adapter (SarAdapter), which employs a semantic-based routing strategy leveraging neural operators (ViT and CNN) of varying complexities. Specifically, it merges volumes of semantically similar tokens into low-resolution regions while preserving semantically distinct tokens as high-resolution regions. Additionally, we introduce a Mixed-adapter unit, which adaptively selects convolutional operators of varying complexities to better model regions at different scales. We evaluate our method on four medical datasets from three modalities and show that it achieves a superior balance between accuracy, model size, and efficiency. Notably, our proposed method achieves state-of-the-art segmentation quality on the Synapse dataset while reducing the number of tokens by 65.6%, signifying a substantial improvement in the efficiency of ViTs for the segmentation task.
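The core routing idea — pool semantically similar tokens, keep distinct ones at full resolution — can be sketched as a greedy merge (toy PyTorch; the `threshold` parameter is hypothetical and the actual SarAdapter routing is more involved):

```python
import torch

def merge_similar_tokens(tokens: torch.Tensor, threshold: float = 0.9) -> torch.Tensor:
    """Greedily average a token into an existing group when its cosine
    similarity to that group's running-mean centroid exceeds `threshold`;
    otherwise the token stays as its own high-resolution group."""
    groups = []  # list of (centroid, member_count)
    for t in tokens:  # tokens: (N, C)
        for i, (c, n) in enumerate(groups):
            if torch.cosine_similarity(t, c, dim=0) > threshold:
                groups[i] = ((c * n + t) / (n + 1), n + 1)  # update running mean
                break
        else:
            groups.append((t, 1))
    return torch.stack([c for c, _ in groups])  # (M <= N, C)

x = torch.randn(196, 64)  # e.g. 14x14 patch tokens with 64 channels
print(merge_similar_tokens(x).shape)
```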

Kui X, Yan H, Li Q, Zhang M, Chen L, Zou B

PubMed | Jul 22 2025
Graph neural networks (GNNs) have achieved remarkable success in learning graph representations, especially graph Transformers, which have recently shown superior performance on various graph mining tasks. However, the graph Transformer generally treats nodes as tokens, which results in quadratic complexity regarding the number of nodes during self-attention computation. The graph multilayer perceptron (MLP) mixer addresses this challenge using the efficient MLP Mixer technique from computer vision. However, the time-consuming process of extracting graph tokens limits its performance. In this article, we present ChebMixer, a novel graph MLP Mixer that uses fast Chebyshev polynomial-based spectral filtering to extract a sequence of tokens. First, we produce multiscale representations of graph nodes via fast Chebyshev polynomial-based spectral filtering. Next, we consider each node's multiscale representations as a sequence of tokens and refine the node representation with an effective MLP Mixer. Finally, we aggregate the multiscale representations of nodes through Chebyshev interpolation. Owing to the powerful representation capabilities and fast computational properties of the MLP Mixer, we can quickly extract more informative node representations to improve the performance of downstream tasks. Experimental results demonstrate significant improvements in various scenarios, ranging from homogeneous and heterophilic graph node classification to medical image segmentation. Compared with NAGphormer, the average performance improved by 1.45% on homogeneous graphs and 4.15% on heterophilic graphs. The average performance on medical image segmentation tasks also improved by 1.39% compared with VM-UNet. We will release the source code after this article is accepted.
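The token-extraction step is the part worth seeing concretely: Chebyshev polynomials of the rescaled normalized Laplacian give each node a sequence of multiscale tokens via a cheap three-term recurrence. A sketch of the standard formulation (assuming no isolated nodes; not the released implementation):

```python
import numpy as np

def chebyshev_tokens(adj: np.ndarray, X: np.ndarray, K: int = 3) -> np.ndarray:
    """Return [T_0(L~)X, ..., T_K(L~)X], where L~ is the normalized graph
    Laplacian rescaled to [-1, 1] and T_k follows the Chebyshev recurrence
    T_k = 2 L~ T_{k-1} - T_{k-2}."""
    n = len(adj)
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(1))        # assumes every node has an edge
    L = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    L_tilde = 2.0 * L / np.linalg.eigvalsh(L).max() - np.eye(n)
    tokens = [X, L_tilde @ X]
    for _ in range(2, K + 1):
        tokens.append(2.0 * L_tilde @ tokens[-1] - tokens[-2])
    return np.stack(tokens, axis=1)  # (num_nodes, K+1 tokens, feature_dim)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path graph
print(chebyshev_tokens(A, np.random.rand(3, 8)).shape)  # (3, 4, 8)
```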

Xie S, Zhang L, Niu Z, Ye F, Zhong Q, Xie D, Chen YW, Lin L

PubMed | Jul 22 2025
Deep learning models for medical image segmentation often struggle with task-specific characteristics, limiting their generalization to unseen tasks with new anatomies, labels, or modalities. Retraining or fine-tuning these models requires substantial human effort and computational resources. To address this, in-context learning (ICL) has emerged as a promising paradigm, enabling query image segmentation by conditioning on example image-mask pairs provided as prompts. Unlike previous approaches that rely on implicit modeling or non-end-to-end pipelines, we redefine the core interaction mechanism in ICL as an explicit retrieval process, termed E-ICL, benefiting from the emergence of vision foundation models (VFMs). E-ICL captures dense correspondences between queries and prompts at minimal learning cost and leverages them to dynamically weight multi-class prompt masks. Built upon E-ICL, we propose EICSeg, the first end-to-end ICL framework that integrates complementary VFMs for universal medical image segmentation. Specifically, we introduce a lightweight SD-Adapter to bridge the distinct functionalities of the VFMs, enabling more accurate segmentation predictions. To fully exploit the potential of EICSeg, we further design a scalable self-prompt training strategy and an adaptive token-to-image prompt selection mechanism, facilitating both efficient training and inference. EICSeg is trained on 47 datasets covering diverse modalities and segmentation targets. Experiments on nine unseen datasets demonstrate its strong few-shot generalization ability, achieving an average Dice score of 74.0%, outperforming existing in-context and few-shot methods by 4.5%, and reducing the gap to task-specific models to 10.8%. Even with a single prompt, EICSeg achieves a competitive average Dice score of 60.1%. Notably, it performs automatic segmentation without manual prompt engineering, delivering results comparable to interactive models while requiring minimal labeled data. Source code will be available at https://github.com/zerone-fg/EICSeg.
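The explicit-retrieval mechanism can be illustrated compactly: every query location attends over all prompt locations via feature similarity and inherits a similarity-weighted average of the prompt's per-class mask. A toy PyTorch sketch (the temperature `tau` and all shapes are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def retrieve_mask(q_feat, p_feat, p_mask, tau: float = 0.07):
    """q_feat: (Nq, C) query features; p_feat: (Np, C) prompt features;
    p_mask: (Np, K) one-hot prompt mask over K classes.
    Returns a (Nq, K) soft prediction built from dense correspondences."""
    q = F.normalize(q_feat, dim=-1)
    p = F.normalize(p_feat, dim=-1)
    attn = torch.softmax(q @ p.T / tau, dim=-1)  # (Nq, Np) correspondence weights
    return attn @ p_mask                          # weighted transfer of mask labels

q = torch.randn(1024, 256)  # 32x32 query feature map, flattened
p = torch.randn(1024, 256)  # prompt feature map from the same encoder
m = F.one_hot(torch.randint(0, 3, (1024,)), num_classes=3).float()
pred = retrieve_mask(q, p, m).argmax(-1)  # per-location class prediction
```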

Jana S, Sarkar R, Rana M, Das S, Chakraborty A, Das A, Roy Chowdhury A, Pal B, Dutta Majumdar J, Dhara S

PubMed | Jul 22 2025
This study focuses on determining the effective Young's modulus (stiffness) of various lattice structures for titanium scaffolds with and without magnesium filling. For a given patient, implant success depends on an adequate elastic modulus, which promotes proper osseointegration. The Mg-filled portion of the Ti scaffold is expected to dissolve over time as bone growth through the scaffold's porous cavities begins. The proposed method is based on a general numerical homogenization scheme to determine the effective elastic properties of the lattice scaffold at the macroscopic scale. A large numerical campaign was conducted on 18 geometries. The 3D scaffold was conceived based on the model generated from micro-CT data of the prepared sample. The effect of the scaffold's local features, e.g., the distribution of porosity, the scaffold surface area presented to the adjacent bone, and the strut diameter of the implant, on the effective elastic properties was investigated. Results show that both the relative density and the geometrical features of the scaffold strongly affect the equivalent macroscopic elastic behaviour of the lattice. Six samples were fabricated (three Mg-filled and three without Mg). A compression test was carried out for each type of sample, and the displacements obtained from the tests closely matched the simulated results from finite element analysis. Finally, a data-driven AI model was used to calculate the Ti-scaffold-to-Mg ratio required to achieve a target stiffness.
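The experimental validation reduces to classical mechanics: an effective modulus is the slope of the stress-strain response recovered from load and displacement. A worked toy example with hypothetical numbers (not the study's measurements):

```python
# Effective Young's modulus from a compression test on a scaffold sample.
force_N   = 1200.0  # applied load (hypothetical)
area_mm2  = 100.0   # sample cross-sectional area
length_mm = 20.0    # initial sample height
disp_mm   = 0.05    # measured or FE-simulated axial displacement

stress_MPa = force_N / area_mm2   # N/mm^2 is numerically MPa
strain = disp_mm / length_mm      # dimensionless
E_eff_MPa = stress_MPa / strain
print(f"Effective modulus: {E_eff_MPa:.0f} MPa")  # 4800 MPa for these inputs
```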

Kermani MZ, Tavakoli MB, Khorasani A, Abedi I, Sadeghi V, Amouheidari A

PubMed | Jul 22 2025
Radiotherapy is a crucial treatment for brain tumor malignancies. To address the limitations of CT-based treatment planning, recent research has explored MR-only radiotherapy, requiring precise MR-to-CT synthesis. This study compares two deep learning approaches, supervised (Pix2Pix) and unsupervised (CycleGAN), for generating pseudo-CT (pCT) images from T1- and T2-weighted MR sequences. 3270 paired T1- and T2-weighted MRI images were collected and registered with corresponding CT images. After preprocessing, a supervised pCT generative model was trained using the Pix2Pix framework, and an unsupervised generative network (CycleGAN) was also trained to enable a comparative assessment of pCT quality relative to the Pix2Pix model. To assess differences between pCT and reference CT images, three key metrics (SSIM, PSNR, and MAE) were used. Additionally, a dosimetric evaluation was performed on selected cases to assess clinical relevance. The average SSIM, PSNR, and MAE for Pix2Pix on T1 images were 0.964 ± 0.03, 32.812 ± 5.21, and 79.681 ± 9.52 HU, respectively. Statistical analysis revealed that Pix2Pix significantly outperformed CycleGAN in generating high-fidelity pCT images (p < 0.05). There was no notable difference in the effectiveness of T1-weighted versus T2-weighted MR images for generating pCT (p > 0.05). Dosimetric evaluation confirmed comparable dose distributions between pCT and reference CT, supporting clinical feasibility. Both supervised and unsupervised methods demonstrated the capability to generate accurate pCT images from conventional T1- and T2-weighted MR sequences. While supervised methods like Pix2Pix achieve higher accuracy, unsupervised approaches such as CycleGAN offer greater flexibility by eliminating the need for paired training data, making them suitable for applications where paired data is unavailable.
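The three reported metrics are standard and easy to reproduce; a sketch using scikit-image on stand-in arrays (assumes co-registered images in Hounsfield units):

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def pct_metrics(pct_hu: np.ndarray, ct_hu: np.ndarray):
    """SSIM, PSNR (dB), and MAE (HU) between a pseudo-CT and the reference CT."""
    data_range = ct_hu.max() - ct_hu.min()
    ssim = structural_similarity(ct_hu, pct_hu, data_range=data_range)
    psnr = peak_signal_noise_ratio(ct_hu, pct_hu, data_range=data_range)
    mae = np.abs(ct_hu - pct_hu).mean()
    return ssim, psnr, mae

ct = np.random.uniform(-1000, 2000, (256, 256))  # stand-in reference CT
pct = ct + np.random.normal(0, 40, ct.shape)     # stand-in pseudo-CT
print("SSIM %.3f | PSNR %.2f dB | MAE %.1f HU" % pct_metrics(pct, ct))
```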

Kuwada C, Mitsuya Y, Fukuda M, Yang S, Kise Y, Mori M, Naitoh M, Ariji Y, Ariji E

PubMed | Jul 22 2025
This study investigated deep learning (DL) systems for diagnosing carotid artery calcifications (CAC) on panoramic radiographs. To this end, two DL systems, one with preceding and one with simultaneous area detection functions, were developed to classify CAC on panoramic radiographs, and their person-based classification performances were compared with that of a DL model directly created using entire panoramic radiographs. A total of 580 panoramic radiographs from 290 patients (with CAC) and 290 controls (without CAC) were used to create and evaluate the DL systems. Two convolutional neural networks, GoogLeNet and YOLOv7, were utilized. The following three systems were created: (1) direct classification of entire panoramic images (System 1), (2) preceding region-of-interest (ROI) detection followed by classification (System 2), and (3) simultaneous ROI detection and classification (System 3). Person-based evaluation using the same test data was performed to compare the three systems. A side-based (left and right sides of participants) evaluation was also performed on Systems 2 and 3. Between-system differences in area under the receiver-operating characteristics curve (AUC) were assessed using DeLong's test. For the side-based evaluation, the AUCs of Systems 2 and 3 were 0.89 and 0.84, respectively, and in the person-based evaluation, Systems 2 and 3 had significantly higher AUC values of 0.86 and 0.90, respectively, compared with System 1 (P < 0.001). No significant difference was found between Systems 2 and 3. Preceding or simultaneous use of area detection improved the person-based performance of DL for classifying the presence of CAC on panoramic radiographs.
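The person-based evaluation can be sketched simply: a participant counts as positive if either side carries calcification, so side-level scores are reduced with a max before computing AUC. A toy example with hypothetical scores (DeLong's test omitted):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical side-level CAC scores: one row per participant, (left, right).
side_scores = np.array([[0.10, 0.85], [0.05, 0.12], [0.70, 0.60], [0.20, 0.15]])
person_labels = np.array([1, 0, 1, 0])  # CAC present on either side?

person_scores = side_scores.max(axis=1)  # positive if either side fires
print("Person-based AUC:", roc_auc_score(person_labels, person_scores))
```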

Yalda Zafari, Roaa Elalfy, Mohamed Mabrok, Somaya Al-Maadeed, Tamer Khattab, Essam A. Rashed

arXiv preprint | Jul 22 2025
Early and accurate interpretation of screening mammograms is essential for effective breast cancer detection, yet it remains a complex challenge due to subtle imaging findings and diagnostic ambiguity. Many existing AI approaches fall short by focusing on single-view inputs or single-task outputs, limiting their clinical utility. To address these limitations, we propose a novel multi-view, multi-task hybrid deep learning framework that processes all four standard mammography views and jointly predicts diagnostic labels and BI-RADS scores for each breast. Our architecture integrates a hybrid CNN-VSSM backbone, combining convolutional encoders for rich local feature extraction with Visual State Space Models (VSSMs) to capture global contextual dependencies. To improve robustness and interpretability, we incorporate a gated attention-based fusion module that dynamically weights information across views, effectively handling cases with missing data. We conduct extensive experiments across diagnostic tasks of varying complexity, benchmarking our proposed hybrid models against baseline CNN architectures and VSSM models in both single-task and multi-task learning settings. Across all tasks, the hybrid models consistently outperform the baselines. In the binary BI-RADS 1 vs. 5 classification task, the shared hybrid model achieves an AUC of 0.9967 and an F1 score of 0.9830. For the more challenging ternary classification, it attains an F1 score of 0.7790, while in the five-class BI-RADS task, the best F1 score reaches 0.4904. These results highlight the effectiveness of the proposed hybrid framework and underscore both the potential and limitations of multi-task learning for improving diagnostic performance and enabling clinically meaningful mammography analysis.
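A gated attention fusion over views reduces, in its simplest form, to a learned per-view gate that can also zero out missing views. A minimal PyTorch sketch (module and layer names are illustrative assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

class GatedViewFusion(nn.Module):
    """Fuse four mammography view embeddings (e.g. L-CC, L-MLO, R-CC, R-MLO)
    with sigmoid gates; absent views are masked out before normalization."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, views: torch.Tensor, present: torch.Tensor) -> torch.Tensor:
        # views: (B, 4, dim) per-view features; present: (B, 4) 0/1 availability
        g = self.gate(views).squeeze(-1) * present
        g = g / g.sum(dim=1, keepdim=True).clamp_min(1e-6)  # renormalize weights
        return (g.unsqueeze(-1) * views).sum(dim=1)         # (B, dim)

fusion = GatedViewFusion(512)
feats = torch.randn(2, 4, 512)
mask = torch.tensor([[1., 1., 1., 1.], [1., 0., 1., 1.]])  # one missing view
print(fusion(feats, mask).shape)  # torch.Size([2, 512])
```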

Qinqin Yang, Firoozeh Shomal-Zadeh, Ali Gholipour

arXiv preprint | Jul 22 2025
Modern medical imaging technologies have greatly advanced neuroscience research and clinical diagnostics. However, imaging data collected across different scanners, acquisition protocols, or imaging sites often exhibit substantial heterogeneity, known as "batch effects" or "site effects". These non-biological sources of variability can obscure true biological signals, reduce reproducibility and statistical power, and severely impair the generalizability of learning-based models across datasets. Image harmonization aims to eliminate or mitigate such site-related biases while preserving meaningful biological information, thereby improving data comparability and consistency. This review provides a comprehensive overview of key concepts, methodological advances, publicly available datasets, current challenges, and future directions in the field of medical image harmonization, with a focus on magnetic resonance imaging (MRI). We systematically cover the full imaging pipeline, and categorize harmonization approaches into prospective acquisition and reconstruction strategies, retrospective image-level and feature-level methods, and traveling-subject-based techniques. Rather than providing an exhaustive survey, we focus on representative methods, with particular emphasis on deep learning-based approaches. Finally, we summarize the major challenges that remain and outline promising avenues for future research.
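To make the retrospective feature-level category concrete: the simplest harmonizers remove per-site location and scale shifts. The sketch below z-scores within each site and restores pooled statistics; real methods such as ComBat additionally model biological covariates and apply empirical-Bayes shrinkage:

```python
import numpy as np

def remove_site_shift(features: np.ndarray, sites: np.ndarray) -> np.ndarray:
    """Align each site's feature mean/std to the pooled mean/std."""
    out = features.copy()
    pooled_mu, pooled_sd = features.mean(0), features.std(0)
    for s in np.unique(sites):
        idx = sites == s
        mu, sd = features[idx].mean(0), features[idx].std(0)
        out[idx] = (features[idx] - mu) / (sd + 1e-8) * pooled_sd + pooled_mu
    return out

X = np.concatenate([np.random.normal(0, 1, (50, 4)),    # "site 0"
                    np.random.normal(2, 3, (50, 4))])    # "site 1", shifted
sites = np.array([0] * 50 + [1] * 50)
print(remove_site_shift(X, sites)[sites == 1].mean(0))   # ~ pooled mean
```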

Nima Fathi, Amar Kumar, Tal Arbel

arXiv preprint | Jul 22 2025
Recent advancements in Large Language Models (LLMs) have catalyzed a paradigm shift from static prediction systems to agentic AI agents capable of reasoning, interacting with tools, and adapting to complex tasks. While LLM-based agentic systems have shown promise across many domains, their application to medical imaging remains in its infancy. In this work, we introduce AURA, the first visual linguistic explainability agent designed specifically for comprehensive analysis, explanation, and evaluation of medical images. By enabling dynamic interactions, contextual explanations, and hypothesis testing, AURA represents a significant advancement toward more transparent, adaptable, and clinically aligned AI systems. We highlight the promise of agentic AI in transforming medical image analysis from static predictions to interactive decision support. Leveraging Qwen-32B, an LLM-based architecture, AURA integrates a modular toolbox comprising: (i) a segmentation suite with phrase grounding, pathology segmentation, and anatomy segmentation to localize clinically meaningful regions; (ii) a counterfactual image-generation module that supports reasoning through image-level explanations; and (iii) a set of evaluation tools including pixel-wise difference-map analysis, classification, and advanced state-of-the-art components to assess diagnostic relevance and visual interpretability.
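Of the toolbox components, the difference-map analysis is the most self-contained: compare an image with its counterfactual and normalize the pixel-wise change. A toy NumPy sketch (not AURA's code):

```python
import numpy as np

def counterfactual_difference_map(image: np.ndarray, counterfactual: np.ndarray) -> np.ndarray:
    """Normalized pixel-wise absolute difference: a saliency-like map showing
    where the counterfactual edit changed the image."""
    diff = np.abs(image.astype(float) - counterfactual.astype(float))
    return diff / (diff.max() + 1e-8)

img = np.random.rand(224, 224)
cf = img.copy(); cf[80:120, 80:120] += 0.5  # simulated localized edit
heat = counterfactual_difference_map(img, cf)
print("Fraction of pixels strongly changed:", float((heat > 0.5).mean()))
```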