Latest Papers on Radiology AI. Tags: Other, Order: Best Match, Limit: 10.

MFFBi-Unet: Merging Dynamic Sparse Attention and Multi-scale Feature Fusion for Medical Image Segmentation.

Sun B, Liu C, Wang Q, Bi K, Zhang W

•papers•Jul 29 2025

The advancement of deep learning has driven extensive research validating the effectiveness of U-Net-style symmetric encoder-decoder architectures based on Transformers for medical image segmentation. However, the inherent design requiring attention mechanisms to compute token affinities across all spatial locations leads to prohibitive computational complexity and substantial memory demands. Recent efforts have attempted to address these limitations through sparse attention mechanisms. However, existing approaches employing artificial, content-agnostic sparse attention patterns demonstrate limited capability in modeling long-range dependencies effectively. We propose MFFBi-Unet, a novel architecture incorporating dynamic sparse attention through bi-level routing, enabling context-aware computation allocation with enhanced adaptability. The encoder-decoder module integrates BiFormer to optimize semantic feature extraction and facilitate high-fidelity feature map reconstruction. A novel Multi-scale Feature Fusion (MFF) module in skip connections synergistically combines multi-level contextual information with processed multi-scale features. Extensive evaluations on multiple public medical benchmarks demonstrate that our method consistently exhibits significant advantages. Notably, our method achieves statistically significant improvements, outperforming state-of-the-art approaches like MISSFormer by 2.02% and 1.28% Dice scores on respective benchmarks.

Mixed Modality Segmentation Methodology In Silico Benchmark SOTA

Evaluation of GPT-4o for multilingual translation of radiology reports across imaging modalities.

Terzis R, Salam B, Nowak S, Mueller PT, Mesropyan N, Oberlinkels L, Efferoth AF, Kravchenko D, Voigt M, Ginzburg D, Pieper CC, Hayawi M, Kuetting D, Afat S, Maintz D, Luetkens JA, Kaya K, Isaak A

•papers•Jul 29 2025

Large language models (LLMs) like GPT-4o offer multilingual and real-time translation capabilities. This study aims to evaluate GPT-4o's effectiveness in translating radiology reports into different languages. In this experimental two-center study, 100 real-world radiology reports from four imaging modalities (X-ray, ultrasound, CT, MRI) were randomly selected and fully anonymized. Reports were translated using GPT-4o with zero-shot prompting from German into four languages including English, French, Spanish, and Russian (n = 400 translations). Eight bilingual radiologists (two per language) evaluated the translations for general readability, overall quality, and utility for translators using 5-point Likert scales (ranging from 5 [best score] to 1 [worst score]). Binary questions (yes/no) were conducted to evaluate potential harmful errors, completeness, and factual correctness. The average processing time of GPT-4o for translating reports ranged from 9 to 24 s. The overall quality of translations achieved a median of 4.5 (IQR 4-5), with English (5 [4,5]), French and Spanish (each 4.5 [4,5]) significantly outperforming Russian (4 [3.5-4]; each p < 0.05). Usefulness for translators was rated highest for English (5 [5-5], p < 0.05 against other languages). Readability scores and translation completeness were significantly higher for translations into Spanish, English and French compared to Russian (each p < 0.05). Factual correctness averaged 79 %, with English (84 %) and French (83 %) outperforming Russian (69 %) (each p < 0.05). Potentially harmful errors were identified in 4 % of translations, primarily in Russian (9 %). GPT-4o demonstrated robust performance in translating radiology reports across multiple languages, with limitations observed in Russian translations.

Mixed Modality LLM Radiology Report Retrospective Clinical In Silico Academic Lab GenAI

Multi-Faceted Consistency learning with active cross-labeling for barely-supervised 3D medical image segmentation.

Wu X, Xu Z, Tong RK

•papers•Jul 29 2025

Deep learning-driven 3D medical image segmentation generally necessitates dense voxel-wise annotations, which are expensive and labor-intensive to acquire. Cross-annotation, which labels only a few orthogonal slices per scan, has recently emerged as a cost-effective alternative that better preserves the shape and precise boundaries of the 3D object than traditional weak labeling methods such as bounding boxes and scribbles. However, learning from such sparse labels, referred to as barely-supervised learning (BSL), remains challenging due to less fine-grained object perception, less compact class features and inferior generalizability. To tackle these challenges and foster collaboration between model training and human expertise, we propose a Multi-Faceted ConSistency learning (MF-ConS) framework with a Diversity and Uncertainty Sampling-based Active Learning (DUS-AL) strategy, specifically designed for the active BSL scenario. This framework combines a cross-annotation BSL strategy, where only three orthogonal slices are labeled per scan, with an AL paradigm guided by DUS to direct human-in-the-loop annotation toward the most informative volumes under a fixed budget. Built upon a teacher-student architecture, MF-ConS integrates three complementary consistency regularization modules: (i) neighbor-informed object prediction consistency for advancing fine-grained object perception by encouraging the student model to infer complete segmentation from masked inputs; (ii) prototype-driven consistency, which enhances intra-class compactness and discriminativeness by aligning latent feature and decision spaces using fused prototypes; and (iii) stability constraint that promotes model robustness against input perturbations. Extensive experiments on three benchmark datasets demonstrate that MF-ConS (DUS-AL) consistently outperforms state-of-the-art methods under extremely limited annotation.

Mixed Modality Segmentation Methodology In Silico Academic Lab Benchmark SOTA

SwinECAT: A Transformer-based fundus disease classification model with Shifted Window Attention and Efficient Channel Attention

Peiran Gu, Teng Yao, Mengshen He, Fuhao Duan, Feiyan Liu, RenYuan Peng, Bao Ge

•preprint•Jul 29 2025

In recent years, artificial intelligence has been increasingly applied in the field of medical imaging. Among these applications, fundus image analysis presents special challenges, including small lesion areas in certain fundus diseases and subtle inter-disease differences, which can lead to reduced prediction accuracy and overfitting in the models. To address these challenges, this paper proposes the Transformer-based model SwinECAT, which combines the Shifted Window (Swin) Attention with the Efficient Channel Attention (ECA) Attention. SwinECAT leverages the Swin Attention mechanism in the Swin Transformer backbone to effectively capture local spatial structures and long-range dependencies within fundus images. The lightweight ECA mechanism is incorporated to guide the SwinECAT's attention toward critical feature channels, enabling more discriminative feature representation. In contrast to previous studies that typically classify fundus images into 4 to 6 categories, this work expands fundus disease classification to 9 distinct types, thereby enhancing the granularity of diagnosis. We evaluate our method on the Eye Disease Image Dataset (EDID) containing 16,140 fundus images for 9-category classification. Experimental results demonstrate that SwinECAT achieves 88.29\% accuracy, with weighted F1-score of 0.88 and macro F1-score of 0.90. The classification results of our proposed model SwinECAT significantly outperform the baseline Swin Transformer and multiple compared baseline models. To our knowledge, this represents the highest reported performance for 9-category classification on this public dataset.

OCT Classification Methodology In Silico Academic Lab Benchmark SOTA

Deep learning aging marker from retinal images unveils sex-specific clinical and genetic signatures

Trofimova, O., Böttger, L., Bors, S., Pan, Y., Liefers, B., Beyeler, M. J., Presby, D. M., Bontempi, D., Hastings, J., Klaver, C. C. W., Bergmann, S.

•preprint•Jul 29 2025

Retinal fundus images offer a non-invasive window into systemic aging. Here, we fine-tuned a foundation model (RETFound) to predict chronological age from color fundus images in 71,343 participants from the UK Biobank, achieving a mean absolute error of 2.85 years. The resulting retinal age gap (RAG), i.e., the difference between predicted and chronological age, was associated with cardiometabolic traits, inflammation, cognitive performance, mortality, dementia, cancer, and incident cardiovascular disease. Genome-wide analyses identified genes related to longevity, metabolism, neurodegeneration, and age-related eye diseases. Sex-stratified models revealed consistent performance but divergent biological signatures: males had younger-appearing retinas and stronger links to metabolic syndrome, while in females, both model attention and genetic associations pointed to a greater involvement of retinal vasculature. Our study positions retinal aging as a biologically meaningful and sex-sensitive biomarker that can support more personalized approaches to risk assessment and aging-related healthcare.

OCT Registration Retrospective Clinical In Silico Academic Lab Benchmark SOTA

VidFuncta: Towards Generalizable Neural Representations for Ultrasound Videos

Julia Wolleb, Florentin Bieder, Paul Friedrich, Hemant D. Tagare, Xenophon Papademetris

•preprint•Jul 29 2025

Ultrasound is widely used in clinical care, yet standard deep learning methods often struggle with full video analysis due to non-standardized acquisition and operator bias. We offer a new perspective on ultrasound video analysis through implicit neural representations (INRs). We build on Functa, an INR framework in which each image is represented by a modulation vector that conditions a shared neural network. However, its extension to the temporal domain of medical videos remains unexplored. To address this gap, we propose VidFuncta, a novel framework that leverages Functa to encode variable-length ultrasound videos into compact, time-resolved representations. VidFuncta disentangles each video into a static video-specific vector and a sequence of time-dependent modulation vectors, capturing both temporal dynamics and dataset-level redundancies. Our method outperforms 2D and 3D baselines on video reconstruction and enables downstream tasks to directly operate on the learned 1D modulation vectors. We validate VidFuncta on three public ultrasound video datasets -- cardiac, lung, and breast -- and evaluate its downstream performance on ejection fraction prediction, B-line detection, and breast lesion classification. These results highlight the potential of VidFuncta as a generalizable and efficient representation framework for ultrasound videos. Our code is publicly available under https://github.com/JuliaWolleb/VidFuncta_public.

Ultrasound Classification Methodology In Silico Academic Lab Open Code GenAI

Harnessing deep learning to optimize induction chemotherapy choices in nasopharyngeal carcinoma.

Chen ZH, Han X, Lin L, Lin GY, Li B, Kou J, Wu CF, Ai XL, Zhou GQ, Gao MY, Lu LJ, Sun Y

•papers•Jul 28 2025

Currently, there is no guidance for personalized choice of induction chemotherapy (IC) regimens (TPF, docetaxel + cisplatin + 5-Fu; or GP, gemcitabine + cisplatin) for locoregionally advanced nasopharyngeal carcinoma (LA-NPC). This study aimed to develop deep learning models for IC response prediction in LA-NPC. For 1438 LA-NPC patients, pretreatment magnetic resonance imaging (MRI) scans and complete biological response (cBR) information after 3 cycles of IC were collected from two centers. All models were trained in 969 patients (TPF: 548, GP: 421), and internally validated in 243 patients (TPF: 138, GP: 105), then tested on an internal dataset of 226 patients (TPF: 125, GP: 101). MRI models for the TPF and GP cohorts were constructed to predict cBR from MRI using radiomics and graph convolutional network (GCN). The MRI-Clinical models were built based on both MRI and clinical parameters. The MRI models and MRI-Clinical models achieved high discriminative accuracy in both TPF cohorts (MRI model: AUC, 0.835; MRI-Clinical model: AUC, 0.838) and GP cohorts (MRI model: AUC, 0.764; MRI-Clinical model: AUC, 0.777). The MRI-Clinical models also showed good performance in the risk stratification. The survival curve revealed that the 3-year disease-free survival of the high-sensitivity group was better than that of the low-sensitivity group in both the TPF and GP cohorts. An online tool guiding personalized choice of IC regimen was developed based on MRI-Clinical models. Our radiomics and GCN-based IC response prediction tool has robust predictive performance and may provide guidance for personalized treatment.

MRI Classification Retrospective Clinical In Silico Academic Lab GenAI

Continual learning in medical image analysis: A comprehensive review of recent advancements and future prospects.

Kumari P, Chauhan J, Bozorgpour A, Huang B, Azad R, Merhof D

•papers•Jul 28 2025

Medical image analysis has witnessed remarkable advancements, even surpassing human-level performance in recent years, driven by the rapid development of advanced deep-learning algorithms. However, when the inference dataset slightly differs from what the model has seen during one-time training, the model performance is greatly compromised. The situation requires restarting the training process using both the old and the new data, which is computationally costly, does not align with the human learning process, and imposes storage constraints and privacy concerns. Alternatively, continual learning has emerged as a crucial approach for developing unified and sustainable deep models to deal with new classes, tasks, and the drifting nature of data in non-stationary environments for various application areas. Continual learning techniques enable models to adapt and accumulate knowledge over time, which is essential for maintaining performance on evolving datasets and novel tasks. Owing to its popularity and promising performance, it is an active and emerging research topic in the medical field and hence demands a survey and taxonomy to clarify the current research landscape of continual learning in medical image analysis. This systematic review paper provides a comprehensive overview of the state-of-the-art in continual learning techniques applied to medical image analysis. We present an extensive survey of existing research, covering topics including catastrophic forgetting, data drifts, stability, and plasticity requirements. Further, an in-depth discussion of key components of a continual learning framework, such as continual learning scenarios, techniques, evaluation schemes, and metrics, is provided. Continual learning techniques encompass various categories, including rehearsal, regularization, architectural, and hybrid strategies. We assess the popularity and applicability of continual learning categories in various medical sub-fields like radiology and histopathology. Our exploration considers unique challenges in the medical domain, including costly data annotation, temporal drift, and the crucial need for benchmarking datasets to ensure consistent model evaluation. The paper also addresses current challenges and looks ahead to potential future research directions.

Mixed Modality Classification Review Concept Benchmark SOTA Reproducibility

ToothMaker: Realistic Panoramic Dental Radiograph Generation via Disentangled Control.

Yu W, Guo X, Li W, Liu X, Chen H, Yuan Y

•papers•Jul 28 2025

Generating high-fidelity dental radiographs is essential for training diagnostic models. Despite the development of numerous methods for other medical data, generative approaches in dental radiology remain unexplored. Due to the intricate tooth structures and specialized terminology, these methods often yield ambiguous tooth regions and incorrect dental concepts when applied to dentistry. In this paper, we take the first attempt to investigate diffusion-based teeth X-ray image generation and propose ToothMaker, a novel framework specifically designed for the dental domain. Firstly, to synthesize X-ray images that possess accurate tooth structures and realistic radiological styles simultaneously, we design control-disentangled fine-tuning (CDFT) strategy. Specifically, we present two separate controllers to handle style and layout control respectively, and introduce a gradient-based decoupling method that optimizes each using their corresponding disentangled gradients. Secondly, to enhance model's understanding of dental terminology, we propose prior-disentangled guidance module (PDGM), enabling precise synthesis of dental concepts. It utilizes large language model to decompose dental terminology into a series of meta-knowledge elements and performs interactions and refinements through hypergraph neural network. These elements are then fed into the network to guide the generation of dental concepts. Extensive experiments demonstrate the high fidelity and diversity of the images synthesized by our approach. By incorporating the generated data, we achieve substantial performance improvements on downstream segmentation and visual question answering tasks, indicating that our method can greatly reduce the reliance on manually annotated data. Code will be public available at https://github.com/CUHK-AIM-Group/ToothMaker.

X-Ray Image Synthesis Methodology In Silico Academic Lab Open Code

Segmentation of the human tongue musculature using MRI: Field guide and validation in motor neuron disease.

Shaw TB, Ribeiro FL, Zhu X, Aiken P, Bollmann S, Bollmann S, Chang J, Chidley K, Dempsey-Jones H, Eftekhari Z, Gillespie J, Henderson RD, Kiernan MC, Ktena I, McCombe PA, Ngo ST, Taubert ST, Whelan BM, Ye X, Steyn FJ, Tu S, Barth M

•papers•Jul 28 2025

This work addresses the challenge of reliably measuring the muscles of the human tongue, which are difficult to quantify due to complex interwoven muscle types. We introduce a new semi-automated method, enabled by a manually curated dataset of MRI scans to accurately measure five key tongue muscles, combining AI-assisted, atlas-based, and manual segmentation approaches. The method was tested and validated in a dataset of 178 scans and included segmentation validation (n = 103) and clinical application (n = 132) in individuals with motor neuron disease. We show that people with speech and swallowing deficits tend to have smaller muscle volumes and present a normalisation strategy that removes confounding demographic factors, enabling broader application to large MRI datasets. As the tongue is generally covered in neuroimaging protocols, our multi-contrast pipeline will allow for the post-hoc analysis of a vast number of datasets. We expect this work to enable the investigation of tongue muscle morphology as a marker in a wide range of diseases that implicate tongue function, including neurodegenerative diseases and pathological speech disorders.

MRI Segmentation Retrospective Clinical In Silico

MFFBi-Unet: Merging Dynamic Sparse Attention and Multi-scale Feature Fusion for Medical Image Segmentation.

Evaluation of GPT-4o for multilingual translation of radiology reports across imaging modalities.

Multi-Faceted Consistency learning with active cross-labeling for barely-supervised 3D medical image segmentation.

SwinECAT: A Transformer-based fundus disease classification model with Shifted Window Attention and Efficient Channel Attention

Deep learning aging marker from retinal images unveils sex-specific clinical and genetic signatures

VidFuncta: Towards Generalizable Neural Representations for Ultrasound Videos

Harnessing deep learning to optimize induction chemotherapy choices in nasopharyngeal carcinoma.

Continual learning in medical image analysis: A comprehensive review of recent advancements and future prospects.

ToothMaker: Realistic Panoramic Dental Radiograph Generation via Disentangled Control.

Segmentation of the human tongue musculature using MRI: Field guide and validation in motor neuron disease.

Ready to Sharpen Your Edge?