Latest Papers on Radiology AI. Tags: Benchmark SOTA

Accurate Cobb Angle Estimation via SVD-Based Curve Detection and Vertebral Wedging Quantification

Chang Shi, Nan Meng, Yipeng Zhuang, Moxin Zhao, Jason Pui Yin Cheung, Hua Huang, Xiuyuan Chen, Cong Nie, Wenting Zhong, Guiqiang Jiang, Yuxin Wei, Jacob Hong Man Yu, Si Chen, Xiaowen Ou, Teng Zhang

•preprint•Sep 29 2025

Adolescent idiopathic scoliosis (AIS) is a common spinal deformity affecting approximately 2.2% of boys and 4.8% of girls worldwide. The Cobb angle serves as the gold standard for AIS severity assessment, yet traditional manual measurements suffer from significant observer variability, compromising diagnostic accuracy. Despite prior automation attempts, existing methods use simplified spinal models and predetermined curve patterns that fail to address clinical complexity. We present a novel deep learning framework for AIS assessment that simultaneously predicts both superior and inferior endplate angles with corresponding midpoint coordinates for each vertebra, preserving the anatomical reality of vertebral wedging in progressive AIS. Our approach combines an HRNet backbone with Swin-Transformer modules and biomechanically informed constraints for enhanced feature extraction. We employ Singular Value Decomposition (SVD) to analyze angle predictions directly from vertebral morphology, enabling flexible detection of diverse scoliosis patterns without predefined curve assumptions. Using 630 full-spine anteroposterior radiographs from patients aged 10-18 years with rigorous dual-rater annotation, our method achieved 83.45% diagnostic accuracy and 2.55° mean absolute error. The framework demonstrates exceptional generalization capability on out-of-distribution cases. Additionally, we introduce the Vertebral Wedging Index (VWI), a novel metric quantifying vertebral deformation. Longitudinal analysis revealed VWI's significant prognostic correlation with curve progression while traditional Cobb angles showed no correlation, providing robust support for early AIS detection, personalized treatment planning, and progression monitoring.

X-Ray Detection Musculoskeletal Retrospective Clinical In Silico Academic Lab Benchmark SOTA
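
For readers unfamiliar with the measurement, the sketch below shows how a Cobb angle can be read off per-vertebra endplate tilts: it is the largest tilt difference between the superior endplate of an upper vertebra and the inferior endplate of a lower one. This is the textbook definition only, not the paper's SVD-based curve detection, and `wedge_angles` is a plain per-vertebra tilt difference, not the paper's VWI (whose formula the abstract does not give). The function names and the input layout (signed tilts in degrees, ordered cranial to caudal) are assumptions made for illustration.

```python
def cobb_angle(superior_deg, inferior_deg):
    """Return (angle, upper_index, lower_index) for the largest tilt difference.

    Assumes vertebrae are ordered cranial to caudal and each tilt is the
    signed angle of an endplate against the horizontal, in degrees.
    """
    if len(superior_deg) != len(inferior_deg) or not superior_deg:
        raise ValueError("need one superior and one inferior tilt per vertebra")
    best = (0.0, 0, 0)
    for i, sup in enumerate(superior_deg):
        # The lower end vertebra must sit at or below the upper one.
        for j in range(i, len(inferior_deg)):
            angle = abs(sup - inferior_deg[j])
            if angle > best[0]:
                best = (angle, i, j)
    return best


def wedge_angles(superior_deg, inferior_deg):
    """Per-vertebra difference between superior and inferior endplate tilt."""
    return [s - i for s, i in zip(superior_deg, inferior_deg)]


# Hypothetical tilts for seven vertebrae (illustrative values, not study data).
superior = [0, 5, 10, 5, 0, -5, -10]
inferior = [2, 8, 8, 2, -3, -9, -8]
angle, upper, lower = cobb_angle(superior, inferior)
print(angle, upper, lower)
print(wedge_angles(superior, inferior))
```

Keeping separate superior and inferior tilts is what makes the wedge term non-zero; a model that assigns one tilt per vertebra cannot represent it.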

MMRQA: Signal-Enhanced Multimodal Large Language Models for MRI Quality Assessment

Fankai Jia, Daisong Gan, Zhe Zhang, Zhaochi Wen, Chenchen Dan, Dong Liang, Haifeng Wang

•preprint•Sep 29 2025

Magnetic resonance imaging (MRI) quality assessment is crucial for clinical decision-making, yet remains challenging due to data scarcity and protocol variability. Traditional approaches face fundamental trade-offs: signal-based methods like MRIQC provide quantitative metrics but lack semantic understanding, while deep learning approaches achieve high accuracy but sacrifice interpretability. To address these limitations, we introduce the Multimodal MRI Quality Assessment (MMRQA) framework, pioneering the integration of multimodal large language models (MLLMs) with acquisition-aware signal processing. MMRQA combines three key innovations: robust metric extraction via MRQy augmented with simulated artifacts, structured transformation of metrics into question-answer pairs using Qwen, and parameter-efficient fusion through Low-Rank Adaptation (LoRA) of LLaVA-OneVision. Evaluated on MR-ART, FastMRI, and MyConnectome benchmarks, MMRQA achieves state-of-the-art performance with strong zero-shot generalization, as validated by comprehensive ablation studies. By bridging quantitative analysis with semantic reasoning, our framework generates clinically interpretable outputs that enhance quality control in dynamic medical settings.

MRI Classification Methodology In Silico Academic Lab Benchmark SOTA

An Enhanced Pyramid Feature Network Based on Long-Range Dependencies for Multi-Organ Medical Image Segmentation

Dayu Tan, Cheng Kong, Yansen Su, Hai Chen, Dongliang Yang, Junfeng Xia, Chunhou Zheng

•preprint•Sep 29 2025

In the field of multi-organ medical image segmentation, recent methods frequently employ Transformers to capture long-range dependencies from image features. However, these methods overlook the high computational cost of Transformers and their deficiencies in extracting local detailed information. To address high computational costs and inadequate local detail information, we reassess the design of feature extraction modules and propose a new deep-learning network called LamFormer for fine-grained segmentation tasks across multiple organs. LamFormer is a novel U-shaped network that employs Linear Attention Mamba (LAM) in an enhanced pyramid encoder to capture multi-scale long-range dependencies. We construct the Parallel Hierarchical Feature Aggregation (PHFA) module to aggregate features from different layers of the encoder, narrowing the semantic gap among features while filtering information. Finally, we design the Reduced Transformer (RT), which utilizes a distinct computational approach to globally model up-sampled features. RT enhances the extraction of detailed local information and improves the network's capability to capture long-range dependencies. LamFormer outperforms existing segmentation methods on seven complex and diverse datasets, demonstrating exceptional performance. Moreover, the proposed network achieves a balance between model performance and model complexity.

Mixed Modality Segmentation Whole Body Methodology In Silico Benchmark SOTA

BALR-SAM: Boundary-Aware Low-Rank Adaptation of SAM for Resource-Efficient Medical Image Segmentation

Zelin Liu, Sicheng Dong, Bocheng Li, Yixuan Yang, Jiacheng Ruan, Chenxu Zhou, Suncheng Xiang

•preprint•Sep 29 2025

Vision foundation models like the Segment Anything Model (SAM), pretrained on large-scale natural image datasets, often struggle in medical image segmentation due to a lack of domain-specific adaptation. In clinical practice, fine-tuning such models efficiently for medical downstream tasks with minimal resource demands, while maintaining strong performance, is challenging. To address these issues, we propose BALR-SAM, a boundary-aware low-rank adaptation framework that enhances SAM for medical imaging. It combines three tailored components: (1) a Complementary Detail Enhancement Network (CDEN) using depthwise separable convolutions and multi-scale fusion to capture boundary-sensitive features essential for accurate segmentation; (2) low-rank adapters integrated into SAM's Vision Transformer blocks to optimize feature representation and attention for medical contexts, while simultaneously significantly reducing the parameter space; and (3) a low-rank tensor attention mechanism in the mask decoder, cutting memory usage by 75% and boosting inference speed. Experiments on standard medical segmentation datasets show that BALR-SAM, without requiring prompts, outperforms several state-of-the-art (SOTA) methods, including fully fine-tuned MedSAM, while updating just 1.8% (11.7M) of its parameters.

Mixed Modality Segmentation Methodology In Silico Academic Lab Benchmark SOTA
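
As background on why low-rank adapters update so few parameters, the following sketch implements a generic LoRA-style layer: a frozen weight matrix `W` plus a scaled product `B A` of two thin matrices. It is a minimal illustration of the general technique, not BALR-SAM's adapter design; the layer sizes, the rank, and the `alpha` scaling are assumptions, and the 768 x 768 example is not taken from the paper.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(w, a, b, x, alpha=1.0):
    """Compute W x + (alpha / r) * B (A x), where r is the number of rows of A.

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) train.
    """
    rank = len(a)
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [y + scale * u for y, u in zip(base, update)]


def lora_param_counts(d_out, d_in, rank):
    """Return (full fine-tuning parameters, adapter parameters) for one layer."""
    return d_out * d_in, rank * (d_out + d_in)


w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 1.0]]                 # rank-1 down-projection
b_zero = [[0.0], [0.0]]          # B starts at zero, so the layer is unchanged
b_trained = [[1.0], [2.0]]       # hypothetical values after training
x = [3.0, 4.0]

print(lora_forward(w, a, b_zero, x))
print(lora_forward(w, a, b_trained, x))

full, adapter = lora_param_counts(768, 768, 8)
print(full, adapter, adapter / full)
```

Initialising `B` to zero means the adapted layer reproduces the pretrained layer exactly at the start of fine-tuning, which is the usual LoRA convention.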

Accelerating Cerebral Diagnostics with BrainFusion: A Comprehensive MRI Tumor Framework

Walid Houmaidi, Youssef Sabiri, Salmane El Mansour Billah, Amine Abouaomar

•preprint•Sep 29 2025

The early and accurate classification of brain tumors is crucial for guiding effective treatment strategies and improving patient outcomes. This study presents BrainFusion, a significant advancement in brain tumor analysis using magnetic resonance imaging (MRI) by combining fine-tuned convolutional neural networks (CNNs) for tumor classification--including VGG16, ResNet50, and Xception--with YOLOv8 for precise tumor localization with bounding boxes. Leveraging the Brain Tumor MRI Dataset, our experiments reveal that the fine-tuned VGG16 model achieves test accuracy of 99.86%, substantially exceeding previous benchmarks. Beyond setting a new accuracy standard, the integration of bounding-box localization and explainable AI techniques further enhances both the clinical interpretability and trustworthiness of the system's outputs. Overall, this approach underscores the transformative potential of deep learning in delivering faster, more reliable diagnoses, ultimately contributing to improved patient care and survival rates.

MRI Classification Neurological Methodology In Silico Benchmark SOTA

EVLF-FM: Explainable Vision Language Foundation Model for Medicine

Yang Bai, Haoran Cheng, Yang Zhou, Jun Zhou, Arun Thirunavukarasu, Yuhe Ke, Jie Yao, Kanae Fukutsu, Chrystie Wan Ning Quek, Ashley Hong, Laura Gutierrez, Zhen Ling Teo, Darren Shu Jeng Ting, Brian T. Soetikno, Christopher S. Nielsen, Tobias Elze, Zengxiang Li, Linh Le Dinh, Hiok Hong Chan, Victor Koh, Marcus Tan, Kelvin Z. Li, Leonard Yip, Ching Yu Cheng, Yih Chung Tham, Gavin Siew Wei Tan, Leopold Schmetterer, Marcus Ang, Rahat Hussain, Jod Mehta, Tin Aung, Lionel Tim-Ee Cheng, Tran Nguyen Tuan Anh, Chee Leong Cheng, Tien Yin Wong, Nan Liu, Iain Beehuat Tan, Soon Thye Lim, Eyal Klang, Tony Kiat Hon Lim, Rick Siow Mong Goh, Yong Liu, Daniel Shu Wei Ting

•preprint•Sep 29 2025

Despite the promise of foundation models in medical AI, current systems remain limited: they are modality-specific and lack transparent reasoning processes, hindering clinical adoption. To address this gap, we present EVLF-FM, a multimodal vision-language foundation model (VLM) designed to unify broad diagnostic capability with fine-grained explainability. The development and testing of EVLF-FM encompassed over 1.3 million total samples from 23 global datasets across eleven imaging modalities related to six clinical specialties: dermatology, hepatology, ophthalmology, pathology, pulmonology, and radiology. External validation employed 8,884 independent test samples from 10 additional datasets across five imaging modalities. Technically, EVLF-FM is developed to assist with multiple disease diagnosis and visual question answering with pixel-level visual grounding and reasoning capabilities. In internal validation for disease diagnostics, EVLF-FM achieved the highest average accuracy (0.858) and F1-score (0.797), outperforming leading generalist and specialist models. In medical visual grounding, EVLF-FM also achieved stellar performance across nine modalities with average mIoU of 0.743 and Acc@0.5 of 0.837. External validations further confirmed strong zero-shot and few-shot performance, with competitive F1-scores despite a smaller model size. Through a hybrid training strategy combining supervised and visual reinforcement fine-tuning, EVLF-FM not only achieves state-of-the-art accuracy but also exhibits step-by-step reasoning, aligning outputs with visual evidence. EVLF-FM is an early multi-disease VLM with explainability and reasoning capabilities that could advance adoption of and trust in foundation models for real-world clinical deployment.

Mixed Modality Classification Methodology In Silico Academic Lab Breakthrough Benchmark SOTA GenAI
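
Visual grounding results of this kind are usually scored with intersection over union (IoU). The sketch below computes the mean IoU over predicted and reference boxes, plus the fraction of predictions whose IoU reaches a threshold. Two things are assumptions here: that grounding is scored on axis-aligned boxes given as `(x1, y1, x2, y2)` (the abstract does not say whether boxes or masks are used), and that the thresholded accuracy uses the common cutoff of 0.5.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def grounding_scores(predictions, targets, threshold=0.5):
    """Return (mean IoU, fraction of predictions with IoU >= threshold)."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need one prediction per target")
    ious = [box_iou(p, t) for p, t in zip(predictions, targets)]
    mean_iou = sum(ious) / len(ious)
    accuracy = sum(1 for v in ious if v >= threshold) / len(ious)
    return mean_iou, accuracy


# Hypothetical boxes: one exact match, one partial overlap.
preds = [(0, 0, 2, 2), (0, 0, 2, 2)]
refs = [(0, 0, 2, 2), (1, 1, 3, 3)]
print(grounding_scores(preds, refs))
```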

Can Machine Learning Models Based on Radiomic and Clinical Information Improve Radiologists' Diagnostic Performance for Bone Tumors? An MRMC Study.

Pan D, Yuan L, Wang S, Zeng H, Liang T, Ruan C, Ao L, Li X, Chen W

•papers•Sep 29 2025

This study explores whether machine learning models of bone tumors can improve the diagnostic performance of imaging physicians. Radiographic and clinical data were collected retrospectively from bone tumor patients to construct multiple machine learning models. Area under the curve (AUC) values were used as the primary assessment metric to select auxiliary models for this study. Seven readers were selected based on pre-experiment results from the multireader multicase (MRMC) study. Two reading experiments were conducted using an independent test set to validate the value of interpretable models as clinician aids. We used the Obuchowski-Rockette method to compare differences in physician categorization. The extreme gradient boosting (XGBoost) model based on clinical information and radiomics features performed best for classification with an AUC value of 0.905 (95% CI: 0.841, 0.949). The interpretable algorithm suggested that gray level co-occurrence matrix (GLCM) features provided the most crucial predictive information for the classification model. The AUC was significantly higher for senior physicians (with 7-11 years of experience) than for junior physicians (with 2-5 years of experience) in reading musculoskeletal radiographs (0.929-0.956 vs. 0.812-0.906). The mean AUC value of the independent reading by the seven physicians was 0.904, and the mean AUC value of the model-assisted reading result was improved by 0.037 (95% CI: -0.074, -0.001), which was statistically significant (P=0.047). The machine learning model based on the radiomics features and clinical information of knee X-ray images can effectively assist clinicians in completing the preoperative diagnosis of benign and malignant bone tumors.

X-Ray Classification Musculoskeletal Retrospective Clinical In Silico Academic Lab Benchmark SOTA
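
Because AUC is the primary metric in this entry, a small reminder of what it measures may help: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes this directly from pairwise comparisons. It is a generic single-reader illustration, not the Obuchowski-Rockette multireader analysis used in the study, and the scores are made up.

```python
def auc(pos_scores, neg_scores):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical malignancy scores (illustrative, not study data).
malignant = [0.9, 0.8, 0.4]
benign = [0.7, 0.3, 0.2]
print(auc(malignant, benign))
```

The pairwise form is quadratic in the number of cases, which is fine for a sketch; rank-based implementations give the same value faster.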

Evaluation of a commercial deep-learning-based contouring software for CT-based gynecological brachytherapy.

Yang HJ, Patrick J, Vickress J, D'Souza D, Velker V, Mendez L, Starling MM, Fenster A, Hoover D

•papers•Sep 29 2025

This study evaluates a commercial deep-learning-based auto-contouring software specifically trained for high-dose-rate gynecological brachytherapy. We collected CT images from 30 patients treated with gynecological brachytherapy (19.5-28 Gy in 3-4 fractions) at our institution from January 2018 to December 2022. Clinical and artificial intelligence (AI) generated contours for bladder, bowel, rectum, and sigmoid were obtained. Five patients were randomly selected from the test set and manually re-contoured by 4 radiation oncologists. Contouring was repeated 2 weeks later using AI contours as the starting point ("AI-assisted" approach). Comparisons amongst clinical, AI, AI-assisted, and manual retrospective contours were made using various metrics, including Dice similarity coefficient (DSC) and unsigned D2cc difference. Between clinical and AI contours, DSC was 0.92, 0.79, 0.62, and 0.66 for bladder, rectum, sigmoid, and bowel, respectively. Rectum and sigmoid had the lowest median unsigned D2cc difference of 0.20 and 0.21 Gy/fraction respectively between clinical and AI contours, while bowel had the largest median difference of 0.38 Gy/fraction. Agreement between fully automated AI and clinical contours was generally not different compared to agreement between AI-assisted and clinical contours. AI-assisted interobserver agreement was better than manual interobserver agreement for all organs and metrics. The median time to contour all organs for manual and AI-assisted approaches was 14.8 and 6.9 minutes/patient (p < 0.001), respectively. The agreement between AI or AI-assisted contours against the clinical contours was similar to manual interobserver agreement. Implementation of the AI-assisted contouring approach could enhance clinical workflow by decreasing both contouring time and interobserver variability.

CT Segmentation Abdominal Retrospective Clinical Post Market Startup Benchmark SOTA
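
The Dice similarity coefficient reported in this entry is twice the overlap of two contours divided by the sum of their sizes. The sketch below computes it for two binary masks stored as flat lists of 0/1 voxels. That layout is an assumption made to keep the example free of dependencies, and returning 1.0 when both masks are empty is a convention chosen here, not something stated in the abstract.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient of two equally sized binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * overlap / (size_a + size_b)


# Hypothetical six-voxel masks: three voxels each, two of them shared.
clinical = [1, 1, 1, 0, 0, 0]
automatic = [0, 1, 1, 1, 0, 0]
print(dice(clinical, automatic))
```

Small or thin structures lose proportionally more from a one-voxel boundary shift, which is one generic reason Dice scores tend to be lower for such organs than for large, compact ones.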

Towards population scale testis volume segmentation in DIXON MRI.

Ernsting J, Beeken PN, Ogoniak L, Kockwelp J, Roll W, Hahn T, Busch AS, Risse B

•papers•Sep 29 2025

Testis size is known to be one of the main predictors of male fertility, usually assessed in clinical workup via palpation or imaging. Despite its potential, population-level evaluation of testicular volume using imaging remains underexplored. Previous studies, limited by small and biased datasets, have demonstrated the feasibility of machine learning for testis volume segmentation. This paper presents an evaluation of segmentation methods for testicular volume using Magnetic Resonance Imaging data from the UK Biobank. The best model achieves a median Dice score of 0.89, compared to a median Dice score of 0.85 for human interrater reliability on the same dataset, enabling large-scale annotation on a population scale for the first time. Our overall aim is to provide a trained model, comparative baseline methods, and annotated training data to enhance accessibility and reproducibility in testis MRI segmentation research.

MRI Segmentation Abdominal Methodology In Silico Academic Lab Benchmark SOTA Open Dataset Reproducibility

Predictive Value of MRI Radiomics for the Efficacy of High-Intensity Focused Ultrasound (HIFU) Ablation in Uterine Fibroids: A Systematic Review and Meta-Analysis.

Salimi M, Abdolizadeh A, Fayedeh F, Vadipour P

•papers•Sep 29 2025

High-Intensity Focused Ultrasound (HIFU) ablation has emerged as a non-invasive treatment option for uterine fibroids that preserves fertility and offers faster recovery. Pre-intervention prediction of HIFU efficacy can augment clinical decision-making and patient management. This systematic review and meta-analysis aims to evaluate the performance of MRI-based radiomics machine learning (ML) models in predicting the efficacy of HIFU ablation in uterine fibroids. Studies were retrieved by conducting a thorough literature search across databases including PubMed, Scopus, Embase, and Web of Science, up to June 2025. The quality of the included studies was assessed using the QUADAS-2 and METRICS tools. A meta-analysis of the radiomics models was conducted to pool sensitivity, specificity, and AUC using a bivariate random-effects model. A total of 13 studies were incorporated in the systematic review and meta-analysis. Meta-analysis of 608 patients from 7 internal and 6 external validation cohorts showed pooled AUC, sensitivity, and specificity of 0.84, 77%, and 78%, respectively. QUADAS-2 was notable for significant methodological biases in the index test and flow and timing domains. Across all studies, the mean METRICS score was 76.93% (range 54.9% to 90.3%), denoting good overall quality and performance in most domains but with notable gaps in the open science domain. MRI-based radiomics models show promise in predicting the effectiveness of HIFU ablation for uterine fibroids. However, limitations such as limited geographic diversity, inconsistent reporting standards, and poor open science practices hinder broader application. Therefore, future research should focus on standardizing imaging protocols, using multi-center designs with external validation, and integrating diverse data sources.

MRI Classification Abdominal Meta Analysis In Silico Academic Lab Benchmark SOTA Reproducibility
