Latest Papers on Radiology AI. Tags: Benchmark SOTA

Medverse: A Universal Model for Full-Resolution 3D Medical Image Segmentation, Transformation and Enhancement

Jiesi Hu, Jianfeng Cao, Yanwu Yang, Chenfei Ye, Yixuan Zhang, Hanyang Peng, Ting Ma

•preprint•Sep 11 2025

In-context learning (ICL) offers a promising paradigm for universal medical image analysis, enabling models to perform diverse image processing tasks without retraining. However, current ICL models for medical imaging remain limited in two critical aspects: they cannot simultaneously achieve high-fidelity predictions and global anatomical understanding, and there is no unified model trained across diverse medical imaging tasks (e.g., segmentation and enhancement) and anatomical regions. As a result, the full potential of ICL in medical imaging remains underexplored. Thus, we present \textbf{Medverse}, a universal ICL model for 3D medical imaging, trained on 22 datasets covering diverse tasks in universal image segmentation, transformation, and enhancement across multiple organs, imaging modalities, and clinical centers. Medverse employs a next-scale autoregressive in-context learning framework that progressively refines predictions from coarse to fine, generating consistent, full-resolution volumetric outputs and enabling multi-scale anatomical awareness. We further propose a blockwise cross-attention module that facilitates long-range interactions between context and target inputs while preserving computational efficiency through spatial sparsity. Medverse is extensively evaluated on a broad collection of held-out datasets covering previously unseen clinical centers, organs, species, and imaging modalities. Results demonstrate that Medverse substantially outperforms existing ICL baselines and establishes a novel paradigm for in-context learning. Code and model weights will be made publicly available. Our model are publicly available at https://github.com/jiesihu/Medverse.

Mixed Modality Segmentation Whole Body Methodology In Silico Academic Lab Open Code Benchmark SOTA

A full-scale attention-augmented CNN-transformer model for segmentation of oropharyngeal mucosa organs-at-risk in radiotherapy.

He L, Sun J, Lu S, Li J, Wang X, Yan Z, Guan J

•papers•Sep 11 2025

Radiation-induced oropharyngeal mucositis (ROM) is a common and severe side effect of radiotherapy in nasopharyngeal cancer patients, leading to significant clinical complications such as malnutrition, infections, and treatment interruptions. Accurate delineation of the oropharyngeal mucosa (OPM) as an organ-at-risk (OAR) is crucial to minimizing radiation exposure and preventing ROM. This study aims to develop and validate an advanced automatic segmentation model, attention-augmented Swin U-Net transformer (AA-Swin UNETR), for accurate delineation of OPM to improve radiotherapy planning and reduce the incidence of ROM. We proposed a hybrid CNN-transformer model, AA-Swin UNETR, based on the Swin UNETR framework, which integrates hierarchical feature extraction with full-scale attention mechanisms. The model includes a Swin Transformer-based encoder and a CNN-based decoder with residual blocks, connected via a full-scale feature connection scheme. The full-scale attention mechanism enables the model to capture long-range dependencies and multi-level features effectively, enhancing the segmentation accuracy. The model was trained on a dataset of 202 CT scans from Nanfang Hospital, using expert manual delineations as the gold standard. We evaluated the performance of AA-Swin UNETR against state-of-the-art (SOTA) segmentation models, including Swin UNETR, nnUNet, and 3D UX-Net, using geometric and dosimetric evaluation parameters. The geometric metrics include Dice similarity coefficient (DSC), surface DSC (sDSC), volume similarity (VS), Hausdorff distance (HD), precision, and recall. The dosimetric metrics include changes of D0.1 cc and Dmean between results derived from manually delineated OPM and auto-segmentation models. The AA-Swin UNETR model achieved the highest mean DSC of 87.72 ± 1.98%, significantly outperforming Swin UNETR (83.53 ± 2.59%), nnUNet (85.48%± 2.68), and 3D UX-Net (80.04 ± 3.76%). The model also showed superior mean sDSC (98.44 ± 1.08%), mean VS (97.86 ± 1.43%), mean precision (87.60 ± 3.06%) and mean recall (89.22 ± 2.70%), with a competitive mean HD of 9.03 ± 2.79 mm. For dosimetric evaluation, the proposed model generates smallest mean [Formula: see text] (0.46 ± 4.92 cGy) and mean [Formula: see text] (6.26 ± 24.90 cGY) relative to manual delineation compared with other auto-segmentation results (mean [Formula: see text] of Swin UNETR = -0.56 ± 7.28 cGy, nnUNet = 0.99 ± 4.73 cGy, 3D UX-Net = -0.65 ± 8.05 cGy; mean [Formula: see text] of Swin UNETR = 7.46 ± 43.37, nnUNet = 21.76 ± 37.86 and 3D UX-Net = 44.61 ± 62.33). In this paper, we proposed a transformer and CNN hybrid deep-learning based model AA-Swin UNETR for automatic segmentation of OPM as an OAR structure in radiotherapy planning. Evaluations with geometric and dosimetric parameters demonstrated AA-Swin UNETR can generate delineations close to a manual reference, both in terms of geometry and dose-volume metrics. The proposed model out-performed existing SOTA models in both evaluation metrics and demonstrated is capability of accurately segmenting complex anatomical structures of the OPM, providing a reliable tool for enhancing radiotherapy planning.

CT Segmentation Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Novel BDefRCNLSTM: an efficient ensemble deep learning approaches for enhanced brain tumor detection and categorization with segmentation.

Janapati M, Akthar S

•papers•Sep 11 2025

Brain tumour detection and classification are critical for improving patient prognosis and treatment planning. However, manual identification from magnetic resonance imaging (MRI) scans is time-consuming, error-prone, and reliant on expert interpretation. The increasing complexity of tumour characteristics necessitates automated solutions to enhance accuracy and efficiency. This study introduces a novel ensemble deep learning model, boosted deformable and residual convolutional network with bi-directional convolutional long short-term memory (BDefRCNLSTM), for the classification and segmentation of brain tumours. The proposed framework integrates entropy-based local binary pattern (ELBP) for extracting spatial semantic features and employs the enhanced sooty tern optimisation (ESTO) algorithm for optimal feature selection. Additionally, an improved X-Net model is utilised for precise segmentation of tumour regions. The model is trained and evaluated on Figshare, Brain MRI, and Kaggle datasets using multiple performance metrics. Experimental results demonstrate that the proposed BDefRCNLSTM model achieves over 99% accuracy in both classification and segmentation, outperforming existing state-of-the-art approaches. The findings establish the proposed approach as a clinically viable solution for automated brain tumour diagnosis. The integration of optimised feature selection and advanced segmentation techniques improves diagnostic accuracy, potentially assisting radiologists in making faster and more reliable decisions.

MRI Segmentation Neurological Methodology In Silico Academic Lab Benchmark SOTA

Application of Deep Learning for Predicting Hematoma Expansion in Intracerebral Hemorrhage Using Computed Tomography Scans: A Systematic Review and Meta-Analysis of Diagnostic Accuracy.

Ahmadzadeh AM, Ashoobi MA, Broomand Lomer N, Elyassirad D, Gheiji B, Vatanparast M, Bathla G, Tu L

•papers•Sep 11 2025

We aimed to systematically review the studies that utilized deep learning (DL)-based networks to predict hematoma expansion (HE) in patients with intracerebral hemorrhage (ICH) using computed tomography (CT) images. We carried out a comprehensive literature search across four major databases to identify relevant studies. To evaluate the quality of the included studies, we used both the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) and the METhodological RadiomICs Score (METRICS) checklists. We then calculated pooled diagnostic estimates and assessed heterogeneity using the I2 statistic. To assess the sources of heterogeneity, effects of individual studies, and publication bias, we performed subgroup analysis, sensitivity analysis, and Deek's asymmetry test. Twenty-two studies were included in the qualitative synthesis, of which 11 and 6 were utilized for exclusive DL and combined DL meta-analyses, respectively. We found pooled sensitivity of 0.81 and 0.84, specificity of 0.79 and 0.91, positive diagnostic likelihood ratio (DLR) of 3.96 and 9.40, negative DLR of 0.23 and 0.18, diagnostic odds ratio of 16.97 and 53.51, and area under the curve of 0.87 and 0.89 for exclusive DL-based and combined DL-based models, respectively. Subgroup analysis revealed significant inter-group differences according to the segmentation technique and study quality. DL-based networks showed strong potential in accurately identifying HE in ICH patients. These models may guide earlier targeted interventions such as intensive blood pressure control or administration of hemostatic drugs, potentially leading to improved patient outcomes.

CT Classification Neurological Meta Analysis Post Market Benchmark SOTA

U-ConvNext: A Robust Approach to Glioma Segmentation in Intraoperative Ultrasound.

Vahdani AM, Rahmani M, Pour-Rashidi A, Ahmadian A, Farnia P

•papers•Sep 11 2025

Intraoperative tumor imaging is critical to achieving maximal safe resection during neurosurgery, especially for low-grade glioma resection. Given the convenience of ultrasound as an intraoperative imaging modality, but also the limitations of the ultrasound modality and the time-consuming process of manual tumor segmentation, we propose a learning-based model for the accurate segmentation of low-grade gliomas in ultrasound images. We developed a novel U-net-based architecture adopting the block architecture of the ConvNext V2 model, titled U-ConvNext, which also incorporates various architectural improvements including global response normalization, fine-tuned kernel sizes, and inception layers. We also adopted the CutMix data augmentation technique for semantic segmentation, aiming for enhanced texture detection. Conformal segmentation, a novel approach to conformal prediction for binary semantic segmentation, was also developed for uncertainty quantification, providing calibrated measures of model uncertainty in a visual format. The proposed models were trained and evaluated on three subsets of images in the RESECT dataset and achieved hold-out test Dice scores of 84.63%, 74.52%, and 90.82% on the "before," "during," and "after" subsets, respectively, which indicates increases of ~ 13-31% compared to the state of the art. Furthermore, external evaluation on the ReMIND dataset indicated a robust performance (dice score of 79.17% [95% CI: 77.82-81.62] and only a moderate decline of < 3% in expected calibration error. Our approach integrates various innovations in model design, model training, and uncertainty quantification, achieving improved results on the segmentation of low-grade glioma in ultrasound images during neurosurgery.

Ultrasound Segmentation Neurological Methodology In Silico Academic Lab Benchmark SOTA

Artificial intelligence in gastric cancer: a systematic review of machine learning and deep learning applications.

Alsallal M, Habeeb MS, Vaghela K, Malathi H, Vashisht A, Sahu PK, Singh D, Al-Hussainy AF, Aljanaby IA, Sameer HN, Athab ZH, Adil M, Yaseen A, Farhood B

•papers•Sep 11 2025

Gastric cancer (GC) remains a major global health concern, ranking as the fifth most prevalent malignancy and the fourth leading cause of cancer-related mortality worldwide. Although early detection can increase the 5-year survival rate of early gastric cancer (EGC) to over 90%, more than 80% of cases are diagnosed at advanced stages due to subtle clinical symptoms and diagnostic challenges. Artificial intelligence (AI), particularly machine learning (ML) and deep learning (DL), has shown great promise in addressing these limitations. This systematic review aims to evaluate the performance, applications, and limitations of ML and DL models in GC management, with a focus on their use in detection, diagnosis, treatment planning, and prognosis prediction across diverse clinical imaging and data modalities. Following the PRISMA 2020 guidelines, a comprehensive literature search was conducted in MEDLINE, Web of Science, and Scopus for studies published between 2004 and May 2025. Eligible studies applied ML or DL algorithms for diagnostic or prognostic tasks in GC using data from endoscopy, computed tomography (CT), pathology, or multi-modal sources. Two reviewers independently performed study selection, data extraction, and risk of bias assessment. A total of 59 studies met the inclusion criteria. DL models, particularly convolutional neural networks (CNNs), demonstrated strong performance in EGC detection, with reported sensitivities up to 95.3% and Area Under the Curve (AUCs) as high as 0.981, often exceeding expert endoscopists. CT-based radiomics and DL models achieved AUCs ranging from 0.825 to 0.972 for tumor staging and metastasis prediction. Pathology-based models reported accuracies up to 100% for EGC detection and AUCs up to 0.92 for predicting treatment response. Cross-modality approaches combining radiomics and pathomics achieved AUCs up to 0.951. Key challenges included algorithmic bias, limited dataset diversity, interpretability issues, and barriers to clinical integration. ML and DL models have demonstrated substantial potential to improve early detection, diagnostic accuracy, and individualized treatment in GC. To advance clinical adoption, future research should prioritize the development of large, diverse datasets, implement explainable AI frameworks, and conduct prospective clinical trials. These efforts will be essential for integrating AI into precision oncology and addressing the increasing global burden of gastric cancer.

CT Detection Abdominal Review In Silico Academic Lab Benchmark SOTA Ethics

Training With Local Data Remains Important for Deep Learning MRI Prostate Cancer Detection.

Carere SG, Jewell J, Nasute Fauerbach PV, Emerson DB, Finelli A, Ghai S, Haider MA

•papers•Sep 11 2025

Domain shift has been shown to have a major detrimental effect on AI model performance however prior studies on domain shift for MRI prostate cancer segmentation have been limited to small, or heterogenous cohorts. Our objective was to assess whether prostate cancer segmentation models trained on local MRI data continue to outperform those trained on external data with cohorts exceeding 1000. We simulated a multi-institutional consortium using the public PICAI dataset (PICAI-TRAIN: 1241 exams, PICAI-TEST: 259) and a local dataset (LOCAL-TRAIN: 1400 exams, LOCAL-TEST: 308). IRB approval was obtained and consent waived. We compared nnUNet-v2 models trained on the combined data (CENTRAL-TRAIN) and separately on PICAI-TRAIN and LOCAL-TRAIN. Accuracy was evaluated using the open source PICAI Score on LOCAL-TEST. Significance was tested using bootstrapping. Just 22% (309/1400) of LOCAL-TRAIN exams would be sufficient to match the performance of a model trained on PICAI-TRAIN. The CENTRAL-TRAIN performance was similar to LOCAL-TRAIN performance, with PICAI Scores [95% CI] of 65 [58-71] and 66 [60-72], respectively. Both of these models exceeded the model trained on PICAI-TRAIN alone which had a score of 58 [51-64] (P < .002). Reducing training set size did not alter these relative trends. Domain shift limits MRI prostate cancer segmentation performance even when training with over 1000 exams from 3 external institutions. Use of local data is paramount at these scales.

MRI Segmentation Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA Open Dataset

DualTrack: Sensorless 3D Ultrasound needs Local and Global Context

Paul F. R. Wilson, Matteo Ronchetti, Rüdiger Göbl, Viktoria Markova, Sebastian Rosenzweig, Raphael Prevost, Parvin Mousavi, Oliver Zettinig

•preprint•Sep 11 2025

Three-dimensional ultrasound (US) offers many clinical advantages over conventional 2D imaging, yet its widespread adoption is limited by the cost and complexity of traditional 3D systems. Sensorless 3D US, which uses deep learning to estimate a 3D probe trajectory from a sequence of 2D US images, is a promising alternative. Local features, such as speckle patterns, can help predict frame-to-frame motion, while global features, such as coarse shapes and anatomical structures, can situate the scan relative to anatomy and help predict its general shape. In prior approaches, global features are either ignored or tightly coupled with local feature extraction, restricting the ability to robustly model these two complementary aspects. We propose DualTrack, a novel dual-encoder architecture that leverages decoupled local and global encoders specialized for their respective scales of feature extraction. The local encoder uses dense spatiotemporal convolutions to capture fine-grained features, while the global encoder utilizes an image backbone (e.g., a 2D CNN or foundation model) and temporal attention layers to embed high-level anatomical features and long-range dependencies. A lightweight fusion module then combines these features to estimate the trajectory. Experimental results on a large public benchmark show that DualTrack achieves state-of-the-art accuracy and globally consistent 3D reconstructions, outperforming previous methods and yielding an average reconstruction error below 5 mm.

Ultrasound Reconstruction Methodology In Silico Academic Lab Benchmark SOTA

Explainable AI for Accelerated Microstructure Imaging: A SHAP-Guided Protocol on the Connectome 2.0 scanner

Quentin Uhl, Tommaso Pavan, Julianna Gerold, Kwok-Shing Chan, Yohan Jun, Shohei Fujita, Aneri Bhatt, Yixin Ma, Qiaochu Wang, Hong-Hsi Lee, Susie Y. Huang, Berkin Bilgic, Ileana Jelescu

•preprint•Sep 11 2025

The diffusion MRI Neurite Exchange Imaging model offers a promising framework for probing gray matter microstructure by estimating parameters such as compartment sizes, diffusivities, and inter-compartmental water exchange time. However, existing protocols require long scan times. This study proposes a reduced acquisition scheme for the Connectome 2.0 scanner that preserves model accuracy while substantially shortening scan duration. We developed a data-driven framework using explainable artificial intelligence with a guided recursive feature elimination strategy to identify an optimal 8-feature subset from a 15-feature protocol. The performance of this optimized protocol was validated in vivo and benchmarked against the full acquisition and alternative reduction strategies. Parameter accuracy, preservation of anatomical contrast, and test-retest reproducibility were assessed. The reduced protocol yielded parameter estimates and cortical maps comparable to the full protocol, with low estimation errors in synthetic data and minimal impact on test-retest variability. Compared to theory-driven and heuristic reduction schemes, the optimized protocol demonstrated superior robustness, reducing the deviation in water exchange time estimates by over two-fold. In conclusion, this hybrid optimization framework enables viable imaging of neurite exchange in 14 minutes without loss of parameter fidelity. This approach supports the broader application of exchange-sensitive diffusion magnetic resonance imaging in neuroscience and clinical research, and offers a generalizable method for designing efficient acquisition protocols in biophysical parameter mapping.

MRI Reconstruction Neurological Methodology In Silico Academic Lab GenAI Benchmark SOTA

Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis

Jing Hao, Yuxuan Fan, Yanpeng Sun, Kaixin Guo, Lizhuo Lin, Jinrong Yang, Qi Yong H. Ai, Lun M. Wong, Hao Tang, Kuo Feng Hung

•preprint•Sep 11 2025

Recent advances in large vision-language models (LVLMs) have demonstrated strong performance on general-purpose medical tasks. However, their effectiveness in specialized domains such as dentistry remains underexplored. In particular, panoramic X-rays, a widely used imaging modality in oral radiology, pose interpretative challenges due to dense anatomical structures and subtle pathological cues, which are not captured by existing medical benchmarks or instruction datasets. To this end, we introduce MMOral, the first large-scale multimodal instruction dataset and benchmark tailored for panoramic X-ray interpretation. MMOral consists of 20,563 annotated images paired with 1.3 million instruction-following instances across diverse task types, including attribute extraction, report generation, visual question answering, and image-grounded dialogue. In addition, we present MMOral-Bench, a comprehensive evaluation suite covering five key diagnostic dimensions in dentistry. We evaluate 64 LVLMs on MMOral-Bench and find that even the best-performing model, i.e., GPT-4o, only achieves 41.45% accuracy, revealing significant limitations of current models in this domain. To promote the progress of this specific domain, we also propose OralGPT, which conducts supervised fine-tuning (SFT) upon Qwen2.5-VL-7B with our meticulously curated MMOral instruction dataset. Remarkably, a single epoch of SFT yields substantial performance enhancements for LVLMs, e.g., OralGPT demonstrates a 24.73% improvement. Both MMOral and OralGPT hold significant potential as a critical foundation for intelligent dentistry and enable more clinically impactful multimodal AI systems in the dental field. The dataset, model, benchmark, and evaluation suite are available at https://github.com/isbrycee/OralGPT.

X-Ray Report Generation Dataset Release In Silico Academic Lab Open Dataset Open Code Benchmark SOTA

Filter Papers

Tags

Medverse: A Universal Model for Full-Resolution 3D Medical Image Segmentation, Transformation and Enhancement

A full-scale attention-augmented CNN-transformer model for segmentation of oropharyngeal mucosa organs-at-risk in radiotherapy.

Novel BDefRCNLSTM: an efficient ensemble deep learning approaches for enhanced brain tumor detection and categorization with segmentation.

Application of Deep Learning for Predicting Hematoma Expansion in Intracerebral Hemorrhage Using Computed Tomography Scans: A Systematic Review and Meta-Analysis of Diagnostic Accuracy.

U-ConvNext: A Robust Approach to Glioma Segmentation in Intraoperative Ultrasound.

Artificial intelligence in gastric cancer: a systematic review of machine learning and deep learning applications.

Training With Local Data Remains Important for Deep Learning MRI Prostate Cancer Detection.

DualTrack: Sensorless 3D Ultrasound needs Local and Global Context

Explainable AI for Accelerated Microstructure Imaging: A SHAP-Guided Protocol on the Connectome 2.0 scanner

Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis

Ready to Sharpen Your Edge?