You are viewing papers added to our database from 2025-09-08 to 2025-09-14.

Machine Learning for Preoperative Assessment and Postoperative Prediction in Cervical Cancer: Multicenter Retrospective Model Integrating MRI and Clinicopathological Data.

Li S, Guo C, Fang Y, Qiu J, Zhang H, Ling L, Xu J, Peng X, Jiang C, Wang J, Hua K

PubMed · Sep 12 2025
Machine learning (ML) has been increasingly applied to cervical cancer (CC) research. However, few studies have combined both clinical parameters and imaging data. At the same time, there remains an urgent need for more robust and accurate preoperative assessment of parametrial invasion and lymph node metastasis, as well as postoperative prognosis prediction. The objective of this study is to develop an integrated ML model combining clinicopathological variables and magnetic resonance image features for (1) preoperative parametrial invasion and lymph node metastasis detection and (2) postoperative recurrence and survival prediction. Retrospective data from 250 patients with CC (2014-2022; 2 tertiary hospitals) were analyzed. Variables were assessed for their predictive value regarding parametrial invasion, lymph node metastasis, survival, and recurrence using 7 ML models: k-nearest neighbor (KNN), support vector machine, decision tree (DT), random forest (RF), balanced RF, weighted DT, and weighted KNN. Performance was assessed via 5-fold cross-validation using accuracy, sensitivity, specificity, precision, F1-score, and area under the receiver operating characteristic curve (AUC). The optimal models were deployed in an artificial intelligence-assisted contouring and prognosis prediction system. Among 250 women, there were 11 deaths and 24 recurrences. (1) For preoperative evaluation, the integrated model using balanced RF achieved optimal performance (sensitivity 0.81, specificity 0.85) for parametrial invasion, while weighted KNN achieved the best performance for lymph node metastasis (sensitivity 0.98, AUC 0.72). (2) For postoperative prognosis, weighted KNN also demonstrated high accuracy for recurrence (accuracy 0.94, AUC 0.86) and mortality (accuracy 0.97, AUC 0.77), with sensitivities of 0.80 and 0.33, respectively. (3) An artificial intelligence-assisted contouring and prognosis prediction system was developed to support preoperative evaluation and postoperative prognosis prediction. The integration of clinical data and magnetic resonance images provides enhanced diagnostic capability to preoperatively detect parametrial invasion and lymph node metastasis and prognostic capability to predict recurrence and mortality in CC, facilitating personalized, precise treatment strategies.
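For concreteness, here is a minimal sketch of the 5-fold cross-validation setup described above, assuming a tabular feature matrix that concatenates clinicopathological variables with MRI-derived features; the file names and the use of imbalanced-learn's balanced RF are assumptions, not details from the paper:

```python
# Minimal sketch: 5-fold CV of a balanced random forest on combined
# clinical + MRI features. File names and labels are hypothetical.
import numpy as np
from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.model_selection import cross_validate

X = np.load("cc_features.npy")  # hypothetical clinical + MRI feature matrix
y = np.load("cc_labels.npy")    # hypothetical labels, 1 = parametrial invasion

clf = BalancedRandomForestClassifier(n_estimators=500, random_state=0)
scores = cross_validate(
    clf, X, y, cv=5,
    scoring=["accuracy", "recall", "precision", "f1", "roc_auc"],
)
for metric, vals in scores.items():
    if metric.startswith("test_"):
        print(metric, round(vals.mean(), 3))  # mean over the 5 folds
```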

Ex vivo human brain volumetry: Validation of MRI measurements.

Gérin-Lajoie A, Adame-Gonzalez W, Frigon EM, Guerra Sanches L, Nayouf A, Boire D, Dadar M, Maranzano J

PubMed · Sep 12 2025
The volume of in vivo human brains is determined with various MRI measurement tools that have not been assessed against a gold standard. The purpose of this study was to validate the MRI brain volumes by scanning ex vivo, in situ specimens, which allows the extraction of the brain after the scan to compare its volume with the gold-standard water displacement method (WDM). The 3T MRI T2-weighted, T1-weighted, and MP2RAGE images of seven anatomical heads fixed with an alcohol-formaldehyde solution were acquired. The gray and white matter were assessed using two methods: (i) a manual intensity-based threshold segmentation using Display (MINC-ToolKit) and (ii) an automatic deep learning-based segmentation tool (SynthSeg). The brains were extracted and their volumes measured with the WDM after the removal of their meninges and a midsagittal cut. Volumes from all methods were compared with the ground truth (WDM volumes) using a repeated-measures analysis of variance. Mean brain volumes, in cubic centimeters, were 1111.14 ± 121.78 for WDM, 1020.29 ± 70.01 for manual T2-weighted, 1056.29 ± 90.54 for automatic T2-weighted, 1094.69 ± 100.51 for automatic T1-weighted, 1066.56 ± 96.52 for automatic magnetization-prepared 2 rapid gradient-echo first inversion time, and 1156.18 ± 121.87 for automatic magnetization-prepared 2 rapid gradient-echo second inversion time. All volumetry methods were significantly different (F = 17.874; p < 0.001) from the WDM volumes, except the automatic T1-weighted volumes. SynthSeg accurately determined the brain volume in ex vivo, in situ T1-weighted MRI scans. The results suggested that given the contrast similarity between the ex vivo and in vivo sequences, the brain volumes of clinical studies are most probably sufficiently accurate, with some degree of underestimation depending on the sequence used.
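As a worked example of the volumetry arithmetic, the sketch below derives a brain volume from a segmentation mask (voxel count times voxel volume) and compares it with the mean WDM volume reported above; the mask file name is hypothetical:

```python
# Minimal sketch: MRI-based brain volume from a segmentation mask,
# compared against the water-displacement gold standard.
import numpy as np
import nibabel as nib

mask_img = nib.load("synthseg_brain_mask.nii.gz")  # hypothetical SynthSeg output
voxel_cm3 = np.prod(mask_img.header.get_zooms()[:3]) / 1000.0  # mm^3 -> cm^3

mri_volume = np.count_nonzero(mask_img.get_fdata()) * voxel_cm3
wdm_volume = 1111.14  # cm^3, mean WDM volume reported in the abstract

print(f"MRI volume: {mri_volume:.2f} cm^3, "
      f"difference from WDM: {mri_volume - wdm_volume:+.2f} cm^3")
```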

Assessing accuracy and legitimacy of multimodal large language models on Japan Diagnostic Radiology Board Examination.

Hirano Y, Miki S, Yamagishi Y, Hanaoka S, Nakao T, Kikuchi T, Nakamura Y, Nomura Y, Yoshikawa T, Abe O

PubMed · Sep 12 2025
To assess and compare the accuracy and legitimacy of multimodal large language models (LLMs) on the Japan Diagnostic Radiology Board Examination (JDRBE). The dataset comprised questions from JDRBE 2021, 2023, and 2024, with ground-truth answers established through consensus among multiple board-certified diagnostic radiologists. Questions without associated images and those lacking unanimous agreement on answers were excluded. Eight LLMs were evaluated: GPT-4 Turbo, GPT-4o, GPT-4.5, GPT-4.1, o3, o4-mini, Claude 3.7 Sonnet, and Gemini 2.5 Pro. Each model was evaluated under two conditions: with image input (vision) and without (text-only). Performance differences between the conditions were assessed using McNemar's exact test. Two diagnostic radiologists (with 2 and 18 years of experience) independently rated the legitimacy of responses from four models (GPT-4 Turbo, Claude 3.7 Sonnet, o3, and Gemini 2.5 Pro) using a five-point Likert scale, blinded to model identity. Legitimacy scores were analyzed using Friedman's test, followed by pairwise Wilcoxon signed-rank tests with Holm correction. The dataset included 233 questions. Under the vision condition, o3 achieved the highest accuracy at 72%, followed by o4-mini (70%) and Gemini 2.5 Pro (70%). Under the text-only condition, o3 topped the list with an accuracy of 67%. The addition of image input significantly improved the accuracy of two models (Gemini 2.5 Pro and GPT-4.5), but not the others. Both o3 and Gemini 2.5 Pro received significantly higher legitimacy scores than GPT-4 Turbo and Claude 3.7 Sonnet from both raters. Recent multimodal LLMs, particularly o3 and Gemini 2.5 Pro, have demonstrated remarkable progress on JDRBE questions, reflecting their rapid evolution in diagnostic radiology. Eight multimodal large language models were evaluated on the Japan Diagnostic Radiology Board Examination. OpenAI's o3 and Google DeepMind's Gemini 2.5 Pro achieved high accuracy rates (72% and 70%) and received good legitimacy scores from human raters, demonstrating steady progress.
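A minimal sketch of the paired vision-versus-text comparison via McNemar's exact test (here via statsmodels), with hypothetical per-question correctness vectors standing in for the study's results:

```python
# Minimal sketch: McNemar's exact test on paired per-question outcomes.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

vision_correct = np.array([1, 1, 0, 1, 0, 1])  # hypothetical per-question results
text_correct   = np.array([1, 0, 0, 1, 1, 0])

# 2x2 table of (vision correct?, text correct?) counts
table = np.array([
    [np.sum((vision_correct == 1) & (text_correct == 1)),
     np.sum((vision_correct == 1) & (text_correct == 0))],
    [np.sum((vision_correct == 0) & (text_correct == 1)),
     np.sum((vision_correct == 0) & (text_correct == 0))],
])
print(mcnemar(table, exact=True))  # test statistic and p-value
```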

MultiASNet: Multimodal Label Noise Robust Framework for the Classification of Aortic Stenosis in Echocardiography.

Wu V, Fung A, Khodabakhshian B, Abdelsamad B, Vaseli H, Ahmadi N, Goco JAD, Tsang MY, Luong C, Abolmaesumi P, Tsang TSM

PubMed · Sep 12 2025
Aortic stenosis (AS), a prevalent and serious heart valve disorder, requires early detection but remains difficult to diagnose in routine practice. Although echocardiography with Doppler imaging is the clinical standard, these assessments are typically limited to trained specialists. Point-of-care ultrasound (POCUS) offers an accessible alternative for AS screening but is restricted to basic 2D B-mode imaging, often lacking the analysis Doppler provides. Our project introduces MultiASNet, a multimodal machine learning framework designed to enhance AS screening with POCUS by combining 2D B-mode videos with structured data from echocardiography reports, including Doppler parameters. Using contrastive learning, MultiASNet aligns video features with report features in tabular form from the same patient to improve interpretive quality. To address misalignment where a single report corresponds to multiple video views, some irrelevant to AS diagnosis, we use cross-attention in a transformer-based video and tabular network to assign less importance to irrelevant report data. The model integrates structured data only during training, enabling independent use with B-mode videos during inference for broader accessibility. MultiASNet also incorporates sample selection to counteract label noise from observer variability, yielding improved accuracy on two datasets. We achieved balanced accuracy scores of 93.0% on a private dataset and 83.9% on the public TMED-2 dataset for AS detection. For severity classification, balanced accuracy scores were 80.4% and 59.4% on the private and public datasets, respectively. This model facilitates reliable AS screening in non-specialist settings, bridging the gap left by Doppler data while reducing noise-related errors. Our code is publicly available at github.com/DeepRCL/MultiASNet.
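One plausible reading of the contrastive alignment step is a CLIP-style symmetric loss between video and report (tabular) embeddings from the same patient; the sketch below is an assumption about the form of that loss, not MultiASNet's published implementation:

```python
# Loose sketch: symmetric contrastive loss aligning paired video and
# tabular embeddings; dimensions and temperature are assumptions.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(video_emb, tab_emb, temperature=0.07):
    # Normalize so dot products are cosine similarities.
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(tab_emb, dim=-1)
    logits = v @ t.T / temperature     # (B, B) similarity matrix
    targets = torch.arange(v.size(0))  # matching pairs lie on the diagonal
    # Symmetric cross-entropy: video -> report and report -> video.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

video_emb = torch.randn(8, 256)  # hypothetical batch of video features
tab_emb = torch.randn(8, 256)    # hypothetical batch of report features
print(contrastive_alignment_loss(video_emb, tab_emb))
```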

Deep learning-powered temperature prediction for optimizing transcranial MR-guided focused ultrasound treatment.

Xiong Y, Yang M, Arkin M, Li Y, Duan C, Bian X, Lu H, Zhang L, Wang S, Ren X, Li X, Zhang M, Zhou X, Pan L, Lou X

PubMed · Sep 12 2025
Precise temperature control is challenging during transcranial MR-guided focused ultrasound (MRgFUS) treatment. The aim of this study was to develop a deep learning model integrating the treatment parameters for each sonication, along with patient-specific clinical information and skull metrics, for prediction of the MRgFUS therapeutic temperature. This is a retrospective analysis of sonications from patients with essential tremor or Parkinson's disease who underwent unilateral MRgFUS thalamotomy or pallidothalamic tractotomy at a single hospital from January 2019 to June 2023. For model training, a dataset of 600 sonications (72 patients) was used, while a validation dataset comprising 199 sonications (18 patients) was used to assess model performance. Additionally, an external dataset of 146 sonications (20 patients) was used for external validation. The developed deep learning model, called Fust-Net, achieved high predictive accuracy, with normalized mean absolute errors of 1.655°C for the internal dataset and 2.432°C for the external dataset, closely matching the actual temperatures. The graded evaluation showed that Fust-Net achieved an effective temperature prediction rate of 82.6%. These results demonstrate the potential of Fust-Net for precise temperature control during MRgFUS treatment, supporting improved precision and safety in clinical applications.
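The abstract does not specify Fust-Net's architecture; as an illustration of the regression task itself (sonication parameters plus clinical and skull metrics in, peak temperature out), the following sketch substitutes a plain MLP on synthetic data:

```python
# Loose sketch of the prediction task only; Fust-Net's real architecture
# is not described here. All features and targets below are synthetic.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X_train = rng.random((600, 12))          # hypothetical: 600 training sonications
y_train = 50 + 10 * rng.random(600)      # hypothetical peak temperatures (°C)
X_val = rng.random((199, 12))            # hypothetical: 199 validation sonications
y_val = 50 + 10 * rng.random(199)

model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0)
model.fit(X_train, y_train)
print("Validation MAE (°C):", mean_absolute_error(y_val, model.predict(X_val)))
```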

The best diagnostic approach for classifying ischemic stroke onset time: A systematic review and meta-analysis.

Zakariaee SS, Kadir DH, Molazadeh M, Abdi S

PubMed · Sep 12 2025
The success of intravenous thrombolysis with tPA (IV-tPA) as the fastest and easiest treatment for stroke patients is closely related to time since stroke onset (TSS). Administering IV-tPA after the recommended time interval (< 4.5 h) increases the risk of cerebral hemorrhage. Although advances in diagnostic approaches have been made, the determination of TSS remains a clinical challenge. In this study, the performances of different diagnostic approaches were investigated to classify TSS. A systematic literature search was conducted in Web of Science, PubMed, Scopus, Embase, and Cochrane databases until July 2025. The overall AUC, sensitivity, and specificity magnitudes with their 95% CIs were determined for each diagnostic approach to evaluate their classification performances. This systematic review covered a total of 9030 stroke patients across studies published up to July 2025. The results showed that human reading of DWI-FLAIR mismatch, the current gold-standard method, with AUC = 0.71 (95% CI: 0.66-0.76), sensitivity = 0.62 (95% CI: 0.54-0.71), and specificity = 0.78 (95% CI: 0.72-0.84), has a moderate performance in identifying TSS. An ML model fed with radiomic features of CT data, with AUC = 0.89 (95% CI: 0.80-0.98), sensitivity = 0.85 (95% CI: 0.75-0.96), and specificity = 0.86 (95% CI: 0.73-1.00), had the best performance in classifying TSS among the models reviewed. ML models fed with radiomic features classify TSS better than human reading of DWI-FLAIR mismatch. An efficient AI model fed with CT radiomic data could yield the best classification performance to determine patients' eligibility for IV-tPA treatment and improve treatment outcomes.
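As a sketch of the pooling machinery behind summary estimates like these, the following applies DerSimonian-Laird random-effects pooling to per-study sensitivities on the logit scale; the true-positive/false-negative counts are hypothetical:

```python
# Minimal sketch: DerSimonian-Laird random-effects pooling of
# logit-transformed sensitivities. Study counts are hypothetical.
import numpy as np

tp = np.array([40, 55, 30])  # hypothetical true positives per study
fn = np.array([20, 15, 18])  # hypothetical false negatives per study

sens = tp / (tp + fn)
logit = np.log(sens / (1 - sens))
var = 1 / tp + 1 / fn                      # variance of logit(sensitivity)

w = 1 / var                                # fixed-effect weights
fe_mean = np.sum(w * logit) / np.sum(w)
q = np.sum(w * (logit - fe_mean) ** 2)     # Cochran's Q
tau2 = max(0.0, (q - (len(tp) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

w_re = 1 / (var + tau2)                    # random-effects weights
pooled_logit = np.sum(w_re * logit) / np.sum(w_re)
print("Pooled sensitivity:", 1 / (1 + np.exp(-pooled_logit)))
```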

Predicting molecular subtypes of pediatric medulloblastoma using MRI-based artificial intelligence: A systematic review and meta-analysis.

Liu J, Zou Z, He Y, Guo Z, Yi C, Huang B

PubMed · Sep 12 2025
This meta-analysis aims to assess the diagnostic performance of artificial intelligence (AI) based on magnetic resonance imaging (MRI) in detecting molecular subtypes of pediatric medulloblastoma (MB). A thorough review of the literature was performed using PubMed, Embase, and Web of Science to locate pertinent studies released prior to October 2024. Selected studies focused on the diagnostic performance of MRI-based AI in detecting molecular subtypes of pediatric MB. A bivariate random-effects model was used to calculate pooled sensitivity and specificity, both with 95% confidence intervals (CI). Study heterogeneity was assessed using the I² statistic. Among the 540 studies identified, eight (involving 1195 patients) were included. For the wingless (WNT) subgroup, the pooled sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were 0.73 (95% CI: 0.61-0.83, I² = 19%), 0.94 (95% CI: 0.79-0.99, I² = 93%), and 0.80 (95% CI: 0.77-0.83), respectively. For the sonic hedgehog (SHH) subgroup, the pooled sensitivity, specificity, and AUC were 0.64 (95% CI: 0.51-0.75, I² = 69%), 0.84 (95% CI: 0.80-0.88, I² = 54%), and 0.85 (95% CI: 0.81-0.88), respectively. For Group 3 (G3), the pooled sensitivity, specificity, and AUC were 0.89 (95% CI: 0.52-0.98, I² = 82%), 0.70 (95% CI: 0.62-0.77, I² = 44%), and 0.88 (95% CI: 0.84-0.90), respectively. For Group 4 (G4), the pooled sensitivity, specificity, and AUC were 0.77 (95% CI: 0.64-0.87, I² = 54%), 0.91 (95% CI: 0.68-0.98, I² = 80%), and 0.86 (95% CI: 0.83-0.89), respectively. MRI-based AI shows high diagnostic performance in detecting molecular subtypes of pediatric MB. However, all included studies employed retrospective designs, which may introduce potential biases. More research using external validation datasets is needed to confirm these results and assess their clinical applicability.
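The I² values quoted above can be computed from Cochran's Q; a minimal sketch, with hypothetical study effects and variances:

```python
# Minimal sketch: I² heterogeneity statistic, I² = max(0, (Q - df) / Q),
# i.e., the share of between-study variability beyond chance.
import numpy as np

def i_squared(effects, variances):
    effects = np.asarray(effects, dtype=float)
    w = 1 / np.asarray(variances, dtype=float)  # inverse-variance weights
    pooled = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - pooled) ** 2)     # Cochran's Q
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# Hypothetical logit-scale effects and variances for three studies
print(f"I² = {i_squared([0.9, 1.4, 0.5], [0.05, 0.08, 0.06]):.0f}%")
```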

Automatic approach for B-lines detection in lung ultrasound images using You Only Look Once algorithm.

Bottino A, Botrugno C, Casciaro E, Conversano F, Lay-Ekuakille A, Lombardi FA, Morello R, Pisani P, Vetrugno L, Casciaro S

PubMed · Sep 11 2025
B-lines are among the key artifact signs observed in lung ultrasound (LUS), playing a critical role in differentiating pulmonary diseases and assessing overall lung condition. However, their accurate detection and quantification can be time-consuming and technically challenging, especially for less experienced operators. This study aims to evaluate the performance of a YOLO (You Only Look Once)-based algorithm for the automated detection of B-lines, offering a novel tool to support clinical decision-making. The proposed approach is designed to improve the efficiency and consistency of LUS interpretation, particularly for non-expert practitioners, and to enhance its utility in guiding respiratory management. In this observational agreement study, 644 images from an anonymized internal database and a clinical online database were evaluated. After a quality selection step, 386 images from 46 patients remained available for analysis. Ground truth was established by a blinded expert sonographer identifying B-lines within rectangular regions of interest (ROIs) on each frame. Algorithm performance was assessed through precision, recall, and F1-score, whereas weighted kappa (kw) statistics were employed to quantify the agreement between the YOLO-based algorithm and the expert operator. The algorithm achieved a precision of 0.92 (95% CI 0.89-0.94), recall of 0.81 (95% CI 0.77-0.85), and F1-score of 0.86 (95% CI 0.83-0.88). The weighted kappa was 0.68 (95% CI 0.64-0.72), indicating substantial agreement between the algorithm and expert annotations. The proposed algorithm has demonstrated its potential to significantly enhance diagnostic support by accurately detecting B-lines in LUS images.
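A minimal sketch of how detection metrics like these are typically computed: predicted B-line boxes are matched to expert ROIs by intersection-over-union (IoU), and precision, recall, and F1 follow from the match counts. The 0.5 IoU threshold is an assumption, and boxes are (x1, y1, x2, y2):

```python
# Minimal sketch: greedy IoU matching of predicted boxes to ground-truth
# ROIs, then precision / recall / F1 from TP, FP, FN counts.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

def prf1(pred_boxes, gt_boxes, thr=0.5):
    matched, tp = set(), 0
    for p in pred_boxes:
        hits = [i for i, g in enumerate(gt_boxes)
                if i not in matched and iou(p, g) >= thr]
        if hits:
            matched.add(hits[0])
            tp += 1
    fp, fn = len(pred_boxes) - tp, len(gt_boxes) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical boxes: one prediction against two expert ROIs
print(prf1([(10, 10, 50, 200)], [(12, 8, 48, 190), (60, 10, 90, 180)]))
```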

U-ConvNext: A Robust Approach to Glioma Segmentation in Intraoperative Ultrasound.

Vahdani AM, Rahmani M, Pour-Rashidi A, Ahmadian A, Farnia P

PubMed · Sep 11 2025
Intraoperative tumor imaging is critical to achieving maximal safe resection during neurosurgery, especially for low-grade glioma resection. Given the convenience of ultrasound as an intraoperative imaging modality, but also the limitations of the modality and the time-consuming process of manual tumor segmentation, we propose a learning-based model for the accurate segmentation of low-grade gliomas in ultrasound images. We developed a novel U-Net-based architecture adopting the block architecture of the ConvNext V2 model, titled U-ConvNext, which also incorporates various architectural improvements including global response normalization, fine-tuned kernel sizes, and inception layers. We also adopted the CutMix data augmentation technique for semantic segmentation, aiming for enhanced texture detection. Conformal segmentation, a novel approach to conformal prediction for binary semantic segmentation, was also developed for uncertainty quantification, providing calibrated measures of model uncertainty in a visual format. The proposed models were trained and evaluated on three subsets of images in the RESECT dataset and achieved hold-out test Dice scores of 84.63%, 74.52%, and 90.82% on the "before," "during," and "after" subsets, respectively, indicating increases of ~13-31% compared to the state of the art. Furthermore, external evaluation on the ReMIND dataset indicated robust performance (Dice score of 79.17% [95% CI: 77.82-81.62]) with only a moderate decline of < 3% in expected calibration error. Our approach integrates various innovations in model design, model training, and uncertainty quantification, achieving improved results on the segmentation of low-grade glioma in ultrasound images during neurosurgery.
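The paper's conformal segmentation procedure is its own contribution; as a loose illustration of the underlying idea only, the sketch below applies plain split-conformal calibration to per-pixel tumor probabilities so that true tumor pixels are covered with roughly 1 - alpha probability:

```python
# Loose sketch: split-conformal thresholding for binary segmentation.
# The paper's actual "conformal segmentation" method may differ.
import numpy as np

def calibrate_threshold(cal_probs, cal_masks, alpha=0.1):
    # Nonconformity score of each true-tumor pixel: 1 - predicted probability.
    scores = 1.0 - cal_probs[cal_masks.astype(bool)]
    n = scores.size
    # Finite-sample-corrected (1 - alpha) quantile of the scores.
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return 1.0 - q  # include pixels with predicted probability >= this

rng = np.random.default_rng(0)
cal_probs = rng.random((4, 64, 64))           # hypothetical sigmoid outputs
cal_masks = rng.random((4, 64, 64)) > 0.7     # hypothetical ground-truth masks
thr = calibrate_threshold(cal_probs, cal_masks)

test_probs = rng.random((64, 64))
uncertain_tumor_set = test_probs >= thr  # covers true tumor pixels w.p. ~1 - alpha
print("threshold:", round(thr, 3), "set size:", int(uncertain_tumor_set.sum()))
```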

Enhancing Oral Health Diagnostics With Hyperspectral Imaging and Computer Vision: Clinical Dataset Study.

Römer P, Ponciano JJ, Kloster K, Siegberg F, Plaß B, Vinayahalingam S, Al-Nawas B, Kämmerer PW, Klauer T, Thiem D

PubMed · Sep 11 2025
Diseases of the oral cavity, including oral squamous cell carcinoma, pose major challenges to health care worldwide due to their late diagnosis and the complicated differentiation of oral tissues. The combination of endoscopic hyperspectral imaging (HSI) and deep learning (DL) models offers a promising approach to the demand for modern, noninvasive tissue diagnostics. This study presents a large-scale in vivo dataset designed to support DL-based segmentation and classification of healthy oral tissues. This study aimed to develop a comprehensive, annotated endoscopic HSI dataset of the oral cavity and to demonstrate automated, reliable differentiation of intraoral tissue structures by integrating endoscopic HSI with advanced machine learning methods. A total of 226 participants (166 women [73.5%], 60 men [26.5%], aged 24-87 years) were examined using an endoscopic HSI system, capturing spectral data in the range of 500 to 1000 nm. Oral structures in RGB and HSI scans were annotated using RectLabel Pro (by Ryo Kawamura). DeepLabv3 (Google Research) with a ResNet-50 backbone was adapted for endoscopic HSI segmentation. The model was trained for 50 epochs on 70% of the dataset, with 30% held out for evaluation. Performance metrics (precision, recall, and F1-score) confirmed its efficacy in distinguishing oral tissue types. DeepLabv3 (ResNet-101) and U-Net (EfficientNet-B0/ResNet-50) achieved the highest overall F1-scores of 0.857 and 0.84, respectively, particularly excelling in segmenting the mucosa (0.915), retractor (0.94), tooth (0.90), and palate (0.90). Variability analysis confirmed high spectral diversity across tissue classes, supporting the dataset's complexity and authenticity for realistic clinical conditions. The presented dataset addresses a key gap in oral health imaging by enabling the development and validation of robust DL algorithms for endoscopic HSI data. It enables accurate classification of oral tissue and paves the way for future applications in individualized noninvasive pathological tissue analysis, early cancer detection, and intraoperative diagnostics of oral diseases.
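A minimal sketch of the kind of adaptation described, using torchvision's DeepLabv3 with a ResNet-50 backbone and widening the first convolution to accept hyperspectral input; the band and class counts below are assumptions, not the study's values:

```python
# Minimal sketch: adapt torchvision DeepLabv3 (ResNet-50) to HSI input by
# replacing the backbone's first convolution. Bands/classes are hypothetical.
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

num_bands, num_classes = 64, 8  # hypothetical HSI bands and tissue classes
model = deeplabv3_resnet50(weights=None, weights_backbone=None,
                           num_classes=num_classes)
model.backbone.conv1 = nn.Conv2d(num_bands, 64, kernel_size=7,
                                 stride=2, padding=3, bias=False)

x = torch.randn(1, num_bands, 256, 256)  # hypothetical HSI cube as channels
out = model(x)["out"]                    # (1, num_classes, 256, 256) logits
print(out.shape)
```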
