TN5000: An Ultrasound Image Dataset for Thyroid Nodule Detection and Classification.

Zhang H, Liu Q, Han X, Niu L, Sun W

PubMed · Aug 16, 2025
Accurate diagnosis of thyroid nodules using ultrasonography is a highly valuable but challenging task. With the emergence of artificial intelligence, deep learning-based methods can assist radiologists, yet their performance depends heavily on the quantity and quality of training data, and current ultrasound image datasets for thyroid nodules either use TI-RADS assessments directly as labels or are not publicly available. To address these issues, we propose an open-access ultrasound image dataset for thyroid nodule detection and classification, TN5000, comprising 5,000 B-mode ultrasound images of thyroid nodules with complete annotations and biopsy confirmations by expert radiologists. We also analyze the statistical characteristics of the dataset and recommend baseline methods for thyroid nodule detection and classification as benchmarks, along with their evaluation results. To the best of our knowledge, TN5000 is the largest open-access ultrasound image dataset of thyroid nodules with professional labeling, and the first designed for both thyroid nodule detection and classification. Such annotated images can help characterize the intrinsic properties of thyroid nodules and determine the necessity of FNA biopsy, both of which are crucial in ultrasound diagnosis.
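
As an illustration of how detection annotations like these are typically consumed, below is a minimal evaluation sketch that matches predicted nodule boxes to ground truth by IoU. The box format is hypothetical, since the abstract does not specify TN5000's release schema.

```python
# Minimal sketch of IoU-based matching for nodule detection evaluation.
# The Box format is hypothetical; TN5000's actual annotation schema may differ.
from dataclasses import dataclass

@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = ((a.x2 - a.x1) * (a.y2 - a.y1)
             + (b.x2 - b.x1) * (b.y2 - b.y1) - inter)
    return inter / (union + 1e-9)

def true_positives(preds: list[Box], gts: list[Box], thresh: float = 0.5) -> int:
    """Greedily match each prediction to the best unmatched ground-truth nodule."""
    matched, hits = set(), 0
    for p in preds:
        scores = [(iou(p, g), j) for j, g in enumerate(gts) if j not in matched]
        if scores:
            best_iou, best_j = max(scores)
            if best_iou >= thresh:
                matched.add(best_j)
                hits += 1
    return hits
```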

VariMix: A variety-guided data mixing framework for explainable medical image classifications.

Xiong X, Sun Y, Liu X, Ke W, Lam CT, Gao Q, Tong T, Li S, Tan T

PubMed · Aug 16, 2025
Modern deep neural networks are highly over-parameterized, necessitating data augmentation to prevent overfitting and enhance generalization. Generative adversarial networks (GANs) are popular for synthesizing visually realistic images, but these synthetic images often lack diversity and may carry ambiguous class labels. Recent data mixing strategies address some of these issues by mixing image labels based on salient regions; however, since the main diagnostic information is not always contained within the salient regions, label mismatches can arise in medical image classification. We propose a variety-guided data mixing framework (VariMix) that exploits an absolute difference map (ADM) to address the label mismatch problem of mixed medical images. VariMix generates the ADM using an image-to-image (I2I) GAN across multiple classes and allows bidirectional mixing operations between training samples. VariMix achieves the highest accuracy of 99.30% and 94.60% with a SwinT V2 classifier on a Chest X-ray (CXR) dataset and a Retinal dataset, respectively, and the highest accuracy of 87.73%, 99.28%, 95.13%, and 95.81% with a ConvNeXt classifier on a Breast Ultrasound (US) dataset, a CXR dataset, a Retinal dataset, and a Maternal-Fetal US dataset, respectively. Medical expert evaluation of the generated images further shows the great potential of the proposed I2I GAN for improving the accuracy of medical image classification. Extensive experiments demonstrate the superiority of VariMix over existing GAN- and Mixup-based methods on four public datasets using Swin Transformer V2 and ConvNeXt architectures. Furthermore, by projecting the source image onto the hyperplanes of the classifiers, the proposed I2I GAN can generate hyperplane difference maps between the source image and the hyperplane image, demonstrating its ability to interpret medical image classifications. The source code is available at https://github.com/yXiangXiong/VariMix.
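
To make the ADM idea concrete, here is a hedged sketch of one plausible mixing step: the most class-relevant region of one image (per its ADM) is pasted onto another, and the mixed label weight follows the transferred ADM mass. The actual VariMix procedure, including its I2I-GAN-generated ADMs and bidirectional operations, is more involved.

```python
# Hedged sketch of ADM-guided mixing: paste the most class-relevant region of
# img_b (per its ADM) onto img_a, and weight the mixed label by the transferred
# ADM mass. The real VariMix uses I2I-GAN-generated ADMs; this is a stand-in.
import numpy as np

def adm_guided_mix(img_a: np.ndarray, img_b: np.ndarray,
                   adm_b: np.ndarray, coverage: float = 0.3):
    """Return a mixed image and the label weight attributed to img_b's class."""
    k = max(1, int(coverage * adm_b.size))
    thresh = np.partition(adm_b.ravel(), -k)[-k]   # cutoff for top-k ADM pixels
    mask = (adm_b >= thresh).astype(img_a.dtype)
    mixed = img_a * (1 - mask) + img_b * mask
    w_b = float((adm_b * mask).sum() / (adm_b.sum() + 1e-9))
    return mixed, w_b  # mixed label = (1 - w_b) * label_a + w_b * label_b
```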

Diagnostic performance of deep learning for predicting glioma isocitrate dehydrogenase and 1p/19q co-deletion in MRI: a systematic review and meta-analysis.

Farahani S, Hejazi M, Tabassum M, Di Ieva A, Mahdavifar N, Liu S

PubMed · Aug 16, 2025
We aimed to evaluate the diagnostic performance of deep learning (DL)-based radiomics models for the noninvasive prediction of isocitrate dehydrogenase (IDH) mutation and 1p/19q co-deletion status in glioma patients using MRI sequences, and to identify methodological factors influencing accuracy and generalizability. Following PRISMA guidelines, we systematically searched major databases (PubMed, Scopus, Embase, Web of Science, and Google Scholar) up to March 2025, screening studies that utilized DL to predict IDH and 1p/19q co-deletion status from MRI data. We assessed study quality and risk of bias using the Radiomics Quality Score and the QUADAS-2 tool. Our meta-analysis employed a bivariate model to compute pooled sensitivity and specificity, and meta-regression to assess interstudy heterogeneity. Among 1517 unique publications, 104 were included in the qualitative synthesis, and 72 underwent meta-analysis. Pooled estimates for IDH prediction in test cohorts yielded a sensitivity of 0.80 (95% CI: 0.77-0.83) and a specificity of 0.85 (95% CI: 0.81-0.87). For 1p/19q co-deletion, sensitivity was 0.75 (95% CI: 0.65-0.82) and specificity was 0.82 (95% CI: 0.75-0.88). Meta-regression identified the tumor segmentation method and the extent of DL integration into the radiomics pipeline as significant contributors to interstudy variability. Although DL models demonstrate strong potential for noninvasive molecular classification of gliomas, clinical translation requires several critical steps: harmonization of multi-center MRI data using techniques such as histogram matching and DL-based style transfer; adoption of standardized and automated segmentation protocols; extensive multi-center external validation; and prospective clinical validation.
Question: Can DL-based radiomics using routine MRI noninvasively predict IDH mutation and 1p/19q co-deletion status in gliomas, and what factors affect diagnostic accuracy?
Findings: Meta-analysis showed 80% sensitivity and 85% specificity for predicting IDH mutation, and 75% sensitivity and 82% specificity for 1p/19q co-deletion status.
Clinical relevance: MRI-based DL models demonstrate clinically useful accuracy for noninvasive glioma molecular classification, but data harmonization, standardized automated segmentation, and rigorous multi-center external validation are essential for clinical adoption.
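
For intuition about how pooled estimates like the 0.80 sensitivity arise, the sketch below applies simple fixed-effect logit pooling to made-up per-study counts. The paper itself uses a bivariate random-effects model, which jointly models sensitivity and specificity and is not reproduced here.

```python
# Simplified fixed-effect pooling of per-study sensitivities on the logit
# scale. The study counts below are fabricated, purely for illustration.
import math

def pooled_logit(events: list[int], totals: list[int]) -> float:
    """Inverse-variance pooling of proportions on the logit scale
    (0.5 continuity correction avoids division by zero)."""
    num = den = 0.0
    for e, n in zip(events, totals):
        p = (e + 0.5) / (n + 1.0)
        logit = math.log(p / (1 - p))
        var = 1.0 / (e + 0.5) + 1.0 / (n - e + 0.5)  # approx. variance of logit
        num += logit / var
        den += 1.0 / var
    pooled = num / den
    return 1.0 / (1.0 + math.exp(-pooled))  # back-transform to a proportion

# e.g. true positives / diseased counts from three hypothetical studies
print(pooled_logit(events=[40, 55, 32], totals=[50, 70, 40]))
```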

Improving skull-stripping for infant MRI via weakly supervised domain adaptation using adversarial learning.

Omidi A, Shamaei A, Aktar M, King R, Leijser L, Souza R

PubMed · Aug 16, 2025
Skull-stripping is an essential preprocessing step in the analysis of brain Magnetic Resonance Imaging (MRI). While deep learning-based methods have shown success with this task, strong domain shifts between adult and newborn brain MR images complicate model transferability. We previously developed unsupervised domain adaptation techniques to address this domain shift without requiring labeled newborn MRI data. In this work, we build upon our previous domain adaptation framework by extensively expanding the training and validation datasets using weakly labeled newborn MRI scans from the Developing Human Connectome Project (dHCP), our private newborn dataset, and synthetic data generated by a Gaussian Mixture Model (GMM). While the core model architecture remains similar, we focus on validating the model's generalization across four diverse domains: adult, synthetic, public newborn, and private newborn MRI, demonstrating improved performance and robustness over our prior methods. These results highlight the impact of incorporating broader training data under weak supervision for newborn brain imaging analysis. The experimental results show that our proposed approach outperforms our previous work, achieving a Dice coefficient of 0.9509±0.0055 and a Hausdorff distance of 3.0883±0.1833 on newborn MRI data, surpassing state-of-the-art models such as SynthStrip (Dice = 0.9412±0.0063, Hausdorff = 3.1570±0.1389). Including weakly labeled newborn data thus improves model performance and generalization for newborn brain imaging analysis. Our code is available at: https://github.com/abbasomidi77/Weakly-Supervised-DAUnet.
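
The two reported metrics can be computed as below; note this sketch uses the exact (maximum) Hausdorff distance via SciPy, whereas skull-stripping evaluations sometimes report the 95th-percentile variant.

```python
# Dice coefficient and symmetric Hausdorff distance for binary brain masks.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-9)

def hausdorff(pred_pts: np.ndarray, gt_pts: np.ndarray) -> float:
    """Symmetric (max) Hausdorff distance between two point sets (N x D),
    e.g. the surface voxel coordinates of each mask."""
    return max(directed_hausdorff(pred_pts, gt_pts)[0],
               directed_hausdorff(gt_pts, pred_pts)[0])
```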

Deep learning-based identification of necrosis and microvascular proliferation in adult diffuse gliomas from whole-slide images

Guo, Y., Huang, H., Liu, X., Zou, W., Qiu, F., Liu, Y., Chai, R., Jiang, T., Wang, J.

medRxiv preprint · Aug 16, 2025
For adult diffuse gliomas (ADGs), most grading can be achieved through molecular subtyping, retaining only two key histopathological features for high-grade glioma (HGG): necrosis (NEC) and microvascular proliferation (MVP). We developed a deep learning (DL) framework to automatically identify and characterize these features. We trained patch-level models to detect and quantify NEC and MVP using an actively learned dataset of patches from 621 whole-slide images (WSIs) in the Chinese Glioma Genome Atlas (CGGA). Using the trained patch-level models, we integrated the predicted outcomes and positions of individual patches within WSIs from The Cancer Genome Atlas (TCGA) cohort to form patient-level datasets. We then introduced a patient-level model, PLNet (Probability Localization Network), trained on these datasets to support patient diagnosis. We also explored subtypes of NEC and MVP by applying a clustering process to all positive patches, using features extracted from the patch-level models. The patient-level models demonstrated exceptional performance, achieving AUCs of 0.9968 and 0.9995 and AUPRCs of 0.9788 and 0.9860 for NEC and MVP, respectively. Compared with pathological reports, our patient-level models achieved accuracies of 88.05% for NEC and 90.20% for MVP, with sensitivities of 73.68% and 77%, respectively. When sensitivity was set at 80%, accuracy reached 79.28% for NEC and 77.55% for MVP. DL models enable more efficient and accurate histopathological image analysis, which will aid traditional glioma diagnosis. Clustering-based analyses using features extracted from the patch-level models could further investigate the subtypes of NEC and MVP.
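
Since PLNet's internals are not detailed in the abstract, the following is only a schematic of the general patch-to-patient aggregation pattern it builds on: per-patch probabilities are placed back at their WSI grid positions, and a patient-level score is pooled from the most confident patches.

```python
# Schematic of patch-to-patient aggregation; PLNet itself is a learned model,
# so this pooling stand-in is an assumption for illustration only.
import numpy as np

def patient_score(patch_probs: np.ndarray, top_k: int = 10) -> float:
    """Average the top-k most confident patch probabilities for one WSI
    (robust to isolated false-positive patches)."""
    k = min(top_k, patch_probs.size)
    return float(np.sort(patch_probs)[-k:].mean())

def prob_map(patch_probs, coords, grid_shape):
    """Place patch probabilities at their WSI grid positions, yielding the
    kind of spatial probability map a patient-level model could consume."""
    m = np.zeros(grid_shape, dtype=np.float32)
    for p, (r, c) in zip(patch_probs, coords):
        m[r, c] = p
    return m
```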

Developing biomarkers and methods of risk stratification: Consensus statements from the International Kidney Cancer Symposium North America 2024 Think Tank.

Shapiro DD, Abel EJ, Albiges L, Battle D, Berg SA, Campbell MT, Cella D, Coleman K, Garmezy B, Geynisman DM, Hall T, Henske EP, Jonasch E, Karam JA, La Rosa S, Leibovich BC, Maranchie JK, Master VA, Maughan BL, McGregor BA, Msaouel P, Pal SK, Perez J, Plimack ER, Psutka SP, Riaz IB, Rini BI, Shuch B, Simon MC, Singer EA, Smith A, Staehler M, Tang C, Tannir NM, Vaishampayan U, Voss MH, Zakharia Y, Zhang Q, Zhang T, Carlo MI

PubMed · Aug 16, 2025
Accurate prognostication and personalized treatment selection remain major challenges in kidney cancer. This consensus initiative aimed to provide actionable expert guidance on the development and clinical integration of prognostic and predictive biomarkers and risk stratification tools to improve patient care and guide future research. A modified Delphi method was employed to develop consensus statements among a multidisciplinary panel of experts in urologic oncology, medical oncology, radiation oncology, pathology, molecular biology, radiology, outcomes research, biostatistics, industry, and patient advocacy. Over 3 rounds, including an in-person meeting, 20 initial statements were evaluated, refined, and voted on. Consensus was defined a priori as a median Likert score ≥8. Nineteen final consensus statements were endorsed. These span key domains including biomarker prioritization (favoring prognostic biomarkers), rigorous methodology for subgroup and predictive analyses, the development of multi-institutional prospective registries, incorporation of biomarkers in trial design, and improvements in data and biospecimen access. The panel also identified high-priority biomarker types (e.g., AI-based image analysis, ctDNA) for future research. This is the first consensus statement specifically focused on biomarker and risk model development for kidney cancer using a structured Delphi process. The recommendations emphasize the need for rigorous methodology, collaborative infrastructure, prospective data collection, and a focus on clinically translatable biomarkers. The resulting framework is intended to guide researchers, cooperative groups, and stakeholders in advancing personalized care for patients with kidney cancer.
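
The a priori consensus rule is simple enough to state in code; the sketch below assumes a 1-10 Likert scale (the exact scale is not given in the abstract).

```python
# Tiny sketch of the stated consensus rule: a statement is endorsed when the
# median panel rating is >= 8. The 1-10 scale is an assumption.
from statistics import median

def reaches_consensus(ratings: list[int], cutoff: int = 8) -> bool:
    """Apply the median-Likert consensus rule to one statement's ratings."""
    return median(ratings) >= cutoff

print(reaches_consensus([9, 8, 7, 10, 8, 9]))  # True (median 8.5)
```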

Impact of Clinical Image Quality on Efficient Foundation Model Finetuning

Yucheng Tang, Pawel Rajwa, Alexander Ng, Yipei Wang, Wen Yan, Natasha Thorley, Aqua Asif, Clare Allen, Louise Dickinson, Francesco Giganti, Shonit Punwani, Daniel C. Alexander, Veeru Kasivisvanathan, Yipeng Hu

arXiv preprint · Aug 16, 2025
Foundation models in medical imaging have shown promising label efficiency, achieving high downstream performance with only a fraction of annotated data. Here, we evaluate this in prostate multiparametric MRI using ProFound, a domain-specific vision foundation model pretrained on large-scale prostate MRI datasets. We investigate how variable image quality affects label-efficient finetuning by measuring the generalisability of finetuned models. Experiments systematically vary the high-/low-quality image ratios in finetuning and evaluation sets. Our findings indicate that the image quality distribution, and any finetune-and-test mismatch in that distribution, significantly affect model performance. In particular: a) varying the ratio of high- to low-quality images between finetuning and test sets leads to notable differences in downstream performance; and b) the presence of sufficient high-quality images in the finetuning set is critical for maintaining strong performance, whilst the importance of a matched finetuning and testing distribution varies between downstream tasks, such as automated radiology reporting and prostate cancer detection. When quality ratios are consistent, finetuning needs far less labeled data than training from scratch, but label efficiency depends on the image quality distribution. Without enough high-quality finetuning data, pretrained models may fail to outperform those trained without pretraining. This highlights the importance of assessing and aligning quality distributions between finetuning and deployment, and the need for quality standards in finetuning data for specific downstream tasks. Using ProFound, we show the value of quantifying image quality in both finetuning and deployment to fully realise the data and compute efficiency benefits of foundation models.
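
A minimal sketch of the experimental control described above, constructing a finetuning set with a prescribed high-/low-quality ratio; the case lists pre-sorted by quality are hypothetical.

```python
# Sketch of building a finetuning set with a fixed high-/low-quality ratio.
# `high_q` and `low_q` are hypothetical lists of case identifiers.
import random

def sample_by_quality(high_q: list, low_q: list, n: int,
                      high_ratio: float, seed: int = 0) -> list:
    """Draw n cases with the requested fraction of high-quality images."""
    rng = random.Random(seed)
    n_high = round(n * high_ratio)
    return rng.sample(high_q, n_high) + rng.sample(low_q, n - n_high)

# e.g. a 70% high-quality finetuning set vs. a 30% high-quality test set,
# to probe the finetune-and-test quality mismatch the authors report.
finetune_ids = sample_by_quality(list(range(500)), list(range(500, 1000)), 200, 0.7)
test_ids = sample_by_quality(list(range(500)), list(range(500, 1000)), 100, 0.3, seed=1)
```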

A prognostic model integrating radiomics and deep learning based on CT for survival prediction in laryngeal squamous cell carcinoma.

Jiang H, Xie K, Chen X, Ning Y, Yu Q, Lv F, Liu R, Zhou Y, Xia S, Peng J

PubMed · Aug 16, 2025
Accurate prognostic prediction is crucial for patients with laryngeal squamous cell carcinoma (LSCC) to guide personalized treatment strategies. This study aimed to develop a comprehensive prognostic model leveraging clinical factors alongside radiomics and deep learning (DL) features derived from CT imaging to predict recurrence-free survival (RFS) in LSCC patients. We retrospectively enrolled 349 patients with LSCC from Center 1 (training set: n = 189; internal testing set: n = 82) and Center 2 (external testing set: n = 78). A combined model was developed using Cox regression analysis to predict RFS by integrating independent clinical risk factors, the radiomics score (RS), and the deep learning score (DLS). Separate clinical, radiomics, and DL models were also constructed for comparison. The combined model was presented as a nomogram to provide personalized estimates of RFS, and its risk stratification capability was evaluated using Kaplan-Meier analysis. The combined model achieved a higher C-index than the clinical, radiomics, and DL models in the internal testing set (0.810 vs. 0.634, 0.679, and 0.727, respectively) and the external testing set (0.742 vs. 0.602, 0.617, and 0.729, respectively). Following risk stratification via the nomogram, patients in the low-risk group showed significantly higher survival probabilities than those in the high-risk group in the internal testing set [hazard ratio (HR) = 0.157, 95% confidence interval (CI): 0.063-0.392, p < 0.001] and the external testing set (HR = 0.312, 95% CI: 0.137-0.711, p = 0.003). The proposed combined model demonstrated reliable and accurate prediction of RFS in patients with LSCC, potentially assisting in risk stratification.
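
As a sketch of the combined-model construction, the snippet below fits a Cox proportional hazards model on hypothetical clinical, RS, and DLS columns using the `lifelines` package and reads out a C-index; it illustrates the stated approach, not the authors' code.

```python
# Sketch: Cox model combining clinical factors with radiomics (rs) and deep
# learning (dls) scores. File name and column names are hypothetical.
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

df = pd.read_csv("lscc_cohort.csv")  # one row per patient (hypothetical file)

cph = CoxPHFitter()
cph.fit(df[["rfs_months", "recurred", "t_stage", "rs", "dls"]],
        duration_col="rfs_months", event_col="recurred")

# C-index: higher predicted hazard should mean shorter RFS, hence the negation.
risk = cph.predict_partial_hazard(df)
print(concordance_index(df["rfs_months"], -risk, df["recurred"]))
```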

URFM: A general Ultrasound Representation Foundation Model for advancing ultrasound image diagnosis.

Kang Q, Lao Q, Gao J, Bao W, He Z, Du C, Lu Q, Li K

PubMed · Aug 15, 2025
Ultrasound imaging is critical for clinical diagnostics, providing insights into various diseases and organs. However, artificial intelligence (AI) in this field faces challenges such as the need for large labeled datasets and limited task-specific model applicability, particularly due to ultrasound's low signal-to-noise ratio (SNR). To overcome these challenges, we introduce the Ultrasound Representation Foundation Model (URFM), designed to learn robust, generalizable representations from unlabeled ultrasound images, enabling label-efficient adaptation to diverse diagnostic tasks. URFM is pre-trained on over 1M images spanning 15 major anatomical organs using representation-based masked image modeling (MIM), an advanced self-supervised learning approach. Unlike traditional pixel-based MIM, URFM integrates high-level representations from BiomedCLIP, a specialized medical vision-language model, to address the low-SNR issue. Extensive evaluation shows that URFM outperforms state-of-the-art methods, offering enhanced generalization, label efficiency, and training-time efficiency. URFM's scalability and flexibility signal a significant advancement in diagnostic accuracy and clinical workflow optimization for ultrasound imaging.
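
A schematic of representation-based MIM as described: a student predicts a frozen teacher's patch features at masked positions, rather than raw pixels. The teacher call standing in for BiomedCLIP, the loss choice, and all shapes are placeholder assumptions.

```python
# Schematic loss for representation-based masked image modeling: regress the
# frozen teacher's patch features at masked positions instead of pixel values.
import torch
import torch.nn.functional as F

def rep_mim_loss(student, teacher, images: torch.Tensor,
                 mask: torch.Tensor) -> torch.Tensor:
    """images: (B, C, H, W); mask: (B, N) boolean over N patches.
    `teacher` stands in for a frozen BiomedCLIP-like feature extractor."""
    with torch.no_grad():
        target = teacher(images)      # (B, N, D) patch-level features
    pred = student(images, mask)      # student predicts features per patch
    # Regress only the masked positions (Smooth-L1 chosen for illustration).
    return F.smooth_l1_loss(pred[mask], target[mask])
```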

Benchmarking GPT-5 for Zero-Shot Multimodal Medical Reasoning in Radiology and Radiation Oncology

Mingzhe Hu, Zach Eidex, Shansong Wang, Mojtaba Safari, Qiang Li, Xiaofeng Yang

arXiv preprint · Aug 15, 2025
Radiology, radiation oncology, and medical physics require decision-making that integrates medical images, textual reports, and quantitative data under high-stakes conditions. With the introduction of GPT-5, it is critical to assess whether recent advances in large multimodal models translate into measurable gains in these safety-critical domains. We present a targeted zero-shot evaluation of GPT-5 and its smaller variants (GPT-5-mini, GPT-5-nano) against GPT-4o across three representative tasks: (1) VQA-RAD, a benchmark for visual question answering in radiology; (2) SLAKE, a semantically annotated, multilingual VQA dataset testing cross-modal grounding; and (3) a curated Medical Physics Board Examination-style dataset of 150 multiple-choice questions spanning treatment planning, dosimetry, imaging, and quality assurance. Across all datasets, GPT-5 achieved the highest accuracy, with substantial gains over GPT-4o: up to +20.00% in challenging anatomical regions such as the chest/mediastinum, +13.60% on lung-focused questions, and +11.44% in brain-tissue interpretation. On the board-style physics questions, GPT-5 attained 90.7% accuracy (136/150), exceeding the estimated human passing threshold, while GPT-4o trailed at 78.0%. These results demonstrate that GPT-5 delivers consistent and often pronounced performance improvements over GPT-4o in both image-grounded reasoning and domain-specific numerical problem-solving, highlighting its potential to augment expert workflows in medical imaging and therapeutic physics.
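
For the board-style physics set, accuracy is plain exact-match over multiple-choice letters; a toy grading sketch (with fabricated answer strings, only to reproduce the 136/150 arithmetic) is below.

```python
# Toy exact-match grader for multiple-choice letters. The answer strings are
# fabricated; this only reproduces the 136/150 = 90.7% arithmetic.
def grade(answers: list[str], key: list[str]) -> float:
    """Fraction of questions where the predicted letter matches the key."""
    correct = sum(a.strip().upper()[:1] == k for a, k in zip(answers, key))
    return correct / len(key)

print(round(grade(["A"] * 136 + ["B"] * 14, ["A"] * 150), 4))  # 0.9067
```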