Latest Papers on Radiology AI. Tags: Open Dataset

Impact of test set composition on AI performance in pediatric wrist fracture detection in X-rays.

Till T, Scherkl M, Stranger N, Singer G, Hankel S, Flucher C, Hržić F, Štajduhar I, Tschauner S

•papers•May 16 2025

This study evaluates how different test set sampling strategies, namely random selection and balanced sampling, affect the performance of artificial intelligence (AI) models in pediatric wrist fracture detection using radiographs, with the aim of highlighting the need for standardization in test set design. This retrospective study utilized the open-sourced GRAZPEDWRI-DX dataset of 6091 pediatric wrist radiographs. Two test sets, each containing 4588 images, were constructed: one using a balanced approach based on case difficulty, projection type, and fracture presence and the other a random selection. EfficientNet and YOLOv11 models were trained and validated on 18,762 radiographs and tested on both sets. Binary classification and object detection tasks were evaluated using metrics such as precision, recall, F1 score, AP50, and AP50-95. Statistical comparisons between test sets were performed using nonparametric tests. Performance metrics significantly decreased in the balanced test set with more challenging cases. For example, the precision for YOLOv11 models decreased from 0.95 in the random set to 0.83 in the balanced set. Similar trends were observed for recall, accuracy, and F1 score, indicating that models trained on easy-to-recognize cases performed poorly on more complex ones. These results were consistent across all model variants tested. AI models for pediatric wrist fracture detection exhibit reduced performance when tested on balanced datasets containing more difficult cases, compared to randomly selected cases. This highlights the importance of constructing representative and standardized test sets that account for clinical complexity to ensure robust AI performance in real-world settings.
Question: Do sampling strategies based on sample complexity influence deep learning models' performance in fracture detection? Findings: AI performance in pediatric wrist fracture detection drops significantly when tested on balanced datasets with more challenging cases, compared to randomly selected cases. Clinical relevance: Without standardized and validated test datasets for AI that reflect clinical complexities, performance metrics may be overestimated, limiting the utility of AI in real-world settings.
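
The difference between the two test set designs can be illustrated with a minimal sketch. It is not the authors' sampling code: the case fields (`difficulty`, `fracture`), the toy pool, and both function names are hypothetical, and the study additionally stratified by projection type.

```python
import random
from collections import defaultdict

def random_test_set(cases, n, seed=0):
    """Draw n cases uniformly at random, so the test set mirrors the source mix."""
    rng = random.Random(seed)
    return rng.sample(cases, n)

def balanced_test_set(cases, n, keys, seed=0):
    """Draw about n cases with an equal quota per stratum defined by `keys`."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[tuple(case[k] for k in keys)].append(case)
    per_stratum = n // len(strata)
    picked = []
    for _, members in sorted(strata.items()):
        # A stratum smaller than its quota contributes everything it has.
        picked.extend(rng.sample(members, min(per_stratum, len(members))))
    return picked

def hard_share(cases):
    """Fraction of cases labeled as hard."""
    return sum(c["difficulty"] == "hard" for c in cases) / len(cases)

# Hypothetical toy pool: 10% hard cases, fractures evenly split in each group.
pool = [
    {
        "id": i,
        "difficulty": "hard" if i % 10 == 0 else "easy",
        "fracture": (i // 10) % 2 == 0,
    }
    for i in range(1000)
]

rand = random_test_set(pool, 200)
bal = balanced_test_set(pool, 200, keys=("difficulty", "fracture"))
print(f"hard cases, random set:   {hard_share(rand):.2f}")
print(f"hard cases, balanced set: {hard_share(bal):.2f}")
```

In this toy pool the balanced set is half hard cases while the random set stays near the pool's 10%, which is the kind of shift in case mix that can lower measured precision and recall without any change to the model.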

X-Ray Detection Musculoskeletal Retrospective Clinical In Silico Academic Lab Open Dataset

Comparative analysis of deep learning methods for breast ultrasound lesion detection and classification.

Vallez N, Mateos-Aparicio-Ruiz I, Rienda MA, Deniz O, Bueno G

•papers•May 16 2025

Breast ultrasound (BUS) computer-aided diagnosis (CAD) systems aim to perform two major steps: detecting lesions and classifying them as benign or malignant. However, the impact of combining both steps has not been previously addressed. Moreover, the specific method employed can influence the final outcome of the system. In this work, a comparison of the effects of using object detection, semantic segmentation and instance segmentation to detect lesions in BUS images was conducted. To this end, four approaches were examined: a) multi-class object detection, b) one-class object detection followed by localized region classification, c) multi-class segmentation, and d) one-class segmentation followed by segmented region classification. Additionally, a novel dataset for BUS segmentation, called BUS-UCLM, has been gathered, annotated and shared publicly. The evaluation of the proposed methods was carried out with this new dataset and four publicly available datasets: BUSI, OASBUD, RODTOOK and UDIAT. Among the four approaches compared, multi-class detection and multi-class segmentation achieved the best results when instance segmentation CNNs were used. The best results in detection were obtained with a multi-class Mask R-CNN with a COCO AP50 metric of 72.9%. In the multi-class segmentation scenario, Poolformer achieved the best results with a Dice score of 77.7%. The analysis of detection and segmentation models in BUS highlights several key challenges, emphasizing the complexity of accurately identifying and segmenting lesions. Among the methods evaluated, instance segmentation has proven to be the most effective for BUS images, offering superior performance in delineating individual lesions.
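
AP50 counts a predicted box as correct only if it overlaps a ground-truth box of the same class with an intersection over union (IoU) of at least 0.5. The sketch below shows only that matching criterion, not the full COCO procedure (which also ranks predictions by confidence and integrates the precision-recall curve). The box format, labels, and function names are assumptions for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def matches_at_50(pred_box, pred_label, true_box, true_label):
    """True when the class agrees and the boxes overlap with IoU >= 0.5."""
    return pred_label == true_label and box_iou(pred_box, true_box) >= 0.5

# Hypothetical lesion annotation and two candidate predictions.
truth = (10, 10, 50, 50)
close = (20, 10, 60, 50)   # shifted 10 px to the right
loose = (30, 10, 70, 50)   # shifted 20 px to the right

print(box_iou(truth, close))                                   # 0.6
print(matches_at_50(close, "malignant", truth, "malignant"))   # True
print(matches_at_50(loose, "malignant", truth, "malignant"))   # False
print(matches_at_50(close, "benign", truth, "malignant"))      # False
```

The last call shows why multi-class detection is stricter than one-class detection: a well-localized box with the wrong class still counts as a miss.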

Ultrasound Detection Breast Methodology In Silico Academic Lab Open Dataset

Pancreas segmentation using AI developed on the largest CT dataset with multi-institutional validation and implications for early cancer detection.

Mukherjee S, Antony A, Patnam NG, Trivedi KH, Karbhari A, Nagaraj M, Murlidhar M, Goenka AH

•papers•May 16 2025

Accurate and fully automated pancreas segmentation is critical for advancing imaging biomarkers in early pancreatic cancer detection and for biomarker discovery in endocrine and exocrine pancreatic diseases. We developed and evaluated a deep learning (DL)-based convolutional neural network (CNN) for automated pancreas segmentation using the largest single-institution dataset to date (n = 3031 CTs). Radiologists produced the ground truth segmentations, which were used to train a 3D nnU-Net model through five-fold cross-validation, generating an ensemble of top-performing models. To assess generalizability, the model was externally validated on the multi-institutional AbdomenCT-1K dataset (n = 585), for which volumetric segmentations were newly generated by expert radiologists and will be made publicly available. In the test subset (n = 452), the CNN achieved a mean Dice Similarity Coefficient (DSC) of 0.94 (SD 0.05), demonstrating high spatial concordance with radiologist-annotated volumes (Concordance Correlation Coefficient [CCC]: 0.95). On the AbdomenCT-1K dataset, the model achieved a DSC of 0.96 (SD 0.04) and a CCC of 0.98, confirming its robustness across diverse imaging conditions. The proposed DL model establishes new performance benchmarks for fully automated pancreas segmentation, offering a scalable and generalizable solution for large-scale imaging biomarker research and clinical translation.
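
The concordance correlation coefficient reported above measures agreement between two sets of measurements, here model-derived and radiologist-derived volumes. A minimal sketch of Lin's CCC follows, using population (divide-by-n) variances; the volume values are hypothetical and the function name is illustrative, not taken from the paper.

```python
from statistics import fmean

def concordance_cc(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    sxy = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    # The squared mean difference in the denominator penalizes systematic bias.
    return 2 * sxy / (vx + vy + (mx - my) ** 2)

# Hypothetical pancreas volumes in mL.
reference = [60.0, 70.0, 80.0, 90.0]
identical = [60.0, 70.0, 80.0, 90.0]
biased = [70.0, 80.0, 90.0, 100.0]   # every volume overestimated by 10 mL

print(concordance_cc(reference, identical))           # 1.0
print(round(concordance_cc(reference, biased), 3))    # 0.714
```

The biased series is perfectly correlated with the reference in the Pearson sense, yet its CCC falls to about 0.71, which is why CCC is a stricter check of volumetric agreement than correlation alone.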

CT Segmentation Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA Open Dataset

CheXGenBench: A Unified Benchmark For Fidelity, Privacy and Utility of Synthetic Chest Radiographs

Raman Dutt, Pedro Sanchez, Yongchen Yao, Steven McDonagh, Sotirios A. Tsaftaris, Timothy Hospedales

•preprint•May 15 2025

We introduce CheXGenBench, a rigorous and multifaceted evaluation framework for synthetic chest radiograph generation that simultaneously assesses fidelity, privacy risks, and clinical utility across state-of-the-art text-to-image generative models. Despite rapid advancements in generative AI for real-world imagery, medical domain evaluations have been hindered by methodological inconsistencies, outdated architectural comparisons, and disconnected assessment criteria that rarely address the practical clinical value of synthetic samples. CheXGenBench overcomes these limitations through standardised data partitioning and a unified evaluation protocol comprising over 20 quantitative metrics that systematically analyse generation quality, potential privacy vulnerabilities, and downstream clinical applicability across 11 leading text-to-image architectures. Our results reveal critical inefficiencies in the existing evaluation protocols, particularly in assessing generative fidelity, leading to inconsistent and uninformative comparisons. Our framework establishes a standardised benchmark for the medical AI community, enabling objective and reproducible comparisons while facilitating seamless integration of both existing and future generative models. Additionally, we release a high-quality, synthetic dataset, SynthCheX-75K, comprising 75K radiographs generated by the top-performing model (Sana 0.6B) in our benchmark to support further research in this critical domain. Through CheXGenBench, we establish a new state-of-the-art and release our framework, models, and SynthCheX-75K dataset at https://raman1121.github.io/CheXGenBench/

X-Ray Image Synthesis Chest Dataset Release In Silico Academic Lab Open Dataset Open Code Benchmark SOTA

An Annotated Multi-Site and Multi-Contrast Magnetic Resonance Imaging Dataset for the study of the Human Tongue Musculature.

Ribeiro FL, Zhu X, Ye X, Tu S, Ngo ST, Henderson RD, Steyn FJ, Kiernan MC, Barth M, Bollmann S, Shaw TB

•papers•May 14 2025

This dataset provides the first annotated, openly available MRI-based imaging dataset for investigations of tongue musculature, including multi-contrast and multi-site MRI data from non-disease participants. The present dataset includes 47 participants collated from three studies: BeLong (four participants; T2-weighted images), EATT4MND (19 participants; T2-weighted images), and BMC (24 participants; T1-weighted images). We provide manually corrected segmentations of five key tongue muscles: the superior longitudinal, combined transverse/vertical, genioglossus, and inferior longitudinal muscles. Other phenotypic measures, including age, sex, weight, height, and tongue muscle volume, are also available for use. This dataset will benefit researchers across domains interested in the structure and function of the tongue in health and disease. For instance, researchers can use this data to train new machine learning models for tongue segmentation, which can be leveraged for segmentation and tracking of different tongue muscles engaged in speech formation in health and disease. Altogether, this dataset gives the scientific community the means to investigate the intricate tongue musculature and its role in physiological processes and speech production.

MRI Segmentation Dataset Release In Silico Academic Lab Open Dataset

Improving AI models for rare thyroid cancer subtype by text guided diffusion models.

Dai F, Yao S, Wang M, Zhu Y, Qiu X, Sun P, Qiu C, Yin J, Shen G, Sun J, Wang M, Wang Y, Yang Z, Sang J, Wang X, Sun F, Cai W, Zhang X, Lu H

•papers•May 13 2025

Artificial intelligence applications in oncology imaging often struggle with diagnosing rare tumors. We identify significant gaps in detecting uncommon thyroid cancer types with ultrasound, where scarce data leads to frequent misdiagnosis. Traditional augmentation strategies do not capture the unique disease variations, hindering model training and performance. To overcome this, we propose a text-driven generative method that fuses clinical insights with image generation, producing synthetic samples that realistically reflect rare subtypes. In rigorous evaluations, our approach achieves substantial gains in diagnostic metrics, surpasses existing methods in authenticity and diversity measures, and generalizes effectively to other private and public datasets with various rare cancers. In this work, we demonstrate that text-guided image augmentation substantially enhances model accuracy and robustness for rare tumor detection, offering a promising avenue for more reliable and widespread clinical adoption.

Ultrasound Detection Abdominal Methodology In Silico Academic Lab GenAI Open Dataset

DEMAC-Net: A Dual-Encoder Multiattention Collaborative Network for Cervical Nerve Pathway and Adjacent Anatomical Structure Segmentation.

Cui H, Duan J, Lin L, Wu Q, Guo W, Zang Q, Zhou M, Fang W, Hu Y, Zou Z

•papers•May 13 2025

Currently, cervical anesthesia is performed using three main approaches: superficial cervical plexus block, deep cervical plexus block, and intermediate plexus nerve block. However, each technique carries inherent risks and demands significant clinical expertise. Ultrasound imaging, known for its real-time visualization capabilities and accessibility, is widely used in both diagnostic and interventional procedures. Nevertheless, accurate segmentation of small and irregularly shaped structures such as the cervical and brachial plexuses remains challenging due to image noise, complex anatomical morphology, and limited annotated training data. This study introduces DEMAC-Net, a dual-encoder, multiattention collaborative network, to significantly improve the segmentation accuracy of these neural structures. By precisely identifying the cervical nerve pathway (CNP) and adjacent anatomical tissues, DEMAC-Net aims to assist clinicians, especially those less experienced, in effectively guiding anesthesia procedures and accurately identifying optimal needle insertion points. Consequently, this improvement is expected to enhance clinical safety, reduce procedural risks, and streamline decision-making during ultrasound-guided regional anesthesia. DEMAC-Net combines a dual-encoder architecture with the Spatial Understanding Convolution Kernel (SUCK) and the Spatial-Channel Attention Module (SCAM) to extract multi-scale features effectively. Additionally, a Global Attention Gate (GAG) and inter-layer fusion modules refine relevant features while suppressing noise. A novel dataset, Neck Ultrasound Dataset (NUSD), was introduced, containing 1,500 annotated ultrasound images across seven anatomical regions. Extensive experiments were conducted on both NUSD and the BUSI public dataset, comparing DEMAC-Net to state-of-the-art models using metrics such as Dice Similarity Coefficient (DSC) and Intersection over Union (IoU).
On the NUSD dataset, DEMAC-Net achieved a mean DSC of 93.3%, outperforming existing models. For external validation on the BUSI dataset, it demonstrated superior generalization, achieving a DSC of 87.2% and a mean IoU of 77.4%, surpassing other advanced methods. Notably, DEMAC-Net displayed consistent segmentation stability across all tested structures. The proposed DEMAC-Net significantly improves segmentation accuracy for small nerves and complex anatomical structures in ultrasound images, outperforming existing methods in terms of accuracy and computational efficiency. This framework holds great potential for enhancing ultrasound-guided procedures, such as peripheral nerve blocks, by providing more precise anatomical localization, ultimately improving clinical outcomes.
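
The two segmentation metrics named above are closely related. The following sketch computes both from flat binary masks and checks the per-mask identity IoU = DSC / (2 - DSC); the toy masks and function name are hypothetical and unrelated to the NUSD or BUSI data.

```python
def dice_and_iou(pred, truth):
    """Dice and IoU for two equal-length flat sequences of 0/1 labels."""
    inter = sum(p * t for p, t in zip(pred, truth))
    pred_size, truth_size = sum(pred), sum(truth)
    union = pred_size + truth_size - inter
    # Two empty masks agree perfectly, so both scores default to 1.0.
    dice = 2 * inter / (pred_size + truth_size) if pred_size + truth_size else 1.0
    iou = inter / union if union else 1.0
    return dice, iou

# Toy 6-pixel masks: the prediction is shifted by one pixel.
pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]

dice, iou = dice_and_iou(pred, truth)
print(round(dice, 3), iou)            # 0.667 0.5
print(round(dice / (2 - dice), 3))    # 0.5, same as the IoU
```

Applying the identity to the reported BUSI DSC of 87.2% gives roughly 77.3%, close to the reported mean IoU of 77.4%. Exact agreement is not expected, because the identity holds for a single mask and not for averages taken over many images.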

Ultrasound Segmentation Methodology In Silico Academic Lab Open Dataset Benchmark SOTA

AmygdalaGo-BOLT: an open and reliable AI tool to trace boundaries of human amygdala

Zhou, Q., Dong, B., Gao, P., Jintao, W., Xiao, J., Wang, W., Liang, P., Lin, D., Zuo, X.-N., He, H.

•preprint•May 13 2025

Each year, thousands of brain MRI scans are collected to study structural development in children and adolescents. However, the amygdala, a particularly small and complex structure, remains difficult to segment reliably, especially in developing populations where its volume is even smaller. To address this challenge, we developed AmygdalaGo-BOLT, a boundary-aware deep learning model tailored for human amygdala segmentation. It was trained and validated using 854 manually labeled scans from pediatric datasets, with independent samples used to ensure performance generalizability. The model integrates multiscale image features, spatial priors, and self-attention mechanisms within a compact encoder-decoder architecture to enhance boundary detection. Validation across multiple imaging centers and age groups shows that AmygdalaGo-BOLT closely matches expert manual labels, improves processing efficiency, and outperforms existing tools in accuracy. This enables robust and scalable analysis of amygdala morphology in developmental neuroimaging studies where manual tracing is impractical. To support open and reproducible science, we publicly release both the labeled datasets and the full source code.

MRI Segmentation Neurological Methodology In Silico Academic Lab Open Dataset Open Code Reproducibility

Deep Learning-accelerated MRI in Body and Chest.

Rajamohan N, Bagga B, Bansal B, Ginocchio L, Gupta A, Chandarana H

•papers•May 13 2025

Deep learning reconstruction (DLR) provides an elegant solution for MR acceleration while preserving image quality. This advancement is crucial for body imaging, which is frequently marred by the increased likelihood of motion-related artifacts. Multiple vendor-specific models focusing on T2, T1, and diffusion-weighted imaging have been developed for the abdomen, pelvis, and chest, with the liver and prostate being the most well-studied organ systems. Variational networks with supervised DL models, including data consistency layers and regularizers, are the most common DLR methods. The common theme for all single-center studies on this subject has been noninferior or superior image quality metrics and lesion conspicuity compared with conventional sequences, despite significant acquisition time reduction. DLR also provides a potential for denoising, artifact reduction, increased resolution, and increased signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) that can be balanced with acceleration benefits depending on the imaged organ system. Some specific challenges faced by DLR include slightly reduced lesion detection, cardiac motion-related signal loss, regional SNR variations, and variabilities in ADC measurements as reported in different organ systems. Continued investigations with large-scale multicenter prospective clinical validation of DLR are urgently needed to document generalizability and demonstrate noninferior diagnostic accuracy with histopathologic correlation. The creation of vendor-neutral solutions, open data sharing, and diversifying training data sets are also critical to strengthening model robustness.

MRI Reconstruction Abdominal Review In Silico Academic Lab Breakthrough Open Dataset

A survey of deep-learning-based radiology report generation using multimodal inputs.

Wang X, Figueredo G, Li R, Zhang WE, Chen W, Chen X

•papers•May 13 2025

Automatic radiology report generation can alleviate the workload for physicians and minimize regional disparities in medical resources, and has therefore become an important topic in the medical image analysis field. It is a challenging task, as the computational model needs to mimic physicians to obtain information from multi-modal input data (e.g., medical images, clinical information, and medical knowledge), and produce comprehensive and accurate reports. Recently, numerous works have emerged to address this issue using deep-learning-based methods, such as transformers, contrastive learning, and knowledge-base construction. This survey summarizes the key techniques developed in the most recent works and proposes a general workflow for deep-learning-based report generation with five main components: multi-modality data acquisition, data preparation, feature learning, feature fusion and interaction, and report generation. The state-of-the-art methods for each of these components are highlighted. Additionally, we summarize the latest developments in large model-based methods and model explainability, along with public datasets, evaluation methods, current challenges, and future directions in this field. We have also conducted a quantitative comparison between different methods in the same experimental setting. This is the most up-to-date survey that focuses on multi-modality inputs and data fusion for radiology report generation. The aim is to provide comprehensive and rich information for researchers interested in automatic clinical report generation and medical image analysis, especially when using multimodal inputs, and to assist them in developing new algorithms to advance the field.

Mixed Modality Report Generation Review Academic Lab GenAI Open Dataset
