Latest Papers on Radiology AI. Category: papers, Tags: Dataset Release, Order: Best Match, Limit: 10.

Advancing respiratory disease diagnosis: A deep learning and vision transformer-based approach with a novel X-ray dataset.

Alghadhban A, Ramadan RA, Alazmi M

•papers•Jun 9 2025

With the increasing prevalence of respiratory diseases such as pneumonia and COVID-19, timely and accurate diagnosis is critical. This paper contributes to the field of respiratory disease classification by utilizing X-ray images and advanced machine learning techniques such as deep learning (DL) and Vision Transformers (ViT). First, the paper systematically reviews current diagnostic methodologies, analyzing recent advances in DL and ViT techniques through a comprehensive analysis of review articles published between 2017 and 2024, excluding short reviews and overviews. The review not only analyzes existing knowledge but also identifies critical gaps in the field, notably the lack of comprehensive and diverse datasets for training machine learning models. To address such limitations, the paper extensively evaluates DL-based models on publicly available datasets, analyzing key performance metrics such as accuracy, precision, recall, and F1-score. Our evaluations reveal that current datasets are mostly limited to narrow subsets of pulmonary diseases, which might lead to challenges including overfitting, poor generalization, and a reduced possibility of using advanced machine learning techniques in real-world applications, since DL and ViT models require extensive data for effective learning. The primary contribution of this paper is not only a review of the most recent articles and surveys on respiratory diseases and DL models, including ViT, but also a novel, diverse dataset comprising 7867 X-ray images from 5263 patients across three local hospitals, covering 49 distinct pulmonary diseases. The dataset is expected to enhance DL and ViT model training and improve the generalization of those models in various real-world medical imaging scenarios. By addressing the data scarcity issue, this paper paves the way for more reliable and robust disease classification, improving clinical decision-making. Additionally, the article highlights critical challenges that still need to be addressed, such as dataset bias, variations in X-ray image quality, and the need for further clinical validation. Furthermore, the study underscores the critical role of DL in medical diagnosis and highlights the necessity of comprehensive, well-annotated datasets to improve model robustness and clinical reliability. Through these contributions, the paper provides a foundation for future research on respiratory disease diagnosis using AI-driven methodologies. Although the paper tries to cover all the work done between 2017 and 2024, this research might have some limitations: foundational work published before 2017 falls outside the review period, and the rapid development of AI might make the earlier methods less relevant.
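The four metrics named in the abstract can be illustrated with a minimal sketch for the binary case. The function name `binary_metrics` and the toy labels are assumptions made for illustration; they are not taken from the paper, which evaluates multi-class models on real X-ray datasets.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class (illustrative helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (hypothetical): 1 = disease present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

For a dataset with many disease classes, the same counts would typically be computed per class and then averaged.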

X-Ray Classification Chest Dataset Release In Silico None Academic Lab Open Dataset

UltraBones100k: A reliable automated labeling method and large-scale dataset for ultrasound-based bone surface extraction.

Wu L, Cavalcanti NA, Seibold M, Loggia G, Reissner L, Hein J, Beeler S, Viehöfer A, Wirth S, Calvet L, Fürnstahl P

•papers•Jun 4 2025

Ultrasound-based bone surface segmentation is crucial in computer-assisted orthopedic surgery. However, ultrasound images have limitations, including a low signal-to-noise ratio, acoustic shadowing, and speckle noise, which make interpretation difficult. Existing deep learning models for bone segmentation rely primarily on costly manual labeling by experts, limiting dataset size and model generalizability. Additionally, the complexity of ultrasound physics and acoustic shadow makes the images difficult for humans to interpret, leading to incomplete labels in low-intensity and anechoic regions and limiting model performance. To advance the state-of-the-art in ultrasound bone segmentation and establish effective model benchmarks, larger and higher-quality datasets are needed. We propose a methodology for collecting ex-vivo ultrasound datasets with automatically generated bone labels, including anechoic regions. The proposed labels are derived by accurately superimposing tracked bone Computed Tomography (CT) models onto the tracked ultrasound images. These initial labels are refined to account for ultrasound physics. To clinically evaluate the proposed method, an expert physician from our university hospital specialized in orthopedic sonography assessed the quality of the generated bone labels. A neural network for bone segmentation is trained on the collected dataset and its predictions are compared to expert manual labels, evaluating accuracy, completeness, and F1-score. We collected UltraBones100k, the largest known dataset comprising 100k ex-vivo ultrasound images of human lower limbs with bone annotations, specifically targeting the fibula, tibia, and foot bones. A Wilcoxon signed-rank test with Bonferroni correction confirmed that the bone alignment after our optimization pipeline significantly improved the quality of bone labeling (p<0.001). 
The model trained on UltraBones100k consistently outperforms manual labeling in all metrics, particularly in low-intensity regions (at a distance threshold of 0.5 mm: 320% improvement in completeness, 27.4% improvement in accuracy, and 197% improvement in F1-score). This work promises to facilitate research and clinical translation of ultrasound imaging in computer-assisted interventions, particularly for applications such as 2D bone segmentation, 3D bone surface reconstruction, and multi-modality bone registration.
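The abstract reports accuracy, completeness, and F1-score at a distance threshold without defining them. The sketch below assumes one common convention for surface comparison: accuracy is the fraction of predicted points lying within the threshold of some reference point, completeness is the fraction of reference points lying within the threshold of some predicted point, and F1 is their harmonic mean. These definitions, the function names, and the toy coordinates are assumptions, not the authors' stated method.

```python
import math


def _fraction_within(points, reference, threshold_mm):
    """Fraction of `points` whose nearest neighbour in `reference` is within the threshold."""
    if not points or not reference:
        return 0.0
    hits = sum(
        1 for p in points
        if min(math.dist(p, q) for q in reference) <= threshold_mm
    )
    return hits / len(points)


def surface_metrics(predicted, labelled, threshold_mm=0.5):
    """Distance-thresholded accuracy, completeness and F1 (assumed definitions)."""
    accuracy = _fraction_within(predicted, labelled, threshold_mm)
    completeness = _fraction_within(labelled, predicted, threshold_mm)
    total = accuracy + completeness
    f1 = 2 * accuracy * completeness / total if total else 0.0
    return accuracy, completeness, f1


# Toy 2D bone-surface points in millimetres (hypothetical values).
labelled = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
predicted = [(0.0, 0.2), (1.0, 0.4), (2.0, 0.9)]
accuracy, completeness, f1 = surface_metrics(predicted, labelled)
```

Under this convention, labels that miss low-intensity or anechoic regions lower completeness even when every labelled point is accurate.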

Ultrasound Segmentation Musculoskeletal Dataset Release In Silico None Academic Lab Open Dataset

Mexican dataset of digital mammograms (MEXBreast) with suspicious clusters of microcalcifications.

Lozoya RSL, Barragán KN, Domínguez HJO, Azuela JHS, Sánchez VGC, Villegas OOV

•papers•Jun 1 2025

Breast cancer is one of the most prevalent cancers affecting women worldwide. Early detection and treatment are crucial in significantly reducing mortality rates. Microcalcifications (MCs) are of particular importance among the various breast lesions. These tiny calcium deposits within breast tissue are present in approximately 30% of malignant tumors and can serve as critical indirect indicators of early-stage breast cancer. Three or more MCs within an area of 1 cm² are considered a Microcalcification Cluster (MCC) and assigned a BI-RADS category 4, indicating a suspicion of malignancy. Mammography is the most used technique for breast cancer detection. Approximately one in two mammograms showing MCCs is confirmed as cancerous through biopsy. MCCs are challenging to detect, even for experienced radiologists, underscoring the need for computer-aided detection tools such as Convolutional Neural Networks (CNNs). CNNs require large amounts of domain-specific data with consistent resolutions for effective training. However, most publicly available mammogram datasets either lack resolution information or are compiled from heterogeneous sources. Additionally, MCCs are often either unlabeled or sparsely represented in these datasets, limiting their utility for training CNNs. Here we present MEXBreast, an annotated Mexican digital mammogram database of MCCs, containing images at resolutions of 50, 70, and 100 microns. MEXBreast aims to support the training, validation, and testing of deep learning CNNs.
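The cluster rule stated above (three or more MCs within 1 cm²) depends on the pixel size, which is why resolution information matters. The following is a minimal sketch that assumes the 1 cm² area is an axis-aligned 1 cm × 1 cm square and that MC positions are given as pixel centroids. The function name `has_mcc`, the square-window assumption, and the coordinates are all hypothetical, not part of the MEXBreast annotation format.

```python
def has_mcc(mc_pixels, pixel_size_um, min_count=3, window_cm=1.0):
    """Return True if at least `min_count` MCs fit inside one square window.

    mc_pixels: list of (x, y) centroids in pixels.
    pixel_size_um: physical pixel size in microns (e.g. 50, 70 or 100).
    """
    # Convert pixel coordinates to centimetres (1 cm = 10,000 microns).
    pts = [(x * pixel_size_um / 10_000.0, y * pixel_size_um / 10_000.0)
           for x, y in mc_pixels]
    # Any window containing a group of points can be slid so that its left and
    # bottom edges touch points of that group, so only those positions are tried.
    for x0, _ in pts:
        for _, y0 in pts:
            inside = sum(
                1 for x, y in pts
                if x0 <= x <= x0 + window_cm and y0 <= y <= y0 + window_cm
            )
            if inside >= min_count:
                return True
    return False


# The same pixel coordinates (hypothetical) give different answers at different resolutions.
centroids = [(100, 100), (150, 180), (220, 140)]
cluster_at_50um = has_mcc(centroids, pixel_size_um=50)    # spans 0.6 cm x 0.4 cm
cluster_at_100um = has_mcc(centroids, pixel_size_um=100)  # spans 1.2 cm x 0.8 cm
```

The example shows that identical pixel coordinates qualify as a cluster at 50 microns per pixel but not at 100, so a dataset without resolution metadata cannot apply the rule reliably.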

Mammography Detection Breast Dataset Release In Silico None Academic Lab Open Dataset

MSLesSeg: baseline and benchmarking of a new Multiple Sclerosis Lesion Segmentation dataset.

Guarnera F, Rondinella A, Crispino E, Russo G, Di Lorenzo C, Maimone D, Pappalardo F, Battiato S

•papers•May 31 2025

This paper presents MSLesSeg, a new, publicly accessible MRI dataset designed to advance research in Multiple Sclerosis (MS) lesion segmentation. The dataset comprises 115 scans of 75 patients including T1, T2 and FLAIR sequences, along with supplementary clinical data collected across different sources. Expert-validated annotations provide high-quality lesion segmentation labels, establishing a reliable human-labeled dataset for benchmarking. Part of the dataset was shared with expert scientists with the aim of comparing the latest automatic AI-based image segmentation solutions with an expert-biased handmade segmentation. In addition, an AI-based lesion segmentation of MSLesSeg was developed and technically validated against the latest state-of-the-art methods. The dataset, the detailed analysis of researcher contributions, and the baseline results presented here mark a significant milestone for advancing automated MS lesion segmentation research.

MRI Segmentation Neurological Dataset Release In Silico None Academic Lab Open Dataset

HVAngleEst: A Dataset for End-to-end Automated Hallux Valgus Angle Measurement from X-Ray Images.

Wang Q, Ji D, Wang J, Liu L, Yang X, Zhang Y, Liang J, Liu P, Zhao H

•papers•May 30 2025

Accurate measurement of hallux valgus angle (HVA) and intermetatarsal angle (IMA) is essential for diagnosing hallux valgus and determining appropriate treatment strategies. Traditional manual measurement methods, while standardized, are time-consuming, labor-intensive, and subject to evaluator bias. Recent advancements in deep learning have been applied to hallux valgus angle estimation, but the development of effective algorithms requires large, well-annotated datasets. Existing X-ray datasets are typically limited to cropped foot-region images, and only one dataset containing very few samples is publicly available. To address these challenges, we introduce HVAngleEst, the first large-scale, open-access dataset specifically designed for hallux valgus angle estimation. HVAngleEst comprises 1,382 X-ray images from 1,150 patients and includes comprehensive annotations, such as foot localization, hallux valgus angles, and line segments for each phalanx. This dataset enables fully automated, end-to-end hallux valgus angle estimation, reducing manual labor and eliminating evaluator bias.
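Once bone axes are available as line segments, an angle such as the HVA (between the long axes of the first metatarsal and the proximal phalanx of the hallux) reduces to the angle between two lines. The sketch below assumes each axis is given as a pair of 2D endpoints in image coordinates. The function name and the sample endpoints are hypothetical and do not reflect the HVAngleEst annotation schema.

```python
import math


def axis_angle_deg(segment_a, segment_b):
    """Acute-or-right angle in degrees between two undirected line segments."""
    (ax1, ay1), (ax2, ay2) = segment_a
    (bx1, by1), (bx2, by2) = segment_b
    ux, uy = ax2 - ax1, ay2 - ay1
    vx, vy = bx2 - bx1, by2 - by1
    if (ux == 0 and uy == 0) or (vx == 0 and vy == 0):
        raise ValueError("segments must have non-zero length")
    cross = ux * vy - uy * vx
    dot = ux * vx + uy * vy
    angle = math.degrees(math.atan2(abs(cross), dot))  # in [0, 180]
    # Bone axes have no inherent direction, so fold the result into [0, 90].
    return min(angle, 180.0 - angle)


# Hypothetical endpoints: a vertical metatarsal axis and a phalanx axis tilted by 30 degrees.
first_metatarsal = ((0.0, 0.0), (0.0, 1.0))
proximal_phalanx = ((0.0, 1.0), (1.0, 1.0 + math.sqrt(3.0)))
hva = axis_angle_deg(first_metatarsal, proximal_phalanx)
```

The same function would apply to the IMA by passing the axes of the first and second metatarsals instead.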

X-Ray Detection Musculoskeletal Dataset Release In Silico Academic Lab Open Dataset

A vessel bifurcation landmark pair dataset for abdominal CT deformable image registration (DIR) validation.

Criscuolo ER, Zhang Z, Hao Y, Yang D

•papers•May 28 2025

Deformable image registration (DIR) is an enabling technology in many diagnostic and therapeutic tasks. Despite this, DIR algorithms have limited clinical use, largely due to a lack of benchmark datasets for quality assurance during development. DIRs of intra-patient abdominal CTs are among the most challenging registration scenarios due to significant organ deformations and inconsistent image content. To support future algorithm development, here we introduce our first-of-its-kind abdominal CT DIR benchmark dataset, comprising large numbers of highly accurate landmark pairs on matching blood vessel bifurcations. Abdominal CT image pairs of 30 patients were acquired from several publicly available repositories as well as the authors' institution with IRB approval. The two CTs of each pair were originally acquired for the same patient but on different days. An image processing workflow was developed and applied to each CT image pair: (1) Abdominal organs were segmented with a deep learning model, and image intensity within organ masks was overwritten. (2) Matching image patches were manually identified between two CTs of each image pair. (3) Vessel bifurcation landmarks were labeled on one image of each image patch pair. (4) Image patches were deformably registered, and landmarks were projected onto the second image. (5) Landmark pair locations were refined manually or with an automated process. This workflow resulted in 1895 total landmark pairs, or 63 per case on average. Estimates of the landmark pair accuracy using digital phantoms were 0.7 mm ± 1.2 mm. The data are published in Zenodo at https://doi.org/10.5281/zenodo.14362785. Instructions for use can be found at https://github.com/deshanyang/Abdominal-DIR-QA. This dataset is a first-of-its-kind for abdominal DIR validation. The number, accuracy, and distribution of landmark pairs will allow for robust validation of DIR algorithms with precision beyond what is currently available.
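Landmark pairs like these are typically used to score a registration by measuring how far each warped landmark lands from its partner. The sketch below assumes landmarks are given in physical coordinates (mm) and that the DIR result is available as a callable `warp` mapping a point in the first CT to the second. Both the callable interface and the sample coordinates are assumptions for illustration, not the format documented in the dataset's repository.

```python
import math
import statistics


def landmark_errors_mm(landmarks_a, landmarks_b, warp):
    """Euclidean distance between each warped landmark and its paired landmark."""
    if len(landmarks_a) != len(landmarks_b):
        raise ValueError("landmark lists must be paired one-to-one")
    return [math.dist(warp(a), b) for a, b in zip(landmarks_a, landmarks_b)]


def summarize(errors):
    """Mean and population standard deviation of the per-landmark errors."""
    return statistics.mean(errors), statistics.pstdev(errors)


# Hypothetical bifurcation landmarks (x, y, z) in mm on the two CTs of one pair.
landmarks_ct1 = [(10.0, 20.0, 30.0), (40.0, 50.0, 60.0)]
landmarks_ct2 = [(11.0, 20.0, 30.0), (40.0, 52.0, 60.0)]

# An identity warp stands in for "no registration"; a real DIR result would replace it.
errors = landmark_errors_mm(landmarks_ct1, landmarks_ct2, warp=lambda p: p)
mean_error, std_error = summarize(errors)
```

Comparing the summary before and after registration gives a simple accuracy measure, bounded below by the accuracy of the landmark pairs themselves.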

CT Registration Abdominal Dataset Release In Silico Academic Lab Open Dataset

Large Scale MRI Collection and Segmentation of Cirrhotic Liver.

Jha D, Susladkar OK, Gorade V, Keles E, Antalek M, Seyithanoglu D, Cebeci T, Aktas HE, Kartal GD, Kaymakoglu S, Erturk SM, Velichko Y, Ladner DP, Borhani AA, Medetalibeyoglu A, Durak G, Bagci U

•papers•May 28 2025

Liver cirrhosis represents the end stage of chronic liver disease, characterized by extensive fibrosis and nodular regeneration that significantly increases mortality risk. While magnetic resonance imaging (MRI) offers a non-invasive assessment, accurately segmenting cirrhotic livers presents substantial challenges due to morphological alterations and heterogeneous signal characteristics. Deep learning approaches show promise for automating these tasks, but progress has been limited by the absence of large-scale, annotated datasets. Here, we present CirrMRI600+, the first comprehensive dataset comprising 628 high-resolution abdominal MRI scans (310 T1-weighted and 318 T2-weighted sequences, totaling nearly 40,000 annotated slices) with expert-validated segmentation labels for cirrhotic livers. The dataset includes demographic information, clinical parameters, and histopathological validation where available. Additionally, we provide benchmark results from 11 state-of-the-art deep learning experiments to establish performance standards. CirrMRI600+ enables the development and validation of advanced computational methods for cirrhotic liver analysis, potentially accelerating progress toward automated visual staging of cirrhosis and personalized treatment planning.

MRI Segmentation Abdominal Dataset Release In Silico None Academic Lab Open Dataset Benchmark SOTA

A dataset for quality evaluation of pelvic X-ray and diagnosis of developmental dysplasia of the hip.

Qi G, Jiao X, Li J, Qin C, Li X, Sun Z, Zhao Y, Jiang R, Zhu Z, Zhao G, Yu G

•papers•May 26 2025

Developmental Dysplasia of the Hip (DDH) stands as one of the preeminent hip disorders prevalent in pediatric orthopedics. Automated diagnostic instruments, driven by artificial intelligence methodologies, are capable of providing substantial assistance to clinicians in the diagnosis of DDH. We have developed a dataset designated as Multitasking DDH (MTDDH), which is composed of two sub-datasets. Dataset 1 encompasses 1,250 pelvic X-ray images, with annotations demarcating four discrete regions for the evaluation of pelvic X-ray quality, in tandem with eight pivotal points serving as support for DDH diagnosis. Dataset 2 contains 906 pelvic X-ray images, and each image has been annotated with eight key points for assisting in the diagnosis of DDH. Notably, MTDDH represents the pioneering dataset engineered for the comprehensive evaluation of pelvic X-ray quality while concurrently offering the most exhaustive set of eight key points to bolster DDH diagnosis, thus fulfilling the exigency for enhanced diagnostic precision. Ultimately, we presented the elaborate process of constructing the MTDDH and furnished a concise introduction regarding its application.

X-Ray Detection Musculoskeletal Dataset Release In Silico None Academic Lab Open Dataset

COVID-19CT+: A public dataset of CT images for COVID-19 retrospective analysis.

Sun Y, Du T, Wang B, Rahaman MM, Wang X, Huang X, Jiang T, Grzegorzek M, Sun H, Xu J, Li C

•papers•May 23 2025

Background and objective: COVID-19 is considered the biggest global health disaster of the 21st century, and it has had a huge impact on the world. Methods: This paper publishes a publicly available dataset of CT images of multiple types of pneumonia (COVID-19CT+). Specifically, the dataset contains 409,619 CT images of 1333 patients, with Subset-A containing 312 community-acquired pneumonia cases and Subset-B containing 1021 COVID-19 cases. To demonstrate that classification methods from different periods perform differently on COVID-19CT+ images, we selected 13 classical machine learning classifiers and 5 deep learning classifiers to test the image classification task. Results: Two sets of experiments are conducted using traditional machine learning and deep learning methods. The first set is the classification of COVID-19 in Subset-B versus COVID-19 white lung disease, and the second set is the classification of community-acquired pneumonia in Subset-A versus COVID-19 in Subset-B; together they demonstrate that methods from different periods perform differently on COVID-19CT+. In the first set of experiments, the accuracy of traditional machine learning reaches a maximum of 97.3% and a minimum of only 62.6%, while deep learning algorithms reach a maximum of 97.9% and a minimum of 85.7%. In the second set of experiments, traditional machine learning reaches a high of 94.6% accuracy and a low of 56.8%, while deep learning algorithms reach a high of 91.9% and a low of 86.3%. Conclusions: COVID-19CT+ covers a large number of CT images of patients with COVID-19 and community-acquired pneumonia and is one of the largest datasets available. We expect that this dataset will attract more researchers to explore new automated diagnostic algorithms and thereby improve the diagnostic accuracy and efficiency of COVID-19.

CT Classification Chest Dataset Release In Silico Academic Lab Open Dataset

An Annotated Multi-Site and Multi-Contrast Magnetic Resonance Imaging Dataset for the study of the Human Tongue Musculature.

Ribeiro FL, Zhu X, Ye X, Tu S, Ngo ST, Henderson RD, Steyn FJ, Kiernan MC, Barth M, Bollmann S, Shaw TB

•papers•May 14 2025

This dataset provides the first annotated, openly available MRI-based imaging dataset for investigations of tongue musculature, including multi-contrast and multi-site MRI data from non-disease participants. The present dataset includes 47 participants collated from three studies: BeLong (four participants; T2-weighted images), EATT4MND (19 participants; T2-weighted images), and BMC (24 participants; T1-weighted images). We provide manually corrected segmentations of five key tongue muscles: the superior longitudinal, combined transverse/vertical, genioglossus, and inferior longitudinal muscles. Other phenotypic measures, including age, sex, weight, height, and tongue muscle volume, are also available for use. This dataset will benefit researchers across domains interested in the structure and function of the tongue in health and disease. For instance, researchers can use these data to train new machine learning models for tongue segmentation, which can be leveraged for segmentation and tracking of the different tongue muscles engaged in speech formation in health and disease. Altogether, this dataset gives the scientific community the means to investigate the intricate tongue musculature and its role in physiological processes and speech production.

MRI Segmentation Other Dataset Release In Silico Academic Lab Open Dataset