Latest Papers on Radiology AI. Tags: Open Dataset

PanoDiff-SR: Synthesizing Dental Panoramic Radiographs using Diffusion and Super-resolution

Sanyam Jain, Bruna Neves de Freitas, Andreas Basse-OConnor, Alexandros Iosifidis, Ruben Pauwels

•preprint•Jul 12 2025

There has been increasing interest in the generation of high-quality, realistic synthetic medical images in recent years. Such synthetic datasets can mitigate the scarcity of public datasets for artificial intelligence research, and can also be used for educational purposes. In this paper, we propose a combination of diffusion-based generation (PanoDiff) and Super-Resolution (SR) for generating synthetic dental panoramic radiographs (PRs). The former generates a low-resolution (LR) seed of a PR (256 X 128) which is then processed by the SR model to yield a high-resolution (HR) PR of size 1024 X 512. For SR, we propose a state-of-the-art transformer that learns local-global relationships, resulting in sharper edges and textures. Experimental results demonstrate a Frechet inception distance score of 40.69 between 7243 real and synthetic images (in HR). Inception scores were 2.55, 2.30, 2.90 and 2.98 for real HR, synthetic HR, real LR and synthetic LR images, respectively. Among a diverse group of six clinical experts, all evaluating a mixture of 100 synthetic and 100 real PRs in a time-limited observation, the average accuracy in distinguishing real from synthetic images was 68.5% (with 50% corresponding to random guessing).

X-Ray Image Synthesis Methodology In Silico Open Dataset

The REgistry of Flow and Perfusion Imaging for Artificial INtelligEnce with PET (REFINE PET): Rationale and Design.

Ramirez G, Lemley M, Shanbhag A, Kwiecinski J, Miller RJH, Kavanagh PB, Liang JX, Dey D, Slipczuk L, Travin MI, Alexanderson E, Carvajal-Juarez I, Packard RRS, Al-Mallah M, Einstein AJ, Feher A, Acampa W, Knight S, Le VT, Mason S, Sanghani R, Wopperer S, Chareonthaitawee P, Buechel RR, Rosamond TL, deKemp RA, Berman DS, Di Carli MF, Slomka PJ

•papers•Jul 11 2025

The REgistry of Flow and Perfusion Imaging for Artificial INtelligEnce with PET (REFINE PET) was established to aggregate PET and associated computed tomography (CT) images with clinical data from hospitals around the world into one comprehensive research resource. REFINE PET is a multicenter, international registry that contains both clinical and imaging data. The PET scans were processed using QPET software (Cedars-Sinai Medical Center, Los Angeles, CA), while the CT scans were processed using deep learning (DL) to detect coronary artery calcium (CAC). Patients were followed up for the occurrence of major adverse cardiovascular events (MACE), which include death, myocardial infarction, unstable angina, and late revascularization (>90 days from PET). The REFINE PET registry currently contains data for 35,588 patients from 14 sites, with additional patient data and sites anticipated. Comprehensive clinical data (including demographics, medical history, and stress test results) were integrated with more than 2200 imaging variables across 42 categories. The registry is poised to address a broad range of clinical questions, supported by correlating invasive angiography (within 6 months of MPI) in 5972 patients and a total of 9252 major adverse cardiovascular events during a median follow-up of 4.2 years. The REFINE PET registry leverages the integration of clinical, multimodality imaging, and novel quantitative and AI tools to advance the role of PET/CT MPI in diagnosis and risk stratification.

Mixed Modality Detection Cardiac Dataset Release In Silico Consortium Open Dataset

Cross-Domain Identity Representation for Skull to Face Matching with Benchmark DataSet

Ravi Shankar Prasad, Dinesh Singh

•preprint•Jul 11 2025

Craniofacial reconstruction in forensic science is crucial for the identification of the victims of crimes and disasters. The objective is to map a given skull to its corresponding face in a corpus of faces with known identities using recent advancements in computer vision, such as deep learning. In this paper, we presented a framework for the identification of a person given the X-ray image of a skull using convolutional Siamese networks for cross-domain identity representation. Siamese networks are twin networks that share the same architecture and can be trained to discover a feature space where nearby observations that are similar are grouped and dissimilar observations are moved apart. To do this, the network is exposed to two sets of comparable and different data. The Euclidean distance is then minimized between similar pairs and maximized between dissimilar ones. Since getting pairs of skull and face images are difficult, we prepared our own dataset of 40 volunteers whose front and side skull X-ray images and optical face images were collected. Experiments were conducted on the collected cross-domain dataset to train and validate the Siamese networks. The experimental results provide satisfactory results on the identification of a person from the given skull.

X-Ray Classification Neurological Methodology In Silico Academic Lab Open Dataset

Understanding Dataset Bias in Medical Imaging: A Case Study on Chest X-rays

Ethan Dack, Chengliang Dai

•preprint•Jul 10 2025

Recent works have revisited the infamous task ``Name That Dataset'', demonstrating that non-medical datasets contain underlying biases and that the dataset origin task can be solved with high accuracy. In this work, we revisit the same task applied to popular open-source chest X-ray datasets. Medical images are naturally more difficult to release for open-source due to their sensitive nature, which has led to certain open-source datasets being extremely popular for research purposes. By performing the same task, we wish to explore whether dataset bias also exists in these datasets. To extend our work, we apply simple transformations to the datasets, repeat the same task, and perform an analysis to identify and explain any detected biases. Given the importance of AI applications in medical imaging, it's vital to establish whether modern methods are taking shortcuts or are focused on the relevant pathology. We implement a range of different network architectures on the datasets: NIH, CheXpert, MIMIC-CXR and PadChest. We hope this work will encourage more explainable research being performed in medical imaging and the creation of more open-source datasets in the medical domain. Our code can be found here: https://github.com/eedack01/x_ray_ds_bias.

X-Ray Classification Chest Methodology In Silico Academic Lab Open Code Open Dataset Reproducibility

PediMS: A Pediatric Multiple Sclerosis Lesion Segmentation Dataset.

Popa M, Vișa GA, Șofariu CR

•papers•Jul 10 2025

Multiple Sclerosis (MS) is a chronic autoimmune disease that primarily affects the central nervous system and is predominantly diagnosed in adults, making pediatric cases rare and underrepresented in medical research. This paper introduces the first publicly available MRI dataset specifically dedicated to pediatric multiple sclerosis lesion segmentation. The dataset comprises longitudinal MRI scans from 9 pediatric patients, each with between one and six timepoints, with a total of 28 MRI scans. It includes T1-weighted (MPRAGE), T2-weighted, and FLAIR sequences. Additionally, it provides clinical data and initial symptoms for each patient, offering valuable insights into disease progression. Lesion segmentation was performed by senior experts, ensuring high-quality annotations. To demonstrate the dataset's reliability and utility, we evaluated two deep learning models, achieving competitive segmentation performance. This dataset aims to advance research in pediatric MS, improve lesion segmentation models, and contribute to federated learning approaches.

MRI Segmentation Neurological Dataset Release In Silico Academic Lab Open Dataset

Understanding Dataset Bias in Medical Imaging: A Case Study on Chest X-rays

Ethan Dack, Chengliang Dai

•preprint•Jul 10 2025

X-Ray Classification Chest Methodology In Silico Academic Lab Open Code Open Dataset Reproducibility

Understanding Dataset Bias in Medical Imaging: A Case Study on Chest X-rays

Ethan Dack, Chengliang Dai

•preprint•Jul 10 2025

Recent work has revisited the infamous task Name that dataset and established that in non-medical datasets, there is an underlying bias and achieved high Accuracies on the dataset origin task. In this work, we revisit the same task applied to popular open-source chest X-ray datasets. Medical images are naturally more difficult to release for open-source due to their sensitive nature, which has led to certain open-source datasets being extremely popular for research purposes. By performing the same task, we wish to explore whether dataset bias also exists in these datasets. % We deliberately try to increase the difficulty of the task by dataset transformations. We apply simple transformations of the datasets to try to identify bias. Given the importance of AI applications in medical imaging, it's vital to establish whether modern methods are taking shortcuts or are focused on the relevant pathology. We implement a range of different network architectures on the datasets: NIH, CheXpert, MIMIC-CXR and PadChest. We hope this work will encourage more explainable research being performed in medical imaging and the creation of more open-source datasets in the medical domain. The corresponding code will be released upon acceptance.

X-Ray Classification Chest Methodology In Silico Open Dataset Open Code Reproducibility

Cross-Modality Masked Learning for Survival Prediction in ICI Treated NSCLC Patients

Qilong Xing, Zikai Song, Bingxin Gong, Lian Yang, Junqing Yu, Wei Yang

•preprint•Jul 9 2025

Accurate prognosis of non-small cell lung cancer (NSCLC) patients undergoing immunotherapy is essential for personalized treatment planning, enabling informed patient decisions, and improving both treatment outcomes and quality of life. However, the lack of large, relevant datasets and effective multi-modal feature fusion strategies pose significant challenges in this domain. To address these challenges, we present a large-scale dataset and introduce a novel framework for multi-modal feature fusion aimed at enhancing the accuracy of survival prediction. The dataset comprises 3D CT images and corresponding clinical records from NSCLC patients treated with immune checkpoint inhibitors (ICI), along with progression-free survival (PFS) and overall survival (OS) data. We further propose a cross-modality masked learning approach for medical feature fusion, consisting of two distinct branches, each tailored to its respective modality: a Slice-Depth Transformer for extracting 3D features from CT images and a graph-based Transformer for learning node features and relationships among clinical variables in tabular data. The fusion process is guided by a masked modality learning strategy, wherein the model utilizes the intact modality to reconstruct missing components. This mechanism improves the integration of modality-specific features, fostering more effective inter-modality relationships and feature interactions. Our approach demonstrates superior performance in multi-modal integration for NSCLC survival prediction, surpassing existing methods and setting a new benchmark for prognostic models in this context.

CT Classification Chest Methodology In Silico Academic Lab Benchmark SOTA Open Dataset

Dataset and Benchmark for Enhancing Critical Retained Foreign Object Detection

Yuli Wang, Victoria R. Shi, Liwei Zhou, Richard Chin, Yuwei Dai, Yuanyun Hu, Cheng-Yi Li, Haoyue Guan, Jiashu Cheng, Yu Sun, Cheng Ting Lin, Ihab Kamel, Premal Trivedi, Pamela Johnson, John Eng, Harrison Bai

•preprint•Jul 9 2025

Critical retained foreign objects (RFOs), including surgical instruments like sponges and needles, pose serious patient safety risks and carry significant financial and legal implications for healthcare institutions. Detecting critical RFOs using artificial intelligence remains challenging due to their rarity and the limited availability of chest X-ray datasets that specifically feature critical RFOs cases. Existing datasets only contain non-critical RFOs, like necklace or zipper, further limiting their utility for developing clinically impactful detection algorithms. To address these limitations, we introduce "Hopkins RFOs Bench", the first and largest dataset of its kind, containing 144 chest X-ray images of critical RFO cases collected over 18 years from the Johns Hopkins Health System. Using this dataset, we benchmark several state-of-the-art object detection models, highlighting the need for enhanced detection methodologies for critical RFO cases. Recognizing data scarcity challenges, we further explore image synthetic methods to bridge this gap. We evaluate two advanced synthetic image methods, DeepDRR-RFO, a physics-based method, and RoentGen-RFO, a diffusion-based method, for creating realistic radiographs featuring critical RFOs. Our comprehensive analysis identifies the strengths and limitations of each synthetic method, providing insights into effectively utilizing synthetic data to enhance model training. The Hopkins RFOs Bench and our findings significantly advance the development of reliable, generalizable AI-driven solutions for detecting critical RFOs in clinical chest X-rays.

X-Ray Detection Chest Dataset Release In Silico Academic Lab Benchmark SOTA Open Dataset

MMDental - A multimodal dataset of tooth CBCT images with expert medical records.

Wang C, Zhang Y, Wu C, Liu J, Wu L, Wang Y, Huang X, Feng X, Wang Y

•papers•Jul 9 2025

In the rapidly evolving field of dental intelligent healthcare, where Artificial Intelligence (AI) plays a pivotal role, the demand for multimodal datasets is critical. Existing public datasets are primarily composed of single-modal data, predominantly dental radiographs or scans, which limits the development of AI-driven applications for intelligent dental treatment. In this paper, we collect a MultiModal Dental (MMDental) dataset to address this gap. MMDental comprises data from 660 patients, including 3D Cone-beam Computed Tomography (CBCT) images and corresponding detailed expert medical records with initial diagnoses and follow-up documentation. All CBCT scans are conducted under the guidance of professional physicians, and all patient records are reviewed by senior doctors. To the best of our knowledge, this is the first and largest dataset containing 3D CBCT images of teeth with corresponding medical records. Furthermore, we provide a comprehensive analysis of the dataset by exploring patient demographics, prevalence of various dental conditions, and the disease distribution across age groups. We believe this work will be beneficial for further advancements in dental intelligent treatment.

CT Dataset Release Concept Open Dataset

Filter Papers

Tags

PanoDiff-SR: Synthesizing Dental Panoramic Radiographs using Diffusion and Super-resolution

The REgistry of Flow and Perfusion Imaging for Artificial INtelligEnce with PET (REFINE PET): Rationale and Design.

Cross-Domain Identity Representation for Skull to Face Matching with Benchmark DataSet

Understanding Dataset Bias in Medical Imaging: A Case Study on Chest X-rays

PediMS: A Pediatric Multiple Sclerosis Lesion Segmentation Dataset.

Understanding Dataset Bias in Medical Imaging: A Case Study on Chest X-rays

Understanding Dataset Bias in Medical Imaging: A Case Study on Chest X-rays

Cross-Modality Masked Learning for Survival Prediction in ICI Treated NSCLC Patients

Dataset and Benchmark for Enhancing Critical Retained Foreign Object Detection

MMDental - A multimodal dataset of tooth CBCT images with expert medical records.

Ready to Sharpen Your Edge?