Page 184 of 3593587 results

Incremental diagnostic value of AI-derived coronary artery calcium in 18F-flurpiridaz PET Myocardial Perfusion Imaging

Barrett, O., Shanbhag, A., Zaid, R., Miller, R. J., Lemley, M., Builoff, V., Liang, J., Kavanagh, P., Buckley, C., Dey, D., Berman, D. S., Slomka, P.

medrxiv logopreprintJul 11 2025
Background: Positron Emission Tomography (PET) myocardial perfusion imaging (MPI) is a powerful tool for predicting coronary artery disease (CAD). Coronary artery calcium (CAC) provides incremental risk stratification to PET-MPI and enhances diagnostic accuracy. We assessed the additive value of the CAC score, derived from PET/CT attenuation maps, over stress TPD results using the novel 18F-flurpiridaz tracer in detecting significant CAD. Methods and Results: Patients from the 18F-flurpiridaz phase III clinical trial who underwent PET/CT MPI with the 18F-flurpiridaz tracer, had available CT attenuation correction (CTAC) scans for CAC scoring, and underwent invasive coronary angiography (ICA) within a 6-month period between 2011 and 2013 were included. Total perfusion deficit (TPD) was quantified automatically, and CAC scores from CTAC scans were assessed using artificial intelligence (AI)-derived segmentation and manual scoring. Obstructive CAD was defined as ≥50% stenosis in the Left Main (LM) artery, or ≥70% stenosis in any of the other major epicardial vessels. Prediction performance for CAD was assessed by comparing the area under the receiver operating characteristic curve (AUC) for stress TPD alone and in combination with CAC score. Among 498 patients (72% males, median age 63 years), 30.1% had CAD. Incorporating CAC score resulted in a greater AUC: manual scoring (AUC=0.87, 95% Confidence Interval [CI] 0.34-0.90; p=0.015) and AI-based scoring (AUC=0.88, 95% CI 0.85-0.90; p=0.002) compared to stress TPD alone (AUC=0.84, 95% CI 0.80-0.92). Conclusions: Combining automatically derived TPD and CAC score enhances 18F-flurpiridaz PET MPI accuracy in detecting significant CAD, offering a method that can be used routinely with PET/CT scanners without additional scanning or technologist time.
Condensed Abstract. Background: We assessed the added value of the CAC score from hybrid PET/CT CTAC scans, combined with stress TPD, for detecting significant CAD using the novel 18F-flurpiridaz tracer. Methods and Results: Patients from the 18F-flurpiridaz phase III clinical trial (n=498, 72% male, median age 63) who underwent PET/CT MPI and ICA within 6 months were included. TPD was quantified automatically, and CAC scores were assessed by AI and manual methods. Adding CAC score to TPD improved AUC for manual (0.87) and AI-based (0.88) scoring versus TPD alone (0.84). Conclusions: Combining TPD and CAC score enhances 18F-flurpiridaz PET MPI accuracy for CAD detection. Graphical Abstract: Overview of the study design.
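The study's central comparison — AUC for stress TPD alone versus TPD combined with the CAC score — can be sketched on synthetic data. This is a minimal illustration, not the study's model or data; combining the two markers with a logistic regression is one reasonable choice, and all numbers below are simulated:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic cohort: two continuous markers (stress TPD, log CAC score)
# that are both higher, on average, in patients with obstructive CAD.
n = 500
cad = rng.binomial(1, 0.3, size=n)              # ~30% prevalence, as in the study
tpd = rng.normal(3 + 4 * cad, 2.0, size=n)      # stress TPD (%)
log_cac = rng.normal(2 + 2 * cad, 1.5, size=n)  # log(1 + CAC score)

# AUC of the single marker vs. a logistic combination of both markers.
X = np.column_stack([tpd, log_cac])
auc_tpd = roc_auc_score(cad, tpd)
combo = LogisticRegression(max_iter=1000).fit(X, cad)
auc_combined = roc_auc_score(cad, combo.predict_proba(X)[:, 1])
print(f"TPD alone: AUC={auc_tpd:.2f}, TPD+CAC: AUC={auc_combined:.2f}")
```

In practice the AUCs would be compared on held-out data with a significance test (the paper reports p-values for the difference), omitted here for brevity.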

Data Extraction and Curation from Radiology Reports for Pancreatic Cyst Surveillance Using Large Language Models.

Choubey AP, Eguia E, Hollingsworth A, Chatterjee S, D'Angelica MI, Jarnagin WR, Wei AC, Schattner MA, Do RKG, Soares KC

pubmed logopapersJul 10 2025
Manual curation of radiographic features in pancreatic cyst registries for data abstraction and longitudinal evaluation is time-consuming and limits widespread implementation. We examined the feasibility and accuracy of using large language models (LLMs) to extract clinical variables from radiology reports. A single-center retrospective study included patients under surveillance for pancreatic cysts. Nine radiographic elements used to monitor cyst progression were included: cyst size, main pancreatic duct (MPD) size (continuous variables), number of lesions, MPD dilation ≥5 mm (categorical), branch duct dilation, presence of solid component, calcific lesion, pancreatic atrophy, and pancreatitis. LLMs on the OpenAI GPT-4 platform were employed to extract the elements of interest with a zero-shot learning approach, using prompting to facilitate annotation without any training data. A manually annotated institutional cyst database was used as the ground truth (GT) for comparison. Overall, 3198 longitudinal scans from 991 patients were included. GPT successfully extracted the selected radiographic elements with high accuracy. Among categorical variables, accuracy ranged from 97% for solid component to 99% for calcific lesions. In the continuous variables, accuracy varied from 92% for cyst size to 97% for MPD size. However, Cohen's kappa was higher for cyst size (0.92) compared to MPD size (0.82). The lowest accuracy (81%) was noted in the multi-class variable for number of cysts. LLMs can accurately extract and curate data from radiology reports for pancreatic cyst surveillance and can be reliably used to assemble longitudinal databases. Future application of this work may potentiate the development of artificial intelligence-based surveillance models.
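A minimal sketch of the zero-shot extraction approach described above. The field names and prompt wording are hypothetical (the paper's actual prompt is not given), and the API call itself is only indicated in a comment:

```python
import json

# Hypothetical field names mirroring the nine radiographic elements.
ELEMENTS = [
    "cyst_size_mm", "mpd_size_mm", "number_of_lesions", "mpd_dilation_ge_5mm",
    "branch_duct_dilation", "solid_component", "calcific_lesion",
    "pancreatic_atrophy", "pancreatitis",
]

def build_prompt(report_text: str) -> str:
    """Zero-shot prompt asking the model to return a single JSON object."""
    return (
        "Extract the following fields from the radiology report below. "
        "Return ONLY a JSON object with these keys: "
        + ", ".join(ELEMENTS)
        + ". Use null when a field is not mentioned.\n\nReport:\n"
        + report_text
    )

def parse_response(raw: str) -> dict:
    """Parse the model's JSON reply, keeping only the expected keys."""
    data = json.loads(raw)
    return {k: data.get(k) for k in ELEMENTS}

# The actual call would go through the OpenAI client, e.g.:
#   client.chat.completions.create(model="gpt-4",
#       messages=[{"role": "user", "content": build_prompt(report)}])
# Here we parse a hand-written stand-in reply instead.
reply = '{"cyst_size_mm": 22, "mpd_size_mm": 3, "solid_component": false}'
print(parse_response(reply)["cyst_size_mm"])
```

Restricting the parsed output to a fixed key set keeps downstream registry columns stable even when the model returns extra or missing fields.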

An Enhanced Privacy-preserving Federated Few-shot Learning Framework for Respiratory Disease Diagnosis

Ming Wang, Zhaoyang Duan, Dong Xue, Fangzhou Liu, Zhongheng Zhang

arxiv logopreprintJul 10 2025
The labor-intensive nature of medical data annotation presents a significant challenge for respiratory disease diagnosis, resulting in a scarcity of high-quality labeled datasets in resource-constrained settings. Moreover, patient privacy concerns complicate the direct sharing of local medical data across institutions, and existing centralized data-driven approaches, which rely on large amounts of available data, often compromise data privacy. This study proposes a federated few-shot learning framework with privacy-preserving mechanisms to address the issues of limited labeled data and privacy protection in diagnosing respiratory diseases. In particular, a meta-stochastic gradient descent algorithm is proposed to mitigate the overfitting problem that arises from insufficient data when employing traditional gradient descent methods for neural network training. Furthermore, to ensure data privacy against gradient leakage, differential privacy noise drawn from a Gaussian distribution is added to the gradients during the training of private models with local data, thereby preventing the reconstruction of medical images. Given the impracticality of centralizing respiratory disease data dispersed across various medical institutions, a weighted average algorithm is employed to aggregate local diagnostic models from different clients, enhancing the adaptability of the model across diverse scenarios. Experimental results show that the proposed method yields compelling results with differential privacy in place, while effectively diagnosing respiratory diseases using data with different structures, categories, and distributions.
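The two privacy mechanisms described — Gaussian noise added to gradients, and weighted averaging of client updates — can be sketched as follows. This is a simplified illustration with synthetic gradients, not the paper's implementation; `clip_norm` and `noise_std` are illustrative parameters (gradient clipping is the standard companion to Gaussian noise in differentially private SGD, assumed here):

```python
import numpy as np

def dp_noisy_gradient(grad, clip_norm=1.0, noise_std=0.5, rng=None):
    """Clip a gradient to a bounded L2 norm, then add Gaussian noise.

    Clipping bounds each update's contribution; the added Gaussian noise
    masks individual gradients so that raw training images cannot be
    reconstructed from the shared updates.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / max(norm, 1e-12))
    return clipped + rng.normal(0.0, noise_std, size=grad.shape)

def fedavg(client_grads, weights):
    """Weighted average of client updates (FedAvg-style aggregation)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * g for w, g in zip(weights, client_grads))

rng = np.random.default_rng(42)
grads = [rng.normal(size=8) for _ in range(3)]   # three clients' local gradients
noisy = [dp_noisy_gradient(g, rng=rng) for g in grads]
update = fedavg(noisy, weights=[100, 50, 25])    # e.g. weight by local dataset size
```

The server only ever sees the noisy, clipped updates; the privacy/utility trade-off is then governed by `noise_std` relative to `clip_norm`.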

Recurrence prediction of invasive ductal carcinoma from preoperative contrast-enhanced computed tomography using deep convolutional neural network.

Umezu M, Kondo Y, Ichikawa S, Sasaki Y, Kaneko K, Ozaki T, Koizumi N, Seki H

pubmed logopapersJul 10 2025
Predicting the risk of breast cancer recurrence is crucial for guiding therapeutic strategies, including enhanced surveillance and the consideration of additional treatment after surgery. In this study, we developed a deep convolutional neural network (DCNN) model to predict recurrence within six years after surgery using preoperative contrast-enhanced computed tomography (CECT) images, which are widely available and effective for detecting distant metastases. This retrospective study included preoperative CECT images from 133 patients with invasive ductal carcinoma. The images were classified into recurrence and no-recurrence groups using ResNet-101 and DenseNet-201. Classification performance was evaluated using the area under the receiver operating characteristic curve (AUC) with leave-one-patient-out cross-validation. At the optimal threshold, the classification accuracies for ResNet-101 and DenseNet-201 were 0.73 and 0.72, respectively. The median (interquartile range) AUC of DenseNet-201 (0.70 [0.69-0.72]) was significantly higher than that of ResNet-101 (0.68 [0.66-0.68]) (p < 0.05). These results suggest the potential of preoperative CECT-based DCNN models to predict breast cancer recurrence without the need for additional invasive procedures.
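The leave-one-patient-out evaluation protocol can be illustrated with scikit-learn's `LeaveOneGroupOut`, here on synthetic features standing in for the DCNN inputs (the network itself is omitted). Grouping by patient ID ensures no patient's images appear in both the training and test folds of the same split:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(1)

# Synthetic stand-in: one feature vector per image, several images per patient,
# with a shared patient-level label (recurrence vs. no recurrence).
n_patients, imgs_per_patient, n_feat = 30, 4, 16
groups = np.repeat(np.arange(n_patients), imgs_per_patient)
y_patient = rng.binomial(1, 0.4, size=n_patients)
y = y_patient[groups]
X = rng.normal(size=(len(y), n_feat)) + 1.5 * y[:, None]  # separable by construction

logo = LeaveOneGroupOut()
preds = np.empty(len(y), dtype=float)
for train_idx, test_idx in logo.split(X, y, groups):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    preds[test_idx] = clf.predict_proba(X[test_idx])[:, 1]

accuracy = np.mean((preds > 0.5) == y)
```

Splitting at the image level instead would leak patient-specific appearance into the test fold and inflate the reported performance, which is why the patient-level grouping matters.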

Understanding Dataset Bias in Medical Imaging: A Case Study on Chest X-rays

Ethan Dack, Chengliang Dai

arxiv logopreprintJul 10 2025
Recent works have revisited the infamous task "Name That Dataset", demonstrating that non-medical datasets contain underlying biases and that the dataset-origin task can be solved with high accuracy. In this work, we revisit the same task applied to popular open-source chest X-ray datasets. Medical images are naturally more difficult to release as open source due to their sensitive nature, which has led to certain open-source datasets becoming extremely popular for research purposes. By performing the same task, we wish to explore whether dataset bias also exists in these datasets. To extend our work, we apply simple transformations to the datasets, repeat the same task, and perform an analysis to identify and explain any detected biases. Given the importance of AI applications in medical imaging, it is vital to establish whether modern methods are taking shortcuts or are focused on the relevant pathology. We implement a range of different network architectures on the datasets: NIH, CheXpert, MIMIC-CXR and PadChest. We hope this work will encourage more explainable research in medical imaging and the creation of more open-source datasets in the medical domain. Our code can be found here: https://github.com/eedack01/x_ray_ds_bias.
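A toy version of the dataset-origin task: if each source has even a slightly different intensity profile, simple image statistics suffice to identify it, which is exactly the kind of shortcut the work above probes for. This sketch uses synthetic images, not the actual chest X-ray datasets:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)

# Synthetic stand-in for four image sources, each with a slightly different
# mean intensity (a crude proxy for acquisition/post-processing bias).
source_means = [0.45, 0.50, 0.55, 0.60]
X, y = [], []
for label, mu in enumerate(source_means):
    imgs = rng.normal(mu, 0.1, size=(200, 64, 64)).clip(0, 1)
    for img in imgs:
        # Features: per-image mean, std, and an 8-bin intensity histogram.
        hist, _ = np.histogram(img, bins=8, range=(0, 1), density=True)
        X.append(np.concatenate([[img.mean(), img.std()], hist]))
        y.append(label)
X, y = np.asarray(X), np.asarray(y)

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25,
                                      random_state=0, stratify=y)
acc = RandomForestClassifier(random_state=0).fit(Xtr, ytr).score(Xte, yte)
```

High origin-classification accuracy from such trivial features is the warning sign: a pathology classifier trained on pooled data could be learning the source, not the disease.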

Depth-Sequence Transformer (DST) for Segment-Specific ICA Calcification Mapping on Non-Contrast CT

Xiangjian Hou, Ebru Yaman Akcicek, Xin Wang, Kazem Hashemizadeh, Scott Mcnally, Chun Yuan, Xiaodong Ma

arxiv logopreprintJul 10 2025
While total intracranial carotid artery calcification (ICAC) volume is an established stroke biomarker, growing evidence shows this aggregate metric ignores the critical influence of plaque location, since calcification in different segments carries distinct prognostic and procedural risks. However, a finer-grained, segment-specific quantification has remained technically infeasible. Conventional 3D models are forced to process downsampled volumes or isolated patches, sacrificing the global context required to resolve anatomical ambiguity and render reliable landmark localization. To overcome this, we reformulate the 3D challenge as a Parallel Probabilistic Landmark Localization task along the 1D axial dimension. We propose the Depth-Sequence Transformer (DST), a framework that processes full-resolution CT volumes as sequences of 2D slices, learning to predict N = 6 independent probability distributions that pinpoint key anatomical landmarks. Our DST framework demonstrates exceptional accuracy and robustness. Evaluated on a 100-patient clinical cohort with rigorous 5-fold cross-validation, it achieves a Mean Absolute Error (MAE) of 0.1 slices, with 96% of predictions falling within a ±1 slice tolerance. Furthermore, to validate its architectural power, the DST backbone establishes the best result on the public Clean-CC-CCII classification benchmark under an end-to-end evaluation protocol. Our work delivers the first practical tool for automated segment-specific ICAC analysis. The proposed framework provides a foundation for further studies on the role of location-specific biomarkers in diagnosis, prognosis, and procedural planning. Our code will be made publicly available.
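The reformulation described — N = 6 independent probability distributions over the axial (slice) dimension — can be sketched as a post-processing step on per-slice logits. The logits here are synthetic and the transformer producing them is omitted:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def localize_landmarks(logits):
    """Turn per-landmark, per-slice logits into predicted slice indices.

    `logits` has shape (num_landmarks, num_slices): one independent 1D
    probability distribution over the axial dimension per landmark.
    """
    probs = softmax(logits, axis=1)
    return probs.argmax(axis=1), probs

rng = np.random.default_rng(3)
num_landmarks, num_slices = 6, 120
true_slices = rng.integers(10, 110, size=num_landmarks)

# Fake logits sharply peaked at the true slice for each landmark.
logits = rng.normal(0, 0.1, size=(num_landmarks, num_slices))
logits[np.arange(num_landmarks), true_slices] += 5.0

pred, probs = localize_landmarks(logits)
mae = np.abs(pred - true_slices).mean()
```

Treating each landmark as its own 1D distribution is what makes the "parallel" formulation cheap: localization reduces to six independent argmax (or expectation) reads over the slice axis.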

A deep learning-based clinical decision support system for glioma grading using ensemble learning and knowledge distillation.

Liu Y, Shi Z, Xiao C, Wang B

pubmed logopapersJul 10 2025
Gliomas are the most common malignant primary brain tumors, and grading their severity, particularly the diagnosis of low-grade gliomas, remains a challenging task for clinicians and radiologists. With advancements in deep learning and medical image processing technologies, the development of Clinical Decision Support Systems (CDSS) for glioma grading offers significant benefits for clinical treatment. This study proposes a CDSS for glioma grading, integrating a novel feature extraction framework. The method combines ensemble learning and knowledge distillation: teacher models were constructed through ensemble learning, while uncertainty-weighted ensemble averaging was applied during student model training to refine knowledge transfer. This approach bridges the teacher-student performance gap, enhancing grading accuracy, reliability, and clinical applicability with lightweight deployment. Experimental results show 85.96% accuracy (a 5.2% improvement over baseline), with precision (83.90%), recall (87.40%), and F1-score (83.90%) increasing by 7.5%, 5.1%, and 5.1%, respectively. The teacher-student performance gap is reduced to 3.2%, confirming effectiveness. Furthermore, the developed CDSS not only ensures rapid and accurate glioma grading but also surfaces the critical features influencing the grading results, seamlessly integrating a methodology for generating comprehensive diagnostic reports. Consequently, the glioma grading CDSS represents a practical clinical decision support tool capable of delivering accurate and efficient auxiliary diagnostic decisions for physicians and patients.
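One plausible reading of "uncertainty-weighted ensemble averaging" is to weight each teacher's soft predictions by the inverse of its predictive entropy, so confident teachers dominate the distillation target. The exact weighting in the paper may differ, so treat this as an illustrative sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def entropy(p, axis=-1):
    return -(p * np.log(p + 1e-12)).sum(axis=axis)

def uncertainty_weighted_targets(teacher_logits):
    """Combine teacher predictions, down-weighting uncertain teachers.

    `teacher_logits` has shape (num_teachers, num_samples, num_classes).
    Each teacher's per-sample weight is the inverse of its predictive
    entropy, normalized across teachers, yielding one soft target
    distribution per sample for student training.
    """
    probs = softmax(teacher_logits, axis=-1)
    w = 1.0 / (entropy(probs) + 1e-6)          # (teachers, samples)
    w = w / w.sum(axis=0, keepdims=True)
    return (w[..., None] * probs).sum(axis=0)  # (samples, classes)

rng = np.random.default_rng(5)
logits = rng.normal(size=(3, 10, 4))           # 3 teachers, 10 samples, 4 classes
targets = uncertainty_weighted_targets(logits)
```

The student would then be trained against `targets` with a soft cross-entropy (optionally temperature-scaled), which is the standard distillation recipe.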

Hierarchical deep learning system for orbital fracture detection and trap-door classification on CT images.

Oku H, Nakamura Y, Kanematsu Y, Akagi A, Kinoshita S, Sotozono C, Koizumi N, Watanabe A, Okumura N

pubmed logopapersJul 10 2025
To develop and evaluate a hierarchical deep learning system that detects orbital fractures on computed tomography (CT) images and classifies them as depressed or trap-door types. A retrospective diagnostic accuracy study analyzing CT images from patients with confirmed orbital fractures. We collected CT images from 686 patients with orbital fractures treated at a single institution (2010-2025), resulting in 46,013 orbital CT slices. After preprocessing, 7809 slices were selected as regions of interest and partitioned into training (6508 slices) and test (1301 slices) datasets. Our hierarchical approach consisted of a first-stage classifier (YOLOv8) for fracture detection and a second-stage classifier (Vision Transformer) for distinguishing depressed from trap-door fractures. Performance was evaluated at both slice and patient levels, focusing on accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC-ROC). For fracture detection, YOLOv8 achieved a slice-level sensitivity of 80.4% and specificity of 79.2%, with patient-level performance improving to 94.7% sensitivity and 90.0% specificity. For fracture classification, the Vision Transformer demonstrated a slice-level sensitivity of 91.5% and specificity of 83.5% for trap-door versus depressed fractures, with patient-level metrics of 100% sensitivity and 88.9% specificity. The complete system correctly identified 18/20 no-fracture cases, 35/40 depressed fracture cases, and 15/17 trap-door fracture cases. Our hierarchical deep learning system effectively detects orbital fractures and distinguishes between depressed and trap-door types with high accuracy. This approach could aid in the timely identification of trap-door fractures requiring urgent surgical intervention, particularly in settings lacking specialized expertise.
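The hierarchical design implies a slice-to-patient aggregation step (slice-level and patient-level metrics differ above). A common rule, assumed here since the abstract does not state the exact one, is that a patient is positive if any of their slices is positive:

```python
from collections import defaultdict

def patient_level(slice_preds):
    """Aggregate slice-level calls into one call per patient.

    `slice_preds` is an iterable of (patient_id, fracture, trap_door)
    booleans.  A patient is positive if any slice is positive; the
    second-stage (trap-door) call only counts on slices the first
    stage flagged as fractured, mirroring the two-stage pipeline.
    """
    agg = defaultdict(lambda: {"fracture": False, "trap_door": False})
    for pid, fracture, trap_door in slice_preds:
        agg[pid]["fracture"] |= fracture
        agg[pid]["trap_door"] |= fracture and trap_door
    return dict(agg)

preds = [("p1", True, False), ("p1", True, True), ("p2", False, False)]
result = patient_level(preds)
```

An any-positive rule trades specificity for sensitivity at the patient level, which matches the pattern in the reported numbers (patient-level sensitivity exceeding slice-level sensitivity).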

Multiparametric ultrasound techniques are superior to AI-assisted ultrasound for assessment of solid thyroid nodules: a prospective study.

Li Y, Li X, Yan L, Xiao J, Yang Z, Zhang M, Luo Y

pubmed logopapersJul 10 2025
To evaluate the diagnostic performance of multiparametric ultrasound (mpUS) and AI-assisted B-mode ultrasound (AI-US), and their potential to reduce unnecessary biopsies relative to B-mode alone, for solid thyroid nodules. This prospective study enrolled 226 solid thyroid nodules with 145 malignant and 81 benign pathological results from 189 patients (35 men and 154 women; age range, 19-73 years; mean age, 45 years). Each nodule was examined using B-mode, microvascular flow imaging (MVFI), elastography with elasticity contrast index (ECI), and an AI system. Image data were recorded for each modality. Ten readers with different experience levels independently evaluated the B-mode images of each nodule to make a "benign" or "malignant" diagnosis, both blinded and unblinded to the AI reports. The most accurate ECI value and MVFI mode were selected and combined with the dichotomous prediction of all readers. Descriptive statistics and AUCs were used to evaluate the diagnostic performance of mpUS and AI-US. Triple mpUS with B-mode, MVFI, and ECI exhibited the highest diagnostic performance (average AUC = 0.811 vs. 0.677 for B-mode, p = 0.001), followed by AI-US (average AUC = 0.718, p = 0.315). Triple mpUS significantly reduced the unnecessary biopsy rate by up to 12% (p = 0.007). AUC and specificity were significantly higher for triple mpUS than for AI-US mode (both p < 0.05). Compared to AI-US, triple mpUS (B-mode, MVFI, and ECI) exhibited better diagnostic performance for thyroid cancer diagnosis and resulted in a significant reduction in the unnecessary biopsy rate. AI systems are expected to take advantage of multi-modal information to facilitate diagnoses.
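The reported metrics can be reproduced from a confusion matrix. The "unnecessary biopsy" definition below (false positives among all biopsy referrals) is an assumption, and the counts are illustrative, not the study's:

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and unnecessary-biopsy rate from counts.

    An 'unnecessary biopsy' is taken here to be a benign nodule referred
    for biopsy (a false positive), expressed as a fraction of all biopsy
    referrals.  This formula is an assumed definition, not the paper's.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    unnecessary_rate = fp / (tp + fp)
    return sensitivity, specificity, unnecessary_rate

# Illustrative counts for a 226-nodule cohort (145 malignant, 81 benign).
sens, spec, unnecessary = diagnostic_metrics(tp=130, fn=15, tn=60, fp=21)
```

Under this definition, raising specificity (fewer benign nodules called malignant) directly lowers the unnecessary-biopsy rate, which is the mechanism behind the reduction reported above.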