Latest Papers on Radiology AI.

Analysis on artificial intelligence-based chest computed tomography in multidisciplinary treatment models for discriminating benign and malignant pulmonary nodules.

Liu XY, Shan FC, Li H, Zhu JB

•papers•Aug 4 2025

To evaluate the effectiveness of AI-based chest Computed Tomography (CT) in a Multidisciplinary Diagnosis and Treatment (MDT) model for differentiating benign and malignant pulmonary nodules. This retrospective study screened a total of 87 patients with pulmonary nodules who were treated between January 2019 and December 2020 at Binzhou People's Hospital, Qingdao Municipal Hospital, and Laiwu People's Hospital. AI analysis, MDT consultation, and a combined diagnostic approach were assessed using postoperative pathology as the reference standard. Among 87 nodules, 69 (79.31 %) were malignant, and 18 (20.69 %) were benign. AI analysis showed moderate agreement with pathology (κ = 0.637, p < 0.05), while MDT and the combined approach demonstrated higher consistency (κ = 0.847, 0.888, p < 0.05). Sensitivity and specificity were as follows: AI (89.86 %, 77.78 %, AUC = 0.838), MDT (100 %, 77.78 %, AUC = 0.889), and the combined approach (100 %, 83.33 %, AUC = 0.917). The accuracy of the combined method (96.55 %) was superior to MDT (95.40 %) and AI alone (87.36 %) (p < 0.05). AI-based chest CT combined with MDT may improve diagnostic accuracy and shows potential for broader clinical application.

CT Classification Chest Retrospective Clinical Clinical Pilot

Vessel-specific reliability of artificial intelligence-based coronary artery calcium scoring on non-ECG-gated chest CT: a comparative study with ECG-gated cardiac CT.

Zhang J, Liu K, You C, Gong J

•papers•Aug 4 2025

To evaluate the performance of artificial intelligence (AI)-based coronary artery calcium scoring (CACS) on non-electrocardiogram (ECG)-gated chest CT, using manual quantification as the reference standard, while characterizing per-vessel reliability and clinical risk classification impacts. Retrospective study of 290 patients (June 2023-2024) with paired non-ECG-gated chest CT and ECG-gated cardiac CT (median time was 2 days). AI-based CACS and manual CACS (CACS_man) were compared using intraclass correlation coefficient (ICC) and weighted Cohen's kappa (3,1). Error types, anatomical distributions, and CACS of the lesions of individual arteries or segments were assessed in accordance with the Society of Cardiovascular Computed Tomography (SCCT) guidelines. The total CACS of chest CT demonstrated excellent concordance with CACS_man (ICC = 0.87, 95 % CI 0.84-0.90). Non-ECG-gated chest showed a 7.5-fold increased risk misclassification rate compared to ECG-gated cardiac CT (41.4 % vs. 5.5 %), with 35.5 % overclassification and 5.9 % underclassification. Vessel-specific analysis revealed paradoxical reliability of the left anterior descending artery (LAD) due to stent misclassification in four cases (ICC = 0.93 on chest CT vs 0.82 on cardiac CT), while the right coronary artery (RCA) demonstrated suboptimal performance with ICCs ranging from 0.60 to 0.68. Chest CT exhibited higher false-positive (1.9 % vs 0.5 %) and false-negative rates (14.4 % vs 4.3 %). False positive mainly derived from image noise in proximal LAD/RCA (median CACS 5.97 vs 3.45) and anatomical error, while false negatives involved RCA microcalcifications (median CACS 2.64). AI-based non-ECG-gated chest CT demonstrates utility for opportunistic screening but requires protocol optimization to address vessel-specific limitations and mitigate 41.4 % risk misclassification rates.

CT Detection Cardiac Retrospective Clinical In Silico

Scaling Artificial Intelligence for Prostate Cancer Detection on MRI towards Population-Based Screening and Primary Diagnosis in a Global, Multiethnic Population (Study Protocol)

Anindo Saha, Joeran S. Bosma, Jasper J. Twilt, Alexander B. C. D. Ng, Aqua Asif, Kirti Magudia, Peder Larson, Qinglin Xie, Xiaodong Zhang, Chi Pham Minh, Samuel N. Gitau, Ivo G. Schoots, Martijn F. Boomsma, Renato Cuocolo, Nikolaos Papanikolaou, Daniele Regge, Derya Yakar, Mattijs Elschot, Jeroen Veltman, Baris Turkbey, Nancy A. Obuchowski, Jurgen J. Fütterer, Anwar R. Padhani, Hashim U. Ahmed, Tobias Nordström, Martin Eklund, Veeru Kasivisvanathan, Maarten de Rooij, Henkjan Huisman

•preprint•Aug 4 2025

In this intercontinental, confirmatory study, we include a retrospective cohort of 22,481 MRI examinations (21,288 patients; 46 cities in 22 countries) to train and externally validate the PI-CAI-2B model, i.e., an efficient, next-generation iteration of the state-of-the-art AI system that was developed for detecting Gleason grade group $\geq$2 prostate cancer on MRI during the PI-CAI study. Of these examinations, 20,471 cases (19,278 patients; 26 cities in 14 countries) from two EU Horizon projects (ProCAncer-I, COMFORT) and 12 independent centers based in Europe, North America, Asia and Africa, are used for training and internal testing. Additionally, 2010 cases (2010 patients; 20 external cities in 12 countries) from population-based screening (STHLM3-MRI, IP1-PROSTAGRAM trials) and primary diagnostic settings (PRIME trial) based in Europe, North and South Americas, Asia and Australia, are used for external testing. Primary endpoint is the proportion of AI-based assessments in agreement with the standard of care diagnoses (i.e., clinical assessments made by expert uropathologists on histopathology, if available, or at least two expert urogenital radiologists in consensus; with access to patient history and peer consultation) in the detection of Gleason grade group $\geq$2 prostate cancer within the external testing cohorts. Our statistical analysis plan is prespecified with a hypothesis of diagnostic interchangeability to the standard of care at the PI-RADS $\geq$3 (primary diagnosis) or $\geq$4 (screening) cut-off, considering an absolute margin of 0.05 and reader estimates derived from the PI-CAI observer study (62 radiologists reading 400 cases). Secondary measures comprise the area under the receiver operating characteristic curve (AUROC) of the AI system stratified by imaging quality, patient age and patient ethnicity to identify underlying biases (if any).

MRI Detection Abdominal Retrospective Clinical In Silico Consortium Benchmark SOTA Open Dataset Ethics

The Use of Artificial Intelligence to Improve Detection of Acute Incidental Pulmonary Emboli.

Kuzo RS, Levin DL, Bratt AK, Walkoff LA, Suman G, Houghton DE

•papers•Aug 4 2025

Incidental pulmonary emboli (IPE) are frequently overlooked by radiologists. Artificial intelligence (AI) algorithms have been developed to aid detection of pulmonary emboli. To measure diagnostic performance of AI compared with prospective interpretation by radiologists. A commercially available AI algorithm was used to retrospectively review 14,453 contrast-enhanced outpatient CT CAP exams in 9171 patients where PE was not clinically suspected. Natural language processing (NLP) searches of reports identified IPE detected prospectively. Thoracic radiologists reviewed all cases read as positive by AI or NLP to confirm IPE and assess the most proximal level of clot and overall clot burden. 1,400 cases read as negative by both the initial radiologist and AI were re-reviewed to assess for additional IPE. Radiologists prospectively detected 218 IPE and AI detected an additional 36 unreported cases. AI missed 30 cases of IPE detected by the radiologist and had 94 false positives. For 36 IPE missed by the radiologist, median clot burden was 1 and 19 were solitary segmental or subsegmental. For 30 IPE missed by AI, one case had large central emboli and the others were small with 23 solitary subsegmental emboli. Radiologist re-review of 1,400 exams interpreted as negative found 8 additional cases of IPE. Compared with radiologists, AI had similar sensitivity but reduced positive predictive value. Our experience indicates that the AI tool is not ready to be used autonomously without human oversight, but a human observer plus AI is better than either alone for detection of incidental pulmonary emboli.

CT Detection Chest Retrospective Clinical In Silico Academic Lab

Artificial intelligence: a new era in prostate cancer diagnosis and treatment.

Vidiyala N, Parupathi P, Sunkishala P, Sree C, Gujja A, Kanagala P, Meduri SK, Nyavanandi D

•papers•Aug 4 2025

Prostate cancer (PCa) represents one of the most prevalent cancers among men, with substantial challenges in timely and accurate diagnosis and subsequent treatment. Traditional diagnosis and treatment methods for PCa, such as prostate-specific antigen (PSA) biomarker detection, digital rectal examination, imaging (CT/MRI) analysis, and biopsy histopathological examination, suffer from limitations such as a lack of specificity, generation of false positives or negatives, and difficulty in handling large data, leading to overdiagnosis and overtreatment. The integration of artificial intelligence (AI) in PCa diagnosis and treatment is revolutionizing traditional approaches by offering advanced tools for early detection, personalized treatment planning, and patient management. AI technologies, especially machine learning and deep learning, improve diagnostic accuracy and treatment planning. The AI algorithms analyze imaging data, like MRI and ultrasound, to identify cancerous lesions effectively with great precision. In addition, AI algorithms enhance risk assessment and prognosis by combining clinical, genomic, and imaging data. This leads to more tailored treatment strategies, enabling informed decisions about active surveillance, surgery, or new therapies, thereby improving quality of life while reducing unnecessary diagnoses and treatments. This review examines current AI applications in PCa care, focusing on their transformative impact on diagnosis and treatment planning while recognizing potential challenges. It also outlines expected improvements in diagnosis through AI-integrated systems and decision support tools for healthcare teams. The findings highlight AI's potential to enhance clinical outcomes, operational efficiency, and patient-centred care in managing PCa.

Mixed Modality Detection Abdominal Review Concept

ESR Essentials: common performance metrics in AI-practice recommendations by the European Society of Medical Imaging Informatics.

Klontzas ME, Groot Lipman KBW, Akinci D' Antonoli T, Andreychenko A, Cuocolo R, Dietzel M, Gitto S, Huisman H, Santinha J, Vernuccio F, Visser JJ, Huisman M

•papers•Aug 3 2025

This article provides radiologists with practical recommendations for evaluating AI performance in radiology, ensuring alignment with clinical goals and patient safety. It outlines key performance metrics, including overlap metrics for segmentation, test-based metrics (e.g., sensitivity, specificity, and area under the receiver operating characteristic curve), and outcome-based metrics (e.g., precision, negative predictive value, F1-score, Matthews correlation coefficient, and area under the precision-recall curve). Key recommendations emphasize local validation using independent datasets, selecting task-specific metrics, and considering deployment context to ensure real-world performance matches claimed efficacy. Common pitfalls, such as overreliance on a single metric, misinterpretation in low-prevalence settings, and failure to account for clinical workflow, are addressed with mitigation strategies. Additional guidance is provided on threshold selection, prevalence-adjusted evaluation, and AI-generated image quality assessment. This guide equips radiologists to critically evaluate both commercially available and in-house developed AI tools, ensuring their safe and effective integration into clinical practice. CLINICAL RELEVANCE STATEMENT: This review provides guidance on selecting and interpreting AI performance metrics in radiology, ensuring clinically meaningful evaluation and safe deployment of AI tools. By addressing common pitfalls and promoting standardized reporting, it supports radiologists in making informed decisions, ultimately improving diagnostic accuracy and patient outcomes. KEY POINTS: Radiologists must evaluate performance metrics as they reflect acceptable performance in specific datasets but do not guarantee clinical utility. Independent evaluation tailored to the clinical setting is essential. Performance metrics must align with the intended task of the AI application-segmentation, detection, or classification-and be selected based on domain knowledge and clinical context. Sensitivity, specificity, area under the ROC curve, and accuracy must be interpreted with prevalence-dependent metrics (e.g., precision, F1 score, and Matthew's correlation coefficient) calculated for the target population to ensure safe and effective clinical use.

Review Consortium Policy Ethics

External evaluation of an open-source deep learning model for prostate cancer detection on bi-parametric MRI.

Johnson PM, Tong A, Ginocchio L, Del Hoyo JL, Smereka P, Harmon SA, Turkbey B, Chandarana H

•papers•Aug 3 2025

This study aims to evaluate the diagnostic accuracy of an open-source deep learning (DL) model for detecting clinically significant prostate cancer (csPCa) in biparametric MRI (bpMRI). It also aims to outline the necessary components of the model that facilitate effective sharing and external evaluation of PCa detection models. This retrospective diagnostic accuracy study evaluated a publicly available DL model trained to detect PCa on bpMRI. External validation was performed on bpMRI exams from 151 biologically male patients (mean age, 65 ± 8 years). The model's performance was evaluated using patient-level classification of PCa with both radiologist interpretation and histopathology serving as the ground truth. The model processed bpMRI inputs to generate lesion probability maps. Performance was assessed using the area under the receiver operating characteristic curve (AUC) for PI-RADS ≥ 3, PI-RADS ≥ 4, and csPCa (defined as Gleason ≥ 7) at an exam level. The model achieved AUCs of 0.86 (95% CI: 0.80-0.92) and 0.91 (95% CI: 0.85-0.96) for predicting PI-RADS ≥ 3 and ≥ 4 exams, respectively, and 0.78 (95% CI: 0.71-0.86) for csPCa. Sensitivity and specificity for csPCa were 0.87 and 0.53, respectively. Fleiss' kappa for inter-reader agreement was 0.51. The open-source DL model offers high sensitivity to clinically significant prostate cancer. The study underscores the importance of sharing model code and weights to enable effective external validation and further research. Question Inter-reader variability hinders the consistent and accurate detection of clinically significant prostate cancer in MRI. Findings An open-source deep learning model demonstrated reproducible diagnostic accuracy, achieving AUCs of 0.86 for PI-RADS ≥ 3 and 0.78 for CsPCa lesions. Clinical relevance The model's high sensitivity for MRI-positive lesions (PI-RADS ≥ 3) may provide support for radiologists. Its open-source deployment facilitates further development and evaluation across diverse clinical settings, maximizing its potential utility.

MRI Classification Abdominal Retrospective Clinical In Silico Open Source Open Code Benchmark SOTA

Functional immune state classification of unlabeled live human monocytes using holotomography and machine learning

Lee, M., Kim, G., Lee, M. S., Shin, J. W., Lee, J. H., Ryu, D. H., Kim, Y. S., Chung, Y., Kim, K. S., Park, Y.

•preprint•Aug 3 2025

Sepsis is an abnormally dysregulated immune response against infection in which the human immune system ranges from a hyper-inflammatory phase to an immune-suppressive phase. Current assessment methods are limiting owing to time-consuming and laborious sample preparation protocols. We propose a rapid label-free imaging-based technique to assess the immune status of individual human monocytes. High-resolution intracellular compositions of individual monocytes are quantitatively measured in terms of the three-dimensional distribution of refractive index values using holotomography, which are then analyzed using machine-learning algorithms to train for the classification into three distinct immune states: normal, hyper-inflammation, and immune suppression. The immune status prediction accuracy of the machine-learning holotomography classifier was 83.7% and 99.9% for one and six cell measurements, respectively. Our results suggested that this technique can provide a rapid deterministic method for the real-time evaluation of the immune status of an individual.

OCT Classification Methodology In Silico Academic Lab

Adapting foundation models for rapid clinical response: intracerebral hemorrhage segmentation in emergency settings.

Gerbasi A, Mazzacane F, Ferrari F, Del Bello B, Cavallini A, Bellazzi R, Quaglini S

•papers•Aug 3 2025

Intracerebral hemorrhage (ICH) is a medical emergency that demands rapid and accurate diagnosis for optimal patient management. Hemorrhagic lesions' segmentation on CT scans is a necessary first step for acquiring quantitative imaging data that are becoming increasingly useful in the clinical setting. However, traditional manual segmentation is time-consuming and prone to inter-rater variability, creating a need for automated solutions. This study introduces a novel approach combining advanced deep learning models to segment extensive and morphologically variable ICH lesions in non-contrast CT scans. We propose a two-step methodology that begins with a user-defined loose bounding box around the lesion, followed by a fine-tuned YOLOv8-S object detection model to generate precise, slice-specific bounding boxes. These bounding boxes are then used to prompt the Medical Segment Anything Model for accurate lesion segmentation. Our pipeline achieves high segmentation accuracy with minimal supervision, demonstrating strong potential as a practical alternative to task-specific models. We evaluated the model on a dataset of 252 CT scans demonstrating high performance in segmentation accuracy and robustness. Finally, the resulting segmentation tool is integrated into a user-friendly web application prototype, offering clinicians a simple interface for lesion identification and radiomic quantification.

CT Segmentation Neurological Methodology Prototype GenAI

Medical Image De-Identification Resources: Synthetic DICOM Data and Tools for Validation

Michael W. Rutherford, Tracy Nolan, Linmin Pei, Ulrike Wagner, Qinyan Pan, Phillip Farmer, Kirk Smith, Benjamin Kopchick, Laura Opsahl-Ong, Granger Sutton, David Clunie, Keyvan Farahani, Fred Prior

•preprint•Aug 3 2025

Medical imaging research increasingly depends on large-scale data sharing to promote reproducibility and train Artificial Intelligence (AI) models. Ensuring patient privacy remains a significant challenge for open-access data sharing. Digital Imaging and Communications in Medicine (DICOM), the global standard data format for medical imaging, encodes both essential clinical metadata and extensive protected health information (PHI) and personally identifiable information (PII). Effective de-identification must remove identifiers, preserve scientific utility, and maintain DICOM validity. Tools exist to perform de-identification, but few assess its effectiveness, and most rely on subjective reviews, limiting reproducibility and regulatory confidence. To address this gap, we developed an openly accessible DICOM dataset infused with synthetic PHI/PII and an evaluation framework for benchmarking image de-identification workflows. The Medical Image de-identification (MIDI) dataset was built using publicly available de-identified data from The Cancer Imaging Archive (TCIA). It includes 538 subjects (216 for validation, 322 for testing), 605 studies, 708 series, and 53,581 DICOM image instances. These span multiple vendors, imaging modalities, and cancer types. Synthetic PHI and PII were embedded into structured data elements, plain text data elements, and pixel data to simulate real-world identity leaks encountered by TCIA curation teams. Accompanying evaluation tools include a Python script, answer keys (known truth), and mapping files that enable automated comparison of curated data against expected transformations. The framework is aligned with the HIPAA Privacy Rule "Safe Harbor" method, DICOM PS3.15 Confidentiality Profiles, and TCIA best practices. It supports objective, standards-driven evaluation of de-identification workflows, promoting safer and more consistent medical image sharing.

Mixed Modality Classification Dataset Release In Silico Academic Lab Open Dataset Reproducibility

Filter Papers

Tags

Analysis on artificial intelligence-based chest computed tomography in multidisciplinary treatment models for discriminating benign and malignant pulmonary nodules.

Vessel-specific reliability of artificial intelligence-based coronary artery calcium scoring on non-ECG-gated chest CT: a comparative study with ECG-gated cardiac CT.

Scaling Artificial Intelligence for Prostate Cancer Detection on MRI towards Population-Based Screening and Primary Diagnosis in a Global, Multiethnic Population (Study Protocol)

The Use of Artificial Intelligence to Improve Detection of Acute Incidental Pulmonary Emboli.

Artificial intelligence: a new era in prostate cancer diagnosis and treatment.

ESR Essentials: common performance metrics in AI-practice recommendations by the European Society of Medical Imaging Informatics.

External evaluation of an open-source deep learning model for prostate cancer detection on bi-parametric MRI.

Functional immune state classification of unlabeled live human monocytes using holotomography and machine learning

Adapting foundation models for rapid clinical response: intracerebral hemorrhage segmentation in emergency settings.

Medical Image De-Identification Resources: Synthetic DICOM Data and Tools for Validation

Ready to Sharpen Your Edge?