
Physician-level classification performance across multiple imaging domains with a diagnostic medical foundation model and a large dataset of annotated medical images

Thieme, A. H., Miri, T., Marra, A. R., Kobayashi, T., Rodriguez-Nava, G., Li, Y., Barba, T., Er, A. G., Benzler, J., Gertler, M., Riechers, M., Hinze, C., Zheng, Y., Pelz, K., Nagaraj, D., Chen, A., Loeser, A., Ruehle, A., Zamboglou, C., Alyahya, L., Uhlig, M., Machiraju, G., Weimann, K., Lippert, C., Conrad, T., Ma, J., Novoa, R., Moor, M., Hernandez-Boussard, T., Alawad, M., Salinas, J. L., Mittermaier, M., Gevaert, O.

medRxiv preprint · May 31 2025
A diagnostic medical foundation model (MedFM) is an artificial intelligence (AI) system engineered to accurately determine diagnoses across various medical imaging modalities and specialties. To train MedFM, we created the PubMed Central Medical Images Dataset (PMCMID), the largest annotated medical image dataset to date, comprising 16,126,659 images from 3,021,780 medical publications. Using AI- and ontology-based methods, we identified 4,482,237 medical images (e.g., clinical photos, X-rays, ultrasounds) and generated comprehensive annotations. To optimize MedFM's performance and assess biases, 13,266 images were manually annotated to establish a multimodal benchmark. MedFM achieved physician-level performance in diagnosis tasks spanning radiology, dermatology, and infectious diseases without requiring specific training. Additionally, we developed the Image2Paper app, allowing clinicians to upload medical images and retrieve relevant literature. The correct diagnosis appeared within the top ten results in 88.4% of cases, and at least one relevant differential diagnosis in 93.0%. MedFM and PMCMID were made publicly available. Funding: Research reported here was partially supported by the National Cancer Institute (NCI) (R01 CA260271), the Saudi Company for Artificial Intelligence (SCAI) Authority, and the German Federal Ministry for Economic Affairs and Climate Action (BMWK) under the project DAKI-FWS (01MK21009E). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
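The retrieval metric reported above (correct diagnosis within the top ten results) is a standard top-k accuracy. A minimal stdlib-Python sketch of that metric, using entirely hypothetical queries and diagnosis labels rather than anything from the study:

```python
def top_k_accuracy(results, truths, k=10):
    """Fraction of queries whose true label appears among the top-k retrieved items."""
    hits = sum(1 for retrieved, truth in zip(results, truths) if truth in retrieved[:k])
    return hits / len(truths)

# Hypothetical ranked retrieval lists and ground-truth diagnoses for four queries.
retrieved = [
    ["pneumonia", "tb", "sarcoidosis"],
    ["melanoma", "nevus"],
    ["fracture", "osteoporosis"],
    ["cellulitis", "abscess"],
]
truth = ["tb", "nevus", "arthritis", "cellulitis"]
print(top_k_accuracy(retrieved, truth, k=10))  # 3 of 4 queries hit -> 0.75
```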

Machine learning-based hemodynamics quantitative assessment of pulmonary circulation using computed tomographic pulmonary angiography.

Xie H, Zhao X, Zhang N, Liu J, Yang G, Cao Y, Xu J, Xu L, Sun Z, Wen Z, Chai S, Liu D

PubMed · May 30 2025
Pulmonary hypertension (PH) is a malignant pulmonary circulation disease. Right heart catheterization (RHC) is the gold standard procedure for quantitative evaluation of pulmonary hemodynamics. Accurate and noninvasive quantitative evaluation of pulmonary hemodynamics is challenging due to the limitations of currently available assessment methods. Patients who underwent computed tomographic pulmonary angiography (CTPA) and RHC examinations within 2 weeks were included. The dataset was randomly divided into a training set and a test set at an 8:2 ratio. A radiomic feature model and a two-dimensional (2D) feature model were constructed to quantitatively evaluate pulmonary hemodynamics. The performance of the models was determined by calculating the mean squared error, the intraclass correlation coefficient (ICC), and the area under the precision-recall curve (AUC-PR), and by performing Bland-Altman analyses. A total of 345 patients were identified: 271 with PH (mean age 50 ± 17 years, 93 men) and 74 without PH (mean age 55 ± 16 years, 26 men). The predictions of the radiomic feature model, which integrated 5 2D features and 30 radiomic features, were consistent with the results from RHC and outperformed the 2D feature model. The radiomic feature model exhibited moderate to good reproducibility in predicting pulmonary hemodynamic parameters (ICC up to 0.87). In addition, PH could be accurately identified with a classification model (AUC-PR = 0.99). This study provides a noninvasive method for comprehensively and quantitatively evaluating pulmonary hemodynamics using CTPA images, which has the potential to serve as an alternative to RHC, pending further validation.
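The Bland-Altman analysis mentioned above compares a model's predictions against the RHC reference via the mean difference (bias) and its 95% limits of agreement. A minimal sketch in plain Python, using made-up pressure values rather than any data from the study:

```python
import statistics

def bland_altman(pred, ref):
    """Mean bias and 95% limits of agreement between two measurement methods."""
    diffs = [p - r for p, r in zip(pred, ref)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical model-predicted vs. RHC-measured mean pulmonary artery pressures (mmHg).
model = [25.0, 41.0, 33.0, 19.0, 52.0]
rhc   = [27.0, 39.0, 35.0, 20.0, 50.0]
bias, (lo, hi) = bland_altman(model, rhc)
print(round(bias, 2), round(lo, 2), round(hi, 2))
```

Points falling mostly inside the limits of agreement suggest the two methods can be used interchangeably for the measured range.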

Strategies for Treatment De-escalation in Metastatic Renal Cell Carcinoma.

Gulati S, Nardo L, Lara PN

PubMed · May 30 2025
Immune checkpoint inhibitors (ICIs) and targeted therapies have revolutionized the management of metastatic renal cell carcinoma (mRCC). Currently, the frontline standard of care for patients with mRCC involves the provision of systemic ICI-based combination therapy with no clear guidelines on holding or de-escalating treatment, even with a complete or partial radiological response. Treatments usually continue until disease progression or unacceptable toxicity, frequently leading to overtreatment, which can elevate the risk of toxicity without providing a corresponding increase in therapeutic efficacy. In addition, the ongoing use of expensive antineoplastic drugs increases the financial burden on the already overstretched health care systems and on patients and their families. De-escalation strategies could be designed by integrating contemporary technologies, such as circulating tumor DNA, and advanced imaging techniques, such as computed tomography (CT) scans, positron emission tomography CT, magnetic resonance imaging, and machine learning models. Treatment de-escalation, when appropriate, can minimize treatment-related toxicities, reduce health care costs, and optimize the patients' quality of life while maintaining effective cancer control. This paper discusses the advantages, challenges, and clinical implications of de-escalation strategies in the management of mRCC. PATIENT SUMMARY: In this report, we describe the burden of overtreatment in patients who are never able to stop treatments for metastatic kidney cancer. We discuss the application of the latest technology that can help in making de-escalation decisions.

A Study on Predicting the Efficacy of Posterior Lumbar Interbody Fusion Surgery Using a Deep Learning Radiomics Model.

Fang L, Pan Y, Zheng H, Li F, Zhang W, Liu J, Zhou Q

PubMed · May 30 2025
This study seeks to develop a combined model integrating clinical data, radiomics, and deep learning (DL) for predicting the efficacy of posterior lumbar interbody fusion (PLIF) surgery. A retrospective review was conducted on 461 patients who underwent PLIF for degenerative lumbar diseases. These patients were partitioned into a training set (n=368) and a test set (n=93) in an 8:2 ratio. Clinical, radiomics, and DL models were constructed based on logistic regression and random forest, and a combined model was established by integrating the three. All radiomics and DL features were extracted from sagittal T2-weighted images using 3D Slicer software. The least absolute shrinkage and selection operator (LASSO) method selected the optimal radiomics and DL features to build the models. In addition to analyzing the original region of interest (ROI), we also applied different degrees of mask expansion to the ROI to determine the optimal ROI. Model performance was evaluated using the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC); differences in AUC were compared by the DeLong test. Among the clinical characteristics, patient age, body weight, and preoperative intervertebral distance at the surgical segment were risk factors affecting the fusion outcome. The radiomics model based on MRI with a 10 mm expanded mask showed excellent performance (training set AUC = 0.814, 95% CI: 0.761-0.866; test set AUC = 0.749, 95% CI: 0.631-0.866). Among the single models, the DL model had the best diagnostic prediction performance, with AUCs of 0.995 (95% CI: 0.991-0.999) for the training set and 0.803 (95% CI: 0.705-0.902) for the test set. Compared with all other models, the combined model of clinical, radiomics, and DL features had the best diagnostic prediction performance, with AUCs of 0.993 (95% CI: 0.987-0.999) for the training set and 0.866 (95% CI: 0.778-0.955) for the test set. The proposed clinical feature-deep learning radiomics model can effectively predict the postoperative efficacy of patients undergoing PLIF surgery and has good clinical applicability.
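The headline metric in abstracts like this one, the area under the ROC curve, can be computed directly from its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. A stdlib-only sketch with hypothetical labels and model scores (not the study's data):

```python
def roc_auc(labels, scores):
    """AUC as the probability a random positive outscores a random negative (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical fusion-outcome labels (1 = successful fusion) and combined-model scores.
y = [1, 1, 0, 1, 0, 0]
s = [0.9, 0.4, 0.6, 0.8, 0.3, 0.7]
print(roc_auc(y, s))  # 7 winning pairs of 9 -> 7/9
```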

Imaging-based machine learning to evaluate the severity of ischemic stroke in the middle cerebral artery territory.

Xie G, Gao J, Liu J, Zhou X, Zhao Z, Tang W, Zhang Y, Zhang L, Li K

PubMed · May 30 2025
This study aims to develop an imaging-based machine learning model for evaluating the severity of ischemic stroke in the middle cerebral artery (MCA) territory. This retrospective study included 173 patients diagnosed with acute ischemic stroke (AIS) in the MCA territory from two centers, with 114 in the training set and 59 in the test set. In the training set, the Spearman correlation coefficient and multiple linear regression were used to analyze the correlation between patients' pretreatment CT imaging features and the National Institutes of Health Stroke Scale (NIHSS) score. Subsequently, the optimal machine learning algorithm was determined by comparing seven different algorithms and was used to construct an imaging-based prediction model for stroke severity (severe vs. non-severe). Finally, the model was validated in the test set. Correlation analysis showed that CT imaging features such as infarction side, basal ganglia involvement, dense MCA sign, and infarction volume were independently associated with NIHSS score (P < 0.05). Logistic regression was determined to be the optimal method for constructing the prediction model. The areas under the receiver operating characteristic curve of the model in the training and test sets were 0.815 (95% CI: 0.736-0.893) and 0.780 (95% CI: 0.646-0.914), respectively, with accuracies of 0.772 and 0.814. An imaging-based machine learning model can effectively evaluate the severity (severe or non-severe) of ischemic stroke in the MCA territory. Trial registration: not applicable.
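The Spearman correlation used above for feature screening is simply the Pearson correlation computed on rank-transformed data. A self-contained sketch with hypothetical infarct volumes and NIHSS scores (not the study's data):

```python
def rankdata(x):
    """Average 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical infarction volumes (mL) vs. NIHSS scores for five patients.
volume = [5.0, 20.0, 80.0, 150.0, 40.0]
nihss  = [2, 6, 15, 22, 9]
print(spearman(volume, nihss))  # perfectly monotone -> 1.0
```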

Radiomics-based differentiation of upper urinary tract urothelial and renal cell carcinoma in preoperative computed tomography datasets.

Marcon J, Weinhold P, Rzany M, Fabritius MP, Winkelmann M, Buchner A, Eismann L, Jokisch JF, Casuscelli J, Schulz GB, Knösel T, Ingrisch M, Ricke J, Stief CG, Rodler S, Kazmierczak PM

PubMed · May 30 2025
To investigate a non-invasive radiomics-based machine learning algorithm to differentiate upper urinary tract urothelial carcinoma (UTUC) from renal cell carcinoma (RCC) prior to surgical intervention. Preoperative computed tomography venous-phase datasets from patients who underwent procedures for histopathologically confirmed UTUC or RCC were retrospectively analyzed. Tumor segmentation was performed manually, and radiomic features were extracted according to the International Image Biomarker Standardization Initiative. Features were normalized using z-scores, and a predictive model was developed using the least absolute shrinkage and selection operator (LASSO). The dataset was split into a training cohort (70%) and a test cohort (30%). A total of 236 patients [30.5% female, median age 70.5 years (IQR: 59.5-77), median tumor size 5.8 cm (range: 4.1-8.2 cm)] were included. For differentiating UTUC from RCC, the model achieved a sensitivity of 88.4% and specificity of 81% (AUC: 0.93, radiomics score cutoff: 0.467) in the training cohort. In the validation cohort, the sensitivity was 80.6% and specificity 80% (AUC: 0.87, radiomics score cutoff: 0.601). Subgroup analysis of the validation cohort demonstrated robust performance, particularly in distinguishing clear cell RCC from high-grade UTUC (sensitivity: 84%, specificity: 73.1%, AUC: 0.84) and high-grade from low-grade UTUC (sensitivity: 57.7%, specificity: 88.9%, AUC: 0.68). Limitations include the need for independent validation in future randomized controlled trials (RCTs). Machine learning-based radiomics models can reliably differentiate between RCC and UTUC in preoperative CT imaging. With a suggested performance benefit compared to conventional imaging, this technology might be added to the current preoperative diagnostic workflow. Local ethics committee no. 20-179.
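The z-score normalization step described above rescales each radiomic feature to zero mean and unit variance before LASSO selection, so that features on different scales contribute comparably. A minimal sketch on a made-up feature matrix:

```python
import statistics

def zscore_columns(rows):
    """Normalize each feature column to zero mean and unit (sample) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

# Hypothetical radiomic feature matrix: rows = tumors, columns = features.
X = [[10.0, 200.0],
     [20.0, 100.0],
     [30.0, 300.0]]
Xz = zscore_columns(X)
print(Xz)
```

In practice the means and standard deviations are fit on the training cohort only and reused for the test cohort, to avoid information leakage.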

Deep learning based motion correction in ultrasound microvessel imaging approach improves thyroid nodule classification.

Saini M, Larson NB, Fatemi M, Alizad A

PubMed · May 30 2025
To address inter-frame motion artifacts in ultrasound quantitative high-definition microvasculature imaging (qHDMI), we introduced a novel deep learning-based motion correction technique. This approach enables the derivation of more accurate quantitative biomarkers from motion-corrected HDMI images, improving the classification of thyroid nodules. Inter-frame motion, often caused by carotid artery pulsation near the thyroid, can degrade image quality and compromise biomarker reliability, potentially leading to misdiagnosis. Our proposed technique compensates for these motion-induced artifacts, preserving the fine vascular structures critical for accurate biomarker extraction. In this study, we used the motion-corrected images obtained through this framework to derive quantitative biomarkers and evaluated their effectiveness in thyroid nodule classification. We divided the dataset into low- and high-motion cases based on inter-frame correlation values and performed thyroid nodule classification on the high-motion subset and on the full dataset. Analysis of the biomarker distributions obtained from the motion-corrected images demonstrates significantly clearer differences between benign and malignant nodule characteristics than the original motion-containing images. Specifically, the bifurcation angle values derived from qHDMI become more consistent with the expected trend after motion correction. The classification results showed that sensitivity remained unchanged for the low-motion group, while it improved by 9.2% for the high-motion group. These findings highlight that motion correction yields more accurate biomarkers, improving overall classification performance.
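The low-/high-motion split described above relies on inter-frame correlation: frames that correlate poorly with their neighbors indicate motion. A toy sketch using Pearson correlation over flattened pixel vectors, with entirely hypothetical intensity values and an assumed threshold:

```python
def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def mean_interframe_correlation(frames):
    """Average correlation of each frame with the next; low values suggest high motion."""
    corrs = [pearson(f1, f2) for f1, f2 in zip(frames, frames[1:])]
    return sum(corrs) / len(corrs)

# Toy "frames" as flattened pixel intensities; the last frame is shuffled, mimicking motion.
frames = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 3.1, 4.1],
    [4.0, 1.0, 2.0, 3.0],
]
score = mean_interframe_correlation(frames)
label = "high motion" if score < 0.8 else "low motion"  # 0.8 is an assumed cutoff
print(round(score, 3), label)
```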

Machine Learning Models of Voxel-Level [<sup>18</sup>F] Fluorodeoxyglucose Positron Emission Tomography Data Excel at Predicting Progressive Supranuclear Palsy Pathology.

Braun AS, Satoh R, Pham NTT, Singh-Reilly N, Ali F, Dickson DW, Lowe VJ, Whitwell JL, Josephs KA

PubMed · May 30 2025
To determine whether a machine learning model of voxel-level [<sup>18</sup>F]fluorodeoxyglucose positron emission tomography (PET) data could predict progressive supranuclear palsy (PSP) pathology and outperform currently available biomarkers. One hundred and thirty-seven autopsied patients with PSP (n = 42) and other neurodegenerative diseases (n = 95) who underwent antemortem [<sup>18</sup>F]fluorodeoxyglucose PET and 3.0 Tesla magnetic resonance imaging (MRI) scans were analyzed. A linear support vector machine was applied to differentiate the pathological groups, with sensitivity analyses performed to assess the influence of voxel size and region removal. A radial basis function model was also prepared as a secondary model using the most important voxels. The models were optimized on the main dataset (n = 104), and their performance was compared with the magnetic resonance parkinsonism index measured on MRI in an independent test dataset (n = 33). The model had the highest accuracy (0.91) and F-score (0.86) at a voxel size of 6 mm. In this optimized model, important voxels for differentiating the groups were observed in the thalamus, midbrain, and cerebellar dentate. Among the secondary models, the combination of thalamus and dentate had the highest accuracy (0.89) and F-score (0.81). The optimized secondary model showed the highest accuracy (0.91) and F-score (0.86) in the test dataset and outperformed the magnetic resonance parkinsonism index (0.81 and 0.70, respectively). The results suggest that glucose hypometabolism in the thalamus and cerebellar dentate has the highest potential for predicting PSP pathology. Our optimized machine learning model outperformed the best currently available biomarker for predicting PSP pathology. ANN NEUROL 2025.
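The accuracy and F-score figures reported above are both derived from the confusion matrix of a binary classifier. A stdlib-only sketch with hypothetical labels and predictions (not the study's data):

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for a binary task (1 = PSP pathology in this sketch)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, f1

# Hypothetical test-set labels and SVM predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(acc, round(f1, 3))  # accuracy 0.8, F1 0.75
```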

Artificial Intelligence for Assessment of Digital Mammography Positioning Reveals Persistent Challenges.

Margolies LR, Spear GG, Payne JI, Iles SE, Abdolell M

PubMed · May 30 2025
Mammographic breast cancer detection depends on high-quality positioning, which is traditionally assessed and monitored subjectively. This study used artificial intelligence (AI) to evaluate mammography positioning on digital screening mammograms to identify and quantify unmet mammography positioning quality (MPQ). Data were collected within an IRB-approved collaboration. In total, 126 367 digital mammography studies (553 339 images) were processed. Unmet MPQ criteria, including exaggeration, portion cutoff, posterior tissue missing, nipple not in profile, too high on image receptor, inadequate pectoralis length, sagging, and posterior nipple line (PNL) length difference, were evaluated using MPQ AI algorithms. The similarity of unmet MPQ occurrence and rank order was compared between the health systems. Altogether, 163 759 and 219 785 unmet MPQ criteria were identified at the two health systems, respectively. The rank order and the probability distribution of the unmet MPQ criteria were not statistically significantly different between health systems (P = .844 and P = .92, respectively). The 3 most common unmet MPQ criteria were short PNL length on the craniocaudal (CC) view, inadequate pectoralis muscle, and excessive exaggeration on the CC view. The percentages of unmet positioning criteria out of the total potential unmet positioning criteria at health system 1 and health system 2 were 8.4% (163 759/1 949 922) and 7.3% (219 785/3 030 129), respectively. Artificial intelligence identified a similar distribution of unmet MPQ criteria in the 2 health systems' daily work. Knowledge of commonly unmet MPQ criteria can facilitate the improvement of mammography quality through tailored education strategies.

Deep learning without borders: recent advances in ultrasound image classification for liver diseases diagnosis.

Yousefzamani M, Babapour Mofrad F

PubMed · May 30 2025
Liver diseases are among the top global health burdens. Noninvasive diagnostics that spare the patient discomfort have recently gained importance, and among imaging modalities ultrasound is the most widely used. Deep learning, in particular convolutional neural networks (CNNs), has transformed the classification of liver diseases by automating the analysis of complex images. This review summarizes progress in deep learning techniques for the classification of liver diseases using ultrasound imaging. It evaluates various models, from CNNs to hybrid versions such as CNN-Transformer architectures, for detecting fatty liver, fibrosis, and liver cancer, among other conditions. Challenges in generalizing data and models across different clinical environments are also discussed. Deep learning holds great promise for the automatic diagnosis of liver diseases, and most models have achieved high accuracy in clinical studies. Despite this promise, generalization challenges remain. Future hardware developments and access to high-quality clinical data should further improve the performance of these models and secure their role in the diagnosis of liver diseases.