Page 47 of 59590 results

Children Are Not Small Adults: Addressing Limited Generalizability of an Adult Deep Learning CT Organ Segmentation Model to the Pediatric Population.

Chatterjee D, Kanhere A, Doo FX, Zhao J, Chan A, Welsh A, Kulkarni P, Trang A, Parekh VS, Yi PH

PubMed · Jun 1, 2025
Deep learning (DL) tools developed on adult datasets may not generalize well to pediatric patients, posing potential safety risks. We evaluated the performance of TotalSegmentator, a state-of-the-art adult-trained CT organ segmentation model, on a subset of organs in a pediatric CT dataset and explored optimization strategies to improve pediatric segmentation performance. TotalSegmentator was retrospectively evaluated on abdominal CT scans from an external adult dataset (n = 300) and an external pediatric dataset (n = 359). Generalizability was quantified by comparing Dice scores between the adult and pediatric external datasets using Mann-Whitney U tests. Two DL optimization approaches were then evaluated: (1) a 3D nnU-Net model trained on only pediatric data, and (2) an adult nnU-Net model fine-tuned on the pediatric cases. TotalSegmentator had significantly lower overall mean Dice scores on pediatric vs. adult CT scans (0.73 vs. 0.81, P < .001), demonstrating limited generalizability to pediatric CT scans. Stratified by organ, mean pediatric Dice scores were lower for four organs (P < .001 for all): the right and left adrenal glands (right adrenal, 0.41 [0.39-0.43] vs. 0.69 [0.66-0.71]; left adrenal, 0.35 [0.32-0.37] vs. 0.68 [0.65-0.71]), the duodenum (0.47 [0.45-0.49] vs. 0.67 [0.64-0.69]), and the pancreas (0.73 [0.72-0.74] vs. 0.79 [0.77-0.81]). Both optimization approaches, developing a pediatric-specific model and fine-tuning an adult-trained model on pediatric images, significantly improved segmentation accuracy over TotalSegmentator for all organs, especially for smaller anatomical structures (e.g., > 0.2 higher mean Dice for the adrenal glands; P < .001).
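The core per-scan metric in this study is the Dice similarity coefficient between predicted and ground-truth organ masks. A hedged sketch (not the authors' code) of that measurement is below; the toy masks are illustrative, and cohort-level comparison would use a Mann-Whitney U test in practice.

```python
# Sketch, assuming binary segmentation masks as NumPy arrays; not the study's code.
import numpy as np

def dice_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient for two binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy 2-D masks standing in for organ segmentations.
pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
print(f"Dice = {dice_score(pred, truth):.3f}")
```

A per-organ, per-scan list of such scores for each cohort is what the reported Mann-Whitney U tests would compare.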

Deep Conformal Supervision: Leveraging Intermediate Features for Robust Uncertainty Quantification.

Vahdani AM, Faghani S

PubMed · Jun 1, 2025
Trustworthiness is crucial for artificial intelligence (AI) models in clinical settings, and a fundamental aspect of trustworthy AI is uncertainty quantification (UQ). Conformal prediction, a robust UQ framework, has been receiving increasing attention as a valuable tool for improving model trustworthiness. An area of active research is the calculation of non-conformity scores for conformal prediction. We propose deep conformal supervision (DCS), which leverages the intermediate outputs of deep supervision for non-conformity score calculation via weighted averaging, with each stage weighted by the inverse of its mean calibration error. We benchmarked our method on two publicly available medical image classification datasets: a pneumonia chest radiography dataset and a preprocessed version of the 2019 RSNA Intracranial Hemorrhage dataset. Our method achieved mean coverage errors of 16e-4 (CI: 1e-4, 41e-4) and 5e-4 (CI: 1e-4, 10e-4), compared to baseline mean coverage errors of 28e-4 (CI: 2e-4, 64e-4) and 21e-4 (CI: 8e-4, 3e-4) on the two datasets, respectively (p < 0.001 on both datasets). The baseline conformal prediction results already exhibit small coverage errors; however, our method shows a significant improvement in coverage error, particularly in scenarios involving smaller datasets or smaller acceptable error levels, which are crucial in developing UQ frameworks for healthcare AI applications.
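A minimal split-conformal sketch of the underlying machinery may help: the non-conformity score is 1 minus the softmax probability of the true class, and a calibration-set quantile defines the prediction set. In DCS these scores would be computed at each deeply-supervised stage and averaged with weights proportional to the inverse of that stage's mean calibration error; the single-stage version below uses made-up calibration data and is not the authors' code.

```python
# Single-stage split-conformal classification sketch (toy data, assumed setup).
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Quantile of calibration non-conformity scores, with finite-sample correction."""
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(scores, q))

def prediction_set(probs, qhat):
    """All classes whose non-conformity score falls within the threshold."""
    return np.where(1.0 - np.asarray(probs) <= qhat)[0].tolist()

# Toy calibration set: the model assigns the true class probability 0.9.
cal_probs = np.tile([0.9, 0.05, 0.05], (100, 1))
cal_labels = np.zeros(100, dtype=int)
qhat = conformal_threshold(cal_probs, cal_labels)
print(prediction_set([0.95, 0.03, 0.02], qhat))  # prints [0]
```

Coverage error, as reported above, is the gap between the target coverage (1 - alpha) and the empirical fraction of test cases whose true label lands in the prediction set.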

Using Machine Learning on MRI Radiomics to Diagnose Parotid Tumours Before Comparing Performance with Radiologists: A Pilot Study.

Ammari S, Quillent A, Elvira V, Bidault F, Garcia GCTE, Hartl DM, Balleyguier C, Lassau N, Chouzenoux É

PubMed · Jun 1, 2025
The parotid glands are the largest of the major salivary glands. They can harbour both benign and malignant tumours. Preoperative work-up relies on MR images and fine needle aspiration biopsy, but these diagnostic tools have low sensitivity and specificity, often leading to surgery for diagnostic purposes. The aims of this paper are (1) to develop a machine learning algorithm based on MR image characteristics to automatically classify parotid gland tumours and (2) to compare its results with the diagnoses of junior and senior radiologists in order to evaluate its utility in routine practice. While automatic algorithms for parotid tumour classification have been developed in the past, we believe that our study is one of the first to leverage four different MRI sequences and propose a comparison with clinicians. In this study, we leverage data from a cohort of 134 patients treated for benign or malignant parotid tumours. Using radiomics features extracted from MR images of the gland, we train a random forest and a logistic regression to predict the corresponding histopathological subtypes. On the test set, the best results are obtained with the random forest: an accuracy of 0.720, a specificity of 0.860, and a sensitivity of 0.720 over all histopathological subtypes, with an average AUC of 0.838. When considering the discrimination between benign and malignant tumours, the algorithm achieves an accuracy of 0.760 and an AUC of 0.769, both on the test set. Moreover, the clinical experiment shows that our model helps improve the diagnostic abilities of junior radiologists, whose sensitivity and accuracy rose by 6% when using our proposed method. This algorithm may also be useful for training physicians. Radiomics with a machine learning algorithm may help improve discrimination between benign and malignant parotid tumours, decreasing the need for diagnostic surgery. Further studies are warranted to validate our algorithm for routine use.
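To make the pipeline concrete, here is a from-scratch sketch of the simpler of the two classifiers used above (logistic regression) fit on stand-in "radiomics" features. Everything here is synthetic and assumed: in practice one would use scikit-learn models and features from an extractor such as PyRadiomics.

```python
# Hedged sketch: gradient-descent logistic regression on toy radiomics-like data.
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Plain gradient-descent logistic regression; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * (p - y).mean()
    return w, b

rng = np.random.default_rng(0)
n = 134  # cohort size from the study; everything else is invented
X = rng.normal(size=(n, 4))                     # stand-in radiomics features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(float)

w, b = fit_logistic(X, y)
pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
print(f"training accuracy on toy data: {(pred == y).mean():.3f}")
```

The random forest the authors favour follows the same fit/predict pattern, trading the linear decision boundary for an ensemble of trees.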

Prediction of Malignancy and Pathological Types of Solid Lung Nodules on CT Scans Using a Volumetric Swin Transformer.

Chen H, Wen Y, Wu W, Zhang Y, Pan X, Guan Y, Qin D

PubMed · Jun 1, 2025
Lung adenocarcinoma and squamous cell carcinoma are the two most common pathological lung cancer subtypes. Accurate diagnosis and pathological subtyping are crucial for lung cancer treatment. Solitary solid lung nodules with lobulation and spiculation signs are often indicative of lung cancer; however, in some cases, postoperative pathology finds benign solid lung nodules. It is critical to accurately identify solid lung nodules with lobulation and spiculation signs before surgery; however, traditional diagnostic imaging is prone to misdiagnosis, and studies on artificial intelligence-assisted diagnosis are few. Therefore, we introduce a volumetric Swin Transformer-based method. It is a multi-scale, multi-task, and highly interpretable model for distinguishing between benign solid lung nodules with lobulation and spiculation signs, lung adenocarcinomas, and lung squamous cell carcinomas. The technique's effectiveness was improved by using 3-dimensional (3D) computed tomography (CT) images instead of conventional 2-dimensional (2D) images to capture as much information as possible. The model was trained using 352 of the 441 CT image sequences and validated using the rest. The experimental results showed that our model could accurately differentiate between benign lung nodules with lobulation and spiculation signs, lung adenocarcinoma, and squamous cell carcinoma. On the test set, our model achieved an accuracy of 0.9888, a precision of 0.9892, a recall of 0.9888, and an F1-score of 0.9888, along with a class activation mapping (CAM) visualization of the 3D model. Consequently, our method could be used as a preoperative tool to assist in accurately diagnosing solitary solid lung nodules with lobulation and spiculation signs and provide a theoretical basis for developing appropriate clinical diagnosis and treatment plans for patients.
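The near-identical accuracy, recall, and F1 values reported above are typical of support-weighted multi-class metrics. A small sketch of that computation from a confusion matrix, on toy predictions rather than the study's data, is below.

```python
# Support-weighted multi-class metrics from a confusion matrix (toy example).
import numpy as np

def weighted_metrics(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                      # rows: true class, cols: predicted
    support = cm.sum(axis=1)
    col = cm.sum(axis=0)
    tp = np.diag(cm).astype(float)
    precision = np.divide(tp, col, out=np.zeros_like(tp), where=col > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    f1 = np.divide(2 * precision * recall, precision + recall,
                   out=np.zeros_like(tp), where=(precision + recall) > 0)
    w = support / support.sum()            # weight each class by its support
    acc = tp.sum() / cm.sum()
    return acc, (w * precision).sum(), (w * recall).sum(), (w * f1).sum()

# Toy 3-class predictions (e.g., benign / adenocarcinoma / squamous).
print(weighted_metrics([0, 0, 1, 1, 2, 2, 2], [0, 1, 1, 1, 2, 2, 0], 3))
```

Note that weighted recall always equals accuracy, which is consistent with the matching 0.9888 values in the abstract.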

Foundational Segmentation Models and Clinical Data Mining Enable Accurate Computer Vision for Lung Cancer.

Swinburne NC, Jackson CB, Pagano AM, Stember JN, Schefflein J, Marinelli B, Panyam PK, Autz A, Chopra MS, Holodny AI, Ginsberg MS

PubMed · Jun 1, 2025
This study aims to assess the effectiveness of integrating Segment Anything Model (SAM) and its variant MedSAM into the automated mining, object detection, and segmentation (MODS) methodology for developing robust lung cancer detection and segmentation models without post hoc labeling of training images. In a retrospective analysis, 10,000 chest computed tomography scans from patients with lung cancer were mined. Line measurement annotations were converted to bounding boxes, excluding boxes < 1 cm or > 7 cm. The You Only Look Once object detection architecture was used for teacher-student learning to label unannotated lesions on the training images. Subsequently, a final tumor detection model was trained and employed with SAM and MedSAM for tumor segmentation. Model performance was assessed on a manually annotated test dataset, with additional evaluations conducted on an external lung cancer dataset before and after detection model fine-tuning. Bootstrap resampling was used to calculate 95% confidence intervals. Data mining yielded 10,789 line annotations, resulting in 5403 training boxes. The baseline detection model achieved an internal F1 score of 0.847, improving to 0.860 after self-labeling. Tumor segmentation using the final detection model attained internal Dice similarity coefficients (DSCs) of 0.842 (SAM) and 0.822 (MedSAM). After fine-tuning, external validation showed an F1 of 0.832 and DSCs of 0.802 (SAM) and 0.804 (MedSAM). Integrating foundational segmentation models into the MODS framework results in high-performing lung cancer detection and segmentation models using only mined clinical data. Both SAM and MedSAM hold promise as foundational segmentation models for radiology images.
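One mining step described above, converting line measurement annotations to bounding boxes and excluding boxes under 1 cm or over 7 cm, can be sketched as follows. The function name, 2-D setting, and corner-style box convention are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch of line-annotation mining with the 1-7 cm size filter.
import math

def line_to_box(x1, y1, x2, y2, min_mm=10.0, max_mm=70.0):
    """Return (x_min, y_min, x_max, y_max) around the measurement line,
    or None if the measured length falls outside the inclusion range."""
    length = math.hypot(x2 - x1, y2 - y1)
    if not (min_mm <= length <= max_mm):
        return None  # outside the 1-7 cm range -> excluded from training boxes
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))

print(line_to_box(0, 0, 30, 40))  # 50 mm measurement: kept
print(line_to_box(0, 0, 3, 4))    # 5 mm measurement: excluded
```

Boxes surviving this filter would seed the YOLO teacher-student loop, with SAM/MedSAM later prompted by the detected boxes to produce segmentations.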

Deep learning-enhanced zero echo time MRI for glenohumeral assessment in shoulder instability: a comparative study with CT.

Carretero-Gómez L, Fung M, Wiesinger F, Carl M, McKinnon G, de Arcos J, Mandava S, Arauz S, Sánchez-Lacalle E, Nagrani S, López-Alcorocho JM, Rodríguez-Íñigo E, Malpica N, Padrón M

PubMed · Jun 1, 2025
To evaluate image quality and lesion conspicuity of zero echo time (ZTE) MRI reconstructed with deep learning (DL)-based algorithm versus conventional reconstruction and to assess DL ZTE performance against CT for bone loss measurements in shoulder instability. Forty-four patients (9 females; 33.5 ± 15.65 years) with symptomatic anterior glenohumeral instability and no previous shoulder surgery underwent ZTE MRI and CT on the same day. ZTE images were reconstructed with conventional and DL methods and post-processed for CT-like contrast. Two musculoskeletal radiologists, blinded to the reconstruction method, independently evaluated 20 randomized MR ZTE datasets with and without DL-enhancement for perceived signal-to-noise ratio, resolution, and lesion conspicuity at humerus and glenoid using a 4-point Likert scale. Inter-reader reliability was assessed using weighted Cohen's kappa (K). An ordinal logistic regression model analyzed Likert scores, with the reconstruction method (DL-enhanced vs. conventional) as the predictor. Glenoid track (GT) and Hill-Sachs interval (HSI) measurements were performed by another radiologist on both DL ZTE and CT datasets. Intermodal agreement was assessed through intraclass correlation coefficients (ICCs) and Bland-Altman analysis. DL ZTE MR bone images scored higher than conventional ZTE across all items, with significantly improved perceived resolution (odds ratio (OR) = 7.67, p = 0.01) and glenoid lesion conspicuity (OR = 25.12, p = 0.01), with substantial inter-rater agreement (K = 0.61 (0.38-0.83) to 0.77 (0.58-0.95)). Inter-modality assessment showed almost perfect agreement between DL ZTE MR and CT for all bone measurements (overall ICC = 0.99 (0.97-0.99)), with mean differences of 0.08 (- 0.80 to 0.96) mm for GT and - 0.07 (- 1.24 to 1.10) mm for HSI. 
DL-based reconstruction enhances ZTE MRI quality for glenohumeral assessment, offering osseous evaluation and quantification equivalent to gold-standard CT, potentially simplifying the preoperative workflow and reducing CT radiation exposure.
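The Bland-Altman analysis behind the reported mean differences and limits can be sketched in a few lines of NumPy; the paired measurements below are hypothetical glenoid-track lengths, not the study's data.

```python
# Bland-Altman agreement sketch: bias and 95% limits of agreement (toy data).
import numpy as np

def bland_altman(a, b):
    """Mean difference (bias) and 95% limits of agreement for paired data."""
    diff = np.asarray(a, float) - np.asarray(b, float)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)  # sample SD of the differences
    return bias, bias - half_width, bias + half_width

zte = [24.1, 25.3, 22.8, 26.0, 23.5]  # hypothetical DL ZTE measurements (mm)
ct = [24.0, 25.1, 23.0, 25.8, 23.6]   # hypothetical CT measurements (mm)
bias, lo, hi = bland_altman(zte, ct)
print(f"bias = {bias:.2f} mm, LoA = ({lo:.2f}, {hi:.2f}) mm")
```

A bias near zero with narrow limits, as in the abstract's 0.08 mm for the glenoid track, is what supports calling the two modalities interchangeable for these measurements.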

The role of deep learning in diagnostic imaging of spondyloarthropathies: a systematic review.

Omar M, Watad A, McGonagle D, Soffer S, Glicksberg BS, Nadkarni GN, Klang E

PubMed · Jun 1, 2025
Diagnostic imaging is an integral part of identifying spondyloarthropathies (SpA), yet the interpretation of these images can be challenging. This review evaluated the use of deep learning models to enhance the diagnostic accuracy of SpA imaging. Following PRISMA guidelines, we systematically searched major databases up to February 2024, focusing on studies that applied deep learning to SpA imaging. Performance metrics, model types, and diagnostic tasks were extracted and analyzed. Study quality was assessed using QUADAS-2. We analyzed 21 studies employing deep learning in SpA imaging diagnosis across MRI, CT, and X-ray modalities. These models, particularly advanced CNNs and U-Nets, demonstrated high accuracy in diagnosing SpA, differentiating arthritis forms, and assessing disease progression. Performance metrics frequently surpassed traditional methods, with some models achieving AUCs up to 0.98 and matching expert radiologist performance. This systematic review underscores the effectiveness of deep learning in SpA imaging diagnostics across MRI, CT, and X-ray modalities. However, the small sample sizes in some studies highlight the need for more extensive datasets and further prospective and external validation to enhance the generalizability of these AI models.
Question: How can deep learning models improve diagnostic accuracy in imaging for spondyloarthropathies (SpA), addressing challenges in early detection and differentiation from other forms of arthritis?
Findings: Deep learning models, especially CNNs and U-Nets, showed high accuracy in SpA imaging across MRI, CT, and X-ray, often matching or surpassing expert radiologists.
Clinical relevance: Deep learning models can enhance diagnostic precision in SpA imaging, potentially reducing diagnostic delays and improving treatment decisions, but further validation on larger datasets is required for clinical integration.

Age-dependent changes in CT vertebral attenuation values in opportunistic screening for osteoporosis: a nationwide multi-center study.

Kim Y, Kim HY, Lee S, Hong S, Lee JW

PubMed · Jun 1, 2025
To examine how vertebral attenuation changes with aging, and to establish age-adjusted CT attenuation value cutoffs for diagnosing osteoporosis. This multi-center retrospective study included 11,246 patients (mean age ± standard deviation, 50 ± 13 years; 7139 men) who underwent CT and dual-energy X-ray absorptiometry (DXA) in six health-screening centers between 2022 and 2023. Using deep-learning-based software, attenuation values of L1 vertebral bodies were measured. Segmented linear regression in women and simple linear regression in men were used to assess how attenuation values change with aging. A multivariable linear regression analysis was performed to determine whether age is associated with CT attenuation values independently of the DXA T-score. Age-adjusted cutoffs targeting either 90% sensitivity or 90% specificity were derived using quantile regression. Performance of both age-adjusted and age-unadjusted cutoffs was measured, where the target sensitivity or specificity was considered achieved if a 95% confidence interval encompassed 90%. While attenuation values declined consistently with age in men, they declined abruptly in women aged > 42 years. Such decline occurred independently of the DXA T-score (p < 0.001). Age adjustment seemed critical for age ≥ 65 years, where the age-adjusted cutoffs achieved the target (sensitivity of 91.5% (86.3-95.2%) when targeting 90% sensitivity and specificity of 90.0% (88.3-91.6%) when targeting 90% specificity), but age-unadjusted cutoffs did not (95.5% (91.2-98.0%) and 73.8% (71.4-76.1%), respectively). Age-adjusted cutoffs provided a more reliable diagnosis of osteoporosis than age-unadjusted cutoffs since vertebral attenuation values decrease with age, regardless of DXA T-scores.
Question: How does vertebral CT attenuation change with age?
Findings: Independent of the dual-energy X-ray absorptiometry T-score, vertebral attenuation values on CT declined at a constant rate in men and abruptly in women over 42 years of age.
Clinical relevance: Age adjustments are needed in opportunistic osteoporosis screening, especially among the elderly.
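The segmented (piecewise) linear fit the authors describe for women can be sketched with a fixed breakpoint at 42 years: build the design matrix [1, age, max(age - 42, 0)] and solve by least squares. The data below are synthetic attenuation values, not from the study.

```python
# Segmented linear regression sketch with an assumed known breakpoint at 42 y.
import numpy as np

def fit_segmented(age, hu, breakpoint=42.0):
    """Least-squares fit of a two-segment line with a hinge at the breakpoint.
    Returns (intercept, slope before breakpoint, extra slope after)."""
    age = np.asarray(age, float)
    X = np.column_stack([np.ones_like(age), age,
                         np.maximum(age - breakpoint, 0.0)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(hu, float), rcond=None)
    return coef

ages = np.arange(20, 80)
hu = 220 - 0.5 * ages - 2.5 * np.maximum(ages - 42, 0)  # steeper decline after 42
print(fit_segmented(ages, hu))
```

In practice the breakpoint itself is also estimated (e.g., by profiling candidate breakpoints), and the cutoffs in the abstract come from quantile rather than least-squares regression; this sketch only shows the hinge-basis idea.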

Radiomics-driven spectral profiling of six kidney stone types with monoenergetic CT reconstructions in photon-counting CT.

Hertel A, Froelich MF, Overhoff D, Nestler T, Faby S, Jürgens M, Schmidt B, Vellala A, Hesse A, Nörenberg D, Stoll R, Schmelz H, Schoenberg SO, Waldeck S

PubMed · Jun 1, 2025
Urolithiasis, a common and painful urological condition, is influenced by factors such as lifestyle, genetics, and medication. Differentiating between different types of kidney stones is crucial for personalized therapy. The purpose of this study is to investigate the use of photon-counting computed tomography (PCCT) in combination with radiomics and machine learning to develop a method for automated and detailed characterization of kidney stones. This approach aims to enhance the accuracy and detail of stone classification beyond what is achievable with conventional computed tomography (CT) and dual-energy CT (DECT). In this ex vivo study, 135 kidney stones were first classified using infrared spectroscopy. All stones were then scanned in a PCCT embedded in a phantom. Various monoenergetic reconstructions were generated, and radiomics features were extracted. Statistical analysis was performed using Random Forest (RF) classifiers for both individual reconstructions and a combined model. The combined model, using radiomics features from all monoenergetic reconstructions, significantly outperformed individual reconstructions and SPP parameters, with an AUC of 0.95 and test accuracy of 0.81 for differentiating all six stone types. Feature importance analysis identified key parameters, including NGTDM_Strength and wavelet-LLH_firstorder_Variance. This ex vivo study demonstrates that radiomics-driven PCCT analysis can improve differentiation between kidney stone subtypes. The combined model outperformed individual monoenergetic levels, highlighting the potential of spectral profiling in PCCT to optimize treatment through image-based strategies.
Question: How can photon-counting computed tomography (PCCT) combined with radiomics improve the differentiation of kidney stone types beyond conventional CT and dual-energy CT, enhancing personalized therapy?
Findings: Our ex vivo study demonstrates that a combined spectral-driven radiomics model achieved 95% AUC and 81% test accuracy in differentiating six kidney stone types.
Clinical relevance: Implementing PCCT-based spectral-driven radiomics allows for precise non-invasive differentiation of kidney stone types, leading to improved diagnostic accuracy and more personalized, effective treatment strategies, potentially reducing the need for invasive procedures and recurrence.

Comparing Artificial Intelligence and Traditional Regression Models in Lung Cancer Risk Prediction Using a Systematic Review and Meta-Analysis.

Leonard S, Patel MA, Zhou Z, Le H, Mondal P, Adams SJ

PubMed · Jun 1, 2025
Accurately identifying individuals who are at high risk of lung cancer is critical to optimize lung cancer screening with low-dose CT (LDCT). We sought to compare the performance of traditional regression models and artificial intelligence (AI)-based models in predicting future lung cancer risk. A systematic review and meta-analysis were conducted with reporting according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We searched MEDLINE, Embase, Scopus, and the Cumulative Index to Nursing and Allied Health Literature databases for studies reporting the performance of AI or traditional regression models for predicting lung cancer risk. Two researchers screened articles, and a third researcher resolved conflicts. Model characteristics and predictive performance metrics were extracted. The quality of studies was assessed using the Prediction model Risk of Bias Assessment Tool (PROBAST). A meta-analysis assessed the discrimination performance of models based on the area under the receiver operating characteristic curve (AUC). One hundred forty studies met the inclusion criteria, comprising 185 traditional and 64 AI-based models. Of these, 16 AI models and 65 traditional models had been externally validated. The pooled AUC of external validations of AI models was 0.82 (95% confidence interval [CI], 0.80-0.85), and the pooled AUC for traditional regression models was 0.73 (95% CI, 0.72-0.74). In a subgroup analysis, AI models that included LDCT had a pooled AUC of 0.85 (95% CI, 0.82-0.88). Overall risk of bias was high for both AI and traditional models. AI-based models, particularly those using imaging data, show promise for improving lung cancer risk prediction over traditional regression models. Future research should focus on prospective validation of AI models and direct comparisons with traditional methods in diverse populations.
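The pooled AUCs above come from meta-analytic aggregation of study-level estimates. A hedged sketch of the simplest version, fixed-effect inverse-variance pooling, is below; the study-level AUCs and standard errors are invented, and the actual review may have used a random-effects model.

```python
# Fixed-effect inverse-variance pooling of study-level estimates (toy data).
import math

def pooled_estimate(estimates, std_errors):
    """Inverse-variance weighted mean with a 95% confidence interval."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

aucs = [0.80, 0.84, 0.82, 0.85]        # hypothetical study-level AUCs
ses = [0.02, 0.03, 0.025, 0.02]        # hypothetical standard errors
est, (lo, hi) = pooled_estimate(aucs, ses)
print(f"pooled AUC = {est:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Precise studies (small standard errors) dominate the pooled value, which is why external validations on large cohorts carry most of the weight in results like the 0.82 vs. 0.73 comparison above.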