Latest Papers on Radiology AI. Tags: Classification

A Large Language Model to Detect Negated Expressions in Radiology Reports.

Su Y, Babore YB, Kahn CE

•papers•Jun 1 2025

Natural language processing (NLP) is crucial to extract information accurately from unstructured text to provide insights for clinical decision-making, quality improvement, and medical research. This study compared the performance of a rule-based NLP system and a medical-domain transformer-based model to detect negated concepts in radiology reports. Using a corpus of 984 de-identified radiology reports from a large U.S.-based academic health system (1000 consecutive reports, excluding 16 duplicates), the investigators compared the rule-based medspaCy system and the Clinical Assertion and Negation Classification Bidirectional Encoder Representations from Transformers (CAN-BERT) system to detect negated expressions of terms from RadLex, the Unified Medical Language System Metathesaurus, and the Radiology Gamuts Ontology. Power analysis determined a sample size of 382 terms to achieve α = 0.05 and β = 0.8 for McNemar's test; based on an estimate of 15% negated terms, 2800 randomly selected terms were annotated manually as negated or not negated. Precision, recall, and F1 of the two models were compared using McNemar's test. Of the 2800 terms, 387 (13.8%) were negated. For negation detection, medspaCy attained a recall of 0.795, precision of 0.356, and F1 of 0.492. CAN-BERT achieved a recall of 0.785, precision of 0.768, and F1 of 0.777. Although recall was not significantly different, CAN-BERT had significantly better precision (χ2 = 304.64; p < 0.001). The transformer-based CAN-BERT model detected negated terms in radiology reports with high precision and recall; its precision significantly exceeded that of the rule-based medspaCy system. Use of this system will improve data extraction from textual reports to support information retrieval, AI model training, and discovery of causal relationships.

Mixed Modality Classification Retrospective Clinical In Silico Academic Lab GenAI

MR Image Fusion-Based Parotid Gland Tumor Detection.

Sunnetci KM, Kaba E, Celiker FB, Alkan A

•papers•Jun 1 2025

The differentiation of benign and malignant parotid gland tumors is of major significance as it directly affects the treatment process. In addition, it is also a vital task in terms of early and accurate diagnosis of parotid gland tumors and the determination of treatment planning accordingly. As in other diseases, the differentiation of tumor types involves several challenging, time-consuming, and laborious processes. In the study, Magnetic Resonance (MR) images of 114 patients with parotid gland tumors are used for training and testing purposes by Image Fusion (IF). After the Apparent Diffusion Coefficient (ADC), Contrast-enhanced T1-w (T1C-w), and T2-w sequences are cropped, IF (ADC, T1C-w), IF (ADC, T2-w), IF (T1C-w, T2-w), and IF (ADC, T1C-w, T2-w) datasets are obtained for different combinations of these sequences using a two-dimensional Discrete Wavelet Transform (DWT)-based fusion technique. For each of these four datasets, ResNet18, GoogLeNet, and DenseNet-201 architectures are trained separately, and thus, 12 models are obtained in the study. A Graphical User Interface (GUI) application that contains the most successful of these trained architectures for each data is also designed to support the users. The designed GUI application not only allows the fusing of different sequence images but also predicts whether the label of the fused image is benign or malignant. The results show that the DenseNet-201 models for IF (ADC, T1C-w), IF (ADC, T2-w), and IF (ADC, T1C-w, T2-w) are better than the others, with accuracies of 95.45%, 95.96%, and 92.93%, respectively. It is also noted in the study that the most successful model for IF (T1C-w, T2-w) is ResNet18, and its accuracy is equal to 94.95%.

MRI Classification Retrospective Clinical In Silico Academic Lab

Deep Learning Approaches for Brain Tumor Detection and Classification Using MRI Images (2020 to 2024): A Systematic Review.

Bouhafra S, El Bahi H

•papers•Jun 1 2025

Brain tumor is a type of disease caused by uncontrolled cell proliferation in the brain leading to serious health issues such as memory loss and motor impairment. Therefore, early diagnosis of brain tumors plays a crucial role to extend the survival of patients. However, given the busy nature of the work of radiologists and aiming to reduce the likelihood of false diagnoses, advancing technologies including computer-aided diagnosis and artificial intelligence have shown an important role in assisting radiologists. In recent years, a number of deep learning-based methods have been applied for brain tumor detection and classification using MRI images and achieved promising results. The main objective of this paper is to present a detailed review of the previous researches in this field. In addition, This work summarizes the existing limitations and significant highlights. The study systematically reviews 60 articles researches published between 2020 and January 2024, extensively covering methods such as transfer learning, autoencoders, transformers, and attention mechanisms. The key findings formulated in this paper provide an analytic comparison and future directions. The review aims to provide a comprehensive understanding of automatic techniques that may be useful for professionals and academic communities working on brain tumor classification and detection.

MRI Classification Neurological Review Post Market Academic Lab Benchmark SOTA

Automated Neural Architecture Search for Cardiac Amyloidosis Classification from [18F]-Florbetaben PET Images.

Bargagna F, Zigrino D, De Santi LA, Genovesi D, Scipioni M, Favilli B, Vergaro G, Emdin M, Giorgetti A, Positano V, Santarelli MF

•papers•Jun 1 2025

Medical image classification using convolutional neural networks (CNNs) is promising but often requires extensive manual tuning for optimal model definition. Neural architecture search (NAS) automates this process, reducing human intervention significantly. This study applies NAS to [18F]-Florbetaben PET cardiac images for classifying cardiac amyloidosis (CA) sub-types (amyloid light chain (AL) and transthyretin amyloid (ATTR)) and controls. Following data preprocessing and augmentation, an evolutionary cell-based NAS approach with a fixed network macro-structure is employed, automatically deriving cells' micro-structure. The algorithm is executed five times, evaluating 100 mutating architectures per run on an augmented dataset of 4048 images (originally 597), totaling 5000 architectures evaluated. The best network (NAS-Net) achieves 76.95% overall accuracy. K-fold analysis yields mean ± SD percentages of sensitivity, specificity, and accuracy on the test dataset: AL subjects (98.7 ± 2.9, 99.3 ± 1.1, 99.7 ± 0.7), ATTR-CA subjects (93.3 ± 7.8, 78.0 ± 2.9, 70.9 ± 3.7), and controls (35.8 ± 14.6, 77.1 ± 2.0, 96.7 ± 4.4). NAS-derived network performance rivals manually determined networks in the literature while using fewer parameters, validating its automatic approach's efficacy.

PET Classification Cardiac Methodology In Silico Academic Lab Benchmark SOTA

Leveraging Ensemble Models and Follow-up Data for Accurate Prediction of mRS Scores from Radiomic Features of DSC-PWI Images.

Yassin MM, Zaman A, Lu J, Yang H, Cao A, Hassan H, Han T, Miao X, Shi Y, Guo Y, Luo Y, Kang Y

•papers•Jun 1 2025

Predicting long-term clinical outcomes based on the early DSC PWI MRI scan is valuable for prognostication, resource management, clinical trials, and patient expectations. Current methods require subjective decisions about which imaging features to assess and may require time-consuming postprocessing. This study's goal was to predict multilabel 90-day modified Rankin Scale (mRS) score in acute ischemic stroke patients by combining ensemble models and different configurations of radiomic features generated from Dynamic susceptibility contrast perfusion-weighted imaging. In Follow-up studies, a total of 70 acute ischemic stroke (AIS) patients underwent magnetic resonance imaging within 24 hours poststroke and had a follow-up scan. In the single study, 150 DSC PWI Image scans for AIS patients. The DRF are extracted from DSC-PWI Scans. Then Lasso algorithm is applied for feature selection, then new features are generated from initial and follow-up scans. Then we applied different ensemble models to classify between three classes normal outcome (0, 1 mRS score), moderate outcome (2,3,4 mRS score), and severe outcome (5,6 mRS score). ANOVA and post-hoc Tukey HSD tests confirmed significant differences in model style performance across various studies and classification techniques. Stacking models consistently on average outperformed others, achieving an Accuracy of 0.68 ± 0.15, Precision of 0.68 ± 0.17, Recall of 0.65 ± 0.14, and F1 score of 0.63 ± 0.15 in the follow-up time study. Techniques like Bo_Smote showed significantly higher recall and F1 scores, highlighting their robustness and effectiveness in handling imbalanced data. Ensemble models, particularly Bagging and Stacking, demonstrated superior performance, achieving nearly 0.93 in Accuracy, 0.95 in Precision, 0.94 in Recall, and 0.94 in F1 metrics in follow-up conditions, significantly outperforming single models. Ensemble models based on radiomics generated from combining Initial and follow-up scans can be used to predict multilabel 90-day stroke outcomes with reduced subjectivity and user burden.

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab

A Robust [18F]-PSMA-1007 Radiomics Ensemble Model for Prostate Cancer Risk Stratification.

Pasini G, Stefano A, Mantarro C, Richiusa S, Comelli A, Russo GI, Sabini MG, Cosentino S, Ippolito M, Russo G

•papers•Jun 1 2025

The aim of this study is to investigate the role of [18F]-PSMA-1007 PET in differentiating high- and low-risk prostate cancer (PCa) through a robust radiomics ensemble model. This retrospective study included 143 PCa patients who underwent [18F]-PSMA-1007 PET/CT imaging. PCa areas were manually contoured on PET images and 1781 image biomarker standardization initiative (IBSI)-compliant radiomics features were extracted. A 30 times iterated preliminary analysis pipeline, comprising of the least absolute shrinkage and selection operator (LASSO) for feature selection and fivefold cross-validation for model optimization, was adopted to identify the most robust features to dataset variations, select candidate models for ensemble modelling, and optimize hyperparameters. Thirteen subsets of selected features, 11 generated from the preliminary analysis plus two additional subsets, the first based on the combination of robust and fine-tuning features, and the second only on fine-tuning features were used to train the model ensemble. Accuracy, area under curve (AUC), sensitivity, specificity, precision, and f-score values were calculated to provide models' performance. Friedman test, followed by post hoc tests corrected with Dunn-Sidak correction for multiple comparisons, was used to verify if statistically significant differences were found in the different ensemble models over the 30 iterations. The model ensemble trained with the combination of robust and fine-tuning features obtained the highest average accuracy (79.52%), AUC (85.75%), specificity (84.29%), precision (82.85%), and f-score (78.26%). Statistically significant differences (p < 0.05) were found for some performance metrics. These findings support the role of [18F]-PSMA-1007 PET radiomics in improving risk stratification for PCa, by reducing dependence on biopsies.

PET Classification Abdominal Retrospective Clinical In Silico Academic Lab

Deep Conformal Supervision: Leveraging Intermediate Features for Robust Uncertainty Quantification.

Vahdani AM, Faghani S

•papers•Jun 1 2025

Trustworthiness is crucial for artificial intelligence (AI) models in clinical settings, and a fundamental aspect of trustworthy AI is uncertainty quantification (UQ). Conformal prediction as a robust uncertainty quantification (UQ) framework has been receiving increasing attention as a valuable tool in improving model trustworthiness. An area of active research is the method of non-conformity score calculation for conformal prediction. We propose deep conformal supervision (DCS), which leverages the intermediate outputs of deep supervision for non-conformity score calculation, via weighted averaging based on the inverse of mean calibration error for each stage. We benchmarked our method on two publicly available datasets focused on medical image classification: a pneumonia chest radiography dataset and a preprocessed version of the 2019 RSNA Intracranial Hemorrhage dataset. Our method achieved mean coverage errors of 16e-4 (CI: 1e-4, 41e-4) and 5e-4 (CI: 1e-4, 10e-4) compared to baseline mean coverage errors of 28e-4 (CI: 2e-4, 64e-4) and 21e-4 (CI: 8e-4, 3e-4) on the two datasets, respectively (p < 0.001 on both datasets). Based on our findings, the baseline results of conformal prediction already exhibit small coverage errors. However, our method shows a significant improvement on coverage error, particularly noticeable in scenarios involving smaller datasets or when considering smaller acceptable error levels, which are crucial in developing UQ frameworks for healthcare AI applications.

Mixed Modality Classification Methodology In Silico Academic Lab Benchmark SOTA

Using Machine Learning on MRI Radiomics to Diagnose Parotid Tumours Before Comparing Performance with Radiologists: A Pilot Study.

Ammari S, Quillent A, Elvira V, Bidault F, Garcia GCTE, Hartl DM, Balleyguier C, Lassau N, Chouzenoux É

•papers•Jun 1 2025

The parotid glands are the largest of the major salivary glands. They can harbour both benign and malignant tumours. Preoperative work-up relies on MR images and fine needle aspiration biopsy, but these diagnostic tools have low sensitivity and specificity, often leading to surgery for diagnostic purposes. The aim of this paper is (1) to develop a machine learning algorithm based on MR images characteristics to automatically classify parotid gland tumours and (2) compare its results with the diagnoses of junior and senior radiologists in order to evaluate its utility in routine practice. While automatic algorithms applied to parotid tumours classification have been developed in the past, we believe that our study is one of the first to leverage four different MRI sequences and propose a comparison with clinicians. In this study, we leverage data coming from a cohort of 134 patients treated for benign or malignant parotid tumours. Using radiomics extracted from the MR images of the gland, we train a random forest and a logistic regression to predict the corresponding histopathological subtypes. On the test set, the best results are given by the random forest: we obtain a 0.720 accuracy, a 0.860 specificity, and a 0.720 sensitivity over all histopathological subtypes, with an average AUC of 0.838. When considering the discrimination between benign and malignant tumours, the algorithm results in a 0.760 accuracy and a 0.769 AUC, both on test set. Moreover, the clinical experiment shows that our model helps to improve diagnostic abilities of junior radiologists as their sensitivity and accuracy raised by 6 % when using our proposed method. This algorithm may be useful for training of physicians. Radiomics with a machine learning algorithm may help improve discrimination between benign and malignant parotid tumours, decreasing the need for diagnostic surgery. Further studies are warranted to validate our algorithm for routine use.

MRI Classification Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Prediction of Malignancy and Pathological Types of Solid Lung Nodules on CT Scans Using a Volumetric SWIN Transformer.

Chen H, Wen Y, Wu W, Zhang Y, Pan X, Guan Y, Qin D

•papers•Jun 1 2025

Lung adenocarcinoma and squamous cell carcinoma are the two most common pathological lung cancer subtypes. Accurate diagnosis and pathological subtyping are crucial for lung cancer treatment. Solitary solid lung nodules with lobulation and spiculation signs are often indicative of lung cancer; however, in some cases, postoperative pathology finds benign solid lung nodules. It is critical to accurately identify solid lung nodules with lobulation and spiculation signs before surgery; however, traditional diagnostic imaging is prone to misdiagnosis, and studies on artificial intelligence-assisted diagnosis are few. Therefore, we introduce a volumetric SWIN Transformer-based method. It is a multi-scale, multi-task, and highly interpretable model for distinguishing between benign solid lung nodules with lobulation and spiculation signs, lung adenocarcinomas, and lung squamous cell carcinoma. The technique's effectiveness was improved by using 3-dimensional (3D) computed tomography (CT) images instead of conventional 2-dimensional (2D) images to combine as much information as possible. The model was trained using 352 of the 441 CT image sequences and validated using the rest. The experimental results showed that our model could accurately differentiate between benign lung nodules with lobulation and spiculation signs, lung adenocarcinoma, and squamous cell carcinoma. On the test set, our model achieves an accuracy of 0.9888, precision of 0.9892, recall of 0.9888, and an F1-score of 0.9888, along with a class activation mapping (CAM) visualization of the 3D model. Consequently, our method could be used as a preoperative tool to assist in diagnosing solitary solid lung nodules with lobulation and spiculation signs accurately and provide a theoretical basis for developing appropriate clinical diagnosis and treatment plans for the patients.

CT Classification Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA

A Machine Learning Model Based on Global Mammographic Radiomic Features Can Predict Which Normal Mammographic Cases Radiology Trainees Find Most Difficult.

Siviengphanom S, Brennan PC, Lewis SJ, Trieu PD, Gandomkar Z

•papers•Jun 1 2025

This study aims to investigate whether global mammographic radiomic features (GMRFs) can distinguish hardest- from easiest-to-interpret normal cases for radiology trainees (RTs). Data from 137 RTs were analysed, with each interpreting seven educational self-assessment test sets comprising 60 cases (40 normal and 20 cancer). The study only examined normal cases. Difficulty scores were computed based on the percentage of readers who incorrectly classified each case, leading to their classification as hardest- or easiest-to-interpret based on whether their difficulty scores fell within and above the 75th or within and below the 25th percentile, respectively (resulted in 140 cases in total used). Fifty-nine low-density and 81 high-density cases were identified. Thirty-four GMRFs were extracted for each case. A random forest machine learning model was trained to differentiate between hardest- and easiest-to-interpret normal cases and validated using leave-one-out-cross-validation approach. The model's performance was evaluated using the area under receiver operating characteristic curve (AUC). Significant features were identified through feature importance analysis. Difference between hardest- and easiest-to-interpret cases among 34 GMRFs and in difficulty level between low- and high-density cases was tested using Kruskal-Wallis. The model achieved AUC = 0.75 with cluster prominence and range emerging as the most useful features. Fifteen GMRFs differed significantly (p < 0.05) between hardest- and easiest-to-interpret cases. Difficulty level among low- vs high-density cases did not differ significantly (p = 0.12). GMRFs can predict hardest-to-interpret normal cases for RTs, underscoring the importance of GMRFs in identifying the most difficult normal cases for RTs and facilitating customised training programmes tailored to trainees' learning needs.

Mammography Classification Breast Retrospective Clinical In Silico Academic Lab

Filter Papers

Tags

A Large Language Model to Detect Negated Expressions in Radiology Reports.

MR Image Fusion-Based Parotid Gland Tumor Detection.

Deep Learning Approaches for Brain Tumor Detection and Classification Using MRI Images (2020 to 2024): A Systematic Review.

Automated Neural Architecture Search for Cardiac Amyloidosis Classification from [18F]-Florbetaben PET Images.

Leveraging Ensemble Models and Follow-up Data for Accurate Prediction of mRS Scores from Radiomic Features of DSC-PWI Images.

A Robust [<sup>18</sup>F]-PSMA-1007 Radiomics Ensemble Model for Prostate Cancer Risk Stratification.

Deep Conformal Supervision: Leveraging Intermediate Features for Robust Uncertainty Quantification.

Using Machine Learning on MRI Radiomics to Diagnose Parotid Tumours Before Comparing Performance with Radiologists: A Pilot Study.

Prediction of Malignancy and Pathological Types of Solid Lung Nodules on CT Scans Using a Volumetric SWIN Transformer.

A Machine Learning Model Based on Global Mammographic Radiomic Features Can Predict Which Normal Mammographic Cases Radiology Trainees Find Most Difficult.

Ready to Sharpen Your Edge?