Comparing large language models and text embedding models for automated classification of textual, semantic, and critical changes in radiology reports.

Lindholz M, Burdenski A, Ruppel R, Schulze-Weddige S, Baumgärtner GL, Schobert I, Haack AM, Eminovic S, Milnik A, Hamm CA, Frisch A, Penzkofer T

pubmed logopapers · Jul 14 2025
Radiology reports can change during workflows, especially when residents draft preliminary versions that attending physicians finalize. We explored how large language models (LLMs) and embedding techniques can categorize these changes into textual, semantic, or clinically actionable types. We evaluated 400 adult CT reports drafted by residents against finalized versions by attending physicians. Changes were rated on a five-point scale from no changes to critical ones. We examined open-source LLMs alongside traditional metrics like normalized word differences, Levenshtein and Jaccard similarity, and text embedding similarity. Model performance was assessed using quadratic weighted Cohen's kappa (κ), (balanced) accuracy, F1, precision, and recall. Inter-rater reliability among evaluators was excellent (κ = 0.990). Of the reports analyzed, 1.3% contained critical changes. The tested methods showed significant performance differences (P < 0.001). The Qwen3-235B-A22B model, using a zero-shot prompt, aligned most closely with human assessments of changes in clinical reports, achieving a κ of 0.822 (SD 0.031). The best conventional metric, word difference, had a κ of 0.732 (SD 0.048); the difference between the two was statistically significant in unadjusted post-hoc tests (P = 0.038) but lost significance after adjustment for multiple testing (P = 0.064). Embedding models underperformed compared with LLMs and classical methods, with the differences reaching statistical significance in most cases. Large language models like Qwen3-235B-A22B demonstrated moderate to strong alignment with expert evaluations of the clinical significance of changes in radiology reports. LLMs outperformed embedding methods and traditional string and word approaches, with statistically significant differences in most instances. This demonstrates their potential as tools to support peer review.
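
As a rough illustration of the evaluation described above, the sketch below computes a word-level Jaccard similarity between two report versions and a quadratic weighted Cohen's kappa between model and human ratings; the ratings and helper function are hypothetical, not the authors' implementation.

```python
# Hedged sketch (not the authors' code): scoring report changes with a simple
# string metric and comparing model ratings against human ratings using
# quadratic weighted Cohen's kappa, as described in the abstract.
from sklearn.metrics import cohen_kappa_score

def jaccard_word_similarity(draft: str, final: str) -> float:
    """Jaccard similarity between the word sets of two report versions."""
    a, b = set(draft.lower().split()), set(final.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 1.0

# Hypothetical 5-point ratings (0 = no change ... 4 = critical change)
human_ratings = [0, 1, 2, 0, 3, 4, 1]
model_ratings = [0, 1, 1, 0, 3, 4, 2]

kappa = cohen_kappa_score(human_ratings, model_ratings, weights="quadratic")
print(f"Quadratic weighted kappa: {kappa:.3f}")
print(f"Jaccard similarity: {jaccard_word_similarity('no acute finding', 'no acute abnormality'):.2f}")
```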

Classification of Renal Lesions by Leveraging Hybrid Features from CT Images Using Machine Learning Techniques.

Kaur R, Khattar S, Singla S

pubmed logopapers · Jul 14 2025
Renal cancer is among the leading contributors to rising mortality rates globally; early detection and diagnosis can reduce this burden. The classification of lesions is based mostly on their characteristics, which include varied shape and texture properties. Computed tomography (CT) is a routinely used imaging modality for assessing renal soft tissues. Moreover, a radiologist's capacity to review large volumes of CT images is limited, which can result in misdiagnosis of kidney lesions and, in turn, cancer progression or unnecessary chemotherapy. To address these challenges, this study presents a machine learning technique based on a novel feature vector for automated classification of renal lesions using multi-model texture-based feature extraction. The proposed feature vector could improve the accuracy of a computer-aided diagnosis (CAD) system for characterizing renal lesion texture and assist physicians in providing more precise lesion interpretation. In this work, the authors employed different texture models to analyze CT scans and classify benign and malignant kidney lesions. Texture analysis is performed using features such as first-order statistics (FoS), spatial gray level co-occurrence matrix (SGLCM), Fourier power spectrum (FPS), statistical feature matrix (SFM), Laws' texture energy measures (TEM), gray level difference statistics (GLDS), fractal, and neighborhood gray tone difference matrix (NGTDM). These texture models were applied to a selected region of interest (ROI) within each renal lesion to quantify the renal texture patterns. In addition, dimensionality reduction is employed to discover the most discriminative features for categorization of benign and malignant lesions, and a unique feature vector based on correlation-based feature selection, information gain, and gain ratio is proposed. Several machine learning classifiers were used to evaluate the proposed features, and the random forest (RF) model outperformed all other techniques in distinguishing benign from malignant tumors across the performance metrics considered. The proposed system is validated on a dataset of 50 subjects, achieving a classification accuracy of 95.8%, outperforming other conventional models.
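
The sketch below illustrates, under stated assumptions, one way the SGLCM-style texture extraction and random forest classification described above could look in code; it assumes scikit-image's graycomatrix/graycoprops naming (version 0.19+) and uses synthetic ROIs and labels rather than the study's data.

```python
# Hedged sketch (illustrative only): extracting co-occurrence-matrix texture
# features from an ROI and classifying with a random forest, loosely mirroring
# the pipeline described in the abstract. ROI arrays and labels are synthetic.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.ensemble import RandomForestClassifier

def glcm_features(roi: np.ndarray) -> list[float]:
    """Compute a few co-occurrence-matrix texture features for an 8-bit ROI."""
    glcm = graycomatrix(roi, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    return [float(graycoprops(glcm, p)[0, 0])
            for p in ("contrast", "homogeneity", "energy", "correlation")]

rng = np.random.default_rng(0)
rois = [rng.integers(0, 256, size=(32, 32), dtype=np.uint8) for _ in range(20)]
labels = rng.integers(0, 2, size=20)  # 0 = benign, 1 = malignant (synthetic)

X = np.array([glcm_features(r) for r in rois])
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
print(clf.predict(X[:3]))
```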

A radiomics-clinical predictive model for difficult laparoscopic cholecystectomy based on preoperative CT imaging: a retrospective single center study.

Sun RT, Li CL, Jiang YM, Hao AY, Liu K, Li K, Tan B, Yang XN, Cui JF, Bai WY, Hu WY, Cao JY, Qu C

pubmed logopapers · Jul 14 2025
Accurately identifying difficult laparoscopic cholecystectomy (DLC) preoperatively remains a clinical challenge. Previous studies utilizing clinical variables or morphological imaging markers have demonstrated suboptimal predictive performance. This study aims to develop an optimal radiomics-clinical model by integrating preoperative CT-based radiomics features with clinical characteristics. A retrospective analysis was conducted on 2,055 patients who underwent laparoscopic cholecystectomy (LC) for cholecystitis at our center. Preoperative CT images were processed with super-resolution reconstruction to improve consistency, and high-throughput radiomic features were extracted from the gallbladder wall region. A combination of radiomic and clinical features was selected using the Boruta-LASSO algorithm. Predictive models were constructed using six machine learning algorithms and validated, with performance evaluated using the area under the curve (AUC), accuracy, Brier score, and decision curve analysis (DCA) to identify the optimal model. Model interpretability was further enhanced using the SHAP method. The Boruta-LASSO algorithm identified 10 key radiomic and clinical features for model construction, including the Rad-Score, gallbladder wall thickness, fibrinogen, C-reactive protein, and low-density lipoprotein cholesterol. Among the six machine learning models developed, the radiomics-clinical model based on the random forest algorithm demonstrated the best predictive performance, with an AUC of 0.938 in the training cohort and 0.874 in the validation cohort. The Brier score, calibration curve, and DCA confirmed the superior predictive capability of this model, significantly outperforming previously published models. SHAP analysis further visualized feature importance, enhancing model interpretability. This study developed the first radiomics-clinical random forest model for preoperative prediction of DLC using machine learning algorithms. The model supports safer, individualized surgical planning and treatment strategies.
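
For readers unfamiliar with the feature selection step, the following sketch shows the LASSO half of a Boruta-LASSO style pipeline feeding a random forest, using L1-penalized logistic regression as the selector; the features, labels, and hyperparameters are illustrative placeholders, not the authors' configuration.

```python
# Hedged sketch: L1-penalized (LASSO-style) feature selection followed by a
# random forest classifier, using synthetic stand-ins for the radiomic and
# clinical features named in the abstract.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 50))  # e.g., radiomic + clinical features (synthetic)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)  # DLC yes/no

pipe = Pipeline([
    ("scale", StandardScaler()),
    # L1-penalized logistic regression acts as the LASSO selector here
    ("select", SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.1))),
    ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
])
pipe.fit(X, y)
print("Selected features:", pipe.named_steps["select"].get_support().sum())
```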

Deep Learning-Based Prediction for Bone Cement Leakage During Percutaneous Kyphoplasty Using Preoperative Computed Tomography: Model Development and Validation.

Chen R, Wang T, Liu X, Xi Y, Liu D, Xie T, Wang A, Fan N, Yuan S, Du P, Jiao S, Zhang Y, Zang L

pubmed logopapers · Jul 14 2025
Retrospective study. To develop a deep learning (DL) model to predict bone cement leakage (BCL) subtypes during percutaneous kyphoplasty (PKP) from preoperative computed tomography (CT), and to evaluate the model's effectiveness and generalizability using multicenter data. DL excels at automatically extracting features from medical images. However, there is a lack of models that can predict BCL subtypes based on preoperative images. This study included an internal dataset for DL model training, validation, and testing, as well as an external dataset for additional model testing. Our model integrated a segment localization module based on vertebral segmentation via three-dimensional (3D) U-Net with a classification module based on 3D ResNet-50. Vertebral level mismatch rates were calculated, and confusion matrices were used to compare the performance of the DL model with that of spine surgeons in predicting BCL subtypes. Furthermore, the simple Cohen's kappa coefficient was used to assess the reliability of the spine surgeons and the DL model against the reference standard. A total of 901 patients with 997 eligible segments were included in the internal dataset. The model demonstrated a vertebral segment identification accuracy of 96.9%. It also showed high area under the curve (AUC) values of 0.734-0.831 and sensitivities of 0.649-0.900 for BCL prediction in the internal dataset. Similar favorable AUC values of 0.709-0.818 and sensitivities of 0.706-0.857 were observed in the external dataset, indicating the stability and generalizability of the model. Moreover, the model outperformed nonexpert spine surgeons in predicting BCL subtypes, except for type II. The model achieved satisfactory accuracy, reliability, generalizability, and interpretability in predicting BCL subtypes, outperforming nonexpert spine surgeons. This study offers valuable insights for assessing osteoporotic vertebral compression fractures, thereby aiding preoperative surgical decision-making. Level of Evidence: 3.
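
The agreement analysis described above can be illustrated with a minimal sketch: a confusion matrix and an unweighted ("simple") Cohen's kappa comparing hypothetical subtype labels from the model and a surgeon against a reference standard; the labels below are synthetic placeholders, not study data.

```python
# Hedged sketch: comparing BCL-subtype predictions against the reference
# standard with a confusion matrix and simple Cohen's kappa, as the abstract
# describes. Labels are synthetic placeholders for leakage subtypes.
from sklearn.metrics import confusion_matrix, cohen_kappa_score

reference = ["I", "II", "III", "I", "none", "II", "III", "none"]
dl_model  = ["I", "II", "III", "I", "none", "III", "III", "none"]
surgeon   = ["I", "I",  "III", "II", "none", "II", "III", "I"]

labels = ["none", "I", "II", "III"]
print(confusion_matrix(reference, dl_model, labels=labels))
print("Model kappa:  ", round(cohen_kappa_score(reference, dl_model), 3))
print("Surgeon kappa:", round(cohen_kappa_score(reference, surgeon), 3))
```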

An improved U-NET3+ with transformer and adaptive attention map for lung segmentation.

Joseph Raj V, Christopher P

pubmed logopapers · Jul 13 2025
Accurate segmentation of lung regions from CT scan images is critical for diagnosing and monitoring respiratory diseases. This study introduces a novel hybrid architecture, Adaptive Attention U-NetAA, which combines the strengths of U-Net3+ and Transformer-based attention mechanisms for high-precision lung segmentation. The U-Net3+ module effectively segments the lung region by leveraging its deep convolutional network with nested skip connections, ensuring rich multi-scale feature extraction. A key innovation is the adaptive attention mechanism within the Transformer module, which dynamically adjusts the focus on critical regions in the image based on local and global contextual relationships. This adaptive attention mechanism addresses variations in lung morphology, image artifacts, and low-contrast regions, leading to improved segmentation accuracy. The combined convolutional and attention-based architecture enhances robustness and precision. Experimental results on benchmark CT datasets demonstrate that the proposed model achieves an IoU of 0.984, a Dice coefficient of 0.989, an mIoU of 0.972, and an HD95 of 1.22 mm, surpassing state-of-the-art methods. These results establish U-NetAA as a superior tool for clinical lung segmentation, with enhanced accuracy, sensitivity, and generalization capability.
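
A minimal sketch of the Dice coefficient and IoU metrics reported above, computed on synthetic binary masks (not the authors' evaluation code):

```python
# Hedged sketch: Dice coefficient and IoU on binary segmentation masks,
# the two headline metrics in the abstract. Masks here are synthetic squares.
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2.0 * inter / total if total else 1.0

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

pred = np.zeros((64, 64), dtype=bool); pred[10:40, 10:40] = True
gt   = np.zeros((64, 64), dtype=bool); gt[12:42, 12:42] = True
print(f"Dice={dice(pred, gt):.3f}  IoU={iou(pred, gt):.3f}")
```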

Brain Stroke Detection and Classification Using CT Imaging with Transformer Models and Explainable AI

Shomukh Qari, Maha A. Thafar

arxiv logopreprint · Jul 13 2025
Stroke is one of the leading causes of death globally, making early and accurate diagnosis essential for improving patient outcomes, particularly in emergency settings where timely intervention is critical. CT scans are the key imaging modality because of their speed, accessibility, and cost-effectiveness. This study proposed an artificial intelligence framework for multiclass stroke classification (ischemic, hemorrhagic, and no stroke) using CT scan images from a dataset provided by the Republic of Turkey's Ministry of Health. The proposed method adopted MaxViT, a state-of-the-art Vision Transformer, as the primary deep learning model for image-based stroke classification, with additional architectures (Vision Transformer, Transformer-in-Transformer, and ConvNeXt) evaluated for comparison. To enhance model generalization and address class imbalance, we applied data augmentation techniques, including synthetic image generation. The MaxViT model trained with augmentation achieved the best performance, reaching an accuracy and F1-score of 98.00%, outperforming all other evaluated models and the baseline methods. The primary goal of this study was to distinguish between stroke types with high accuracy while addressing crucial issues of transparency and trust in artificial intelligence models. To achieve this, Explainable Artificial Intelligence (XAI), particularly Grad-CAM++, was integrated into the framework; it provides visual explanations of the model's decisions by highlighting relevant stroke regions in the CT scans, establishing an accurate, interpretable, and clinically applicable solution for early stroke detection. This research contributes to the development of a trustworthy AI-assisted diagnostic tool for stroke, facilitating its integration into clinical practice and enhancing access to timely and optimal stroke diagnosis in emergency departments, thereby saving more lives.
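
A heavily hedged sketch of the classification-plus-explainability setup described above, assuming the timm and pytorch-grad-cam packages; the model variant, target layer, and input are illustrative choices rather than the authors' reported configuration.

```python
# Hedged sketch (assumes `timm` and `pytorch-grad-cam`; names are illustrative):
# a MaxViT classifier for three stroke classes with a Grad-CAM++ heatmap.
import torch
import timm
from pytorch_grad_cam import GradCAMPlusPlus
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

model = timm.create_model("maxvit_tiny_tf_224", pretrained=False, num_classes=3)
model.eval()

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed CT slice
pred_class = int(model(x).argmax(dim=1))

# Assumed target layer: the final backbone stage (adjust per architecture).
cam = GradCAMPlusPlus(model=model, target_layers=[model.stages[-1]])
heatmap = cam(input_tensor=x, targets=[ClassifierOutputTarget(pred_class)])
print("Predicted class:", pred_class, "| CAM shape:", heatmap.shape)
```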

Seeing is Believing: On the Utility of CT in Phenotyping COPD.

Awan HA, Chaudhary MFA, Reinhardt JM

pubmed logopapers · Jul 12 2025
Chronic obstructive pulmonary disease (COPD) is a heterogeneous condition with complicated structural and functional impairments. For decades now, chest computed tomography (CT) has been used to quantify various abnormalities related to COPD. More recently, with the newer data-driven approaches, biomarker development and validation have evolved rapidly. Studies now target multiple anatomical structures including lung parenchyma, the airways, the vasculature, and the fissures to better characterize COPD. This review explores the evolution of chest CT biomarkers in COPD, beginning with traditional thresholding approaches that quantify emphysema and airway dimensions. We then highlight some of the texture analysis efforts that have been made over the years for subtyping lung tissue. We also discuss image registration-based biomarkers that have enabled spatially-aware mechanisms for understanding local abnormalities within the lungs. More recently, deep learning has enabled automated biomarker extraction, offering improved precision in phenotype characterization and outcome prediction. We highlight the most recent of these approaches as well. Despite these advancements, several challenges remain in terms of dataset heterogeneity, model generalizability, and clinical interpretability. This review lastly provides a structured overview of these limitations and highlights future potential of CT biomarkers in personalized COPD management.

Establishing an AI-based diagnostic framework for pulmonary nodules in computed tomography.

Jia R, Liu B, Ali M

pubmed logopapers · Jul 12 2025
Pulmonary nodules seen on computed tomography (CT) can be benign or malignant, and early detection is important for optimal management. Existing manual methods of identifying nodules have limitations, such as being time-consuming and error-prone. This study aims to develop an Artificial Intelligence (AI) diagnostic scheme that improves the performance of identifying and categorizing pulmonary nodules using CT scans. The proposed deep learning framework used convolutional neural networks, and the image database totaled 1,056 3D-DICOM CT images. The framework began with preprocessing, including lung segmentation, followed by nodule detection and classification. Nodule detection was performed using the Retina-UNet model, while the extracted features were classified using a Support Vector Machine (SVM). Performance measures, including accuracy, sensitivity, specificity, and the AUROC, were used to evaluate the model during training and validation. Overall, the developed AI model achieved an AUROC of 0.9058. The diagnostic accuracy was 90.58%, with an overall positive predictive value of 89% and an overall negative predictive value of 86%. The algorithm handled the CT images effectively at the preprocessing stage, and the deep learning model performed well in detecting and classifying nodules. The new AI-based diagnostic framework increased diagnostic accuracy compared with the traditional approach. It also provides high reliability for detecting pulmonary nodules and classifying lesions, thus minimizing intra-observer differences and improving clinical outcomes. Future work may include enlarging the annotated dataset and fine-tuning the model to address detection issues with non-solitary nodules.
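
A small sketch of the classification and evaluation stage described above: an SVM on pre-extracted nodule features scored with AUROC. The features, labels, and split are synthetic placeholders, not the study's pipeline.

```python
# Hedged sketch: classifying nodule candidates from pre-extracted features with
# an SVM and reporting AUROC, echoing the classification/evaluation stage in
# the abstract. Features and labels are synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 16))  # e.g., features of detected nodule candidates
y = (X[:, 0] + rng.normal(scale=0.8, size=400) > 0).astype(int)  # 1 = malignant (synthetic)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)
svm = SVC(kernel="rbf", probability=True, random_state=0).fit(X_tr, y_tr)
print("AUROC:", round(roc_auc_score(y_te, svm.predict_proba(X_te)[:, 1]), 3))
```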

Vision-language model for report generation and outcome prediction in CT pulmonary angiogram.

Zhong Z, Wang Y, Wu J, Hsu WC, Somasundaram V, Bi L, Kulkarni S, Ma Z, Collins S, Baird G, Ahn SH, Feng X, Kamel I, Lin CT, Greineder C, Atalay M, Jiao Z, Bai H

pubmed logopapers · Jul 12 2025
Accurate and comprehensive interpretation of pulmonary embolism (PE) from Computed Tomography Pulmonary Angiography (CTPA) scans remains a clinical challenge due to the limited specificity and structure of existing AI tools. We propose an agent-based framework that integrates Vision-Language Models (VLMs) for detecting 32 PE-related abnormalities and Large Language Models (LLMs) for structured report generation. Trained on over 69,000 CTPA studies from 24,890 patients across Brown University Health (BUH), Johns Hopkins University (JHU), and the INSPECT dataset from Stanford, the model demonstrates strong performance in abnormality classification and report generation. For abnormality classification, it achieved AUROC scores of 0.788 (BUH), 0.754 (INSPECT), and 0.710 (JHU), with corresponding BERT-F1 scores of 0.891, 0.829, and 0.842. The abnormality-guided reporting strategy consistently outperformed the organ-based and holistic captioning baselines. For survival prediction, a multimodal fusion model that incorporates imaging, clinical variables, diagnostic outputs, and generated reports achieved concordance indices of 0.863 (BUH) and 0.731 (JHU), outperforming traditional PESI scores. This framework provides a clinically meaningful and interpretable solution for end-to-end PE diagnosis, structured reporting, and outcome prediction.
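
A brief sketch of the concordance-index evaluation mentioned above, assuming the lifelines package; survival times, censoring flags, and risk scores are synthetic rather than outputs of the described fusion model.

```python
# Hedged sketch (assumes `lifelines`): computing a concordance index for
# survival risk scores from a multimodal model, as reported in the abstract.
import numpy as np
from lifelines.utils import concordance_index

rng = np.random.default_rng(7)
survival_days = rng.integers(30, 1500, size=100)
event_observed = rng.integers(0, 2, size=100)  # 1 = event observed, 0 = censored
risk_score = -survival_days + rng.normal(scale=200, size=100)  # higher risk -> shorter survival

# concordance_index expects scores where higher means longer survival, so negate risk.
cindex = concordance_index(survival_days, -risk_score, event_observed)
print(f"Concordance index: {cindex:.3f}")
```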

Impact of heart rate on coronary artery stenosis grading accuracy using deep learning-based fast kV-switching CT: A phantom study.

Mikayama R, Kojima T, Shirasaka T, Yamane S, Funatsu R, Kato T, Yabuuchi H

pubmed logopapers · Jul 11 2025
Deep learning-based fast kV-switching CT (DL-FKSCT) generates complete sinograms for fast kV-switching dual-energy CT (DECT) scans by using a trained neural network to restore missing views. Such restoration significantly enhances the image quality of coronary CT angiography (CCTA), and the allowable heart rate (HR) may differ between DECT and single-energy CT (SECT). This study aimed to examine the effect of HR on CCTA using DL-FKSCT. We scanned stenotic coronary artery phantoms attached to a pulsating cardiac phantom in DECT and SECT modes on a DL-FKSCT scanner. The phantom unit was operated at a static setting and at simulated HRs of 50-70 beats per minute (bpm). The sharpness and stenosis ratio of the coronary model were quantitatively compared between DECT and SECT, stratified by simulated HR setting, using the paired t-test (significance was set at p < 0.01 with a Bonferroni adjustment for multiple comparisons). Regarding image sharpness, DECT was significantly superior to SECT. In terms of the stenosis ratio relative to a static image reference, 70 keV virtual monochromatic images in DECT exhibited errors exceeding 10% at HRs above 65 bpm (p < 0.01), whereas 120 kVp SECT showed errors below 10% across all HR settings, with no significant differences observed. In DL-FKSCT, DECT exhibited a lower upper limit of HR than SECT. Therefore, HR control is important for DECT scans with DL-FKSCT.
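
A minimal sketch of the paired t-test with Bonferroni adjustment used in the comparison above, on synthetic stenosis-ratio values rather than the study's measurements:

```python
# Hedged sketch: paired t-test with a Bonferroni-adjusted threshold on
# stenosis-ratio measurements, mirroring the statistics described in the
# abstract (values are synthetic).
import numpy as np
from scipy import stats

n_comparisons = 5                      # e.g., number of HR settings compared
alpha_adjusted = 0.01 / n_comparisons  # Bonferroni-adjusted threshold

rng = np.random.default_rng(3)
static_reference = np.full(10, 50.0)                         # stenosis ratio (%) on static images
dect_at_high_hr = static_reference + rng.normal(6, 2, 10)    # synthetic error at high HR

t_stat, p_value = stats.ttest_rel(dect_at_high_hr, static_reference)
print(f"t={t_stat:.2f}, p={p_value:.4f}, significant={p_value < alpha_adjusted}")
```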