Advances and challenges in AI-assisted MRI for lumbar disc degeneration detection and classification.

Zhao P, Zhu S

Jul 25 2025
Intervertebral disc degeneration (IDD) is a major contributor to chronic low back pain. Magnetic resonance imaging (MRI) serves as the gold standard for IDD assessment, yet manual grading is often subjective and inconsistent. With advances in artificial intelligence (AI), particularly deep learning, automated detection and classification of IDD from MRI have become increasingly feasible. This narrative review aims to provide a comprehensive overview of AI applications, especially machine learning and deep learning techniques, for MRI-based detection and grading of lumbar disc degeneration, highlighting their clinical value, current limitations, and future directions. Relevant studies were reviewed and summarized thematically. The review covers classical methods (e.g., support vector machines), deep learning models (e.g., CNNs, SpineNet, ResNet, U-Net), and hybrid approaches incorporating transformers and multitask learning. Technical details, model architectures, performance metrics, and representative datasets were synthesized and discussed. AI systems have demonstrated promising performance in automatic IDD grading, in some cases matching or surpassing expert radiologists. CNN-based models showed high accuracy and reproducibility, while hybrid models further improved segmentation and classification. However, challenges remain in generalizability, data imbalance, interpretability, and regulatory integration. Tools such as Grad-CAM and SHAP improve model transparency, while methods such as few-shot learning and data augmentation can alleviate data limitations. AI-assisted analysis of MRI for lumbar disc degeneration offers significant potential to enhance diagnostic efficiency and consistency. While current models are encouraging, real-world clinical implementation requires further advances in interpretability, data diversity, ethical standards, and large-scale validation.
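
Grad-CAM, one of the transparency tools named above, can be sketched in a few lines of PyTorch. The backbone, layer choice, and five-class output below (e.g., Pfirrmann grades I-V) are illustrative assumptions, not details from any reviewed study.

```python
# Minimal Grad-CAM sketch for a hypothetical CNN-based disc-grading model.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(num_classes=5)  # e.g., five Pfirrmann grades (assumption)
model.eval()

activations, gradients = {}, {}

def fwd_hook(module, inp, out):
    activations["feat"] = out.detach()

def bwd_hook(module, grad_in, grad_out):
    gradients["feat"] = grad_out[0].detach()

layer = model.layer4  # last convolutional stage
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)        # stand-in for a mid-sagittal MRI slice
logits = model(x)
logits[0, logits.argmax()].backward()  # gradient of the predicted grade

# channel weights = global average pooling of gradients (Grad-CAM)
weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["feat"]).sum(dim=1))  # weighted activations
cam = F.interpolate(cam[None], size=x.shape[-2:], mode="bilinear")[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
```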

Exploring AI-Based System Design for Pixel-Level Protected Health Information Detection in Medical Images.

Truong T, Baltruschat IM, Klemens M, Werner G, Lenga M

Jul 25 2025
De-identification of medical images is a critical step to ensure privacy during data sharing in research and clinical settings. The initial step in this process involves detecting Protected Health Information (PHI), which can be found in image metadata or imprinted within image pixels. Despite the importance of such systems, there has been limited evaluation of existing AI-based solutions, creating barriers to the development of reliable and robust tools. In this study, we present an AI-based pipeline for PHI detection comprising three key modules: text detection, text extraction, and text analysis. We benchmark three models (YOLOv11, EasyOCR, and GPT-4o) across different setups corresponding to these modules, evaluating their performance on two datasets encompassing multiple imaging modalities and PHI categories. Our findings indicate that the optimal setup uses dedicated vision and language models for each module, balancing performance, latency, and the cost associated with large language model (LLM) usage. Additionally, we show that LLMs can not only identify PHI content but also enhance OCR output and support an end-to-end PHI detection pipeline, with promising results in our analysis.
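
A minimal sketch of the detect-extract-analyze pipeline described above, assuming EasyOCR covers the detection and extraction stages; a regex pass stands in for the LLM-based analysis module (the paper uses GPT-4o there). The patterns and file name are placeholders.

```python
import re
import easyocr  # pip install easyocr

# crude stand-in for the text-analysis module (an LLM in the paper)
PHI_PATTERNS = {
    "date": re.compile(r"\b\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}\b"),
    "mrn":  re.compile(r"\b(MRN|ID)[:#]?\s*\d{5,}\b", re.IGNORECASE),
    "name": re.compile(r"\b(Mr|Mrs|Ms|Dr)\.?\s+[A-Z][a-z]+\b"),
}

def detect_phi(image_path: str):
    reader = easyocr.Reader(["en"], gpu=False)
    findings = []
    # readtext returns (bounding_box, text, confidence) triples
    for box, text, conf in reader.readtext(image_path):
        for category, pattern in PHI_PATTERNS.items():
            if pattern.search(text):
                findings.append({"box": box, "text": text,
                                 "category": category, "ocr_conf": conf})
    return findings

# example usage (file name is a placeholder); flagged regions would be
# masked at the pixel level downstream
print(detect_phi("ultrasound_frame.png"))
```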

Automatic Prediction of TMJ Disc Displacement in CBCT Images Using Machine Learning.

Choi H, Jeon KJ, Lee C, Choi YJ, Jo GD, Han SS

Jul 25 2025
Magnetic resonance imaging (MRI) is the gold standard for diagnosing disc displacement in temporomandibular joint (TMJ) disorders, but its high cost and practical challenges limit its accessibility. This study aimed to develop a machine learning (ML) model that can predict TMJ disc displacement using only cone-beam computed tomography (CBCT)-based radiomics features, without MRI. CBCT images of 247 mandibular condyles from 134 patients who also underwent MRI were analyzed. We trained two ML models, random forest (RF) and extreme gradient boosting (XGBoost), across three experiments with different patient groupings. Experiment 1 classified the data into three groups: Normal, disc displacement with reduction (DDWR), and disc displacement without reduction (DDWOR). Experiment 2 classified Normal versus disc displacement (DDWR and DDWOR), and Experiment 3 classified Normal and DDWR versus DDWOR. The RF model outperformed XGBoost across all three experiments; Experiment 3, which differentiated DDWOR from the other conditions, achieved the highest performance, with area under the receiver operating characteristic curve (AUC) values of 0.86 (RF) and 0.85 (XGBoost). Experiment 2 followed with AUC values of 0.76 (RF) and 0.75 (XGBoost), while Experiment 1, which classified all three groups, had the lowest values at 0.63 (RF) and 0.59 (XGBoost). The RF model, utilizing radiomics features from CBCT images, demonstrated potential as an assistive tool for predicting DDWOR, which requires the most careful management.
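
A hedged sketch of the Experiment-3 setup: binary classification of DDWOR versus the other conditions from radiomics features, comparing RF and XGBoost by cross-validated AUC. The feature matrix and labels below are synthetic placeholders; only the model comparison mirrors the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier  # pip install xgboost

rng = np.random.default_rng(0)
X = rng.normal(size=(247, 100))    # 247 condyles x 100 radiomics features (made up)
y = rng.integers(0, 2, size=247)   # 1 = DDWOR, 0 = Normal/DDWR (made up)

for name, clf in [("RF", RandomForestClassifier(n_estimators=500, random_state=0)),
                  ("XGBoost", XGBClassifier(n_estimators=500, eval_metric="logloss"))]:
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean cross-validated AUC = {auc:.2f}")
```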

CT-free kidney single-photon emission computed tomography for glomerular filtration rate.

Kwon K, Oh D, Kim JH, Yoo J, Lee WW

Jul 25 2025
This study explores an artificial intelligence-based approach to CT-free quantitative SPECT for kidney imaging with Tc-99m DTPA, aiming to estimate glomerular filtration rate (GFR) without relying on CT. A total of 1000 SPECT/CT scans were used to train and test a deep-learning model that segments the kidneys automatically based on synthetic attenuation maps (µ-maps) derived from SPECT alone. The model employed a residual U-Net with edge attention and was optimized using windowing-maximum normalization and a generalized Dice similarity loss function. Performance evaluation showed strong agreement with manual CT-based segmentation, achieving a Dice score of 0.818 ± 0.056 and minimal volume differences of 17.9 ± 43.6 mL (mean ± standard deviation). An additional set of 50 scans confirmed that GFR calculated with the AI-based CT-free method (109.3 ± 17.3 mL/min) was nearly identical to that of the conventional SPECT/CT method (109.2 ± 18.4 mL/min, p = 0.9396). The CT-free method reduced radiation exposure by up to 78.8% and shortened segmentation time from 40 min to under 1 min. The findings suggest that AI can effectively replace CT in kidney SPECT imaging, maintaining quantitative accuracy while improving safety and efficiency.
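
A minimal PyTorch sketch of a generalized Dice loss, the loss family the study reports using; the exact weighting in the paper may differ. The inverse-squared-volume class weighting follows Sudre et al. (2017) and counters the imbalance between kidney and background voxels.

```python
import torch

def generalized_dice_loss(pred: torch.Tensor, target: torch.Tensor,
                          eps: float = 1e-6) -> torch.Tensor:
    """pred, target: (batch, classes, D, H, W); pred holds softmax
    probabilities, target is one-hot."""
    dims = (0, 2, 3, 4)                          # sum over batch + space
    w = 1.0 / (target.sum(dim=dims) ** 2 + eps)  # inverse squared label volume
    intersect = (pred * target).sum(dim=dims)
    union = (pred + target).sum(dim=dims)
    dice = 2.0 * (w * intersect).sum() / ((w * union).sum() + eps)
    return 1.0 - dice
```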

XVertNet: Unsupervised Contrast Enhancement of Vertebral Structures with Dynamic Self-Tuning Guidance and Multi-Stage Analysis.

Eidlin E, Hoogi A, Rozen H, Badarne M, Netanyahu NS

Jul 25 2025
Chest X-ray is one of the main diagnostic tools in emergency medicine, yet its limited ability to capture fine anatomical detail can result in missed or delayed diagnoses. To address this, we introduce XVertNet, a novel deep-learning framework designed to significantly enhance vertebral structure visualization in X-ray images. Our framework introduces two key innovations: (1) an unsupervised learning architecture that eliminates reliance on manually labeled training data, a persistent bottleneck in medical imaging; and (2) a dynamically self-tuned internal guidance mechanism featuring an adaptive feedback loop for real-time image optimization. Extensive validation across four major public datasets showed that XVertNet outperforms state-of-the-art enhancement methods, as demonstrated by improvements in evaluation measures such as entropy, the Tenengrad criterion, LPC-SI, TMQI, and PIQE. Furthermore, clinical validation by two board-certified clinicians confirmed that the enhanced images enabled more sensitive examination of vertebral structural changes. The unsupervised nature of XVertNet allows immediate clinical deployment without additional training overhead, providing a scalable and time-efficient way to improve diagnostic accuracy in high-pressure emergency radiology environments.
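
A sketch of the Tenengrad criterion, one of the no-reference sharpness measures the authors report; the thresholding details here are assumptions, not the paper's implementation.

```python
import numpy as np
from scipy import ndimage

def tenengrad(image: np.ndarray, threshold: float = 0.0) -> float:
    """Mean squared Sobel gradient magnitude above a threshold: higher
    values indicate stronger edges, i.e., sharper vertebral structures."""
    gx = ndimage.sobel(image.astype(float), axis=0)
    gy = ndimage.sobel(image.astype(float), axis=1)
    mag2 = gx ** 2 + gy ** 2
    mask = mag2 > threshold ** 2
    return float(mag2[mask].mean()) if mask.any() else 0.0
```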

3D-WDA-PMorph: Efficient 3D MRI/TRUS Prostate Registration using Transformer-CNN Network and Wavelet-3D-Depthwise-Attention.

Mahmoudi H, Ramadan H, Riffi J, Tairi H

Jul 25 2025
Multimodal image registration is crucial in medical imaging, particularly for aligning Magnetic Resonance Imaging (MRI) and Transrectal Ultrasound (TRUS) data, which are widely used in prostate cancer diagnosis and treatment planning. The task is challenging because of inherent differences between the modalities, including variations in resolution, contrast, and noise. Conventional Convolutional Neural Network (CNN)-based registration methods, while effective at extracting local features, often struggle to capture global contextual information and fail to adapt to complex deformations in multimodal data. Conversely, Transformer-based methods excel at capturing long-range dependencies and hierarchical features but have difficulty integrating the fine-grained local details essential for accurate spatial alignment. To address these limitations, we propose a novel 3D image registration framework that combines the strengths of both paradigms. Our method employs a Swin Transformer (ST)-CNN encoder-decoder architecture whose key innovation is the enhancement of the skip-connection stages. Specifically, we introduce a module named Wavelet-3D-Depthwise-Attention (WDA), which leverages an attention mechanism that integrates wavelet transforms for multi-scale spatial-frequency representation and 3D depthwise convolution for computational efficiency and modality fusion. Experimental evaluations on clinical MRI/TRUS datasets confirm that the proposed method achieves a median Dice score of 0.94 and a target registration error of 0.85, indicating improved registration accuracy and robustness over existing state-of-the-art (SOTA) methods. The WDA-enhanced skip connections help the registration network preserve critical anatomical details, making our method a promising advance in prostate multimodal registration. The framework also shows strong potential for generalization to other image registration tasks.
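
A hedged sketch of a 3D depthwise-convolution attention gate, the kind of block the WDA module builds on; the wavelet branch and the module's exact design are omitted, so this is an illustration of the depthwise-attention idea rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class DepthwiseAttention3D(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # groups=channels makes the convolution depthwise: one 3x3x3 filter
        # per channel, which keeps the parameter count low in 3D
        self.depthwise = nn.Conv3d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels)
        self.pointwise = nn.Conv3d(channels, channels, kernel_size=1)
        self.gate = nn.Sigmoid()

    def forward(self, skip: torch.Tensor) -> torch.Tensor:
        # attention weights modulate the skip-connection features
        attn = self.gate(self.pointwise(self.depthwise(skip)))
        return skip * attn

x = torch.randn(1, 16, 32, 32, 32)        # (batch, channels, D, H, W)
print(DepthwiseAttention3D(16)(x).shape)  # torch.Size([1, 16, 32, 32, 32])
```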

Privacy-Preserving Generation of Structured Lymphoma Progression Reports from Cross-sectional Imaging: A Comparative Analysis of Llama 3.3 and Llama 4.

Prucker P, Bressem KK, Kim SH, Weller D, Kader A, Dorfner FJ, Ziegelmayer S, Graf MM, Lemke T, Gassert F, Can E, Meddeb A, Truhn D, Hadamitzky M, Makowski MR, Adams LC, Busch F

Jul 25 2025
Efficient processing of radiology reports for monitoring disease progression is crucial in oncology. Although large language models (LLMs) show promise in extracting structured information from medical reports, privacy concerns limit their clinical implementation. This study evaluates the feasibility and accuracy of two recent Llama models for generating structured lymphoma progression reports from cross-sectional imaging data in a privacy-preserving, real-world clinical setting. This single-center, retrospective study included adult lymphoma patients who underwent cross-sectional imaging and treatment between July 2023 and July 2024. We used a chain-of-thought prompting strategy with locally deployed Llama-3.3-70B-Instruct and Llama-4-Scout-17B-16E-Instruct models to generate lymphoma disease progression reports across three iterations. Two radiologists independently scored nodal and extranodal involvement, as well as Lugano staging and treatment response classifications. For each LLM and task, we calculated the F1 score, accuracy, recall, precision, and specificity per label, as well as the case-weighted average with 95% confidence intervals (CIs). Both LLMs correctly implemented the template structure for all 65 patients included in this study. Llama-4-Scout-17B-16E-Instruct demonstrated significantly greater accuracy in extracting nodal and extranodal involvement (nodal: 0.99 [95% CI = 0.98-0.99] vs. 0.97 [95% CI = 0.95-0.96], p < 0.001; extranodal: 0.99 [95% CI = 0.99-1.00] vs. 0.99 [95% CI = 0.98-0.99], p = 0.013). The difference was more pronounced when predicting Lugano stage and treatment response (stage: 0.85 [95% CI = 0.79-0.89] vs. 0.60 [95% CI = 0.53-0.67], p < 0.001; treatment response: 0.88 [95% CI = 0.83-0.92] vs. 0.65 [95% CI = 0.58-0.71], p < 0.001). Neither model hallucinated newly involved nodal or extranodal sites. The highest relative error rates occurred when interpreting the level of disease after treatment. In conclusion, privacy-preserving LLMs can effectively extract clinical information from lymphoma imaging reports. While they excel at data extraction, they are limited in their ability to generate new clinical inferences from the extracted information. Our findings suggest their potential utility in streamlining documentation and highlight areas requiring optimization before clinical implementation.
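
A sketch of the per-label and case-weighted scoring described above, using scikit-learn; the label values and predictions are illustrative, not the study's data.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# reference (radiologist) vs. model labels for Lugano stage, one per case
y_true = ["I", "II", "II", "III", "IV", "IV", "II", "I"]
y_pred = ["I", "II", "III", "III", "IV", "II", "II", "I"]

for metric, fn in [("F1", f1_score), ("precision", precision_score),
                   ("recall", recall_score)]:
    per_label = fn(y_true, y_pred, average=None, zero_division=0)
    weighted = fn(y_true, y_pred, average="weighted", zero_division=0)
    print(f"{metric}: per-label={per_label.round(2)}, case-weighted={weighted:.2f}")
```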

Could a New Method of Acromiohumeral Distance Measurement Emerge? Artificial Intelligence vs. Physician.

Dede BT, Çakar İ, Oğuz M, Alyanak B, Bağcıer F

Jul 25 2025
The aim of this study was to evaluate the reliability of ChatGPT-4's measurement of acromiohumeral distance (AHD), a popular assessment in patients with shoulder pain. In this retrospective study, 71 registered shoulder magnetic resonance imaging (MRI) scans were included. AHD measurements were performed on a coronal oblique T1 sequence with a clear view of the acromion and humerus. Measurements were performed twice, at a 3-day interval, by an experienced radiologist and, in separate sessions, by ChatGPT-4. The first, second, and mean AHD values measured by the physician were 7.6 ± 1.7, 7.5 ± 1.6, and 7.6 ± 1.7 mm, respectively; the corresponding values measured by ChatGPT-4 were 6.7 ± 0.8, 7.3 ± 1.1, and 7.1 ± 0.8 mm. Physician and ChatGPT-4 values differed significantly for the first and mean measurements (p < 0.0001 and p = 0.009, respectively) but not for the second measurements (p = 0.220). Intrarater reliability was excellent for the physician (ICC = 0.99) but poor for ChatGPT-4 (ICC = 0.41), and interrater reliability was poor (ICC = 0.45). In conclusion, this study demonstrated that the reliability of ChatGPT-4 in AHD measurements is inferior to that of an experienced radiologist. This study may help improve the future contribution of large language models to medical science.
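
A sketch of the interrater ICC computation using pingouin; the measurement values below are made up for illustration, and the ICC form the authors used is not specified here.

```python
import pandas as pd
import pingouin as pg  # pip install pingouin

df = pd.DataFrame({
    "scan":   [1, 1, 2, 2, 3, 3],
    "rater":  ["radiologist", "chatgpt4"] * 3,
    "ahd_mm": [7.9, 6.8, 6.2, 6.5, 8.4, 7.1],  # hypothetical AHD values
})
icc = pg.intraclass_corr(data=df, targets="scan", raters="rater",
                         ratings="ahd_mm")
print(icc[["Type", "ICC"]])  # ICC2 (two-way random, absolute agreement) is common
```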

Diagnostic performance of artificial intelligence models for pulmonary nodule classification: a multi-model evaluation.

Herber SK, Müller L, Pinto Dos Santos D, Jorg T, Souschek F, Bäuerle T, Foersch S, Galata C, Mildenberger P, Halfmann MC

Jul 25 2025
Lung cancer is the leading cause of cancer-related mortality. While early detection improves survival, distinguishing malignant from benign pulmonary nodules remains challenging. Artificial intelligence (AI) has been proposed to enhance diagnostic accuracy, but its clinical reliability is still under investigation. Here, we aimed to evaluate the diagnostic performance of AI models in classifying pulmonary nodules. This single-center retrospective study analyzed pulmonary nodules (4-30 mm) detected on CT scans using three AI software models. Sensitivity, specificity, and false-positive and false-negative rates were calculated. Diagnostic accuracy was assessed using the area under the receiver operating characteristic (ROC) curve (AUC), with histopathology serving as the gold standard. Subgroup analyses were based on nodule size and histopathological classification, and the impact of imaging parameters was evaluated using regression analysis. A total of 158 nodules (n = 30 benign, n = 128 malignant) were analyzed. One AI model classified most nodules as intermediate risk, preventing further accuracy assessment. The other models demonstrated moderate sensitivity (53.1-70.3%) but low specificity (46.7-66.7%), leading to a high false-positive rate (45.5-52.4%). AUC values (95% CI) fell between 0.5 and 0.6. Subgroup analyses revealed decreased sensitivity (47.8-61.5%) but increased specificity (100%), highlighting inconsistencies. Overall, up to 49.0% of the pulmonary nodules were classified as intermediate risk. CT scan type influenced performance (p = 0.03), with better classification accuracy on breath-held CT scans. AI-based software models are not ready for standalone clinical use in pulmonary nodule classification owing to low specificity, high false-negative rates, and a high proportion of intermediate-risk classifications.
Question: How accurate are commercially available AI models for the classification of pulmonary nodules compared to the gold standard of histopathology?
Findings: The evaluated AI models demonstrated moderate sensitivity, low specificity, and high false-negative rates. Up to 49% of pulmonary nodules were classified as intermediate risk.
Clinical relevance: The high false-negative rates could influence radiologists' decision-making, leading to an increased number of interventions or unnecessary surgical procedures.
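
A sketch of how the reported operating metrics follow from a confusion matrix; the case mix matches the study (128 malignant, 30 benign), but the scores and threshold are synthetic.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([1] * 128 + [0] * 30)  # 1 = malignant, 0 = benign
rng = np.random.default_rng(1)
scores = np.clip(rng.normal(0.55, 0.2, 158) + 0.05 * y_true, 0, 1)
y_pred = (scores >= 0.5).astype(int)     # hypothetical operating threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"sensitivity = {tp / (tp + fn):.2f}")
print(f"specificity = {tn / (tn + fp):.2f}")
print(f"false-positive rate = {fp / (fp + tn):.2f}")
print(f"AUC = {roc_auc_score(y_true, scores):.2f}")
```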

Multimodal prediction based on ultrasound for response to neoadjuvant chemotherapy in triple negative breast cancer.

Lyu M, Yi S, Li C, Xie Y, Liu Y, Xu Z, Wei Z, Lin H, Zheng Y, Huang C, Lin X, Liu Z, Pei S, Huang B, Shi Z

Jul 25 2025
Pathological complete response (pCR) can guide surgical strategy and postoperative treatment in triple-negative breast cancer (TNBC). In this study, we developed a Breast Cancer Response Prediction (BCRP) model to predict pCR in patients with TNBC. The BCRP model integrated multi-dimensional longitudinal quantitative imaging features, clinical factors, and features from the Breast Imaging Reporting and Data System (BI-RADS). The longitudinal quantitative imaging features, comprising deep learning and radiomics features, were extracted from multiview B-mode and colour Doppler ultrasound images acquired before and after treatment. The BCRP model achieved areas under the receiver operating characteristic curves (AUCs) of 0.94 [95% confidence interval (CI), 0.91-0.98] and 0.84 [95% CI, 0.75-0.92] in the training and external test cohorts, respectively. Additionally, a low BCRP score was an independent risk factor for poorer event-free survival (P < 0.05). The BCRP model showed promise in predicting response to neoadjuvant chemotherapy in TNBC and could provide valuable prognostic information.
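
A sketch of the event-free-survival analysis implied above: testing a low-vs-high BCRP score as a risk factor with a Cox proportional hazards model. The column names and data are hypothetical; only the analysis pattern comes from the text.

```python
import pandas as pd
from lifelines import CoxPHFitter  # pip install lifelines

df = pd.DataFrame({
    "efs_months": [12, 30, 8, 40, 25, 6, 36, 18],  # follow-up time (made up)
    "event":      [1, 0, 1, 0, 1, 1, 0, 0],        # 1 = recurrence/progression
    "low_bcrp":   [1, 0, 1, 1, 0, 1, 0, 1],        # 1 = low BCRP score
})
cph = CoxPHFitter()
cph.fit(df, duration_col="efs_months", event_col="event")
cph.print_summary()  # hazard ratio for low_bcrp; p < 0.05 would match the study
```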