Latest Papers on Radiology AI.

Geometric, dosimetric and psychometric evaluation of three commercial AI software solutions for OAR auto-segmentation in head and neck radiotherapy.

Podobnik G, Borg C, Debono CJ, Mercieca S, Vrtovec T

•papers•Sep 29 2025

Contouring organs-at-risk (OARs) is a critical yet time-consuming step in head and neck (HaN) radiotherapy planning. Auto-segmentation methods have been widely studied, and commercial solutions are increasingly entering clinical use. However, their adoption warrants a comprehensive, multi-perspective evaluation. The purpose of this study is to compare three commercial artificial intelligence (AI) software solutions (Limbus, MIM and MVision) for HaN OAR auto-segmentation on a cohort of 10 computed tomography images with reference contours obtained from the public HaN-Seg dataset, from both observational (descriptive and empirical) and analytical (geometric, dosimetric and psychometric) perspectives. The observational evaluation included vendor questionnaires on technical specifications and radiographer feedback on usability. The analytical evaluation covered geometric (Dice similarity coefficient, DSC, and 95th percentile Hausdorff distance, HD95), dosimetric (dose constraint compliance, OAR priority-based analysis), and psychometric (5-point Likert scale) assessments. All software solutions covered a broad range of OARs. Overall geometric performance differences were relatively small (Limbus: 69.7% DSC, 5.0 mm HD95; MIM: 69.2% DSC, 5.6 mm HD95; MVision: 66.7% DSC, 5.3 mm HD95); however, statistically significant differences were observed for smaller structures such as the cochleae, optic chiasm, and pituitary and thyroid glands. Differences in dosimetric compliance were overall minor, with the lowest compliance observed for the oral cavity and submandibular glands. In terms of qualitative assessment, radiographers gave the highest average Likert rating to Limbus (3.9), followed by MVision (3.7) and MIM (3.5). With few exceptions, most software solutions produced good-quality AI-generated contours (Likert ratings ≥ 3), yet some editing should still be performed to reach clinical acceptability. Notable discrepancies were seen for the optic chiasm and in cases affected by mouth bites or dental artifacts. Importantly, no clear relationship emerged between geometric, dosimetric, and psychometric metrics, underscoring the need for a multi-perspective evaluation without shortcuts.
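For readers unfamiliar with the two geometric metrics named above, the following is a minimal sketch of how DSC and HD95 can be computed. It is not the study's evaluation code. It assumes each contour is given as a set of voxel coordinates, uses all voxels rather than surface voxels only, takes the larger of the two directed 95th percentile distances (conventions differ between toolkits), and uses a nearest-rank percentile. The function names and the toy masks are hypothetical.

```python
import math


def dice(a, b):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty masks are treated as a perfect match
    return 2 * len(a & b) / (len(a) + len(b))


def _directed_distances(src, dst):
    """Distance from every point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def _percentile(values, pct):
    """Nearest-rank percentile (an assumed convention)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def hd95(a, b):
    """95th percentile Hausdorff distance, symmetric via max of both directions."""
    a, b = list(a), list(b)
    return max(
        _percentile(_directed_distances(a, b), 95),
        _percentile(_directed_distances(b, a), 95),
    )


# Toy 2D example: two 2x2 squares shifted by one voxel (made-up data).
reference = {(0, 0), (0, 1), (1, 0), (1, 1)}
predicted = {(0, 1), (1, 1), (0, 2), (1, 2)}
print(dice(reference, predicted), hd95(reference, predicted))
```

Coordinates here are in voxel units; reporting HD95 in millimetres, as the abstract does, would additionally require scaling by the voxel spacing.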

CT Segmentation Neurological Retrospective Clinical Clinical Pilot Startup

Elemental composition analysis of calcium-based urinary stones via laser-induced breakdown spectroscopy for enhanced clinical insights.

Xie H, Huang J, Wang R, Ma X, Xie L, Zhang H, Li J, Liu C

•papers•Sep 29 2025

The purpose of this study was to profile the elemental composition of calcium-based urinary stones using laser-induced breakdown spectroscopy (LIBS) and develop a machine learning model to distinguish recurrence-associated profiles by integrating elemental and clinical data. A total of 122 calcium-based stones (41 calcium oxalate, 11 calcium phosphate, 49 calcium oxalate/calcium phosphate, 8 calcium oxalate/uric acid, 13 calcium phosphate/struvite) were analyzed via LIBS. Elemental intensity ratios (H/Ca, P/Ca, Mg/Ca, Sr/Ca, Na/Ca, K/Ca) were calculated using Ca (396.847 nm) as reference. Clinical variables (demographics, laboratory and imaging results, recurrence status) were retrospectively collected. A back propagation neural network (BPNN) model was trained using four data strategies: clinical-only, spectral principal components (PCs), combined PCs plus clinical, and merged raw spectral plus clinical data. The performance of these four models was evaluated. Sixteen stone samples from other medical centers were used as an external validation set. Mg and Sr were detected in most stones. Significant correlations existed among P, Mg, Sr, and K ratios. Recurrent patients showed elevated elemental ratios (p < 0.01), higher urine pH (p < 0.01), and lower stone CT density (p = 0.044). The BPNN model with merged spectral plus clinical data achieved optimal performance in classification (test set accuracy: 94.37%), significantly outperforming clinical-only models (test set accuracy: 73.37%). The results of external validation indicate that the model has good generalization ability. LIBS reveals ubiquitous Mg and Sr in calcium-based stones and elevated elemental ratios in recurrent cases. Integration of elemental profiles with clinical data enables high-accuracy classification of recurrence-associated profiles, providing insights for potential risk stratification in urolithiasis management.
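The ratio step described above (each element's line intensity divided by the Ca reference line) can be sketched as follows. This is an illustration only: the function name is hypothetical, the intensity values are made up, and the abstract does not say how line intensities were extracted or background-corrected.

```python
def ratios_to_reference(intensities, reference="Ca"):
    """Divide each element's line intensity by the reference element's intensity."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference intensity must be positive")
    return {
        f"{element}/{reference}": value / ref
        for element, value in intensities.items()
        if element != reference
    }


# Made-up intensities in arbitrary units, not data from the study.
example_intensities = {"Ca": 2000.0, "P": 400.0, "Mg": 100.0, "Sr": 50.0}
example_ratios = ratios_to_reference(example_intensities)
print(example_ratios)
```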

Mixed Modality Classification Abdominal Retrospective Clinical In Silico Academic Lab

Mixed prototype correction for causal inference in medical image classification.

Hong ZL, Yang JC, Peng XR, Wu SS

•papers•Sep 29 2025

The heterogeneity of medical images poses significant challenges to accurate disease diagnosis. To tackle this issue, the impact of such heterogeneity on the causal relationship between image features and diagnostic labels should be incorporated into model design, which, however, remains underexplored. In this paper, we propose a mixed prototype correction for causal inference (MPCCI) method, aimed at mitigating the impact of unseen confounding factors on the causal relationships between medical images and disease labels, so as to enhance the diagnostic accuracy of deep learning models. The MPCCI comprises a causal inference component based on front-door adjustment and an adaptive training strategy. The causal inference component employs a multi-view feature extraction (MVFE) module to establish mediators, and a mixed prototype correction (MPC) module to execute causal interventions. Moreover, the adaptive training strategy incorporates both information purity and maturity metrics to maintain stable model training. Experimental evaluations on four medical image datasets, encompassing CT and ultrasound modalities, demonstrate the superior diagnostic accuracy and reliability of the proposed MPCCI. The code will be available at https://github.com/Yajie-Zhang/MPCCI.
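For reference, the standard front-door adjustment formula that the causal inference component builds on is given below. The symbols are assumptions for illustration: X stands for the image, Y for the disease label, and M for the mediator (which the abstract attributes to the MVFE module). How the MPC module approximates the inner sum is not stated in the abstract and is not shown here.

```latex
P\bigl(Y = y \mid \mathrm{do}(X = x)\bigr)
  = \sum_{m} P(M = m \mid X = x)
    \sum_{x'} P(Y = y \mid M = m, X = x')\, P(X = x')
```

The outer sum propagates the effect of X through the mediator, while the inner sum averages over the image distribution to block the path through the unobserved confounder.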

Mixed Modality Classification Methodology In Silico Academic Lab Open Code

Readability versus accuracy in LLM-transformed radiology reports: stakeholder preferences across reading grade levels.

Lee HS, Kim S, Kim S, Seo J, Kim WH, Kim J, Han K, Hwang SH, Lee YH

•papers•Sep 29 2025

To examine how reading grade levels affect stakeholder preferences based on a trade-off between accuracy and readability. A retrospective study of 500 radiology reports from academic and community hospitals across five imaging modalities was conducted. Reports were transformed into 11 reading grade levels (7-17) using Gemini. Accuracy, readability, and preference were rated on a 5-point scale by radiologists, physicians, and laypersons. Errors (generalizations, omissions, hallucinations) and potential changes in patient management (PCPM) were identified. Ordinal logistic regression analyzed preference predictors, and weighted kappa measured interobserver reliability. Preferences varied across reading grade levels depending on stakeholder group, modality, and clinical setting. Overall, preferences peaked at grade 16 but declined at grade 17, particularly among laypersons. Lower reading grades improved readability but increased errors, while higher grades improved accuracy but reduced readability. In multivariable analysis, accuracy was the strongest predictor of preference for all groups (OR: 30.29, 33.05, and 2.16; p < 0.001), followed by readability (OR: 2.73, 1.70, 2.01; p < 0.001). Higher-grade levels were generally preferred due to better accuracy, with a range of 12-17. Further increasing grade levels reduced readability sharply, limiting preference. These findings highlight the limitations of unsupervised LLM transformations and suggest the need for hybrid approaches that maintain original reports while incorporating explanatory content to balance accuracy and readability.
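Weighted kappa, used above for interobserver reliability, penalizes disagreements on an ordinal scale in proportion to how far apart the two ratings are. The sketch below is a plain implementation under stated assumptions: the abstract does not say whether linear or quadratic weights were used, so the weighting is a parameter (`power=1` for linear, `power=2` for quadratic), and the function name and example ratings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories, power=1):
    """Weighted Cohen's kappa for two raters on an ordered scale.

    power=1 gives linear weights, power=2 quadratic weights.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (rating by A, rating by B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power  # disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]
    return 1.0 - weighted_observed / weighted_expected


# Made-up 5-point ratings from two hypothetical readers.
scale = [1, 2, 3, 4, 5]
reader_1 = [5, 4, 4, 3, 2, 5, 1, 3]
reader_2 = [5, 4, 3, 3, 2, 4, 1, 3]
print(weighted_kappa(reader_1, reader_2, scale))
```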

Mixed Modality LLM Radiology Report Retrospective Clinical In Silico Academic Lab GenAI

Classification of anterior cruciate ligament tears in knee magnetic resonance images using pre-trained model and custom model.

Thangaperumal S, Murugan PR, Hossen J, Wong WK, Ng PK

•papers•Sep 29 2025

An anterior cruciate ligament (ACL) tear is a prevalent knee injury among athletes, and older people with osteoporosis are at increased risk for it. For early detection and treatment, precise and rapid identification of ACL tears is important. A fully automated system that can identify ACL tears is necessary to aid healthcare providers in determining the nature of injuries detected on magnetic resonance imaging (MRI) scans. Two convolutional neural networks (CNNs), a pre-trained model and the CustomNet model, are trained and tested using 581 MRI scans of the knee. Feature extraction is done with the pre-trained ResNet-18 model, and the ISOMAP algorithm is used in the CustomNet model. Linear and nonlinear dimensionality reduction techniques are employed to extract the needed features from the image. For the ResNet-18 model, the accuracy rate ranges between 86% and 92% for various data partitions. After performing PCA, the improved classification rate ranges between 92% and 96.2%. The CustomNet model's accuracy rate falls in the ranges 40-70%, 70-90%, 60-70%, and 50-70% for different hyperparameter ensembles. Five-fold cross-validation is implemented in CustomNet, and it achieved an overall accuracy of 85.6%. These two models demonstrate superior efficiency and accuracy in classifying normal and ACL-torn knee MR images.

MRI Classification Musculoskeletal Retrospective Clinical In Silico Academic Lab

Clinical and MRI markers for acute vs chronic temporomandibular disorders using a machine learning and deep neural networks.

Lee YH, Jeon S, Kim DH, Auh QS, Lee JH, Noh YK

•papers•Sep 29 2025

Exploring the transition from acute to chronic temporomandibular disorders (TMD) remains challenging due to the multifactorial nature of the disease. This study aims to identify clinical, behavioral, and imaging-based predictors that contribute to symptom chronicity in patients with TMD. We enrolled 239 patients with TMD (161 women, 78 men; mean age 35.60 ± 17.93 years), classified as acute (< 6 months) or chronic (≥ 6 months) based on symptom duration. TMD was diagnosed according to the Diagnostic Criteria for TMD (DC/TMD Axis I). Clinical data, sleep-related variables, and temporomandibular joint (TMJ) magnetic resonance imaging (MRI) were collected. MRI assessments included anterior disc displacement (ADD), joint space narrowing, osteoarthritis, and effusion using 3 T T2-weighted and proton density scans. Predictors were evaluated using logistic regression and deep neural networks (DNN), and performance was compared. Chronic TMD is observed in 51.05% of patients. Compared to acute cases, chronic TMD is more frequently associated with TMJ noise (70.5%), bruxism (31.1%), and higher pain intensity (VAS: 4.82 ± 2.47). Patients with chronic TMD also have shorter sleep and higher STOP-Bang scores, indicating greater risk of obstructive sleep apnea. MRI findings reveal increased prevalence of ADD (86.9%), TMJ osteoarthritis (TMJ-OA, 82.0%), and joint space narrowing (88.5%) in chronic TMD. Logistic regression achieves an AUROC of 0.7550 (95% CI: 0.6550-0.8550), identifying TMJ noise, bruxism, VAS, sleep disturbance, STOP-Bang ≥ 5, ADD, and joint space narrowing as significant predictors. The DNN model improves accuracy to 79.49% compared to 75.50%, though the difference is not statistically significant (p = 0.3067). Behavioral and TMJ-related structural factors are key predictors of chronic TMD and may aid early identification. Timely recognition may support personalized strategies and improve outcomes.
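The AUROC reported above can be read as the probability that a randomly chosen chronic case receives a higher predicted score than a randomly chosen acute case. A minimal sketch of that pairwise definition follows; the function name, labels, and scores are made up for illustration and are not taken from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Made-up example: 1 = chronic, 0 = acute, scores are predicted probabilities.
example_labels = [0, 0, 1, 1]
example_scores = [0.10, 0.40, 0.35, 0.80]
print(auroc(example_labels, example_scores))
```

The pairwise loop is quadratic in the number of cases, which is fine for a cohort of a few hundred patients; rank-based formulations give the same value more efficiently.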

MRI Classification Retrospective Clinical In Silico Academic Lab

An efficient deep learning network for brain stroke detection using salp shuffled shepherded optimization.

Xue X, Viswapriya SE, Rajeswari D, Homod RZ, Khalaf OI

•papers•Sep 29 2025

Brain strokes (BS) are potentially life-threatening cerebrovascular conditions and the second highest contributor to mortality. They include hemorrhagic and ischemic strokes, which vary greatly in size, shape, and location, posing significant challenges for automated identification. Brain magnetic resonance imaging (MRI) using diffusion-weighted imaging (DWI) shows fluid balance changes very early. Due to their higher sensitivity, MRI scans are more accurate than computed tomography (CT) scans. This work proposes Salp Shuffled Shepherded EfficientNet (S3ET-NET), a new deep learning model for detecting brain stroke from brain MRI. The MRI images are pre-processed by a Gaussian bilateral (GB) filter to reduce the noise distortion in the input images. The Ghost Net model derives suitable features from the pre-processed images. The optimal features are then selected from the extracted features by applying the Salp Shuffled Shepherded Optimization (S3O) algorithm. The Efficient Net model is utilized for classifying cases as normal, ischemic stroke (IS), or hemorrhagic stroke (HS). According to the results, the proposed S3ET-NET attains a 99.41% reliability rate. In contrast to Link Net, Mobile Net, and Google Net, the proposed Ghost Net improves detection accuracy by 1.16, 1.94, and 3.14%, respectively. The suggested Efficient Net outperforms ResNet50, zNet-mRMR-NB, and DNN in accuracy, improving by 3.20, 5.22, and 4.21%, respectively.

MRI Classification Neurological Methodology In Silico Academic Lab

Clinical application of deep learning for enhanced multistage caries detection in panoramic radiographs.

Pornprasertsuk-Damrongsri S, Vachmanus S, Papasratorn D, Kitisubkanchana J, Chaikantha S, Arayasantiparb R, Mongkolwat P

•papers•Sep 29 2025

Dental caries are often overlooked on panoramic radiographs. This study aims to leverage deep learning to identify multistage caries on panoramic radiographs. The panoramic radiographs were confirmed with the gold standard bitewing radiographs to create a reliable ground truth. The dataset of 500 panoramic radiographs with corresponding bitewing confirmations was labelled by an experienced and calibrated radiologist for 1,792 caries from 14,997 teeth. The annotations were stored using the annotation and image markup standard to ensure consistency and reliability. The deep learning system employed a two-model approach: YOLOv5 for tooth detection and Attention U-Net for segmenting caries. The system achieved impressive results, demonstrating strong agreement with dentists for both caries counts and classifications (enamel, dentine, and pulp). However, some discrepancies exist, particularly in underestimating enamel caries. While the model occasionally overpredicts caries in healthy teeth (false positives), it prioritizes minimizing missed lesions (false negatives), achieving a high recall of 0.96. Overall performance surpasses previously reported values, with an F1-score of 0.85 and an accuracy of 0.93 for caries segmentation in posterior teeth. The deep learning approach demonstrates promising potential to aid dentists in caries diagnosis, treatment planning, and dental education.
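The trade-off described above (accepting some false positives to keep false negatives low) shows up directly in how recall, precision, and F1 are computed from confusion counts. The sketch below uses made-up counts, not the study's data, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Recall, precision, F1, and accuracy from confusion-matrix counts."""
    recall = tp / (tp + fn)        # share of true lesions that were found
    precision = tp / (tp + fp)     # share of predicted lesions that were real
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Made-up counts: few missed lesions (fn) at the cost of more false alarms (fp).
example_metrics = classification_metrics(tp=96, fp=30, fn=4, tn=170)
print(example_metrics)
```

With these illustrative counts, recall is high while precision is lower, so F1 sits between the two.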

X-Ray Segmentation Retrospective Clinical In Silico Academic Lab

A deep learning algorithm for automatic 3D segmentation and quantification of hamstrings musculotendon injury from MRI.

Riem L, DuCharme O, Coggins A, Kenney A, Cousins M, Feng X, Hein R, Buford M, Lee K, Opar D, Heiderscheit B, Blemker SS

•papers•Sep 29 2025

In high-velocity sports, hamstring strain injuries are common causes of missed play and have high rates of reinjury. Evaluating the severity and location of a hamstring strain injury, currently graded by a clinician using a semiqualitative muscle injury classification score (e.g., the British Athletics Muscle Injury Classification, BAMIC) to describe edema presence and location, aids in guiding athlete recovery. In this study, automated artificial intelligence (AI) models were developed and deployed to automatically segment edema and hamstring muscle and tendon structures using T2-weighted and T1-weighted magnetic resonance images (MRI), respectively. MR scans were collected from collegiate football athletes at time of hamstring injury and at return to sport. Volume, length, and cross-sectional area (CSA) measurements were performed on all structures and subregions (i.e., free tendon and aponeurosis). The edema and hamstring muscle/tendon AI models compared favorably with ground-truth segmentations. AI volumetric output correlated with ground truth for edema (R = 0.97), hamstring muscles (R ≥ 0.99), and hamstring tendon (R ≥ 0.42) structures. Edema volume and percentage of muscle impacted by edema significantly increased with clinical BAMIC grade (p < 0.05). Taken together, these results demonstrate a promising new approach for AI-based quantification of edema which reflects differing levels of injury severity and supports clinical validity.

MRI Segmentation Musculoskeletal Retrospective Clinical In Silico Academic Lab

Automated deep U-Net model for ischemic stroke lesion segmentation in the sub-acute phase.

E R, Bevi AR

•papers•Sep 29 2025

Manual segmentation of sub-acute ischemic stroke lesions in fluid-attenuated inversion recovery magnetic resonance imaging (FLAIR MRI) is time-consuming and subject to inter-observer variability, limiting clinical workflow efficiency. To develop and validate an automated deep learning framework for accurate segmentation of sub-acute ischemic stroke lesions in FLAIR MRI using rigorous validation methodology. We propose a novel multi-path residual U-Net (U-shaped network) architecture with six parallel pathways per block (depths 0-5 convolutional layers) and 2.34 million trainable parameters. Hyperparameters were systematically optimized using 5-fold cross-validation across 60 configurations. We addressed intensity inhomogeneity using N4 bias field correction and employed strict patient-level data partitioning (18 training, 5 validation, 5 test patients) to prevent data leakage. Statistical analysis utilized bias-corrected bootstrap confidence intervals and Bonferroni correction for multiple comparisons. Our model achieved a validation Dice similarity coefficient (DSC) of 0.85 ± 0.12 (95% CI: 0.79-0.91), a sensitivity of 0.82 ± 0.15, a specificity of 0.95 ± 0.04, and a Hausdorff distance of 14.1 ± 5.8 mm. Test set performance remained consistent (DSC: 0.89 ± 0.07), confirming generalizability. Computational efficiency was demonstrated with 45 ms inference time per slice. The architecture demonstrated statistically significant improvements over DRANet (p = 0.003), 2D CNN (p = 0.001), and Attention U-Net (p = 0.001), while achieving competitive performance comparable to CSNet (p = 0.68). The proposed framework demonstrates robust performance for automated stroke lesion segmentation with rigorous statistical validation. However, multi-site validation across diverse clinical environments remains essential before clinical implementation.
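The two statistical tools named above can be sketched in a few lines. Two caveats apply. First, the abstract reports a bias-corrected bootstrap; the sketch below swaps in the simpler percentile bootstrap, which omits the bias correction. Second, the abstract does not say whether the reported p-values are already corrected or how many comparisons were in the family, so applying Bonferroni across the four model comparisons is an assumption made for illustration. The per-patient DSC values and function names are hypothetical.

```python
import random
import statistics


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def percentile_bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (no bias correction)."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# p-values as listed in the abstract; treating them as one family of four
# uncorrected tests is an assumption.
adjusted = bonferroni([0.003, 0.001, 0.001, 0.68])

# Made-up per-patient DSC values for a five-patient set.
example_dsc = [0.91, 0.85, 0.93, 0.80, 0.96]
ci_low, ci_high = percentile_bootstrap_ci(example_dsc)
print(adjusted, ci_low, ci_high)
```

With only five patients per evaluation set, any bootstrap interval is coarse, because resamples can only recombine those five values.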

MRI Segmentation Neurological Retrospective Clinical In Silico
