Page 11 of 19190 results

Managing class imbalance in the training of a large language model to predict patient selection for total knee arthroplasty: Results from the Artificial intelligence to Revolutionise the patient Care pathway in Hip and knEe aRthroplastY (ARCHERY) project.

Farrow L, Anderson L, Zhong M

PubMed | papers | Jun 1 2025
This study set out to test the efficacy of different techniques used to manage class imbalance, a type of data bias, in the application of a large language model (LLM) to predict patient selection for total knee arthroplasty (TKA). This study utilised data from the Artificial Intelligence to Revolutionise the Patient Care Pathway in Hip and Knee Arthroplasty (ARCHERY) project (ISRCTN18398037). Data included the pre-operative radiology reports of patients referred to secondary care for knee-related complaints from within the North of Scotland. A clinically based LLM (GatorTron) was trained to predict selection for TKA. Three methods for managing class imbalance were assessed: a standard model, use of class weighting, and majority class undersampling. A total of 7707 individual knee radiology reports were included (dated from 2015 to 2022). The mean text length was 74 words (range 26-275). Only 910/7707 (11.8%) patients underwent TKA surgery (the designated 'minority class'). The class weighting technique performed better for minority class discrimination and calibration than the other two techniques (recall 0.61/AUROC 0.73 for class weighting compared with 0.54/0.70 and 0.59/0.72 for the standard model and majority class undersampling, respectively). There was also significant data loss for majority class undersampling when compared with class weighting. Use of class weighting appears to provide the optimal method of training an LLM to perform analytical tasks on free-text clinical information in the face of significant data bias ('class imbalance'). Such knowledge is an important consideration in the development of high-performance clinical AI models within Trauma and Orthopaedics.
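The abstract does not say how the class weights were computed. As a minimal sketch, assuming the common inverse-frequency ("balanced") heuristic, the class counts reported above would give the weights below. The function name and the labels `tka` / `no_tka` are hypothetical, and the undersampling arithmetic assumes the majority class is cut down to the size of the minority class.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count).

    This is an assumed weighting scheme for illustration, not the
    formula confirmed by the study.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Class counts taken from the abstract: 910 of 7707 reports led to TKA.
counts = {"no_tka": 7707 - 910, "tka": 910}
weights = balanced_class_weights(counts)

# Assumed 1:1 undersampling: keep every minority report plus an equal
# number of majority reports, and discard the remainder.
kept = 2 * counts["tka"]
discarded = sum(counts.values()) - kept

print(weights)
print(f"undersampling keeps {kept} reports and discards {discarded}")
```

Under this heuristic each class contributes the same total weight to the loss, so no reports are dropped, whereas 1:1 undersampling would discard most of the majority-class reports.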

Generative artificial intelligence enables the generation of bone scintigraphy images and improves generalization of deep learning models in data-constrained environments.

Haberl D, Ning J, Kluge K, Kumpf K, Yu J, Jiang Z, Constantino C, Monaci A, Starace M, Haug AR, Calabretta R, Camoni L, Bertagna F, Mascherbauer K, Hofer F, Albano D, Sciagra R, Oliveira F, Costa D, Nitsche C, Hacker M, Spielvogel CP

PubMed | papers | Jun 1 2025
Advancements of deep learning in medical imaging are often constrained by the limited availability of large, annotated datasets, resulting in underperforming models when deployed under real-world conditions. This study investigated a generative artificial intelligence (AI) approach to create synthetic medical images, taking the example of bone scintigraphy scans, to increase the data diversity of small-scale datasets for more effective model training and improved generalization. We trained a generative model on ⁹⁹ᵐTc-bone scintigraphy scans from 9,170 patients in one center to generate high-quality and fully anonymized annotated scans of patients representing two distinct disease patterns: (i) abnormal uptake indicative of bone metastases and (ii) cardiac uptake indicative of cardiac amyloidosis. A blinded reader study was performed to assess the clinical validity and quality of the generated data. We investigated the added value of the generated data by augmenting an independent small single-center dataset with synthetic data and by training a deep learning model to detect abnormal uptake in a downstream classification task. We tested this model on 7,472 scans from 6,448 patients across four external sites in a cross-tracer and cross-scanner setting and associated the resulting model predictions with clinical outcomes. The clinical value and high quality of the synthetic imaging data were confirmed by four readers, who were unable to distinguish synthetic scans from real scans (average accuracy: 0.48% [95% CI 0.46-0.51]), disagreeing in 239 (60%) of 400 cases (Fleiss' kappa: 0.18). Adding synthetic data to the training set improved model performance by a mean (± SD) of 33 (± 10)% AUC (p < 0.0001) for detecting abnormal uptake indicative of bone metastases and by 5 (± 4)% AUC (p < 0.0001) for detecting uptake indicative of cardiac amyloidosis across both internal and external testing cohorts, compared to models without synthetic training data. Patients with predicted abnormal uptake had adverse clinical outcomes (log-rank: p < 0.0001). Generative AI enables the targeted generation of bone scintigraphy images representing different clinical conditions. Our findings point to the potential of synthetic data to overcome challenges in data sharing and in developing reliable and prognostic deep learning models in data-limited environments.

Comparative diagnostic accuracy of ChatGPT-4 and machine learning in differentiating spinal tuberculosis and spinal tumors.

Hu X, Xu D, Zhang H, Tang M, Gao Q

PubMed | papers | Jun 1 2025
In clinical practice, distinguishing between spinal tuberculosis (STB) and spinal tumors (ST) poses a significant diagnostic challenge. The application of AI-driven large language models (LLMs) shows great potential for improving the accuracy of this differential diagnosis. This retrospective cohort study evaluated the performance of various machine learning models and ChatGPT-4 in distinguishing between STB and ST. A total of 143 STB cases and 153 ST cases admitted to Xiangya Hospital, Central South University, from January 2016 to June 2023 were collected. The study incorporated basic patient information, standard laboratory results, serum tumor markers, and comprehensive imaging records, including magnetic resonance imaging (MRI) and computed tomography (CT), for individuals diagnosed with STB and ST. Six distinct machine learning models and ChatGPT-4 were evaluated separately for their differential diagnostic effectiveness. Among the 6 machine learning models, the Gradient Boosting Machine (GBM) algorithm model demonstrated the highest differential diagnostic efficiency. In the training cohort, the GBM model achieved a sensitivity of 98.84% and a specificity of 100.00% in distinguishing STB from ST. In the testing cohort, its sensitivity was 98.25%, and specificity was 91.80%. ChatGPT-4 exhibited a sensitivity of 70.37% and a specificity of 90.65% for differential diagnosis. In single-question cases, ChatGPT-4's sensitivity and specificity were 71.67% and 92.55%, respectively, while in re-questioning cases, they were 44.44% and 76.92%. The GBM model demonstrates significant value in the differential diagnosis of STB and ST, whereas the diagnostic performance of ChatGPT-4 remains suboptimal.
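For readers less familiar with the metrics quoted above, the following sketch shows how sensitivity and specificity follow from confusion-matrix counts. The counts are hypothetical: the abstract does not report them, and they were chosen only because they happen to reproduce the testing-cohort percentages. Treating STB as the positive class is also an assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true positives that were detected
    specificity = tn / (tn + fp)  # share of true negatives correctly rejected
    return sensitivity, specificity


# Hypothetical counts (not from the paper), with STB assumed positive.
sens, spec = sensitivity_specificity(tp=56, fn=1, tn=56, fp=5)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")
```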

Age-dependent changes in CT vertebral attenuation values in opportunistic screening for osteoporosis: a nationwide multi-center study.

Kim Y, Kim HY, Lee S, Hong S, Lee JW

PubMed | papers | Jun 1 2025
This study aimed to examine how vertebral attenuation changes with aging and to establish age-adjusted CT attenuation value cutoffs for diagnosing osteoporosis. This multi-center retrospective study included 11,246 patients (mean age ± standard deviation, 50 ± 13 years; 7139 men) who underwent CT and dual-energy X-ray absorptiometry (DXA) in six health-screening centers between 2022 and 2023. Using deep-learning-based software, attenuation values of L1 vertebral bodies were measured. Segmented linear regression in women and simple linear regression in men were used to assess how attenuation values change with aging. A multivariable linear regression analysis was performed to determine whether age is associated with CT attenuation values independently of the DXA T-score. Age-adjusted cutoffs targeting either 90% sensitivity or 90% specificity were derived using quantile regression. Performance of both age-adjusted and age-unadjusted cutoffs was measured, where the target sensitivity or specificity was considered achieved if the 95% confidence interval encompassed 90%. While attenuation values declined consistently with age in men, they declined abruptly in women aged > 42 years. This decline occurred independently of the DXA T-score (p < 0.001). Age adjustment seemed critical for age ≥ 65 years, where the age-adjusted cutoffs achieved the target (sensitivity of 91.5% (86.3-95.2%) when targeting 90% sensitivity and specificity of 90.0% (88.3-91.6%) when targeting 90% specificity), but age-unadjusted cutoffs did not (95.5% (91.2-98.0%) and 73.8% (71.4-76.1%), respectively). Age-adjusted cutoffs provided a more reliable diagnosis of osteoporosis than age-unadjusted cutoffs since vertebral attenuation values decrease with age, regardless of DXA T-scores.
Question: How does vertebral CT attenuation change with age?
Findings: Independent of dual-energy X-ray absorptiometry T-score, vertebral attenuation values on CT declined at a constant rate in men and abruptly in women over 42 years of age.
Clinical relevance: Age adjustments are needed in opportunistic osteoporosis screening, especially among the elderly.

The role of deep learning in diagnostic imaging of spondyloarthropathies: a systematic review.

Omar M, Watad A, McGonagle D, Soffer S, Glicksberg BS, Nadkarni GN, Klang E

PubMed | papers | Jun 1 2025
Diagnostic imaging is an integral part of identifying spondyloarthropathies (SpA), yet the interpretation of these images can be challenging. This review evaluated the use of deep learning models to enhance the diagnostic accuracy of SpA imaging. Following PRISMA guidelines, we systematically searched major databases up to February 2024, focusing on studies that applied deep learning to SpA imaging. Performance metrics, model types, and diagnostic tasks were extracted and analyzed. Study quality was assessed using QUADAS-2. We analyzed 21 studies employing deep learning in SpA imaging diagnosis across MRI, CT, and X-ray modalities. These models, particularly advanced CNNs and U-Nets, demonstrated high accuracy in diagnosing SpA, differentiating arthritis forms, and assessing disease progression. Performance metrics frequently surpassed traditional methods, with some models achieving AUCs up to 0.98 and matching expert radiologist performance. This systematic review underscores the effectiveness of deep learning in SpA imaging diagnostics across MRI, CT, and X-ray modalities. The studies reviewed demonstrated high diagnostic accuracy. However, the presence of small sample sizes in some studies highlights the need for more extensive datasets and further prospective and external validation to enhance the generalizability of these AI models.
Question: How can deep learning models improve diagnostic accuracy in imaging for spondyloarthropathies (SpA), addressing challenges in early detection and differentiation from other forms of arthritis?
Findings: Deep learning models, especially CNNs and U-Nets, showed high accuracy in SpA imaging across MRI, CT, and X-ray, often matching or surpassing expert radiologists.
Clinical relevance: Deep learning models can enhance diagnostic precision in SpA imaging, potentially reducing diagnostic delays and improving treatment decisions, but further validation on larger datasets is required for clinical integration.

Automatic 3-dimensional analysis of posterosuperior full-thickness rotator cuff tear size on magnetic resonance imaging.

Hess H, Gussarow P, Rojas JT, Zumstein MA, Gerber K

PubMed | papers | Jun 1 2025
Tear size and shape are known to prognosticate the efficacy of surgical rotator cuff (RC) repair; however, current manual measurements on magnetic resonance images (MRIs) exhibit high interobserver variabilities and exclude 3-dimensional (3D) morphologic information. This study aimed to develop algorithms for automatic 3D analyses of posterosuperior full-thickness RC tears to enable efficient and precise tear evaluation and 3D tear visualization. A deep-learning network for automatic segmentation of the tear region in coronal and sagittal multicenter MRI was trained with manually segmented (consensus of 3 experts) proton density- and T2-weighted MRI of shoulders with full-thickness posterosuperior tears (n = 200). Algorithms for automatic measurement of tendon retraction, tear width, tear area, and automatic Patte classification considering the 3D morphology of the shoulder were implemented and evaluated against manual segmentation (n = 59). Automatic Patte classification was calculated using automatically segmented humerus and scapula on T1-weighted MRI of the same shoulders. Tears were automatically segmented, enabling 3D visualization of the tear, with a mean Dice coefficient of 0.58 ± 0.21 compared to an interobserver variability of 0.46 ± 0.21. The mean absolute errors of automatic tendon retraction and tear width measurements (4.98 ± 4.49 mm and 3.88 ± 3.18 mm) were lower than the interobserver variabilities (5.42 ± 7.09 mm and 5.92 ± 1.02 mm). The correlations of all measurements performed on automatic tear segmentations compared with those on consensus segmentations were higher than the interobserver correlation. Automatic Patte classification achieved a Cohen kappa value of 0.62, compared with the interobserver variability of 0.56. Retraction calculated using standard linear measures underestimated the tear size relative to measurements considering the curved shape of the humeral head, especially for larger tears. Even on highly heterogeneous data, the proposed algorithms demonstrated the feasibility of automating tear size analysis and of enabling automatic 3D visualization of the tear situation. The presented algorithms standardize cross-center tear analyses and enable the calculation of additional metrics, potentially improving the predictive power of image-based tear measurements for the outcome of surgical treatments, thus aiding in RC tear diagnosis, treatment decision, and planning.
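The Dice coefficient quoted above measures the overlap between two segmentations as twice the shared volume divided by the sum of both volumes. A minimal sketch on flattened binary masks is given below; the toy masks and the function name are illustrative only and are unrelated to the study's data or code.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap of two equally sized binary masks (flat sequences of 0/1)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * overlap / (size_a + size_b)


# Toy example: each mask marks 3 voxels and they share 2 of them.
automatic = [0, 1, 1, 1, 0, 0]
manual = [0, 0, 1, 1, 1, 0]
dice = dice_coefficient(automatic, manual)
print(round(dice, 3))
```

A value of 1 means identical masks and 0 means no overlap, which is why the score is sensitive to small boundary disagreements on small structures such as tendon tears.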

Regions of interest in opportunistic computed tomography-based screening for osteoporosis: impact on short-term in vivo precision.

Park J, Kim Y, Hong S, Chee CG, Lee E, Lee JW

PubMed | papers | Jun 1 2025
To determine an optimal region of interest (ROI) for opportunistic screening of osteoporosis in terms of short-term in vivo diagnostic precision. We included patients who underwent two CT scans and one dual-energy X-ray absorptiometry scan within a month in 2022. Deep-learning software automatically measured the attenuation in L1 using 54 ROIs (three slice thicknesses × six shapes × three intravertebral levels). To identify factors associated with a lower attenuation difference between the two CT scans, mixed-effect model analysis was performed with ROI-level (slice thickness, shape, intravertebral levels) and patient-level (age, sex, patient diameter, change in CT machine) factors. The root-mean-square standard deviation (RMSSD) and area under the receiver-operating-characteristic curve (AUROC) were calculated. In total, 73 consecutive patients (mean age ± standard deviation, 69 ± 9 years, 38 women) were included. A lower attenuation difference was observed in ROIs in images with slice thicknesses of 1 and 3 mm than that in images with a slice thickness of 5 mm (p < .001), in large elliptical ROIs (p = .007 or < .001, respectively), and in mid- or cranial-level ROIs than that in caudal-level ROIs (p < .001). No patient-level factors were significantly associated with the attenuation difference. Large, elliptical ROIs placed at the mid-level of L1 on images with 1- or 3-mm slice thicknesses yielded RMSSDs of 12.4-12.5 HU and AUROCs of 0.90. The largest possible regions of interest drawn in the mid-level trabecular portion of the L1 vertebra on thin-slice images may yield improvements in the precision of opportunistic screening for osteoporosis via CT.
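The root-mean-square standard deviation used above as the precision metric pools the per-patient standard deviations of repeated measurements. As a rough sketch, assuming the usual short-term precision definition (square root of the mean within-patient variance), it could be computed as follows. The attenuation pairs are made-up values in HU, not study data.

```python
import math
import statistics


def rms_sd(repeated_measurements):
    """Root-mean-square SD across subjects.

    repeated_measurements: one sequence of repeat values per subject
    (at least two values each).
    """
    variances = [statistics.variance(values) for values in repeated_measurements]
    return math.sqrt(sum(variances) / len(variances))


# Hypothetical L1 attenuation (HU) from two CT scans of three patients.
scan_pairs = [(100, 110), (150, 150), (80, 100)]
precision_error = rms_sd(scan_pairs)
print(round(precision_error, 2))
```

A lower value indicates that repeat scans of the same patient give closer attenuation readings for a given ROI configuration.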

Deep learning-enhanced zero echo time MRI for glenohumeral assessment in shoulder instability: a comparative study with CT.

Carretero-Gómez L, Fung M, Wiesinger F, Carl M, McKinnon G, de Arcos J, Mandava S, Arauz S, Sánchez-Lacalle E, Nagrani S, López-Alcorocho JM, Rodríguez-Íñigo E, Malpica N, Padrón M

PubMed | papers | Jun 1 2025
To evaluate image quality and lesion conspicuity of zero echo time (ZTE) MRI reconstructed with a deep learning (DL)-based algorithm versus conventional reconstruction and to assess DL ZTE performance against CT for bone loss measurements in shoulder instability. Forty-four patients (9 females; 33.5 ± 15.65 years) with symptomatic anterior glenohumeral instability and no previous shoulder surgery underwent ZTE MRI and CT on the same day. ZTE images were reconstructed with conventional and DL methods and post-processed for CT-like contrast. Two musculoskeletal radiologists, blinded to the reconstruction method, independently evaluated 20 randomized MR ZTE datasets with and without DL-enhancement for perceived signal-to-noise ratio, resolution, and lesion conspicuity at humerus and glenoid using a 4-point Likert scale. Inter-reader reliability was assessed using weighted Cohen's kappa (K). An ordinal logistic regression model analyzed Likert scores, with the reconstruction method (DL-enhanced vs. conventional) as the predictor. Glenoid track (GT) and Hill-Sachs interval (HSI) measurements were performed by another radiologist on both DL ZTE and CT datasets. Intermodal agreement was assessed through intraclass correlation coefficients (ICCs) and Bland-Altman analysis. DL ZTE MR bone images scored higher than conventional ZTE across all items, with significantly improved perceived resolution (odds ratio (OR) = 7.67, p = 0.01) and glenoid lesion conspicuity (OR = 25.12, p = 0.01), with substantial inter-rater agreement (K = 0.61 (0.38-0.83) to 0.77 (0.58-0.95)). Inter-modality assessment showed almost perfect agreement between DL ZTE MR and CT for all bone measurements (overall ICC = 0.99 (0.97-0.99)), with mean differences of 0.08 (-0.80 to 0.96) mm for GT and -0.07 (-1.24 to 1.10) mm for HSI. DL-based reconstruction enhances ZTE MRI quality for glenohumeral assessment, offering osseous evaluation and quantification equivalent to gold-standard CT, potentially simplifying preoperative workflow, and reducing CT radiation exposure.
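The Bland-Altman analysis mentioned above summarises agreement between two methods as the mean of the paired differences (the bias) together with limits of agreement. The sketch below assumes the conventional 95% limits of bias ± 1.96 standard deviations. The paired measurements are invented for illustration and are not the study's glenoid track or Hill-Sachs values.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements."""
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(differences)
    spread = statistics.stdev(differences)  # sample SD of the differences
    return bias, bias - 1.96 * spread, bias + 1.96 * spread


# Hypothetical paired bone measurements in mm (MRI vs. CT).
mri_mm = [20.1, 18.4, 22.0, 19.5]
ct_mm = [20.0, 18.6, 21.7, 19.6]
bias, lower, upper = bland_altman(mri_mm, ct_mm)
print(f"bias = {bias:.3f} mm, limits of agreement = {lower:.3f} to {upper:.3f} mm")
```

A bias close to zero with narrow limits, as reported in the abstract, indicates that the two modalities can be used interchangeably for the measurement in question.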

Bridging innovation to implementation in artificial intelligence fracture detection: a commentary piece.

Khattak M, Kierkegaard P, McGregor A, Perry DC

PubMed | papers | Jun 1 2025
The deployment of AI in medical imaging, particularly in areas such as fracture detection, represents a transformative advancement in orthopaedic care. AI-driven systems, leveraging deep-learning algorithms, promise to enhance diagnostic accuracy, reduce variability, and streamline workflows by analyzing radiograph images swiftly and accurately. Despite these potential benefits, the integration of AI into clinical settings faces substantial barriers, including slow adoption across health systems, technical challenges, and a major lag between technology development and clinical implementation. This commentary explores the role of AI in healthcare, highlighting its potential to enhance patient outcomes through more accurate and timely diagnoses. It addresses the necessity of bridging the gap between AI innovation and practical application. It also emphasizes the importance of implementation science in effectively integrating AI technologies into healthcare systems, using frameworks such as the Consolidated Framework for Implementation Research and the Knowledge-to-Action Cycle to guide this process. We call for a structured approach to address the challenges of deploying AI in clinical settings, ensuring that AI's benefits translate into improved healthcare delivery and patient care.

Deep learning-based acceleration of high-resolution compressed sense MR imaging of the hip.

Marka AW, Meurer F, Twardy V, Graf M, Ebrahimi Ardjomand S, Weiss K, Makowski MR, Gersing AS, Karampinos DC, Neumann J, Woertler K, Banke IJ, Foreman SC

PubMed | papers | Jun 1 2025
To evaluate a Compressed Sense Artificial Intelligence framework (CSAI) incorporating parallel imaging, compressed sense (CS), and deep learning for high-resolution MRI of the hip, comparing it with standard-resolution CS imaging. Thirty-two patients with femoroacetabular impingement syndrome underwent 3 T MRI scans. Coronal and sagittal intermediate-weighted TSE sequences with fat saturation were acquired using CS (0.6 × 0.8 mm resolution) and CSAI (0.3 × 0.4 mm resolution) protocols in comparable acquisition times (7:49 vs. 8:07 minutes for both planes). Two readers systematically assessed the depiction of the acetabular and femoral cartilage (in five cartilage zones), labrum, ligamentum capitis femoris, and bone using a five-point Likert scale. Diagnostic confidence and abnormality detection were recorded and analyzed using the Wilcoxon signed-rank test. CSAI significantly improved the cartilage depiction across most cartilage zones compared to CS. Overall Likert scores were 4.0 ± 0.2 (CS) vs. 4.2 ± 0.6 (CSAI) for reader 1 and 4.0 ± 0.2 (CS) vs. 4.3 ± 0.6 (CSAI) for reader 2 (p ≤ 0.001). Diagnostic confidence increased from 3.5 ± 0.7 and 3.9 ± 0.6 (CS) to 4.0 ± 0.6 and 4.1 ± 0.7 (CSAI) for readers 1 and 2, respectively (p ≤ 0.001). More cartilage lesions were detected with CSAI, with significant improvements in diagnostic confidence in certain cartilage zones such as femoral zones C and D for both readers. Labrum and ligamentum capitis femoris depiction remained similar, while bone depiction was rated lower. No abnormalities detected in CS were missed in CSAI. CSAI provides high-resolution hip MR images with enhanced cartilage depiction without extending acquisition times, potentially enabling more precise hip cartilage assessment.