Page 7 of 19190 results

Development of a deep learning model for measuring sagittal parameters on cervical spine X-ray.

Wang S, Li K, Zhang S, Zhang D, Hao Y, Zhou Y, Wang C, Zhao H, Ma Y, Zhao D, Chen J, Li X, Wang H, Li Z, Shi J, Wang X

PubMed · Jun 5, 2025
To develop a deep learning model to automatically measure curvature-related sagittal parameters on cervical spine X-ray images. This retrospective study collected 700 lateral cervical spine X-ray images from three hospitals: 500 for training, 100 for internal testing, and 100 for external testing. Six measured parameters and 34 landmarks were measured and labeled by two doctors and averaged as the gold standard. A convolutional neural network (CNN) model was built by training on 500 images and testing on 200 images. Statistical analysis was used to evaluate labeling differences and model performance. The percentages of landmark distance differences within 4 mm were 96.90% (Dr. A vs. Dr. B), 98.47% (Dr. A vs. model), and 97.31% (Dr. B vs. model); within 3 mm, they were 94.88% (Dr. A vs. Dr. B), 96.43% (Dr. A vs. model), and 94.16% (Dr. B vs. model). The mean landmark labeling difference of the model was 1.17 ± 1.14 mm. The mean absolute error (MAE) of the model for the Borden method, cervical curvature index (CCI), vertebral centroid measurement of cervical lordosis (CCL), and the C<sub>0</sub>-C<sub>7</sub>, C<sub>1</sub>-C<sub>7</sub>, and C<sub>2</sub>-C<sub>7</sub> Cobb angles in the test sets was 1.67 mm, 2.01%, 3.22°, 2.37°, 2.49°, and 2.81°, respectively; the symmetric mean absolute percentage error (SMAPE) was 20.06%, 21.68%, 20.02%, 6.68%, 5.28%, and 20.46%, respectively. The model's six cervical sagittal parameters also showed good agreement with the gold standard (intraclass correlation coefficient 0.983; p < 0.001). Our deep learning model recognized cervical spine landmarks and automatically measured cervical sagittal parameters with high accuracy, which can help radiologists improve their diagnostic efficiency.
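
The abstract reports agreement via MAE and SMAPE. A minimal sketch of how these two metrics are typically computed, using illustrative Cobb-angle values rather than the study's data:

```python
import numpy as np

def mae(pred, gold):
    """Mean absolute error between predicted and gold-standard values."""
    pred, gold = np.asarray(pred, float), np.asarray(gold, float)
    return np.mean(np.abs(pred - gold))

def smape(pred, gold):
    """Symmetric mean absolute percentage error, in percent."""
    pred, gold = np.asarray(pred, float), np.asarray(gold, float)
    return 100.0 * np.mean(2.0 * np.abs(pred - gold) / (np.abs(pred) + np.abs(gold)))

# Illustrative C2-C7 Cobb angles in degrees (hypothetical, not study data).
model_angles = [14.2, 20.1, 9.8, 31.0]
gold_angles = [15.0, 18.5, 11.2, 29.4]
print(round(mae(model_angles, gold_angles), 2), round(smape(model_angles, gold_angles), 2))
```

SMAPE is useful alongside MAE here because the six parameters live on different scales (mm, %, degrees); note that several SMAPE variants exist, and the paper does not state which it used.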

Role of Large Language Models for Suggesting Nerve Involvement in Upper Limbs MRI Reports with Muscle Denervation Signs.

Martín-Noguerol T, López-Úbeda P, Luna A, Gómez-Río M, Górriz JM

PubMed · Jun 5, 2025
Determining the involvement of specific peripheral nerves (PNs) in the upper limb associated with signs of muscle denervation can be challenging. This study aims to develop, compare, and validate various large language models (LLMs) to automatically identify and establish potential relationships between denervated muscles and their corresponding PNs. We collected 300 retrospective MRI reports in Spanish from upper limb examinations conducted between 2018 and 2024 that showed signs of muscle denervation. An expert radiologist manually annotated these reports based on the affected peripheral nerves (median, ulnar, radial, axillary, and suprascapular). BERT, DistilBERT, mBART, RoBERTa, and Medical-ELECTRA models were fine-tuned and evaluated on the reports. Additionally, an automatic voting system was implemented to consolidate predictions through majority voting. The voting system achieved the highest F1 scores for the median, ulnar, and radial nerves, with scores of 0.88, 1.00, and 0.90, respectively. Medical-ELECTRA also performed well, achieving F1 scores above 0.82 for the axillary and suprascapular nerves. In contrast, mBART demonstrated lower performance, particularly with an F1 score of 0.38 for the median nerve. Our voting system generally outperforms the individually tested LLMs in determining the specific PN likely associated with muscle denervation patterns detected in upper limb MRI reports. This system can thereby assist radiologists by suggesting the implicated PN when generating their radiology reports.
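
The paper's voting system consolidates per-model predictions by majority vote. A minimal sketch of hard majority voting for one nerve in one report; the model names and binary labels are illustrative, not the paper's exact setup:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most voters (ties broken by first seen)."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-model predictions for the median nerve in a single report:
# 1 = "involved", 0 = "not involved".
votes = {"BERT": 1, "DistilBERT": 1, "RoBERTa": 0, "Medical-ELECTRA": 1, "mBART": 0}
print(majority_vote(list(votes.values())))  # → 1
```

With an odd number of voters, ties cannot occur for binary labels, which is one reason five models is a convenient ensemble size.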

A Machine Learning Method to Determine Candidates for Total and Unicompartmental Knee Arthroplasty Based on a Voting Mechanism.

Zhang N, Zhang L, Xiao L, Li Z, Hao Z

PubMed · Jun 5, 2025
Knee osteoarthritis (KOA) is a prevalent condition. Accurate selection between total knee arthroplasty (TKA) and unicompartmental knee arthroplasty (UKA) is crucial for optimal treatment in patients who have end-stage KOA, particularly for improving clinical outcomes and reducing healthcare costs. This study proposes a machine learning model based on a voting mechanism to enhance the accuracy of surgical decision-making for KOA patients. Radiographic data were collected from a high-volume joint arthroplasty practice, focusing on anterior-posterior, lateral, and skyline X-ray views. The dataset included 277 TKA and 293 UKA cases, each labeled through intraoperative observations (indicating whether TKA or UKA was the appropriate choice). A five-fold cross-validation approach was used for training and validation. In the proposed method, three base models were first trained independently on single-view images, and a voting mechanism was implemented to aggregate model outputs. The performance of the proposed method was evaluated by using metrics such as accuracy and the area under the receiver operating characteristic curve (AUC). The proposed method achieved an accuracy of 94.2% and an AUC of 0.98, demonstrating superior performance compared to existing models. The voting mechanism enabled base models to effectively utilize the detailed features from all three X-ray views, leading to enhanced predictive accuracy and model interpretability. This study provides a high-accuracy method for surgical decision-making between TKA and UKA for KOA patients, requiring only standard X-rays and offering potential for clinical application in automated referrals and preoperative planning.
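
One common way to aggregate three single-view classifiers is soft voting: average the per-view class probabilities and take the argmax. The abstract does not specify whether the voting is hard or soft, so this is a sketch of the soft variant with hypothetical probabilities:

```python
import numpy as np

# Hypothetical softmax outputs (P(TKA), P(UKA)) from three single-view CNNs.
view_probs = {
    "anterior_posterior": np.array([0.62, 0.38]),
    "lateral": np.array([0.55, 0.45]),
    "skyline": np.array([0.70, 0.30]),
}

def soft_vote(probs_by_view):
    """Average class probabilities across views; return label and mean probs."""
    mean_probs = np.mean(list(probs_by_view.values()), axis=0)
    return ("TKA", "UKA")[int(np.argmax(mean_probs))], mean_probs

label, probs = soft_vote(view_probs)
print(label, np.round(probs, 3))
```

Soft voting preserves each view's confidence, so a strongly confident skyline view can outweigh two weakly opposed views, which hard voting cannot do.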

UltraBones100k: A reliable automated labeling method and large-scale dataset for ultrasound-based bone surface extraction.

Wu L, Cavalcanti NA, Seibold M, Loggia G, Reissner L, Hein J, Beeler S, Viehöfer A, Wirth S, Calvet L, Fürnstahl P

PubMed · Jun 4, 2025
Ultrasound-based bone surface segmentation is crucial in computer-assisted orthopedic surgery. However, ultrasound images have limitations, including a low signal-to-noise ratio, acoustic shadowing, and speckle noise, which make interpretation difficult. Existing deep learning models for bone segmentation rely primarily on costly manual labeling by experts, limiting dataset size and model generalizability. Additionally, the complexity of ultrasound physics and acoustic shadow makes the images difficult for humans to interpret, leading to incomplete labels in low-intensity and anechoic regions and limiting model performance. To advance the state-of-the-art in ultrasound bone segmentation and establish effective model benchmarks, larger and higher-quality datasets are needed. We propose a methodology for collecting ex-vivo ultrasound datasets with automatically generated bone labels, including anechoic regions. The proposed labels are derived by accurately superimposing tracked bone Computed Tomography (CT) models onto the tracked ultrasound images. These initial labels are refined to account for ultrasound physics. To clinically evaluate the proposed method, an expert physician from our university hospital specialized in orthopedic sonography assessed the quality of the generated bone labels. A neural network for bone segmentation was trained on the collected dataset and its predictions were compared to expert manual labels, evaluating accuracy, completeness, and F1-score. We collected UltraBones100k, the largest known dataset comprising 100k ex-vivo ultrasound images of human lower limbs with bone annotations, specifically targeting the fibula, tibia, and foot bones. A Wilcoxon signed-rank test with Bonferroni correction confirmed that the bone alignment after our optimization pipeline significantly improved the quality of bone labeling (p < 0.001). The model trained on UltraBones100k consistently outperformed manual labeling on all metrics, particularly in low-intensity regions (at a distance threshold of 0.5 mm: 320% improvement in completeness, 27.4% in accuracy, and 197% in F1-score). This work promises to facilitate research and clinical translation of ultrasound imaging in computer-assisted interventions, particularly for applications such as 2D bone segmentation, 3D bone surface reconstruction, and multi-modality bone registration.
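
The paired, non-parametric comparison described here (Wilcoxon signed-rank test with Bonferroni correction) can be sketched as follows; the per-image quality scores and the number of corrected tests are simulated assumptions, not the study's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical per-image label-quality scores before vs. after the
# optimization pipeline (simulated improvement).
before = rng.normal(0.70, 0.05, 100)
after = before + rng.normal(0.05, 0.02, 100)

# Paired, non-parametric test on the same images before/after.
stat, p = stats.wilcoxon(before, after)

n_tests = 3                      # illustrative: one test per metric compared
p_bonf = min(p * n_tests, 1.0)   # Bonferroni correction caps at 1.0
print(p_bonf < 0.001)
```

The Wilcoxon test is chosen over a paired t-test when per-image score differences cannot be assumed normally distributed; Bonferroni simply multiplies each p-value by the number of comparisons.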

Evaluating the Diagnostic Accuracy of ChatGPT-4.0 for Classifying Multimodal Musculoskeletal Masses: A Comparative Study with Human Raters.

Bosbach WA, Schoeni L, Beisbart C, Senge JF, Mitrakovic M, Anderson SE, Achangwa NR, Divjak E, Ivanac G, Grieser T, Weber MA, Maurer MH, Sanal HT, Daneshvar K

PubMed · Jun 3, 2025
Novel artificial intelligence tools have the potential to significantly enhance productivity in medicine while maintaining or even improving treatment quality. In this study, we aimed to evaluate the current capability of ChatGPT-4.0 to accurately interpret multimodal musculoskeletal tumor cases. We created 25 cases, each containing images from X-ray, computed tomography, magnetic resonance imaging, or scintigraphy. ChatGPT-4.0 was tasked with classifying each case using a six-option, two-choice question, where both a primary and a secondary diagnosis were allowed. For performance evaluation, human raters also assessed the same cases. When only the primary diagnosis was taken into account, the accuracy of human raters was greater than that of ChatGPT-4.0 by a factor of nearly 2 (87% vs. 44%). However, in a setting that also considered secondary diagnoses, the performance gap shrank substantially (accuracy: 94% vs. 71%). A power analysis relying on Cohen's w confirmed the adequacy of the sample size (n = 25). The tested artificial intelligence tool demonstrated lower performance than human raters. Considering factors such as speed, constant availability, and potential future improvements, it appears plausible that artificial intelligence tools could serve as valuable assistance systems for doctors in future clinical settings. · ChatGPT-4.0 classifies musculoskeletal cases using multimodal imaging inputs. · Human raters outperform AI in primary diagnosis accuracy by a factor of nearly two. · Including secondary diagnoses improves AI performance and narrows the gap. · AI demonstrates potential as an assistive tool in future radiological workflows. · Power analysis confirms robustness of study findings with the current sample size. · Bosbach WA, Schoeni L, Beisbart C et al. Evaluating the Diagnostic Accuracy of ChatGPT-4.0 for Classifying Multimodal Musculoskeletal Masses: A Comparative Study with Human Raters. Rofo 2025; DOI 10.1055/a-2594-7085.
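
Cohen's w, the effect size underlying the study's power analysis, is computed from observed vs. expected proportions. A minimal sketch with illustrative proportions (not the study's actual contingency data):

```python
import math

def cohens_w(observed, expected):
    """Cohen's w effect size from observed and expected proportions."""
    assert abs(sum(observed) - 1.0) < 1e-9 and abs(sum(expected) - 1.0) < 1e-9
    return math.sqrt(sum((o - e) ** 2 / e for o, e in zip(observed, expected)))

# Illustrative correct/incorrect proportions for the two rater groups.
observed = [0.87, 0.13]   # hypothetical: human-rater accuracy split
expected = [0.44, 0.56]   # hypothetical: AI accuracy split as the reference
print(round(cohens_w(observed, expected), 3))
```

By Cohen's conventions, w ≈ 0.1 is a small effect, 0.3 medium, and 0.5 large; a large w is what allows a sample of only 25 cases to yield adequate power.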

Artificial intelligence vs human expertise: A comparison of plantar fascia thickness measurements through MRI imaging.

Alyanak B, Çakar İ, Dede BT, Yıldızgören MT, Bağcıer F

PubMed · Jun 3, 2025
This study aims to evaluate the reliability of plantar fascia thickness measurements performed by ChatGPT-4 using magnetic resonance imaging (MRI) compared to those obtained by an experienced clinician. In this retrospective, single-center study, foot MRI images from the hospital archive were analysed. Plantar fascia thickness was measured under both blinded and non-blinded conditions by an experienced clinician and ChatGPT-4 at two separate time points. Measurement reliability was assessed using the intraclass correlation coefficient (ICC), mean absolute error (MAE), and mean relative error (MRE). A total of 41 participants (32 females, 9 males) were included. The average plantar fascia thickness measured by the clinician was 4.20 ± 0.80 mm and 4.25 ± 0.92 mm under blinded and non-blinded conditions, respectively, while ChatGPT-4's measurements were 6.47 ± 1.30 mm and 6.46 ± 1.31 mm, respectively. Human evaluators demonstrated excellent agreement (ICC = 0.983-0.989), whereas ChatGPT-4 exhibited low reliability (ICC = 0.391-0.432). In thin plantar fascia cases, ChatGPT-4's error rate was higher, with MAE = 2.70 mm and MRE = 77.17% under blinded conditions, and MAE = 2.91 mm and MRE = 87.02% under non-blinded conditions. ChatGPT-4 demonstrated lower reliability in plantar fascia thickness measurements compared to an experienced clinician, with increased error rates in thin structures. These findings highlight the limitations of AI-based models in medical image analysis and emphasize the need for further refinement before clinical implementation.
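
The ICC values reported here can be computed from an ANOVA decomposition. A minimal sketch of the single-rater, absolute-agreement form ICC(2,1) (the abstract does not state which ICC variant was used, and the thickness values below are hypothetical):

```python
import numpy as np

def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    ratings: (n_subjects, k_raters) array."""
    Y = np.asarray(ratings, float)
    n, k = Y.shape
    grand = Y.mean()
    ss_rows = k * np.sum((Y.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((Y.mean(axis=0) - grand) ** 2)   # between raters
    ss_err = np.sum((Y - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical paired measurements (clinician mm, ChatGPT-4 mm) per subject.
thickness = np.array([[4.1, 6.3], [4.5, 6.8], [3.9, 6.1], [4.3, 6.5]])
print(round(icc2_1(thickness), 3))
```

Because ICC(2,1) measures absolute agreement, a systematic offset like the roughly 2 mm overestimation reported for ChatGPT-4 drives the coefficient down even when rankings are consistent.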

Automated Classification of Cervical Spinal Stenosis using Deep Learning on CT Scans.

Zhang YL, Huang JW, Li KY, Li HL, Lin XX, Ye HB, Chen YH, Tian NF

PubMed · Jun 3, 2025
Retrospective study. To develop and validate a computed tomography-based deep learning (DL) model for diagnosing cervical spinal stenosis (CSS). Although magnetic resonance imaging (MRI) is widely used for diagnosing CSS, its inherent limitations, including prolonged scanning time, limited availability in resource-constrained settings, and contraindications for patients with metallic implants, make computed tomography (CT) a critical alternative in specific clinical scenarios. The development of CT-based DL models for CSS detection holds promise in transcending the diagnostic efficacy limitations of conventional CT imaging, thereby serving as an intelligent auxiliary tool to optimize healthcare resource allocation. Paired CT/MRI images were collected. CT images were divided into training, validation, and test sets in an 8:1:1 ratio. The two-stage model architecture comprised: (1) a Faster R-CNN-based detection model for localization, annotation, and extraction of regions of interest (ROI); and (2) a comparison of 16 convolutional neural network (CNN) models for stenosis classification to select the best-performing model. The evaluation metrics included accuracy, F1-score, and Cohen's κ coefficient, with comparisons made against diagnostic results from physicians with varying years of experience. In the multiclass classification task, four high-performing models (DL1-b0, DL2-121, DL3-101, and DL4-26d) achieved accuracies of 88.74%, 89.40%, 89.40%, and 88.08%, respectively. All models demonstrated >80% consistency with senior physicians and >70% consistency with junior physicians. In the binary classification task, the models achieved accuracies of 94.70%, 96.03%, 96.03%, and 94.70%, respectively. All four models demonstrated consistency rates slightly below 90% with junior physicians. However, when compared with senior physicians, three models (excluding DL4-26d) exhibited consistency rates exceeding 90%.
The DL model developed in this study demonstrated high accuracy in CT image analysis of CSS, with a diagnostic performance comparable to that of senior physicians.
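
Cohen's κ, one of the evaluation metrics above, corrects raw agreement for the agreement expected by chance. A self-contained sketch with hypothetical stenosis grades (0/1/2), not the study's data:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa agreement between two label sequences."""
    assert len(a) == len(b)
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n                  # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[l] * cb[l] for l in set(a) | set(b)) / n ** 2   # chance agreement
    return (po - pe) / (1 - pe)

# Hypothetical stenosis grades from a DL model and a senior physician.
model = [0, 1, 2, 2, 1, 0, 2, 1]
senior = [0, 1, 2, 1, 1, 0, 2, 1]
print(round(cohens_kappa(model, senior), 3))
```

κ is preferred over raw accuracy for this kind of model-vs-physician comparison precisely because class imbalance (most spines graded normal) inflates chance agreement.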

Prediction of hip fracture by high-resolution peripheral quantitative computed tomography in older Swedish women.

Jaiswal R, Pivodic A, Zoulakis M, Axelsson KF, Litsne H, Johansson L, Lorentzon M

PubMed · Jun 3, 2025
The socioeconomic burden of hip fractures, the most severe osteoporotic fracture outcome, is increasing, and current clinical risk assessment lacks sensitivity. This study aimed to develop a method for improved prediction of hip fracture by incorporating measurements of bone microstructure and composition derived from HR-pQCT. In a prospective cohort study of 3028 community-dwelling women aged 75-80, all participants answered questionnaires and underwent baseline examinations of anthropometrics and bone by DXA and HR-pQCT. Medical records, a regional X-ray archive, and registers were used to identify incident fractures and death. Prediction models for hip fracture, major osteoporotic fracture (MOF), and any fracture were developed using Cox proportional hazards regression and machine learning algorithms (neural network, random forest, ensemble, and Extreme Gradient Boosting). In the 2856 (94.3%) women with complete HR-pQCT data at 2 tibia sites (distal and ultra-distal), the median follow-up period was 8.0 yr, and 217 hip, 746 MOF, and 1008 any type of incident fracture occurred. In Cox regression models adjusted for age, BMI, clinical risk factors (CRFs), and FN BMD, the strongest predictors of hip fracture were tibia total volumetric BMD and cortical thickness. The performance of the Cox regression-based prediction models for hip fracture was significantly improved by HR-pQCT (time-dependent area under the receiver operating characteristic curve [AUC] at 5 yr of follow-up: 0.75 [0.64-0.85]), compared to a reference model including CRFs and FN BMD (AUC = 0.71 [0.58-0.81], p < .001) and a Fracture Risk Assessment Tool risk score model (AUC = 0.70 [0.60-0.80], p < .001). The Cox regression model for hip fracture had a significantly higher accuracy than the neural network-based model, the best-performing machine learning algorithm, at clinically relevant sensitivity levels.
We conclude that the addition of HR-pQCT parameters improves the prediction of hip fractures in a cohort of older Swedish women.
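
The time-dependent AUC at 5 years used above can be approximated, when censoring before the horizon is ignored, by a fixed-horizon ROC AUC: cases fractured before the horizon vs. controls known fracture-free past it. This is a simplified sketch with invented data; proper time-dependent AUC estimators additionally reweight for censoring:

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """Rank-based AUC: probability a case outranks a control (ties count half)."""
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in scores_pos for sn in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def fixed_horizon_auc(risk, time, event, horizon=5.0):
    """Naive AUC at a fixed horizon, excluding subjects censored before it."""
    risk, time, event = map(np.asarray, (risk, time, event))
    cases = risk[(time <= horizon) & (event == 1)]
    controls = risk[time > horizon]
    return auc(cases, controls)

# Illustrative data: risk score, follow-up time (yr), fracture indicator.
risk = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
time = [2.0, 4.0, 8.0, 9.0, 3.0, 7.0]
event = [1, 1, 0, 0, 1, 0]
print(fixed_horizon_auc(risk, time, event))  # → 1.0 (perfect separation)
```

In this toy example every fractured subject has a higher risk score than every fracture-free one, so the AUC is 1.0; real cohorts, as in the study, land well below that.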

A first-of-its-kind two-body statistical shape model of the arthropathic shoulder: enhancing biomechanics and surgical planning.

Blackman J, Giles JW

PubMed · Jun 3, 2025
Statistical Shape Models are machine learning tools in computational orthopedics that enable the study of anatomical variability and the creation of synthetic models for pathogenetic analysis and surgical planning. Current models of the glenohumeral joint either describe individual bones or are limited to non-pathologic datasets, failing to capture coupled shape variation in arthropathic anatomy. We aimed to develop a novel combined scapula-proximal-humerus model applicable to clinical populations. Preoperative computed tomography scans from 45 Reverse Total Shoulder Arthroplasty patients were used to generate three-dimensional models of the scapula and proximal humerus. Correspondence point clouds were combined into a two-body shape model using Principal Component Analysis. Individual scapula-only and proximal-humerus-only shape models were also created for comparison. The models were validated using compactness, specificity, generalization ability, and leave-one-out cross-validation. The modes of variation for each model were also compared. The combined model was described using eigenvector decomposition into single body models. The models were further compared in their ability to predict the shape of one body when given the shape of its counterpart, and the generation of diverse realistic synthetic pairs de novo. The scapula and proximal-humerus models performed comparably to previous studies with median average leave-one-out cross-validation errors of 1.08 mm (IQR: 0.359 mm), and 0.521 mm (IQR: 0.111 mm); the combined model was similar with median error of 1.13 mm (IQR: 0.239 mm). The combined model described coupled variations between the shapes equalling 43.2% of their individual variabilities, including the relationship between glenoid and humeral head erosions. The combined model outperformed the individual models generatively with reduced missing shape prediction bias (> 10%) and uniformly diverse shape plausibility (uniformity p-value < .001 vs. .59). 
This study developed the first two-body scapulohumeral shape model that captures coupled variations in arthropathic shoulder anatomy and the first proximal-humeral statistical model constructed using a clinical dataset. While single-body models are effective for descriptive tasks, combined models excel in generating joint-level anatomy. This model can be used to augment computational analyses of synthetic populations investigating shoulder biomechanics and surgical planning.
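
The core of a two-body statistical shape model is PCA over concatenated correspondence points, so each mode captures coupled scapula-humerus variation. A minimal sketch with random stand-in data (the point counts and subject count are assumptions loosely matched to the abstract):

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical correspondence clouds: 45 subjects; scapula (200 points) and
# proximal humerus (150 points), each flattened to x,y,z and concatenated.
n_subjects, n_scap, n_hum = 45, 200 * 3, 150 * 3
shapes = rng.normal(size=(n_subjects, n_scap + n_hum))

mean_shape = shapes.mean(axis=0)
X = shapes - mean_shape
# PCA via SVD: rows of Vt are the coupled modes of variation.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
explained = S**2 / np.sum(S**2)

# Generate one synthetic subject by perturbing the first three modes.
coeffs = rng.normal(size=3) * S[:3] / np.sqrt(n_subjects - 1)
synthetic = mean_shape + coeffs @ Vt[:3]
scapula_part, humerus_part = synthetic[:n_scap], synthetic[n_scap:]
print(scapula_part.shape, humerus_part.shape)
```

Because each mode spans both bones, sampling mode coefficients always yields a matched scapula-humerus pair; this is the mechanism behind the combined model's advantage in generating joint-level anatomy.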

Development and validation of machine learning models for distal instrumentation-related problems in patients with degenerative lumbar scoliosis based on preoperative CT and MRI.

Feng Z, Yang H, Li Z, Zhang X, Hai Y

PubMed · Jun 3, 2025
This investigation proposes a machine learning framework leveraging preoperative MRI and CT imaging data to predict postoperative distal instrumentation-related problems (DIP) in degenerative lumbar scoliosis patients undergoing long-segment fusion procedures. We retrospectively analyzed 136 patients, categorized by the development of DIP. Preoperative MRI and CT scans provided muscle function and bone density data, including the relative gross cross-sectional area and relative functional cross-sectional area of the multifidus, erector spinae, paraspinal extensor, and psoas major muscles; the gross muscle fat index and functional muscle fat index; and Hounsfield unit values of the lumbosacral region and the lower instrumented vertebra. Predictive factors for DIP were selected through stepwise LASSO regression. Both the selected factors and the full factor set were incorporated into six machine learning algorithms, namely k-nearest neighbors, decision tree, support vector machine, random forest, multilayer perceptron (MLP), and Naïve Bayes, with tenfold cross-validation. Among patients, 16.9% developed DIP, with the functional cross-sectional area of the multifidus and the Hounsfield unit value of the lumbosacral region as significant predictors. The MLP model exhibited superior performance when all predictive factors were input, with an average AUC of 0.98 and recall of 0.90. We compared various machine learning algorithms and constructed, trained, and validated predictive models based on muscle function and bone density-related variables obtained from preoperative CT and MRI, which could identify patients at high risk of DIP after long-segment spinal fusion surgery.
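
The tenfold cross-validation and recall reporting described above can be sketched without any ML library; the dataset here is random stand-in data with the abstract's 16.9% event rate, and the classifier is a trivial majority-class baseline standing in for the MLP:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dataset: 136 patients, 10 muscle/bone features, ~16.9% DIP rate.
X = rng.normal(size=(136, 10))
y = (rng.random(136) < 0.169).astype(int)

def kfold_indices(n, k=10, seed=0):
    """Shuffle indices and split into k roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def recall(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp / (tp + fn) if (tp + fn) else 0.0

recalls = []
for test_idx in kfold_indices(len(y)):
    train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
    majority = int(y[train_idx].mean() >= 0.5)        # dummy "trained" model
    y_pred = np.full(len(test_idx), majority)
    recalls.append(recall(y[test_idx], y_pred))
print(round(float(np.mean(recalls)), 3))
```

The majority-class baseline scores recall 0 on this imbalanced outcome, which is exactly why the study reports recall alongside AUC: a model must actually find the 16.9% of DIP patients, not just predict "no DIP" everywhere.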