
Role of Large Language Models for Suggesting Nerve Involvement in Upper Limbs MRI Reports with Muscle Denervation Signs.

Martín-Noguerol T, López-Úbeda P, Luna A, Gómez-Río M, Górriz JM

PubMed · Jun 5 2025
Determining the involvement of specific peripheral nerves (PNs) in the upper limb associated with signs of muscle denervation can be challenging. This study aims to develop, compare, and validate various large language models (LLMs) to automatically identify and establish potential relationships between denervated muscles and their corresponding PNs. We collected 300 retrospective MRI reports in Spanish from upper limb examinations conducted between 2018 and 2024 that showed signs of muscle denervation. An expert radiologist manually annotated these reports based on the affected peripheral nerves (median, ulnar, radial, axillary, and suprascapular). BERT, DistilBERT, mBART, RoBERTa, and Medical-ELECTRA models were fine-tuned and evaluated on the reports. Additionally, an automatic voting system was implemented to consolidate predictions through majority voting. The voting system achieved the highest F1 scores for the median, ulnar, and radial nerves, with scores of 0.88, 1.00, and 0.90, respectively. Medical-ELECTRA also performed well, achieving F1 scores above 0.82 for the axillary and suprascapular nerves. In contrast, mBART demonstrated lower performance, particularly with an F1 score of 0.38 for the median nerve. Our voting system generally outperforms the individually tested LLMs in determining the specific PN likely associated with muscle denervation patterns detected in upper limb MRI reports. This system can thereby assist radiologists by suggesting the implicated PN when generating their radiology reports.
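
The consolidation step described above is, at its core, per-nerve majority voting over the binary predictions of the individual fine-tuned models. Below is a minimal sketch of such a voting scheme; the model names, label set, and prediction format are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

# Hypothetical per-model predictions: for each report, each model outputs the
# set of peripheral nerves it considers involved (illustrative data only).
NERVES = ["median", "ulnar", "radial", "axillary", "suprascapular"]

model_predictions = {
    "bert":            [{"median"}, {"ulnar", "radial"}],
    "distilbert":      [{"median"}, {"radial"}],
    "medical_electra": [{"median", "axillary"}, {"radial"}],
}

def majority_vote(predictions_by_model, n_reports, labels):
    """Keep a label for a report if more than half of the models predicted it."""
    n_models = len(predictions_by_model)
    consolidated = []
    for i in range(n_reports):
        votes = Counter()
        for preds in predictions_by_model.values():
            votes.update(preds[i])
        consolidated.append({lab for lab in labels if votes[lab] > n_models / 2})
    return consolidated

print(majority_vote(model_predictions, n_reports=2, labels=NERVES))
# -> [{'median'}, {'radial'}]
```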

Ensemble of weak spectral total-variation learners: a PET-CT case study.

Rosenberg A, Kennedy J, Keidar Z, Zeevi YY, Gilboa G

PubMed · Jun 5 2025
When solving computer vision problems with machine learning, one often encounters a lack of sufficient training data. To mitigate this, we propose the use of ensembles of weak learners based on spectral total-variation (STV) features (Gilboa G. 2014 A total variation spectral framework for scale and texture analysis. SIAM J. Imaging Sci. 7, 1937-1961. (doi:10.1137/130930704)). The features are related to nonlinear eigenfunctions of the total-variation subgradient and can characterize textures well at various scales. It was shown (Burger M, Gilboa G, Moeller M, Eckardt L, Cremers D. 2016 Spectral decompositions using one-homogeneous functionals. SIAM J. Imaging Sci. 9, 1374-1408. (doi:10.1137/15m1054687)) that, in the one-dimensional case, orthogonal features are generated, whereas in two dimensions the features are empirically lowly correlated. Ensemble learning theory advocates the use of lowly correlated weak learners. We therefore propose designing ensembles from learners based on STV features. To show the effectiveness of this paradigm, we examine a hard real-world medical imaging problem: the predictive value of computed tomography (CT) data for high uptake in positron emission tomography (PET) in patients suspected of skeletal metastases. The database consists of 457 scans with 1524 unique pairs of registered CT and PET slices. Our approach is compared with deep-learning methods and with radiomics features, showing that STV learners perform best (AUC=[Formula: see text]), compared with neural nets (AUC=[Formula: see text]) and radiomics (AUC=[Formula: see text]). We observe that fine STV scales in CT images are especially indicative of high uptake in PET. This article is part of the theme issue 'Partial differential equations in data science'.
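
As a rough illustration of the ensembling idea (not the authors' pipeline), the sketch below trains a bagged ensemble of shallow decision-tree learners on precomputed feature vectors and reports AUC. Computing the STV features themselves requires a spectral total-variation decomposition of the CT slices and is assumed to have happened upstream; synthetic features stand in here. It assumes scikit-learn ≥ 1.2, where the ensemble argument is named `estimator`.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Placeholder for precomputed spectral total-variation (STV) feature vectors:
# one row per CT slice, one column per STV scale/band (synthetic data here).
rng = np.random.default_rng(0)
X = rng.normal(size=(1524, 40))              # 1524 slice pairs, 40 STV features
y = (X[:, :5].sum(axis=1) > 0).astype(int)   # synthetic "high PET uptake" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Ensemble of weak (depth-limited) learners, in the spirit of the lowly
# correlated weak learners advocated by ensemble learning theory.
ensemble = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=2),
    n_estimators=200,
    max_features=0.5,     # feature subsampling further decorrelates the learners
    random_state=0,
)
ensemble.fit(X_tr, y_tr)

auc = roc_auc_score(y_te, ensemble.predict_proba(X_te)[:, 1])
print(f"AUC = {auc:.3f}")
```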

UltraBones100k: A reliable automated labeling method and large-scale dataset for ultrasound-based bone surface extraction.

Wu L, Cavalcanti NA, Seibold M, Loggia G, Reissner L, Hein J, Beeler S, Viehöfer A, Wirth S, Calvet L, Fürnstahl P

PubMed · Jun 4 2025
Ultrasound-based bone surface segmentation is crucial in computer-assisted orthopedic surgery. However, ultrasound images have limitations, including a low signal-to-noise ratio, acoustic shadowing, and speckle noise, which make interpretation difficult. Existing deep learning models for bone segmentation rely primarily on costly manual labeling by experts, limiting dataset size and model generalizability. Additionally, the complexity of ultrasound physics and acoustic shadowing makes the images difficult for humans to interpret, leading to incomplete labels in low-intensity and anechoic regions and limiting model performance. To advance the state of the art in ultrasound bone segmentation and establish effective model benchmarks, larger and higher-quality datasets are needed. We propose a methodology for collecting ex-vivo ultrasound datasets with automatically generated bone labels, including anechoic regions. The proposed labels are derived by accurately superimposing tracked bone Computed Tomography (CT) models onto the tracked ultrasound images. These initial labels are then refined to account for ultrasound physics. To clinically evaluate the proposed method, an expert physician from our university hospital specializing in orthopedic sonography assessed the quality of the generated bone labels. A neural network for bone segmentation was trained on the collected dataset, and its predictions were compared to expert manual labels, evaluating accuracy, completeness, and F1 score. We collected UltraBones100k, the largest known dataset comprising 100k ex-vivo ultrasound images of human lower limbs with bone annotations, specifically targeting the fibula, tibia, and foot bones. A Wilcoxon signed-rank test with Bonferroni correction confirmed that the bone alignment after our optimization pipeline significantly improved the quality of bone labeling (p<0.001). The model trained on UltraBones100k consistently outperforms manual labeling in all metrics, particularly in low-intensity regions (at a distance threshold of 0.5 mm: 320% improvement in completeness, 27.4% improvement in accuracy, and 197% improvement in F1 score). CONCLUSION: This work promises to facilitate research and clinical translation of ultrasound imaging in computer-assisted interventions, particularly for applications such as 2D bone segmentation, 3D bone surface reconstruction, and multi-modality bone registration.
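
The completeness, accuracy, and F1 figures at a fixed distance threshold can be read as point-to-surface matching: a predicted surface point counts as correct if it lies within the threshold of the label, and a label point counts as recovered if a prediction lies within the threshold of it. The sketch below illustrates that idea with nearest-neighbour distances on synthetic 2D points; it is a simplified stand-in for the paper's evaluation, not the released code.

```python
import numpy as np
from scipy.spatial import cKDTree

def surface_metrics(pred_pts, label_pts, threshold_mm=0.5):
    """Precision ('accuracy'), recall ('completeness'), and F1 of predicted
    bone-surface points against labelled points at a distance threshold."""
    d_pred_to_label, _ = cKDTree(label_pts).query(pred_pts)   # for precision
    d_label_to_pred, _ = cKDTree(pred_pts).query(label_pts)   # for recall
    precision = np.mean(d_pred_to_label <= threshold_mm)
    recall = np.mean(d_label_to_pred <= threshold_mm)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    return precision, recall, f1

# Tiny synthetic example: predicted points jittered around a labelled line.
label = np.stack([np.linspace(0, 50, 200), np.zeros(200)], axis=1)
pred = label + np.random.default_rng(1).normal(scale=0.2, size=label.shape)
print(surface_metrics(pred, label))
```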

A first-of-its-kind two-body statistical shape model of the arthropathic shoulder: enhancing biomechanics and surgical planning.

Blackman J, Giles JW

PubMed · Jun 3 2025
Statistical Shape Models are machine learning tools in computational orthopedics that enable the study of anatomical variability and the creation of synthetic models for pathogenetic analysis and surgical planning. Current models of the glenohumeral joint either describe individual bones or are limited to non-pathologic datasets, failing to capture coupled shape variation in arthropathic anatomy. We aimed to develop a novel combined scapula-proximal-humerus model applicable to clinical populations. Preoperative computed tomography scans from 45 Reverse Total Shoulder Arthroplasty patients were used to generate three-dimensional models of the scapula and proximal humerus. Correspondence point clouds were combined into a two-body shape model using Principal Component Analysis. Individual scapula-only and proximal-humerus-only shape models were also created for comparison. The models were validated using compactness, specificity, generalization ability, and leave-one-out cross-validation. The modes of variation for each model were also compared. The combined model was described using eigenvector decomposition into single body models. The models were further compared in their ability to predict the shape of one body when given the shape of its counterpart, and the generation of diverse realistic synthetic pairs de novo. The scapula and proximal-humerus models performed comparably to previous studies with median average leave-one-out cross-validation errors of 1.08 mm (IQR: 0.359 mm), and 0.521 mm (IQR: 0.111 mm); the combined model was similar with median error of 1.13 mm (IQR: 0.239 mm). The combined model described coupled variations between the shapes equalling 43.2% of their individual variabilities, including the relationship between glenoid and humeral head erosions. The combined model outperformed the individual models generatively with reduced missing shape prediction bias (> 10%) and uniformly diverse shape plausibility (uniformity p-value < .001 vs. .59). This study developed the first two-body scapulohumeral shape model that captures coupled variations in arthropathic shoulder anatomy and the first proximal-humeral statistical model constructed using a clinical dataset. While single-body models are effective for descriptive tasks, combined models excel in generating joint-level anatomy. This model can be used to augment computational analyses of synthetic populations investigating shoulder biomechanics and surgical planning.
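
Conceptually, a two-body statistical shape model is a principal component analysis over the concatenated correspondence points of both bones, from which new plausible shape pairs can be generated by sampling mode weights. The sketch below shows only that construction on synthetic point clouds with scikit-learn's PCA; correspondence establishment, alignment, and the validation metrics reported in the study are omitted.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_subjects, n_scap_pts, n_hum_pts = 45, 1000, 800

# Synthetic correspondence point clouds (subjects x points x 3), already aligned.
scapulae = rng.normal(size=(n_subjects, n_scap_pts, 3))
humeri = rng.normal(size=(n_subjects, n_hum_pts, 3))

# Concatenate both bones per subject and flatten to one shape vector each.
shapes = np.concatenate([scapulae.reshape(n_subjects, -1),
                         humeri.reshape(n_subjects, -1)], axis=1)

pca = PCA(n_components=0.95)          # keep modes explaining 95% of variance
weights = pca.fit_transform(shapes)   # per-subject mode weights

# Generate a synthetic scapula-humerus pair by sampling mode weights.
sampled = rng.normal(scale=np.sqrt(pca.explained_variance_))
new_shape = pca.mean_ + pca.components_.T @ sampled
new_scapula = new_shape[: n_scap_pts * 3].reshape(n_scap_pts, 3)
new_humerus = new_shape[n_scap_pts * 3:].reshape(n_hum_pts, 3)
print(new_scapula.shape, new_humerus.shape)
```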

Automated Classification of Cervical Spinal Stenosis using Deep Learning on CT Scans.

Zhang YL, Huang JW, Li KY, Li HL, Lin XX, Ye HB, Chen YH, Tian NF

PubMed · Jun 3 2025
Retrospective study. To develop and validate a computed tomography-based deep learning (DL) model for diagnosing cervical spinal stenosis (CSS). Although magnetic resonance imaging (MRI) is widely used for diagnosing CSS, its inherent limitations, including prolonged scanning time, limited availability in resource-constrained settings, and contraindications for patients with metallic implants, make computed tomography (CT) a critical alternative in specific clinical scenarios. The development of CT-based DL models for CSS detection holds promise for transcending the diagnostic limitations of conventional CT imaging, thereby serving as an intelligent auxiliary tool to optimize healthcare resource allocation. Paired CT/MRI images were collected. CT images were divided into training, validation, and test sets in an 8:1:1 ratio. The two-stage model architecture comprised (1) a Faster R-CNN-based detection model for localization, annotation, and extraction of regions of interest (ROI), and (2) a comparison of 16 convolutional neural network (CNN) models for stenosis classification to select the best-performing model. The evaluation metrics included accuracy, F1-score, and Cohen's κ coefficient, with comparisons made against diagnostic results from physicians with varying years of experience. In the multiclass classification task, four high-performing models (DL1-b0, DL2-121, DL3-101, and DL4-26d) achieved accuracies of 88.74%, 89.40%, 89.40%, and 88.08%, respectively. All models demonstrated >80% consistency with senior physicians and >70% consistency with junior physicians. In the binary classification task, the models achieved accuracies of 94.70%, 96.03%, 96.03%, and 94.70%, respectively. All four models demonstrated consistency rates slightly below 90% with junior physicians. However, when compared with senior physicians, three models (excluding DL4-26d) exhibited consistency rates exceeding 90%. The DL model developed in this study demonstrated high accuracy in CT image analysis of CSS, with diagnostic performance comparable to that of senior physicians.
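
The two-stage design, ROI detection followed by CNN classification, can be outlined with off-the-shelf torchvision components. The snippet below is an illustrative skeleton with an assumed three-grade output head and untrained classification weights; it is not the authors' model, and the detector weights are the generic COCO-pretrained ones that torchvision downloads on first use.

```python
import torch
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Stage 1: Faster R-CNN detector to localize the region of interest (ROI).
detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

# Stage 2: a CNN classifier for stenosis grading on the cropped ROI
# (ResNet-18 here with an assumed 3-grade output head).
classifier = torchvision.models.resnet18(weights=None)
classifier.fc = torch.nn.Linear(classifier.fc.in_features, 3)
classifier.eval()

def classify_ct_slice(image: torch.Tensor) -> int:
    """image: CxHxW tensor in [0, 1]. Returns a predicted stenosis grade."""
    with torch.no_grad():
        det = detector([image])[0]
        if len(det["boxes"]) == 0:
            return 0  # assumed convention: no ROI found -> 'no stenosis'
        # Crop the highest-scoring box and resize it for the classifier.
        x1, y1, x2, y2 = det["boxes"][det["scores"].argmax()].int().tolist()
        roi = image[:, y1:y2, x1:x2]
        roi = torch.nn.functional.interpolate(roi.unsqueeze(0), size=(224, 224))
        return classifier(roi).argmax(dim=1).item()

print(classify_ct_slice(torch.rand(3, 512, 512)))
```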

Artificial intelligence vs human expertise: A comparison of plantar fascia thickness measurements through MRI imaging.

Alyanak B, Çakar İ, Dede BT, Yıldızgören MT, Bağcıer F

PubMed · Jun 3 2025
This study aims to evaluate the reliability of plantar fascia thickness measurements performed by ChatGPT-4 using magnetic resonance imaging (MRI) compared to those obtained by an experienced clinician. In this retrospective, single-center study, foot MRI images from the hospital archive were analysed. Plantar fascia thickness was measured under both blinded and non-blinded conditions by an experienced clinician and ChatGPT-4 at two separate time points. Measurement reliability was assessed using the intraclass correlation coefficient (ICC), mean absolute error (MAE), and mean relative error (MRE). A total of 41 participants (32 females, 9 males) were included. The average plantar fascia thickness measured by the clinician was 4.20 ± 0.80 mm and 4.25 ± 0.92 mm under blinded and non-blinded conditions, respectively, while ChatGPT-4's measurements were 6.47 ± 1.30 mm and 6.46 ± 1.31 mm, respectively. Human evaluators demonstrated excellent agreement (ICC = 0.983-0.989), whereas ChatGPT-4 exhibited low reliability (ICC = 0.391-0.432). In thin plantar fascia cases, ChatGPT-4's error rate was higher, with an MAE of 2.70 mm and MRE of 77.17% under blinded conditions, and an MAE of 2.91 mm and MRE of 87.02% under non-blinded conditions. ChatGPT-4 demonstrated lower reliability in plantar fascia thickness measurements compared to an experienced clinician, with increased error rates in thin structures. These findings highlight the limitations of AI-based models in medical image analysis and emphasize the need for further refinement before clinical implementation.
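
The agreement statistics used here (MAE, MRE, and the intraclass correlation coefficient) are straightforward to compute from paired measurements. The sketch below uses made-up thickness values and implements the two-way random-effects, absolute-agreement, single-measure form ICC(2,1), which may differ from the exact ICC variant used in the study.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.
    ratings: (n_subjects, n_raters) array."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    ss_rows = k * np.sum((row_means - grand_mean) ** 2)
    ss_cols = n * np.sum((col_means - grand_mean) ** 2)
    ss_err = np.sum((ratings - grand_mean) ** 2) - ss_rows - ss_cols
    msr, msc = ss_rows / (n - 1), ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Made-up paired plantar fascia thickness measurements (mm): clinician vs. model.
clinician = np.array([3.8, 4.1, 4.5, 5.0, 3.6, 4.9])
model     = np.array([6.1, 6.4, 6.3, 7.0, 6.0, 6.8])

mae = np.mean(np.abs(model - clinician))
mre = np.mean(np.abs(model - clinician) / clinician) * 100
icc = icc_2_1(np.column_stack([clinician, model]))
print(f"MAE = {mae:.2f} mm, MRE = {mre:.1f}%, ICC(2,1) = {icc:.3f}")
```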

Artificial intelligence in bone metastasis analysis: Current advancements, opportunities and challenges.

Afnouch M, Bougourzi F, Gaddour O, Dornaika F, Ahmed AT

PubMed · Jun 3 2025
Artificial intelligence is transforming medical imaging, particularly in the analysis of bone metastases (BM), a serious complication of advanced cancers. Machine learning and deep learning techniques offer new opportunities to improve the detection, recognition, and segmentation of bone metastases. Yet, challenges such as limited data, interpretability, and clinical validation remain. Following PRISMA guidelines, we reviewed artificial intelligence methods and applications for bone metastasis analysis across major imaging modalities, including CT, MRI, PET, SPECT, and bone scintigraphy. The survey covers traditional machine learning models and modern deep learning architectures such as CNNs and transformers. We also examined available datasets and their role in developing artificial intelligence in this field. Artificial intelligence models have achieved strong performance across tasks and modalities, with Convolutional Neural Network (CNN) and Transformer architectures performing particularly well. However, limitations persist, including data imbalance, overfitting risks, and the need for greater transparency. Clinical translation is also challenged by regulatory and validation hurdles. Artificial intelligence holds strong potential to improve BM diagnosis and streamline radiology workflows. To reach clinical maturity, future work must address data diversity, model explainability, and large-scale validation, which are critical steps toward trusted integration into routine oncology care.

Prediction of hip fracture by high-resolution peripheral quantitative computed tomography in older Swedish women.

Jaiswal R, Pivodic A, Zoulakis M, Axelsson KF, Litsne H, Johansson L, Lorentzon M

PubMed · Jun 3 2025
The socioeconomic burden of hip fractures, the most severe osteoporotic fracture outcome, is increasing, and current clinical risk assessment lacks sensitivity. This study aimed to develop a method for improved prediction of hip fracture by incorporating measurements of bone microstructure and composition derived from high-resolution peripheral quantitative computed tomography (HR-pQCT). In a prospective cohort study of 3028 community-dwelling women aged 75-80, all participants answered questionnaires and underwent baseline examinations of anthropometrics and bone by DXA and HR-pQCT. Medical records, a regional x-ray archive, and registers were used to identify incident fractures and death. Prediction models for hip fracture, major osteoporotic fracture (MOF), and any fracture were developed using Cox proportional hazards regression and machine learning algorithms (neural network, random forest, ensemble, and Extreme Gradient Boosting). In the 2856 (94.3%) women with complete HR-pQCT data at 2 tibia sites (distal and ultra-distal), the median follow-up period was 8.0 yr, and 217 hip fractures, 746 MOFs, and 1008 incident fractures of any type occurred. In Cox regression models adjusted for age, BMI, clinical risk factors (CRFs), and FN BMD, the strongest predictors of hip fracture were tibia total volumetric BMD and cortical thickness. The performance of the Cox regression-based prediction models for hip fracture was significantly improved by HR-pQCT (time-dependent area under the receiver operating characteristic curve [AUC] at 5 yr of follow-up, 0.75 [0.64-0.85]), compared to a reference model including CRFs and FN BMD (AUC = 0.71 [0.58-0.81], p < .001) and a Fracture Risk Assessment Tool risk score model (AUC = 0.70 [0.60-0.80], p < .001). The Cox regression model for hip fracture had a significantly higher accuracy than the neural network-based model, the best-performing machine learning algorithm, at clinically relevant sensitivity levels. We conclude that the addition of HR-pQCT parameters improves the prediction of hip fractures in a cohort of older Swedish women.
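
The modelling approach can be approximated with open-source survival tools: fit a Cox proportional hazards model on clinical risk factors plus HR-pQCT parameters and score it with a time-dependent AUC at 5 years of follow-up. The sketch below uses scikit-survival with synthetic data and assumed column names; it is not the study's analysis code.

```python
import numpy as np
import pandas as pd
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.metrics import cumulative_dynamic_auc
from sksurv.util import Surv

rng = np.random.default_rng(0)
n = 500

# Synthetic covariates: clinical risk factors, FN BMD, and HR-pQCT parameters.
X = pd.DataFrame({
    "age": rng.uniform(75, 80, n),
    "bmi": rng.normal(26, 4, n),
    "fn_bmd": rng.normal(0.7, 0.1, n),
    "tibia_total_vbmd": rng.normal(270, 45, n),
    "cortical_thickness": rng.normal(1.1, 0.25, n),
})

# Synthetic hip-fracture outcomes over an 8-year follow-up.
event = rng.random(n) < 0.08                              # ~8% sustain a fracture
time = np.where(event, rng.uniform(0.5, 8.0, n), 8.0)     # event or censoring time (yr)
y = Surv.from_arrays(event=event, time=time)

train, test = np.arange(n) < 400, np.arange(n) >= 400
cox = CoxPHSurvivalAnalysis().fit(X[train], y[train])

# Time-dependent AUC of the Cox risk score at 5 years of follow-up.
risk = cox.predict(X[test])
auc, _ = cumulative_dynamic_auc(y[train], y[test], risk, times=[5.0])
print(f"time-dependent AUC at 5 yr: {auc[0]:.2f}")
```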

Development and validation of machine learning models for distal instrumentation-related problems in patients with degenerative lumbar scoliosis based on preoperative CT and MRI.

Feng Z, Yang H, Li Z, Zhang X, Hai Y

PubMed · Jun 3 2025
This investigation proposes a machine learning framework leveraging preoperative MRI and CT imaging data to predict postoperative distal instrumentation-related problems (DIP) in degenerative lumbar scoliosis patients undergoing long-segment fusion procedures. We retrospectively analyzed 136 patients, categorized according to whether DIP developed. Preoperative MRI and CT scans provided muscle function and bone density data, including the relative gross cross-sectional area and relative functional cross-sectional area of the multifidus, erector spinae, paraspinal extensor, and psoas major muscles; the gross muscle fat index and functional muscle fat index; and Hounsfield unit values of the lumbosacral region and the lower instrumented vertebra. Predictive factors for DIP were selected through stepwise LASSO regression. Both the selected factors and the full factor set were incorporated into six machine learning algorithms, namely k-nearest neighbors, decision tree, support vector machine, random forest, multilayer perceptron (MLP), and Naïve Bayes, with tenfold cross-validation. Among patients, 16.9% developed DIP, with the functional cross-sectional area of the multifidus and the Hounsfield unit value of the lumbosacral region as significant predictors. The MLP model exhibited superior performance when all predictive factors were input, with an average AUC of 0.98 and a recall of 0.90. We compared various machine learning algorithms and constructed, trained, and validated predictive models based on muscle function and bone density variables obtained from preoperative CT and MRI, which could identify patients at high risk of DIP after long-segment spinal fusion surgery.
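
The pipeline described, LASSO-based predictor selection followed by classifiers evaluated with tenfold cross-validation, can be sketched with scikit-learn. The data, feature count, and outcome below are synthetic placeholders, and only the MLP (one of the six algorithms compared) is shown.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_patients, n_features = 136, 10   # muscle-function and bone-density variables

X = rng.normal(size=(n_patients, n_features))
# Synthetic outcome loosely tied to the first two features (illustrative only).
y = (X[:, 0] + 0.8 * X[:, 1] + rng.normal(scale=0.5, size=n_patients) > 1.0).astype(int)

# LASSO selects predictors (e.g. multifidus rFCSA and lumbosacral HU in the study),
# then an MLP classifier is evaluated with stratified tenfold cross-validation.
pipe = make_pipeline(
    StandardScaler(),
    SelectFromModel(LassoCV(cv=5, random_state=0)),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(pipe, X, y, cv=cv, scoring=["roc_auc", "recall"])
print("AUC:", scores["test_roc_auc"].mean(), "recall:", scores["test_recall"].mean())
```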

Evaluating the Diagnostic Accuracy of ChatGPT-4.0 for Classifying Multimodal Musculoskeletal Masses: A Comparative Study with Human Raters.

Bosbach WA, Schoeni L, Beisbart C, Senge JF, Mitrakovic M, Anderson SE, Achangwa NR, Divjak E, Ivanac G, Grieser T, Weber MA, Maurer MH, Sanal HT, Daneshvar K

PubMed · Jun 3 2025
Novel artificial intelligence tools have the potential to significantly enhance productivity in medicine, while also maintaining or even improving treatment quality. In this study, we aimed to evaluate the current capability of ChatGPT-4.0 to accurately interpret multimodal musculoskeletal tumor cases. We created 25 cases, each containing images from X-ray, computed tomography, magnetic resonance imaging, or scintigraphy. ChatGPT-4.0 was tasked with classifying each case using a six-option, two-choice question, where both a primary and a secondary diagnosis were allowed. For performance evaluation, human raters also assessed the same cases. When only the primary diagnosis was taken into account, the accuracy of human raters was greater than that of ChatGPT-4.0 by a factor of nearly 2 (87% vs. 44%). However, in a setting that also considered secondary diagnoses, the performance gap shrank substantially (accuracy: 94% vs. 71%). Power analysis relying on Cohen's w confirmed the adequacy of the sample size (n = 25). The tested artificial intelligence tool demonstrated lower performance than human raters. Considering factors such as speed, constant availability, and potential future improvements, it appears plausible that artificial intelligence tools could serve as valuable assistance systems for doctors in future clinical settings.
· ChatGPT-4.0 classifies musculoskeletal cases using multimodal imaging inputs.
· Human raters outperform AI in primary diagnosis accuracy by a factor of nearly two.
· Including secondary diagnoses improves AI performance and narrows the gap.
· AI demonstrates potential as an assistive tool in future radiological workflows.
· Power analysis confirms robustness of study findings with the current sample size.
· Bosbach WA, Schoeni L, Beisbart C et al. Evaluating the Diagnostic Accuracy of ChatGPT-4.0 for Classifying Multimodal Musculoskeletal Masses: A Comparative Study with Human Raters. Rofo 2025; DOI 10.1055/a-2594-7085.
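
The power analysis mentioned can be reproduced in outline: compute Cohen's w from observed versus expected proportions and evaluate the achieved power of a chi-square goodness-of-fit test at n = 25. The proportions below are illustrative assumptions, not the study's actual figures.

```python
import numpy as np
from statsmodels.stats.power import GofChisquarePower

# Illustrative observed vs. chance-level expected proportions for a
# six-option, two-choice classification task (not the study's data).
p_observed = np.array([0.87, 0.13])    # e.g. correct vs. incorrect primary diagnosis
p_expected = np.array([1 / 3, 2 / 3])  # chance of covering 1 of 6 options with 2 picks

# Cohen's w effect size for a goodness-of-fit comparison.
w = np.sqrt(np.sum((p_observed - p_expected) ** 2 / p_expected))

power = GofChisquarePower().power(effect_size=w, nobs=25, alpha=0.05, n_bins=2)
print(f"Cohen's w = {w:.2f}, achieved power at n = 25: {power:.2f}")
```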