Latest Papers on Radiology AI. Tags: Musculoskeletal, Order: Best Match, Limit: 10.

Development of a deep learning-based MRI diagnostic model for human Brucella spondylitis.

Wang B, Wei J, Wang Z, Niu P, Yang L, Hu Y, Shao D, Zhao W

•papers•Jul 9 2025

Brucella spondylitis (BS) and tuberculous spondylitis (TS) are prevalent spinal infections with distinct treatment protocols. Rapid and accurate differentiation between these two conditions is crucial for effective clinical management; however, current imaging and pathogen-based diagnostic methods fall short of fully meeting clinical requirements. This study explores the feasibility of employing deep learning (DL) models based on conventional magnetic resonance imaging (MRI) to differentiate BS and TS. A total of 310 subjects were enrolled in our hospital, comprising 209 with BS, 101 with TS. The participants were randomly divided into a training set (n = 217) and a test set (n = 93). And 74 with other hospital was external validation set. Integrating Convolutional Block Attention Module (CBAM) into the ResNeXt-50 architecture and training the model using sagittal T2-weighted images (T2WI). Classification performance was evaluated using the area under the receiver operating characteristic (AUC) curve, and diagnostic accuracy was compared against general models such as ResNet50, GoogleNet, EfficientNetV2, and VGG16. The CBAM-ResNeXt model revealed superior performance, with accuracy, precision, recall, F1-score, and AUC from 0.942, 0.940, 0.928, 0.934, 0.953, respectively. These metrics outperformed those of the general models. The proposed model offers promising potential for the diagnosis of BS and TS using conventional MRI. It could serve as an invaluable tool in clinical practice, providing a reliable reference for distinguishing between these two diseases.

MRI Classification Musculoskeletal Retrospective Clinical In Silico Academic Lab

Wrist bone segmentation in X-ray images using CT-based simulations

Youssef ElTantawy, Alexia Karantana, Xin Chen

•preprint•Jul 8 2025

Plain X-ray is one of the most common image modalities for clinical diagnosis (e.g. bone fracture, pneumonia, cancer screening, etc.). X-ray image segmentation is an essential step for many computer-aided diagnostic systems, yet it remains challenging. Deep-learning-based methods have achieved superior performance in medical image segmentation tasks but often require a large amount of high-quality annotated data for model training. Providing such an annotated dataset is not only time-consuming but also requires a high level of expertise. This is particularly challenging in wrist bone segmentation in X-rays, due to the interposition of multiple small carpal bones in the image. To overcome the data annotation issue, this work utilizes a large number of simulated X-ray images generated from Computed Tomography (CT) volumes with their corresponding 10 bone labels to train a deep learning-based model for wrist bone segmentation in real X-ray images. The proposed method was evaluated using both simulated images and real images. The method achieved Dice scores ranging from 0.80 to 0.92 for the simulated dataset generated from different view angles. Qualitative analysis of the segmentation results of the real X-ray images also demonstrated the superior performance of the trained model. The trained model and X-ray simulation code are freely available for research purposes: the link will be provided upon acceptance.

X-Ray Segmentation Musculoskeletal Methodology In Silico Open Code

A Deep Learning Model for Comprehensive Automated Bone Lesion Detection and Classification on Staging Computed Tomography Scans.

Simon BD, Harmon SA, Yang D, Belue MJ, Xu Z, Tetreault J, Pinto PA, Wood BJ, Citrin DE, Madan RA, Xu D, Choyke PL, Gulley JL, Turkbey B

•papers•Jul 8 2025

A common site of metastases for a variety of cancers is the bone, which is challenging and time consuming to review and important for cancer staging. Here, we developed a deep learning approach for detection and classification of bone lesions on staging CTs. This study developed an nnUNet model using 402 patients' CTs, including prostate cancer patients with benign or malignant osteoblastic (blastic) bone lesions, and patients with benign or malignant osteolytic (lytic) bone lesions from various primary cancers. An expert radiologist contoured ground truth lesions, and the model was evaluated for detection on a lesion level. For classification performance, accuracy, sensitivity, specificity, and other metrics were calculated. The held-out test set consisted of 69 patients (32 with bone metastases). The AUC of AI-predicted burden of disease was calculated on a patient level. In the independent test set, 70% of ground truth lesions were detected (67% of malignant lesions and 72% of benign lesions). The model achieved accuracy of 85% in classifying lesions as malignant or benign (91% sensitivity and 81% specificity). Although AI identified false positives in several benign patients, the patient-level AUC was 0.82 using predicted disease burden proportion. Our lesion detection and classification AI model performs accurately and has the potential to correct physician errors. Further studies should investigate if the model can impact physician review in terms of detection rate, classification accuracy, and review time.

CT Detection Musculoskeletal Retrospective Clinical In Silico

Integrating Machine Learning into Myositis Research: a Systematic Review.

Juarez-Gomez C, Aguilar-Vazquez A, Gonzalez-Gauna E, Garcia-Ordoñez GP, Martin-Marquez BT, Gomez-Rios CA, Becerra-Jimenez J, Gaspar-Ruiz A, Vazquez-Del Mercado M

•papers•Jul 8 2025

Idiopathic inflammatory myopathies (IIM) are a group of autoimmune rheumatic diseases characterized by proximal muscle weakness and extra muscular manifestations. Since 1975, these IIM have been classified into different clinical phenotypes. Each clinical phenotype is associated with a better or worse prognosis and a particular physiopathology. Machine learning (ML) is a fascinating field of knowledge with worldwide applications in different fields. In IIM, ML is an emerging tool assessed in very specific clinical contexts as a complementary tool for research purposes, including transcriptome profiles in muscle biopsies, differential diagnosis using magnetic resonance imaging (MRI), and ultrasound (US). With the cancer-associated risk and predisposing factors for interstitial lung disease (ILD) development, this systematic review evaluates 23 original studies using supervised learning models, including logistic regression (LR), random forest (RF), support vector machines (SVM), and convolutional neural networks (CNN), with performance assessed primarily through the area under the curve coupled with the receiver operating characteristic (AUC-ROC).

Mixed Modality Classification Musculoskeletal Review Concept Academic Lab Benchmark SOTA

Automated instance segmentation and registration of spinal vertebrae from CT-Scans with an improved 3D U-net neural network and corner point registration.

Hill J, Khokher MR, Nguyen C, Adcock M, Li R, Anderson S, Morrell T, Diprose T, Salvado O, Wang D, Tay GK

•papers•Jul 8 2025

This paper presents a rapid and robust approach for 3D volumetric segmentation, labelling, and registration of human spinal vertebrae from CT scans using an optimised and improved 3D U-Net neural network architecture. The network is designed by incorporating residual and dense interconnections, followed by an extensive evaluation of different network setups by optimising the network components like activation functions, optimisers, and pooling operations. In addition, the network architecture is optimised for varying numbers of convolution layers per block and U-Net levels with fixed and cascading numbers of filters. For 3D virtual reality visualisation, the segmentation output of the improved 3D U-Net network is registered with the original scans through a corner point registration process. The registration takes into account the spatial coordinates of each segmented vertebra as a 3D volume and eight virtual fiducial markers to ensure alignment in all rotational planes. Trained on the VerSe'20 dataset, the proposed pipeline achieves a Dice score coefficient of 92.38% for vertebrae instance segmentation and a Hausdorff distance of 5.26 mm for vertebrae localisation on the VerSe'20 public test dataset, which outperforms many existing methods that participated in the VerSe'20 challenge. Integrated with Singular Health's MedVR software for virtual reality visualisation, the proposed solution has been deployed on standard edge-computing hardware in medical institutions. Depending on the scan size, the deployed solution takes between 90 and 210 s to label and segment vertebrae, including the cervical vertebrae. It is hoped that the acceleration of the segmentation and registration process will facilitate the easier preparation of future training datasets and benefit pre-surgical visualisation and planning.

CT Segmentation Musculoskeletal Methodology In Silico Startup Benchmark SOTA

Assessment of T2-weighted MRI-derived synthetic CT for the detection of suspected lumbar facet arthritis: a comparative analysis with conventional CT.

Cao G, Wang H, Xie S, Cai D, Guo J, Zhu J, Ye K, Wang Y, Xia J

•papers•Jul 8 2025

We evaluated sCT generated from T2-weighted imaging (T2WI) using deep learning techniques to detect structural lesions in lumbar facet arthritis, with conventional CT as the reference standard. This single-center retrospective study included 40 patients who had lumbar MRI and CT with in 1 week (September 2020 to August 2021). A Pix2Pix-GAN framework generated CT images from MRI data, and image quality was assessed using structural similarity index (SSIM), mean absolute error (MAE), peak signal-to-noise ratio (PSNR), nd Dice similarity coefficient (DSC). Two senior radiologists evaluated 15 anatomical landmarks. Sensitivity, specificity, and accuracy for detecting bone erosion, osteosclerosis, and joint space alterations were analyzed for sCT, T2-weighted MRI, and conventional CT. Forty participants (21 men, 19 women) were enrolled, with a mean age of 39 ± 16.9 years. sCT showed strong agreement with conventional CT, with SSIM values of 0.888 for axial and 0.889 for sagittal views. PSNR and MAE values were 24.56 dB and 0.031 for axial and 23.75 dB and 0.038 for sagittal views, respectively. DSC values were 0.935 for axial and 0.876 for sagittal views. sCT showed excellent intra- and inter-reader reliability intraclass correlation coefficients (0.953-0.995 and 0.839-0.983, respectively). sCT had higher sensitivity (57.9% vs. 5.3%), specificity (98.8% vs. 84.6%), and accuracy (93.0% vs. 73.3%) for bone erosion than T2-weighted MRI and outperformed it for osteosclerosis and joint space changes. sCT outperformed conventional T2-weighted MRI in detecting structural lesions indicative of lumbar facet arthritis, with conventional CT as the reference standard.

Mixed Modality Image Synthesis Musculoskeletal Retrospective Clinical In Silico Academic Lab

[The standardization and digitalization and intelligentization represent the future development direction of hip arthroscopy diagnosis and treatment technology].

Li CB, Zhang J, Wang L, Wang YT, Kang XQ, Wang MX

•papers•Jul 8 2025

In recent years, hip arthroscopy has made great progress and has been extended to the treatment of intra-articular or periarticular diseases. However, the complex structure of the hip joint, high technical operation requirements and relatively long learning curve have hindered the popularization and development of hip arthroscopy in China. Therefore, on the one hand, it is necessary to promote the research and training of standardized techniques for the diagnosis of hip disease and the treatment of arthroscopic surgery, so as to improve the safety, effectiveness and popularization of the technology. On the other hand, our organization proactively leverages cutting-edge digitalization and intelligentization technologies, including medical image digitalization, medical big data analytics, artificial intelligence, surgical navigation and robotic control, virtual reality, telemedicine, and 5G communication technology. We conduct a range of innovative research and development initiatives such as intelligent-assisted diagnosis of hip diseases, digital preoperative planning, surgical intelligent navigation and robotic procedures, and smart rehabilitation solutions. These efforts aim to facilitate a digitalization and intelligentization leap in technology and continuously enhance the precision of diagnosis and treatment. In conclusion, standardization promotes the homogenization of diagnosis and treatment, while digitalization and intelligentization facilitate the precision of operations. The synergy of the two lays the foundation for personalized diagnosis and treatment and continuous innovation, ultimately driving the rapid development of hip arthroscopy technology.

Mixed Modality Classification Musculoskeletal Review Concept Academic Lab GenAI

An Institutional Large Language Model for Musculoskeletal MRI Improves Protocol Adherence and Accuracy.

Patrick Decourcy Hallinan JT, Leow NW, Low YX, Lee A, Ong W, Zhou Chan MD, Devi GK, He SS, De-Liang Loh D, Wei Lim DS, Low XZ, Teo EC, Furqan SM, Yang Tham WW, Tan JH, Kumar N, Makmur A, Yonghan T

•papers•Jul 8 2025

Privacy-preserving large language models (PP-LLMs) hold potential for assisting clinicians with documentation. We evaluated a PP-LLM to improve the clinical information on radiology request forms for musculoskeletal magnetic resonance imaging (MRI) and to automate protocoling, which ensures that the most appropriate imaging is performed. The present retrospective study included musculoskeletal MRI radiology request forms that had been randomly collected from June to December 2023. Studies without electronic medical record (EMR) entries were excluded. An institutional PP-LLM (Claude Sonnet 3.5) augmented the original radiology request forms by mining EMRs, and, in combination with rule-based processing of the LLM outputs, suggested appropriate protocols using institutional guidelines. Clinical information on the original and PP-LLM radiology request forms were compared with use of the RI-RADS (Reason for exam Imaging Reporting and Data System) grading by 2 musculoskeletal (MSK) radiologists independently (MSK1, with 13 years of experience, and MSK2, with 11 years of experience). These radiologists established a consensus reference standard for protocoling, against which the PP-LLM and of 2 second-year board-certified radiologists (RAD1 and RAD2) were compared. Inter-rater reliability was assessed with use of the Gwet AC1, and the percentage agreement with the reference standard was calculated. Overall, 500 musculoskeletal MRI radiology request forms were analyzed for 407 patients (202 women and 205 men with a mean age [and standard deviation] of 50.3 ± 19.5 years) across a range of anatomical regions, including the spine/pelvis (143 MRI scans; 28.6%), upper extremity (169 scans; 33.8%) and lower extremity (188 scans; 37.6%). Two hundred and twenty-two (44.4%) of the 500 MRI scans required contrast. The clinical information provided in the PP-LLM-augmented radiology request forms was rated as superior to that in the original requests. Only 0.4% to 0.6% of PP-LLM radiology request forms were rated as limited/deficient, compared with 12.4% to 22.6% of the original requests (p < 0.001). Almost-perfect inter-rater reliability was observed for LLM-enhanced requests (AC1 = 0.99; 95% confidence interval [CI], 0.99 to 1.0), compared with substantial agreement for the original forms (AC1 = 0.62; 95% CI, 0.56 to 0.67). For protocoling, MSK1 and MSK2 showed almost-perfect agreement on the region/coverage (AC1 = 0.96; 95% CI, 0.95 to 0.98) and contrast requirement (AC1 = 0.98; 95% CI, 0.97 to 0.99). Compared with the consensus reference standard, protocoling accuracy for the PP-LLM was 95.8% (95% CI, 94.0% to 97.6%), which was significantly higher than that for both RAD1 (88.6%; 95% CI, 85.8% to 91.4%) and RAD2 (88.2%; 95% CI, 85.4% to 91.0%) (p < 0.001 for both). Musculoskeletal MRI request form augmentation with an institutional LLM provided superior clinical information and improved protocoling accuracy compared with clinician requests and non-MSK-trained radiologists. Institutional adoption of such LLMs could enhance the appropriateness of MRI utilization and patient care. Diagnostic Level III. See Instructions for Authors for a complete description of levels of evidence.

MRI Triage Musculoskeletal Retrospective Clinical Clinical Pilot Academic Lab GenAI

External validation of an artificial intelligence tool for fracture detection in children with osteogenesis imperfecta: a multireader study.

Pauling C, Laidlow-Singh H, Evans E, Garbera D, Williamson R, Fernando R, Thomas K, Martin H, Arthurs OJ, Shelmerdine SC

•papers•Jul 7 2025

To determine the performance of a commercially available AI tool for fracture detection when used in children with osteogenesis imperfecta (OI). All appendicular and pelvic radiographs from an OI clinic at a single centre from 48 patients were included. Seven radiologists evaluated anonymised images in two rounds, first without, then with AI assistance. Differences in diagnostic accuracy between the rounds were analysed. 48 patients (mean 12 years) provided 336 images, containing 206 fractures established by consensus opinion of two radiologists. AI produced a per-examination accuracy of 74.8% [95% CI: 65.4%, 82.7%], compared to average radiologist performance at 83.4% [95% CI: 75.2%, 89.8%]. Radiologists using AI assistance improved average radiologist accuracy per examination to 90.7% [95% CI: 83.5%, 95.4%]. AI gave more false negatives than radiologists, with 80 missed fractures versus 41, respectively. Radiologists were more likely (74.6%) to alter their original decision to agree with AI at the per-image level, 82.8% of which led to a correct result, 64.0% of which were changing from a false positive to a true negative. Despite inferior standalone performance, AI assistance can still improve radiologist fracture detection in a rare disease paediatric population. Radiologists using AI typically led to more accurate diagnostic outcomes through reduced false positives. Future studies focusing on the real-world application of AI tools in a larger population of children with bone fragility disorders will help better evaluate whether these improvements in accuracy translate into improved patient outcomes. Question How well does a commercially available artificial intelligence (AI) tool identify fractures, on appendicular radiographs of children with osteogenesis imperfecta (OI), and can it also improve radiologists' identification of fractures in this population? Findings Specialist human radiologists outperformed the AI fracture detection tool when acting alone; however, their diagnostic performance overall improved with AI assistance. Clinical relevance AI assistance improves specialist radiologist fracture detection in children with osteogenesis imperfecta, even with AI performance alone inferior to the radiologists acting alone. The reason for this was due to the AI moderating the number of false positives generated by the radiologists.

X-Ray Detection Musculoskeletal Retrospective Clinical Clinical Pilot Academic Lab Benchmark SOTA

Leveraging Large Language Models for Accurate AO Fracture Classification from CT Text Reports.

Mergen M, Spitzl D, Ketzer C, Strenzke M, Marka AW, Makowski MR, Bressem KK, Adams LC, Gassert FT

•papers•Jul 7 2025

Large language models (LLMs) have shown promising potential in analyzing complex textual data, including radiological reports. These models can assist clinicians, particularly those with limited experience, by integrating and presenting diagnostic criteria within radiological classifications. However, before clinical adoption, LLMs must be rigorously validated by medical professionals to ensure accuracy, especially in the context of advanced radiological classification systems. This study evaluates the performance of four LLMs-ChatGPT-4o, AmbossGPT, Claude 3.5 Sonnet, and Gemini 2.0 Flash-in classifying fractures based on the AO classification system using CT reports. A dataset of 292 fictitious physician-generated CT reports, representing 310 fractures, was used to assess the accuracy of each LLM in AO fracture classification retrospectively. Performance was evaluated by comparing the models' classifications to ground truth labels, with accuracy rates analyzed across different fracture types and subtypes. ChatGPT-4o and AmbossGPT achieved the highest overall accuracy (74.6 and 74.3%, respectively), outperforming Claude 3.5 Sonnet (69.5%) and Gemini 2.0 Flash (62.7%). Statistically significant differences were observed in fracture type classification, particularly between ChatGPT-4o and Gemini 2.0 Flash (Δ12%, p < 0.001). While all models demonstrated strong bone recognition rates (90-99%), their accuracy in fracture subtype classification remained lower (71-77%), indicating limitations in nuanced diagnostic categorization. LLMs show potential in assisting radiologists with initial fracture classification, particularly in high-volume or resource-limited settings. However, their performance remains inconsistent for detailed subtype classification, highlighting the need for further refinement and validation before clinical integration in advanced diagnostic workflows.

CT Classification Musculoskeletal Retrospective Clinical In Silico GenAI Benchmark SOTA

Development of a deep learning-based MRI diagnostic model for human Brucella spondylitis.

Wrist bone segmentation in X-ray images using CT-based simulations

A Deep Learning Model for Comprehensive Automated Bone Lesion Detection and Classification on Staging Computed Tomography Scans.

Integrating Machine Learning into Myositis Research: a Systematic Review.

Automated instance segmentation and registration of spinal vertebrae from CT-Scans with an improved 3D U-net neural network and corner point registration.

Assessment of T2-weighted MRI-derived synthetic CT for the detection of suspected lumbar facet arthritis: a comparative analysis with conventional CT.

[The standardization and digitalization and intelligentization represent the future development direction of hip arthroscopy diagnosis and treatment technology].

An Institutional Large Language Model for Musculoskeletal MRI Improves Protocol Adherence and Accuracy.

External validation of an artificial intelligence tool for fracture detection in children with osteogenesis imperfecta: a multireader study.

Leveraging Large Language Models for Accurate AO Fracture Classification from CT Text Reports.

Ready to Sharpen Your Edge?