Automated classification of shoulder radiology focusing on cuff tear arthropathy and glenoid erosion using AI.
Authors
Affiliations (2)
- Department of Clinical Sciences at Danderyd Hospital, Division of Orthopaedics, Karolinska Institutet, Stockholm, Sweden. [email protected].
- Department of Clinical Sciences at Danderyd Hospital, Division of Orthopaedics, Karolinska Institutet, Stockholm, Sweden.
Abstract
Recent advances in artificial intelligence (AI), particularly in convolutional neural network (CNN) architectures, have revolutionized medical imaging by enabling accurate image recognition. However, the application of AI to identifying degenerative musculoskeletal disorders, specifically on plain radiographs, remains poorly explored. The aim of this study is to classify cuff tear arthropathy (CTA) and glenoid erosion using AI on plain shoulder radiographs, using the Hamada and Favard classification systems. We took a publicly available CNN already trained for image recognition and trained it further on a diverse dataset of 6733 shoulder and clavicle X-ray images covering various clinical conditions. The performance of the network was evaluated in detail on a validation set of 561 images. Sensitivity, specificity, Youden's index, and area under the curve (AUC) in receiver operating characteristic (ROC) curve analysis were used for evaluation, with AUC as the primary measure of accuracy. The network showed exceptional performance in identifying Hamada grades 3 and 4, achieving an AUC of 0.95 (95% CI 0.91-0.98) for both categories. Performance was slightly lower for Hamada grades 0-2 and glenoid erosion, with AUCs ranging from 0.81 to 0.91, but still showed considerable accuracy. Similar results were obtained for the Favard classification, although we could not validate the network for the more advanced stages due to lack of data. Our study demonstrates the network's robust capability to identify CTA on plain radiographs, comparable to earlier studies focused on osteoarthritis. Notably, the network excelled in later disease stages, which are characterized by pronounced pathology. The ability to achieve such performance with a heterogeneous dataset bodes well for the real-world implementation of AI technology. However, it is crucial to acknowledge that the results were obtained on a validation set rather than a dedicated test set, which may have influenced the reported performance and warrants further investigation.
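The evaluation metrics named in the abstract (sensitivity, specificity, Youden's index, and AUC) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names are hypothetical, the labels and scores are made-up toy values rather than study data, and the 0.5 threshold is an arbitrary assumption. AUC is computed here as the probability that a randomly chosen positive image scores higher than a randomly chosen negative one, which is equivalent to the area under the ROC curve.

```python
def sensitivity_specificity(y_true, y_score, threshold):
    """Return (sensitivity, specificity) for a binary outcome at one threshold."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


def roc_auc(y_true, y_score):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(y_true, y_score) if label == 1]
    neg = [s for label, s in zip(y_true, y_score) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (NOT study data): 1 = grade present, 0 = grade absent.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
print("sensitivity:", sens)
print("specificity:", spec)
print("Youden's index:", youden_index(sens, spec))
print("AUC:", roc_auc(toy_labels, toy_scores))
```

For a multi-grade system such as Hamada, one plausible approach is to treat each grade as its own binary outcome (that grade versus all others) and compute these metrics once per grade; the abstract does not state exactly how the authors handled this.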