Fusion of X-Ray Images and Clinical Data for a Multimodal Deep Learning Prediction Model of Osteoporosis: Algorithm Development and Validation Study.
Authors
Affiliations (3)
- Department of Information, Daping Hospital, Army Medical University, No.10 Daping Changjiang Branch Road, Yuzhong District, Chongqing, China.
- Department of Orthopedics, Daping Hospital, Army Medical University, Chongqing, China.
- Department of Traumatic Surgery, School of Basic Medicine, Army Medical University, Chongqing, China.
Abstract
Osteoporosis is a bone disease characterized by reduced bone mineral density and bone mass, which increases the risk of fragility fractures. Artificial intelligence can mine imaging features specific to different bone densities, shapes, and structures, and fuse them with features from other modalities for synergistic diagnosis, improving prediction accuracy. This study aims to develop a multimodal model that fuses chest X-rays and clinical parameters for opportunistic screening of osteoporosis and to compare its results with those of existing methods.

We used multimodal data, comprising chest X-ray images and clinical data, from 1780 patients treated at Chongqing Daping Hospital between January 2019 and August 2024. We adopted a probability-level fusion strategy to construct the multimodal model. A convolutional neural network served as the backbone for image processing and was fine-tuned with transfer learning to suit the specific task of this study. In addition, we introduced a gradient-based wavelet feature extraction method and combined it with an attention mechanism to assist feature fusion, which strengthened the model's focus on key regions of the image and further improved its ability to extract image features.

The proposed multimodal model outperformed traditional methods on all 4 evaluation metrics: area under the curve (AUC), accuracy, sensitivity, and specificity. Compared with the model using only X-ray images, the multimodal model significantly improved the AUC from 0.951 to 0.975 (P=.004), accuracy from 89.32% to 92.36% (P=.045), sensitivity from 89.82% to 91.23% (P=.03), and specificity from 88.64% to 93.92% (P=.008).

Although the multimodal model that fuses chest X-ray images and clinical data outperformed the unimodal models and traditional methods, this study has several limitations. The dataset may be too small to capture the full diversity of the population, the retrospective design may introduce selection bias, and the lack of external validation limits the generalizability of the findings. Future studies should address these limitations by incorporating larger, more diverse datasets and conducting rigorous external validation to further establish the model's clinical utility.
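The abstract states that a probability-level (late) fusion strategy was used but does not give its exact form. Below is a minimal sketch, assuming a simple weighted average of the two unimodal classifiers' output probabilities; the function name `probability_fusion`, the weight `w_image`, and the example arrays are hypothetical, not the authors' reported scheme.

```python
import numpy as np

def probability_fusion(p_image: np.ndarray, p_clinical: np.ndarray,
                       w_image: float = 0.6) -> np.ndarray:
    """Late (probability-level) fusion of two unimodal osteoporosis classifiers.

    p_image, p_clinical: per-patient predicted probabilities in [0, 1].
    w_image: fusion weight for the imaging branch (hypothetical value; the
             abstract does not report the actual weighting).
    """
    return w_image * p_image + (1.0 - w_image) * p_clinical

# Example: fuse probabilities for three patients.
p_img = np.array([0.92, 0.35, 0.70])   # CNN output on chest X-rays
p_cli = np.array([0.85, 0.20, 0.55])   # classifier output on clinical data
fused = probability_fusion(p_img, p_cli)
pred = (fused >= 0.5).astype(int)      # 1 = osteoporosis predicted
print(fused, pred)
```

Late fusion keeps the two branches independent, so either unimodal model can be retrained or replaced without touching the other, which is one common reason to prefer it over feature-level fusion.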
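The CNN backbone is fine-tuned with transfer learning, but the abstract does not name the architecture. A minimal PyTorch sketch follows, assuming an ImageNet-pretrained ResNet-50 (the backbone choice, frozen layers, and hyperparameters are all assumptions for illustration).

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone. ResNet-50 is an assumption; the
# abstract only states that a CNN backbone was fine-tuned.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze the early convolutional stages and fine-tune only the last
# stage and the classification head.
for name, param in backbone.named_parameters():
    if not (name.startswith("layer4") or name.startswith("fc")):
        param.requires_grad = False

# Replace the ImageNet head with a binary (osteoporosis vs normal) head.
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

optimizer = torch.optim.Adam(
    (p for p in backbone.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of preprocessed X-rays.
x = torch.randn(4, 3, 224, 224)   # batch of images
y = torch.randint(0, 2, (4,))     # binary labels
loss = criterion(backbone(x), y)
loss.backward()
optimizer.step()
```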
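The abstract also does not detail the "gradient-based wavelet feature extraction" or the attention mechanism. The sketch below is one plausible reading under stated assumptions: take the gradient magnitude of the X-ray, decompose it with a 2-D discrete wavelet transform, and softly weight the subband energies; the `db2` wavelet, Sobel gradients, and energy-based attention are all hypothetical choices.

```python
import numpy as np
import pywt
from scipy import ndimage

def gradient_wavelet_features(img: np.ndarray) -> np.ndarray:
    """Sketch of gradient-based wavelet features with soft attention.

    Assumptions: Sobel gradients, single-level 'db2' DWT, and attention
    weights derived from normalized subband energies.
    """
    # Gradient magnitude emphasizes trabecular edges and cortical borders.
    gx = ndimage.sobel(img, axis=0)
    gy = ndimage.sobel(img, axis=1)
    grad = np.hypot(gx, gy)

    # Single-level 2-D discrete wavelet transform.
    cA, (cH, cV, cD) = pywt.dwt2(grad, "db2")
    subbands = [cA, cH, cV, cD]

    # Simple soft attention: weight each subband by its relative energy.
    energies = np.array([np.mean(b ** 2) for b in subbands])
    weights = np.exp(energies / energies.max())
    weights /= weights.sum()

    # Feature vector: attention-weighted subband energies.
    return weights * energies

features = gradient_wavelet_features(np.random.rand(256, 256))
print(features.shape)  # (4,)
```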
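Finally, the 4 reported metrics follow their standard definitions. This short illustration computes them from labels and fused probabilities using scikit-learn; the evaluation tooling and the 0.5 threshold are assumptions, as the abstract does not specify either.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(y_true: np.ndarray, p_pred: np.ndarray, threshold: float = 0.5):
    """Compute AUC, accuracy, sensitivity, and specificity."""
    y_pred = (p_pred >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "auc": roc_auc_score(y_true, p_pred),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
    }

print(evaluate(np.array([1, 0, 1, 0]), np.array([0.9, 0.2, 0.6, 0.4])))
```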