
Comprehensive comparative analysis of explainable deep learning model for differentiation of brucellar spondylitis and tuberculous spondylitis through MRI sequences.

December 24, 2025

Authors

Yasin P,Tuersun A,Ashir A,Makhambetov Y,Sheng J,Song X

Affiliations (6)

  • Department of Spine Surgery, The Sixth Affiliated Hospital of Xinjiang Medical University, Ürümqi, 830000, Xinjiang, People's Republic of China.
  • Xinjiang Key Laboratory of Artificial Intelligence Assisted Imaging Diagnosis, Department of Radiology, The First People's Hospital of Kashi Prefecture, Kashi, 844000, Xinjiang, People's Republic of China.
  • Mangistau Regional Multifunctional Hospital, 130000, Aktau City, Mangistau Region, Republic of Kazakhstan.
  • Private Hospital Neiron, 130000, Aktau City, Mangistau Region, Republic of Kazakhstan.
  • The Second Department of General Surgery, Xinjiang Uygur Autonomous Region Sixth People's Hospital, Ürümqi, 830000, Xinjiang, People's Republic of China. [email protected].
  • Department of Spine Surgery, The Sixth Affiliated Hospital of Xinjiang Medical University, Ürümqi, 830000, Xinjiang, People's Republic of China. [email protected].

Abstract

The differentiation of brucellar spondylitis (BS) from tuberculous spondylitis (TS) on magnetic resonance imaging (MRI) is a critical clinical challenge. While deep learning holds promise, the optimal architectural strategy for integrating information from multi-sequence MRI remains unclear. This study systematically compared distinct deep learning architectures to identify a valid and effective integration strategy for this diagnostic problem. In this retrospective, single-center diagnostic study, we included 235 patients with surgically and pathologically confirmed BS (n = 82) or TS (n = 153) from January 2014 to December 2024. We systematically evaluated four distinct architectural strategies for processing sagittal T1-weighted, T2-weighted, and fat-suppressed MRI sequences: (1) baseline models trained on single sequences; (2) a single-branch model that fused sequences as input channels; (3) a heterogeneous multi-branch model using different backbones for each sequence; and (4) a homogeneous multi-branch model using identical backbones. Models were developed on patient-level data splits for training (70%), validation (15%), and internal testing (15%). The primary performance metric was the area under the receiver operating characteristic curve (AUC) on the test set. Statistical significance of performance differences between models was assessed using the DeLong test, with P values adjusted for multiple comparisons using the Benjamini-Hochberg procedure. The single-branch fusion model, which treated the three sequences as channels in a single input, failed to learn, yielding performance equivalent to random chance (test AUC range: 0.474-0.538). In stark contrast, both the single-sequence and multi-branch architectures proved effective. The best single-sequence model achieved a test AUC of 0.765 (95% CI 0.759-0.771). The optimal multi-branch model, which successfully integrated all three sequences, achieved a comparable test AUC of 0.764 (95% CI 0.757-0.770). The choice of architecture for integrating multi-sequence MRI data is a critical determinant of model viability. Our findings demonstrate that naive channel-wise fusion is an invalid strategy for this task. In contrast, both processing a single MRI sequence and using a multi-branch parallel-processing architecture are valid and effective strategies, achieving comparable diagnostic performance. This study clarifies the architectural principles required for successfully applying deep learning to this multi-modal diagnostic challenge.
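The architectural strategies compared in the abstract differ in where the three MRI sequences meet the network. The contrast between channel-wise fusion (strategy 2) and a multi-branch design (strategies 3-4) can be sketched with toy linear extractors standing in for CNN backbones; all names, sizes, and weights below are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three sagittal MRI sequences (T1, T2, fat-suppressed), each a 64x64
# slice here; real inputs would be full image volumes.
t1, t2, fs = (rng.standard_normal((64, 64)) for _ in range(3))

def extract_features(img, w):
    """Toy feature extractor: flatten and project through a linear map,
    standing in for one CNN backbone."""
    return np.tanh(img.ravel() @ w)

# Strategy (2), channel-wise fusion: stack sequences as input channels of
# a single model, so one extractor sees the (3, 64, 64) volume at once.
fused_input = np.stack([t1, t2, fs])                   # shape (3, 64, 64)
w_fused = rng.standard_normal((fused_input.size, 16))
fused_feats = extract_features(fused_input, w_fused)   # shape (16,)

# Strategies (3)/(4), multi-branch: one extractor per sequence, with the
# per-branch features concatenated before the classification head.
w_branch = rng.standard_normal((64 * 64, 16))
branch_feats = np.concatenate(
    [extract_features(img, w_branch) for img in (t1, t2, fs)]
)                                                      # shape (48,)

print(fused_feats.shape, branch_feats.shape)
```

In the multi-branch sketch, each sequence keeps its own feature pathway until fusion, which mirrors the parallel-processing designs the study found effective; a heterogeneous variant would simply use a different weight matrix (backbone) per branch.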
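The Benjamini-Hochberg procedure the authors used to adjust DeLong P values for multiple comparisons is a standard step-up correction; a minimal sketch follows, with hypothetical P values rather than the study's results.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downward.
    adj = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.clip(adj, 0.0, 1.0)
    # Return adjusted values in the original input order.
    out = np.empty_like(adj)
    out[order] = adj
    return out

# Hypothetical raw P values from pairwise DeLong model comparisons.
raw_p = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw_p))
```

Comparing each adjusted value against the chosen significance level (rather than the raw P value) controls the false discovery rate across all pairwise model comparisons.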

Topics

Journal Article
