
A multimodal deep learning framework for clinical nursing assessment in lumbar fusion surgery via representation learning and feature extraction.

May 7, 2026 · PubMed

Authors

Li C, Cai J, Huang X, Ni Y, Ling X, Fang S

Affiliations (3)

  • Nursing Department, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, 310000, Zhejiang Province, China.
  • Nursing Department, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, 310000, Zhejiang Province, China. [email protected].
  • Zhejiang University, Hangzhou, 310000, Zhejiang Province, China. [email protected].

Abstract

Medical image interpretation plays a critical role in lumbar fusion surgery, where accurate analysis of anatomical structures is essential for clinical assessment. However, most existing deep learning approaches rely primarily on visual features and fail to effectively integrate heterogeneous clinical information. This study proposes a multimodal deep learning framework for lumbar spine image interpretation by jointly modeling medical images and associated clinical text. The framework adopts a global-local representation learning strategy to capture both overall anatomical context and fine-grained structural information. A visual encoder extracts hierarchical features from lumbar radiographs and CT scans, while a transformer-based text encoder captures semantic information from clinical reports. These representations are projected into a shared embedding space to enable cross-modal alignment. To enhance feature interaction, a text-guided attention mechanism is introduced to model correspondence between image regions and textual descriptions. The learned multimodal representations are applied to multiple downstream tasks, including cross-modal retrieval, classification, and lumbar structure segmentation. Experimental results show that the proposed framework outperforms image-only baselines and achieves competitive performance compared with existing multimodal approaches. The integration of global and local representations improves feature discrimination and structural modeling. Visualization results provide qualitative evidence that the model focuses on anatomically relevant regions, although such observations should be interpreted with caution. Overall, the proposed framework demonstrates the potential of multimodal representation learning for lumbar spine image analysis and provides a structured approach for integrating heterogeneous clinical data.
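The abstract's core mechanism (projecting visual and textual features into a shared embedding space, then using text-guided attention so report tokens attend over image regions) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the dimensions, projection matrices, and temperature value are assumptions chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize vectors to unit length for cosine-similarity alignment."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Illustrative dimensions (assumed, not from the paper).
d_img, d_txt, d_shared = 512, 768, 256
n_regions, n_tokens = 49, 32            # e.g. a 7x7 image grid, 32 report tokens

# Learned projections into the shared embedding space (random stand-ins here).
W_img = rng.normal(0.0, 0.02, (d_img, d_shared))
W_txt = rng.normal(0.0, 0.02, (d_txt, d_shared))

# Stand-ins for encoder outputs: local visual features and text token embeddings.
img_feats = rng.normal(size=(n_regions, d_img))
txt_feats = rng.normal(size=(n_tokens, d_txt))

# Project both modalities into the shared space and normalize.
img_emb = l2_normalize(img_feats @ W_img)   # (49, 256)
txt_emb = l2_normalize(txt_feats @ W_txt)   # (32, 256)

# Text-guided attention: each report token attends over image regions.
scores = txt_emb @ img_emb.T                # (32, 49) cosine similarities
attn = softmax(scores / 0.07, axis=-1)      # temperature-scaled attention weights
attended = attn @ img_emb                   # (32, 256) text-grounded visual context

# A simple global alignment score (e.g. for cross-modal retrieval):
# cosine similarity of mean-pooled embeddings.
global_sim = float(l2_normalize(img_emb.mean(axis=0)) @ l2_normalize(txt_emb.mean(axis=0)))
print(attended.shape, global_sim)
```

In practice the projections would be trained with a contrastive objective so that matching image-report pairs score higher than mismatched ones; the sketch only shows the forward geometry of the alignment and attention steps.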

Topics

Journal Article
