
A multimodal deep learning framework for clinical nursing assessment in lumbar fusion surgery via representation learning and feature extraction.

May 7, 2026 · PubMed

Authors

Li C, Cai J, Huang X, Ni Y, Ling X, Fang S

Affiliations (3)

  • Nursing Department, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, 310000, Zhejiang Province, China.
  • Nursing Department, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, 310000, Zhejiang Province, China. [email protected].
  • Zhejiang University, Hangzhou, 310000, Zhejiang Province, China. [email protected].

Abstract

Medical image interpretation plays a critical role in lumbar fusion surgery, where accurate analysis of anatomical structures is essential for clinical assessment. However, most existing deep learning approaches rely primarily on visual features and fail to effectively integrate heterogeneous clinical information. This study proposes a multimodal deep learning framework for lumbar spine image interpretation by jointly modeling medical images and associated clinical text. The framework adopts a global-local representation learning strategy to capture both overall anatomical context and fine-grained structural information. A visual encoder extracts hierarchical features from lumbar radiographs and CT scans, while a transformer-based text encoder captures semantic information from clinical reports. These representations are projected into a shared embedding space to enable cross-modal alignment. To enhance feature interaction, a text-guided attention mechanism is introduced to model correspondence between image regions and textual descriptions. The learned multimodal representations are applied to multiple downstream tasks, including cross-modal retrieval, classification, and lumbar structure segmentation. Experimental results show that the proposed framework outperforms image-only baselines and achieves competitive performance compared with existing multimodal approaches. The integration of global and local representations improves feature discrimination and structural modeling. Visualization results provide qualitative evidence that the model focuses on anatomically relevant regions, although such observations should be interpreted with caution. Overall, the proposed framework demonstrates the potential of multimodal representation learning for lumbar spine image analysis and provides a structured approach for integrating heterogeneous clinical data.
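The abstract's core mechanism (projecting visual and textual features into a shared embedding space, then using text-guided attention so report tokens attend over image regions) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the dimensions, projection matrices, and temperature value are assumptions chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize vectors to unit length for cosine-similarity alignment."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Illustrative dimensions (assumed, not from the paper).
d_img, d_txt, d_shared = 512, 768, 256
n_regions, n_tokens = 49, 32            # e.g. a 7x7 image grid, 32 report tokens

# Learned projections into the shared embedding space (random stand-ins here).
W_img = rng.normal(0.0, 0.02, (d_img, d_shared))
W_txt = rng.normal(0.0, 0.02, (d_txt, d_shared))

# Stand-ins for encoder outputs: local visual features and text token embeddings.
img_feats = rng.normal(size=(n_regions, d_img))
txt_feats = rng.normal(size=(n_tokens, d_txt))

# Project both modalities into the shared space and normalize.
img_emb = l2_normalize(img_feats @ W_img)   # (49, 256)
txt_emb = l2_normalize(txt_feats @ W_txt)   # (32, 256)

# Text-guided attention: each report token attends over image regions.
scores = txt_emb @ img_emb.T                # (32, 49) cosine similarities
attn = softmax(scores / 0.07, axis=-1)      # temperature-scaled attention weights
attended = attn @ img_emb                   # (32, 256) text-grounded visual context

# A simple global alignment score (e.g. for cross-modal retrieval):
# cosine similarity of mean-pooled embeddings.
global_sim = float(l2_normalize(img_emb.mean(axis=0)) @ l2_normalize(txt_emb.mean(axis=0)))
print(attended.shape, global_sim)
```

In practice the projections would be trained with a contrastive objective so that matching image-report pairs score higher than mismatched ones; the sketch only shows the forward geometry of the alignment and attention steps.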

Topics

Journal Article
