Cross-dimensional Spatial-temporal Feature Integration Framework for Lung Ultrasound Video Analysis in Pneumonia.

April 6, 2026

DOI: 10.1109/TMI.2026.3681138 PMID: 41941825

Authors

Liu Y, He C, Hou D, Ta D, Zhao M, Xing W

Abstract

Pneumonia is an acute respiratory infection that poses a serious threat to health and life. Lung ultrasound (LUS), a non-invasive and rapid imaging technique, can monitor changes in the lung in real time and provides valuable assistance in clinical diagnosis. However, most LUS studies are limited to frame-level analysis and ignore changes across the respiratory cycle, which leads to diagnostic errors. To address these problems, we propose a cross-dimensional spatial-temporal feature integration model for LUS video analysis. Specifically, a sliding window and feature difference analysis are first used to preprocess the original LUS videos, eliminating invalid and highly similar frames and producing a condensed (abstracted) video. Subsequently, a cross-dimensional feature fusion backbone integrates an improved temporal-C3D network and a self-designed recursive inception-meet-transformer (IMT) network to extract and fuse features from different dimensions, yielding comprehensive features that characterize LUS videos. Finally, a Longformer is employed to analyze the temporal dependencies of the cross-dimensional features, followed by a classification head that scores the LUS videos. To evaluate the proposed LUS video scoring model, 3018 LUS video clips were collected from 119 patients in three hospitals. With the data divided at the patient level, the training set consists of 2652 clips from 104 patients and the testing set of 366 clips from 15 patients. Results of 5-fold cross-validation show that the proposed model achieves an accuracy, precision, recall, specificity, F1-score, and AUC of 91.78 ± 0.52%, 92.19 ± 0.76%, 91.81 ± 0.62%, 97.17 ± 0.20%, 91.94 ± 0.44%, and 97.83 ± 0.19%, respectively. On the independent testing set, the model also generalizes well, with a scoring accuracy of 87.65 ± 1.12%. Moreover, ablation studies confirm that each designed module contributes significantly to the model's performance, and comparative experiments confirm that the proposed model outperforms previous models. These findings highlight the strong potential of the proposed LUS video scoring model for clinical deployment.
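
The preprocessing step described in the abstract (a sliding window combined with feature difference analysis to drop invalid and highly similar frames) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the frame features, the difference measure, the validity rule, or any thresholds. The function name select_key_frames, the mean absolute difference, the variance-based validity check, and all parameter values below are hypothetical choices made only for illustration.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def variance(v: Sequence[float]) -> float:
    """Population variance, used here as a crude 'is this frame blank?' check."""
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)


def select_key_frames(
    features: Sequence[Sequence[float]],
    window: int = 3,             # assumed: how many recently kept frames to compare against
    min_diff: float = 0.1,       # assumed: below this difference a frame counts as a near-duplicate
    min_variance: float = 1e-6,  # assumed: below this variance a frame counts as invalid
) -> List[int]:
    """Return the indices of the frames kept after filtering.

    A frame is dropped if it is near-constant (treated as invalid here) or if
    it differs by less than `min_diff` from any of the last `window` kept frames.
    """
    kept: List[int] = []
    for i, feat in enumerate(features):
        if variance(feat) < min_variance:
            continue  # hypothetical validity rule: skip blank frames
        recent = kept[-window:]  # sliding window over the most recently kept frames
        if any(mean_abs_diff(feat, features[j]) < min_diff for j in recent):
            continue  # too similar to a recently kept frame
        kept.append(i)
    return kept


# Toy per-frame feature vectors standing in for real LUS frame features.
frames = [
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.02],  # near-duplicate of frame 0
    [0.5, 0.5, 0.5, 0.5],   # constant, treated as invalid
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],   # identical to frame 0, which is still inside the window
]
print(select_key_frames(frames))  # [0, 3]
```

In this sketch a larger window removes repeats that recur after a few frames, such as similar views across successive respiratory cycles, while a window of 1 only compares each frame with the last kept one.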

Topics

Journal Article
