Graphicalized vision-language modeling for comprehensive lung nodule analysis and risk stratification.

April 11, 2026

DOI: 10.1038/s41746-026-02602-9 PMID: 41965884

Authors

Zhao D, Xi J, Guo X, Chai J, Xu Z, Li L, Xue Y, Sun Q, Zheng Y, Liu S

Affiliations (6)

Department of Thoracic Surgery, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China.
Department of Thoracic Surgery, First Hospital of Yulin City, Yulin, China.
Department of Anesthesiology, Jiangwan Hospital of Hongkou District, Shanghai, China.
Shanghai Key Laboratory of Anesthesiology and Brain Functional Modulation, Clinical Research Center for Anesthesiology and Perioperative Medicine, Translational Research Institute of Brain and Brain-Like Intelligence, Shanghai Fourth People's Hospital, School of Medicine, Tongji University, Shanghai, China.
Department of Thoracic Surgery, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China.
Key Laboratory of Surgery Critical Care and Life Support (Xi'an Jiaotong University), Ministry of Education, Xi'an, China.

Abstract

Lung cancer care involves coupled tasks, such as precise nodule detection, patient-level survival risk estimation, and nodule count quantification, that are typically handled by separate systems despite their clear interdependence. We present VITALIS, a multimodal vision-language framework that fuses CT and PET/CT imaging with structured radiology text using a graph-aware Transformer. Laplacian diffusion enriches token features on an image-text graph, while structural and prior-guided attention focus computation on anatomically and clinically related contexts; bidirectional image-text conditioning then forms a fused patient representation. This representation parameterizes a continuous-time latent risk process governed by a context-modulated Neural ODE, enabling individualized modeling of time-to-event risk. Task-specific heads decode the latent trajectory into nodule detection, nodule malignancy classification, survival risk estimation, and nodule count prediction. Evaluated on three public cohorts, the framework delivers accurate delineations, low-false-positive localization, calibrated survival risk estimates, and consistent nodule counts across tasks. These findings indicate that coupling graph-aware multimodal encoding with continuous-time latent dynamics provides a coherent basis for integrated diagnostic and prognostic modeling in lung cancer.
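The abstract names two core mechanisms (Laplacian diffusion over an image-text token graph, and a context-modulated Neural ODE that drives a latent risk process) but gives no equations. The sketch below is only a toy illustration under our own assumptions, not the VITALIS implementation. It assumes a symmetric normalized graph Laplacian with explicit heat-diffusion steps, a latent drift dz/dt = tanh(W_z z + W_c c) with a fixed context vector c, a softplus hazard readout h(t), fixed-step RK4 integration, and a survival curve S(t) = exp(-cumulative hazard). All function names, weights, graph sizes, and the toy graph itself are hypothetical.

```python
import numpy as np


def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2} (an assumed choice)."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    return np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def laplacian_diffusion(x: np.ndarray, adj: np.ndarray, alpha: float = 0.2, steps: int = 3) -> np.ndarray:
    """Explicit heat-equation steps X <- X - alpha * L X over the token graph.

    Each step mixes every token's features with its graph neighbours. Because the
    eigenvalues of L lie in [0, 2], alpha <= 0.5 keeps those of (I - alpha L) in [0, 1].
    """
    lap = normalized_laplacian(adj)
    for _ in range(steps):
        x = x - alpha * (lap @ x)
    return x


def softplus(v):
    """Numerically stable log(1 + exp(v)); keeps the hazard strictly positive."""
    return np.logaddexp(0.0, v)


def risk_trajectory(z0, context, w_z, w_c, w_h, t_max=5.0, n_steps=50):
    """Integrate dz/dt = tanh(W_z z + W_c c) with fixed-step RK4.

    The constant context vector c shifts the drift (a hypothetical form of
    'context modulation'). The hazard is read out as h(t) = softplus(w_h . z(t))
    and turned into survival S(t) = exp(-cumulative hazard) via the trapezoidal rule.
    """
    dt = t_max / n_steps
    drift_bias = w_c @ context  # context enters as a fixed bias on the drift

    def f(z):
        return np.tanh(w_z @ z + drift_bias)

    z = np.array(z0, dtype=float)
    hazards = [softplus(w_h @ z)]
    for _ in range(n_steps):
        k1 = f(z)
        k2 = f(z + 0.5 * dt * k1)
        k3 = f(z + 0.5 * dt * k2)
        k4 = f(z + dt * k3)
        z = z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        hazards.append(softplus(w_h @ z))
    times = np.linspace(0.0, t_max, n_steps + 1)
    hazards = np.array(hazards)
    increments = 0.5 * dt * (hazards[1:] + hazards[:-1])
    cum_hazard = np.concatenate([[0.0], np.cumsum(increments)])
    return times, hazards, np.exp(-cum_hazard)


# Toy usage with hypothetical sizes: 4 image tokens + 2 text tokens, 4-dim features.
rng = np.random.default_rng(0)
adj = np.zeros((6, 6))
for i in range(3):
    adj[i, i + 1] = adj[i + 1, i] = 1.0  # image-image neighbour edges (a chain)
adj[4:, :4] = 1.0  # text -> image cross-modal edges
adj[:4, 4:] = 1.0  # image -> text, keeping the graph undirected

tokens = rng.normal(size=(6, 4))
smoothed = laplacian_diffusion(tokens, adj)
z0 = smoothed.mean(axis=0)  # hypothetical pooled "patient representation"
context = rng.normal(size=3)
w_z = 0.1 * rng.normal(size=(4, 4))
w_c = 0.1 * rng.normal(size=(4, 3))
w_h = rng.normal(size=4)

times, hazards, survival = risk_trajectory(z0, context, w_z, w_c, w_h)
print(f"S(t={times[-1]:.1f}) = {survival[-1]:.3f}")
```

Because the hazard is strictly positive, the resulting survival curve starts at 1 and decreases monotonically, which is the basic property any time-to-event head must satisfy. A learned model would train W_z, W_c, and w_h (and would likely use an adaptive ODE solver), whereas here they are random placeholders.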

Topics

Journal Article