Graphicalized vision-language modeling for comprehensive lung nodule analysis and risk stratification.

April 11, 2026

Authors

Zhao D, Xi J, Guo X, Chai J, Xu Z, Li L, Xue Y, Sun Q, Zheng Y, Liu S

Affiliations (6)

  • Department of Thoracic Surgery, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China.
  • Department of Thoracic Surgery, First Hospital of Yulin City, Yulin, China.
  • Department of Anesthesiology, Jiangwan Hospital of Hongkou District, Shanghai, China.
  • Shanghai Key Laboratory of Anesthesiology and Brain Functional Modulation, Clinical Research Center for Anesthesiology and Perioperative Medicine, Translational Research Institute of Brain and Brain-Like Intelligence, Shanghai Fourth People's Hospital, School of Medicine, Tongji University, Shanghai, China.
  • Department of Thoracic Surgery, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710004, China.
  • Key Laboratory of Surgery Critical Care and Life Support (Xi'an Jiaotong University), Ministry of Education, Xi'an, China.

Abstract

Lung cancer care involves coupled tasks such as precise nodule detection, patient-level survival risk estimation, and nodule count quantification, typically handled by separate systems despite clear interdependence. We present VITALIS, a multimodal vision-language framework that fuses CT and PET/CT imaging with structured radiology text using a graph-aware Transformer: Laplacian diffusion enriches token features on an image-text graph, while structural and prior-guided attention focus computation on anatomically and clinically related contexts, followed by bidirectional image-text conditioning to form a fused patient representation. This representation parameterizes a continuous-time latent risk process governed by a context-modulated Neural ODE, enabling individualized continuous-time modeling of time-to-event risk. Task-specific heads decode the latent trajectory into nodule detection, nodule malignancy classification, survival risk estimation, and nodule count prediction. Evaluated on three public cohorts, the framework delivers accurate delineations, low-false-positive localization, calibrated survival risk estimates, and consistent nodule counts across tasks. These findings indicate that coupling graph-aware multimodal encoding with continuous-time latent dynamics provides a coherent basis for integrated diagnostic and prognostic modeling in lung cancer.
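
The abstract names two mechanisms without giving equations: Laplacian diffusion of token features on an image-text graph, and a latent risk process driven by a context-modulated ODE. The sketch below is one plausible reading of those ideas, not the VITALIS implementation. Everything in it is an assumption made for illustration: the explicit diffusion update X ← X − αLX with L = D − A, the toy four-token graph, the scalar dynamics dz/dt = −z + tanh(Σ context), the softplus hazard, and all function names (`laplacian`, `diffuse`, `integrate_risk`). A fixed-step Euler loop stands in for a learned Neural ODE solver.

```python
import math

def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def diffuse(features, adj, alpha=0.1, steps=5):
    """Assumed diffusion rule: repeat X <- X - alpha * L X for `steps` iterations.

    Each step pulls a token's features toward those of its graph neighbours.
    alpha must be small relative to the node degrees for the update to be stable.
    """
    lap = laplacian(adj)
    x = [row[:] for row in features]
    n, d = len(x), len(x[0])
    for _ in range(steps):
        lx = [[sum(lap[i][k] * x[k][j] for k in range(n)) for j in range(d)]
              for i in range(n)]
        x = [[x[i][j] - alpha * lx[i][j] for j in range(d)] for i in range(n)]
    return x

def mean_pool(rows):
    """Average a list of feature vectors into a single context vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def integrate_risk(context, t_end=5.0, dt=0.01):
    """Hypothetical latent risk process, integrated with explicit Euler.

    Latent state:  dz/dt = -z + tanh(sum(context))   (context modulates the drive)
    Hazard:        h(t)  = softplus(z(t)) > 0
    Returns the survival probability S(t_end) = exp(-integral of h).
    """
    drive = math.tanh(sum(context))
    z, cum_hazard = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        cum_hazard += math.log1p(math.exp(z)) * dt  # softplus hazard
        z += dt * (-z + drive)
    return math.exp(-cum_hazard)

# Toy graph: tokens 0-1 are "image" tokens, tokens 2-3 are "text" tokens.
# Edges 0-1 and 2-3 are within-modality; edges 0-2 and 1-3 link image to text.
adj = [[0, 1, 1, 0],
       [1, 0, 0, 1],
       [1, 0, 0, 1],
       [0, 1, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]]

smoothed = diffuse(features, adj)
context = mean_pool(smoothed[:2])   # pool the image tokens after diffusion
survival = integrate_risk(context)  # survival probability at t_end
print(context, survival)
```

Pooling only the image tokens after diffusion makes the effect visible: text-token features have leaked into them through the cross-modal edges, so the context vector differs from the plain average of the raw image tokens. In this toy dynamics a larger context drive raises the hazard and lowers the survival probability.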

Topics

Journal Article
