A disease-centric vision-language foundation model for precision oncology in kidney cancer.
Affiliations (18)
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai, China.
- Shanghai Key Laboratory of Medical Imaging Computing and Computer Assisted Intervention, Shanghai, China.
- Department of Urology, Qilu Hospital of Shandong University, Jinan, Shandong, China.
- Microsoft Research Asia, Shanghai, China.
- Department of Radiology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China.
- Center of Health data science, Linyi People's Hospital, Shandong, China.
- Shandong Open Laboratory of Data Innovation Application, Shandong, China.
- Department of Radiology, the First People's Hospital of Lianyungang, Lianyungang, China.
- Department of Radiology, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, China.
- Department of Urology, Zhangye People's Hospital affiliated to Hexi University, Zhangye, China.
- Department of Urology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
- Department of Urology, Linyi People's Hospital, Shandong, China.
- Department of Urology, Zhongshan Hospital, Fudan University, Shanghai, China.
- Department of Urology, Qilu Hospital of Shandong University, Jinan, Shandong, China.
- Department of Urology, Zhongshan Hospital, Fudan University, Shanghai, China.
- Department of Urology, Xiamen Branch, Zhongshan Hospital, Fudan University, Xiamen, China.
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai, China.
- Shanghai Key Laboratory of Medical Imaging Computing and Computer Assisted Intervention, Shanghai, China.
Abstract
The non-invasive assessment of renal masses remains a critical challenge in urologic oncology, where diagnostic uncertainty frequently leads to overtreatment. Here, we develop RenalCLIP, a vision-language foundation model for precision oncology in kidney cancer. Using 27,866 computed tomography scans from 8,809 patients across diverse multi-center cohorts, we employ a two-stage pre-training strategy to align domain-specific visual and textual representations. RenalCLIP achieves improved performance and generalizability across ten core clinical tasks spanning anatomical assessment, diagnostic classification, and survival prediction, significantly outperforming state-of-the-art general-purpose foundation models. Furthermore, RenalCLIP is highly data-efficient in diagnostic classification, matching the peak performance of baseline models while using only 20% of the training data. The model also exhibits robust zero-shot diagnostic capabilities, effective image-text retrieval, and high-quality medical report generation. Our findings establish RenalCLIP as a powerful, generalizable tool to enhance diagnostic precision, refine prognostic stratification, and personalize the management of renal masses.
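The abstract does not specify RenalCLIP's training objective. As a minimal sketch, assuming a standard CLIP-style symmetric contrastive (InfoNCE) loss between paired CT-scan and report embeddings, the alignment step and the prompt-based zero-shot classification it enables could look like the following. The toy embeddings, the temperature of 0.07, and all function names are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_softmax(row: List[float]) -> List[float]:
    # Numerically stable log-softmax (subtract the max before exponentiating).
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_contrastive_loss(images: List[Vector], texts: List[Vector],
                          temperature: float = 0.07) -> float:
    """Hypothetical CLIP-style symmetric InfoNCE loss.

    Pair i (image i, report i) is the positive; every other pairing in the
    batch serves as a negative. The loss averages the image-to-text and
    text-to-image cross-entropies.
    """
    imgs = [l2_normalize(v) for v in images]
    txts = [l2_normalize(v) for v in texts]
    logits = [[dot(i, t) / temperature for t in txts] for i in imgs]
    n = len(logits)
    # Image-to-text: each row should put its mass on the matching report.
    img_to_txt = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image: each column should put its mass on the matching scan.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    txt_to_img = -sum(log_softmax(cols[i])[i] for i in range(n)) / n
    return (img_to_txt + txt_to_img) / 2


def zero_shot_classify(image: Vector, class_prompt_embeddings: List[Vector]) -> int:
    """Return the index of the class prompt most similar to the image embedding."""
    img = l2_normalize(image)
    scores = [dot(img, l2_normalize(t)) for t in class_prompt_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.1], [0.1, 1.0]]   # reports that match their scans
    shuffled = [[0.1, 1.0], [1.0, 0.1]]  # reports paired with the wrong scans
    # Aligned pairs yield a lower loss than mismatched pairs.
    print(clip_contrastive_loss(images, aligned) < clip_contrastive_loss(images, shuffled))
    # Toy "benign" vs "malignant" prompt embeddings (hypothetical).
    print(zero_shot_classify([0.9, 0.2], [[1.0, 0.0], [0.0, 1.0]]))
```

In this sketch, zero-shot diagnosis needs no task-specific training: each candidate label is written as a text prompt, embedded, and the label whose embedding lies closest to the scan embedding is chosen. The same similarity scores, ranked over a pool of reports, would also support the image-text retrieval the abstract mentions.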