TF-VSF: A Novel Training-Free Visual-Semantic Fusion Method for Severity Assessment of the Rare Disease Morning Glory Syndrome.
Affiliations (7)
- School of Design, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Xin Hua Hospital Affiliated to School of Medicine, Shanghai Jiao Tong University, Shanghai, 200092, China.
- Xin Hua Hospital Affiliated to School of Medicine, Shanghai Jiao Tong University, Shanghai, 200092, China. [email protected].
- School of Design, Shanghai Jiao Tong University, Shanghai, 200240, China. [email protected].
- Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314100, China. [email protected].
- Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200240, China. [email protected].
- USC-SJTU Institute of Cultural and Creative Industry, Shanghai Jiao Tong University, Shanghai, 200240, China. [email protected].
Abstract
Morning glory syndrome (MGS) is a rare congenital disease, and approximately 50% of MGS patients present with retinal detachment. Widespread screening for MGS significantly aids early detection, but it places a considerable burden on healthcare professionals. Recently, AI-assisted diagnostic methods have made significant strides and achieved satisfactory accuracy. However, current AI-assisted methods rely heavily on large datasets to drive feature learning, and the scarcity of MGS data makes it difficult to optimize model parameters. To address this limitation, we propose a training-free method named TF-VSF, which leverages prior knowledge from foundation models and MGS-specific pathological structures to generate low-dimensional, refined feature representations for the diagnostic grading task. Specifically, the channel-based visual recalibration (CVR) module introduces pretrained prior knowledge from SAM to generate a coarse segmentation mask, which is then refined by a pyramid calibration module that filters the high-dimensional semantic structures in a parameter-free manner. The semantic-based location perception (SLP) module then uses pretrained contrastive language-image pretraining (CLIP) prior knowledge to generate an implicit semantic feature representation under edge energy control, which is fused with the refined features from the CVR module. Finally, the grading results are obtained through independent component analysis (ICA) feature reduction and density-constrained clustering. We developed a dataset of 1016 MGS fundus images. TF-VSF achieves 95.87% accuracy and a 93.50% F1-score, surpassing comparable self-supervised, fully trained, and training-free methods in both the general image domain and the medical image domain. TF-VSF represents a novel framework that bridges the gap in AI-assisted diagnostic technology for rare diseases.
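The final grading stage described in the abstract (ICA feature reduction followed by density-constrained clustering) can be sketched on synthetic features. This is a minimal illustration only: FastICA and DBSCAN are assumed stand-ins for the paper's ICA and density-constrained clustering steps, and the synthetic data, component count, and clustering parameters are all hypothetical, not the authors' configuration.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Stand-in for fused visual-semantic features: two synthetic severity
# groups in a 64-dimensional feature space (illustrative data only).
mild = rng.normal(0.0, 0.3, size=(50, 64))
severe = rng.normal(3.0, 0.3, size=(50, 64))
features = np.vstack([mild, severe])

# Step 1: ICA reduces the high-dimensional fused features to a small
# number of statistically independent components (count is an assumption).
ica = FastICA(n_components=2, random_state=0)
reduced = ica.fit_transform(features)

# Step 2: density-based clustering assigns severity grades without any
# trained classifier; DBSCAN is one possible density-constrained choice.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(reduced)
n_grades = len(set(labels) - {-1})  # -1 marks DBSCAN noise points
print(reduced.shape, n_grades)
```

Because both steps are parameter-free in the learned sense (nothing is fit to labeled MGS data), this pipeline matches the training-free spirit of TF-VSF, though the actual method fuses SAM- and CLIP-derived features rather than synthetic ones.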