CNS-CLIP: Transforming a Neurosurgical Journal Into a Multimodal Medical Model.

June 1, 2025


DOI: 10.1227/neu.0000000000003297 PMID: 39636129

Authors

Alyakin A, Kurland D, Alber DA, Sangwon KL, Li D, Tsirigos A, Leuthardt E, Kondziolka D, Oermann EK

Affiliations (3)

Department of Neurological Surgery, NYU Langone Health, New York, New York, USA.
Department of Neurosurgery, Washington University in Saint Louis, Saint Louis, Missouri, USA.
Applied Bioinformatics Laboratories, New York University School of Medicine, New York, New York, USA.

Abstract

Classical biomedical data science models are trained on a single modality and aimed at one specific task. However, the exponential increase in the size and capabilities of foundation models inside and outside medicine reflects a shift toward task-agnostic models trained on large-scale, often internet-based, data. Recent research into smaller foundation models trained on specialized literature, such as programming textbooks, has shown that they can match or exceed the capabilities of large generalist models, suggesting a potential middle ground between small task-specific models and large foundation models. This study introduces a domain-specific multimodal model, Congress of Neurological Surgeons (CNS)-Contrastive Language-Image Pretraining (CLIP), developed for neurosurgical applications and trained exclusively on data from Neurosurgery Publications. We constructed a multimodal data set of articles from Neurosurgery Publications through PDF data collection and figure-caption extraction, using an artificial intelligence pipeline for quality control. Our final data set included 24,021 figure-caption pairs. We then developed a fine-tuning protocol for the OpenAI CLIP model. The model was evaluated on neurosurgical information retrieval, computed tomography imaging classification, and zero-shot ImageNet classification. CNS-CLIP outperformed the baseline in neurosurgical information retrieval, with a Top-1 accuracy of 24.56% compared with 8.61%. Across 6 neuroradiology tasks, CNS-CLIP achieved an average area under the receiver operating characteristic curve (AUROC) of 0.95, slightly above OpenAI's CLIP at 0.94 and significantly outperforming a vanilla vision transformer at 0.62. In generalist classification, CNS-CLIP reached a Top-1 accuracy of 47.55%, down from the baseline of 52.37%, demonstrating catastrophic forgetting.
This study presents a pioneering effort in building a domain-specific multimodal model using data from a medical society publication. The results indicate that domain-specific models, while less globally versatile, can offer advantages in specialized contexts. This emphasizes the importance of using tailored data and domain-focused development in training foundation models in neurosurgery and general medicine.
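The abstract names the training objective family (CLIP-style contrastive fine-tuning on figure-caption pairs) and two of its metrics (Top-1 retrieval accuracy and AUROC) without giving formulas. The following is a minimal sketch, assuming the standard symmetric contrastive loss from the original CLIP work and assuming retrieval means "for each figure, rank all captions and check whether its own caption is first." The CNS-CLIP fine-tuning details (batch size, temperature, optimizer, retrieval direction) are not stated in this abstract, so the temperature of 0.07 (CLIP's initial value) and every function name here are hypothetical and chosen for illustration only.

```python
import math

def normalize(v):
    # L2-normalize an embedding so dot products become cosine similarities.
    # Assumes v is a non-zero vector.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_logits(image_embs, text_embs, temperature=0.07):
    # Row i, column j: cosine similarity of image i and caption j, divided by
    # the temperature (hypothetical fixed value; CLIP learns it during training).
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
            for i in imgs]

def diagonal_cross_entropy(logits):
    # Mean of -log softmax(row)[i]: the i-th item's true match is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    # Symmetric loss: image-to-text rows plus text-to-image columns, averaged.
    logits = similarity_logits(image_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(transposed))

def top1_retrieval_accuracy(image_embs, text_embs):
    # Fraction of images whose highest-scoring caption is their own caption.
    logits = similarity_logits(image_embs, text_embs, temperature=1.0)
    hits = 0
    for i, row in enumerate(logits):
        best = max(range(len(row)), key=row.__getitem__)
        hits += int(best == i)
    return hits / len(logits)

def auroc(scores, labels):
    # Rank-based (Mann-Whitney) AUROC: probability that a random positive
    # scores above a random negative, counting ties as one half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Toy 2-D embeddings standing in for encoder outputs (not real data).
    figures = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    captions = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]
    print(top1_retrieval_accuracy(figures, captions))
    print(clip_contrastive_loss(figures, captions))
    print(auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

Fine-tuning pushes the loss down by raising each pair's diagonal similarity relative to the other captions in the batch, which is also what raises Top-1 retrieval accuracy. Because nothing in this objective preserves performance on general images, a narrow training set can trade away generalist accuracy, which is one way to read the ImageNet drop reported above.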


Topics

Neurosurgery, Neurosurgical Procedures, Periodicals as Topic, Journal Article
