Multimodal Machine Learning for Diagnosis of Multiple Sclerosis Using Optical Coherence Tomography in Pediatric Cases
Authors
Affiliations (1)
- The Hospital for Sick Children; Department of Pediatric Neurology, University of Toronto
Abstract
Background and Objectives: Identifying MS in children early and distinguishing it from other neuroinflammatory conditions of childhood is critical, as early therapeutic intervention can improve outcomes. The anterior visual pathway is of central importance in diagnostic considerations for MS and has recently been identified as a fifth topography in the McDonald Diagnostic Criteria for MS. Optical coherence tomography (OCT) provides high-resolution retinal imaging and reflects the structural integrity of the retinal nerve fiber and ganglion cell inner plexiform layers. Whether multimodal deep learning models can use OCT alone to diagnose pediatric-onset MS (POMS) is unknown.
Methods: We analyzed 3D OCT scans collected prospectively through the Neuroinflammatory Registry of the Hospital for Sick Children (REB#1000005356). Raw macular and optic nerve head images and 52 automatically segmented features were included. We evaluated three classification approaches: (1) deep learning models (e.g., ResNet, DenseNet) for representation learning followed by classical ML classifiers, (2) ML models trained on OCT-derived features, and (3) multimodal models combining both via early and late fusion.
Results: Scans from individuals with POMS (onset 16.0 ± 3.1 years, 51.0% female; 211 scans) and 29 children with non-inflammatory neurological conditions (13.1 ± 4.0 years, 69.0% female; 52 scans) were included. The early fusion model achieved the highest performance (AUC: 0.87, F1: 0.87, accuracy: 90%), outperforming both unimodal and late fusion models. The best unimodal feature-based model (SVC) yielded an AUC of 0.84, an F1 of 0.85, and an accuracy of 85%, while the best image-based model (ResNet101 with Random Forest) achieved an AUC of 0.87, an F1 of 0.79, and an accuracy of 84%. Late fusion underperformed, reaching 82% accuracy but failing in the minority class.
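The gap between headline accuracy and minority-class performance can be made concrete with the scan counts reported above. The sketch below is illustrative only: it assumes, hypothetically, that the evaluation set mirrors the overall class proportions (211 POMS scans, 52 control scans), which the abstract does not state, and the helper name `summarize` is not taken from the study. It shows that a degenerate classifier labelling every scan as POMS already reaches roughly 80% accuracy while never identifying a control scan.

```python
def summarize(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy and per-class recall from binary confusion counts.

    Positive class = POMS scan, negative class = control scan.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall_positive": tp / (tp + fn) if (tp + fn) else 0.0,
        "recall_negative": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Degenerate classifier that calls every scan POMS.
# Counts come from the abstract; applying them to an evaluation
# set is an assumption made only for illustration.
always_poms = summarize(tp=211, fn=0, fp=52, tn=0)

print(round(always_poms["accuracy"], 3))   # 0.802
print(always_poms["recall_negative"])      # 0.0
```

Under this assumption, an accuracy near 80% is achievable without any discrimination between classes, so accuracy is best read alongside AUC, F1, and per-class recall.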
Discussion: Multimodal learning with early fusion significantly enhances diagnostic performance by combining spatial retinal information with clinically relevant structural features. This approach captures complementary patterns associated with MS pathology and shows promise as an AI-driven tool to support pediatric neuroinflammatory diagnosis.
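The distinction between early and late fusion can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's implementation: the abstract does not describe how fusion was performed, so the sketch uses the common definitions (early fusion as concatenation of per-scan image embeddings with tabular OCT features before a single classifier; late fusion as a weighted average of probabilities from two separately trained classifiers). The function names, the per-column standardization step, and the equal default weighting are all hypothetical choices.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Standardize each column; constant columns map to 0 (assumed preprocessing)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def early_fusion(image_embeddings: Sequence[Sequence[float]],
                 oct_features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate both modalities per scan into one joint feature matrix.

    A single downstream classifier would then be trained on the result.
    """
    if len(image_embeddings) != len(oct_features):
        raise ValueError("both modalities must describe the same scans")
    img = zscore_columns(image_embeddings)
    tab = zscore_columns(oct_features)
    return [a + b for a, b in zip(img, tab)]


def late_fusion(p_image: Sequence[float],
                p_features: Sequence[float],
                weight_image: float = 0.5) -> List[float]:
    """Combine per-scan probabilities from two independently trained models."""
    if len(p_image) != len(p_features):
        raise ValueError("both models must score the same scans")
    if not 0.0 <= weight_image <= 1.0:
        raise ValueError("weight_image must lie in [0, 1]")
    return [weight_image * a + (1.0 - weight_image) * b
            for a, b in zip(p_image, p_features)]


# Toy example: 2 scans, 3-dimensional embeddings, 2 tabular features.
fused = early_fusion([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]],
                     [[10.0, 0.5], [20.0, 0.5]])
combined = late_fusion([0.9, 0.2], [0.7, 0.4])
```

In this framing, early fusion lets one classifier learn interactions between image-derived and segmented features, whereas late fusion can only reweight two finished predictions.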