Enhancing Instance Feature Representation: A Foundation Model-Based Multi-Instance Approach for Neonatal Retinal Screening
Authors
Abstract
Automated analysis of neonatal fundus images presents a uniquely intricate challenge in medical imaging. Existing methodologies predominantly focus on diagnosing abnormalities from individual images, often leading to inaccuracies due to the diverse and subtle nature of neonatal retinal features. Consequently, clinical standards frequently mandate the acquisition of retinal images from multiple angles to ensure the detection of minute lesions. To accommodate this, we propose leveraging multiple fundus images captured from various regions of the retina to comprehensively screen for a wide range of neonatal ocular pathologies. We employ Multiple Instance Learning (MIL) for this task and introduce a simple yet effective learnable structure built on existing MIL methods, called Learnable Dense to Global (LD2G-MIL). Unlike other methods, which focus on instance-to-bag feature aggregation, the proposed method focuses on generating better instance-level representations that are co-optimized with the downstream MIL targets in a learnable way. It additionally incorporates a bag prior-based similarity loss (BP loss), leveraging prior knowledge to enhance performance in neonatal retinal screening. To validate the efficacy of our LD2G-MIL method, we compiled the Neonatal Fundus Images (NFI) dataset, an extensive collection comprising 115,621 retinal images from 8,886 neonatal clinical episodes. Empirical evaluations on this dataset demonstrate that our approach consistently outperforms state-of-the-art (SOTA) generic and specialized methods. The code and trained models are publicly available at https://github.com/CVIU-CSU/LD2G-MIL.
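For readers unfamiliar with the MIL setting the abstract describes (each clinical episode is a "bag" of fundus images, each image an "instance"), the sketch below illustrates a generic attention-based instance-to-bag aggregation pipeline of the kind LD2G-MIL builds upon. It is a minimal illustration, not the paper's implementation: the module names, feature dimension, attention form, and class count are all assumptions.

```python
# Minimal sketch of generic attention-based MIL aggregation.
# Assumptions: 512-d instance features (e.g., from a frozen foundation-model
# backbone), a simple tanh-attention pooling, and a binary screening head.
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=128, num_classes=2):
        super().__init__()
        # Scores one attention weight per instance (per fundus image).
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, instance_feats):
        # instance_feats: (num_instances, feat_dim), one row per image in the bag.
        weights = torch.softmax(self.attention(instance_feats), dim=0)  # (N, 1)
        bag_feat = (weights * instance_feats).sum(dim=0)                # (feat_dim,)
        return self.classifier(bag_feat), weights

# Usage: a hypothetical bag of 12 retinal images encoded to 512-d features.
model = AttentionMIL()
bag = torch.randn(12, 512)
logits, attn = model(bag)
print(logits.shape, attn.shape)  # torch.Size([2]) torch.Size([12, 1])
```

In this generic setup, only the aggregation and classifier are trained; the abstract's stated contribution is to instead make the instance-level representations themselves learnable and co-optimized with the bag-level objective.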