Applicability study of AI attribution methods for ophthalmic image classification.
Authors
Affiliations (4)
- Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Waehringer Guertel 18-20 (4L), 1090, Vienna, Austria.
- Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Waehringer Guertel 18-20 (4L), 1090, Vienna, Austria.
- Carl Zeiss Meditec AG, Oberkochen, Germany.
- Department of Ophthalmology and Optometry, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria.
Abstract
Optical coherence tomography (OCT) enables early detection of vision-threatening diabetic retinopathy (DR) and retinal fluid accumulation, both major complications of diabetes. Despite high classification performance, deep learning models face limited clinical adoption due to poor interpretability. While attribution methods are effective for explaining predictions in the natural image domain, their applicability to medical imaging remains underexplored. To bridge this gap, we examine how well their strong results on natural images transfer to medical imaging. This study evaluates three cutting-edge attribution methods (DeepLIFT, AGI, and AttEXplore) for explaining the predictions of a VGG16-based deep learning model in two tasks: DR classification using widefield OCT angiography (OCTA) en face images, and fluid detection in OCT B-scans. We assess the ability of each attribution method to highlight clinically relevant regions using quantitative (insertion and deletion scores) and qualitative (heatmap visual analysis) measures. Although the VGG16 model achieves high classification accuracy (94% for DR and 98% for fluid), the attribution methods yield markedly different qualitative results due to variations in underlying assumptions and hyperparameter sensitivity. Additionally, high insertion or low deletion scores do not necessarily correlate with clinically meaningful visual attributions. In particular, insertion- and deletion-based behaviour can be more informative in pathological cases, where localized lesions can drive predictions, but tends to be less informative in normal cases, where confirming the absence of pathology requires global contextual evidence. Comparing the three approaches, we find that AGI and AttEXplore achieve similarly strong quantitative performance, whereas AttEXplore more consistently highlights clinically meaningful structures in pathological cases than AGI and DeepLIFT, making it the preferred option in our DR and fluid detection settings.
However, our results also show that in any potential clinical use, all three methods must be interpreted alongside clinical expertise; even the output of the best-performing attribution approach cannot serve as an automatic proxy for clinical relevance. Using ophthalmic data, this study provides critical insights into the challenges of applying attribution methods to medical imaging and lays a foundation for improving transparency and trust in AI-assisted diagnostics.
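The insertion and deletion scores mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the exact evaluation protocol, so everything below is an assumption for illustration: a zero-valued baseline image, equal-sized pixel steps, trapezoidal area under the curve, and a hypothetical `model_fn` callable that stands in for the classifier and returns the probability of the class of interest. Deletion removes the highest-attributed pixels first (a lower area suggests a more faithful attribution), while insertion adds them to the baseline first (a higher area suggests a more faithful attribution).

```python
import numpy as np


def deletion_insertion_scores(model_fn, image, attribution, steps=16, baseline=None):
    """Return (deletion_auc, insertion_auc) for one image and one attribution map.

    model_fn    : hypothetical callable, array shaped like `image` -> class probability
    image       : input array
    attribution : array of the same shape, larger value = more important pixel
    baseline    : replacement values; zeros are assumed if not given
    """
    if baseline is None:
        baseline = np.zeros_like(image)

    # Pixel indices ranked from most to least important.
    order = np.argsort(attribution.ravel())[::-1]
    n = order.size
    counts = np.round(np.linspace(0, n, steps + 1)).astype(int)

    del_probs, ins_probs = [], []
    for k in counts:
        idx = order[:k]

        # Deletion: replace the top-k pixels of the image with baseline values.
        deleted = image.copy().ravel()
        deleted[idx] = baseline.ravel()[idx]

        # Insertion: copy the top-k pixels of the image onto the baseline.
        inserted = baseline.copy().ravel()
        inserted[idx] = image.ravel()[idx]

        del_probs.append(model_fn(deleted.reshape(image.shape)))
        ins_probs.append(model_fn(inserted.reshape(image.shape)))

    # Trapezoidal area under each probability curve, x-axis normalised to [0, 1].
    x = counts / n

    def auc(y):
        y = np.asarray(y, dtype=float)
        return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))

    return auc(del_probs), auc(ins_probs)


# Toy usage: a stand-in "model" that only looks at the top-left 2x2 patch.
toy_image = np.ones((4, 4))
toy_model = lambda x: float(x[:2, :2].mean())
toy_attr = np.zeros((4, 4))
toy_attr[:2, :2] = 1.0  # attribution that matches what the toy model uses
del_auc, ins_auc = deletion_insertion_scores(toy_model, toy_image, toy_attr)
```

In this toy setting an attribution map that matches the region the model relies on gives a low deletion area and a high insertion area. As the abstract notes, such scores measure faithfulness to the model and do not by themselves establish that the highlighted regions are clinically meaningful.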