Semi-supervised medical image captioning via anatomical collaborative evidence network.

May 26, 2026

DOI: 10.3389/fmed.2026.1816295 PMID: 42272675

Authors

Zhou S, Liu Q, Cai L, Lu K, Qiao L, Xu N, Wu Y, Xu Y, Li J

Affiliations (2)

College of Information Engineering, Sichuan Agricultural University, Ya'an, China.
Department of Otolaryngology, Ya'an People's Hospital, Ya'an, China.

Abstract

Medical image captioning bridges visual perception and clinical language, but its development is limited by the high cost of detailed anatomical annotation and by the risk of hallucinations or overconfidence in ambiguous endoscopic images. We propose ACE-Net, an Anatomy Collaborative Evidence Network for semi-supervised medical image captioning. ACE-Net integrates evidential deep learning into the visual encoding stage through an evidence-driven soft-gating mechanism that quantifies epistemic uncertainty and suppresses unreliable visual noise. A triple-guided Mixture-of-Experts decoder further organizes clinical reasoning into semantic anchoring, visual evidencing, and spatial calibration. Spatial consistency alignment is imposed within a teacher-student co-training framework to promote stable anatomical attention patterns without pixel-level supervision. On a high-resolution otolaryngology endoscopy dataset, ACE-Net achieved a BLEU-4 score of 0.7511 and a ROUGE-L score of 0.8728, demonstrating strong text-generation performance and improved anatomical grounding under limited annotation. These results suggest that effective anatomical localization can be induced through evidence-constrained global supervision rather than expensive pixel-level annotations, providing a data-efficient and reliable paradigm for medical image captioning.
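
The abstract does not spell out how the evidence-driven soft gate is computed. As a rough illustration only, the sketch below uses the standard evidential deep learning formulation (non-negative evidence mapped to Dirichlet parameters, with uncertainty u = K / S) and assumes the gate is simply 1 - u applied to each visual feature vector. The function names, the softplus evidence mapping, and the gating rule are all assumptions for illustration, not ACE-Net's actual implementation.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable softplus, log(1 + exp(x)); keeps evidence positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def evidential_uncertainty(logits):
    """Return (belief masses, uncertainty) for one feature's K evidence logits.

    Standard evidential formulation: evidence e_k >= 0, alpha_k = e_k + 1,
    Dirichlet strength S = sum(alpha), belief b_k = e_k / S, uncertainty u = K / S.
    """
    evidence = [softplus(z) for z in logits]
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    belief = [e / strength for e in evidence]
    uncertainty = len(logits) / strength
    return belief, uncertainty


def soft_gate(features, logits_per_feature):
    """Scale each feature vector by (1 - uncertainty).

    Assumed gating rule (hypothetical): low-evidence features are attenuated
    rather than discarded, so the gate stays differentiable.
    """
    gated = []
    for feat, logits in zip(features, logits_per_feature):
        _, u = evidential_uncertainty(logits)
        gate = 1.0 - u
        gated.append([gate * v for v in feat])
    return gated
```

In this formulation the belief masses and the uncertainty always sum to one, so a feature with little supporting evidence receives a gate close to zero, while a feature with strong evidence passes through nearly unchanged.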

Topics

Journal Article
