MedVQA-TREE: A Multimodal Reasoning and Retrieval Framework for Sarcopenia Prediction

August 26, 2025

arXiv: 2508.19319v1

Authors

Pardis Moradbeiki, Nasser Ghadiri, Sayed Jalal Zahabi, Uffe Kock Wiil, Kristoffer Kittelmann Brockhattingen, Ali Ebrahimi

Abstract

Accurate sarcopenia diagnosis via ultrasound remains challenging due to subtle imaging cues, limited labeled data, and the absence of clinical context in most models. We propose MedVQA-TREE, a multimodal framework that integrates a hierarchical image interpretation module, a gated feature-level fusion mechanism, and a novel multi-hop, multi-query retrieval strategy. The vision module includes anatomical classification, region segmentation, and graph-based spatial reasoning to capture coarse, mid-level, and fine-grained structures. A gated fusion mechanism selectively integrates visual features with textual queries, while clinical knowledge is retrieved through a UMLS-guided pipeline accessing PubMed and a sarcopenia-specific external knowledge base. MedVQA-TREE was trained and evaluated on two public MedVQA datasets (VQA-RAD and PathVQA) and a custom sarcopenia ultrasound dataset. The model achieved up to 99% diagnostic accuracy and outperformed previous state-of-the-art methods by over 10%. These results underscore the benefit of combining structured visual understanding with guided knowledge retrieval for effective AI-assisted diagnosis in sarcopenia.
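
The abstract does not spell out how the gated feature-level fusion is computed. As a rough illustration only, the sketch below shows one common form of gated fusion: a sigmoid gate, computed from the concatenated visual and text features, that blends the two element-wise. The function name `gated_fusion`, the parameters `weight` and `bias`, and the blending rule itself are assumptions made for this example and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(visual, text, weight, bias):
    """Blend two equal-length feature vectors with an element-wise gate.

    Hypothetical formulation (not the paper's):
        gate  = sigmoid(weight @ [visual; text] + bias)
        fused = gate * visual + (1 - gate) * text

    visual, text: lists of d floats
    weight: d rows of 2*d floats; bias: list of d floats
    """
    joint = list(visual) + list(text)  # concatenated input of length 2*d
    fused = []
    for i, (v, t) in enumerate(zip(visual, text)):
        pre_activation = sum(w * x for w, x in zip(weight[i], joint)) + bias[i]
        gate = sigmoid(pre_activation)  # how much of the visual feature to keep
        fused.append(gate * v + (1.0 - gate) * t)
    return fused


# Toy inputs: with zero weights and zero bias every gate is 0.5,
# so the result is the plain average of the two vectors.
visual = [1.0, 0.0]
text = [0.0, 1.0]
zero_weight = [[0.0] * 4 for _ in range(2)]
fused = gated_fusion(visual, text, zero_weight, [0.0, 0.0])
```

In a trained model the gate parameters would be learned, so the gate could lean toward the image features for some dimensions and toward the question text for others. Here the values are fixed by hand purely to show the mechanics.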

Topics

eess.IV
