A Prompt-Guided Vision-Language Framework for Interpretable and Region-Aware Disease Diagnosis in Chest X-rays.

April 21, 2026 (PubMed)

Authors

Liu L, Luo S, Li X, Ma F

Abstract

Effective interpretation of chest X-rays requires a tightly integrated process of visual analysis, diagnostic reasoning, and structured reporting. Yet, most machine learning systems handle these steps in isolation. Visual encoders are typically trained without diagnostic context, and language outputs often lack spatial grounding. To address this gap, we propose an interactive vision-language framework that supports prompt-guided reasoning over both textual and spatial queries, enabling region-aware, clinically aligned interpretations. The framework comprises three functional modules: Prompt-Guided Localization (PGL) for identifying relevant regions, Region-Level Diagnosis (RLD) for structured classification, and Region-Aware Explanation (RAE) for generating localized descriptions. These modules are unified through a regional alignment mechanism built on a multi-task Detection Transformer (DETR) backbone, which maps prompts and image regions into a shared semantic space. To train the system under limited supervision, we adopt a two-stage strategy: contrastive pretraining to establish cross-modal alignment, followed by multi-task fine-tuning to support downstream tasks including disease classification and report generation. Experiments across the publicly available chest X-ray datasets MIMIC-CXR, VinDr-CXR, and MS-CXR demonstrate consistent gains compared with state-of-the-art methods. Module-wise ablations further validate the contribution of each component and highlight the framework's potential for transparent, clinically applicable diagnostic support.
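The first training stage described above is contrastive pretraining that maps prompts and image regions into a shared semantic space, pulling matched prompt/region pairs together and pushing mismatched ones apart. The abstract does not give the loss, so the following is only a minimal NumPy sketch of a symmetric InfoNCE-style alignment objective of the kind commonly used for such cross-modal pretraining; the function name, shapes, and temperature are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def contrastive_alignment_loss(region_emb, prompt_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over matched (region, prompt) pairs.

    region_emb, prompt_emb: (N, D) arrays; row i of each is a matched pair.
    Illustrative sketch only -- not the paper's actual objective.
    """
    # L2-normalize so dot products become cosine similarities
    r = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    p = prompt_emb / np.linalg.norm(prompt_emb, axis=1, keepdims=True)

    logits = r @ p.T / temperature          # (N, N) similarity matrix
    labels = np.arange(len(logits))         # matched pairs lie on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)              # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lg)), labels].mean()  # pick diagonal

    # average the region->prompt and prompt->region directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

With perfectly aligned pairs the diagonal dominates each row and the loss is small; shuffling the prompt rows breaks the pairing and the loss rises, which is the signal the pretraining stage would optimize against.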

Topics

Journal Article
