Page 30 of 66652 results

Generate to Ground: Multimodal Text Conditioning Boosts Phrase Grounding in Medical Vision-Language Models

Felix Nützel, Mischa Dombrowski, Bernhard Kainz

arXiv preprint, Jul 16 2025
Phrase grounding, i.e., mapping natural language phrases to specific image regions, holds significant potential for disease localization in medical imaging through clinical reports. While current state-of-the-art methods rely on discriminative, self-supervised contrastive models, we demonstrate that generative text-to-image diffusion models, leveraging cross-attention maps, can achieve superior zero-shot phrase grounding performance. Contrary to prior assumptions, we show that fine-tuning diffusion models with a frozen, domain-specific language model, such as CXR-BERT, substantially outperforms domain-agnostic counterparts. This setup achieves remarkable improvements, with mIoU scores doubling those of current discriminative methods. These findings highlight the underexplored potential of generative models for phrase grounding tasks. To further enhance performance, we introduce Bimodal Bias Merging (BBM), a novel post-processing technique that aligns text and image biases to identify regions of high certainty. BBM refines cross-attention maps, achieving even greater localization accuracy. Our results establish generative approaches as a more effective paradigm for phrase grounding in the medical imaging domain, paving the way for more robust and interpretable applications in clinical practice. The source code and model weights are available at https://github.com/Felix-012/generate_to_ground.
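The abstract reports mIoU but does not spell out the evaluation protocol. As a minimal sketch, assuming a cross-attention map is thresholded into a binary mask and compared with a ground-truth region mask, IoU and its mean over samples could be computed as follows. The function names, the 0.5 threshold, and the toy values are illustrative assumptions, not taken from the paper or its repository.

```python
def binarize(attention, threshold):
    """Turn a 2D attention map (nested lists of floats) into a 0/1 mask."""
    return [[1 if v >= threshold else 0 for v in row] for row in attention]


def iou(pred, target):
    """Intersection over union of two same-shaped binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention chosen for this sketch: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over (predicted mask, ground-truth mask) pairs."""
    scores = [iou(p, t) for p, t in pairs]
    return sum(scores) / len(scores)


# Toy 2x3 attention map and ground-truth mask (made-up values).
attention = [[0.9, 0.8, 0.1],
             [0.7, 0.2, 0.0]]
truth = [[1, 1, 0],
         [0, 0, 0]]

pred = binarize(attention, threshold=0.5)   # hypothetical threshold
score = iou(pred, truth)                    # 2 overlapping pixels / 3 in the union
miou = mean_iou([(pred, truth), (truth, truth)])
```

The choice of threshold strongly affects IoU, so a real evaluation would fix it in advance or report results across several thresholds.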

Interpreting Radiologist's Intention from Eye Movements in Chest X-ray Diagnosis

Trong-Thang Pham, Anh Nguyen, Zhigang Deng, Carol C. Wu, Hien Van Nguyen, Ngan Le

arXiv preprint, Jul 16 2025
Radiologists rely on eye movements to navigate and interpret medical images. A trained radiologist possesses knowledge about the potential diseases that may be present in the images and, when searching, follows a mental checklist to locate them using their gaze. This is a key observation, yet existing models fail to capture the underlying intent behind each fixation. In this paper, we introduce a deep learning-based approach, RadGazeIntent, designed to model this behavior: having an intention to find something and actively searching for it. Our transformer-based architecture processes both the temporal and spatial dimensions of gaze data, transforming fine-grained fixation features into coarse, meaningful representations of diagnostic intent to interpret radiologists' goals. To capture the nuances of radiologists' varied intention-driven behaviors, we process existing medical eye-tracking datasets to create three intention-labeled subsets: RadSeq (Systematic Sequential Search), RadExplore (Uncertainty-driven Exploration), and RadHybrid (Hybrid Pattern). Experimental results demonstrate RadGazeIntent's ability to predict which findings radiologists are examining at specific moments, outperforming baseline methods across all intention-labeled datasets.

Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants

Sybelle Goedicke-Fritz, Michelle Bous, Annika Engel, Matthias Flotho, Pascal Hirsch, Hannah Wittig, Dino Milanovic, Dominik Mohr, Mathias Kaspar, Sogand Nemat, Dorothea Kerner, Arno Bücker, Andreas Keller, Sascha Meyer, Michael Zemlin, Philipp Flotho

arXiv preprint, Jul 16 2025
Bronchopulmonary dysplasia (BPD) is a chronic lung disease affecting 35% of extremely low birth weight infants. Defined by oxygen dependence at 36 weeks postmenstrual age, it causes lifelong respiratory complications. However, preventive interventions carry severe risks, including neurodevelopmental impairment, ventilator-induced lung injury, and systemic complications. Therefore, early BPD prognosis and prediction of BPD outcome are crucial to avoid unnecessary toxicity in low-risk infants. Admission radiographs of extremely preterm infants are routinely acquired within 24 h of life and could serve as a non-invasive prognostic tool. In this work, we developed and investigated a deep learning approach using chest X-rays from 163 extremely low-birth-weight infants (≤32 weeks gestation, 401-999 g) obtained within 24 hours of birth. We fine-tuned a ResNet-50 pretrained specifically on adult chest radiographs, employing progressive layer freezing with discriminative learning rates to prevent overfitting, and evaluated CutMix augmentation and linear probing. For moderate/severe BPD outcome prediction, our best performing model with progressive freezing, linear probing and CutMix achieved an AUROC of 0.78 ± 0.10, balanced accuracy of 0.69 ± 0.10, and an F1-score of 0.67 ± 0.11. In-domain pre-training significantly outperformed ImageNet initialization (p = 0.031), which confirms that domain-specific pretraining is important for BPD outcome prediction. Routine IRDS grades showed limited prognostic value (AUROC 0.57 ± 0.11), confirming the need for learned markers. Our approach demonstrates that domain-specific pretraining enables accurate BPD prediction from routine day-1 radiographs. Through progressive freezing and linear probing, the method remains computationally feasible for site-level implementation and future federated learning deployments.
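For readers less familiar with the reported metrics, the following is a minimal sketch of how balanced accuracy and the F1-score follow from a binary confusion matrix. The labels and predictions are made-up toy values, and the sketch assumes the binary (positive-class) F1 definition; the abstract does not state which averaging the authors used.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    return 2 * tp / (2 * tp + fp + fn)


# Toy data: 3 positive and 7 negative cases (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
bal_acc = balanced_accuracy(tp, fp, tn, fn)
f1 = f1_score(tp, fp, fn)
```

On this toy example plain accuracy is 0.80 while balanced accuracy is about 0.76, which illustrates why balanced accuracy is often preferred when the classes are unevenly sized.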

Collaborative Integration of AI and Human Expertise to Improve Detection of Chest Radiograph Abnormalities.

Awasthi A, Le N, Deng Z, Wu CC, Nguyen HV

PubMed paper, Jul 16 2025
"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content.
Purpose: To develop a collaborative AI system that integrates eye gaze data and radiology reports to improve diagnostic accuracy in chest radiograph interpretation by identifying and correcting perceptual errors.
Materials and Methods: This retrospective study utilized public datasets REFLACX and EGD-CXR to develop a collaborative AI solution, named Collaborative Radiology Expert (CoRaX). It employs a large multimodal model to analyze image embeddings, eye gaze data, and radiology reports, aiming to rectify perceptual errors in chest radiology. The proposed system was evaluated using two simulated error datasets featuring random and uncertain alterations of five abnormalities. Evaluation focused on the system's referral-making process, the quality of referrals, and its performance within collaborative diagnostic settings.
Results: In the random masking-based error dataset, 28.0% (93/332) of abnormalities were altered. The system successfully corrected 21.3% (71/332) of these errors, with 6.6% (22/332) remaining unresolved. The accuracy of the system in identifying the correct regions of interest for missed abnormalities was 63.0% [95% CI: 59.0%, 68.0%], and 85.7% (240/280) of interactions with radiologists were deemed satisfactory, meaning that the system provided diagnostic aid to radiologists. In the uncertainty-masking-based error dataset, 43.9% (146/332) of abnormalities were altered. The system corrected 34.6% (115/332) of these errors, with 9.3% (31/332) unresolved. The accuracy of predicted regions of missed abnormalities for this dataset was 58.0% [95% CI: 55.0%, 62.0%], and 78.4% (233/297) of interactions were satisfactory.
Conclusion: The CoRaX system can collaborate efficiently with radiologists and address perceptual errors across various abnormalities in chest radiographs. ©RSNA, 2025.

Single Inspiratory Chest CT-based Generative Deep Learning Models to Evaluate Functional Small Airway Disease.

Zhang D, Zhao M, Zhou X, Li Y, Guan Y, Xia Y, Zhang J, Dai Q, Zhang J, Fan L, Zhou SK, Liu S

PubMed paper, Jul 16 2025
"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content.
Purpose: To develop a deep learning model that uses a single inspiratory chest CT scan to generate parametric response maps (PRM) and predict functional small airway disease (fSAD).
Materials and Methods: In this retrospective study, predictive and generative deep learning models for PRM using inspiratory chest CT were developed using a model development dataset with fivefold cross-validation, with PRM derived from paired respiratory CT as the reference standard. Voxel-wise metrics, including sensitivity, area under the receiver operating characteristic curve (AUC), and structural similarity, were used to evaluate model performance in predicting PRM and expiratory CT images. The best performing model was tested on three internal test sets and an external test set.
Results: The model development dataset of 308 patients (median age, 67 years [IQR: 62-70 years]; 113 female) was divided into the training set (n = 216), the internal validation set (n = 31), and the first internal test set (n = 61). The generative model outperformed the predictive model in detecting fSAD (sensitivity 86.3% vs 38.9%; AUC 0.86 vs 0.70). The generative model performed well in the second internal (AUCs of 0.64, 0.84, 0.97 for emphysema, fSAD and normal lung tissue), the third internal (AUCs of 0.63, 0.83, 0.97), and the external (AUCs of 0.58, 0.85, 0.94) test sets. Notably, the model exhibited exceptional performance in the PRISm group of the fourth internal test set (AUC = 0.62, 0.88, and 0.96).
Conclusion: The proposed generative model, using a single inspiratory CT, outperformed existing algorithms in PRM evaluation and achieved results comparable to paired respiratory CT. Published under a CC BY 4.0 license.
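The voxel-wise AUC mentioned above can be read as the probability that a randomly chosen voxel of the target class receives a higher score than a randomly chosen voxel of any other class. The following is a minimal sketch of that pairwise definition on a handful of flattened toy voxels; the scores and labels are invented for illustration, and the one-vs-rest framing is an assumption rather than the study's documented procedure.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: predicted fSAD score per voxel, and whether the reference
# map labels that voxel as fSAD (1) or as something else (0).
voxel_scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
voxel_labels = [1, 1, 1, 0, 0, 0]

fsad_auc = pairwise_auc(voxel_scores, voxel_labels)  # 8 of 9 pairs ranked correctly
```

The double loop is quadratic in the number of voxels, so a full CT volume would call for a rank-based implementation instead; the pairwise form is used here only because it makes the definition explicit.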

Scaling Chest X-ray Foundation Models from Mixed Supervisions for Dense Prediction.

Wang F, Yu L

PubMed paper, Jul 16 2025
Foundation models have significantly revolutionized the field of chest X-ray diagnosis with their ability to transfer across various diseases and tasks. However, previous works have predominantly utilized self-supervised learning from medical image-text pairs, which falls short in dense medical prediction tasks because it relies solely on such coarse pair supervision, thereby limiting its applicability to detailed diagnostics. In this paper, we introduce a Dense Chest X-ray Foundation Model (DCXFM), which utilizes mixed supervision types (i.e., text, label, and segmentation masks) to significantly enhance the scalability of foundation models across various medical tasks. Our model involves two training stages: we first employ a novel self-distilled multimodal pretraining paradigm to exploit text and label supervision, along with local-to-global self-distillation and soft cross-modal contrastive alignment strategies to enhance localization capabilities. Subsequently, we introduce an efficient cost aggregation module, comprising spatial and class aggregation mechanisms, to further advance dense prediction tasks with densely annotated datasets. Comprehensive evaluations on three tasks (phrase grounding, zero-shot semantic segmentation, and zero-shot classification) demonstrate DCXFM's superior performance over other state-of-the-art medical image-text pretraining models. Remarkably, DCXFM exhibits powerful zero-shot capabilities across various datasets in phrase grounding and zero-shot semantic segmentation, underscoring its superior generalization in dense prediction tasks.

Semantically Informed Salient Regions Guided Radiology Report Generation

Zeyi Hou, Zeqiang Wei, Ruixin Yan, Ning Lang, Xiuzhuang Zhou

arXiv preprint, Jul 15 2025
Recent advances in automated radiology report generation from chest X-rays using deep learning algorithms have the potential to significantly reduce the arduous workload of radiologists. However, due to the inherent massive data bias in radiology images, where abnormalities are typically subtle and sparsely distributed, existing methods often produce fluent yet medically inaccurate reports, limiting their applicability in clinical practice. To address this issue effectively, we propose a Semantically Informed Salient Regions-guided (SISRNet) report generation method. Specifically, our approach explicitly identifies salient regions with medically critical characteristics using fine-grained cross-modal semantics. Then, SISRNet systematically focuses on these high-information regions during both image modeling and report generation, effectively capturing subtle abnormal findings, mitigating the negative impact of data bias, and ultimately generating clinically accurate reports. Compared to its peers, SISRNet demonstrates superior performance on the widely used IU-Xray and MIMIC-CXR datasets.

Performance of a screening-trained DL model for pulmonary nodule malignancy estimation of incidental clinical nodules.

Dinnessen R, Peeters D, Antonissen N, Mohamed Hoesein FAA, Gietema HA, Scholten ET, Schaefer-Prokop C, Jacobs C

PubMed paper, Jul 15 2025
To test the performance of a DL model developed and validated for screen-detected pulmonary nodules on incidental nodules detected in a clinical setting. A retrospective dataset of incidental pulmonary nodules sized 5-15 mm was collected, and a subset of size-matched solid nodules was selected. The performance of the DL model was compared to the Brock model. AUCs with 95% CIs were compared using the DeLong method. Sensitivity and specificity were determined at various thresholds, using a 10% threshold for the Brock model as reference. The model's calibration was visually assessed. The dataset included 49 malignant and 359 benign solid or part-solid nodules, and the size-matched dataset included 47 malignant and 47 benign solid nodules. In the complete dataset, AUCs [95% CI] were 0.89 [0.85, 0.93] for the DL model and 0.86 [0.81, 0.92] for the Brock model (p = 0.27). In the size-matched subset, AUCs of the DL and Brock models were 0.78 [0.69, 0.88] and 0.58 [0.46, 0.69] (p < 0.01), respectively. At a 10% threshold, the Brock model had a sensitivity of 0.49 [0.35, 0.63] and a specificity of 0.92 [0.89, 0.94]. At a threshold of 17%, the DL model matched the specificity of the Brock model at the 10% threshold, but had a higher sensitivity (0.57 [0.43, 0.71]). Calibration analysis revealed that the DL model overestimated the malignancy probability. The DL model demonstrated good discriminatory performance in a dataset of incidental nodules and outperformed the Brock model, but may need recalibration for clinical practice.
Question: What is the performance of a DL model for pulmonary nodule malignancy risk estimation developed on screening data in a dataset of incidentally detected nodules?
Findings: The DL model performed well on a dataset of nodules from clinical routine care and outperformed the Brock model in a size-matched subset.
Clinical relevance: This study provides further evidence about the potential of DL models for risk stratification of incidental nodules, which may improve nodule management in routine clinical practice.
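The comparison above depends on choosing an operating threshold for one model that matches the specificity of another. As a minimal sketch of that idea, assuming predicted malignancy probabilities and binary labels are available, the helpers below compute sensitivity and specificity at a threshold and search for the lowest threshold that reaches a target specificity. The data are invented toy values and the helper names are hypothetical; this is not the study's analysis code.

```python
def sens_spec(probs, labels, threshold):
    """Sensitivity and specificity when predicting malignant for prob >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < threshold)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < threshold)
    fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_specificity(probs, labels, target_spec):
    """Lowest observed probability whose specificity reaches target_spec.

    Specificity never decreases as the threshold rises, so the first
    candidate that meets the target in ascending order is the lowest one.
    """
    for candidate in sorted(set(probs)):
        _, spec = sens_spec(probs, labels, candidate)
        if spec >= target_spec:
            return candidate
    return None


# Toy data: five benign (0) and three malignant (1) nodules.
probs = [0.05, 0.08, 0.12, 0.20, 0.30, 0.15, 0.40, 0.09]
labels = [0, 0, 0, 0, 0, 1, 1, 1]

sens_at_10, spec_at_10 = sens_spec(probs, labels, 0.10)
matched = threshold_for_specificity(probs, labels, target_spec=0.75)
sens_matched, spec_matched = sens_spec(probs, labels, matched)
```

Comparing sensitivities at matched specificity, as the study does, avoids the situation where one model looks better only because its threshold happens to sit at a more favourable point on its ROC curve.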

A generative model uses healthy and diseased image pairs for pixel-level chest X-ray pathology localization.

Dong K, Cheng Y, He K, Suo J

PubMed paper, Jul 14 2025
Medical artificial intelligence (AI) offers potential for automatic pathological interpretation, but a practicable AI model demands both pixel-level accuracy and high explainability for diagnosis. The construction of such models relies on substantial training data with fine-grained labelling, which is impractical in real applications. To circumvent this barrier, we propose a prompt-driven constrained generative model to produce anatomically aligned healthy and diseased image pairs and learn a pathology localization model in a supervised manner. This paradigm provides high-fidelity labelled data and addresses the lack of chest X-ray images with labelling at fine scales. Benefitting from the emerging text-driven generative model and the incorporated constraint, our model presents promising localization accuracy of subtle pathologies, high explainability for clinical decisions, and good transferability to many unseen pathological categories such as new prompts and mixed pathologies. These advantageous features establish our model as a promising solution to assist chest X-ray analysis. In addition, the proposed approach may inspire solutions for other tasks that lack large training datasets and require time-consuming manual labelling.