Latest Papers on Radiology AI. Tags: X-Ray

Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images

Zahra TehraniNasab, Amar Kumar, Tal Arbel

•preprint•Jul 17 2025

Medical image synthesis presents unique challenges due to the inherent complexity and high-resolution details required in clinical contexts. Traditional generative architectures such as Generative Adversarial Networks (GANs) or Variational Auto Encoder (VAEs) have shown great promise for high-resolution image generation but struggle with preserving fine-grained details that are key for accurate diagnosis. To address this issue, we introduce Pixel Perfect MegaMed, the first vision-language foundation model to synthesize images at resolutions of 1024x1024. Our method deploys a multi-scale transformer architecture designed specifically for ultra-high resolution medical image generation, enabling the preservation of both global anatomical context and local image-level details. By leveraging vision-language alignment techniques tailored to medical terminology and imaging modalities, Pixel Perfect MegaMed bridges the gap between textual descriptions and visual representations at unprecedented resolution levels. We apply our model to the CheXpert dataset and demonstrate its ability to generate clinically faithful chest X-rays from text prompts. Beyond visual quality, these high-resolution synthetic images prove valuable for downstream tasks such as classification, showing measurable performance gains when used for data augmentation, particularly in low-data regimes. Our code is accessible through the project website - https://tehraninasab.github.io/pixelperfect-megamed.

X-Ray Image Synthesis Chest Methodology In Silico Academic Lab GenAI Open Code

A Deep Learning-Based Ensemble System for Automated Shoulder Fracture Detection in Clinical Radiographs

Hemanth Kumar M, Karthika M, Saianiruth M, Vasanthakumar Venugopal, Anandakumar D, Revathi Ezhumalai, Charulatha K, Kishore Kumar J, Dayana G, Kalyan Sivasailam, Bargava Subramanian

•preprint•Jul 17 2025

Background: Shoulder fractures are often underdiagnosed, especially in emergency and high-volume clinical settings. Studies report up to 10% of such fractures may be missed by radiologists. AI-driven tools offer a scalable way to assist early detection and reduce diagnostic delays. We address this gap through a dedicated AI system for shoulder radiographs. Methods: We developed a multi-model deep learning system using 10,000 annotated shoulder X-rays. Architectures include Faster R-CNN (ResNet50-FPN, ResNeXt), EfficientDet, and RF-DETR. To enhance detection, we applied bounding box and classification-level ensemble techniques such as Soft-NMS, WBF, and NMW fusion. Results: The NMW ensemble achieved 95.5% accuracy and an F1-score of 0.9610, outperforming individual models across all key metrics. It demonstrated strong recall and localization precision, confirming its effectiveness for clinical fracture detection in shoulder X-rays. Conclusion: The results show ensemble-based AI can reliably detect shoulder fractures in radiographs with high clinical relevance. The model's accuracy and deployment readiness position it well for integration into real-time diagnostic workflows. The current model is limited to binary fracture detection, reflecting its design for rapid screening and triage support rather than detailed orthopedic classification.

X-Ray Detection Musculoskeletal Methodology In Silico

Artificial intelligence-based diabetes risk prediction from longitudinal DXA bone measurements.

Khan S, Shah Z

•papers•Jul 16 2025

Diabetes mellitus (DM) is a serious global health concern that poses a significant threat to human life. Beyond its direct impact, diabetes substantially increases the risk of developing severe complications such as hypertension, cardiovascular disease, and musculoskeletal disorders like arthritis and osteoporosis. The field of diabetes classification has advanced significantly with the use of diverse data modalities and sophisticated tools to identify individuals or groups as diabetic. But the task of predicting diabetes prior to its onset, particularly through the use of longitudinal multi-modal data, remains relatively underexplored. To better understand the risk factors associated with diabetes development among Qatari adults, this longitudinal research aims to investigate dual-energy X-ray absorptiometry (DXA)-derived whole-body and regional bone composition measures as potential predictors of diabetes onset. We proposed a case-control retrospective study, with a total of 1,382 participants contains 725 male participants (cases: 146, control: 579) and 657 female participants (case: 133, control: 524). We excluded participants with incomplete data points. To handle class imbalance, we augmented our data using Synthetic Minority Over-sampling Technique (SMOTE) and SMOTEENN (SMOTE with Edited Nearest Neighbors), and to further investigated the association between bones data features and diabetes status, we employed ANOVA analytical method. For diabetes onset prediction, we employed both conventional and deep learning (DL) models to predict risk factors associated with diabetes in Qatari adults. We used SHAP and probabilistic methods to investigate the association of identified risk factors with diabetes. During experimental analysis, we found that bone mineral density (BMD), bone mineral contents (BMC) in the hip, femoral neck, troch area, and lumbar spine showed an upward trend in diabetic patients with [Formula: see text]. Meanwhile, we found that patients with abnormal glucose metabolism had increased wards BMD and BMC with low Z-score compared to healthy participants. Consequently, it shows that the diabetic group has superior bone health than the control group in the cohort, because they exhibit higher BMD, muscle mass, and bone area across most body regions. Moreover, in the age group distribution analysis, we found that the diabetes prediction rate was higher among healthy participants in the younger age group 20-40 years. But as the age range increased, the model predictions became more accurate for diabetic participants, especially in the older age group 56-69 years. It is also observed that male participants demonstrated a higher susceptibility to diabetes onset compared to female participants. Shallow models outperformed the DL models by presenting improved accuracy (91.08%), AUROC (96%), and recall values (91%). This pivotal approach utilizing DXA scans highlights significant potential for the rapid and minimally invasive early detection of diabetes.

X-Ray Classification Whole Body Retrospective Clinical In Silico Academic Lab Breakthrough

Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants

Sybelle Goedicke-Fritz, Michelle Bous, Annika Engel, Matthias Flotho, Pascal Hirsch, Hannah Wittig, Dino Milanovic, Dominik Mohr, Mathias Kaspar, Sogand Nemat, Dorothea Kerner, Arno Bücker, Andreas Keller, Sascha Meyer, Michael Zemlin, Philipp Flotho

•preprint•Jul 16 2025

Bronchopulmonary dysplasia (BPD) is a chronic lung disease affecting 35% of extremely low birth weight infants. Defined by oxygen dependence at 36 weeks postmenstrual age, it causes lifelong respiratory complications. However, preventive interventions carry severe risks, including neurodevelopmental impairment, ventilator-induced lung injury, and systemic complications. Therefore, early BPD prognosis and prediction of BPD outcome is crucial to avoid unnecessary toxicity in low risk infants. Admission radiographs of extremely preterm infants are routinely acquired within 24h of life and could serve as a non-invasive prognostic tool. In this work, we developed and investigated a deep learning approach using chest X-rays from 163 extremely low-birth-weight infants ($\leq$32 weeks gestation, 401-999g) obtained within 24 hours of birth. We fine-tuned a ResNet-50 pretrained specifically on adult chest radiographs, employing progressive layer freezing with discriminative learning rates to prevent overfitting and evaluated a CutMix augmentation and linear probing. For moderate/severe BPD outcome prediction, our best performing model with progressive freezing, linear probing and CutMix achieved an AUROC of 0.78 $\pm$ 0.10, balanced accuracy of 0.69 $\pm$ 0.10, and an F1-score of 0.67 $\pm$ 0.11. In-domain pre-training significantly outperformed ImageNet initialization (p = 0.031) which confirms domain-specific pretraining to be important for BPD outcome prediction. Routine IRDS grades showed limited prognostic value (AUROC 0.57 $\pm$ 0.11), confirming the need of learned markers. Our approach demonstrates that domain-specific pretraining enables accurate BPD prediction from routine day-1 radiographs. Through progressive freezing and linear probing, the method remains computationally feasible for site-level implementation and future federated learning deployments.

X-Ray Classification Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Generate to Ground: Multimodal Text Conditioning Boosts Phrase Grounding in Medical Vision-Language Models

Felix Nützel, Mischa Dombrowski, Bernhard Kainz

•preprint•Jul 16 2025

Phrase grounding, i.e., mapping natural language phrases to specific image regions, holds significant potential for disease localization in medical imaging through clinical reports. While current state-of-the-art methods rely on discriminative, self-supervised contrastive models, we demonstrate that generative text-to-image diffusion models, leveraging cross-attention maps, can achieve superior zero-shot phrase grounding performance. Contrary to prior assumptions, we show that fine-tuning diffusion models with a frozen, domain-specific language model, such as CXR-BERT, substantially outperforms domain-agnostic counterparts. This setup achieves remarkable improvements, with mIoU scores doubling those of current discriminative methods. These findings highlight the underexplored potential of generative models for phrase grounding tasks. To further enhance performance, we introduce Bimodal Bias Merging (BBM), a novel post-processing technique that aligns text and image biases to identify regions of high certainty. BBM refines cross-attention maps, achieving even greater localization accuracy. Our results establish generative approaches as a more effective paradigm for phrase grounding in the medical imaging domain, paving the way for more robust and interpretable applications in clinical practice. The source code and model weights are available at https://github.com/Felix-012/generate_to_ground.

X-Ray Segmentation Chest Methodology In Silico Academic Lab Open Code

Interpreting Radiologist's Intention from Eye Movements in Chest X-ray Diagnosis

Trong-Thang Pham, Anh Nguyen, Zhigang Deng, Carol C. Wu, Hien Van Nguyen, Ngan Le

•preprint•Jul 16 2025

Radiologists rely on eye movements to navigate and interpret medical images. A trained radiologist possesses knowledge about the potential diseases that may be present in the images and, when searching, follows a mental checklist to locate them using their gaze. This is a key observation, yet existing models fail to capture the underlying intent behind each fixation. In this paper, we introduce a deep learning-based approach, RadGazeIntent, designed to model this behavior: having an intention to find something and actively searching for it. Our transformer-based architecture processes both the temporal and spatial dimensions of gaze data, transforming fine-grained fixation features into coarse, meaningful representations of diagnostic intent to interpret radiologists' goals. To capture the nuances of radiologists' varied intention-driven behaviors, we process existing medical eye-tracking datasets to create three intention-labeled subsets: RadSeq (Systematic Sequential Search), RadExplore (Uncertainty-driven Exploration), and RadHybrid (Hybrid Pattern). Experimental results demonstrate RadGazeIntent's ability to predict which findings radiologists are examining at specific moments, outperforming baseline methods across all intention-labeled datasets.

X-Ray Classification Chest Methodology In Silico Academic Lab Benchmark SOTA

Collaborative Integration of AI and Human Expertise to Improve Detection of Chest Radiograph Abnormalities.

Awasthi A, Le N, Deng Z, Wu CC, Nguyen HV

•papers•Jul 16 2025

<i>"Just Accepted" papers have undergone full peer review and have been accepted for publication in <i>Radiology: Artificial Intelligence</i>. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content.</i> Purpose To develop a collaborative AI system that integrates eye gaze data and radiology reports to improve diagnostic accuracy in chest radiograph interpretation by identifying and correcting perceptual errors. Materials and Methods This retrospective study utilized public datasets REFLACX and EGD-CXR to develop a collaborative AI solution, named Collaborative Radiology Expert (CoRaX). It employs a large multimodal model to analyze image embeddings, eye gaze data, and radiology reports, aiming to rectify perceptual errors in chest radiology. The proposed system was evaluated using two simulated error datasets featuring random and uncertain alterations of five abnormalities. Evaluation focused on the system's referral-making process, the quality of referrals, and its performance within collaborative diagnostic settings. Results In the random masking-based error dataset, 28.0% (93/332) of abnormalities were altered. The system successfully corrected 21.3% (71/332) of these errors, with 6.6% (22/332) remaining unresolved. The accuracy of the system in identifying the correct regions of interest for missed abnormalities was 63.0% [95% CI: 59.0%, 68.0%], and 85.7% (240/280) of interactions with radiologists were deemed satisfactory, meaning that the system provided diagnostic aid to radiologists. In the uncertainty-masking-based error dataset, 43.9% (146/332) of abnormalities were altered. The system corrected 34.6% (115/332) of these errors, with 9.3% (31/332) unresolved. The accuracy of predicted regions of missed abnormalities for this dataset was 58.0% [95% CI: 55.0%, 62.0%], and 78.4% (233/297) of interactions were satisfactory. Conclusion The CoRaX system can collaborate efficiently with radiologists and address perceptual errors across various abnormalities in chest radiographs. ©RSNA, 2025.

X-Ray Detection Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants

Sybelle Goedicke-Fritz, Michelle Bous, Annika Engel, Matthias Flotho, Pascal Hirsch, Hannah Wittig, Dino Milanovic, Dominik Mohr, Mathias Kaspar, Sogand Nemat, Dorothea Kerner, Arno Bücker, Andreas Keller, Sascha Meyer, Michael Zemlin, Philipp Flotho

•preprint•Jul 16 2025

Bronchopulmonary dysplasia (BPD) is a chronic lung disease affecting 35% of extremely low birth weight infants. Defined by oxygen dependence at 36 weeks postmenstrual age, it causes lifelong respiratory complications. However, preventive interventions carry severe risks, including neurodevelopmental impairment, ventilator-induced lung injury, and systemic complications. Therefore, early BPD prognosis and prediction of BPD outcome is crucial to avoid unnecessary toxicity in low risk infants. Admission radiographs of extremely preterm infants are routinely acquired within 24h of life and could serve as a non-invasive prognostic tool. In this work, we developed and investigated a deep learning approach using chest X-rays from 163 extremely low-birth-weight infants ($\leq$32 weeks gestation, 401-999g) obtained within 24 hours of birth. We fine-tuned a ResNet-50 pretrained specifically on adult chest radiographs, employing progressive layer freezing with discriminative learning rates to prevent overfitting and evaluated a CutMix augmentation and linear probing. For moderate/severe BPD outcome prediction, our best performing model with progressive freezing, linear probing and CutMix achieved an AUROC of 0.78 $\pm$ 0.10, balanced accuracy of 0.69 $\pm$ 0.10, and an F1-score of 0.67 $\pm$ 0.11. In-domain pre-training significantly outperformed ImageNet initialization (p = 0.031) which confirms domain-specific pretraining to be important for BPD outcome prediction. Routine IRDS grades showed limited prognostic value (AUROC 0.57 $\pm$ 0.11), confirming the need of learned markers. Our approach demonstrates that domain-specific pretraining enables accurate BPD prediction from routine day-1 radiographs. Through progressive freezing and linear probing, the method remains computationally feasible for site-level implementation and future federated learning deployments.

X-Ray Classification Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Deep learning-assisted comparison of different models for predicting maxillary canine impaction on panoramic radiography.

Zhang C, Zhu H, Long H, Shi Y, Guo J, You M

•papers•Jul 16 2025

The panoramic radiograph is the most commonly used imaging modality for predicting maxillary canine impaction. Several prediction models have been constructed based on panoramic radiographs. This study aimed to compare the prediction accuracy of existing models in an external validation facilitated by an automatic landmark detection system based on deep learning. Patients aged 7-14 years who underwent panoramic radiographic examinations and received a diagnosis of impacted canines were included in the study. An automatic landmark localization system was employed to assist the measurement of geometric parameters on the panoramic radiographs, followed by the calculated prediction of the canine impaction. Three prediction models constructed by Arnautska, Alqerban et al, and Margot et al were evaluated. The metrics of accuracy, sensitivity, specificity, precision, and area under the receiver operating characteristic curve (AUC) were used to compare the performance of different models. A total of 102 panoramic radiographs with 102 impacted canines and 102 nonimpacted canines were analyzed in this study. The prediction outcomes indicated that the model by Margot et al achieved the highest performance, with a sensitivity of 95% and a specificity of 86% (AUC, 0.97), followed by the model by Arnautska, with a sensitivity of 93% and a specificity of 71% (AUC, 0.94). The model by Alqerban et al showed poor performance with an AUC of only 0.20. Two of the existing predictive models exhibited good diagnostic accuracy, whereas the third model demonstrated suboptimal performance. Nonetheless, even the most effective model is constrained by several limitations, such as logical and computational challenges, which necessitate further refinement.

X-Ray Detection Retrospective Clinical In Silico Academic Lab

Scaling Chest X-ray Foundation Models from Mixed Supervisions for Dense Prediction.

Wang F, Yu L

•papers•Jul 16 2025

Foundation models have significantly revolutionized the field of chest X-ray diagnosis with their ability to transfer across various diseases and tasks. However, previous works have predominantly utilized self-supervised learning from medical image-text pairs, which falls short in dense medical prediction tasks due to their sole reliance on such coarse pair supervision, thereby limiting their applicability to detailed diagnostics. In this paper, we introduce a Dense Chest X-ray Foundation Model (DCXFM), which utilizes mixed supervision types (i.e., text, label, and segmentation masks) to significantly enhance the scalability of foundation models across various medical tasks. Our model involves two training stages: we first employ a novel self-distilled multimodal pretraining paradigm to exploit text and label supervision, along with local-to-global self-distillation and soft cross-modal contrastive alignment strategies to enhance localization capabilities. Subsequently, we introduce an efficient cost aggregation module, comprising spatial and class aggregation mechanisms, to further advance dense prediction tasks with densely annotated datasets. Comprehensive evaluations on three tasks (phrase grounding, zero-shot semantic segmentation, and zero-shot classification) demonstrate DCXFM's superior performance over other state-of-the-art medical image-text pretraining models. Remarkably, DCXFM exhibits powerful zero-shot capabilities across various datasets in phrase grounding and zero-shot semantic segmentation, underscoring its superior generalization in dense prediction tasks.

X-Ray Segmentation Chest Methodology In Silico Academic Lab Benchmark SOTA

Filter Papers

Tags

Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images

A Deep Learning-Based Ensemble System for Automated Shoulder Fracture Detection in Clinical Radiographs

Artificial intelligence-based diabetes risk prediction from longitudinal DXA bone measurements.

Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants

Generate to Ground: Multimodal Text Conditioning Boosts Phrase Grounding in Medical Vision-Language Models

Interpreting Radiologist's Intention from Eye Movements in Chest X-ray Diagnosis

Collaborative Integration of AI and Human Expertise to Improve Detection of Chest Radiograph Abnormalities.

Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants

Deep learning-assisted comparison of different models for predicting maxillary canine impaction on panoramic radiography.

Scaling Chest X-ray Foundation Models from Mixed Supervisions for Dense Prediction.

Ready to Sharpen Your Edge?