
Evaluating GPT-4o for emergency disposition of complex respiratory cases with pulmonology consultation: a diagnostic accuracy study.

Yıldırım C, Aykut A, Günsoy E, Öncül MV

PubMed · Oct 2, 2025
Large Language Models (LLMs), such as GPT-4o, are increasingly investigated for clinical decision support in emergency medicine. However, their real-world performance in disposition prediction remains insufficiently studied. This study evaluated the diagnostic accuracy of GPT-4o in predicting ED disposition (discharge, ward admission, or ICU admission) in complex emergency respiratory cases requiring pulmonology consultation and chest CT, a selective high-acuity subgroup of ED patients. We conducted a retrospective observational study in a tertiary ED between November 2024 and February 2025, including patients with complex respiratory presentations who underwent pulmonology consultation and chest CT, rather than the general ED respiratory population. GPT-4o was prompted to predict the most appropriate ED disposition using three progressively enriched input models: Model 1 (age, sex, oxygen saturation, home oxygen therapy, and venous blood gas parameters); Model 2 (Model 1 plus laboratory data); and Model 3 (Model 2 plus chest CT findings). Model performance was assessed using accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score. Among the 221 patients included, 69.2% were admitted to the ward, 9.0% to the intensive care unit (ICU), and 21.7% were discharged. For hospital admission prediction, Model 3 demonstrated the highest sensitivity (91.9%) and overall accuracy (76.5%) but the lowest specificity (20.8%); conversely, for discharge prediction, Model 3 achieved the highest specificity (91.9%) but the lowest sensitivity (20.8%). Numerical improvements were observed across models, but none reached statistical significance (all p > 0.22); Model 1 therefore performed comparably to Models 2-3 while being less complex. Among patients who were discharged despite GPT-4o predicting admission, the 14-day ED re-presentation rates were 23.8% (5/21) for Model 1, 30.0% (9/30) for Model 2, and 28.9% (11/38) for Model 3. GPT-4o demonstrated high sensitivity in identifying ED patients requiring hospital admission, particularly those needing intensive care, when provided with progressively enriched clinical input. However, its low sensitivity for discharge prediction resulted in frequent overtriage, limiting its utility for autonomous decision-making. This proof-of-concept study demonstrates GPT-4o's capacity to stratify disposition decisions in complex respiratory cases under varying levels of limited input data. These findings should nonetheless be interpreted in light of key limitations, including the selective high-acuity cohort and the absence of vital signs, and require prospective validation before clinical implementation.
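
As a minimal sketch of the binary admission-vs-discharge evaluation reported above: the metrics follow directly from a 2x2 confusion matrix. The arrays here are hypothetical placeholders, not study data.

```python
# Minimal sketch: diagnostic-accuracy metrics for binary disposition
# (1 = admitted, 0 = discharged). y_true/y_pred are illustrative only.
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

y_true = np.array([1, 1, 0, 1, 0, 1, 1, 0])   # hypothetical dispositions
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 1])   # hypothetical GPT-4o predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)          # recall for admission
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)                  # positive predictive value
npv = tn / (tn + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)
f1 = f1_score(y_true, y_pred)
print(f"sens={sensitivity:.3f} spec={specificity:.3f} ppv={ppv:.3f} "
      f"npv={npv:.3f} acc={accuracy:.3f} f1={f1:.3f}")
```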

Uncovering Overconfident Failures in CXR Models via Augmentation-Sensitivity Risk Scoring

Han-Jay Shu, Wei-Ning Chiu, Shun-Ting Chang, Meng-Ping Huang, Takeshi Tohyama, Ahram Han, Po-Chih Kuo

arXiv preprint · Oct 2, 2025
Deep learning models achieve strong performance in chest radiograph (CXR) interpretation, yet fairness and reliability concerns persist. Models often show uneven accuracy across patient subgroups, leading to hidden failures not reflected in aggregate metrics. Existing error detection approaches -- based on confidence calibration or out-of-distribution (OOD) detection -- struggle with subtle within-distribution errors, while image- and representation-level consistency-based methods remain underexplored in medical imaging. We propose an augmentation-sensitivity risk scoring (ASRS) framework to identify error-prone CXR cases. ASRS applies clinically plausible rotations ($\pm 15^\circ$/$\pm 30^\circ$) and measures embedding shifts with the RAD-DINO encoder. Sensitivity scores stratify samples into stability quartiles, where highly sensitive cases show substantially lower recall ($-0.2$ to $-0.3$) despite high AUROC and confidence. ASRS provides a label-free means for selective prediction and clinician review, improving fairness and safety in medical AI.
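
A hedged sketch of the core ASRS idea: score each case by how far its embedding drifts under small, clinically plausible rotations. `encode` stands in for a wrapper around the RAD-DINO encoder (hypothetical; the paper's exact pipeline and distance measure may differ).

```python
# Augmentation-sensitivity scoring sketch: mean cosine drift of embeddings
# under +/-15 and +/-30 degree rotations.
import numpy as np
from scipy.ndimage import rotate

def asrs_score(image: np.ndarray, encode, angles=(-30, -15, 15, 30)) -> float:
    base = encode(image)                         # 1-D embedding of the original
    drifts = []
    for a in angles:
        aug = rotate(image, angle=a, reshape=False, mode="nearest")
        emb = encode(aug)
        cos = np.dot(base, emb) / (np.linalg.norm(base) * np.linalg.norm(emb))
        drifts.append(1.0 - cos)                 # cosine distance as drift
    return float(np.mean(drifts))

# Stratify a cohort into stability quartiles by score:
# scores = np.array([asrs_score(x, encode) for x in images])
# quartile = np.digitize(scores, np.quantile(scores, [0.25, 0.5, 0.75]))
```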

GFSR-Net: Guided Focus via Segment-Wise Relevance Network for Interpretable Deep Learning in Medical Imaging

Jhonatan Contreras, Thomas Bocklitz

arXiv preprint · Oct 2, 2025
Deep learning has achieved remarkable success in medical image analysis; however, its adoption in clinical practice is limited by a lack of interpretability. These models often make correct predictions without explaining their reasoning, and may rely on image regions unrelated to the disease or on visual cues, such as annotations, that are not present in real-world conditions. This can reduce trust and increase the risk of misleading diagnoses. We introduce the Guided Focus via Segment-Wise Relevance Network (GFSR-Net), an approach designed to improve interpretability and reliability in medical imaging. GFSR-Net uses a small number of human annotations to approximate where a person would intuitively focus within an image, without requiring precise boundaries or exhaustive markings, making the process fast and practical. During training, the model learns to align its focus with these areas, progressively emphasizing features that carry diagnostic meaning. This guidance works across different types of natural and medical images, including chest X-rays, retinal scans, and dermatological images. Our experiments demonstrate that GFSR-Net achieves comparable or superior accuracy while producing saliency maps that better reflect human expectations, reducing reliance on irrelevant patterns and increasing confidence in automated diagnostic tools.
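
One plausible way to implement this kind of focus guidance is to add an alignment term that pulls the model's saliency toward the coarse human focus maps. The sketch below is an assumption-laden stand-in, not the paper's exact relevance formulation; `focus_map` and `lambda_g` are illustrative names.

```python
# Sketch: task loss plus a KL term aligning normalized saliency with a
# (coarse) human focus map. Shapes: logits (B, C), saliency/focus (B, H, W).
import torch
import torch.nn.functional as F

def guided_loss(logits, labels, saliency, focus_map, lambda_g=0.5):
    s = saliency.flatten(1)
    s = s / (s.sum(dim=1, keepdim=True) + 1e-8)      # saliency as distribution
    f = focus_map.flatten(1)
    f = f / (f.sum(dim=1, keepdim=True) + 1e-8)      # focus map as distribution
    task = F.cross_entropy(logits, labels)
    align = F.kl_div(torch.log(s + 1e-8), f, reduction="batchmean")
    return task + lambda_g * align
```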

PixelPrint4D: A 3D Printing Method of Fabricating Patient-Specific Deformable CT Phantoms for Respiratory Motion Applications.

Im JY, Micah N, Perkins AE, Mei K, Geagan M, Roshkovan L, Noël PB

PubMed · Oct 1, 2025
Respiratory motion poses a significant challenge for clinical workflows in diagnostic imaging and radiation therapy. Many technologies such as motion artifact reduction and tumor tracking have been developed to compensate for its effect. To assess these technologies, respiratory motion phantoms (RMPs) are required as preclinical testing environments, for instance, in computed tomography (CT). However, current CT RMPs are highly simplified and do not exhibit realistic tissue structures or deformation patterns. With the rise of more complex motion compensation technologies such as deep learning-based algorithms, there is a need for more realistic RMPs. This work introduces PixelPrint4D, a 3D printing method for fabricating lifelike, patient-specific deformable lung phantoms for CT imaging. A 4DCT dataset of a lung cancer patient was acquired. The volumetric image data of the right lung at end inhalation was converted into 3D printer instructions using the previously developed PixelPrint software. A flexible 3D printing material was used to replicate variable densities voxel-by-voxel within the phantom. The accuracy of the phantom was assessed by acquiring CT scans of the phantom at rest and under various levels of compression. These phantom images were then compiled into a pseudo-4DCT dataset and compared to the reference patient 4DCT images. Metrics used to assess the phantom structural accuracy included mean attenuation errors, 2-sample 2-sided Kolmogorov-Smirnov (KS) test on histograms, and structural similarity index (SSIM). The phantom deformation properties were assessed by calculating displacement errors of the tumor and throughout the full lung volume, attenuation change errors, and Jacobian errors, as well as the relationship between Jacobian and attenuation changes. The phantom closely replicated patient lung structures, textures, and attenuation profiles. SSIM was measured as 0.93 between the patient and phantom lung, suggesting a high level of structural accuracy. Furthermore, it exhibited realistic nonrigid deformation patterns. The mean tumor motion errors in the phantom were ≤0.7 ± 0.6 mm in each orthogonal direction. Finally, the relationship between attenuation and local volume changes in the phantom had a strong correlation with that of the patient, with analysis of covariance yielding P = 0.83 and f = 0.04, suggesting no significant difference between the phantom and patient. PixelPrint4D facilitates the creation of highly realistic RMPs, exceeding the capabilities of existing models to provide enhanced testing environments for a wide range of emerging CT technologies.
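
For readers wanting to reproduce the structural-accuracy checks named above, a minimal sketch follows: SSIM between patient and phantom attenuation volumes plus a two-sample KS test on their histograms. The HU arrays are assumed to be co-registered and of equal shape (an assumption, not detail from the paper).

```python
# Sketch: mean attenuation error, SSIM, and 2-sample KS test between
# a patient HU volume and the corresponding phantom HU volume.
import numpy as np
from skimage.metrics import structural_similarity
from scipy.stats import ks_2samp

def phantom_accuracy(patient_hu: np.ndarray, phantom_hu: np.ndarray):
    mean_err = float(np.mean(phantom_hu - patient_hu))   # mean attenuation error
    ssim = structural_similarity(
        patient_hu, phantom_hu,
        data_range=float(patient_hu.max() - patient_hu.min()))
    ks_stat, p_value = ks_2samp(patient_hu.ravel(), phantom_hu.ravel())
    return mean_err, ssim, ks_stat, p_value
```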

Design of AI-driven microwave imaging for lung tumor monitoring.

Singh A, Paul S, Gayen S, Mandal B, Mitra D, Augustine R

PubMed · Oct 1, 2025
The global incidence of lung diseases, particularly lung cancer, is increasing at an alarming rate, underscoring the urgent need for early detection, robust monitoring, and timely intervention. This study presents design aspects of an artificial intelligence (AI)-integrated microwave-based diagnostic tool for the early detection of lung tumors. The proposed method combines machine learning (ML) tools with microwave imaging (MWI). A microwave unit containing eight antennas in the form of a wearable belt is employed for data collection from the CST body models. The data, collected in the form of scattering parameters, are reconstructed as 2D images. Two different ML approaches have been investigated for tumor detection and prediction of the size of the detected tumor. The first approach employs XGBoost models on raw S-parameters and the second approach uses convolutional neural networks (CNN) on the reconstructed 2D microwave images. The XGBoost-based classifier on S-parameters outperforms the CNN-based classifier on reconstructed microwave images for tumor detection, whereas a CNN-based model on reconstructed microwave images performs much better than an XGBoost-based regression model on the raw S-parameters for tumor size prediction. Both models are evaluated on other body models to examine their generalization to unseen data. This work explores the feasibility of a low-cost portable AI-integrated microwave diagnostic device for lung tumor detection, which avoids the exposure to harmful ionizing radiation of X-ray and CT scans.
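
A minimal sketch of the first approach (XGBoost on raw S-parameters) appears below. The feature layout is an assumption for illustration: eight antennas give an 8x8 S-matrix, flattened over a hypothetical 51 frequency points; the data here are synthetic.

```python
# Sketch: XGBoost classifier on flattened S-parameter magnitudes
# (synthetic data; real features would come from the MWI belt).
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8 * 8 * 51))    # |S_ij| over 51 frequency points
y = rng.integers(0, 2, size=200)          # 1 = tumor present (synthetic labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05,
                    eval_metric="logloss")
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```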

Does Bigger Mean Better? Comparative Analysis of CNNs and Biomedical Vision Language Models in Medical Diagnosis

Ran Tong, Jiaqi Liu, Su Liu, Jiexi Xu, Lanruo Wang, Tong Wang

arXiv preprint · Oct 1, 2025
The accurate interpretation of chest radiographs using automated methods is a critical task in medical imaging. This paper presents a comparative analysis between a supervised lightweight Convolutional Neural Network (CNN) and a state-of-the-art, zero-shot medical Vision-Language Model (VLM), BiomedCLIP, across two distinct diagnostic tasks: pneumonia detection on the PneumoniaMNIST benchmark and tuberculosis detection on the Shenzhen TB dataset. Our experiments show that supervised CNNs serve as highly competitive baselines in both cases. While the default zero-shot performance of the VLM is lower, we demonstrate that its potential can be unlocked via a simple yet crucial remedy: decision threshold calibration. By optimizing the classification threshold on a validation set, the performance of BiomedCLIP is significantly boosted across both datasets. For pneumonia detection, calibration enables the zero-shot VLM to achieve a superior F1-score of 0.8841, surpassing the supervised CNN's 0.8803. For tuberculosis detection, calibration dramatically improves the F1-score from 0.4812 to 0.7684, bringing it close to the supervised baseline's 0.7834. This work highlights a key insight: proper calibration is essential for leveraging the full diagnostic power of zero-shot VLMs, enabling them to match or even outperform efficient, task-specific supervised models.
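
The calibration remedy described above reduces to a one-line threshold sweep: pick the cutoff on a validation split that maximizes F1, then apply it at test time. The sketch below assumes zero-shot similarity scores and binary labels as NumPy arrays (`val_scores`/`val_labels` are placeholders for BiomedCLIP outputs, not the paper's code).

```python
# Sketch: decision-threshold calibration for a zero-shot classifier.
import numpy as np
from sklearn.metrics import f1_score

def calibrate_threshold(val_scores: np.ndarray, val_labels: np.ndarray) -> float:
    thresholds = np.unique(val_scores)                     # candidate cutoffs
    f1s = [f1_score(val_labels, val_scores >= t) for t in thresholds]
    return float(thresholds[int(np.argmax(f1s))])

# best_t = calibrate_threshold(val_scores, val_labels)
# test_pred = (test_scores >= best_t)   # apply the calibrated cutoff at test time
```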

Graph neural network model using radiomics for lung CT image segmentation.

Faizi MK, Qiang Y, Shagar MMB, Wei Y, Qiao Y, Zhao J, Urrehman Z

PubMed · Oct 1, 2025
Early detection of lung cancer is critical for improving treatment outcomes, and automatic lung image segmentation plays a key role in diagnosing lung-related diseases such as cancer, COVID-19, and respiratory disorders. Challenges include overlapping anatomical structures, complex pixel-level feature fusion, and the intricate morphology of lung tissues, all of which impede segmentation accuracy. To address these issues, this paper introduces GEANet, a novel framework for lung segmentation in CT images. GEANet utilizes an encoder-decoder architecture enriched with radiomics-derived features and incorporates Graph Neural Network (GNN) modules to effectively capture the complex heterogeneity of tumors. A boundary refinement module is also incorporated to improve image reconstruction and boundary delineation accuracy. The framework utilizes a hybrid loss function combining Focal Loss and IoU Loss to address class imbalance and enhance segmentation robustness. Experimental results on benchmark datasets demonstrate that GEANet outperforms eight state-of-the-art methods across various metrics, achieving superior segmentation accuracy while maintaining computational efficiency.
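
A hedged sketch of the hybrid objective named above (binary Focal + soft IoU) follows; the weights `alpha`, `gamma`, and `w_iou` are illustrative defaults, not the paper's values.

```python
# Sketch: Focal Loss + soft IoU loss for binary segmentation.
# logits, target: (B, 1, H, W); target is a float mask in {0, 1}.
import torch
import torch.nn.functional as F

def hybrid_loss(logits, target, alpha=0.25, gamma=2.0, w_iou=1.0):
    prob = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = prob * target + (1 - prob) * (1 - target)
    a_t = alpha * target + (1 - alpha) * (1 - target)
    focal = (a_t * (1 - p_t) ** gamma * bce).mean()        # focal term
    inter = (prob * target).sum(dim=(1, 2, 3))
    union = (prob + target - prob * target).sum(dim=(1, 2, 3))
    iou = 1.0 - ((inter + 1e-6) / (union + 1e-6)).mean()   # soft IoU term
    return focal + w_iou * iou
```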

Automated Structured Radiology Report Generation with Rich Clinical Context

Seongjae Kang, Dong Bok Lee, Juho Jung, Dongseop Kim, Won Hwa Kim, Sunghoon Joo

arXiv preprint · Oct 1, 2025
Automated structured radiology report generation (SRRG) from chest X-ray images offers significant potential to reduce the workload of radiologists by generating reports in structured formats that ensure clarity, consistency, and adherence to clinical reporting standards. While radiologists effectively utilize available clinical contexts in their diagnostic reasoning, existing SRRG systems overlook these essential elements. This fundamental gap leads to critical problems including temporal hallucinations when referencing non-existent clinical contexts. To address these limitations, we propose contextualized SRRG (C-SRRG) that comprehensively incorporates rich clinical context for SRRG. We curate the C-SRRG dataset by integrating comprehensive clinical context encompassing 1) multi-view X-ray images, 2) clinical indication, 3) imaging techniques, and 4) prior studies with corresponding comparisons based on patient histories. Through extensive benchmarking with state-of-the-art multimodal large language models, we demonstrate that incorporating clinical context with the proposed C-SRRG significantly improves report generation quality. We publicly release the dataset, code, and checkpoints to facilitate future research for clinically-aligned automated RRG at https://github.com/vuno/contextualized-srrg.
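
To make the four context sources concrete, here is an illustrative sketch of bundling them into a model prompt. The field names and prompt wording are hypothetical, not the released C-SRRG schema.

```python
# Sketch: assembling per-study clinical context for contextualized SRRG.
from dataclasses import dataclass

@dataclass
class StudyContext:
    image_paths: list[str]              # 1) multi-view X-ray images
    indication: str                     # 2) clinical indication
    technique: str                      # 3) imaging technique
    prior_report: str | None = None     # 4) prior study for comparison

def build_prompt(ctx: StudyContext) -> str:
    parts = [f"Indication: {ctx.indication}", f"Technique: {ctx.technique}"]
    if ctx.prior_report:
        parts.append(f"Prior study for comparison: {ctx.prior_report}")
    parts.append("Generate a structured radiology report (Findings, Impression).")
    return "\n".join(parts)
```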

Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation

Longzhen Yang, Zhangkai Ni, Ying Wen, Yihang Liu, Lianghua He, Heng Tao Shen

arXiv preprint · Sep 30, 2025
Vision-grounded medical report generation aims to produce clinically accurate descriptions of medical images, anchored in explicit visual evidence to improve interpretability and facilitate integration into clinical workflows. However, existing methods often rely on separately trained detection modules that require extensive expert annotations, introducing high labeling costs and limiting generalizability due to pathology distribution bias across datasets. To address these challenges, we propose Self-Supervised Anatomical Consistency Learning (SS-ACL) -- a novel and annotation-free framework that aligns generated reports with corresponding anatomical regions using simple textual prompts. SS-ACL constructs a hierarchical anatomical graph inspired by the invariant top-down inclusion structure of human anatomy, organizing entities by spatial location. It recursively reconstructs fine-grained anatomical regions to enforce intra-sample spatial alignment, inherently guiding attention maps toward visually relevant areas prompted by text. To further enhance inter-sample semantic alignment for abnormality recognition, SS-ACL introduces a region-level contrastive learning based on anatomical consistency. These aligned embeddings serve as priors for report generation, enabling attention maps to provide interpretable visual evidence. Extensive experiments demonstrate that SS-ACL, without relying on expert annotations, (i) generates accurate and visually grounded reports -- outperforming state-of-the-art methods by 10% in lexical accuracy and 25% in clinical efficacy, and (ii) achieves competitive performance on various downstream visual tasks, surpassing current leading visual foundation models by 8% in zero-shot visual grounding.
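
A region-level contrastive term in the spirit of SS-ACL can be sketched as standard InfoNCE over matched anatomical regions: pull embeddings of the same region together across views or samples, push other regions apart. This is a generic stand-in; the paper's anatomical-consistency variant may differ.

```python
# Sketch: InfoNCE over N matched anatomical-region embeddings.
# z_a, z_b: (N, D) tensors; row i of each is the same region.
import torch
import torch.nn.functional as F

def region_info_nce(z_a, z_b, temperature=0.07):
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature           # (N, N) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)        # diagonal pairs are positives
```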

Identification of structural predictors of lung function improvement in adults with cystic fibrosis treated with elexacaftor-tezacaftor-ivacaftor using deep learning.

Chassagnon G, Marini R, Ong V, Da Silva J, Habip Gatenyo D, Honore I, Kanaan R, Carlier N, Fesenbeckh J, Burnet E, Revel MP, Martin C, Burgel PR

PubMed · Sep 30, 2025
The purpose of this study was to evaluate the relationship between structural abnormalities on CT and lung function prior to and after initiation of elexacaftor-tezacaftor-ivacaftor (ETI) in adults with cystic fibrosis (CF) using a deep learning model. A deep learning quantification model was developed using 100 chest computed tomography (CT) examinations of patients with CF and 150 chest CT examinations of patients with various other bronchial diseases to quantify seven types of abnormalities. This model was then applied to an independent dataset of CT examinations of 218 adults with CF who were treated with ETI. The relationship between structural abnormalities and percent predicted forced expiratory volume in one second (ppFEV1) was examined using general linear regression models. The deep learning model performed as well as radiologists for the quantification of the seven types of abnormalities. Chest CT examinations obtained prior to and one year after the initiation of ETI were analyzed. The independent structural predictors of ppFEV1 prior to ETI were bronchial wall thickening (P = 0.011), mucus plugging (P < 0.001), consolidation/atelectasis (P < 0.001), and mosaic perfusion (P < 0.001). An increase in ppFEV1 after initiation of ETI independently correlated with a decrease in bronchial wall thickening (-49%; P = 0.004), mucus plugging (-92%; P < 0.001), centrilobular nodules (-78%; P = 0.009), and mosaic perfusion (-14%; P < 0.001). Younger age (P < 0.001), greater mucus plugging extent (P = 0.016), and centrilobular nodules (P < 0.001) prior to ETI initiation were independent predictors of ppFEV1 improvement. A deep learning model can quantify CT lung abnormalities in adults with CF. Lung function impairment in adults with CF is associated with muco-inflammatory lesions on CT, which are largely reversible with ETI, and with mosaic perfusion, which appears less reversible and is presumably related to irreversible damage. Predictors of lung function improvement are a younger age and a greater extent of muco-inflammatory lesions obstructing the airways.
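
The regression step described above can be sketched as an ordinary least-squares linear model relating ppFEV1 to the quantified abnormality extents. Column names and values below are synthetic stand-ins for a subset of the seven abnormalities, not the study data.

```python
# Sketch: general linear model of ppFEV1 on CT abnormality extents (synthetic).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "ppfev1": [45, 62, 71, 38, 55, 80, 49, 66],        # % predicted FEV1
    "wall_thickening": [12, 6, 4, 15, 9, 2, 11, 5],    # % lung extent
    "mucus_plugging": [8, 3, 1, 11, 5, 0, 7, 2],
    "mosaic_perfusion": [20, 10, 7, 25, 14, 4, 18, 9],
})
model = smf.ols("ppfev1 ~ wall_thickening + mucus_plugging + mosaic_perfusion",
                data=df).fit()
print(model.summary())   # coefficients and per-predictor P values
```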