Page 46 of 66652 results

Multimodal Large Language Models for Medical Report Generation via Customized Prompt Tuning

Chunlei Li, Jingyang Hou, Yilei Shi, Jingliang Hu, Xiao Xiang Zhu, Lichao Mou

arXiv preprint, Jun 18 2025
Medical report generation from imaging data remains a challenging task in clinical practice. While large language models (LLMs) show great promise in addressing this challenge, their effective integration with medical imaging data still deserves in-depth exploration. In this paper, we present MRG-LLM, a novel multimodal large language model (MLLM) that combines a frozen LLM with a learnable visual encoder and introduces a dynamic prompt customization mechanism. Our key innovation lies in generating instance-specific prompts tailored to individual medical images through conditional affine transformations derived from visual features. We propose two implementations: prompt-wise and promptbook-wise customization, enabling precise and targeted report generation. Extensive experiments on IU X-ray and MIMIC-CXR datasets demonstrate that MRG-LLM achieves state-of-the-art performance in medical report generation. Our code will be made publicly available.
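The instance-specific prompt customization described here (conditional affine transforms of prompt tokens derived from visual features) can be sketched in a few lines of NumPy. The dimensions, projection matrices, and function names below are illustrative stand-ins, not the authors' released code:

```python
import numpy as np

rng = np.random.default_rng(0)

d_prompt, n_tokens, d_visual = 64, 8, 128  # hypothetical sizes

# Shared (learnable) base prompt tokens.
base_prompt = rng.normal(size=(n_tokens, d_prompt))

# Projections mapping visual features to affine parameters (near-identity init).
W_gamma = rng.normal(scale=0.01, size=(d_visual, d_prompt))
W_beta = rng.normal(scale=0.01, size=(d_visual, d_prompt))

def customize_prompt(visual_feat):
    """Instance-specific prompt via a conditional affine transform."""
    gamma = 1.0 + visual_feat @ W_gamma   # per-dimension scale
    beta = visual_feat @ W_beta           # per-dimension shift
    return gamma * base_prompt + beta     # broadcast over prompt tokens

v = rng.normal(size=(d_visual,))
prompt = customize_prompt(v)
print(prompt.shape)  # (8, 64)
```

With a zero visual feature the transform reduces to the shared base prompt, which is the usual sanity check for this kind of conditioning.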

Development and interpretation of machine learning-based prognostic models for predicting high-risk prognostic pathological components in pulmonary nodules: integrating clinical features, serum tumor marker and imaging features.

Wang D, Qiu J, Li R, Tian H

PubMed paper, Jun 17 2025
With improvements in imaging, the detection rate of pulmonary nodules (PNs) has further increased, but identifying their high-risk prognostic pathological components (HRPPC) remains a major challenge. In this study, we aimed to build a multi-parameter machine learning predictive model to improve the discrimination accuracy of HRPPC. This study included 816 patients with pulmonary nodules ≤ 3 cm who had a definitive pathological diagnosis and underwent pulmonary resection. High-resolution chest CT images and clinicopathological characteristics were collected from the patients. Lasso regression was used to identify key features, and a machine learning prediction model was constructed based on the screened key features. The discrimination ability of the prediction model was evaluated using receiver operating characteristic (ROC) curves and confusion matrices. Model calibration was evaluated using calibration curves. Decision curve analysis (DCA) was used to evaluate the value of the model for clinical applications. SHAP values were used to interpret the predictive model. A total of 816 patients were included in this study, of whom 112 (13.79%) had HRPPC in their pulmonary nodules. By selecting key variables through Lasso recursive feature elimination, we identified 13 key relevant features. The XGB model performed best, with an area under the ROC curve (AUC) of 0.930 (95% CI: 0.906-0.954) in the training cohort and 0.835 (95% CI: 0.774-0.895) in the validation cohort, indicating excellent predictive performance. In addition, the calibration curves of the XGB model showed good calibration in both cohorts. DCA demonstrated that the predictive model had a positive net benefit in general clinical decision-making. The SHAP values identified the top three predictors of HRPPC in PNs as CT value, nodule long diameter, and PRO-GRP. Our prediction model for identifying HRPPC in PNs has excellent discrimination, calibration, and clinical utility. Thoracic surgeons could make relatively reliable predictions of HRPPC in PNs even when invasive testing is not feasible.
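The screening-then-boosting pipeline described in this abstract can be sketched with scikit-learn on synthetic data. An L1-penalized logistic selector stands in for the Lasso feature-screening step and `GradientBoostingClassifier` for XGBoost; all sizes and numbers are illustrative, not the study's data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for clinical + serum marker + imaging features.
X, y = make_classification(n_samples=800, n_features=30, n_informative=13,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

# L1-penalized selector approximates Lasso-based feature screening.
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1))
X_tr_s = selector.fit_transform(X_tr, y_tr)
X_te_s = selector.transform(X_te)

# Boosted trees on the screened features (XGBoost stand-in).
clf = GradientBoostingClassifier(random_state=0).fit(X_tr_s, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te_s)[:, 1])
print(round(auc, 3))
```

In practice the selected feature indices, not just the reduced matrix, would be reported, since interpretability of the 13 retained features is part of the point.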

SCISSOR: Mitigating Semantic Bias through Cluster-Aware Siamese Networks for Robust Classification

Shuo Yang, Bardh Prenkaj, Gjergji Kasneci

arXiv preprint, Jun 17 2025
Shortcut learning undermines model generalization to out-of-distribution data. While the literature attributes shortcuts to biases in superficial features, we show that imbalances in the semantic distribution of sample embeddings induce spurious semantic correlations, compromising model robustness. To address this issue, we propose SCISSOR (Semantic Cluster Intervention for Suppressing ShORtcut), a Siamese network-based debiasing approach that remaps the semantic space by discouraging latent clusters exploited as shortcuts. Unlike prior data-debiasing approaches, SCISSOR eliminates the need for data augmentation and rewriting. We evaluate SCISSOR on 6 models across 4 benchmarks: Chest-XRay and Not-MNIST in computer vision, and GYAFC and Yelp in NLP tasks. Compared to several baselines, SCISSOR reports +5.3 absolute points in F1 score on GYAFC, +7.3 on Yelp, +7.7 on Chest-XRay, and +1 on Not-MNIST. SCISSOR is also highly advantageous for lightweight models with ~9.5% improvement on F1 for ViT on computer vision datasets and ~11.9% for BERT on NLP. Our study redefines the landscape of model generalization by addressing overlooked semantic biases, establishing SCISSOR as a foundational framework for mitigating shortcut learning and fostering more robust, bias-resistant AI systems.
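The paper's premise, that latent clusters in embedding space can spuriously align with labels, can be checked with a simple cluster-alignment probe. This is not SCISSOR itself, only a hypothetical diagnostic on synthetic embeddings:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic embeddings with two latent clusters whose membership is
# spuriously correlated with the labels (a semantic shortcut).
n, d = 400, 16
cluster = rng.integers(0, 2, size=n)
emb = rng.normal(size=(n, d)) + 3.0 * cluster[:, None]
labels = (cluster ^ (rng.random(n) < 0.1)).astype(int)  # ~90% aligned

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(emb)

# Alignment between discovered clusters and labels, up to permutation.
align = max(np.mean(km.labels_ == labels), np.mean(km.labels_ != labels))
print(round(align, 2))  # high alignment flags a potential shortcut
```

A debiasing intervention like SCISSOR would then discourage the model from exploiting such clusters; this probe only detects them.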

Risk factors and prognostic indicators for progressive fibrosing interstitial lung disease: a deep learning-based CT quantification approach.

Lee K, Lee JH, Koh SY, Park H, Goo JM

PubMed paper, Jun 17 2025
To investigate the value of deep learning-based quantitative CT (QCT) in predicting progressive fibrosing interstitial lung disease (PF-ILD) and assessing prognosis. This single-center retrospective study included ILD patients with CT examinations between January 2015 and June 2021. Each ILD finding (ground-glass opacity (GGO), reticular opacity (RO), honeycombing) and fibrosis (sum of RO and honeycombing) was quantified from baseline and follow-up CTs. Logistic regression was performed to identify predictors of PF-ILD, defined as radiologic progression along with forced vital capacity (FVC) decline ≥ 5% predicted. Cox proportional hazard regression was used to assess mortality. The added value of incorporating QCT into FVC was evaluated using the C-index. Among 465 ILD patients (median age [IQR], 65 [58-71] years; 238 men), 148 had PF-ILD. After adjusting for clinico-radiological variables, baseline RO (OR: 1.096, 95% CI: 1.042, 1.152, p < 0.001) and fibrosis extent (OR: 1.035, 95% CI: 1.004, 1.067, p = 0.025) were PF-ILD predictors. Baseline RO (HR: 1.063, 95% CI: 1.013, 1.115, p = 0.013), honeycombing (HR: 1.074, 95% CI: 1.034, 1.116, p < 0.001), and fibrosis extent (HR: 1.067, 95% CI: 1.043, 1.093, p < 0.001) predicted poor prognosis. The Cox models combining baseline percent predicted FVC with QCT (each ILD finding, C-index: 0.714, 95% CI: 0.660, 0.764; fibrosis, C-index: 0.703, 95% CI: 0.649, 0.752; both p-values < 0.001) outperformed the model without QCT (C-index: 0.545, 95% CI: 0.500, 0.599). Deep learning-based QCT for ILD findings is useful for predicting PF-ILD and its prognosis. Question: Does deep learning-based CT quantification of interstitial lung disease (ILD) findings have value in predicting progressive fibrosing ILD (PF-ILD) and improving prognostication? Findings: Deep learning-based CT quantification of baseline reticular opacity and fibrosis predicted the development of PF-ILD; CT quantification also demonstrated value in predicting all-cause mortality. Clinical relevance: Deep learning-based CT quantification of ILD findings is useful for predicting PF-ILD and its prognosis. Identifying patients at high risk of PF-ILD through CT quantification enables closer monitoring and earlier treatment initiation, which may lead to improved clinical outcomes.
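Harrell's C-index, the discrimination metric the study uses to compare its Cox models, can be computed directly from survival times, event indicators, and risk scores. A pure-Python sketch (not the authors' code):

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: among comparable pairs (one subject has an
    observed event before the other's time), the fraction where the
    earlier-failing subject also has the higher risk score."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Pair is comparable if i has an event strictly before j's time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties get half credit
    return concordant / comparable

times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]       # 1 = event observed, 0 = censored
risk = [0.9, 0.7, 0.5, 0.3, 0.1]
print(concordance_index(times, events, risk))  # 1.0, perfectly ranked
```

A C-index of 0.5 corresponds to random ranking, which is why the 0.545 of the FVC-only model above is close to uninformative while ~0.71 with QCT is a meaningful gain.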

Enhancing Ultrasound-Based Diagnosis of Unilateral Diaphragmatic Paralysis with a Visual Transformer-Based Model.

Kalkanis A, Bakalis D, Testelmans D, Buyse B, Simos YV, Tsamis KI, Manis G

PubMed paper, Jun 17 2025
This paper presents a novel methodology that combines a pre-trained Visual Transformer-Based Deep Model (ViT) with a custom denoising image filter for the diagnosis of Unilateral Diaphragmatic Paralysis (UDP) using Ultrasound (US) images. The ViT is employed to extract complex features from US images of 17 volunteers, capturing intricate patterns and details that are critical for accurate diagnosis. The extracted features are then fed into an ensemble learning model to determine the presence of UDP. The proposed framework achieves an average accuracy of 93.8% on a stratified 5-fold cross-validation, surpassing relevant state-of-the-art (SOTA) image classifiers. This high level of performance underscores the robustness and effectiveness of the framework, highlighting its potential as a prominent diagnostic tool in medical imaging.
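The evaluation protocol described here, stratified 5-fold cross-validation of a classifier trained on features extracted by a frozen backbone, looks roughly like this in scikit-learn. The feature matrix and classifier below are synthetic stand-ins for the ViT features and the paper's ensemble model:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in for feature vectors a frozen ViT would extract from US images.
X, y = make_classification(n_samples=170, n_features=64, random_state=0)

# Stratified folds preserve the class ratio in every split, which matters
# for small, imbalanced medical datasets.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)
print(round(scores.mean(), 3))
```

Reporting the mean (and spread) across folds, as the paper's 93.8% average accuracy does, is more robust than a single train/test split at this sample size.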

Predicting overall survival of NSCLC patients with clinical, radiomics and deep learning features

Kanakarajan, H., Zhou, J., Baene, W. D., Sitskoorn, M.

medRxiv preprint, Jun 16 2025
Background and purpose: Accurate estimation of Overall Survival (OS) in Non-Small Cell Lung Cancer (NSCLC) patients provides critical insights for treatment planning. While previous studies showed that radiomics and Deep Learning (DL) features increased prediction accuracy, this study aimed to examine whether a model that combines the radiomics and DL features with the clinical and dosimetric features outperformed other models. Materials and methods: We collected pre-treatment lung CT scans and clinical data for 225 NSCLC patients from the Maastro Clinic: 180 for training and 45 for testing. Radiomics features were extracted using the Python radiomics feature extractor, and DL features were obtained using a 3D ResNet model. An ensemble model comprising XGB and NN classifiers was developed using: (1) clinical features only; (2) clinical and radiomics features; (3) clinical and DL features; and (4) clinical, radiomics, and DL features. The performance metrics were evaluated for the test and K-fold cross-validation data sets. Results: The prediction model utilizing only clinical variables provided an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.64 and a test accuracy of 77.55%. The best performance came from combining clinical, radiomics, and DL features (AUC: 0.84, accuracy: 85.71%). The prediction improvement of this model was statistically significant compared to models trained with clinical features alone or with a combination of clinical and radiomics features. Conclusion: Integrating radiomics and DL features with clinical characteristics improved the prediction of OS after radiotherapy for NSCLC patients. The increased accuracy of our integrated model enables personalized, risk-based treatment planning, guiding clinicians toward more effective interventions, improved patient outcomes and enhanced quality of life.
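An ensemble of a boosted-tree and a neural-network classifier over concatenated feature sets, as described above, can be sketched with scikit-learn's soft-voting wrapper. `GradientBoostingClassifier` stands in for XGB here, and the data and sizes are illustrative, not the Maastro cohort:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in for concatenated clinical + radiomics + DL feature vectors.
X, y = make_classification(n_samples=225, n_features=40, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(random_state=0)),   # XGB stand-in
        ("nn", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                             random_state=0)),
    ],
    voting="soft",  # average the two models' predicted probabilities
)
ensemble.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, ensemble.predict_proba(X_te)[:, 1])
print(round(auc, 3))
```

Soft voting requires both members to expose `predict_proba`, which is why probability-calibrated learners are the natural choice for this kind of ensemble.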

Beyond the First Read: AI-Assisted Perceptual Error Detection in Chest Radiography Accounting for Interobserver Variability

Adhrith Vutukuri, Akash Awasthi, David Yang, Carol C. Wu, Hien Van Nguyen

arXiv preprint, Jun 16 2025
Chest radiography is widely used in diagnostic imaging. However, perceptual errors, especially overlooked but visible abnormalities, remain common and clinically significant. Current workflows and AI systems provide limited support for detecting such errors after interpretation and often lack meaningful human-AI collaboration. We introduce RADAR (Radiologist-AI Diagnostic Assistance and Review), a post-interpretation companion system. RADAR ingests finalized radiologist annotations and CXR images, then performs regional-level analysis to detect and refer potentially missed abnormal regions. The system supports a "second-look" workflow and offers suggested regions of interest (ROIs) rather than fixed labels to accommodate inter-observer variation. We evaluated RADAR on a simulated perceptual-error dataset derived from de-identified CXR cases, using F1 score and Intersection over Union (IoU) as primary metrics. RADAR achieved a recall of 0.78, precision of 0.44, and an F1 score of 0.56 in detecting missed abnormalities in the simulated perceptual-error dataset. Although precision is moderate, this reduces over-reliance on AI by encouraging radiologist oversight in human-AI collaboration. The median IoU was 0.78, with more than 90% of referrals exceeding 0.5 IoU, indicating accurate regional localization. RADAR effectively complements radiologist judgment, providing valuable post-read support for perceptual-error detection in CXR interpretation. Its flexible ROI suggestions and non-intrusive integration position it as a promising tool in real-world radiology workflows. To facilitate reproducibility and further evaluation, we release a fully open-source web implementation alongside a simulated error dataset. All code, data, demonstration videos, and the application are publicly available at https://github.com/avutukuri01/RADAR.
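The IoU criterion used to score RADAR's ROI referrals is straightforward to compute for axis-aligned boxes; a minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Half-overlapping squares: intersection 25, union 175.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25/175 ≈ 0.143
```

The paper's 0.5 IoU threshold for a "good" referral is the same convention used in object-detection benchmarks such as PASCAL VOC.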

Precision Medicine and Machine Learning to predict critical disease and death due to Coronavirus disease 2019 (COVID-19).

Júnior WLDT, Danelli T, Tano ZN, Cassela PLCS, Trigo GL, Cardoso KM, Loni LP, Ahrens TM, Espinosa BR, Fernandes AJ, Almeida ERD, Lozovoy MAB, Reiche EMV, Maes M, Simão ANC

PubMed paper, Jun 16 2025
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes Coronavirus Disease 2019 (COVID-19) and induces activation of inflammatory pathways, including the inflammasome. The aim was to construct Machine Learning (ML) models to predict critical disease and death in patients with COVID-19. A total of 528 individuals with SARS-CoV-2 infection were included, comprising 308 with critical and 220 with non-critical COVID-19. The ML models included imaging, demographic, and inflammatory biomarker data as well as NLRP3 (rs10754558 and rs10157379) and IL18 (rs360717 and rs187238) inflammasome variants. Compared with those with non-critical disease, individuals with critical COVID-19 were older and had a higher male/female ratio, body mass index (BMI), rates of type 2 diabetes mellitus (T2DM) and hypertension, inflammatory biomarker levels, need for orotracheal intubation, intensive care unit admission, incidence of death, and sickness symptom complex (SSC) scores, as well as lower peripheral oxygen saturation (SpO<sub>2</sub>). We found that 49.5 % of the variance in the severity of critical COVID-19 was explained by SpO<sub>2</sub> and SSC (negatively associated) and by chest computed tomography alterations (CCTA), inflammatory biomarkers, severe acute respiratory syndrome (SARS), BMI, T2DM, and age (positively associated). In this model, the NLRP3/IL18 variants showed indirect effects on critical COVID-19 that were mediated by inflammatory biomarkers, SARS, and SSC. Neural network models predicted critical disease and death due to COVID-19 with areas under the receiver operating characteristic curve of 0.930 and 0.927, respectively. These ML methods increase the accuracy of predicting severity, critical illness, and mortality caused by COVID-19 and show that the genetic variants contribute to the predictive power of the ML models.

Finding Optimal Kernel Size and Dimension in Convolutional Neural Networks: An Architecture Optimization Approach

Shreyas Rajeev, B Sathish Babu

arXiv preprint, Jun 16 2025
Kernel size selection in Convolutional Neural Networks (CNNs) is a critical but often overlooked design decision that affects receptive field, feature extraction, computational cost, and model accuracy. This paper proposes the Best Kernel Size Estimation Function (BKSEF), a mathematically grounded and empirically validated framework for optimal, layer-wise kernel size determination. BKSEF balances information gain, computational efficiency, and accuracy improvements by integrating principles from information theory, signal processing, and learning theory. Extensive experiments on CIFAR-10, CIFAR-100, ImageNet-lite, ChestX-ray14, and GTSRB datasets demonstrate that BKSEF-guided architectures achieve up to 3.1 percent accuracy improvement and 42.8 percent reduction in FLOPs compared to traditional models using uniform 3x3 kernels. Two real-world case studies further validate the approach: one for medical image classification in a cloud-based setup, and another for traffic sign recognition on edge devices. The former achieved enhanced interpretability and accuracy, while the latter reduced latency and model size significantly, with minimal accuracy trade-off. These results show that kernel size can be an active, optimizable parameter rather than a fixed heuristic. BKSEF provides practical heuristics and theoretical support for researchers and developers seeking efficient and application-aware CNN designs. It is suitable for integration into neural architecture search pipelines and real-time systems, offering a new perspective on CNN optimization.
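The compute side of the kernel-size trade-off described above is easy to quantify: with "same" padding, a conv layer's multiply-accumulate count grows with k². A small sketch (layer sizes are arbitrary, and this counts MACs only, not the full BKSEF objective):

```python
def conv2d_macs(h, w, c_in, c_out, k, stride=1):
    """Multiply-accumulate count for one conv layer with 'same' padding:
    every output pixel needs k*k*c_in multiplies per output channel."""
    out_h, out_w = h // stride, w // stride
    return out_h * out_w * c_in * c_out * k * k

# Cost for a 32x32, 64->64-channel layer at several kernel sizes.
for k in (1, 3, 5, 7):
    print(k, conv2d_macs(32, 32, 64, 64, k))
```

Going from 3x3 to 5x5 multiplies the cost by 25/9 ≈ 2.8x, which is why a function like BKSEF that weighs this against the information gained per layer can cut FLOPs substantially.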
