Page 54 of 1331329 results

From BERT to generative AI - Comparing encoder-only vs. large language models in a cohort of lung cancer patients for named entity recognition in unstructured medical reports.

Arzideh K, Schäfer H, Allende-Cid H, Baldini G, Hilser T, Idrissi-Yaghir A, Laue K, Chakraborty N, Doll N, Antweiler D, Klug K, Beck N, Giesselbach S, Friedrich CM, Nensa F, Schuler M, Hosch R

pubmed · Jun 23 2025
Extracting clinical entities from unstructured medical documents is critical for improving clinical decision support and documentation workflows. This study examines the performance of various encoder and decoder models trained for Named Entity Recognition (NER) of clinical parameters in pathology and radiology reports, highlighting the applicability of Large Language Models (LLMs) for this task. Three NER methods were evaluated: (1) flat NER using transformer-based models, (2) nested NER with a multi-task learning setup, and (3) instruction-based NER utilizing LLMs. A dataset of 2013 pathology reports and 413 radiology reports, annotated by medical students, was used for training and testing. The performance of encoder-based NER models (flat and nested) was superior to that of LLM-based approaches. The best-performing flat NER models achieved F1-scores of 0.87-0.88 on pathology reports and up to 0.78 on radiology reports, while nested NER models performed slightly lower. In contrast, multiple LLMs, despite achieving high precision, yielded significantly lower F1-scores (ranging from 0.18 to 0.30) due to poor recall. A contributing factor appears to be that these LLMs produce fewer but more accurate entities, suggesting they become overly conservative when generating outputs. LLMs in their current form are unsuitable for comprehensive entity extraction tasks in clinical domains, particularly when faced with a high number of entity types per document, though instructing them to return more entities in subsequent refinements may improve recall. Additionally, their computational overhead does not provide proportional performance gains. Encoder-based NER models, particularly those pre-trained on biomedical data, remain the preferred choice for extracting information from unstructured medical documents.
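The precision-recall tradeoff this abstract describes can be made concrete with the F1 formula. The numbers below are hypothetical, chosen only to show how a conservative extractor (high precision, low recall) lands in the reported 0.18-0.30 F1 range:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A conservative LLM-style extractor: most emitted entities are correct
# (P = 0.85), but it emits far too few of them (R = 0.12).
p, r = 0.85, 0.12
print(round(f1(p, r), 2))  # low F1 despite high precision
```

Because F1 is a harmonic mean, the weaker of the two components dominates: no amount of precision can compensate for missing most of the gold entities.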

MOSCARD -- Causal Reasoning and De-confounding for Multimodal Opportunistic Screening of Cardiovascular Adverse Events

Jialu Pi, Juan Maria Farina, Rimita Lahiri, Jiwoong Jeong, Archana Gurudu, Hyung-Bok Park, Chieh-Ju Chao, Chadi Ayoub, Reza Arsanjani, Imon Banerjee

arxiv preprint · Jun 23 2025
Major Adverse Cardiovascular Events (MACE) remain the leading cause of mortality globally, as reported in the Global Disease Burden Study 2021. Opportunistic screening leverages data collected during routine health check-ups, and multimodal data can play a key role in identifying at-risk individuals. Chest X-rays (CXR) provide insights into chronic conditions contributing to MACE, while 12-lead electrocardiograms (ECG) directly assess cardiac electrical activity and structural abnormalities. Integrating CXR and ECG could offer a more comprehensive risk assessment than conventional models, which rely on clinical scores, computed tomography (CT) measurements, or biomarkers and may be limited by sampling bias and single-modality constraints. We propose a novel predictive modeling framework, MOSCARD: multimodal causal reasoning with co-attention to align two distinct modalities while simultaneously mitigating bias and confounders in opportunistic risk estimation. The primary technical contributions are (i) multimodal alignment of CXR with ECG guidance; (ii) integration of causal reasoning; and (iii) a dual back-propagation graph for de-confounding. Evaluated on internal data, distribution-shifted emergency department (ED) data, and the external MIMIC dataset, our model outperformed single-modality and state-of-the-art foundation models (AUC 0.75, 0.83, and 0.71, respectively). The proposed cost-effective opportunistic screening enables early intervention, improving patient outcomes and reducing disparities.

Comparative Analysis of Multimodal Large Language Models GPT-4o and o1 vs Clinicians in Clinical Case Challenge Questions

Jung, J., Kim, H., Bae, S., Park, J. Y.

medrxiv preprint · Jun 23 2025
Background: Generative Pre-trained Transformer 4 (GPT-4) has demonstrated strong performance in standardized medical examinations but has limitations in real-world clinical settings. The newly released multimodal GPT-4o model, which integrates text and image inputs to enhance diagnostic capabilities, and the multimodal o1 model, which incorporates advanced reasoning, may address these limitations. Objective: This study aimed to compare the performance of GPT-4o and o1 against clinicians in real-world clinical case challenges. Methods: This retrospective, cross-sectional study used Medscape case challenge questions from May 2011 to June 2024 (n = 1,426). Each case included text and images of patient history, physical examination findings, diagnostic test results, and imaging studies. Clinicians were required to choose one answer from among multiple options, with the most frequent response defined as the clinicians' decision. Model decisions were generated using GPT models (3.5 Turbo, 4 Turbo, 4 Omni, and o1) to interpret the text and images, followed by a process to provide a formatted answer. We compared the performance of the clinicians and GPT models using mixed-effects logistic regression analysis. Results: Of the 1,426 questions, clinicians achieved an overall accuracy of 85.0%, whereas GPT-4o and o1 demonstrated higher accuracies of 88.4% and 94.3% (mean difference 3.4%; P = .005 and mean difference 9.3%; P < .001), respectively. In the multimodal performance analysis, which included cases involving images (n = 917), GPT-4o achieved an accuracy of 88.3%, and o1 achieved 93.9%, both significantly outperforming clinicians (mean difference 4.2%; P = .005 and mean difference 9.8%; P < .001). o1 showed the highest accuracy across all question categories, achieving 92.6% in diagnosis (mean difference 14.5%; P < .001), 97.0% in disease characteristics (mean difference 7.2%; P < .001), 92.6% in examination (mean difference 7.3%; P = .002), and 94.8% in treatment (mean difference 4.3%; P = .005), consistently outperforming clinicians. In terms of medical specialty, o1 achieved 93.6% accuracy in internal medicine (mean difference 10.3%; P < .001), 96.6% in major surgery (mean difference 9.2%; P = .030), 97.3% in psychiatry (mean difference 10.6%; P = .030), and 95.4% in minor specialties (mean difference 10.0%; P < .001), significantly surpassing clinicians. Across five trials, GPT-4o and o1 provided the correct answer 5/5 times in 86.2% and 90.7% of the cases, respectively. Conclusions: The GPT-4o and o1 models achieved higher accuracy than clinicians in clinical case challenge questions, particularly in disease diagnosis. GPT-4o and o1 could serve as valuable tools to assist healthcare professionals in clinical settings.

Stacking Ensemble Learning-based Models Enabling Accurate Diagnosis of Cardiac Amyloidosis using SPECT/CT:an International and Multicentre Study

Mo, Q., Cui, J., Jia, S., Zhang, Y., Xiao, Y., Liu, C., Zhou, C., Spielvogel, C. P., Calabretta, R., Zhou, W., Cao, K., Hacker, M., Li, X., Zhao, M.

medrxiv preprint · Jun 23 2025
Purpose: Cardiac amyloidosis (CA), a life-threatening infiltrative cardiomyopathy, can be non-invasively diagnosed using [99mTc]Tc-bisphosphonate SPECT/CT. However, subjective visual interpretation risks diagnostic inaccuracies. We developed and validated a machine learning (ML) framework leveraging SPECT/CT radiomics to automate CA detection. Methods: This retrospective multicenter study analyzed 290 patients with suspected CA who underwent [99mTc]Tc-PYP or [99mTc]Tc-DPD SPECT/CT. Radiomic features were extracted from co-registered SPECT and CT images, harmonized via intra-class correlation and Pearson correlation filtering, and optimized through LASSO regression. A stacking ensemble model incorporating support vector machine (SVM), random forest (RF), gradient boosting decision tree (GBDT), and adaptive boosting (AdaBoost) classifiers was constructed. The model was validated using an internal validation set (n = 54) and two external test sets (n = 54 and n = 58). Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), calibration, and decision curve analysis (DCA). Feature importance was interpreted using SHapley Additive exPlanations (SHAP) values. Results: Of 290 patients, 117 (40.3%) had CA. The stacking radiomics model attained AUCs of 0.871, 0.824, and 0.839 in the validation, test 1, and test 2 cohorts, respectively, significantly outperforming the clinical model (AUC 0.546 in the validation set, P < 0.05). DCA demonstrated superior net benefit over the clinical model across relevant thresholds, and SHAP analysis highlighted wavelet-transformed first-order and texture features as key predictors. Conclusion: A stacking ML model with SPECT/CT radiomics improves CA diagnosis, showing strong generalizability across varied imaging protocols and populations and highlighting its potential as a decision-support tool.
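For reference, the AUC figures quoted in this abstract estimate the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. A minimal stdlib sketch of that rank-based (Mann-Whitney) estimate, using hypothetical predicted probabilities rather than study data:

```python
def auc(scores_pos, scores_neg):
    """Mann-Whitney U estimate of the ROC AUC; ties count as 0.5."""
    wins = sum(
        1.0 if sp > sn else 0.5 if sp == sn else 0.0
        for sp in scores_pos
        for sn in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model outputs for CA-positive and CA-negative cases.
pos = [0.9, 0.8, 0.6, 0.55]
neg = [0.7, 0.4, 0.3, 0.2]
print(auc(pos, neg))  # → 0.875
```

An AUC of 0.5 (close to the clinical model's 0.546) means the scores barely rank positives above negatives at all, which is why the gap to ~0.84 is clinically meaningful.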

From "time is brain" to "time is collaterals": updates on the role of cerebral collateral circulation in stroke.

Marilena M, Romana PF, Guido A, Gianluca R, Sebastiano F, Enrico P, Sabrina A

pubmed · Jun 22 2025
Acute ischemic stroke (AIS) remains the leading cause of mortality and disability worldwide. While revascularization therapies-such as intravenous thrombolysis (IVT) and endovascular thrombectomy (EVT)-have significantly improved outcomes, their success is strongly influenced by the status of cerebral collateral circulation. Collateral vessels sustain cerebral perfusion during vascular occlusion, limiting infarct growth and extending therapeutic windows. Despite this recognized importance, standardized methods for assessing collateral status and integrating it into treatment strategies are still evolving. This narrative review synthesizes current evidence on the role of collateral circulation in AIS, focusing on its impact on infarct dynamics, treatment efficacy, and functional recovery. We highlight findings from major clinical trials-including MR CLEAN, DAWN, DEFUSE-3, and SWIFT PRIME-which consistently demonstrate that robust collateral networks are associated with improved outcomes and expanded eligibility for reperfusion therapies. Advances in neuroimaging, such as multiphase CTA and perfusion MRI, alongside emerging AI-driven automated collateral grading, are reshaping patient selection and clinical decision-making. We also discuss novel therapeutic strategies aimed at enhancing collateral flow, such as vasodilators, neuroprotective agents, statins, and stem cell therapies. Despite growing evidence supporting collateral-based treatment approaches, real-time clinical implementation remains limited by challenges in standardization and access. Cerebral collateral circulation is a critical determinant of stroke prognosis and treatment response. Incorporating collateral assessment into acute stroke workflows-supported by advanced imaging, artificial intelligence, and personalized medicine-offers a promising pathway to optimize outcomes. As the field moves beyond a strict "time is brain" model, the emerging paradigm of "time is collaterals" may better reflect the dynamic interplay between perfusion, tissue viability, and therapeutic opportunity in AIS management.

Training-free Test-time Improvement for Explainable Medical Image Classification

Hangzhou He, Jiachen Tang, Lei Zhu, Kaiwen Li, Yanye Lu

arxiv preprint · Jun 22 2025
Deep learning-based medical image classification techniques are rapidly advancing in medical image analysis, making it crucial to develop accurate and trustworthy models that can be efficiently deployed across diverse clinical scenarios. Concept Bottleneck Models (CBMs), which first predict a set of explainable concepts from images and then perform classification based on these concepts, are increasingly being adopted for explainable medical image classification. However, the inherent explainability of CBMs introduces new challenges when deploying trained models to new environments. Variations in imaging protocols and staining methods may induce concept-level shifts, such as alterations in color distribution and scale. Furthermore, since CBM training requires explicit concept annotations, fine-tuning models solely with image-level labels could compromise concept prediction accuracy and faithfulness - a critical limitation given the high cost of acquiring expert-annotated concept labels in medical domains. To address these challenges, we propose a training-free confusion concept identification strategy. By leveraging minimal new data (e.g., 4 images per class) with only image-level labels, our approach enhances out-of-domain performance without sacrificing source domain accuracy through two key operations: masking misactivated confounding concepts and amplifying under-activated discriminative concepts. The efficacy of our method is validated on both skin and white blood cell images. Our code is available at: https://github.com/riverback/TF-TTI-XMed.
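The two test-time operations this abstract names (masking misactivated confounding concepts, amplifying under-activated discriminative ones) can be sketched on a toy concept-activation vector. The index sets and gain factor below are hypothetical illustrations, not values from the paper:

```python
def adjust_concepts(activations, confounding, underactive, gain=2.0):
    """Training-free adjustment of a concept-activation vector.

    confounding: indices of concepts flagged as misactivated -> masked to 0.
    underactive: indices of discriminative concepts flagged as
                 under-activated -> scaled up by `gain`.
    """
    out = []
    for i, a in enumerate(activations):
        if i in confounding:
            out.append(0.0)        # mask misactivated confounding concept
        elif i in underactive:
            out.append(a * gain)   # amplify under-activated concept
        else:
            out.append(a)
    return out

# Hypothetical activations for four concepts; concept 0 is confounding
# (e.g. a staining-color artifact), concept 3 is under-activated.
acts = [0.9, 0.2, 0.7, 0.1]
print(adjust_concepts(acts, confounding={0}, underactive={3}))
```

Because only the activation vector is edited, the concept predictor and the downstream classifier stay frozen, which is what makes the approach training-free.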

CT Radiomics-Based Explainable Machine Learning Model for Accurate Differentiation of Malignant and Benign Endometrial Tumors: A Two-Center Study

Tingrui Zhang, Honglin Wu, Zekun Jiang, Yingying Wang, Rui Ye, Huiming Ni, Chang Liu, Jin Cao, Xuan Sun, Rong Shao, Xiaorong Wei, Yingchun Sun

arxiv preprint · Jun 22 2025
This study aimed to develop and validate a CT radiomics-based explainable machine learning model for differentiating malignant from benign endometrial tumors. A total of 83 patients from two centers, including 46 with malignant and 37 with benign conditions, were included, with data split into a training set (n = 59) and a testing set (n = 24). Regions of interest (ROIs) were manually segmented from pre-surgical CT scans, and 1132 radiomic features were extracted using PyRadiomics. Six explainable machine learning algorithms were implemented to determine the optimal radiomics pipeline. The diagnostic performance of the radiomic model was evaluated using sensitivity, specificity, accuracy, precision, F1 score, confusion matrices, and ROC curves. To enhance clinical understanding and usability, we separately implemented SHAP analysis and feature-mapping visualization, and evaluated the calibration curve and decision curve. Comparing the six modeling strategies, the Random Forest model emerged as the optimal choice for diagnosing endometrial cancer (EC), with a training AUC of 1.00 and a testing AUC of 0.96. SHAP identified the most important radiomic features, revealing that all selected features were significantly associated with EC (P < 0.05). Radiomics feature maps also provide a feasible assessment tool for clinical applications. DCA indicated a higher net benefit for our model compared to the "All" and "None" strategies, suggesting its clinical utility in identifying high-risk cases and reducing unnecessary interventions. In conclusion, the CT radiomics-based explainable machine learning model achieved high diagnostic performance and could serve as an intelligent auxiliary tool for the diagnosis of endometrial cancer.
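The decision curve analysis (DCA) this abstract cites compares net benefit across threshold probabilities. A minimal sketch of the standard net-benefit formula, with toy counts rather than study data:

```python
def net_benefit(tp, fp, n, pt):
    """Net benefit of a model at threshold probability pt:
    TP/n - FP/n * pt / (1 - pt)."""
    return tp / n - (fp / n) * (pt / (1 - pt))

def net_benefit_treat_all(prevalence, pt):
    """The 'All' strategy: treat everyone regardless of the model.
    The 'None' strategy has net benefit 0 by definition."""
    return prevalence - (1 - prevalence) * (pt / (1 - pt))

# Hypothetical cohort: 100 patients, model flags 40 (30 TP, 10 FP),
# disease prevalence 0.35, evaluated at threshold pt = 0.2.
print(net_benefit(30, 10, 100, 0.2))     # model
print(net_benefit_treat_all(0.35, 0.2))  # "All" baseline
```

A model shows clinical utility on a decision curve when, as here, its net benefit exceeds both the "All" and "None" baselines over the clinically relevant threshold range.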

STACT-Time: Spatio-Temporal Cross Attention for Cine Thyroid Ultrasound Time Series Classification

Irsyad Adam, Tengyue Zhang, Shrayes Raman, Zhuyu Qiu, Brandon Taraku, Hexiang Feng, Sile Wang, Ashwath Radhachandran, Shreeram Athreya, Vedrana Ivezic, Peipei Ping, Corey Arnold, William Speier

arxiv preprint · Jun 22 2025
Thyroid cancer is among the most common cancers in the United States. Thyroid nodules are frequently detected through ultrasound (US) imaging, and some require further evaluation via fine-needle aspiration (FNA) biopsy. Despite its effectiveness, FNA often leads to unnecessary biopsies of benign nodules, causing patient discomfort and anxiety. To address this, the American College of Radiology Thyroid Imaging Reporting and Data System (TI-RADS) has been developed to reduce benign biopsies. However, such systems are limited by interobserver variability. Recent deep learning approaches have sought to improve risk stratification, but they often fail to utilize the rich temporal and spatial context provided by US cine clips, which contain dynamic global information and surrounding structural changes across various views. In this work, we propose the Spatio-Temporal Cross Attention for Cine Thyroid Ultrasound Time Series Classification (STACT-Time) model, a novel representation learning framework that integrates imaging features from US cine clips with features from segmentation masks automatically generated by a pretrained model. By leveraging self-attention and cross-attention mechanisms, our model captures the rich temporal and spatial context of US cine clips while enhancing feature representation through segmentation-guided learning. Our model improves malignancy prediction compared to state-of-the-art models, achieving a cross-validation precision of 0.91 (±0.02) and an F1 score of 0.89 (±0.02). By reducing unnecessary biopsies of benign nodules while maintaining high sensitivity for malignancy detection, our model has the potential to enhance clinical decision-making and improve patient outcomes.
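The cross-attention mechanism named above lets queries from one stream attend over keys/values from another (here, cine-clip features attending to segmentation-mask features). A dimension-reduced, stdlib-only sketch of the standard scaled dot-product form; shapes and values are toy, and the model's actual feature extractors are not reproduced:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(Q, K, V):
    """attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V, row by row."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

# One query (e.g. a cine-clip frame feature) attending over two key/value
# rows (e.g. segmentation-mask features).
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
print(cross_attention(Q, K, V))
```

The query aligned with the first key draws most of its output from the first value row, which is how segmentation-guided features can steer the clip representation.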

Decoding Federated Learning: The FedNAM+ Conformal Revolution

Sree Bhargavi Balija, Amitash Nanda, Debashis Sahoo

arxiv preprint · Jun 22 2025
Federated learning has significantly advanced distributed training of machine learning models across decentralized data sources. However, existing frameworks often lack comprehensive solutions that combine uncertainty quantification, interpretability, and robustness. To address this, we propose FedNAM+, a federated learning framework that integrates Neural Additive Models (NAMs) with a novel conformal prediction method to enable interpretable and reliable uncertainty estimation. Our method introduces a dynamic level adjustment technique that utilizes gradient-based sensitivity maps to identify key input features influencing predictions. This facilitates both interpretability and pixel-wise uncertainty estimates. Unlike traditional interpretability methods such as LIME and SHAP, which do not provide confidence intervals, FedNAM+ offers visual insights into prediction reliability. We validate our approach through experiments on CT scan, MNIST, and CIFAR datasets, demonstrating high prediction accuracy with minimal loss (e.g., only 0.1% on MNIST), along with transparent uncertainty measures. Visual analysis highlights variable uncertainty intervals, revealing low-confidence regions where model performance can be improved with additional data. Compared to Monte Carlo Dropout, FedNAM+ delivers efficient and global uncertainty estimates with reduced computational overhead, making it particularly suitable for federated learning scenarios. Overall, FedNAM+ provides a robust, interpretable, and computationally efficient framework that enhances trust and transparency in decentralized predictive modeling.
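For context on the conformal-prediction component: the generic split-conformal recipe calibrates on held-out residuals and returns intervals with finite-sample coverage at least 1 - alpha. The sketch below shows that baseline recipe only; FedNAM+'s dynamic level adjustment and sensitivity maps are not reproduced, and the residuals are hypothetical:

```python
import math

def conformal_interval(cal_residuals, y_pred, alpha=0.1):
    """Split conformal prediction interval around a point prediction.

    cal_residuals: absolute residuals |y - y_hat| from a calibration set.
    Returns (lo, hi) with marginal coverage >= 1 - alpha.
    """
    n = len(cal_residuals)
    k = math.ceil((n + 1) * (1 - alpha))       # conformal quantile rank
    q = sorted(cal_residuals)[min(k, n) - 1]   # calibrated half-width
    return (y_pred - q, y_pred + q)

# Hypothetical calibration residuals and a new point prediction.
residuals = [0.1, 0.3, 0.2, 0.4, 0.25, 0.15, 0.35, 0.05, 0.45, 0.3]
print(conformal_interval(residuals, y_pred=2.0, alpha=0.2))
```

Wider intervals flag low-confidence predictions, which is the kind of per-prediction reliability signal the abstract contrasts with LIME and SHAP.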

The future of biomarkers for vascular contributions to cognitive impairment and dementia (VCID): proceedings of the 2025 annual workshop of the Albert research institute for white matter and cognition.

Lennon MJ, Karvelas N, Ganesh A, Whitehead S, Sorond FA, Durán Laforet V, Head E, Arfanakis K, Kolachalama VB, Liu X, Lu H, Ramirez J, Walker K, Weekman E, Wellington CL, Winston C, Barone FC, Corriveau RA

pubmed · Jun 21 2025
Advances in biomarkers and pathophysiology of vascular contributions to cognitive impairment and dementia (VCID) are expected to bring greater mechanistic insights, more targeted treatments, and potentially disease-modifying therapies. The 2025 Annual Workshop of the Albert Research Institute for White Matter and Cognition, sponsored by the Leo and Anne Albert Charitable Trust since 2015, focused on novel biomarkers for VCID. The meeting highlighted the complexity of dementia, emphasizing that the majority of cases involve multiple brain pathologies, with vascular pathology typically present. Potential novel approaches to diagnosis of disease processes and progression that may result in VCID included measures of microglial senescence and retinal changes, as well as artificial intelligence (AI) integration of multimodal datasets. Proteomic studies identified plasma proteins associated with cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL; a rare genetic disorder affecting brain vessels) and age-related vascular pathology that suggested potential therapeutic targets. Blood-based microglial and brain-derived extracellular vesicles are promising tools for early detection of brain inflammation and other changes that have been associated with cognitive decline. Imaging measures of blood perfusion, oxygen extraction, and cerebrospinal fluid (CSF) flow were discussed as potential VCID biomarkers, in part because of correlations with classic pathological Alzheimer's disease (AD) biomarkers. MRI-visible perivascular spaces, which may be a novel imaging biomarker of sleep-driven glymphatic waste clearance dysfunction, are associated with vascular risk factors, lower cognitive function, and various brain pathologies including Alzheimer's, Parkinson's, and cerebral amyloid angiopathy (CAA). People with Down syndrome are at high risk for dementia. Individuals with Down syndrome who develop dementia almost universally experience mixed brain pathologies, with AD pathology and cerebrovascular pathology being the most common. This follows the pattern in the general population, where mixed pathologies are also predominant in the brains of people clinically diagnosed with dementia, including AD dementia. Intimate partner violence-related brain injury, hypertension's impact on dementia risk, and the promise of remote ischemic conditioning for treating VCID were additional themes.
