
External validation and performance analysis of a deep learning-based model for the detection of intracranial hemorrhage.

Nada A, Sayed AA, Hamouda M, Tantawi M, Khan A, Alt A, Hassanein H, Sevim BC, Altes T, Gaballah A

PubMed · Jun 1 2025
Purpose: We aimed to externally validate and analyze the performance of an FDA-approved deep learning (DL) model for labeling intracranial hemorrhage (ICH) cases on a real-world heterogeneous clinical dataset. We also evaluated how patients' risk factors influenced the model's performance and gathered satisfaction feedback from radiologists of varying ranks. Methods: This prospective, IRB-approved study included 5600 non-contrast CT scans of the head from various clinical settings (emergency, inpatient, and outpatient units). Patients' risk factors were collected, and their impact on the DL model's performance was tested using univariate and multivariate regression analyses. The DL model's output was compared with the radiologists' interpretation for the presence or absence of ICH and subsequent classification into ICH subcategories. Key metrics, including accuracy, sensitivity, specificity, positive predictive value, and negative predictive value, were calculated, and the receiver operating characteristic (ROC) curve and area under the curve (AUC) were determined. Additionally, radiologists of varying ranks were surveyed about their experience with the model. Results: The model exhibited outstanding performance, achieving a sensitivity of 89% and a specificity of 96%. Additional performance metrics, including positive predictive value (82%), negative predictive value (97%), and overall accuracy (94%), underscore its robust capabilities. The area under the ROC curve reached 0.954. Multivariate logistic regression revealed statistically significant effects for age, sex, history of trauma, operative intervention, hypertension, and smoking. Conclusion: Our study highlights the satisfactory performance of the DL model on a diverse real-world dataset, which garnered positive feedback from radiology trainees.
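
For readers who want to reproduce this style of external validation, the sketch below computes the reported metrics from a binary confusion matrix plus ROC analysis; the arrays, operating threshold, and variable names are illustrative assumptions, not the study's data.

```python
# Minimal external-validation metrics sketch on simulated labels/scores.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                 # 1 = ICH on consensus read
y_score = np.clip(0.6 * y_true + rng.random(1000) * 0.5, 0, 1)  # model probability
y_pred = (y_score >= 0.5).astype(int)                  # assumed vendor threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)                                   # positive predictive value
npv = tn / (tn + fn)                                   # negative predictive value
accuracy = (tp + tn) / (tp + tn + fp + fn)
auc = roc_auc_score(y_true, y_score)
print(f"Sens {sensitivity:.2f}  Spec {specificity:.2f}  PPV {ppv:.2f}  "
      f"NPV {npv:.2f}  Acc {accuracy:.2f}  AUC {auc:.3f}")
```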

RS-MAE: Region-State Masked Autoencoder for Neuropsychiatric Disorder Classifications Based on Resting-State fMRI.

Ma H, Xu Y, Tian L

PubMed · Jun 1 2025
Dynamic functional connectivity (DFC) extracted from resting-state functional magnetic resonance imaging (fMRI) has been widely used for neuropsychiatric disorder classification. However, serious information redundancy within DFC matrices can significantly undermine the performance of classification models based on them. Moreover, traditional deep models do not adapt well to connectivity-like data, and insufficient training samples further hinder their effective training. In this study, we proposed a novel region-state masked autoencoder (RS-MAE) for representation learning on DFC matrices and, ultimately, fMRI-based neuropsychiatric disorder classification. Three strategies were adopted to address the aforementioned limitations. First, a masked autoencoder (MAE) was introduced to reduce redundancy within DFC matrices while simultaneously learning effective representations of human brain function. Second, region-state (RS) patch embedding was proposed to replace the space-time patch embedding of video MAE, adapting it to DFC matrices, in which only topological locality, rather than spatial locality, exists. Third, random state concatenation (RSC) was introduced as a DFC matrix augmentation approach to alleviate the shortage of training samples. Neuropsychiatric disorder classification was performed by fine-tuning the pretrained encoder of RS-MAE. The performance of the proposed RS-MAE was evaluated on four publicly available datasets, achieving accuracies of 76.32%, 77.25%, 88.87%, and 76.53% for the attention deficit and hyperactivity disorder (ADHD), autism spectrum disorder (ASD), Alzheimer's disease (AD), and schizophrenia (SCZ) classification tasks, respectively. These results demonstrate the efficacy of RS-MAE as a deep learning model for neuropsychiatric disorder classification.
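
As context for the DFC input that RS-MAE consumes, here is a minimal sliding-window sketch; the window length, stride, and region count are illustrative assumptions, not the paper's settings. Each windowed correlation matrix is one "state", and RS patch embedding would treat each region's connectivity row within a state as a token.

```python
# Sliding-window dynamic functional connectivity from ROI time series.
import numpy as np

def dynamic_fc(ts, win=30, stride=5):
    """ts: (T, R) ROI time series -> (S, R, R) stack of windowed FC matrices."""
    T, R = ts.shape
    mats = []
    for start in range(0, T - win + 1, stride):
        window = ts[start:start + win]           # (win, R) segment
        mats.append(np.corrcoef(window.T))       # (R, R) Pearson correlation
    return np.stack(mats)                        # one "state" per window

ts = np.random.randn(200, 90)                    # e.g. 200 TRs, 90 brain regions
dfc = dynamic_fc(ts)
print(dfc.shape)                                 # (35, 90, 90)
```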

Revolutionizing Radiology Workflow with Factual and Efficient CXR Report Generation

Pimchanok Sukjai, Apiradee Boonmee

arXiv preprint · Jun 1 2025
The escalating demand for medical image interpretation underscores the critical need for advanced artificial intelligence solutions to enhance the efficiency and accuracy of radiological diagnoses. This paper introduces CXR-PathFinder, a novel Large Language Model (LLM)-centric foundation model specifically engineered for automated chest X-ray (CXR) report generation. We propose a unique training paradigm, Clinician-Guided Adversarial Fine-Tuning (CGAFT), which meticulously integrates expert clinical feedback into an adversarial learning framework to mitigate factual inconsistencies and improve diagnostic precision. Complementing this, our Knowledge Graph Augmentation Module (KGAM) acts as an inference-time safeguard, dynamically verifying generated medical statements against authoritative knowledge bases to minimize hallucinations and ensure standardized terminology. Leveraging a comprehensive dataset of millions of paired CXR images and expert reports, our experiments demonstrate that CXR-PathFinder significantly outperforms existing state-of-the-art medical vision-language models across various quantitative metrics, including clinical accuracy (Macro F1 (14): 46.5, Micro F1 (14): 59.5). Furthermore, blinded human evaluation by board-certified radiologists confirms CXR-PathFinder's superior clinical utility, completeness, and accuracy, establishing its potential as a reliable and efficient aid for radiological practice. The developed method effectively balances high diagnostic fidelity with computational efficiency, providing a robust solution for automated medical report generation.
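
The Macro/Micro F1 figures above are computed over a 14-label multilabel task (presumably a standard 14-finding CXR label set); the sketch below shows how the two averages differ, on placeholder arrays rather than the paper's data.

```python
# Macro vs. micro F1 over a 14-label multilabel problem (simulated data).
import numpy as np
from sklearn.metrics import f1_score

n_labels = 14                                    # 14 findings per report
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=(500, n_labels))
# Flip each true label with 20% probability to simulate model errors.
y_pred = np.where(rng.random((500, n_labels)) < 0.8, y_true, 1 - y_true)

macro = f1_score(y_true, y_pred, average="macro")  # unweighted mean over labels
micro = f1_score(y_true, y_pred, average="micro")  # pooled over all decisions
print(f"Macro F1: {macro:.3f}  Micro F1: {micro:.3f}")
```

Macro F1 weights rare and common findings equally, while micro F1 is dominated by the most frequent ones, which is why papers typically report both.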

Aiding Medical Diagnosis through Image Synthesis and Classification

Kanishk Choudhary

arXiv preprint · Jun 1 2025
Medical professionals, especially those in training, often depend on visual reference materials to support an accurate diagnosis and develop pattern recognition skills. However, existing resources may lack the diversity and accessibility needed for broad and effective clinical learning. This paper presents a system designed to generate realistic medical images from textual descriptions and validate their accuracy through a classification model. A pretrained stable diffusion model was fine-tuned using Low-Rank Adaptation (LoRA) on the PathMNIST dataset, which consists of nine colorectal histopathology tissue types. The generative model was trained multiple times using different training parameter configurations, guided by domain-specific prompts to capture meaningful features. To ensure quality control, a ResNet-18 classification model was trained on the same dataset, achieving 99.76% accuracy in detecting the correct label of a colorectal histopathology image. Generated images were then filtered using the trained classifier in an iterative process, in which inaccurate outputs were discarded and regenerated until they were correctly classified. The highest-performing version of the generative model achieved an F1 score of 0.6727, with precision and recall scores of 0.6817 and 0.7111, respectively. Some tissue types, such as adipose tissue and lymphocytes, reached perfect classification scores, while others proved more challenging due to structural complexity. The self-validating approach demonstrates a reliable method for synthesizing domain-specific medical images, given the high accuracy of both the generation and classification components of the system, with potential applications in diagnostic support and clinical education. Future work includes improving prompt-specific accuracy and extending the system to other areas of medical imaging.
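
A minimal sketch of the iterative generate-then-filter loop described above; `generate_image` and `classifier` are hypothetical stand-ins for the LoRA-fine-tuned diffusion model and the ResNet-18, and the retry cap is an assumption.

```python
# Self-validating synthesis: resample until the classifier agrees with
# the prompt's intended tissue label, discarding mislabeled outputs.
def generate_validated(prompt: str, target_label: int,
                       generate_image, classifier, max_tries: int = 10):
    for attempt in range(max_tries):
        image = generate_image(prompt)            # diffusion sample (hypothetical)
        if classifier(image) == target_label:     # ResNet-18 agrees with prompt
            return image, attempt + 1             # accepted image + tries used
    return None, max_tries                        # give up after repeated failures
```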

Modality Translation and Registration of MR and Ultrasound Images Using Diffusion Models

Xudong Ma, Nantheera Anantrasirichai, Stefanos Bolomytis, Alin Achim

arXiv preprint · Jun 1 2025
Multimodal MR-US registration is critical for prostate cancer diagnosis. However, this task remains challenging due to significant modality discrepancies. Existing methods often fail to align critical boundaries while being overly sensitive to irrelevant details. To address this, we propose an anatomically coherent modality translation (ACMT) network based on a hierarchical feature disentanglement design. We leverage shallow-layer features for texture consistency and deep-layer features for boundary preservation. Unlike conventional modality translation methods that convert one modality into another, our ACMT introduces the customized design of an intermediate pseudo modality. Both MR and US images are translated toward this intermediate domain, effectively addressing the bottlenecks faced by traditional translation methods in the downstream registration task. Experiments demonstrate that our method mitigates modality-specific discrepancies while preserving crucial anatomical boundaries for accurate registration. Quantitative evaluations show superior modality similarity compared to state-of-the-art modality translation methods. Furthermore, downstream registration experiments confirm that our translated images achieve the best alignment performance, highlighting the robustness of our framework for multi-modal prostate image registration.
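
As a rough illustration of the hierarchical feature-disentanglement idea (shallow features for texture consistency, deep features for boundary preservation), the PyTorch sketch below combines per-level feature losses; all names, shapes, and weights are assumptions, not the authors' implementation.

```python
# Conceptual two-level feature loss: shallow maps drive texture agreement,
# deep maps drive boundary agreement between translated MR and US features.
import torch
import torch.nn.functional as F

def acmt_style_loss(feats_mr, feats_us, w_texture=1.0, w_boundary=1.0):
    """feats_*: lists of feature maps ordered shallow -> deep."""
    texture = F.l1_loss(feats_mr[0], feats_us[0])    # shallow: texture consistency
    boundary = F.l1_loss(feats_mr[-1], feats_us[-1])  # deep: boundary preservation
    return w_texture * texture + w_boundary * boundary

# Toy feature maps standing in for encoder outputs of both modalities.
feats_mr = [torch.randn(1, 32, 64, 64), torch.randn(1, 256, 8, 8)]
feats_us = [torch.randn(1, 32, 64, 64), torch.randn(1, 256, 8, 8)]
print(acmt_style_loss(feats_mr, feats_us).item())
```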

A Large Convolutional Neural Network for Clinical Target and Multi-organ Segmentation in Gynecologic Brachytherapy with Multi-stage Learning

Mingzhe Hu, Yuan Gao, Yuheng Li, Richard LJ Qiu, Chih-Wei Chang, Keyur D. Shah, Priyanka Kapoor, Beth Bradshaw, Yuan Shao, Justin Roper, Jill Remick, Zhen Tian, Xiaofeng Yang

arXiv preprint · Jun 1 2025
Purpose: Accurate segmentation of clinical target volumes (CTV) and organs-at-risk is crucial for optimizing gynecologic brachytherapy (GYN-BT) treatment planning. However, anatomical variability, low soft-tissue contrast in CT imaging, and limited annotated datasets pose significant challenges. This study presents GynBTNet, a novel multi-stage learning framework designed to enhance segmentation performance through self-supervised pretraining and hierarchical fine-tuning strategies. Methods: GynBTNet employs a three-stage training strategy: (1) self-supervised pretraining on large-scale CT datasets using sparse submanifold convolution to capture robust anatomical representations, (2) supervised fine-tuning on a comprehensive multi-organ segmentation dataset to refine feature extraction, and (3) task-specific fine-tuning on a dedicated GYN-BT dataset to optimize segmentation performance for clinical applications. The model was evaluated against state-of-the-art methods using the Dice Similarity Coefficient (DSC), 95th percentile Hausdorff Distance (HD95), and Average Surface Distance (ASD). Results: Our GynBTNet achieved superior segmentation performance, significantly outperforming nnU-Net and Swin-UNETR. Notably, it yielded a DSC of 0.837 ± 0.068 for CTV, 0.940 ± 0.052 for the bladder, 0.842 ± 0.070 for the rectum, and 0.871 ± 0.047 for the uterus, with reduced HD95 and ASD compared to baseline models. Self-supervised pretraining led to consistent performance improvements, particularly for structures with complex boundaries. However, segmentation of the sigmoid colon remained challenging, likely due to anatomical ambiguities and inter-patient variability. Statistical significance analysis confirmed that GynBTNet's improvements were significant compared to baseline models.
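
For reference, the two headline metrics can be computed from binary masks as in the sketch below (toy 2D arrays, not the study's segmentations); HD95 is approximated here via distance transforms between mask surfaces.

```python
# Dice similarity coefficient and 95th-percentile Hausdorff distance.
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice(a, b):
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def surface(mask):
    return mask & ~binary_erosion(mask)          # boundary voxels only

def hd95(a, b):
    sa, sb = surface(a), surface(b)
    d_to_b = distance_transform_edt(~sb)[sa]     # a-surface -> nearest b-surface
    d_to_a = distance_transform_edt(~sa)[sb]     # b-surface -> nearest a-surface
    return np.percentile(np.concatenate([d_to_b, d_to_a]), 95)

a = np.zeros((32, 32), bool); a[8:20, 8:20] = True   # toy prediction
b = np.zeros((32, 32), bool); b[10:22, 10:22] = True  # toy ground truth
print(f"DSC {dice(a, b):.3f}  HD95 {hd95(a, b):.2f}")
```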

MedBookVQA: A Systematic and Comprehensive Medical Benchmark Derived from Open-Access Book

Sau Lai Yip, Sunan He, Yuxiang Nie, Shu Pui Chan, Yilin Ye, Sum Ying Lam, Hao Chen

arXiv preprint · Jun 1 2025
The accelerating development of general medical artificial intelligence (GMAI), powered by multimodal large language models (MLLMs), offers transformative potential for addressing persistent healthcare challenges, including workforce deficits and escalating costs. The parallel development of systematic evaluation benchmarks is a critical imperative for assessing performance and providing technological guidance. Meanwhile, medical textbooks, an invaluable knowledge source, remain underexploited for benchmark development. Here, we present MedBookVQA, a systematic and comprehensive multimodal benchmark derived from open-access medical textbooks. To curate this benchmark, we propose a standardized pipeline for automated extraction of medical figures while contextually aligning them with corresponding medical narratives. Based on this curated data, we generate 5,000 clinically relevant questions spanning modality recognition, disease classification, anatomical identification, symptom diagnosis, and surgical procedures. A multi-tier annotation system categorizes queries through hierarchical taxonomies encompassing medical imaging modalities (42 categories), body anatomies (125 structures), and clinical specialties (31 departments), enabling nuanced analysis across medical subdomains. We evaluate a wide array of MLLMs, including proprietary, open-source, medical, and reasoning models, revealing significant performance disparities across task types and model categories. Our findings expose critical capability gaps in current GMAI systems and establish textbook-derived multimodal benchmarks as an essential evaluation paradigm, providing anatomically structured performance metrics across specialties.
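
The multi-tier taxonomy enables stratified scoring; below is a minimal sketch of grouping per-question results by modality/anatomy/specialty tags, using invented placeholder records rather than the benchmark's actual annotations.

```python
# Per-category accuracy across taxonomy axes (toy records).
from collections import defaultdict

records = [
    {"modality": "CT",  "anatomy": "liver", "specialty": "radiology",   "correct": True},
    {"modality": "MRI", "anatomy": "brain", "specialty": "neurology",   "correct": False},
    {"modality": "CT",  "anatomy": "lung",  "specialty": "pulmonology", "correct": True},
]

def accuracy_by(records, key):
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r[key]] += 1
        hits[r[key]] += r["correct"]
    return {k: hits[k] / totals[k] for k in totals}

for axis in ("modality", "anatomy", "specialty"):
    print(axis, accuracy_by(records, axis))
```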

Comparison of Sarcopenia Assessment in Liver Transplant Recipients by Computed Tomography Freehand Region-of-Interest versus an Automated Deep Learning System.

Miller W, Fate K, Fisher J, Thul J, Ko Y, Kim KW, Pruett T, Teigen L

PubMed · Jun 1 2025
Sarcopenia, or the loss of muscle quality and quantity, has been associated with poor clinical outcomes in liver transplantation, such as infection, increased length of stay, and increased patient mortality. Abdominal computed tomography (CT) scans are used to measure core musculature as a marker of sarcopenia. Information on core body musculature can be extracted either by freehand region-of-interest (ROI) measurement or by machine learning algorithms that quantify total muscle within a given area. This study directly compares these two collection methods, leveraging length-of-stay (LOS) outcomes previously found to be associated with freehand ROI measurements. A total of 50 individuals who underwent liver transplantation at our single center between January 1, 2016, and May 30, 2021, and had a non-contrast abdominal CT scan within 6 months of surgery were included. CT-derived skeletal muscle measures at the third lumbar vertebra (L3) were obtained using freehand ROI and an automated deep learning system. The freehand psoas muscle measures, psoas area index (PAI) and mean Hounsfield units (mHU), were significantly correlated with the automated deep learning system's total skeletal muscle measures at L3, skeletal muscle index (SMI) and skeletal muscle density (SMD), respectively (R² = 0.4221, p < 0.0001; R² = 0.6297, p < 0.0001). The automated system's SMI predicted ~20% of the variability in hospital length of stay (R² = 0.2013), while PAI predicted only about 10% of the variability in total healthcare length of stay (R² = 0.0919). In contrast, both the freehand mHU and the automated system's muscle density variables were associated with ~20% of the variability in inpatient length of stay (R² = 0.2383 and 0.1810, respectively) and total healthcare length of stay (R² = 0.2190 and 0.1947, respectively). Sarcopenia measurements represent an important risk-stratification tool for liver transplantation outcomes. For the association of sarcopenia assessment with LOS, freehand measures perform similarly to automated deep learning system measurements.
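
For orientation, the sketch below shows the kind of derived measure and method-agreement statistic discussed above: a height-normalized muscle index and the R² between freehand and automated measures. All values are illustrative, not study data.

```python
# Height-normalized skeletal muscle index and between-method agreement.
import numpy as np

def skeletal_muscle_index(area_cm2, height_m):
    return area_cm2 / height_m ** 2               # SMI in cm^2/m^2

freehand_pai = np.array([5.1, 6.3, 4.8, 7.0, 5.9])    # freehand psoas area index
auto_smi = np.array([42.0, 51.5, 40.2, 55.8, 47.1])   # automated L3 muscle index
r = np.corrcoef(freehand_pai, auto_smi)[0, 1]         # Pearson correlation

print(f"SMI example: {skeletal_muscle_index(160.0, 1.75):.1f} cm^2/m^2")
print(f"R^2 between methods: {r**2:.3f}")
```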

Measurement of adipose body composition using an artificial intelligence-based CT Protocol and its association with severe acute pancreatitis in hospitalized patients.

Cortés P, Mistretta TA, Jackson B, Olson CG, Al Qady AM, Stancampiano FF, Korfiatis P, Klug JR, Harris DM, Dan Echols J, Carter RE, Ji B, Hardway HD, Wallace MB, Kumbhari V, Bi Y

PubMed · Jun 1 2025
The clinical utility of body composition in predicting the severity of acute pancreatitis (AP) remains unclear. We aimed to measure body composition using artificial intelligence (AI) to predict severe AP in hospitalized patients. We performed a retrospective study of patients hospitalized with AP at three tertiary care centers in 2018. Patients with computed tomography (CT) imaging of the abdomen at admission were included. A fully automated and validated abdominal segmentation algorithm was used for body composition analysis. The primary outcome was severe AP, defined as persistent single- or multi-organ failure per the revised Atlanta classification. 352 patients were included. Severe AP occurred in 35 patients (9.9%). In multivariable analysis, adjusting for male sex and first episode of AP, intermuscular adipose tissue (IMAT) was associated with severe AP (OR = 1.06 per 5 cm², p = 0.0207). Subcutaneous adipose tissue (SAT) area approached significance (OR = 1.05, p = 0.17). Neither visceral adipose tissue (VAT) nor skeletal muscle (SM) was associated with severe AP. In obese patients, higher SM was associated with severe AP in unadjusted analysis (86.7 vs 75.1 and 70.3 cm² in moderate and mild cases, respectively; p = 0.009). In this multi-site retrospective study using AI to measure body composition, we found elevated IMAT to be associated with severe AP. Although SAT was not significantly associated with severe AP, it approached statistical significance. Neither VAT nor SM was significant. Further research in larger prospective studies may be beneficial.
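
A hedged sketch of the multivariable model behind the reported odds ratio, fit with statsmodels on simulated data; rescaling IMAT by 5 makes the exponentiated coefficient an OR per 5 cm², matching the reporting convention above.

```python
# Multivariable logistic regression: severe AP ~ IMAT + male sex + first episode.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 352                                          # cohort size from the abstract
df = pd.DataFrame({
    "imat_cm2": rng.gamma(4.0, 10.0, n),         # simulated IMAT areas
    "male": rng.integers(0, 2, n),
    "first_episode": rng.integers(0, 2, n),
})
logit = 0.012 * df["imat_cm2"] - 2.8             # simulated true effect
df["severe_ap"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
df["imat_per5"] = df["imat_cm2"] / 5.0           # so exp(beta) = OR per 5 cm^2

fit = smf.logit("severe_ap ~ imat_per5 + male + first_episode", df).fit(disp=0)
print(np.exp(fit.params))                        # odds ratios per covariate
```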

Comparing Artificial Intelligence and Traditional Regression Models in Lung Cancer Risk Prediction Using A Systematic Review and Meta-Analysis.

Leonard S, Patel MA, Zhou Z, Le H, Mondal P, Adams SJ

PubMed · Jun 1 2025
Accurately identifying individuals who are at high risk of lung cancer is critical to optimize lung cancer screening with low-dose CT (LDCT). We sought to compare the performance of traditional regression models and artificial intelligence (AI)-based models in predicting future lung cancer risk. A systematic review and meta-analysis were conducted with reporting according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. We searched MEDLINE, Embase, Scopus, and the Cumulative Index to Nursing and Allied Health Literature databases for studies reporting the performance of AI or traditional regression models for predicting lung cancer risk. Two researchers screened articles, and a third researcher resolved conflicts. Model characteristics and predictive performance metrics were extracted. The quality of studies was assessed using the Prediction model Risk of Bias Assessment Tool. A meta-analysis assessed the discrimination performance of models, based on area under the receiver operating characteristic curve (AUC). One hundred forty studies met inclusion criteria and included 185 traditional and 64 AI-based models. Of these, 16 AI models and 65 traditional models have been externally validated. The pooled AUC of external validations of AI models was 0.82 (95% confidence interval [CI], 0.80-0.85), and the pooled AUC for traditional regression models was 0.73 (95% CI, 0.72-0.74). In a subgroup analysis, AI models that included LDCT had a pooled AUC of 0.85 (95% CI, 0.82-0.88). Overall risk of bias was high for both AI and traditional models. AI-based models, particularly those using imaging data, show promise for improving lung cancer risk prediction over traditional regression models. Future research should focus on prospective validation of AI models and direct comparisons with traditional methods in diverse populations.
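
A minimal sketch of the inverse-variance pooling used in meta-analyses like this one, with a DerSimonian-Laird random-effects estimate; the AUCs and standard errors below are made-up inputs, not the review's extracted data.

```python
# Random-effects pooled AUC via DerSimonian-Laird.
import numpy as np

auc = np.array([0.80, 0.84, 0.79, 0.86])         # per-study external-validation AUCs
se = np.array([0.02, 0.03, 0.025, 0.02])         # their standard errors

w = 1 / se**2                                    # fixed-effect weights
fixed = np.sum(w * auc) / np.sum(w)
q = np.sum(w * (auc - fixed) ** 2)               # Cochran's Q heterogeneity
tau2 = max(0.0, (q - (len(auc) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
w_re = 1 / (se**2 + tau2)                        # random-effects weights
pooled = np.sum(w_re * auc) / np.sum(w_re)
ci = 1.96 / np.sqrt(np.sum(w_re))
print(f"Pooled AUC {pooled:.3f} (95% CI {pooled - ci:.3f}-{pooled + ci:.3f})")
```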
