
Open Set Recognition for Endoscopic Image Classification: A Deep Learning Approach on the Kvasir Dataset

Kasra Moazzami, Seoyoun Son, John Lin, Sun Min Lee, Daniel Son, Hayeon Lee, Jeongho Lee, Seongji Lee

arxiv logopreprintJun 23 2025
Endoscopic image classification plays a pivotal role in medical diagnostics by identifying anatomical landmarks and pathological findings. However, conventional closed-set classification frameworks are inherently limited in open-world clinical settings, where previously unseen conditions can arise and compromise model reliability. To address this, we explore the application of Open Set Recognition (OSR) techniques on the Kvasir dataset, a publicly available and diverse endoscopic image collection. In this study, we evaluate and compare the OSR capabilities of several representative deep learning architectures, including ResNet-50, Swin Transformer, and a hybrid ResNet-Transformer model, under both closed-set and open-set conditions. OpenMax is adopted as a baseline OSR method to assess the ability of these models to distinguish known classes from previously unseen categories. This work represents one of the first efforts to apply open set recognition to the Kvasir dataset and provides a foundational benchmark for evaluating OSR performance in medical image analysis. Our results offer practical insights into model behavior in clinically realistic settings and highlight the importance of OSR techniques for the safe deployment of AI systems in endoscopy.
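
For readers unfamiliar with OpenMax, the sketch below illustrates the recalibration idea the abstract refers to: per-class mean activation vectors, a Weibull fit on the largest training distances, and redistribution of probability mass to an "unknown" class. It is a simplified illustration rather than the authors' code; the tail size, the Euclidean distance metric, and the use of SciPy's Weibull fit (instead of libmr) are assumptions.

```python
# Simplified OpenMax-style open-set scoring (a sketch, not the paper's exact pipeline).
# Assumes `train_logits`, `train_labels`, and a single test `logits` vector are NumPy arrays.
import numpy as np
from scipy.stats import weibull_min

def fit_mavs_and_weibulls(train_logits, train_labels, num_classes, tail_size=20):
    """Per class: mean activation vector (MAV) plus a Weibull fit on the largest
    distances of correctly classified training samples to that MAV."""
    mavs, weibulls = [], []
    preds = train_logits.argmax(axis=1)
    for c in range(num_classes):
        correct = train_logits[(train_labels == c) & (preds == c)]
        mav = correct.mean(axis=0)
        dists = np.linalg.norm(correct - mav, axis=1)
        tail = np.sort(dists)[-tail_size:]
        weibulls.append(weibull_min.fit(tail, floc=0))  # (shape, loc, scale)
        mavs.append(mav)
    return np.stack(mavs), weibulls

def openmax_scores(logits, mavs, weibulls, alpha=3):
    """Recalibrate the top-alpha logits by their Weibull CDF and route the removed
    mass to an extra 'unknown' class; returns a softmax over K+1 classes."""
    dists = np.linalg.norm(logits[None, :] - mavs, axis=1)
    ranks = logits.argsort()[::-1]
    revised = logits.copy()
    unknown = 0.0
    for r, c in enumerate(ranks[:alpha]):
        shape, loc, scale = weibulls[c]
        w = weibull_min.cdf(dists[c], shape, loc, scale) * (alpha - r) / alpha
        revised[c] = logits[c] * (1 - w)
        unknown += logits[c] * w
    scores = np.append(revised, unknown)
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()
```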

Fine-tuned large language model for classifying CT-guided interventional radiology reports.

Yasaka K, Nishimura N, Fukushima T, Kubo T, Kiryu S, Abe O

pubmed logopapersJun 23 2025
Background: Manual data curation was necessary to extract radiology reports due to the ambiguities of natural language. Purpose: To develop a fine-tuned large language model that classifies computed tomography (CT)-guided interventional radiology reports into technique categories and to compare its performance with that of the readers. Material and Methods: This retrospective study included patients who underwent CT-guided interventional radiology between August 2008 and November 2024. Patients were chronologically assigned to the training (n = 1142; 646 men; mean age = 64.1 ± 15.7 years), validation (n = 131; 83 men; mean age = 66.1 ± 16.1 years), and test (n = 332; 196 men; mean age = 66.1 ± 14.8 years) datasets. In establishing a reference standard, reports were manually classified into categories 1 (drainage), 2 (lesion biopsy within fat or soft tissue density tissues), 3 (lung biopsy), and 4 (bone biopsy). A bidirectional encoder representations from transformers (BERT) model was fine-tuned with the training dataset, and the model with the best performance in the validation dataset was selected. The performance and required time for classification in the test dataset were compared between the best-performing model and the two readers. Results: Categories 1/2/3/4 included 309/367/270/196, 30/42/40/19, and 75/124/78/55 patients for the training, validation, and test datasets, respectively. The model demonstrated an accuracy of 0.979 in the test dataset, which was significantly better than that of the readers (0.922-0.940) (P ≤ 0.012). The model classified reports 49.8- to 53.5-fold faster than the readers. Conclusion: The fine-tuned large language model classified CT-guided interventional radiology reports into four categories with high accuracy and within a remarkably short time.
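
The abstract gives no implementation details; the sketch below shows one conventional way to fine-tune a BERT-style encoder for the four technique categories with Hugging Face Transformers and PyTorch. The checkpoint name, batch size, and learning rate are illustrative assumptions, and a tokenizer/checkpoint suited to the reports' language would be required.

```python
# A minimal sketch of fine-tuning a BERT-style classifier for four technique
# categories (drainage, soft-tissue biopsy, lung biopsy, bone biopsy).
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINT = "bert-base-multilingual-cased"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=4)

def collate(batch):
    texts, labels = zip(*batch)  # batch: list of (report_text, category_index)
    enc = tokenizer(list(texts), padding=True, truncation=True,
                    max_length=512, return_tensors="pt")
    enc["labels"] = torch.tensor(labels)
    return enc

def finetune(train_pairs, epochs=3, lr=2e-5):
    loader = DataLoader(train_pairs, batch_size=16, shuffle=True, collate_fn=collate)
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            out = model(**batch)   # cross-entropy loss computed internally from labels
            out.loss.backward()
            optim.step()
            optim.zero_grad()
    return model
```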

Multimodal deep learning for predicting neoadjuvant treatment outcomes in breast cancer: a systematic review.

Krasniqi E, Filomeno L, Arcuri T, Ferretti G, Gasparro S, Fulvi A, Roselli A, D'Onofrio L, Pizzuti L, Barba M, Maugeri-Saccà M, Botti C, Graziano F, Puccica I, Cappelli S, Pelle F, Cavicchi F, Villanucci A, Paris I, Calabrò F, Rea S, Costantini M, Perracchio L, Sanguineti G, Takanen S, Marucci L, Greco L, Kayal R, Moscetti L, Marchesini E, Calonaci N, Blandino G, Caravagna G, Vici P

pubmed logopapersJun 23 2025
Pathological complete response (pCR) to neoadjuvant systemic therapy (NAST) is an established prognostic marker in breast cancer (BC). Multimodal deep learning (DL), integrating diverse data sources (radiology, pathology, omics, clinical), holds promise for improving pCR prediction accuracy. This systematic review synthesizes evidence on multimodal DL for pCR prediction and compares its performance against unimodal DL. Following PRISMA, we searched PubMed, Embase, and Web of Science (January 2015-April 2025) for studies applying DL to predict pCR in BC patients receiving NAST, using data from radiology, digital pathology (DP), multi-omics, and/or clinical records, and reporting AUC. Data on study design, DL architectures, and performance (AUC) were extracted. A narrative synthesis was conducted due to heterogeneity. Fifty-one studies, mostly retrospective (90.2%, median cohort 281), were included. Magnetic resonance imaging and DP were common primary modalities. Multimodal approaches were used in 52.9% of studies, often combining imaging with clinical data. Convolutional neural networks were the dominant architecture (88.2%). Longitudinal imaging improved prediction over baseline-only (median AUC 0.91 vs. 0.82). Overall, the median AUC across studies was 0.88, with 35.3% achieving AUC ≥ 0.90. Multimodal models showed a modest but consistent improvement over unimodal approaches (median AUC 0.88 vs. 0.83). Omics and clinical text were rarely primary DL inputs. DL models demonstrate promising accuracy for pCR prediction, especially when integrating multiple modalities and longitudinal imaging. However, significant methodological heterogeneity, reliance on retrospective data, and limited external validation hinder clinical translation. Future research should prioritize prospective validation, integration of underutilized data (multi-omics, clinical), and explainable AI to advance DL predictors to the clinical setting.

Development and validation of a SOTA-based system for biliopancreatic segmentation and station recognition system in EUS.

Zhang J, Zhang J, Chen H, Tian F, Zhang Y, Zhou Y, Jiang Z

pubmed logopapersJun 23 2025
Endoscopic ultrasound (EUS) is a vital tool for diagnosing biliopancreatic disease, offering detailed imaging to identify key abnormalities. Its interpretation demands expertise, which limits its accessibility for less trained practitioners. Thus, the creation of tools or systems to assist in interpreting EUS images is crucial for improving diagnostic accuracy and efficiency. The aim was to develop an AI-assisted EUS system for accurate pancreatic and biliopancreatic duct segmentation and to evaluate its impact on endoscopists' ability to identify biliopancreatic diseases during segmentation and anatomical localization. The EUS-AI system was designed to perform station positioning and anatomical structure segmentation. A total of 45,737 EUS images from 1852 patients were used for model training. Among them, 2881 images were used for internal testing, and 2747 images from 208 patients were used for external validation. Additionally, 340 images formed a man-machine competition test set. During the research process, various newer state-of-the-art (SOTA) deep learning algorithms were also compared. In the station recognition task, the Mean Teacher algorithm achieved the highest accuracy compared to the ResNet-50 and YOLOv8-CLS algorithms, with an average of 95.60% (92.07%-99.12%) in the internal test set and 92.72% (88.30%-97.15%) in the external test set. For segmentation, the U-Net v2 algorithm was optimal compared to the UNet++ and YOLOv8 algorithms. Ultimately, the EUS-AI system was constructed using the optimal models from the two tasks, and a man-machine competition experiment was conducted. The results demonstrated that the EUS-AI system significantly outperformed mid-level endoscopists in both position recognition (p < 0.001) and pancreas and biliopancreatic duct segmentation (p < 0.001 and p = 0.004, respectively). The EUS-AI system is expected to significantly shorten the learning curve for pancreatic EUS examination and enhance procedural standardization.
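
The Mean Teacher algorithm named above is a general semi-supervised recipe; the sketch below shows its core update (a supervised loss plus an EMA-teacher consistency loss), under the assumption that station recognition is trained on a mix of labelled and unlabelled EUS frames. It is not the authors' training code, and the decay and loss weights are illustrative.

```python
# A minimal Mean Teacher sketch for station classification.
import copy
import torch
import torch.nn.functional as F

def make_teacher(student):
    """The teacher is an EMA copy of the student and is never trained directly."""
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

def ema_update(teacher, student, decay=0.99):
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(decay).add_(s, alpha=1 - decay)

def mean_teacher_step(student, teacher, optim, x_lab, y_lab, x_unlab, cons_weight=1.0):
    """One update: supervised cross-entropy on labelled frames plus a consistency
    loss pulling the student toward the EMA teacher on unlabelled frames."""
    sup_loss = F.cross_entropy(student(x_lab), y_lab)
    with torch.no_grad():
        teacher_prob = F.softmax(teacher(x_unlab), dim=1)
    student_prob = F.softmax(student(x_unlab), dim=1)
    cons_loss = F.mse_loss(student_prob, teacher_prob)
    loss = sup_loss + cons_weight * cons_loss
    optim.zero_grad()
    loss.backward()
    optim.step()
    ema_update(teacher, student)
    return loss.item()
```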

Chest X-ray Foundation Model with Global and Local Representations Integration.

Yang Z, Xu X, Zhang J, Wang G, Kalra MK, Yan P

pubmed logopapersJun 23 2025
Chest X-ray (CXR) is the most frequently ordered imaging test, supporting diverse clinical tasks from thoracic disease detection to postoperative monitoring. However, task-specific classification models are limited in scope, require costly labeled data, and lack generalizability to out-of-distribution datasets. To address these challenges, we introduce CheXFound, a self-supervised vision foundation model that learns robust CXR representations and generalizes effectively across a wide range of downstream tasks. We pretrained CheXFound on a curated CXR-987K dataset, comprising approximately 987K unique CXRs from 12 publicly available sources. We propose a Global and Local Representations Integration (GLoRI) head for downstream adaptation, which incorporates fine- and coarse-grained disease-specific local features with global image features for enhanced performance in multilabel classification. Our experimental results showed that CheXFound outperformed state-of-the-art models in classifying 40 disease findings across different prevalence levels on the CXR-LT 24 dataset and exhibited superior label efficiency on downstream tasks with limited training data. Additionally, CheXFound achieved significant improvements on downstream tasks with out-of-distribution datasets, including opportunistic cardiovascular disease risk estimation, mortality prediction, malpositioned tube detection, and anatomical structure segmentation. These results demonstrate CheXFound's strong generalization capabilities, which will enable diverse downstream adaptations with improved label efficiency in future applications. The project source code is publicly available at https://github.com/RPIDIAL/CheXFound.
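
The GLoRI head is described only at a high level; the sketch below shows one plausible way to combine disease-specific local features (learnable queries attending over patch tokens) with a global image feature for multilabel classification. The layer sizes, number of heads, and fusion by concatenation are assumptions, not the paper's specification.

```python
# A sketch of a head fusing global and local representations for multilabel
# classification, in the spirit of the GLoRI head described above.
import torch
import torch.nn as nn

class GlobalLocalHead(nn.Module):
    def __init__(self, embed_dim, num_findings):
        super().__init__()
        # one learnable query per disease finding attends over patch tokens
        self.queries = nn.Parameter(torch.randn(num_findings, embed_dim) * 0.02)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(2 * embed_dim, 1)  # global + local per finding

    def forward(self, patch_tokens, global_token):
        # patch_tokens: (B, N, D) from the frozen foundation backbone
        # global_token: (B, D) pooled/CLS image representation
        B = patch_tokens.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)       # (B, F, D)
        local, _ = self.attn(q, patch_tokens, patch_tokens)   # (B, F, D)
        glob = global_token.unsqueeze(1).expand_as(local)     # (B, F, D)
        logits = self.classifier(torch.cat([local, glob], dim=-1)).squeeze(-1)
        return logits  # (B, F) multilabel logits, for use with BCEWithLogitsLoss
```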

GPT-4o and Specialized AI in Breast Ultrasound Imaging: A Comparative Study on Accuracy, Agreement, Limitations, and Diagnostic Potential.

Sanli DET, Sanli AN, Buyukdereli Atadag Y, Kurt A, Esmerer E

pubmed logopapersJun 23 2025
This study aimed to evaluate the ability of ChatGPT and Breast Ultrasound Helper, a special ChatGPT-based subprogram trained on ultrasound image analysis, to analyze and differentiate benign and malignant breast lesions on ultrasound images. Ultrasound images of histopathologically confirmed breast cancer and fibroadenoma patients were read by GPT-4o (the latest ChatGPT version) and Breast Ultrasound Helper (BUH), a tool from the "Explore" section of ChatGPT. Both were prompted in English using ACR BI-RADS Breast Ultrasound Lexicon criteria: lesion shape, orientation, margin, internal echo pattern, echogenicity, posterior acoustic features, microcalcifications or hyperechoic foci, perilesional hyperechoic rim, edema or architectural distortion, lesion size, and BI-RADS category. Two experienced radiologists evaluated the images and the responses of the programs in consensus. The outputs, BI-RADS category agreement, and benign/malignant discrimination were statistically compared. A total of 232 ultrasound images were analyzed, of which 133 (57.3%) were malignant and 99 (42.7%) benign. In the comparative analysis, BUH showed superior performance overall, with higher kappa values and statistically significant results across multiple features (P < .001). However, the overall level of agreement with the radiologists' consensus for all features was similar for BUH (κ: 0.387-0.755) and GPT-4o (κ: 0.317-0.803). On the other hand, BI-RADS category agreement was slightly higher for GPT-4o than for BUH (69.4% versus 65.9%), but BUH was slightly more successful in distinguishing benign from malignant lesions (65.9% versus 67.7%). Although both AI tools show moderate-to-good performance in ultrasound image analysis, their limited agreement with radiologists' evaluations and BI-RADS categorization suggests that their clinical application in breast ultrasound interpretation remains premature and unreliable.
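
For context, the snippet below shows how a BI-RADS-lexicon prompt and an ultrasound image can be sent to GPT-4o through the OpenAI Python SDK. The prompt wording, image encoding, and response handling are illustrative; the study's exact prompts and the Breast Ultrasound Helper tool are not reproduced here.

```python
# Illustrative multimodal prompt to GPT-4o via the OpenAI API (not the study's code).
import base64
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

PROMPT = (
    "Describe this breast ultrasound lesion using the ACR BI-RADS lexicon: "
    "shape, orientation, margin, internal echo pattern, echogenicity, posterior "
    "acoustic features, microcalcifications or hyperechoic foci, perilesional rim, "
    "edema or architectural distortion, lesion size, and a final BI-RADS category."
)

def describe_lesion(image_path: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content
```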

Deep learning-quantified body composition from positron emission tomography/computed tomography and cardiovascular outcomes: a multicentre study.

Miller RJH, Yi J, Shanbhag A, Marcinkiewicz A, Patel KK, Lemley M, Ramirez G, Geers J, Chareonthaitawee P, Wopperer S, Berman DS, Di Carli M, Dey D, Slomka PJ

pubmed logopapersJun 23 2025
Positron emission tomography (PET)/computed tomography (CT) myocardial perfusion imaging (MPI) is a vital diagnostic tool, especially in patients with cardiometabolic syndrome. Low-dose CT scans are routinely performed with PET for attenuation correction and potentially contain valuable data about body tissue composition. Deep learning and image processing were combined to automatically quantify skeletal muscle (SM), bone and adipose tissue from these scans and then evaluate their associations with death or myocardial infarction (MI). In PET MPI from three sites, deep learning quantified SM, bone, epicardial adipose tissue (EAT), subcutaneous adipose tissue (SAT), visceral adipose tissue (VAT), and intermuscular adipose tissue (IMAT). Sex-specific thresholds for abnormal values were established. Associations with death or MI were evaluated using unadjusted and multivariable models adjusted for clinical and imaging factors. This study included 10 085 patients, with median age 68 (interquartile range 59-76) and 5767 (57%) male. Body tissue segmentations were completed in 102 ± 4 s. Higher VAT density was associated with an increased risk of death or MI in both unadjusted [hazard ratio (HR) 1.40, 95% confidence interval (CI) 1.37-1.43] and adjusted (HR 1.24, 95% CI 1.19-1.28) analyses, with similar findings for IMAT, SAT, and EAT. Patients with elevated VAT density and reduced myocardial flow reserve had a significantly increased risk of death or MI (adjusted HR 2.49, 95% CI 2.23-2.77). Volumetric body tissue composition can be obtained rapidly and automatically from standard cardiac PET/CT. This new information provides a detailed, quantitative assessment of sarcopenia and cardiometabolic health for physicians.
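
The hazard ratios quoted above come from survival modelling; the sketch below shows how such estimates can be produced with a Cox proportional hazards model in lifelines. The column names and covariate set are placeholders, not the study's actual adjustment variables.

```python
# A sketch of estimating hazard ratios (e.g. VAT density vs. death or MI) with a
# Cox proportional hazards model; column names are illustrative placeholders.
import pandas as pd
from lifelines import CoxPHFitter

def fit_cox(df: pd.DataFrame) -> CoxPHFitter:
    """df needs a follow-up time column, an event indicator (death or MI),
    and standardized body-composition / clinical covariates."""
    covariates = ["vat_density", "imat_density", "sat_density", "eat_density",
                  "age", "sex", "myocardial_flow_reserve"]
    cph = CoxPHFitter()
    cph.fit(df[["followup_years", "death_or_mi"] + covariates],
            duration_col="followup_years", event_col="death_or_mi")
    return cph

# Example usage (assuming a cohort DataFrame with the columns above):
# cph = fit_cox(cohort)
# print(cph.summary[["exp(coef)", "exp(coef) lower 95%", "exp(coef) upper 95%"]])
```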

MRI Radiomics and Automated Habitat Analysis Enhance Machine Learning Prediction of Bone Metastasis and High-Grade Gleason Scores in Prostate Cancer.

Yang Y, Zheng B, Zou B, Liu R, Yang R, Chen Q, Guo Y, Yu S, Chen B

pubmed logopapersJun 23 2025
To explore the value of machine learning models based on MRI radiomics and automated habitat analysis in predicting bone metastasis and high-grade pathological Gleason scores in prostate cancer. This retrospective study enrolled 214 patients with pathologically diagnosed prostate cancer from May 2013 to January 2025, including 93 cases with bone metastasis and 159 cases with high-grade Gleason scores. Clinical, pathological, and MRI data were collected. An nnUNet model automatically segmented the prostate in MRI scans, and K-means clustering identified habitat subregions within the whole prostate in T2-FS images. Senior radiologists manually segmented regions of interest (ROIs) in prostate lesions. Radiomics features were extracted from the habitat subregions and lesion ROIs, and these features, combined with clinical features, were used to build multiple machine learning classifiers to predict bone metastasis and high-grade Gleason scores. Finally, the models underwent interpretability analysis based on feature importance. The nnUNet model achieved a mean Dice coefficient of 0.970 for segmentation. Habitat analysis using 2 clusters yielded the highest average silhouette coefficient (0.57). Machine learning models based on a combination of lesion radiomics, habitat radiomics, and clinical features achieved the best performance in both prediction tasks. The Extra Trees Classifier achieved the highest AUC (0.900) for predicting bone metastasis, while the CatBoost Classifier performed best (AUC 0.895) for predicting high-grade Gleason scores. The interpretability analysis of the optimal models showed that the PSA clinical feature was crucial for predictions, while both habitat radiomics and lesion radiomics also played important roles. The study proposed an automated prostate habitat analysis for prostate cancer, enabling a comprehensive analysis of tumor heterogeneity. The machine learning models developed achieved excellent performance in predicting the risk of bone metastasis and high-grade Gleason scores in prostate cancer. This approach overcomes the limitations of manual feature extraction and the inadequate analysis of heterogeneity often encountered in traditional radiomics, thereby improving model performance.
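
The habitat workflow described above can be approximated as follows: K-means clustering of in-mask voxel intensities into subregions, followed by a tree-ensemble classifier over combined radiomics and clinical features. Radiomics extraction itself (e.g. with pyradiomics) is omitted, and the function names and parameters are illustrative rather than the authors' pipeline.

```python
# A sketch of habitat subregion clustering and downstream classification.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

def habitat_labels(t2_volume: np.ndarray, prostate_mask: np.ndarray, k: int = 2):
    """Cluster in-mask voxel intensities into k habitat subregions
    (k=2 follows the silhouette result quoted in the abstract)."""
    voxels = t2_volume[prostate_mask > 0].reshape(-1, 1)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(voxels)
    labels = np.zeros_like(prostate_mask, dtype=np.int16)
    labels[prostate_mask > 0] = km.labels_ + 1   # 0 stays background
    return labels

def evaluate_classifier(features: np.ndarray, y: np.ndarray) -> float:
    """features = lesion radiomics + habitat radiomics + clinical (e.g. PSA)."""
    clf = ExtraTreesClassifier(n_estimators=500, random_state=0)
    return cross_val_score(clf, features, y, cv=5, scoring="roc_auc").mean()
```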

MOSCARD -- Causal Reasoning and De-confounding for Multimodal Opportunistic Screening of Cardiovascular Adverse Events

Jialu Pi, Juan Maria Farina, Rimita Lahiri, Jiwoong Jeong, Archana Gurudu, Hyung-Bok Park, Chieh-Ju Chao, Chadi Ayoub, Reza Arsanjani, Imon Banerjee

arxiv logopreprintJun 23 2025
Major Adverse Cardiovascular Events (MACE) remain the leading cause of mortality globally, as reported in the Global Disease Burden Study 2021. Opportunistic screening leverages data collected during routine health check-ups, and multimodal data can play a key role in identifying at-risk individuals. Chest X-rays (CXR) provide insights into chronic conditions contributing to MACE, while the 12-lead electrocardiogram (ECG) directly assesses cardiac electrical activity and structural abnormalities. Integrating CXR and ECG could offer a more comprehensive risk assessment than conventional models, which rely on clinical scores, computed tomography (CT) measurements, or biomarkers and may be limited by sampling bias and single-modality constraints. We propose MOSCARD, a novel predictive modeling framework that couples multimodal causal reasoning with co-attention to align the two modalities and simultaneously mitigate bias and confounders in opportunistic risk estimation. The primary technical contributions are (i) multimodal alignment of CXR with ECG guidance; (ii) integration of causal reasoning; and (iii) a dual back-propagation graph for de-confounding. Evaluated on internal data, emergency department (ED) shift data, and the external MIMIC dataset, our model outperformed single-modality and state-of-the-art foundation models (AUC: 0.75, 0.83, and 0.71, respectively). The proposed cost-effective opportunistic screening enables early intervention, improving patient outcomes and reducing disparities.
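
The co-attention alignment mentioned in the abstract can be sketched as symmetric cross-attention between CXR patch tokens and ECG tokens; the module below is an illustration of that idea only, with assumed dimensions, and it omits the paper's causal-reasoning and de-confounding branches.

```python
# A minimal co-attention fusion sketch for CXR and ECG embeddings.
import torch
import torch.nn as nn

class CoAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.cxr_to_ecg = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ecg_to_cxr = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(2 * dim, 1)   # MACE risk logit

    def forward(self, cxr_tokens, ecg_tokens):
        # cxr_tokens: (B, Nc, D) image patch embeddings; ecg_tokens: (B, Ne, D)
        cxr_att, _ = self.cxr_to_ecg(cxr_tokens, ecg_tokens, ecg_tokens)
        ecg_att, _ = self.ecg_to_cxr(ecg_tokens, cxr_tokens, cxr_tokens)
        fused = torch.cat([cxr_att.mean(dim=1), ecg_att.mean(dim=1)], dim=-1)
        return self.head(fused).squeeze(-1)   # (B,) use with BCEWithLogitsLoss
```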

Comparative Analysis of Multimodal Large Language Models GPT-4o and o1 vs Clinicians in Clinical Case Challenge Questions

Jung, J., Kim, H., Bae, S., Park, J. Y.

medrxiv logopreprintJun 23 2025
Background: Generative Pre-trained Transformer 4 (GPT-4) has demonstrated strong performance in standardized medical examinations but has limitations in real-world clinical settings. The newly released multimodal GPT-4o model, which integrates text and image inputs to enhance diagnostic capabilities, and the multimodal o1 model, which incorporates advanced reasoning, may address these limitations. Objective: This study aimed to compare the performance of GPT-4o and o1 against clinicians in real-world clinical case challenges. Methods: This retrospective, cross-sectional study used Medscape case challenge questions from May 2011 to June 2024 (n = 1,426). Each case included text and images of patient history, physical examination findings, diagnostic test results, and imaging studies. Clinicians were required to choose one answer from among multiple options, with the most frequent response defined as the clinicians' decision. Data-based decisions were made using GPT models (3.5 Turbo, 4 Turbo, 4 Omni, and o1) to interpret the text and images, followed by a process to provide a formatted answer. We compared the performances of the clinicians and GPT models using mixed-effects logistic regression analysis. Results: Of the 1,426 questions, clinicians achieved an overall accuracy of 85.0%, whereas GPT-4o and o1 demonstrated higher accuracies of 88.4% and 94.3% (mean difference 3.4%; P = .005 and mean difference 9.3%; P < .001), respectively. In the multimodal performance analysis, which included cases involving images (n = 917), GPT-4o achieved an accuracy of 88.3%, and o1 achieved 93.9%, both significantly outperforming clinicians (mean difference 4.2%; P = .005 and mean difference 9.8%; P < .001). o1 showed the highest accuracy across all question categories, achieving 92.6% in diagnosis (mean difference 14.5%; P < .001), 97.0% in disease characteristics (mean difference 7.2%; P < .001), 92.6% in examination (mean difference 7.3%; P = .002), and 94.8% in treatment (mean difference 4.3%; P = .005), consistently outperforming clinicians. In terms of medical specialty, o1 achieved 93.6% accuracy in internal medicine (mean difference 10.3%; P < .001), 96.6% in major surgery (mean difference 9.2%; P = .030), 97.3% in psychiatry (mean difference 10.6%; P = .030), and 95.4% in minor specialties (mean difference 10.0%; P < .001), significantly surpassing clinicians. Across five trials, GPT-4o and o1 provided the correct answer 5/5 times in 86.2% and 90.7% of the cases, respectively. Conclusions: The GPT-4o and o1 models achieved higher accuracy than clinicians in clinical case challenge questions, particularly in disease diagnosis, and could serve as valuable tools to assist healthcare professionals in clinical settings.
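
The comparison described above hinges on a mixed-effects logistic regression; the sketch below shows one way such a model could be specified in Python, treating each case as a random effect. The formula, column names, and the use of statsmodels' variational-Bayes mixed GLM are assumptions about implementation, not details from the paper.

```python
# A sketch of a mixed-effects logistic regression comparing answer correctness
# between responder types, with the case as a random intercept.
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

def compare_accuracy(df: pd.DataFrame):
    """df rows: one answer per (case, responder) with columns
    'correct' (0/1), 'responder' ('clinician'/'gpt4o'/'o1'), 'case_id'."""
    model = BinomialBayesMixedGLM.from_formula(
        "correct ~ C(responder, Treatment('clinician'))",  # fixed effect of responder
        {"case": "0 + C(case_id)"},                        # random intercept per case
        df,
    )
    result = model.fit_vb()
    return result.summary()
```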