
Performance analysis of large language models in multi-disease detection from chest computed tomography reports: a comparative study: Experimental Research.

Luo P, Fan C, Li A, Jiang T, Jiang A, Qi C, Gan W, Zhu L, Mou W, Zeng D, Tang B, Xiao M, Chu G, Liang Z, Shen J, Liu Z, Wei T, Cheng Q, Lin A, Chen X

pubmed · Jun 5, 2025
Computed Tomography (CT) is widely acknowledged as the gold standard for diagnosing thoracic diseases. However, the accuracy of interpretation significantly depends on radiologists' expertise. Large Language Models (LLMs) have shown considerable promise in various medical applications, particularly in radiology. This study aims to assess the performance of leading LLMs in analyzing unstructured chest CT reports and to examine how different questioning methodologies and fine-tuning strategies influence their effectiveness in enhancing chest CT diagnosis. This retrospective analysis evaluated 13,489 chest CT reports encompassing 13 common thoracic conditions across pulmonary, cardiovascular, pleural, and upper abdominal systems. Five LLMs (Claude-3.5-Sonnet, GPT-4, GPT-3.5-Turbo, Gemini-Pro, Qwen-Max) were assessed using dual questioning methodologies: multiple-choice and open-ended. Radiologist-curated datasets underwent rigorous preprocessing, including RadLex terminology standardization, multi-step diagnostic validation, and exclusion of ambiguous cases. Model performance was quantified via Subjective Answer Accuracy Rate (SAAR), Reference Answer Accuracy Rate (RAAR), and Area Under the Receiver Operating Characteristic (ROC) Curve analysis. GPT-3.5-Turbo underwent fine-tuning (100 iterations with one training epoch) on 200 high-performing cases to enhance diagnostic precision for initially misclassified conditions. GPT-4 demonstrated superior performance with the highest RAAR of 75.1% in multiple-choice questioning, followed by Qwen-Max (66.0%) and Claude-3.5 (63.5%), significantly outperforming GPT-3.5-Turbo (41.8%) and Gemini-Pro (40.8%) across the entire patient cohort. Multiple-choice questioning consistently improved both RAAR and SAAR for all models compared to open-ended questioning, with RAAR consistently surpassing SAAR. Model performance demonstrated notable variations across different diseases and organ conditions. Notably, fine-tuning substantially enhanced the performance of GPT-3.5-Turbo, which initially exhibited suboptimal results in most scenarios. This study demonstrated that general-purpose LLMs can effectively interpret chest CT reports, with performance varying significantly across models depending on the questioning methodology and fine-tuning approaches employed. For surgical practice, these findings provided evidence-based guidance for selecting appropriate AI tools to enhance preoperative planning, particularly for thoracic procedures. The integration of optimized LLMs into surgical workflows may improve decision-making efficiency, risk stratification, and diagnostic speed, potentially contributing to better surgical outcomes through more accurate preoperative assessment.
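A minimal sketch of the dual questioning evaluation described in this abstract, assuming a hypothetical query_llm stand-in for the actual model APIs; the report text, answer options, and scoring below are illustrative rather than the study's protocol.

```python
# Minimal sketch of multiple-choice vs. open-ended questioning of an LLM on CT reports.
# `query_llm` is a hypothetical stand-in for an API call to GPT-4, Qwen-Max, etc.

def query_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call; returns a canned answer here."""
    return "pleural effusion"

def build_prompt(report: str, condition_options=None) -> str:
    if condition_options:  # multiple-choice questioning
        opts = "; ".join(condition_options)
        return f"Report: {report}\nWhich of the following conditions are present? Options: {opts}"
    # open-ended questioning
    return f"Report: {report}\nList the thoracic conditions described in this report."

def reference_accuracy_rate(cases, condition_options=None) -> float:
    """Fraction of cases whose answer matches the radiologist reference (RAAR-style)."""
    correct = 0
    for report, reference in cases:
        answer = query_llm(build_prompt(report, condition_options)).lower()
        correct += reference.lower() in answer
    return correct / len(cases)

cases = [("Blunting of the left costophrenic angle with fluid layering.", "pleural effusion")]
options = ["pleural effusion", "pulmonary embolism", "aortic aneurysm", "no abnormality"]
print("multiple-choice RAAR:", reference_accuracy_rate(cases, options))
print("open-ended RAAR:", reference_accuracy_rate(cases))
```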

DWI and Clinical Characteristics Correlations in Acute Ischemic Stroke After Thrombolysis

Li, J., Huang, C., Liu, Y., Li, Y., Zhang, J., Xiao, M., Yan, Z., Zhao, H., Zeng, X., Mu, J.

medRxiv preprint · Jun 5, 2025
Objective: Magnetic resonance diffusion-weighted imaging (DWI) is a crucial tool for diagnosing acute ischemic stroke, yet some patients present as DWI-negative. This study aims to analyze the imaging differences and associated clinical characteristics in acute ischemic stroke patients receiving intravenous thrombolysis, in order to enhance understanding of DWI-negative strokes. Methods: We retrospectively collected clinical data from acute ischemic stroke patients who received intravenous thrombolysis at the Stroke Center of the First Affiliated Hospital of Chongqing Medical University from January 2017 to June 2023, categorized into DWI-positive and DWI-negative groups. Descriptive statistics, univariate analysis, binary logistic regression, and machine learning models were used to assess the predictive value of clinical features. Additionally, telephone follow-up was conducted for DWI-negative patients to record medication compliance, stroke recurrence, and mortality, with a Fine-Gray competing risk model used to analyze risk factors for recurrence. Results: The incidence rate of DWI-negative ischemic stroke was 22.74%. Factors positively associated with DWI-positive cases included onset-to-needle time (ONT), onset-to-first-MRI time (OMT), NIHSS score at 1 week of hospitalization (NIHSS-1w), hyperlipidemia (HLP), and atrial fibrillation (AF) (p<0.05, OR>1). Conversely, recurrent ischemic stroke (RIS) and platelet count (PLT) were negatively correlated with DWI-positive cases (p<0.05, OR<1). Trial of Org 10172 in Acute Stroke Treatment (TOAST) classification significantly influenced DWI presentation (p=0.01), but the specific impact of etiological subtypes remains unclear. Machine learning models suggest that the features with the highest predictive value, in descending order, are AF, HLP, OMT, ONT, the NIHSS difference within 24 hours post-thrombolysis (NIHSS-d(0-24h)PT), and RIS. Conclusions: NIHSS-1w, OMT, ONT, HLP, and AF can predict DWI-positive findings, while platelet count and RIS are associated with DWI-negative cases. AF and HLP demonstrate the highest predictive value. DWI-negative patients have a higher short-term risk of stroke recurrence than of mortality, with a potential correlation between TOAST classification and recurrence risk.
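As an illustration of the binary logistic regression step described above, the following sketch fits a model on synthetic data and reports odds ratios; the feature names mirror the abstract (AF, HLP, ONT, OMT, NIHSS-1w, PLT, RIS), but the values are random and carry no clinical meaning.

```python
# Odds-ratio sketch on synthetic data; numbers produced here are meaningless.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = ["AF", "HLP", "ONT", "OMT", "NIHSS-1w", "PLT", "RIS"]
X = rng.normal(size=(300, len(features)))   # synthetic predictors
y = rng.integers(0, 2, size=300)            # 1 = DWI-positive, 0 = DWI-negative

model = LogisticRegression(max_iter=1000).fit(X, y)

# Odds ratios: OR > 1 suggests a positive association with DWI-positive cases.
for name, coef in zip(features, model.coef_[0]):
    print(f"{name:9s} OR = {np.exp(coef):.2f}")
```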

A ViTUNeT-based model using YOLOv8 for efficient LVNC diagnosis and automatic cleaning of dataset.

de Haro S, Bernabé G, García JM, González-Férez P

pubmed · Jun 4, 2025
Left ventricular non-compaction is a cardiac condition marked by excessive trabeculae in the left ventricle's inner wall. Although various methods exist to measure these structures, the medical community still lacks consensus on the best approach. Previously, we developed DL-LVTQ, a tool based on a U-Net neural network, to quantify trabeculae in this region. In this study, we expand the dataset to include new patients with Titin cardiomyopathy and healthy individuals with fewer trabeculae, requiring retraining of our models to enhance predictions. We also propose ViTUNeT, a neural network architecture combining U-Net and Vision Transformers to segment the left ventricle more accurately. Additionally, we train a YOLOv8 model to detect the ventricle and integrate it with the ViTUNeT model to focus on the region of interest. Results from ViTUNeT and YOLOv8 are similar to those of DL-LVTQ, suggesting that dataset quality limits further accuracy improvements. To test this, we analyze the MRI images and develop a method using two YOLOv8 models to identify and remove problematic images, leading to better results. Combining YOLOv8 with deep learning networks offers a promising approach for improving cardiac image analysis and segmentation.
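A rough sketch of the detection-then-segmentation idea described above, assuming an ultralytics YOLOv8 detector with hypothetical weights (lv_detector.pt) and a placeholder function standing in for the authors' ViTUNeT segmentation network.

```python
# Two-stage pipeline sketch: YOLOv8 localizes the left ventricle, the crop is segmented.
# `segmentation_model` is a placeholder for ViTUNeT; the weights file is an assumption.
import numpy as np
from ultralytics import YOLO

detector = YOLO("lv_detector.pt")   # assumed: YOLOv8 weights trained to find the LV

def segmentation_model(crop: np.ndarray) -> np.ndarray:
    """Stand-in for ViTUNeT: returns a dummy binary mask the size of the crop."""
    return np.zeros(crop.shape[:2], dtype=np.uint8)

def segment_left_ventricle(image: np.ndarray):
    results = detector(image)[0]
    if len(results.boxes) == 0:     # no ventricle detected: flag image for cleaning
        return None
    x1, y1, x2, y2 = map(int, results.boxes.xyxy[0].tolist())
    crop = image[y1:y2, x1:x2]
    return segmentation_model(crop) # segment only the region of interest
```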

Best Practices and Checklist for Reviewing Artificial Intelligence-Based Medical Imaging Papers: Classification.

Kline TL, Kitamura F, Warren D, Pan I, Korchi AM, Tenenholtz N, Moy L, Gichoya JW, Santos I, Moradi K, Avval AH, Alkhulaifat D, Blumer SL, Hwang MY, Git KA, Shroff A, Stember J, Walach E, Shih G, Langer SG

pubmed · Jun 4, 2025
Recent advances in Artificial Intelligence (AI) methodologies and their application to medical imaging have led to an explosion of related research programs utilizing AI to produce state-of-the-art classification performance. Ideally, research culminates in dissemination of the findings in peer-reviewed journals. To date, acceptance or rejection criteria are often subjective; however, reproducible science requires reproducible review. The Machine Learning Education Sub-Committee of the Society for Imaging Informatics in Medicine (SIIM) has identified a knowledge gap and a need to establish guidelines for reviewing these studies. The present work, written from the machine learning practitioner standpoint, follows a similar approach to our previous paper on segmentation. In this series, the committee will address best practices to follow in AI-based studies and present the required sections with examples and discussion of the requirements that make such studies cohesive, reproducible, accurate, and self-contained. This entry in the series focuses on image classification. Elements such as dataset curation, data pre-processing steps, reference standard identification, data partitioning, model architecture, and training are discussed. Sections are presented as in a typical manuscript. The content describes the information necessary to ensure the study is of sufficient quality for publication consideration and, compared with other checklists, provides a focused approach with application to image classification tasks. The goal of this series is to provide resources that not only help improve the review process for AI-based medical imaging papers, but also facilitate a standard for the information that should be presented within all components of the research study.

A review on learning-based algorithms for tractography and human brain white matter tracts recognition.

Barati Shoorche A, Farnia P, Makkiabadi B, Leemans A

pubmed · Jun 4, 2025
Human brain fiber tractography using diffusion magnetic resonance imaging is a crucial stage in mapping brain white matter structures, pre-surgical planning, and extracting connectivity patterns. Accurate and reliable tractography, by providing detailed geometric information about the position of neural pathways, minimizes the risk of damage during neurosurgical procedures. Both tractography itself and its post-processing steps, such as bundle segmentation, are usually used in these contexts. Many approaches have been put forward in the past decades, and recently, multiple data-driven tractography algorithms and automatic segmentation pipelines have been proposed to address the limitations of traditional methods. Several of these recent methods are based on learning algorithms that have demonstrated promising results. In this study, in addition to introducing diffusion MRI datasets, we review learning-based algorithms such as conventional machine learning, deep learning, reinforcement learning, and dictionary learning methods that have been used for white matter tract, nerve, and pathway recognition, as well as for whole-brain streamline or whole-brain tractogram creation. The contributions of this review are to discuss both tractography and tract recognition methods, to extend previous related reviews with the most recent methods, covering architectures as well as network details, to assess the efficiency of learning-based methods through a comprehensive comparison in this field, and finally to demonstrate the important role of learning-based methods in tractography.

Retrieval-Augmented Generation with Large Language Models in Radiology: From Theory to Practice.

Fink A, Rau A, Reisert M, Bamberg F, Russe MF

pubmed · Jun 4, 2025
<i>"Just Accepted" papers have undergone full peer review and have been accepted for publication in <i>Radiology: Artificial Intelligence</i>. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content.</i> Large language models (LLMs) hold substantial promise in addressing the growing workload in radiology, but recent studies also reveal limitations, such as hallucinations and opacity in sources for LLM responses. Retrieval-augmented Generation (RAG) based LLMs offer a promising approach to streamline radiology workflows by integrating reliable, verifiable, and customizable information. Ongoing refinement is critical to enable RAG models to manage large amounts of input data and to engage in complex multiagent dialogues. This report provides an overview of recent advances in LLM architecture, including few-shot and zero-shot learning, RAG integration, multistep reasoning, and agentic RAG, and identifies future research directions. Exemplary cases demonstrate the practical application of these techniques in radiology practice. ©RSNA, 2025.

Predicting clinical outcomes using 18F-FDG PET/CT-based radiomic features and machine learning algorithms in patients with esophageal cancer.

Mutevelizade G, Aydin N, Duran Can O, Teke O, Suner AF, Erdugan M, Sayit E

pubmed · Jun 4, 2025
This study evaluated the relationship between 18F-fluorodeoxyglucose PET/computed tomography (18F-FDG PET/CT) radiomic features and clinical parameters, including tumor localization, histopathological subtype, lymph node metastasis, mortality, and treatment response, in esophageal cancer (EC) patients undergoing chemoradiotherapy, as well as the predictive performance of various machine learning (ML) models. In this retrospective study, 39 patients with EC who underwent pretreatment 18F-FDG PET/CT and received concurrent chemoradiotherapy were analyzed. Texture features were extracted using LIFEx software. Logistic regression, Naive Bayes, random forest, extreme gradient boosting (XGB), and support vector machine classifiers were applied to predict clinical outcomes. Cox regression and Kaplan-Meier analyses were used to evaluate overall survival (OS), and the accuracy of ML algorithms was quantified using the area under the receiver operating characteristic curve. Radiomic features showed significant associations with several clinical parameters. Lymph node metastasis, tumor localization, and treatment response emerged as predictors of OS. Among the ML models, XGB demonstrated the most consistent and highest predictive performance across clinical outcomes. Radiomic features extracted from 18F-FDG PET/CT, when combined with ML approaches, may aid in predicting treatment response and clinical outcomes in EC. Radiomic features demonstrated value in assessing tumor heterogeneity; however, clinical parameters retained a stronger prognostic value for OS.
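A sketch of the classifier comparison described above, run on synthetic features in place of the study's PET/CT radiomics; the resulting AUC values are meaningless, and only the shape of the workflow is of interest (xgboost is assumed to be installed).

```python
# Compare several classifiers by ROC AUC on synthetic "radiomic" features.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

X, y = make_classification(n_samples=200, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "LogReg": LogisticRegression(max_iter=1000),
    "NaiveBayes": GaussianNB(),
    "RandomForest": RandomForestClassifier(random_state=0),
    "XGB": XGBClassifier(eval_metric="logloss"),
    "SVM": SVC(probability=True),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name:12s} AUC = {auc:.3f}")
```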

Machine Learning to Automatically Differentiate Hypertrophic Cardiomyopathy, Cardiac Light Chain, and Cardiac Transthyretin Amyloidosis: A Multicenter CMR Study.

Weberling LD, Ochs A, Benovoy M, Aus dem Siepen F, Salatzki J, Giannitsis E, Duan C, Maresca K, Zhang Y, Möller J, Friedrich S, Schönland S, Meder B, Friedrich MG, Frey N, André F

pubmed · Jun 4, 2025
Cardiac amyloidosis is associated with poor outcomes and is caused by the interstitial deposition of misfolded proteins, typically ATTR (transthyretin) or AL (light chains). Although specific therapies during early disease stages exist, the diagnosis is often only established at an advanced stage. Cardiovascular magnetic resonance (CMR) is the gold standard for imaging suspected myocardial disease. However, differentiating cardiac amyloidosis from hypertrophic cardiomyopathy may be challenging, and a reliable method for an image-based classification of amyloidosis subtypes is lacking. This study sought to investigate a CMR machine learning (ML) algorithm to identify and distinguish cardiac amyloidosis. This retrospective, multicenter, multivendor feasibility study included consecutive patients diagnosed with hypertrophic cardiomyopathy or AL/ATTR amyloidosis and healthy volunteers. Standard clinical information, semiautomated CMR imaging data, and qualitative CMR features were integrated into a trained ML algorithm. Four hundred participants (95 healthy, 94 hypertrophic cardiomyopathy, 95 AL, and 116 ATTR) from 56 institutions were included (269 men aged 58.5 [48.4-69.4] years). A 3-stage ML screening cascade sequentially differentiated healthy volunteers from patients, then hypertrophic cardiomyopathy from amyloidosis, and then AL from ATTR. The ML algorithm resulted in an accurate differentiation at each step (area under the curve, 1.0, 0.99, and 0.92, respectively). After reducing included data to demographics and imaging data alone, the performance remained excellent (area under the curve, 0.99, 0.98, and 0.88, respectively), even after removing late gadolinium enhancement imaging data from the model (area under the curve, 1.0, 0.95, 0.86, respectively). A trained ML model using semiautomated CMR imaging data and patient demographics can accurately identify cardiac amyloidosis and differentiate subtypes.
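A schematic of the three-stage screening cascade described above; the stage functions are stand-ins for the trained classifiers, and the feature names and thresholds are invented for illustration with no diagnostic validity.

```python
# Three-stage cascade: healthy vs. patient, then HCM vs. amyloidosis, then AL vs. ATTR.
# Feature names and thresholds below are made up; real models would replace each stage.
def stage1_patient(f):      # healthy volunteer vs. patient
    return f["max_wall_thickness_mm"] > 12

def stage2_amyloidosis(f):  # hypertrophic cardiomyopathy vs. amyloidosis
    return f["lge_pattern_diffuse"]

def stage3_attr(f):         # AL vs. ATTR amyloidosis
    return f["age_years"] > 70

def classify(f):
    if not stage1_patient(f):
        return "healthy"
    if not stage2_amyloidosis(f):
        return "hypertrophic cardiomyopathy"
    return "ATTR amyloidosis" if stage3_attr(f) else "AL amyloidosis"

print(classify({"max_wall_thickness_mm": 16, "lge_pattern_diffuse": True, "age_years": 78}))
```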

Long-Term Prognostic Implications of Thoracic Aortic Calcification on CT Using Artificial Intelligence-Based Quantification in a Screening Population: A Two-Center Study.

Lee JE, Kim NY, Kim YH, Kwon Y, Kim S, Han K, Suh YJ

pubmed · Jun 4, 2025
BACKGROUND. The importance of including thoracic aortic calcification (TAC), in addition to coronary artery calcification (CAC), in prognostic assessments has been difficult to determine, partly due to the greater challenge of performing standardized TAC assessments. OBJECTIVE. The purpose of this study was to evaluate the long-term prognostic implications of TAC assessed using artificial intelligence (AI)-based quantification on routine chest CT in a screening population. METHODS. This retrospective study included 7404 asymptomatic individuals (median age, 53.9 years; 5875 men, 1529 women) who underwent nongated noncontrast chest CT as part of a national general health screening program at one of two centers from January 2007 to December 2014. A commercial AI program quantified TAC and CAC using Agatston scores, which were stratified into categories. Radiologists manually quantified TAC and CAC in 2567 examinations. The role of AI-based TAC categories in predicting major adverse cardiovascular events (MACE) and all-cause mortality (ACM), independent of AI-based CAC categories as well as clinical and laboratory variables, was assessed by multivariable Cox proportional hazards models using data from both centers, and by concordance statistics from prognostic models developed and tested using center 1 and center 2 data, respectively. RESULTS. AI-based and manual quantification showed excellent agreement for TAC and CAC (concordance correlation coefficient: 0.967 and 0.895, respectively). The median observation periods were 7.5 years for MACE (383 events in 5342 individuals) and 11.0 years for ACM (292 events in 7404 individuals). When adjusted for AI-based CAC categories along with clinical and laboratory variables, the risk of MACE was not independently associated with any AI-based TAC category; the risk of ACM was independently associated with an AI-based TAC score of 1001-3000 (HR = 2.14, p = .02) but not with other AI-based TAC categories. When prognostic models were tested, the addition of AI-based TAC categories did not improve model fit relative to models containing clinical variables, laboratory variables, and AI-based CAC categories for MACE (concordance index [C-index] = 0.760 vs 0.760, p = .81) or ACM (C-index = 0.823 vs 0.830, p = .32). CONCLUSION. The addition of TAC to models containing CAC provided limited improvement in risk prediction in an asymptomatic screening population undergoing CT. CLINICAL IMPACT. AI-based quantification provides a standardized approach for better understanding the potential role of TAC as a predictive imaging biomarker.
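A sketch of the incremental-value comparison described above: fit Cox proportional hazards models with and without the TAC variable and compare concordance indices. The data frame is random, so the numbers are meaningless; lifelines is assumed to be installed, and the column names are illustrative.

```python
# Cox models with and without TAC, compared by concordance index (C-index).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "time": rng.exponential(8, n),          # follow-up years
    "event": rng.integers(0, 2, n),         # MACE / death indicator
    "age": rng.normal(54, 8, n),
    "cac_category": rng.integers(0, 4, n),  # AI-based CAC score category
    "tac_category": rng.integers(0, 4, n),  # AI-based TAC score category
})

base = CoxPHFitter().fit(df.drop(columns="tac_category"), "time", "event")
full = CoxPHFitter().fit(df, "time", "event")
print("C-index without TAC:", round(base.concordance_index_, 3))
print("C-index with TAC:   ", round(full.concordance_index_, 3))
```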