Page 31 of 91907 results

Integrating Artificial Intelligence in Thyroid Nodule Management: Clinical Outcomes and Cost-Effectiveness Analysis.

Bodoque-Cubas J, Fernández-Sáez J, Martínez-Hervás S, Pérez-Lacasta MJ, Carles-Lavila M, Pallarés-Gasulla RM, Salazar-González JJ, Gil-Boix JV, Miret-Llauradó M, Aulinas-Masó A, Argüelles-Jiménez I, Tofé-Povedano S

pubmed logopapersJul 12 2025
The increasing incidence of thyroid nodules (TN) raises concerns about overdiagnosis and overtreatment. This study evaluates the clinical and economic impact of KOIOS, an FDA-approved artificial intelligence (AI) tool for the management of TN. A retrospective analysis was conducted on 176 patients who underwent thyroid surgery between May 2022 and November 2024. Ultrasound images were evaluated independently by expert and novice operators using the American College of Radiology Thyroid Imaging Reporting and Data System (ACR-TIRADS), while KOIOS provided AI-adapted risk stratification. Sensitivity, specificity, and receiver operating characteristic (ROC) curve analyses were performed. The incremental cost-effectiveness ratio (ICER) was defined based on the number of optimal care interventions (fine-needle aspiration biopsy [FNAB] and thyroid surgery). Both deterministic and probabilistic sensitivity analyses were conducted to evaluate model robustness. KOIOS demonstrated diagnostic performance similar to the expert operator (AUC: 0.794, 95% CI: 0.718-0.871 vs. 0.784, 95% CI: 0.706-0.861; p = 0.754) and significantly outperformed the novice operator (AUC: 0.619, 95% CI: 0.526-0.711; p < 0.001). ICER analysis estimated the cost per additional optimal care decision at -€8,085.56, indicating that KOIOS is a dominant, cost-saving strategy from a third-party payer perspective over a one-year horizon. Deterministic sensitivity analysis identified surgical costs as the main driver of variability, while probabilistic analysis consistently favored KOIOS as the optimal strategy. KOIOS is a cost-effective alternative, particularly in reducing overdiagnosis and overtreatment of benign TNs. Prospective, real-life studies are needed to validate these findings and explore long-term implications.
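The ICER reported above is the standard incremental cost-effectiveness ratio: the extra cost divided by the extra effect of the new strategy versus the comparator. A minimal sketch of the computation, using hypothetical numbers rather than the study's data:

```python
def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of
    effect. A negative ICER with higher effect means the new strategy is
    dominant (both cheaper and more effective)."""
    return (cost_new - cost_old) / (effect_new - effect_old)

# Hypothetical numbers for illustration only (not the study's data):
# the AI strategy costs less and yields more optimal-care decisions.
print(icer(cost_new=90_000, cost_old=100_000, effect_new=120, effect_old=110))
# → -1000.0 (negative cost per extra optimal decision → cost-saving, dominant)
```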

Accuracy of large language models in generating differential diagnosis from clinical presentation and imaging findings in pediatric cases.

Jung J, Phillipi M, Tran B, Chen K, Chan N, Ho E, Sun S, Houshyar R

pubmed logopapersJul 12 2025
Large language models (LLMs) have shown promise in assisting medical decision-making. However, there is limited literature exploring the diagnostic accuracy of LLMs in generating differential diagnoses from text-based image descriptions and clinical presentations in pediatric radiology. To examine the performance of multiple proprietary LLMs in producing accurate differential diagnoses for text-based pediatric radiological cases without imaging. One hundred sixty-four cases were retrospectively selected from a pediatric radiology textbook and converted into two formats: (1) image description only, and (2) image description with clinical presentation. The ChatGPT-4V, Claude 3.5 Sonnet, and Gemini 1.5 Pro models were given these inputs and tasked with providing a top 1 diagnosis and a top 3 differential diagnosis. Accuracy of responses was assessed by comparison with the original literature. Top 1 accuracy was defined as whether the top 1 diagnosis matched the textbook, and top 3 differential accuracy as the number of diagnoses in the model-generated top 3 differential that matched any of the top 3 diagnoses in the textbook. McNemar's test, Cochran's Q test, the Friedman test, and the Wilcoxon signed-rank test were used to compare algorithms and to assess the impact of added clinical information. There was no significant difference in top 1 accuracy between ChatGPT-4V, Claude 3.5 Sonnet, and Gemini 1.5 Pro when only image descriptions were provided (56.1% [95% CI 48.4-63.5], 64.6% [95% CI 57.1-71.5], 61.6% [95% CI 54.0-68.7]; P = 0.11). Adding clinical presentation to the image description significantly improved top 1 accuracy for ChatGPT-4V (64.0% [95% CI 56.4-71.0], P = 0.02) and Claude 3.5 Sonnet (80.5% [95% CI 73.8-85.8], P < 0.001). For image description and clinical presentation cases, Claude 3.5 Sonnet significantly outperformed both ChatGPT-4V and Gemini 1.5 Pro (P < 0.001).
For top 3 differential accuracy, no significant differences were observed between ChatGPT-4V, Claude 3.5 Sonnet, and Gemini 1.5 Pro, regardless of whether the cases included only image descriptions (1.29 [95% CI 1.16-1.41], 1.35 [95% CI 1.23-1.48], 1.37 [95% CI 1.25-1.49]; P = 0.60) or both image descriptions and clinical presentations (1.33 [95% CI 1.20-1.45], 1.52 [95% CI 1.41-1.64], 1.48 [95% CI 1.36-1.59]; P = 0.72). Only Claude 3.5 Sonnet performed significantly better when clinical presentation was added (P < 0.001). Commercial LLMs performed similarly on pediatric radiology cases in top 1 accuracy and top 3 differential accuracy when only a text-based image description was used. Adding clinical presentation significantly improved top 1 accuracy for ChatGPT-4V and Claude 3.5 Sonnet, with Claude showing the largest improvement. Claude 3.5 Sonnet outperformed both ChatGPT-4V and Gemini 1.5 Pro in top 1 accuracy when both image and clinical data were provided. No significant differences were found in top 3 differential accuracy across models in any condition.
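The two endpoints above can be scored mechanically; a minimal sketch with hypothetical diagnoses for illustration:

```python
def top1_accuracy(predictions, truths):
    # Fraction of cases whose first-listed diagnosis matches the textbook's.
    return sum(p[0] == t[0] for p, t in zip(predictions, truths)) / len(truths)

def top3_overlap(predictions, truths):
    # Mean count (0-3) of model diagnoses appearing anywhere in the
    # textbook's top-3 differential, matching the paper's 0-3 scale.
    return sum(len(set(p[:3]) & set(t[:3]))
               for p, t in zip(predictions, truths)) / len(truths)

# Hypothetical cases for illustration only:
preds = [["pneumonia", "bronchiolitis", "asthma"],
         ["intussusception", "volvulus", "appendicitis"]]
truth = [["bronchiolitis", "pneumonia", "croup"],
         ["intussusception", "appendicitis", "hernia"]]
print(top1_accuracy(preds, truth))  # → 0.5
print(top3_overlap(preds, truth))   # → 2.0
```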

AI-powered disease progression prediction in multiple sclerosis using magnetic resonance imaging: a systematic review and meta-analysis.

Houshi S, Khodakarami Z, Shaygannejad A, Khosravi F, Shaygannejad V

pubmed logopapersJul 12 2025
Disability progression despite disease-modifying therapy remains a major challenge in multiple sclerosis (MS). Artificial intelligence (AI) models exploiting magnetic resonance imaging (MRI) promise personalized prognostication, yet their real-world accuracy is uncertain. To systematically review and meta-analyze MRI-based AI studies predicting future disability progression in MS. Five databases were searched from inception to 17 May 2025 following PRISMA. Eligible studies used MRI in an AI model to forecast changes in the Expanded Disability Status Scale (EDSS) or equivalent metrics. Two reviewers conducted study selection, data extraction, and QUADAS-2 assessment. Random-effects meta-analysis was applied when ≥3 studies reported compatible regression statistics. Twenty-one studies with 12,252 MS patients met the inclusion criteria. Five used regression on continuous EDSS, fourteen classification, one time-to-event analysis, and one both. Conventional machine learning predominated (57%), followed by deep learning (38%). The median classification area under the curve (AUC) was 0.78 (range 0.57-0.86); the median regression root-mean-square error (RMSE) was 1.08 EDSS points. The pooled RMSE across regression studies was 1.31 (95% CI 1.02-1.60; I<sup>2</sup> = 95%). Deep learning conferred only marginal, non-significant gains over classical algorithms. External validation appeared in six studies; calibration, decision-curve analysis, and code releases were seldom reported. QUADAS-2 indicated generally low patient-selection bias but frequent index-test concerns. MRI-driven AI models predict MS disability progression with moderate accuracy, but error margins exceeding one EDSS point limit individual-level utility. Harmonized endpoints, larger multicenter cohorts, rigorous external validation, and prospective clinician-in-the-loop trials are essential before routine clinical adoption.
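A pooled estimate like the 1.31 RMSE above typically comes from DerSimonian-Laird random-effects weighting. A minimal sketch, assuming each study reports an estimate and its standard error (the study values below are made up for illustration):

```python
import math

def pooled_random_effects(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling of per-study estimates
    (e.g. RMSEs): fixed-effect weights give Cochran's Q, Q yields the
    between-study variance tau^2, and tau^2 widens the final weights."""
    w = [1 / se ** 2 for se in std_errors]            # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)   # between-study variance
    w_re = [1 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    half = 1.96 * math.sqrt(1 / sum(w_re))
    return pooled, pooled - half, pooled + half

# Hypothetical per-study RMSEs and standard errors (illustration only):
mean, lo, hi = pooled_random_effects([1.0, 1.3, 1.6], [0.10, 0.12, 0.15])
print(round(mean, 2), round(lo, 2), round(hi, 2))
```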

Vision-language model for report generation and outcome prediction in CT pulmonary angiogram.

Zhong Z, Wang Y, Wu J, Hsu WC, Somasundaram V, Bi L, Kulkarni S, Ma Z, Collins S, Baird G, Ahn SH, Feng X, Kamel I, Lin CT, Greineder C, Atalay M, Jiao Z, Bai H

pubmed logopapersJul 12 2025
Accurate and comprehensive interpretation of pulmonary embolism (PE) from Computed Tomography Pulmonary Angiography (CTPA) scans remains a clinical challenge due to the limited specificity and structure of existing AI tools. We propose an agent-based framework that integrates Vision-Language Models (VLMs) for detecting 32 PE-related abnormalities and Large Language Models (LLMs) for structured report generation. Trained on over 69,000 CTPA studies from 24,890 patients across Brown University Health (BUH), Johns Hopkins University (JHU), and the INSPECT dataset from Stanford, the model demonstrates strong performance in abnormality classification and report generation. For abnormality classification, it achieved AUROC scores of 0.788 (BUH), 0.754 (INSPECT), and 0.710 (JHU), with corresponding BERT-F1 scores of 0.891, 0.829, and 0.842. The abnormality-guided reporting strategy consistently outperformed the organ-based and holistic captioning baselines. For survival prediction, a multimodal fusion model that incorporates imaging, clinical variables, diagnostic outputs, and generated reports achieved concordance indices of 0.863 (BUH) and 0.731 (JHU), outperforming traditional PESI scores. This framework provides a clinically meaningful and interpretable solution for end-to-end PE diagnosis, structured reporting, and outcome prediction.
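The concordance indices reported for survival prediction are conventionally Harrell's C. A minimal sketch of the pairwise definition, with toy data rather than the study's:

```python
def concordance_index(times, events, risk):
    """Harrell's C-index: among comparable pairs (the earlier time is an
    observed event), the fraction where the higher-risk patient fails first.
    Ties in risk count as half-concordant."""
    concordant = comparable = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:   # i failed before j
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy data (hypothetical): higher risk scores fail earlier.
times = [2, 5, 8, 12]
events = [1, 1, 0, 1]          # 0 = censored
risk = [0.9, 0.7, 0.3, 0.1]
print(concordance_index(times, events, risk))  # → 1.0, perfectly concordant
```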

Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models

Anita Kriz, Elizabeth Laura Janes, Xing Shen, Tal Arbel

arxiv logopreprintJul 12 2025
Multimodal large language models (MLLMs) hold considerable promise for applications in healthcare. However, their deployment in safety-critical settings is hindered by two key limitations: (i) sensitivity to prompt design, and (ii) a tendency to generate incorrect responses with high confidence. As clinicians may rely on a model's stated confidence to gauge the reliability of its predictions, it is especially important that when a model expresses high confidence, it is also highly accurate. We introduce Prompt4Trust, the first reinforcement learning (RL) framework for prompt augmentation targeting confidence calibration in MLLMs. A lightweight LLM is trained to produce context-aware auxiliary prompts that guide a downstream task MLLM to generate responses in which the expressed confidence more accurately reflects predictive accuracy. Unlike conventional calibration techniques, Prompt4Trust specifically prioritizes the aspects of calibration most critical for safe and trustworthy clinical decision-making. Beyond improvements driven by this clinically motivated calibration objective, our proposed method also improves task accuracy, achieving state-of-the-art medical visual question answering (VQA) performance on the PMC-VQA benchmark, which is composed of multiple-choice questions spanning diverse medical imaging modalities. Moreover, our framework trained with a small downstream task MLLM showed promising zero-shot generalization to larger MLLMs in our experiments, suggesting the potential for scalable calibration without the associated computational costs. This work demonstrates the potential of automated yet human-aligned prompt engineering for improving the trustworthiness of MLLMs in safety-critical settings. Our codebase can be found at https://github.com/xingbpshen/prompt4trust.
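Calibration of the kind Prompt4Trust targets is commonly summarized by expected calibration error (ECE): bin predictions by stated confidence and average the gap between confidence and accuracy. A minimal binned-ECE sketch (a generic metric, not the paper's reward function):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error: bin predictions by stated confidence and
    average the |confidence - accuracy| gap, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)   # clamp conf == 1.0
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(o for _, o in b) / len(b)
            ece += len(b) / len(confidences) * abs(avg_conf - acc)
    return ece

# Overconfident model: says 95% but is right only half the time.
print(expected_calibration_error([0.95, 0.95, 0.95, 0.95], [1, 1, 0, 0]))
# Well-calibrated model: stated 50% confidence, 50% accuracy → ECE 0.
print(expected_calibration_error([0.5, 0.5], [1, 0]))
```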

Incremental diagnostic value of AI-derived coronary artery calcium in 18F-flurpiridaz PET Myocardial Perfusion Imaging

Barrett, O., Shanbhag, A., Zaid, R., Miller, R. J., Lemley, M., Builoff, V., Liang, J., Kavanagh, P., Buckley, C., Dey, D., Berman, D. S., Slomka, P.

medrxiv logopreprintJul 11 2025
Background: Positron emission tomography (PET) myocardial perfusion imaging (MPI) is a powerful tool for predicting coronary artery disease (CAD). Coronary artery calcium (CAC) provides incremental risk stratification to PET-MPI and enhances diagnostic accuracy. We assessed the additive value of the CAC score, derived from PET/CT attenuation maps, over stress total perfusion deficit (TPD) for detecting significant CAD using the novel 18F-flurpiridaz tracer. Methods and Results: Patients from the 18F-flurpiridaz phase III clinical trial who underwent PET/CT MPI with the 18F-flurpiridaz tracer, had available CT attenuation correction (CTAC) scans for CAC scoring, and underwent invasive coronary angiography (ICA) within a 6-month period between 2011 and 2013 were included. TPD was quantified automatically, and CAC scores from CTAC scans were assessed using artificial intelligence (AI)-derived segmentation and manual scoring. Obstructive CAD was defined as ≥50% stenosis in the left main (LM) artery, or ≥70% stenosis in any of the other major epicardial vessels. Prediction performance for CAD was assessed by comparing the area under the receiver operating characteristic curve (AUC) for stress TPD alone and in combination with the CAC score. Among 498 patients (72% male, median age 63 years), 30.1% had CAD. Incorporating the CAC score resulted in a greater AUC: manual scoring (AUC=0.87, 95% confidence interval [CI] 0.34-0.90; p=0.015) and AI-based scoring (AUC=0.88, 95% CI 0.85-0.90; p=0.002) compared with stress TPD alone (AUC=0.84, 95% CI 0.80-0.92). Conclusions: Combining automatically derived TPD and CAC score enhances 18F-flurpiridaz PET MPI accuracy in detecting significant CAD, offering a method that can be used routinely with PET/CT scanners without additional scanning or technologist time.
Condensed Abstract. Background: We assessed the added value of the CAC score from hybrid PET/CT CTAC scans combined with stress TPD for detecting significant CAD using the novel 18F-flurpiridaz tracer. Methods and Results: Patients from the 18F-flurpiridaz phase III clinical trial (n=498, 72% male, median age 63) who underwent PET/CT MPI and ICA within 6 months were included. TPD was quantified automatically, and CAC scores were assessed by AI and manual methods. Adding the CAC score to TPD improved the AUC for manual (0.87) and AI-based (0.88) scoring versus TPD alone (0.84). Conclusions: Combining TPD and CAC score enhances 18F-flurpiridaz PET MPI accuracy for CAD detection. Graphical Abstract: Overview of the study design.

Advancing Rare Neurological Disorder Diagnosis: Addressing Challenges with Systematic Reviews and AI-Driven MRI Meta-Trans Learning Framework for NeuroDegenerative Disorders.

Gupta A, Malhotra D

pubmed logopapersJul 11 2025
Neurological Disorders (ND) affect a large portion of the global population, impacting the brain, spinal cord, and nerves. These disorders fall into categories such as NeuroDevelopmental (NDD), NeuroBiological (NBD), and NeuroDegenerative (ND<sub>e</sub>) disorders, which range from common to rare conditions. While Artificial Intelligence (AI) has advanced healthcare diagnostics, training Machine Learning (ML) and Deep Learning (DL) models for early detection of rare neurological disorders remains a challenge due to limited patient data. This data scarcity poses a significant public health issue. Meta_Trans Learning (M<sub>TA</sub>L), which integrates Meta-Learning (M<sub>t</sub>L) and Transfer Learning (TL), offers a promising solution by leveraging small datasets to extract expert patterns, generalize findings, and reduce AI bias in healthcare. This research systematically reviews studies from 2018 to 2024 to explore how ML and M<sub>TA</sub>L techniques are applied in diagnosing NDD, NBD, and ND<sub>e</sub> disorders. It also provides statistical and parametric analysis of ML and DL methods for neurological disorder diagnosis. Lastly, the study introduces an MRI-based ND<sub>e</sub>-M<sub>TA</sub>L framework to aid healthcare professionals in early detection of rare neurological disorders, aiming to enhance diagnostic accuracy and advance healthcare practices.

Performance of Radiomics and Deep Learning Models in Predicting Distant Metastases in Soft Tissue Sarcomas: A Systematic Review and Meta-analysis.

Mirghaderi P, Valizadeh P, Haseli S, Kim HS, Azhideh A, Nyflot MJ, Schaub SK, Chalian M

pubmed logopapersJul 11 2025
Predicting distant metastases in soft tissue sarcomas (STS) is vital for guiding clinical decision-making. Recent advancements in radiomics and deep learning (DL) models have shown promise, but their diagnostic accuracy remains unclear. This meta-analysis aims to assess the performance of radiomics and DL-based models in predicting metastases in STS by analyzing pooled sensitivity and specificity. Following PRISMA guidelines, a thorough search was conducted in PubMed, Web of Science, and Embase. A random-effects model was used to estimate the pooled area under the curve (AUC), sensitivity, and specificity. Subgroup analyses were performed based on imaging modality (MRI, PET, PET/CT), feature extraction method (DL radiomics [DLR] vs. handcrafted radiomics [HCR]), incorporation of clinical features, and dataset used. Heterogeneity was assessed with the I² statistic, robustness with leave-one-out sensitivity analyses, and publication bias with Egger's test. Nineteen studies involving 1712 patients were included. The pooled AUC for predicting metastasis was 0.88 (95% CI: 0.80-0.92). The pooled AUC values were 88% (95% CI: 77-89%) for MRI-based models, 80% (95% CI: 76-92%) for PET-based models, and 91% (95% CI: 78-93%) for PET/CT-based models, with no significant differences (p = 0.75). DL-based models showed significantly higher sensitivity than HCR models (p < 0.01). Including clinical features did not significantly improve model performance (AUC: 0.90 vs. 0.88, p = 0.99). Significant heterogeneity was noted (I² > 25%), and Egger's test suggested potential publication bias (p < 0.001). Radiomics models showed promising potential for predicting metastases in STS, with DL approaches outperforming traditional HCR.
While integrating this approach into routine clinical practice is still evolving, it can aid physicians in identifying high-risk patients and implementing targeted monitoring strategies to reduce the risk of severe complications associated with metastasis. However, challenges such as heterogeneity, limited external validation, and potential publication bias persist. Future research should concentrate on standardizing imaging protocols and conducting multi-center validation studies to improve the clinical applicability of radiomics predictive models.

HNOSeg-XS: Extremely Small Hartley Neural Operator for Efficient and Resolution-Robust 3D Image Segmentation.

Wong KCL, Wang H, Syeda-Mahmood T

pubmed logopapersJul 11 2025
In medical image segmentation, convolutional neural networks (CNNs) and transformers are dominant. For CNNs, given the local receptive fields of convolutional layers, long-range spatial correlations are captured through consecutive convolutions and pooling. However, as the computational cost and memory footprint can be prohibitively large, 3D models can only afford fewer layers than 2D models, with reduced receptive fields and abstract levels. For transformers, although long-range correlations can be captured by multi-head attention, its quadratic complexity with respect to input size is computationally demanding. Therefore, either model may require input size reduction to allow more filters and layers for better segmentation. Nevertheless, given their discrete nature, models trained on patches or downsampled images may produce suboptimal results when applied at higher resolutions. To address this issue, here we propose the resolution-robust HNOSeg-XS architecture. We model image segmentation by learnable partial differential equations through the Fourier neural operator, which has the zero-shot super-resolution property. By replacing the Fourier transform with the Hartley transform and reformulating the problem in the frequency domain, we created the HNOSeg-XS model, which is resolution robust, fast, memory efficient, and extremely parameter efficient. When tested on the BraTS'23, KiTS'23, and MVSeg'23 datasets with a Tesla V100 GPU, HNOSeg-XS showed superior resolution robustness with fewer than 34.7k model parameters. It also achieved the overall best inference time (< 0.24 s) and memory efficiency (< 1.8 GiB) compared to the tested CNN and transformer models.
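The Hartley transform at the heart of HNOSeg-XS is computable directly from the FFT as H = Re(F) - Im(F). A minimal NumPy sketch (not the authors' implementation) showing the two properties that make it attractive for frequency-domain weights: it is real-valued, and it is self-inverse up to a scale factor:

```python
import numpy as np

def dht(x):
    # Discrete Hartley transform via the FFT: H = Re(F) - Im(F).
    # Unlike the Fourier transform it is real-valued, which roughly halves
    # the memory needed for learnable frequency-domain weights.
    f = np.fft.fftn(x)
    return f.real - f.imag

x = np.random.rand(8, 8)
assert np.isrealobj(dht(x))
# The DHT is an involution up to a scale of N (the number of elements):
assert np.allclose(dht(dht(x)) / x.size, x)
```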

Ensemble of Weak Spectral Total Variation Learners: a PET-CT Case Study

Anna Rosenberg, John Kennedy, Zohar Keidar, Yehoshua Y. Zeevi, Guy Gilboa

arxiv logopreprintJul 11 2025
When solving computer vision problems through machine learning, one often encounters a lack of sufficient training data. To mitigate this we propose the use of ensembles of weak learners based on spectral total-variation (STV) features (Gilboa 2014). The features are related to nonlinear eigenfunctions of the total-variation subgradient and can characterize textures well at various scales. It was shown (Burger et al. 2016) that, in the one-dimensional case, orthogonal features are generated, whereas in two dimensions the features are empirically lowly correlated. Ensemble learning theory advocates the use of lowly correlated weak learners. We thus propose to design ensembles using learners based on STV features. To show the effectiveness of this paradigm we examine a hard real-world medical imaging problem: the predictive value of computed tomography (CT) data for high uptake in positron emission tomography (PET) in patients suspected of skeletal metastases. The database consists of 457 scans with 1524 unique pairs of registered CT and PET slices. Our approach is compared to deep-learning methods and to Radiomics features; STV learners perform best (AUC=0.87), compared to neural nets (AUC=0.75) and Radiomics (AUC=0.79). We observe that fine STV scales in CT images are especially indicative of high uptake in PET.
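The core ensemble argument — averaging many lowly correlated weak scorers beats any one of them — can be illustrated generically. The sketch below uses random linear scorers on synthetic data, not the paper's STV features; every name and number is illustrative:

```python
import numpy as np

def rank_auc(scores, labels):
    # Rank-based AUC (Mann-Whitney): probability a positive outranks a negative.
    ranks = scores.argsort().argsort()
    n1, n0 = int(labels.sum()), int((1 - labels).sum())
    return (ranks[labels == 1].sum() - n1 * (n1 - 1) / 2) / (n1 * n0)

rng = np.random.default_rng(0)
# Toy 2-class problem: class 1 is shifted by 0.5 in every feature.
X = np.vstack([rng.normal(0, 1, (200, 10)), rng.normal(0.5, 1, (200, 10))])
y = np.array([0] * 200 + [1] * 200)
signal = X[y == 1].mean(0) - X[y == 0].mean(0)

# Each "weak learner" is a noisy linear scorer, only weakly aligned with the
# signal, so individual scorers are poor and mutually lowly correlated.
weak_scores = [X @ (rng.normal(size=10) + 0.3 * signal) for _ in range(50)]
ensemble = np.mean(weak_scores, axis=0)   # averaging cancels the noise

print(np.mean([rank_auc(s, y) for s in weak_scores]))  # mediocre individuals
print(rank_auc(ensemble, y))                           # clearly higher
```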
