Page 37 of 79781 results

The Potential of ChatGPT as an Aiding Tool for the Neuroradiologist

Nikola, S., Paz, D.

medRxiv preprint · Jul 14 2025
Purpose: This study aims to explore whether ChatGPT can serve as an assistive tool for neuroradiologists in establishing a reasonable differential diagnosis of central nervous system tumors based on MRI image characteristics. Methods: This retrospective study included 50 patients aged 18-90 who underwent imaging and surgery at the Western Galilee Medical Center. ChatGPT was provided with demographic and radiological information for each patient to generate differential diagnoses. We compared ChatGPT's performance to that of an experienced neuroradiologist, using pathological reports as the gold standard. Quantitative data were described using means and standard deviations, medians, and ranges. Qualitative data were described using frequencies and percentages. The level of agreement between examiners (neuroradiologist versus ChatGPT) was assessed using the Fleiss kappa coefficient. A significance value below 5% was considered statistically significant. Statistical analysis was performed using IBM SPSS Statistics, version 27. Results: While ChatGPT demonstrated good performance, particularly in identifying common tumors such as glioblastoma and meningioma, its overall accuracy (48%) was lower than that of the neuroradiologist (70%). The AI tool showed moderate agreement with the neuroradiologist (kappa = 0.445) and with pathology results (kappa = 0.419). ChatGPT's performance varied across tumor types, performing better with common tumors but struggling with rarer ones. Conclusion: This study suggests that ChatGPT has the potential to serve as an assistive tool in neuroradiology for establishing a reasonable differential diagnosis of central nervous system tumors based on MRI image characteristics. However, its limitations and potential risks must be considered, and it should therefore be used with caution.
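The agreement statistic used in this study can be computed directly from a table of rating counts. Below is a minimal sketch of Fleiss' kappa; the function name and array layout are illustrative, not taken from the paper:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for an (N subjects x k categories) matrix, where
    counts[i, j] is how many raters assigned subject i to category j;
    every row must sum to the same number of raters n."""
    counts = np.asarray(counts, dtype=float)
    N, k = counts.shape
    n = counts.sum(axis=1)[0]                      # raters per subject
    p_j = counts.sum(axis=0) / (N * n)             # category proportions
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()  # observed vs. chance agreement
    return (P_bar - P_e) / (1 - P_e)
```

With two raters per case (e.g., ChatGPT and the neuroradiologist scored against discrete diagnosis categories), perfect agreement yields kappa = 1 and chance-level agreement yields kappa near 0.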

A Clinically-Informed Framework for Evaluating Vision-Language Models in Radiology Report Generation: Taxonomy of Errors and Risk-Aware Metric

Guan, H., Hou, P. C., Hong, P., Wang, L., Zhang, W., Du, X., Zhou, Z., Zhou, L.

medRxiv preprint · Jul 14 2025
Recent advances in vision-language models (VLMs) have enabled automatic radiology report generation, yet current evaluation methods remain limited to general-purpose NLP metrics or coarse classification-based clinical scores. In this study, we propose a clinically informed evaluation framework for VLM-generated radiology reports that goes beyond traditional performance measures. We define a taxonomy of 12 radiology-specific error types, each annotated with clinical risk levels (low, medium, high) in collaboration with physicians. Using this framework, we conduct a comprehensive error analysis of three representative VLMs, i.e., DeepSeek VL2, CXR-LLaVA, and CheXagent, on 685 gold-standard, expert-annotated MIMIC-CXR cases. We further introduce a risk-aware evaluation metric, the Clinical Risk-weighted Error Score for Text-generation (CREST), to quantify safety impact. Our findings reveal critical model vulnerabilities, common error patterns, and condition-specific risk profiles, offering actionable insights for model development and deployment. This work establishes a safety-centric foundation for evaluating and improving medical report generation models. The source code of our evaluation framework, including CREST computation and error taxonomy analysis, is available at https://github.com/guanharry/VLM-CREST.
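The exact CREST formula is specified in the authors' released code; the sketch below only illustrates the general idea of risk-weighted error aggregation, with entirely assumed weights and field names:

```python
# Assumed risk weights for illustration only; the real CREST weighting
# is defined in the authors' repository, not here.
RISK_WEIGHTS = {"low": 1.0, "medium": 2.0, "high": 4.0}

def risk_weighted_error_score(errors):
    """errors: list of (error_type, risk_level) tuples found in one
    generated report; returns the mean risk-weighted error burden."""
    if not errors:
        return 0.0
    return sum(RISK_WEIGHTS[risk] for _, risk in errors) / len(errors)
```

Under such a scheme, a report with one high-risk hallucinated finding scores worse than one with several low-risk stylistic slips, which is the safety-centric behavior the metric targets.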

Brain Stroke Detection and Classification Using CT Imaging with Transformer Models and Explainable AI

Shomukh Qari, Maha A. Thafar

arXiv preprint · Jul 13 2025
Stroke is one of the leading causes of death globally, making early and accurate diagnosis essential for improving patient outcomes, particularly in emergency settings where timely intervention is critical. CT scans are the key imaging modality because of their speed, accessibility, and cost-effectiveness. This study proposed an artificial intelligence framework for multiclass stroke classification (ischemic, hemorrhagic, and no stroke) using CT scan images from a dataset provided by the Republic of Turkey's Ministry of Health. The proposed method adopted MaxViT, a state-of-the-art Vision Transformer, as the primary deep learning model for image-based stroke classification, with additional transformer variants (vision transformer, transformer-in-transformer, and ConvNext). To enhance model generalization and address class imbalance, we applied data augmentation techniques, including synthetic image generation. The MaxViT model trained with augmentation achieved the best performance, reaching an accuracy and F1-score of 98.00%, outperforming all other evaluated models and the baseline methods. The primary goal of this study was to distinguish between stroke types with high accuracy while addressing crucial issues of transparency and trust in artificial intelligence models. To achieve this, Explainable Artificial Intelligence (XAI) was integrated into the framework, particularly Grad-CAM++. It provides visual explanations of the model's decisions by highlighting relevant stroke regions in the CT scans and establishing an accurate, interpretable, and clinically applicable solution for early stroke detection. This research contributed to the development of a trustworthy AI-assisted diagnostic tool for stroke, facilitating its integration into clinical practice and enhancing access to timely and optimal stroke diagnosis in emergency departments, thereby saving more lives.
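Grad-CAM++ refines the channel weighting of plain Grad-CAM, and the simpler precursor is enough to show the mechanism. A minimal sketch (array shapes and names are assumptions, not taken from the paper):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Plain Grad-CAM: weight each feature-map channel by its
    global-average-pooled gradient, sum the weighted maps, then ReLU.
    Grad-CAM++ replaces the uniform pooling with pixel-wise weights.
    activations, gradients: arrays of shape (C, H, W) from one layer."""
    weights = gradients.mean(axis=(1, 2))                             # one alpha per channel
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    return cam / cam.max() if cam.max() > 0 else cam                  # normalize to [0, 1]
```

The resulting map is upsampled to the CT slice's resolution and overlaid as a heatmap to highlight the regions driving the stroke classification.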

Prompt Engineering in Segment Anything Model: Methodologies, Applications, and Emerging Challenges

Yidong Jiang

arXiv preprint · Jul 13 2025
The Segment Anything Model (SAM) has revolutionized image segmentation through its innovative prompt-based approach, yet the critical role of prompt engineering in its success remains underexplored. This paper presents the first comprehensive survey focusing specifically on prompt engineering techniques for SAM and its variants. We systematically organize and analyze the rapidly growing body of work in this emerging field, covering fundamental methodologies, practical applications, and key challenges. Our review reveals how prompt engineering has evolved from simple geometric inputs to sophisticated multimodal approaches, enabling SAM's adaptation across diverse domains including medical imaging and remote sensing. We identify unique challenges in prompt optimization and discuss promising research directions. This survey fills an important gap in the literature by providing a structured framework for understanding and advancing prompt engineering in foundation models for segmentation.
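Geometric prompts in the SAM family reduce to coordinates plus labels. The sketch below bundles them the way predictor-style APIs typically expect; the field names mirror common conventions and are assumptions, not the survey's notation:

```python
import numpy as np

def make_prompts(points=None, labels=None, box=None):
    """Bundle SAM-style geometric prompts: foreground/background clicks
    (labels 1/0) and an optional xyxy bounding box."""
    prompt = {}
    if points is not None:
        pts = np.asarray(points, dtype=float)         # (N, 2) xy pixel coords
        lbs = np.asarray(labels, dtype=int)           # (N,) 1 = fg, 0 = bg
        assert pts.shape == (lbs.shape[0], 2)
        prompt["point_coords"], prompt["point_labels"] = pts, lbs
    if box is not None:
        prompt["box"] = np.asarray(box, dtype=float)  # (4,) x0, y0, x1, y1
    return prompt
```

The multimodal approaches surveyed extend exactly this structure, replacing or supplementing the geometric fields with text or exemplar-mask inputs.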

A Survey on Medical Image Compression: From Traditional to Learning-Based

Guofeng Tong, Sixuan Liu, Yang Lv, Hanyu Pei, Feng-Lei Fan

arXiv preprint · Jul 13 2025
The exponential growth of medical imaging has created significant challenges in data storage, transmission, and management for healthcare systems. In this vein, efficient compression becomes increasingly important. Unlike natural image compression, medical image compression prioritizes preserving diagnostic details and structural integrity, imposing stricter quality requirements and demanding fast, memory-efficient algorithms that balance computational complexity with clinically acceptable reconstruction quality. Meanwhile, the medical imaging family includes a plethora of modalities, each possessing different requirements. For example, 2D medical image (e.g., X-rays, histopathological images) compression focuses on exploiting intra-slice spatial redundancy, while volumetric medical image compression requires handling intra-slice and inter-slice spatial correlations, and 4D dynamic imaging (e.g., time-series CT/MRI, 4D ultrasound) additionally demands processing temporal correlations between consecutive time frames. Traditional compression methods, grounded in mathematical transforms and information theory principles, provide solid theoretical foundations, predictable performance, and high standardization levels, with extensive validation in clinical environments. In contrast, deep learning-based approaches demonstrate remarkable adaptive learning capabilities and can capture complex statistical characteristics and semantic information within medical images. This comprehensive survey establishes a two-facet taxonomy based on data structure (2D vs 3D/4D) and technical approaches (traditional vs learning-based), thereby systematically presenting the complete technological evolution, analyzing the unique technical challenges, and prospecting future directions in medical image compression.
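The intra- vs inter-slice distinction can be made concrete with a toy predictive-coding scheme: store the first slice and then only slice-to-slice residuals, which are small (and cheap to entropy-code) whenever adjacent slices are correlated. A lossless sketch, not any method from the surveyed literature:

```python
import numpy as np

def delta_encode_volume(vol):
    """Inter-slice predictive coding: first slice plus residuals along
    the slice axis. vol: (S, H, W) integer volume."""
    vol = np.asarray(vol, dtype=np.int32)
    return vol[0], np.diff(vol, axis=0)

def delta_decode_volume(first, residuals):
    """Invert delta_encode_volume exactly (lossless): cumulative sums
    of [first, residuals...] recover each original slice."""
    return np.concatenate([first[None], residuals]).cumsum(axis=0)
```

Real volumetric codecs use far stronger predictors and transforms, but the principle is the same: spend bits on what the previous slice fails to predict.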

Integrating LLMs into Radiology Education: An Interpretation-Centric Framework for Enhanced Learning While Supporting Workflow.

Lyo SK, Cook TS

PubMed · Jul 12 2025
Radiology education is challenged by increasing clinical workloads, limiting trainee supervision time and hindering real-time feedback. Large language models (LLMs) can enhance radiology education by providing real-time guidance, feedback, and educational resources while supporting efficient clinical workflows. We present an interpretation-centric framework for integrating LLMs into radiology education subdivided into distinct phases spanning pre-dictation preparation, active dictation support, and post-dictation analysis. In the pre-dictation phase, LLMs can analyze clinical data and provide context-aware summaries of each case, suggest relevant educational resources, and triage cases based on their educational value. In the active dictation phase, LLMs can provide real-time educational support through processes such as differential diagnosis support, completeness guidance, classification schema assistance, structured follow-up guidance, and embedded educational resources. In the post-dictation phase, LLMs can be used to analyze discrepancies between trainee and attending reports, identify areas for improvement, provide targeted educational recommendations, track trainee performance over time, and analyze the radiologic entities that trainees encounter. This framework offers a comprehensive approach to integrating LLMs into radiology education, with the potential to enhance trainee learning while preserving clinical efficiency.

Diabetic Tibial Neuropathy Prediction: Improving interpretability of Various Machine-Learning Models Based on Multimodal-Ultrasound Features Using SHAP Methodology.

Chen Y, Sun Z, Zhong H, Chen Y, Wu X, Su L, Lai Z, Zheng T, Lyu G, Su Q

PubMed · Jul 12 2025
This study aimed to develop and evaluate eight machine learning models based on multimodal ultrasound to precisely predict diabetic tibial neuropathy (DTN) in patients. Additionally, the SHapley Additive exPlanations (SHAP) framework was introduced to quantify the importance of each feature variable, providing a precise and noninvasive assessment tool for DTN patients, optimizing clinical management strategies, and enhancing patient prognosis. A prospective analysis was conducted using multimodal ultrasound and clinical data from 255 suspected DTN patients who visited the Second Affiliated Hospital of Fujian Medical University between January 2024 and November 2024. Key features were selected using Least Absolute Shrinkage and Selection Operator (LASSO) regression. Predictive models were constructed using Extreme Gradient Boosting (XGB), Logistic Regression, Support Vector Machines, k-Nearest Neighbors, Random Forest, Decision Tree, Naïve Bayes, and Neural Network classifiers. The SHAP method was employed to refine model interpretability. Furthermore, to verify the generalizability of the models, this study also collected data from 135 patients at three other tertiary hospitals for external testing. LASSO regression identified echo intensity (EI), cross-sectional area (CSA), mean elasticity value (Emean), superb microvascular imaging (SMI), and history of smoking as key features for DTN prediction. The XGB model achieved an Area Under the Curve (AUC) of 0.94, 0.83, and 0.79 in the training, internal test, and external test sets, respectively. SHAP analysis highlighted the ranked importance of EI, CSA, Emean, SMI, and history of smoking. Personalized prediction explanations provided by the SHAP values demonstrated the contribution of each feature to the final prediction, enhancing model interpretability. Furthermore, decision plots depicted how different features influenced mispredictions, thereby facilitating further model optimization or feature adjustment.
This study proposed a DTN prediction model based on machine-learning algorithms applied to multimodal ultrasound data. The results indicated the superior performance of the XGB model and its interpretability was enhanced using SHAP analysis. This cost-effective and user-friendly approach provides potential support for personalized treatment and precision medicine for DTN.
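SHAP values are Shapley values from cooperative game theory; for a handful of features they can be computed exactly by subset enumeration, which is what tree-based SHAP approximates efficiently. A self-contained sketch (the feature names in the test echo the abstract, but the payoff function is invented for illustration):

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values: value(frozenset of features) -> payoff.
    Feasible only for small feature sets (2^n subsets)."""
    n = len(features)
    phi = {}
    for f in features:
        rest = [g for g in features if g != f]
        total = 0.0
        for r in range(n):                       # subsets S not containing f
            for S in combinations(rest, r):
                S = frozenset(S)
                # Shapley weight for a coalition of size |S|
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (value(S | {f}) - value(S))
        phi[f] = total
    return phi
```

For an additive payoff, each feature's Shapley value recovers exactly its standalone contribution, which is the sanity check below.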

Seeing is Believing-On the Utility of CT in Phenotyping COPD.

Awan HA, Chaudhary MFA, Reinhardt JM

PubMed · Jul 12 2025
Chronic obstructive pulmonary disease (COPD) is a heterogeneous condition with complicated structural and functional impairments. For decades now, chest computed tomography (CT) has been used to quantify various abnormalities related to COPD. More recently, with the newer data-driven approaches, biomarker development and validation have evolved rapidly. Studies now target multiple anatomical structures including lung parenchyma, the airways, the vasculature, and the fissures to better characterize COPD. This review explores the evolution of chest CT biomarkers in COPD, beginning with traditional thresholding approaches that quantify emphysema and airway dimensions. We then highlight some of the texture analysis efforts that have been made over the years for subtyping lung tissue. We also discuss image registration-based biomarkers that have enabled spatially-aware mechanisms for understanding local abnormalities within the lungs. More recently, deep learning has enabled automated biomarker extraction, offering improved precision in phenotype characterization and outcome prediction. We highlight the most recent of these approaches as well. Despite these advancements, several challenges remain in terms of dataset heterogeneity, model generalizability, and clinical interpretability. This review lastly provides a structured overview of these limitations and highlights future potential of CT biomarkers in personalized COPD management.

Automated MRI protocoling in neuroradiology in the era of large language models.

Reiner LN, Chelbi M, Fetscher L, Stöckel JC, Csapó-Schmidt C, Guseynova S, Al Mohamad F, Bressem KK, Nawabi J, Siebert E, Wattjes MP, Scheel M, Meddeb A

PubMed · Jul 11 2025
This study investigates the automation of MRI protocoling, a routine task in radiology, using large language models (LLMs), comparing an open-source model (Llama 3.1 405B) and a proprietary model (GPT-4o) with and without retrieval-augmented generation (RAG), a method for incorporating domain-specific knowledge. This retrospective study included MRI studies conducted between January and December 2023, along with institution-specific protocol assignment guidelines. Clinical questions were extracted, and a neuroradiologist established the gold standard protocol. LLMs were tasked with assigning MRI protocols and contrast medium administration with and without RAG. The results were compared to protocols selected by four radiologists. Token-based symmetric accuracy, the Wilcoxon signed-rank test, and the McNemar test were used for evaluation. Data from 100 neuroradiology reports (mean age = 54.2 years ± 18.41, women 50%) were included. RAG integration significantly improved accuracy in sequence and contrast media prediction for Llama 3.1 (Sequences: 38% vs. 70%, P < .001, Contrast Media: 77% vs. 94%, P < .001), and GPT-4o (Sequences: 43% vs. 81%, P < .001, Contrast Media: 79% vs. 92%, P = .006). GPT-4o outperformed Llama 3.1 in MRI sequence prediction (81% vs. 70%, P < .001), with comparable accuracies to the radiologists (81% ± 0.21, P = .43). Both models equaled radiologists in predicting contrast media administration (Llama 3.1 RAG: 94% vs. 91% ± 0.2, P = .37, GPT-4o RAG: 92% vs. 91% ± 0.24, P = .48). Large language models show great potential as decision-support tools for MRI protocoling, with performance similar to radiologists. RAG enhances the ability of LLMs to provide accurate, institution-specific protocol recommendations.
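The RAG step boils down to ranking guideline snippets by similarity to the clinical question and prepending the winners to the LLM prompt. A dependency-free sketch using bag-of-words cosine similarity (a production system would use dense embeddings; all names here are illustrative):

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two token-count dictionaries."""
    num = sum(c * b.get(t, 0) for t, c in a.items())
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(question, guidelines, k=2):
    """Rank institution-specific protocol guidelines by lexical overlap
    with the clinical question; the top-k are injected into the prompt."""
    q = Counter(question.lower().split())
    return sorted(guidelines,
                  key=lambda g: cosine(q, Counter(g.lower().split())),
                  reverse=True)[:k]
```

The retrieved snippets are then concatenated with the clinical question into the model's context, which is what lets the LLM produce institution-specific rather than generic protocol recommendations.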

Multivariate whole brain neurodegenerative-cognitive-clinical severity mapping in the Alzheimer's disease continuum using explainable AI

Murad, T., Miao, H., Thakuri, D. S., Darekar, G., Chand, G.

medRxiv preprint · Jul 11 2025
Neurodegeneration and cognitive impairment are commonly reported in Alzheimer's disease (AD); however, their multivariate links are not well understood. To map the multivariate relationships between whole-brain neurodegenerative (WBN) markers, global cognition, and clinical severity in the AD continuum, we developed explainable artificial intelligence (AI) methods, validated them on semi-simulated data, and applied the best-performing method systematically to large-scale experimental data (N=1,756). The best-performing explainable AI method showed robust performance in predicting cognition from regional WBN markers and identified the ground-truth simulated dominant brain regions contributing to cognition. This method also showed excellent performance on experimental data and identified several prominent WBN regions hierarchically and simultaneously associated with cognitive decline across the AD continuum. These multivariate regional features also correlated with clinical severity, suggesting their clinical relevance. Overall, this study innovatively mapped the multivariate regional WBN-cognitive-clinical severity relationships in the AD continuum, thereby significantly advancing the understanding of AD-relevant neurobiological pathways.
