
Enhanced EfficientNet-Extended Multimodal Parkinson's disease classification with Hybrid Particle Swarm and Grey Wolf Optimizer.

Raajasree K, Jaichandran R

PubMed · Sep 30 2025
Parkinson's disease (PD) is a chronic neurodegenerative disorder characterized by progressive loss of dopaminergic neurons in the substantia nigra, resulting in both motor impairments and cognitive decline. Traditional PD classification methods are expert-dependent and time-intensive, while existing deep learning (DL) models often suffer from inconsistent accuracy, limited interpretability, and an inability to fully capture PD's clinical heterogeneity. This study proposes a novel framework, Enhanced EfficientNet-Extended Multimodal PD Classification with Hybrid Particle Swarm and Grey Wolf Optimizer (EEFN-XM-PDC-HybPS-GWO), to overcome these challenges. The model integrates T1-weighted MRI, DaTscan images, and gait scores from the NTUA and PhysioNet repositories, respectively. Denoising is achieved via Multiscale Attention Variational Autoencoders (MSA-VAE), and critical regions are segmented using Semantic Invariant Multi-View Clustering (SIMVC). The Enhanced EfficientNet-Extended Multimodal (EEFN-XM) model extracts and fuses image and gait features, while HybPS-GWO optimizes classification weights. The system classifies subjects into early-stage PD, advanced-stage PD, and healthy controls (HCs). Ablation analysis confirms the hybrid optimizer's contribution to performance gains. The proposed model achieved 99.2% accuracy with stratified 5-fold cross-validation, outperforming DMFEN-PDC, MMT-CA-PDC, and LSTM-PDD-GS by 7.3%, 15.97%, and 10.43%, respectively, and reduced execution time by 33.33%. EEFN-XM-PDC-HybPS-GWO demonstrates superior accuracy, computational efficiency, and clinical relevance, particularly in early-stage diagnosis and PD classification.
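
The abstract does not disclose the hybrid optimizer's update rules. As a rough illustration of how particle swarm velocity updates can be blended with grey wolf leader guidance to tune classification weights, here is a minimal sketch; the fitness function, bounds, and 50/50 blending are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def hybrid_pso_gwo(fitness, dim, n_agents=20, iters=50, lb=-1.0, ub=1.0, seed=0):
    """Toy hybrid of PSO velocity updates and GWO leader guidance.

    `fitness` maps a weight vector to a scalar loss (lower is better).
    The way the two update rules are blended is an assumption for illustration.
    """
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lb, ub, (n_agents, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])

    for t in range(iters):
        # GWO leaders: the three best agents so far (alpha, beta, delta)
        order = np.argsort(pbest_fit)
        alpha, beta, delta = pbest[order[:3]]
        a = 2.0 * (1.0 - t / iters)          # GWO coefficient decays linearly

        for i in range(n_agents):
            # PSO component: inertia + cognitive + social terms
            r1, r2 = rng.random(dim), rng.random(dim)
            vel[i] = 0.7 * vel[i] + 1.5 * r1 * (pbest[i] - pos[i]) \
                                  + 1.5 * r2 * (alpha - pos[i])

            # GWO component: average pull toward the three leaders
            leaders = np.stack([alpha, beta, delta])
            A = a * (2.0 * rng.random((3, dim)) - 1.0)
            C = 2.0 * rng.random((3, dim))
            gwo_pos = np.mean(leaders - A * np.abs(C * leaders - pos[i]), axis=0)

            # Blend the two proposals (50/50 split is purely an assumption)
            pos[i] = np.clip(0.5 * (pos[i] + vel[i]) + 0.5 * gwo_pos, lb, ub)

            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i].copy(), f

    best = int(np.argmin(pbest_fit))
    return pbest[best], pbest_fit[best]

# Example: tune a 10-dimensional weight vector against a toy quadratic loss.
w, loss = hybrid_pso_gwo(lambda w: float(np.sum(w ** 2)), dim=10)
```

In the published framework, the fitness would be driven by the EEFN-XM classifier's validation performance rather than the toy quadratic loss used here.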

Empowering Radiologists With ChatGPT-4o: Comparative Evaluation of Large Language Models and Radiologists in Cardiac Cases.

Cesur T, Gunes YC, Camur E, Dağli M

PubMed · Sep 30 2025
This study evaluated the diagnostic accuracy and differential diagnostic capabilities of 12 Large Language Models (LLMs), one cardiac radiologist, and 3 general radiologists in cardiac radiology. The impact of ChatGPT-4o assistance on radiologist performance was also investigated. We collected 80 publicly available "Cardiac Case of the Month" cases from the Society of Thoracic Radiology website. LLMs and Radiologist-III were provided with text-based information, whereas the other radiologists visually assessed the cases with and without ChatGPT-4o assistance. Diagnostic accuracy and differential diagnosis scores (DDx scores) were analyzed using the χ2, Kruskal-Wallis, Wilcoxon, McNemar, and Mann-Whitney U tests. The unassisted diagnostic accuracy was 72.5% for the cardiac radiologist, 53.8% for general radiologist-I, and 51.3% for general radiologist-II. With ChatGPT-4o, the accuracy improved to 78.8%, 70.0%, and 63.8%, respectively. The improvements for general radiologists I and II were statistically significant (P≤0.006). All radiologists' DDx scores improved significantly with ChatGPT-4o assistance (P≤0.05). Remarkably, Radiologist-I's GPT-4o-assisted diagnostic accuracy and DDx score were not significantly different from the cardiac radiologist's unassisted performance (P>0.05). Among the LLMs, Claude 3 Opus and Claude 3.5 Sonnet had the highest accuracy (81.3%), followed by Claude 3 Sonnet (70.0%). Regarding the DDx score, Claude 3 Opus outperformed all models and radiologist-III (P<0.05). The accuracy of general radiologist-III significantly improved from 48.8% to 63.8% with GPT-4o assistance (P<0.001). ChatGPT-4o may enhance the diagnostic performance of general radiologists in cardiac imaging, suggesting its potential as a diagnostic support tool. Further studies are required to assess its clinical integration.
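
The abstract reports that paired accuracy changes were tested with, among others, McNemar's test. For readers who want to reproduce that style of comparison, a minimal sketch on synthetic per-case correctness flags (the study's case-level data are not included in the abstract) could look like this:

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Synthetic per-case correctness flags for one radiologist (80 cases),
# unassisted vs. ChatGPT-4o-assisted; real case-level data are not in the abstract.
rng = np.random.default_rng(1)
unassisted = rng.random(80) < 0.54   # roughly 53.8% correct
assisted   = rng.random(80) < 0.70   # roughly 70.0% correct

# 2x2 contingency table of paired outcomes (rows: unassisted, cols: assisted)
table = np.array([
    [np.sum(unassisted & assisted),  np.sum(unassisted & ~assisted)],
    [np.sum(~unassisted & assisted), np.sum(~unassisted & ~assisted)],
])

# The exact McNemar test is appropriate for small discordant-pair counts.
result = mcnemar(table, exact=True)
print(f"McNemar p-value: {result.pvalue:.4f}")
```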

TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models

Junyi Zhang, Jia-Chen Gu, Wenbo Hu, Yu Zhou, Robinson Piramuthu, Nanyun Peng

arXiv preprint · Sep 29 2025
Existing medical reasoning benchmarks for vision-language models primarily focus on analyzing a patient's condition based on an image from a single visit. However, this setting deviates significantly from real-world clinical practice, where doctors typically refer to a patient's historical conditions to provide a comprehensive assessment by tracking their changes over time. In this paper, we introduce TemMed-Bench, the first benchmark designed for analyzing changes in patients' conditions between different clinical visits, which challenges large vision-language models (LVLMs) to reason over temporal medical images. TemMed-Bench consists of a test set comprising three tasks - visual question-answering (VQA), report generation, and image-pair selection - and a supplementary knowledge corpus of over 17,000 instances. With TemMed-Bench, we conduct an evaluation of six proprietary and six open-source LVLMs. Our results show that most LVLMs lack the ability to analyze patients' condition changes over temporal medical images, and a large proportion perform only at a random-guessing level in the closed-book setting. In contrast, GPT o3, o4-mini and Claude 3.5 Sonnet demonstrate comparatively decent performance, though they have yet to reach the desired level. Furthermore, we explore augmenting the input with both retrieved visual and textual modalities in the medical domain. We also show that multi-modal retrieval augmentation yields notably higher performance gains than either no retrieval or textual retrieval alone across most models on our benchmark, with the VQA task showing an average improvement of 2.59%. Overall, we compose a benchmark grounded in real-world clinical practice; it reveals LVLMs' limitations in temporal medical image reasoning and highlights multi-modal retrieval augmentation as a promising direction for addressing this challenge.
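
The retrieval-augmentation setup is described only at a high level. A minimal sketch of the general idea, ranking corpus instances by combined image and text embedding similarity and prepending the top hits to the LVLM prompt, is shown below; the embedding model, dimensions, and scoring scheme are assumptions, not the benchmark's implementation.

```python
import numpy as np

def retrieve_topk(query_img_emb, query_txt_emb, corpus_img_embs, corpus_txt_embs, k=3):
    """Rank corpus instances by combined image + text cosine similarity.

    All embeddings are assumed to be precomputed with some encoder;
    the benchmark does not prescribe a specific model here.
    """
    def cosine(q, m):
        q = q / np.linalg.norm(q)
        m = m / np.linalg.norm(m, axis=1, keepdims=True)
        return m @ q

    score = cosine(query_img_emb, corpus_img_embs) + cosine(query_txt_emb, corpus_txt_embs)
    return np.argsort(-score)[:k]

# Toy usage: a 17,000-instance corpus with 512-dimensional embeddings.
rng = np.random.default_rng(0)
corpus_img = rng.normal(size=(17000, 512))
corpus_txt = rng.normal(size=(17000, 512))
top_idx = retrieve_topk(rng.normal(size=512), rng.normal(size=512), corpus_img, corpus_txt)
# The retrieved reports/images would then be prepended to the LVLM prompt.
```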

Precision medicine in prostate cancer: individualized treatment through radiomics, genomics, and biomarkers.

Min K, Lin Q, Qiu D

PubMed · Sep 29 2025
Prostate cancer (PCa) is one of the most common malignancies threatening men's health globally. A comprehensive and integrated approach is essential for its early screening, diagnosis, risk stratification, treatment guidance, and efficacy assessment. Radiomics, leveraging multi-parametric magnetic resonance imaging (mpMRI) and positron emission tomography/computed tomography (PET/CT), has demonstrated significant clinical value in the non-invasive diagnosis, aggressiveness assessment, and prognosis prediction of PCa, with substantial potential when combined with artificial intelligence. In genomics, mutations or deletions in genes such as TMPRSS2-ERG, PTEN, RB1, TP53, and DNA damage repair genes (e.g., BRCA1/2) are closely associated with disease development and progression, holding profound implications for diagnosis, treatment, and prognosis. Concurrently, biomarkers like prostate-specific antigen (PSA), novel urinary markers (e.g., PCA3), and circulating tumor cells (CTCs) are widely utilized in PCa research and management. Integrating these technologies into personalized treatment plans and the broader framework of precision medicine allows for an in-depth exploration of the relationship between specific biomarkers and disease pathogenesis. This review summarizes the current research on radiomics, genomics, and biomarkers in PCa, and discusses their future potential and applications in advancing individualized patient care.
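
As a concrete illustration of the radiomics workflow the review refers to, quantitative features are typically extracted from a segmented lesion with a library such as pyradiomics; the sketch below is generic, and the image and mask paths are placeholders.

```python
# Minimal radiomics feature extraction sketch using pyradiomics
# (pip install pyradiomics); the image and mask paths are placeholders.
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
extractor.enableFeatureClassByName("firstorder")   # intensity statistics
extractor.enableFeatureClassByName("glcm")         # texture features

features = extractor.execute("t2w_prostate.nii.gz", "lesion_mask.nii.gz")
radiomic_vector = {k: v for k, v in features.items() if not k.startswith("diagnostics")}
# radiomic_vector could then be combined with genomic and biomarker data
# (e.g., PSA, PCA3) in a downstream risk-stratification model.
```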

Elemental composition analysis of calcium-based urinary stones via laser-induced breakdown spectroscopy for enhanced clinical insights.

Xie H, Huang J, Wang R, Ma X, Xie L, Zhang H, Li J, Liu C

PubMed · Sep 29 2025
The purpose of this study was to profile the elemental composition of calcium-based urinary stones using laser-induced breakdown spectroscopy (LIBS) and develop a machine learning model to distinguish recurrence-associated profiles by integrating elemental and clinical data. A total of 122 calcium-based stones (41 calcium oxalate, 11 calcium phosphate, 49 calcium oxalate/calcium phosphate, 8 calcium oxalate/uric acid, 13 calcium phosphate/struvite) were analyzed via LIBS. Elemental intensity ratios (H/Ca, P/Ca, Mg/Ca, Sr/Ca, Na/Ca, K/Ca) were calculated using Ca (396.847 nm) as the reference. Clinical variables (demographics, laboratory and imaging results, recurrence status) were retrospectively collected. A back-propagation neural network (BPNN) model was trained using four data strategies: clinical-only, spectral principal components (PCs), combined PCs plus clinical, and merged raw spectral plus clinical data. The performance of these four models was evaluated. Sixteen stone samples from other medical centers were used as an external validation set. Mg and Sr were detected in most of the stones. Significant correlations existed among P, Mg, Sr, and K ratios. Recurrent patients showed elevated elemental ratios (p < 0.01), higher urine pH (p < 0.01), and lower stone CT density (p = 0.044). The BPNN model with merged spectral plus clinical data achieved optimal classification performance (test set accuracy: 94.37%), significantly outperforming the clinical-only model (test set accuracy: 73.37%). The results of the external validation indicate that the model has good generalization ability. LIBS reveals ubiquitous Mg and Sr in calcium-based stones and elevated elemental ratios in recurrent cases. Integration of elemental profiles with clinical data enables high-accuracy classification of recurrence-associated profiles, providing insights for potential risk stratification in urolithiasis management.
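
The BPNN architecture is not detailed in the abstract. A minimal sketch of the best-performing "merged raw spectral plus clinical" strategy, using scikit-learn's MLPClassifier on synthetic stand-in data (the feature dimensions and variables are assumptions), might look like this:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins: 122 stones, one raw LIBS spectrum per stone plus clinical variables.
rng = np.random.default_rng(0)
spectra = rng.random((122, 2048))        # raw spectral intensities (placeholder resolution)
clinical = rng.random((122, 6))          # e.g., urine pH, stone CT density, demographics
recurrence = rng.integers(0, 2, 122)     # recurrence-associated label (placeholder)

# "Merged" strategy: concatenate raw spectra with clinical variables.
X = np.hstack([spectra, clinical])
X_train, X_test, y_train, y_test = train_test_split(X, recurrence, test_size=0.3, random_state=0)

model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0))
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2%}")
```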

Mixed prototype correction for causal inference in medical image classification.

Hong ZL, Yang JC, Peng XR, Wu SS

PubMed · Sep 29 2025
The heterogeneity of medical images poses significant challenges to accurate disease diagnosis. To tackle this issue, the impact of such heterogeneity on the causal relationship between image features and diagnostic labels should be incorporated into model design, which, however, remains underexplored. In this paper, we propose a mixed prototype correction for causal inference (MPCCI) method, aimed at mitigating the impact of unseen confounding factors on the causal relationships between medical images and disease labels, so as to enhance the diagnostic accuracy of deep learning models. The MPCCI comprises a causal inference component based on front-door adjustment and an adaptive training strategy. The causal inference component employs a multi-view feature extraction (MVFE) module to establish mediators, and a mixed prototype correction (MPC) module to execute causal interventions. Moreover, the adaptive training strategy incorporates both information purity and maturity metrics to maintain stable model training. Experimental evaluations on four medical image datasets, encompassing CT and ultrasound modalities, demonstrate the superior diagnostic accuracy and reliability of the proposed MPCCI. The code will be available at https://github.com/Yajie-Zhang/MPCCI.
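
The MPC module itself is only summarized in the abstract. As a generic illustration of prototype-based feature correction (not the published MPCCI front-door intervention), one could blend each sample's features with its class prototype:

```python
import torch

def mixed_prototype_correction(features, labels, num_classes, alpha=0.7):
    """Illustrative prototype mixing: blend each sample's features with its
    class prototype (the class-wise mean feature in the batch). This is a
    generic stand-in, not the MPC module described in the paper, and it
    assumes every class is present in the batch.
    """
    prototypes = torch.stack([
        features[labels == c].mean(dim=0) for c in range(num_classes)
    ])                                        # (num_classes, feat_dim)
    return alpha * features + (1 - alpha) * prototypes[labels]

# Toy usage: a batch of 32 feature vectors from a CT/ultrasound backbone.
feats = torch.randn(32, 128)
labels = torch.randint(0, 3, (32,))
corrected = mixed_prototype_correction(feats, labels, num_classes=3)
```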

Readability versus accuracy in LLM-transformed radiology reports: stakeholder preferences across reading grade levels.

Lee HS, Kim S, Kim S, Seo J, Kim WH, Kim J, Han K, Hwang SH, Lee YH

PubMed · Sep 29 2025
To examine how reading grade levels affect stakeholder preferences based on a trade-off between accuracy and readability. A retrospective study of 500 radiology reports from academic and community hospitals across five imaging modalities was conducted. Reports were transformed into 11 reading grade levels (7-17) using Gemini. Accuracy, readability, and preference were rated on a 5-point scale by radiologists, physicians, and laypersons. Errors (generalizations, omissions, hallucinations) and potential changes in patient management (PCPM) were identified. Ordinal logistic regression analyzed preference predictors, and weighted kappa measured interobserver reliability. Preferences varied across reading grade levels depending on stakeholder group, modality, and clinical setting. Overall, preferences peaked at grade 16, but declined at grade 17, particularly among laypersons. Lower reading grades improved readability but increased errors, while higher grades improved accuracy but reduced readability. In multivariable analysis, accuracy was the strongest predictor of preference for all groups (OR: 30.29, 33.05, and 2.16; p < 0.001), followed by readability (OR: 2.73, 1.70, 2.01; p < 0.001). Higher grade levels, in the range of 12-17, were generally preferred due to better accuracy. Further increasing grade levels reduced readability sharply, limiting preference. These findings highlight the limitations of unsupervised LLM transformations and suggest the need for hybrid approaches that maintain original reports while incorporating explanatory content to balance accuracy and readability.
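
As a sketch of the statistical workflow named in the abstract, ordinal logistic regression for preference predictors and weighted kappa for interobserver reliability, the following uses statsmodels and scikit-learn on synthetic ratings; the variables and effect sizes are placeholders, not the study data.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import cohen_kappa_score
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Synthetic stand-in for the rating data (1-5 accuracy, readability, preference scores).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "accuracy":    rng.integers(1, 6, n),
    "readability": rng.integers(1, 6, n),
})
latent = 0.8 * df["accuracy"] + 0.3 * df["readability"] + rng.normal(0, 1, n)
df["preference"] = pd.cut(latent, bins=5, labels=[1, 2, 3, 4, 5]).astype(int)

# Ordinal (proportional-odds) logistic regression of preference on accuracy and readability.
model = OrderedModel(df["preference"], df[["accuracy", "readability"]], distr="logit")
result = model.fit(method="bfgs", disp=False)
print(np.exp(result.params[:2]))   # odds ratios for accuracy and readability

# Weighted kappa between two raters (quadratic weights are one common choice).
rater1 = rng.integers(1, 6, n)
rater2 = np.clip(rater1 + rng.integers(-1, 2, n), 1, 5)
print(cohen_kappa_score(rater1, rater2, weights="quadratic"))
```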

Democratizing AI in Healthcare with Open Medical Inference (OMI): Protocols, Data Exchange, and AI Integration.

Pelka O, Sigle S, Werner P, Schweizer ST, Iancu A, Scherer L, Kamzol NA, Eil JH, Apfelbacher T, Seletkov D, Susetzky T, May MS, Bucher AM, Fegeler C, Boeker M, Braren R, Prokosch HU, Nensa F

PubMed · Sep 29 2025
The integration of artificial intelligence (AI) into healthcare is transforming clinical decision-making, patient outcomes, and workflows. AI inference, applying trained models to new data, is central to this evolution, with cloud-based infrastructures enabling scalable AI deployment. The Open Medical Inference (OMI) platform democratizes AI access through open protocols and standardized data formats for seamless, interoperable healthcare data exchange. By integrating standards like FHIR and DICOMweb, OMI ensures interoperability between healthcare institutions and AI services while fostering ethical AI use through a governance framework addressing privacy, transparency, and fairness. OMI's implementation is structured into work packages, each addressing technical and ethical aspects. These include expanding the Medical Informatics Initiative (MII) Core Dataset for medical imaging, developing infrastructure for AI inference, and creating an open-source DICOMweb adapter for legacy systems. Standardized data formats ensure interoperability, while the AI Governance Framework promotes trust and responsible AI use. The project aims to establish an interoperable AI network across healthcare institutions, connecting existing infrastructures and AI services to enhance clinical outcomes.
· OMI develops open protocols and standardized data formats for seamless healthcare data exchange.
· Integration with FHIR and DICOMweb ensures interoperability between healthcare systems and AI services.
· A governance framework addresses privacy, transparency, and fairness in AI usage.
· Work packages focus on expanding datasets, creating infrastructure, and enabling legacy system integration.
· The project aims to create a scalable, secure, and interoperable AI network in healthcare.
· Pelka O, Sigle S, Werner P et al. Democratizing AI in Healthcare with Open Medical Inference (OMI): Protocols, Data Exchange, and AI Integration. Rofo 2025; DOI 10.1055/a-2651-6653.
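
A minimal sketch of the standards-based exchange OMI builds on, a DICOMweb (QIDO-RS) study search and a FHIR ImagingStudy lookup, is shown below; the base URLs, token, and patient ID are placeholders, not OMI endpoints.

```python
# Sketch of standards-based data exchange: QIDO-RS (DICOMweb) study search
# plus a FHIR ImagingStudy lookup. Base URLs, token, and patient ID are placeholders.
import requests

DICOMWEB_BASE = "https://pacs.example.org/dicom-web"
FHIR_BASE = "https://fhir.example.org/fhir"
headers = {"Authorization": "Bearer <token>"}

# QIDO-RS: search for CT studies for a given patient ID.
studies = requests.get(
    f"{DICOMWEB_BASE}/studies",
    params={"PatientID": "12345", "ModalitiesInStudy": "CT"},
    headers={**headers, "Accept": "application/dicom+json"},
).json()

# FHIR: retrieve the corresponding ImagingStudy resources.
imaging_studies = requests.get(
    f"{FHIR_BASE}/ImagingStudy",
    params={"patient": "12345"},
    headers={**headers, "Accept": "application/fhir+json"},
).json()

# An AI inference service would receive references like these, pull pixel data
# via WADO-RS, and return results as FHIR resources (e.g., DiagnosticReport).
```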

Radiology's Last Exam (RadLE): Benchmarking Frontier Multimodal AI Against Human Experts and a Taxonomy of Visual Reasoning Errors in Radiology

Suvrankar Datta, Divya Buchireddygari, Lakshmi Vennela Chowdary Kaza, Mrudula Bhalke, Kautik Singh, Ayush Pandey, Sonit Sai Vasipalli, Upasana Karnwal, Hakikat Bir Singh Bhatti, Bhavya Ratan Maroo, Sanjana Hebbar, Rahul Joseph, Gurkawal Kaur, Devyani Singh, Akhil V, Dheeksha Devasya Shama Prasad, Nishtha Mahajan, Ayinaparthi Arisha, Rajesh Vanagundi, Reet Nandy, Kartik Vuthoo, Snigdhaa Rajvanshi, Nikhileswar Kondaveeti, Suyash Gunjal, Rishabh Jain, Rajat Jain, Anurag Agrawal

arXiv preprint · Sep 29 2025
Generalist multimodal AI systems such as large language models (LLMs) and vision-language models (VLMs) are increasingly accessed by clinicians and patients alike for medical image interpretation through widely available consumer-facing chatbots. Most evaluations claiming expert-level performance are on public datasets containing common pathologies. Rigorous evaluation of frontier models on difficult diagnostic cases remains limited. We developed a pilot benchmark of 50 expert-level "spot diagnosis" cases across multiple imaging modalities to evaluate the performance of frontier AI models against board-certified radiologists and radiology trainees. To mirror real-world usage, the reasoning modes of five popular frontier AI models were tested through their native web interfaces, viz. OpenAI o3, OpenAI GPT-5, Gemini 2.5 Pro, Grok-4, and Claude Opus 4.1. Accuracy was scored by blinded experts, and reproducibility was assessed across three independent runs. GPT-5 was additionally evaluated across various reasoning modes. Reasoning quality errors were assessed and a taxonomy of visual reasoning errors was defined. Board-certified radiologists achieved the highest diagnostic accuracy (83%), outperforming trainees (45%) and all AI models (best performance shown by GPT-5: 30%). Reliability was substantial for GPT-5 and o3, moderate for Gemini 2.5 Pro and Grok-4, and poor for Claude Opus 4.1. These findings demonstrate that advanced frontier models fall far short of radiologists in challenging diagnostic cases. Our benchmark highlights the present limitations of generalist AI in medical imaging and cautions against unsupervised clinical use. We also provide a qualitative analysis of reasoning traces and propose a practical taxonomy of visual reasoning errors by AI models for better understanding their failure modes, informing evaluation standards and guiding more robust model development.
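
The abstract does not state which agreement statistic underlies the reliability ratings; one common choice for agreement across multiple runs is Fleiss' kappa, sketched here on synthetic per-case correctness flags rather than the benchmark's actual scoring data.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Synthetic correctness flags (50 cases x 3 independent runs) for one model;
# the benchmark's real scoring data and its agreement statistic are not reproduced here.
rng = np.random.default_rng(0)
runs = rng.integers(0, 2, size=(50, 3))

# aggregate_raters converts subject-by-rater labels into subject-by-category counts.
counts, _ = aggregate_raters(runs)
print(f"Fleiss' kappa across runs: {fleiss_kappa(counts):.2f}")
```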

Toward a Vision-Language Foundation Model for Medical Data: Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation

Huu Tien Nguyen, Dac Thai Nguyen, The Minh Duc Nguyen, Trung Thanh Nguyen, Thao Nguyen Truong, Huy Hieu Pham, Johan Barthelemy, Minh Quan Tran, Thanh Tam Nguyen, Quoc Viet Hung Nguyen, Quynh Anh Chau, Hong Son Mai, Thanh Trung Nguyen, Phi Le Nguyen

arXiv preprint · Sep 29 2025
Vision-Language Foundation Models (VLMs), trained on large-scale multimodal datasets, have driven significant advances in Artificial Intelligence by enabling rich cross-modal reasoning. Despite their success in general domains, applying these models to medical imaging remains challenging due to the limited availability of diverse imaging modalities and multilingual clinical data. Most existing medical VLMs are trained on a subset of imaging modalities and focus primarily on high-resource languages, thus limiting their generalizability and clinical utility. To address these limitations, we introduce a novel Vietnamese-language multimodal medical dataset comprising 1,567,062 paired CT-PET images and 2,757 corresponding full-length clinical reports. This dataset is designed to fill two pressing gaps in medical AI development: (1) the lack of PET/CT imaging data in existing VLM training corpora, which hinders the development of models capable of handling functional imaging tasks; and (2) the underrepresentation of low-resource languages, particularly Vietnamese, in medical vision-language research. To the best of our knowledge, this is the first dataset to provide comprehensive PET/CT-report pairs in Vietnamese. We further introduce a training framework to enhance VLMs' learning, including data augmentation and expert-validated test sets. We conduct comprehensive experiments benchmarking state-of-the-art VLMs on downstream tasks, including medical report generation and visual question answering. The experimental results show that incorporating our dataset significantly improves the performance of existing VLMs. We believe this dataset and benchmark will serve as a pivotal step in advancing the development of more robust VLMs for medical imaging, particularly in low-resource languages, and improving their clinical relevance in Vietnamese healthcare.
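
The abstract does not name the report-generation metrics used. Purely as a hedged illustration, a common baseline is corpus BLEU, computed here with sacrebleu on a toy Vietnamese hypothesis-reference pair:

```python
# Hypothetical evaluation snippet for generated PET/CT reports: the abstract does not
# specify metrics, so corpus BLEU via sacrebleu is used only as a common baseline.
import sacrebleu

hypotheses = ["Tổn thương tăng chuyển hóa FDG ở thùy trên phổi phải."]
references = [["Ghi nhận tổn thương tăng hấp thu FDG tại thùy trên phổi phải."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")
```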