ReXVQA: A Large-scale Visual Question Answering Benchmark for Generalist Chest X-ray Understanding

Ankit Pal, Jung-Oh Lee, Xiaoman Zhang, Malaikannan Sankarasubbu, Seunghyeon Roh, Won Jung Kim, Meesun Lee, Pranav Rajpurkar

arXiv preprint · Jun 4 2025
We present ReXVQA, the largest and most comprehensive benchmark for visual question answering (VQA) in chest radiology, comprising approximately 696,000 questions paired with 160,000 chest X-ray studies across training, validation, and test sets. Unlike prior efforts that rely heavily on template-based queries, ReXVQA introduces a diverse and clinically authentic task suite reflecting five core radiological reasoning skills: presence assessment, location analysis, negation detection, differential diagnosis, and geometric reasoning. We evaluate eight state-of-the-art multimodal large language models, including MedGemma-4B-it, Qwen2.5-VL, Janus-Pro-7B, and Eagle2-9B. The best-performing model (MedGemma) achieves 83.24% overall accuracy. To bridge the gap between AI performance and clinical expertise, we conducted a comprehensive human reader study involving three radiology residents on 200 randomly sampled cases. Our evaluation demonstrates that MedGemma achieved superior performance (83.84% accuracy) compared to human readers (best radiology resident: 77.27%), a significant milestone in which AI performance exceeds that of expert human readers on chest X-ray interpretation. The reader study reveals distinct performance patterns between AI models and human experts: inter-reader agreement among radiologists is strong, while agreement between human readers and AI models is more variable. ReXVQA establishes a new standard for evaluating generalist radiological AI systems, offering public leaderboards, fine-grained evaluation splits, structured explanations, and category-level breakdowns. This benchmark lays the foundation for next-generation AI systems capable of mimicking expert-level clinical reasoning beyond narrow pathology classification. Our dataset will be open-sourced at https://huggingface.co/datasets/rajpurkarlab/ReXVQA
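For readers who want to reproduce this kind of category-level breakdown, here is a minimal sketch of per-category accuracy scoring for a multiple-choice VQA benchmark. The record fields (`category`, `answer`, `prediction`) are illustrative assumptions, not the published ReXVQA schema.

```python
# Per-category accuracy for a multiple-choice VQA benchmark.
# Field names ("category", "answer", "prediction") are assumptions,
# not the published ReXVQA schema.
from collections import defaultdict

def per_category_accuracy(records):
    """records: iterable of dicts with 'category', 'answer', 'prediction'."""
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        total[r["category"]] += 1
        correct[r["category"]] += int(r["prediction"] == r["answer"])
    return {c: correct[c] / total[c] for c in total}

# Toy records covering two of the five reasoning skills named in the abstract:
records = [
    {"category": "presence assessment", "answer": "A", "prediction": "A"},
    {"category": "negation detection", "answer": "B", "prediction": "C"},
]
print(per_category_accuracy(records))
```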

Open-PMC-18M: A High-Fidelity Large Scale Medical Dataset for Multimodal Representation Learning

Negin Baghbanzadeh, Sajad Ashkezari, Elham Dolatabadi, Arash Afkanpour

arXiv preprint · Jun 3 2025
Compound figures, multi-panel composites containing diverse subfigures, are ubiquitous in biomedical literature, yet large-scale subfigure extraction remains largely unaddressed. Prior work on subfigure extraction has been limited in both dataset size and generalizability, leaving a critical open question: how does high-fidelity image-text alignment via large-scale subfigure extraction impact representation learning in vision-language models? We address this gap by introducing a scalable subfigure extraction pipeline based on transformer-based object detection, trained on a synthetic corpus of 500,000 compound figures, that achieves state-of-the-art performance on both ImageCLEF 2016 and synthetic benchmarks. Using this pipeline, we release OPEN-PMC-18M, a large-scale, high-quality biomedical vision-language dataset comprising 18 million clinically relevant subfigure-caption pairs spanning radiology, microscopy, and visible light photography. We train and evaluate vision-language models on our curated datasets and show improved performance across retrieval, zero-shot classification, and robustness benchmarks, outperforming existing baselines. We release our dataset, models, and code to support reproducible benchmarks and further study of biomedical vision-language modeling and representation learning.
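The detection-then-crop step at the heart of such a pipeline can be sketched with an off-the-shelf DETR-style detector. The COCO checkpoint below is a stand-in for illustration only; the paper's own detector is trained on synthetic compound figures, and its released weights are not referenced in this listing.

```python
# Sketch of subfigure extraction with a DETR-style detector. The COCO
# checkpoint below is a placeholder, not the released Open-PMC-18M model.
import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

def extract_subfigures(image: Image.Image, threshold: float = 0.7):
    """Detect panels in a compound figure and return them as crops."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
    results = processor.post_process_object_detection(
        outputs, threshold=threshold, target_sizes=target_sizes
    )[0]
    # Crop each detected panel out of the compound figure.
    return [image.crop(tuple(box.tolist())) for box in results["boxes"]]
```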

Upper Airway Volume Predicts Brain Structure and Cognition in Adolescents.

Kanhere A, Navarathna N, Yi PH, Parekh VS, Pickle J, Cloak CC, Ernst T, Chang L, Li D, Redline S, Isaiah A

PubMed paper · Jun 3 2025
One in ten children experiences sleep-disordered breathing (SDB). Untreated SDB is associated with poor cognition, but the underlying mechanisms are less well understood. We assessed the relationship between magnetic resonance imaging (MRI)-derived upper airway volume and children's cognition and regional cortical gray matter volumes. We used five-year data from the Adolescent Brain Cognitive Development study (n=11,875 children, 9-10 years at baseline). Upper airway volumes were derived using a deep learning model applied to 5,552,640 brain MRI slices. The primary outcome was the Total Cognition Composite score from the National Institutes of Health Toolbox (NIH-TB). Secondary outcomes included other NIH-TB measures and cortical gray matter volumes. The habitual snoring group had significantly smaller airway volumes than non-snorers (mean difference=1.2 cm³; 95% CI, 1.0-1.4 cm³; P<0.001). Deep learning-derived airway volume predicted the Total Cognition Composite score, with an estimated mean difference of 3.68 points (95% CI, 2.41-4.96; P<0.001) per one-unit increase in the natural log of airway volume (a ~2.7-fold increase in raw volume). The same airway volume increase was also associated with an average 0.02 cm³ increase in right temporal pole volume (95% CI, 0.01-0.02 cm³; P<0.001). Airway volume similarly predicted most NIH-TB domain scores and multiple frontal and temporal gray matter volumes. These brain volumes mediated the relationship between airway volume and cognition. We demonstrate a novel application of deep learning-based airway segmentation in a large pediatric cohort. Upper airway volume is a potential biomarker for cognitive outcomes in pediatric SDB, offers insights into neurobiological mechanisms, and informs future studies on risk stratification.
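The step from a segmentation mask to the log-volume predictor used in the abstract is straightforward; here is a minimal NumPy sketch. Variable names are illustrative, and the study's actual pipeline is not public in this listing.

```python
# Sketch: binary airway segmentation -> volume in cm^3 -> log-transformed
# predictor. Names and spacings are illustrative placeholders.
import numpy as np

def airway_volume_cm3(mask: np.ndarray, voxel_spacing_mm: tuple) -> float:
    """mask: binary 3D array; voxel_spacing_mm: (dz, dy, dx) in millimetres."""
    voxel_mm3 = float(np.prod(voxel_spacing_mm))
    return mask.sum() * voxel_mm3 / 1000.0  # mm^3 -> cm^3

mask = np.zeros((64, 64, 64), dtype=np.uint8)
mask[20:40, 28:36, 28:36] = 1
vol = airway_volume_cm3(mask, (1.0, 1.0, 1.0))
# The abstract models cognition against ln(volume): a one-unit increase in
# ln(volume) corresponds to a ~2.7-fold raw volume increase (e^1 ≈ 2.718).
log_vol = np.log(vol)
print(vol, log_vol)
```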

MedBookVQA: A Systematic and Comprehensive Medical Benchmark Derived from Open-Access Book

Sau Lai Yip, Sunan He, Yuxiang Nie, Shu Pui Chan, Yilin Ye, Sum Ying Lam, Hao Chen

arXiv preprint · Jun 1 2025
The accelerating development of general medical artificial intelligence (GMAI), powered by multimodal large language models (MLLMs), offers transformative potential for addressing persistent healthcare challenges, including workforce deficits and escalating costs. The parallel development of systematic evaluation benchmarks is a critical imperative for assessing performance and guiding the technology. Meanwhile, the potential of medical textbooks, an invaluable knowledge source, for benchmark development remains underexploited. Here, we present MedBookVQA, a systematic and comprehensive multimodal benchmark derived from open-access medical textbooks. To curate this benchmark, we propose a standardized pipeline for automated extraction of medical figures that contextually aligns them with the corresponding medical narratives. From this curated data, we generate 5,000 clinically relevant questions spanning modality recognition, disease classification, anatomical identification, symptom diagnosis, and surgical procedures. A multi-tier annotation system categorizes queries through hierarchical taxonomies encompassing medical imaging modalities (42 categories), body anatomies (125 structures), and clinical specialties (31 departments), enabling nuanced analysis across medical subdomains. We evaluate a wide array of MLLMs, including proprietary, open-sourced, medical, and reasoning models, revealing significant performance disparities across task types and model categories. Our findings expose critical capability gaps in current GMAI systems and establish textbook-derived multimodal benchmarking, with anatomically structured performance metrics across specialties, as an essential evaluation paradigm for advancing clinical AI.
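The multi-tier annotation described above amounts to tagging each query along three hierarchical taxonomies. A minimal sketch of such a record follows; the field names are assumptions for illustration, not the released MedBookVQA format.

```python
# Sketch of a multi-tier annotation record: one question tagged along the
# three taxonomies named in the abstract. Field names are assumptions.
from dataclasses import dataclass

@dataclass
class MedBookVQARecord:
    question: str
    answer: str
    task: str                # e.g. "modality recognition", "disease classification"
    imaging_modality: str    # one of 42 modality categories
    body_anatomy: str        # one of 125 anatomical structures
    clinical_specialty: str  # one of 31 departments

r = MedBookVQARecord(
    question="Which imaging modality produced this figure?",
    answer="Computed tomography",
    task="modality recognition",
    imaging_modality="CT",
    body_anatomy="chest",
    clinical_specialty="radiology",
)
print(r.task, r.imaging_modality)
```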

Multimodal Neuroimaging Based Alzheimer's Disease Diagnosis Using Evolutionary RVFL Classifier.

Goel T, Sharma R, Tanveer M, Suganthan PN, Maji K, Pilli R

PubMed paper · Jun 1 2025
Alzheimer's disease (AD) is one of the best-known causes of dementia and is characterized by continuous deterioration of cognitive skills in elderly people. It is an irreversible disorder that can be treated effectively only when detected early, at the stage known as mild cognitive impairment (MCI). The most common biomarkers for diagnosing AD are structural atrophy and the accumulation of plaques and tangles, which can be detected using magnetic resonance imaging (MRI) and positron emission tomography (PET) scans. The present paper therefore proposes a wavelet transform-based multimodal fusion of MRI and PET scans to incorporate both structural and metabolic information for the early detection of this life-threatening neurodegenerative disease. A deep learning model, ResNet-50, extracts features from the fused images, and a random vector functional link (RVFL) network with a single hidden layer classifies the extracted features. The weights and biases of the original RVFL network are optimized using an evolutionary algorithm to obtain optimal accuracy. All experiments and comparisons are performed on the publicly available Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset to demonstrate the proposed algorithm's efficacy.
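The RVFL network itself is a well-defined classical model: a single random hidden layer plus direct input-output links, with output weights solved in closed form. Here is a minimal NumPy sketch; the paper's evolutionary tuning of the random weights and biases is omitted, and the toy features stand in for ResNet-50 embeddings.

```python
# Minimal RVFL sketch: random hidden weights, direct input-output links,
# closed-form (ridge) solution for output weights. The evolutionary
# optimization from the paper is omitted.
import numpy as np

rng = np.random.default_rng(0)

def rvfl_fit(X, Y, n_hidden=100, reg=1e-3):
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                # single random hidden layer
    D = np.hstack([X, H])                 # direct links + hidden features
    beta = np.linalg.solve(D.T @ D + reg * np.eye(D.shape[1]), D.T @ Y)
    return W, b, beta

def rvfl_predict(X, W, b, beta):
    D = np.hstack([X, np.tanh(X @ W + b)])
    return D @ beta

# Toy use on ResNet-50-style feature vectors (random stand-ins):
X = rng.standard_normal((200, 32))
Y = np.eye(2)[(X[:, 0] > 0).astype(int)]  # one-hot labels
W, b, beta = rvfl_fit(X, Y)
pred = rvfl_predict(X, W, b, beta).argmax(axis=1)
print((pred == Y.argmax(axis=1)).mean())  # training accuracy
```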

Integrating anatomy and electrophysiology in the healthy human heart: Insights from biventricular statistical shape analysis using universal coordinates.

Van Santvliet L, Zappon E, Gsell MAF, Thaler F, Blondeel M, Dymarkowski S, Claessen G, Willems R, Urschler M, Vandenberk B, Plank G, De Vos M

PubMed paper · Jun 1 2025
A cardiac digital twin is a virtual replica of a patient-specific heart, mimicking its anatomy and physiology. A crucial step in building a cardiac digital twin is anatomical twinning, in which the computational mesh of the digital twin is tailored to the patient-specific cardiac anatomy. A number of studies have used computational simulations to investigate the effect of anatomical variation on clinically relevant functional measurements such as electrocardiograms (ECGs). While such a simulation environment provides researchers with a carefully controlled ground truth, the impact of anatomical differences on functional measurements in real-world patients remains understudied. In this study, we develop a biventricular statistical shape model and use it to quantify the effect of biventricular anatomy on ECG-derived and demographic features, providing novel insights for the development of digital twins of cardiac electrophysiology. To this end, we use a dataset comprising high-resolution cardiac CT scans from 271 healthy individuals, including athletes. Furthermore, we develop a novel, universal, ventricular coordinate-based method to establish lightweight shape correspondence. The performance of the shape model is rigorously established, focusing on its dimensionality-reduction capabilities and training-data requirements. The most important variability in healthy ventricles captured by the model is their size, followed by their elongation; these anatomical factors correlate significantly with ECG-derived and demographic features. Additionally, we make available a comprehensive synthetic cohort featuring ready-to-use biventricular meshes with fiber structures and anatomical region annotations, well suited for electrophysiological simulations.
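Once point correspondence is established, a statistical shape model of this kind is essentially PCA over stacked vertex coordinates. Below is a hedged NumPy sketch; the random shapes are placeholders, and the paper's universal-coordinate correspondence step is assumed to have already been applied.

```python
# Sketch of a point-distribution statistical shape model: stack corresponding
# vertices, centre them, and take the SVD to get the principal modes of
# variation (size and elongation dominate in the paper's cohort).
import numpy as np

def fit_ssm(shapes):
    """shapes: (n_subjects, n_points, 3) array with point correspondence."""
    X = shapes.reshape(len(shapes), -1)
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = S**2 / (S**2).sum()  # variance fraction per mode
    return mean, Vt, explained

def synthesize(mean, Vt, coeffs):
    """Generate a shape from coefficients along the leading modes."""
    return (mean + coeffs @ Vt[: len(coeffs)]).reshape(-1, 3)

rng = np.random.default_rng(1)
shapes = rng.standard_normal((271, 500, 3))  # 271 subjects, 500 landmarks
mean, modes, explained = fit_ssm(shapes)
print(explained[:3])                                         # first 3 modes
print(synthesize(mean, modes, np.array([2.0, -1.0])).shape)  # (500, 3)
```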

Multi-level feature fusion network for kidney disease detection.

Rehman Khan SU

PubMed paper · Jun 1 2025
Kidney irregularities pose a significant public health challenge, often leading to severe complications, yet the limited availability of nephrologists makes early detection costly and time-consuming. To address this issue, we propose a deep learning framework for automated kidney disease detection that leverages feature fusion and sequential modeling to enhance diagnostic accuracy. Our study thoroughly evaluates six pretrained models under identical experimental conditions, identifying ResNet50 and VGG19 as the most effective feature extractors owing to their deep residual learning and hierarchical representations. The proposed methodology integrates feature fusion with an inception block to extract diverse feature representations while limiting the overhead introduced by dataset imbalance. To enhance sequential learning and capture long-term dependencies in disease progression, a ConvLSTM is incorporated after feature fusion, and a further inception block refines hierarchical feature extraction, strengthening the model's ability to leverage both spatial and temporal patterns. To validate our approach, we introduce a new dataset, Multiple Hospital Collected CT (MHC-CT), consisting of 1860 tumor and 1024 normal kidney CT scans meticulously annotated by medical experts. Our model achieves 99.60% accuracy on this dataset, demonstrating its robustness in binary classification. To assess its generalization capability, we also evaluate the model on a publicly available multiclass CT benchmark, achieving 91.31% accuracy. The superior performance is attributed to the effective inception-block feature fusion and the sequential learning of the ConvLSTM, which together enhance spatial and temporal feature representations. These results highlight the efficacy of the proposed framework in automating kidney disease detection, providing a reliable and efficient aid for clinical decision-making. https://github.com/VS-EYE/KidneyDiseaseDetection.git
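The two-backbone fusion idea can be sketched concisely in PyTorch. The inception block and ConvLSTM stages described above are reduced here to a simple concatenation-plus-head design; this is a hedged illustration, not the authors' released model.

```python
# Sketch of two-backbone feature fusion in the spirit of the paper's
# ResNet50+VGG19 design. Inception/ConvLSTM stages are simplified away.
import torch
import torch.nn as nn
from torchvision import models

class FusionClassifier(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        resnet = models.resnet50(weights=None)
        self.backbone_a = nn.Sequential(*list(resnet.children())[:-1])  # -> 2048-d
        vgg = models.vgg19(weights=None)
        self.backbone_b = nn.Sequential(vgg.features, nn.AdaptiveAvgPool2d(1))  # -> 512-d
        self.head = nn.Sequential(
            nn.Linear(2048 + 512, 256), nn.ReLU(), nn.Linear(256, n_classes)
        )

    def forward(self, x):
        a = self.backbone_a(x).flatten(1)
        b = self.backbone_b(x).flatten(1)
        return self.head(torch.cat([a, b], dim=1))  # concatenation fusion

model = FusionClassifier()
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 2])
```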

The impact of training image quality with a novel protocol on artificial intelligence-based LGE-MRI image segmentation for potential atrial fibrillation management.

Berezhnoy AK, Kalinin AS, Parshin DA, Selivanov AS, Demin AG, Zubov AG, Shaidullina RS, Aitova AA, Slotvitsky MM, Kalemberg AA, Kirillova VS, Syrovnev VA, Agladze KI, Tsvelaya VA

PubMed paper · Jun 1 2025
Atrial fibrillation (AF) is the most common cardiac arrhythmia, affecting up to 2% of the population. Catheter ablation is a promising treatment for AF, particularly for patients with paroxysmal AF, but it often has high recurrence rates. Developing in silico models of patients' atria during the ablation procedure using cardiac MRI data may help reduce these rates. This study aims to develop an effective automated deep learning-based segmentation pipeline by compiling a specialized dataset and employing standardized labeling protocols to improve segmentation accuracy and efficiency. In doing so, we aim to achieve the highest possible accuracy and generalization ability while minimizing the burden on clinicians involved in manual data segmentation. We collected LGE-MRI data from VMRC and the cDEMRIS database. Two specialists manually labeled the data using standardized protocols to reduce subjective errors. Neural network (nnU-Net and smpU-Net++) performance was evaluated using statistical tests, including sensitivity and specificity analysis. A new database of LGE-MRI images, based on manual segmentation, was created (VMRC). Our approach with consistent labeling protocols achieved a Dice coefficient of 92.4% ± 0.8% for the cavity and 64.5% ± 1.9% for the LA walls. Using the pre-trained RIFE model, we attained a Dice score of approximately 89.1% ± 1.6% for atrial LGE-MRI imputation, outperforming classical methods. Sensitivity and specificity values demonstrated a substantial enhancement in the performance of neural networks trained with the new protocol. Standardized labeling and the application of RIFE significantly improved the efficiency of machine learning tools for constructing 3D LA models. This novel approach supports integrating state-of-the-art machine learning methods into broader in silico pipelines for predicting ablation outcomes in AF patients.
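The Dice overlap used to report cavity and wall quality above is simple to compute per label. Here is a minimal sketch; the label convention (1 = cavity, 2 = wall) is an assumption for illustration.

```python
# Per-label Dice coefficient for a multi-label segmentation.
# Label convention (1 = cavity, 2 = wall) is an assumption.
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, label: int) -> float:
    p, g = pred == label, gt == label
    denom = p.sum() + g.sum()
    return 2.0 * np.logical_and(p, g).sum() / denom if denom else 1.0

pred = np.random.default_rng(0).integers(0, 3, size=(64, 64, 64))
gt = pred.copy()
gt[:8] = 0  # perturb to make the example non-trivial
print(f"cavity Dice: {dice(pred, gt, 1):.3f}, wall Dice: {dice(pred, gt, 2):.3f}")
```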

Optimized attention-enhanced U-Net for autism detection and region localization in MRI.

K VRP, Bindu CH, Rama Devi K

PubMed paper · Jun 1 2025
Autism spectrum disorder (ASD) is a neurodevelopmental condition that affects a child's cognitive and social skills, often diagnosed only after symptoms appear around age 2. Leveraging MRI for early ASD detection can improve intervention outcomes. This study proposes a framework for autism detection and region localization using an optimized deep learning approach with attention mechanisms. The pipeline includes MRI image collection, pre-processing (bias field correction, histogram equalization, artifact removal, and non-local mean filtering), and autism classification with a Symmetric Structured MobileNet with Attention Mechanism (SSM-AM). Enhanced by Refreshing Awareness-aided Election-Based Optimization (RA-EBO), SSM-AM achieves robust classification. Abnormality region localization utilizes a Multiscale Dilated Attention-based Adaptive U-Net (MDA-AUnet) further optimized by RA-EBO. Experimental results demonstrate that our proposed model outperforms existing methods, achieving an accuracy of 97.29%, sensitivity of 97.27%, specificity of 97.36%, and precision of 98.98%, significantly improving classification and localization performance. These results highlight the potential of our approach for early ASD diagnosis and targeted interventions. The datasets utilized for this work are publicly available at https://fcon_1000.projects.nitrc.org/indi/abide/.
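The pre-processing chain named in the abstract maps onto widely available tooling; below is a hedged sketch using SimpleITK and scikit-image. The authors' actual parameters and artifact-removal step are not specified in this listing, so the values here are placeholders.

```python
# Sketch of the stated pre-processing chain: N4 bias field correction,
# histogram equalization, non-local means filtering. Parameters are
# placeholders, not the authors' settings.
import numpy as np
import SimpleITK as sitk
from skimage import exposure
from skimage.restoration import denoise_nl_means

def preprocess(volume: np.ndarray) -> np.ndarray:
    img = sitk.GetImageFromArray(volume.astype(np.float32))
    mask = sitk.OtsuThreshold(img, 0, 1)          # rough foreground mask
    corrected = sitk.N4BiasFieldCorrection(img, mask)
    arr = sitk.GetArrayFromImage(corrected)
    arr = exposure.equalize_hist(arr)             # histogram equalization
    return denoise_nl_means(arr, h=0.05)          # non-local means denoising

vol = np.random.rand(32, 64, 64).astype(np.float32)
print(preprocess(vol).shape)
```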