
Open-PMC-18M: A High-Fidelity Large Scale Medical Dataset for Multimodal Representation Learning

Negin Baghbanzadeh, Sajad Ashkezari, Elham Dolatabadi, Arash Afkanpour

arXiv preprint · Jun 3, 2025
Compound figures, which are multi-panel composites containing diverse subfigures, are ubiquitous in biomedical literature, yet large-scale subfigure extraction remains largely unaddressed. Prior work on subfigure extraction has been limited in both dataset size and generalizability, leaving a critical open question: How does high-fidelity image-text alignment via large-scale subfigure extraction impact representation learning in vision-language models? We address this gap by introducing a scalable subfigure extraction pipeline based on transformer-based object detection, trained on a synthetic corpus of 500,000 compound figures, and achieving state-of-the-art performance on both ImageCLEF 2016 and synthetic benchmarks. Using this pipeline, we release OPEN-PMC-18M, a large-scale, high-quality biomedical vision-language dataset comprising 18 million clinically relevant subfigure-caption pairs spanning radiology, microscopy, and visible light photography. We train and evaluate vision-language models on our curated datasets and show improved performance across retrieval, zero-shot classification, and robustness benchmarks, outperforming existing baselines. We release our dataset, models, and code to support reproducible benchmarks and further study of biomedical vision-language modeling and representation learning.
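
The abstract describes contrastive vision-language training on subfigure-caption pairs. As an illustration only, the sketch below shows a generic CLIP-style symmetric contrastive loss in PyTorch; the function name, temperature, and batch-alignment assumption are ours, not details from the Open-PMC-18M code.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of row-aligned subfigure-caption
    embedding pairs, as used in generic CLIP-style training."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature            # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)                # subfigure -> caption
    loss_t2i = F.cross_entropy(logits.t(), targets)            # caption -> subfigure
    return (loss_i2t + loss_t2i) / 2
```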

petBrain: A New Pipeline for Amyloid, Tau Tangles and Neurodegeneration Quantification Using PET and MRI

Pierrick Coupé, Boris Mansencal, Floréal Morandat, Sergio Morell-Ortega, Nicolas Villain, Jose V. Manjón, Vincent Planche

arXiv preprint · Jun 3, 2025
INTRODUCTION: Quantification of amyloid plaques (A), neurofibrillary tangles (T2), and neurodegeneration (N) using PET and MRI is critical for Alzheimer's disease (AD) diagnosis and prognosis. Existing pipelines face limitations regarding processing time, variability in tracer types, and challenges in multimodal integration. METHODS: We developed petBrain, a novel end-to-end processing pipeline for amyloid-PET, tau-PET, and structural MRI. It leverages deep learning-based segmentation, standardized biomarker quantification (Centiloid, CenTauR, HAVAs), and simultaneous estimation of A, T2, and N biomarkers. The pipeline is implemented as a web-based platform, requiring no local computational infrastructure or specialized software knowledge. RESULTS: petBrain provides reliable and rapid biomarker quantification, with results comparable to existing pipelines for A and T2. It shows strong concordance with data processed in ADNI databases. The staging and quantification of A/T2/N by petBrain demonstrated good agreement with CSF/plasma biomarkers, clinical status, and cognitive performance. DISCUSSION: petBrain represents a powerful and openly accessible platform for standardized AD biomarker analysis, facilitating applications in clinical research.
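
petBrain reports amyloid burden on the Centiloid scale. For orientation only, here is a minimal sketch of the standard linear Centiloid scaling, where the anchor SUVR values (young controls at 0, typical AD at 100) are tracer- and pipeline-specific inputs; petBrain's actual calibration is not reproduced here.

```python
def centiloid(suvr, suvr_young_controls, suvr_typical_ad):
    """Linear Centiloid scaling: maps a tracer SUVR onto a 0-100 scale anchored
    at young controls (0) and typical AD (100). Anchor values must come from a
    calibrated pipeline; the ones used by petBrain are not reproduced here."""
    return 100.0 * (suvr - suvr_young_controls) / (suvr_typical_ad - suvr_young_controls)
```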

Deep Learning-Based Opportunistic CT Osteoporosis Screening and Establishment of Normative Values

Westerhoff, M., Gyftopoulos, S., Dane, B., Vega, E., Murdock, D., Lindow, N., Herter, F., Bousabarah, K., Recht, M. P., Bredella, M. A.

medRxiv preprint · Jun 3, 2025
Background: Osteoporosis is underdiagnosed and undertreated, prompting the exploration of opportunistic screening using CT and artificial intelligence (AI). Purpose: To develop a reproducible deep learning-based convolutional neural network to automatically place a 3D region of interest (ROI) in trabecular bone, to develop a correction method to normalize attenuation across different CT protocols and scanner models, and to establish thresholds for osteoporosis in a large, diverse population. Methods: A deep learning-based method was developed to automatically quantify trabecular attenuation using a 3D ROI of the thoracic and lumbar spine on chest, abdomen, or spine CTs, adjusted for different tube voltages and scanner models. Normative values and osteoporosis thresholds of spinal trabecular attenuation were established across a diverse population, stratified by age, sex, race, and ethnicity, using the prevalence of osteoporosis reported by the WHO. Results: 538,946 CT examinations from 283,499 patients (mean age 65 ± 15 years; 51.2% women, 55.5% White), performed on 50 scanner models using six different tube voltages, were analyzed. Hounsfield units at 80 kVp versus 120 kVp differed by 23%, whereas different scanner models resulted in value differences of <10%. Automated ROI placement in 1,496 vertebrae was validated by manual radiologist review, demonstrating >99% agreement. Mean trabecular attenuation was higher in young women (<50 years) than in young men (p<.001) and decreased with age, with a steeper decline in postmenopausal women. In patients older than 50 years, trabecular attenuation was higher in males than in females (p<.001). Trabecular attenuation was highest in Black patients, followed by Asian patients, and lowest in White patients (p<.001). The threshold for diagnosing osteoporosis at L1 was 80 HU. Conclusion: Deep learning-based automated opportunistic osteoporosis screening can identify patients with low bone mineral density who undergo CT scans for clinical purposes on different scanners and protocols. Key Results: (1) In a study of 538,946 CT examinations performed in 283,499 patients using different scanner models and imaging protocols, an automated deep learning-based convolutional neural network accurately placed a three-dimensional region of interest within thoracic and lumbar vertebrae to measure trabecular attenuation. (2) Tube voltage had a larger influence on attenuation values (23%) than scanner model (<10%). (3) A threshold of 80 HU was identified for L1 to diagnose osteoporosis using an automated three-dimensional region of interest.
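
To make the reported numbers concrete, the toy rule below applies the 80 HU L1 threshold from the abstract after a placeholder voltage normalization; the correction factor is an assumption motivated by the reported ~23% difference between 80 and 120 kVp, not the paper's actual normalization method.

```python
def classify_osteoporosis_l1(mean_hu, tube_voltage_kvp, threshold_hu=80.0):
    """Toy opportunistic screening rule for the L1 vertebra. The 80 HU
    threshold is taken from the abstract; the voltage correction factor is a
    placeholder assumption, not the paper's normalization method."""
    # Hypothetical scaling of 80 kVp measurements toward a 120 kVp reference,
    # motivated by the reported ~23% attenuation difference between voltages.
    correction = {80: 1.0 / 1.23, 120: 1.0}
    normalized_hu = mean_hu * correction.get(tube_voltage_kvp, 1.0)
    return "low bone mineral density" if normalized_hu < threshold_hu else "normal"
```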

Inferring single-cell spatial gene expression with tissue morphology via explainable deep learning

Zhao, Y., Alizadeh, E., Taha, H. B., Liu, Y., Xu, M., Mahoney, J. M., Li, S.

bioRxiv preprint · Jun 2, 2025
Deep learning models trained with spatial omics data uncover complex patterns and relationships among cells, genes, and proteins in a high-dimensional space. State-of-the-art in silico spatial multi-cell gene expression methods using histological images of tissue stained with hematoxylin and eosin (H&E) allow us to characterize cellular heterogeneity. We developed a vision transformer (ViT) framework, named SPiRiT, to map histological signatures to spatial single-cell transcriptomic signatures. SPiRiT predicts single-cell spatial gene expression from matched H&E image tiles of human breast cancer and whole mouse pup, evaluated on Xenium (10x Genomics) datasets. Importantly, SPiRiT incorporates rigorous strategies to ensure reproducibility and robustness of predictions and provides trustworthy interpretation through attention-based model explainability. Model interpretation revealed the image regions and attention details SPiRiT uses to predict the expression of genes such as markers of invasive cancer cells. In an apples-to-apples comparison with ST-Net, SPiRiT improved predictive accuracy by 40%. These gene predictions and expression levels were highly consistent with the tumor region annotations. In summary, SPiRiT highlights the feasibility of inferring spatial single-cell gene expression from tissue morphology across multiple species.
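
As a rough illustration of the tile-to-expression mapping described above, the sketch below shows a simple regression head that turns an image-tile embedding into per-gene predictions; the embedding dimension, gene-panel size, and Softplus output are illustrative assumptions, not SPiRiT's actual architecture.

```python
import torch.nn as nn

class TileToExpressionHead(nn.Module):
    """Regression head mapping an image-tile embedding to per-gene expression
    predictions; dimensions and the Softplus output are illustrative only."""

    def __init__(self, embed_dim=768, n_genes=280):
        super().__init__()
        self.head = nn.Sequential(
            nn.LayerNorm(embed_dim),
            nn.Linear(embed_dim, n_genes),
            nn.Softplus(),                      # expression values are non-negative
        )

    def forward(self, tile_embedding):          # (batch, embed_dim) from a ViT backbone
        return self.head(tile_embedding)        # (batch, n_genes)
```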

Tomographic Foundation Model -- FORCE: Flow-Oriented Reconstruction Conditioning Engine

Wenjun Xia, Chuang Niu, Ge Wang

arXiv preprint · Jun 2, 2025
Computed tomography (CT) is a major medical imaging modality. Clinical CT scenarios, such as low-dose screening, sparse-view scanning, and metal implants, often lead to severe noise and artifacts in reconstructed images, requiring improved reconstruction techniques. The introduction of deep learning has significantly advanced CT image reconstruction. However, obtaining paired training data remains challenging due to patient motion and other constraints. Although deep learning methods can still perform well with approximately paired data, they inherently carry the risk of hallucination due to data inconsistencies and model instability. In this paper, we integrate data fidelity with a state-of-the-art generative AI model, the Poisson flow generative model (PFGM) and its generalized version PFGM++, and propose a novel CT framework: the Flow-Oriented Reconstruction Conditioning Engine (FORCE). In our experiments, the proposed method shows superior performance on various CT imaging tasks, outperforming existing unsupervised reconstruction approaches.
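
The framework couples a generative prior with data fidelity. A minimal sketch of a generic data-fidelity gradient step is shown below; `projector` is a hypothetical object exposing forward and back-projection, and this is not FORCE's actual conditioning scheme.

```python
def data_fidelity_step(image, projector, sinogram, step_size=0.1):
    """One gradient step toward consistency with the measured projections,
    i.e. descending on ||A x - y||^2 with A the forward projector. `projector`
    is a placeholder with .forward() and .backward() methods."""
    residual = projector.forward(image) - sinogram            # mismatch in the projection domain
    return image - step_size * projector.backward(residual)   # pull the image toward the data
```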

Beyond Pixel Agreement: Large Language Models as Clinical Guardrails for Reliable Medical Image Segmentation

Jiaxi Sheng, Leyi Yu, Haoyue Li, Yifan Gao, Xin Gao

arXiv preprint · Jun 2, 2025
Evaluating AI-generated medical image segmentations for clinical acceptability poses a significant challenge, as traditional pixel-agreement metrics often fail to capture true diagnostic utility. This paper introduces the Hierarchical Clinical Reasoner (HCR), a novel framework that leverages Large Language Models (LLMs) as clinical guardrails for reliable, zero-shot quality assessment. HCR employs a structured, multi-stage prompting strategy that guides LLMs through a detailed reasoning process, encompassing knowledge recall, visual feature analysis, anatomical inference, and clinical synthesis, to evaluate segmentations. We evaluated HCR on a diverse dataset across six medical imaging tasks. Our results show that HCR, utilizing models like Gemini 2.5 Flash, achieved a classification accuracy of 78.12%, performing comparably to, and in some instances exceeding, dedicated vision models such as ResNet50 (72.92% accuracy) that were specifically trained for this task. The HCR framework not only provides accurate quality classifications but also generates interpretable, step-by-step reasoning for its assessments. This work demonstrates the potential of LLMs, when appropriately guided, to serve as sophisticated evaluators, offering a pathway towards more trustworthy and clinically aligned quality control for AI in medical imaging.
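
A minimal sketch of what a staged prompting loop of this kind could look like is given below; the stage wording paraphrases the abstract, and `llm` stands for any prompt-to-text callable, so none of this is the paper's exact prompt set or API.

```python
STAGES = [
    "Knowledge recall: summarize the expected anatomy and imaging appearance of the target structure.",
    "Visual feature analysis: describe the segmentation mask's shape, extent, and boundaries.",
    "Anatomical inference: judge whether the mask is anatomically plausible for this structure.",
    "Clinical synthesis: state whether the segmentation is clinically acceptable (yes/no) and justify.",
]

def hierarchical_clinical_reasoning(llm, case_description):
    """Run the staged prompts in order, feeding each answer back into the
    context. `llm` is any callable mapping a prompt string to a text response."""
    context = case_description
    transcript = []
    for stage in STAGES:
        answer = llm(f"{stage}\n\nContext so far:\n{context}")
        transcript.append(answer)
        context += "\n" + answer
    return transcript[-1], transcript            # final verdict plus the full reasoning trace
```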

Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning

Yijun Yang, Zhao-Yang Wang, Qiuping Liu, Shuwen Sun, Kang Wang, Rama Chellappa, Zongwei Zhou, Alan Yuille, Lei Zhu, Yu-Dong Zhang, Jieneng Chen

arXiv preprint · Jun 2, 2025
Providing effective treatment and making informed clinical decisions are essential goals of modern medicine and clinical care. We are interested in simulating disease dynamics for clinical decision-making, leveraging recent advances in large generative models. To this end, we introduce the Medical World Model (MeWM), the first world model in medicine that visually predicts future disease states based on clinical decisions. MeWM comprises (i) vision-language models that serve as policy models, and (ii) tumor generative models that serve as dynamics models. The policy model generates action plans, such as clinical treatments, while the dynamics model simulates tumor progression or regression under given treatment conditions. Building on this, we propose an inverse dynamics model that applies survival analysis to the simulated post-treatment tumor, enabling the evaluation of treatment efficacy and the selection of the optimal clinical action plan. As a result, the proposed MeWM simulates disease dynamics by synthesizing post-treatment tumors, with state-of-the-art specificity in Turing tests evaluated by radiologists. Simultaneously, its inverse dynamics model outperforms medical-specialized GPTs in optimizing individualized treatment protocols across all metrics. Notably, MeWM improves clinical decision-making for interventional physicians, boosting the F1-score for selecting the optimal TACE protocol by 13% and paving the way for the future integration of medical world models as second readers.
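
The policy / dynamics / inverse-dynamics split described above can be summarized as a simple search over candidate treatments; the sketch below treats all three components as abstract callables and is our paraphrase, not the released MeWM implementation.

```python
def select_treatment(policy_model, dynamics_model, survival_model, pre_treatment_image):
    """Score each candidate treatment by simulating the post-treatment tumor
    and estimating its benefit with a survival model; return the best plan.
    All three models are abstract callables in this paraphrase."""
    candidate_plans = policy_model(pre_treatment_image)               # proposed clinical actions
    scored = []
    for plan in candidate_plans:
        simulated_tumor = dynamics_model(pre_treatment_image, plan)   # generative rollout of the treatment
        predicted_benefit = survival_model(simulated_tumor)           # efficacy estimate on the simulated state
        scored.append((predicted_benefit, plan))
    return max(scored, key=lambda item: item[0])[1]
```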

Artificial Intelligence-Driven Innovations in Diabetes Care and Monitoring

Abdul Rahman, S., Mahadi, M., Yuliana, D., Budi Susilo, Y. K., Ariffin, A. E., Amgain, K.

medRxiv preprint · Jun 2, 2025
This study explores the transformative role of Artificial Intelligence (AI) in diabetes care and monitoring, focusing on innovations that optimize patient outcomes. AI, particularly machine learning and deep learning, significantly enhances early detection of complications such as diabetic retinopathy and improves screening efficacy. The methodology employs a bibliometric analysis using Scopus, VOSviewer, and Publish or Perish, analyzing 235 articles from 2023-2025. Results indicate a strong interdisciplinary focus, with Computer Science and Medicine being the dominant subject areas (36.9% and 12.9%, respectively). Bibliographic coupling reveals robust international collaborations led by the U.S. (1558.52 link strength), the UK, and China, with key influential documents by Zhu (2023c) and Annuzzi (2023). This research highlights AI's impact on enhancing monitoring, personalized treatment, and proactive care, while acknowledging challenges in data privacy and ethical deployment. Future work should bridge technological advancements with real-world implementation to create equitable and efficient diabetes care systems.

Evaluating the performance and potential bias of predictive models for the detection of transthyretin cardiac amyloidosis

Hourmozdi, J., Easton, N., Benigeri, S., Thomas, J. D., Narang, A., Ouyang, D., Duffy, G., Upton, R., Hawkes, W., Akerman, A., Okwuosa, I., Kline, A., Kho, A. N., Luo, Y., Shah, S. J., Ahmad, F. S.

medRxiv preprint · Jun 2, 2025
Background: Delays in the diagnosis of transthyretin amyloid cardiomyopathy (ATTR-CM) contribute to the significant morbidity of the condition, especially in the era of disease-modifying therapies. Screening for ATTR-CM with AI and other algorithms may improve timely diagnosis, but these algorithms have not been directly compared. Objectives: The aim of this study was to compare the performance of four algorithms for ATTR-CM detection in a heart failure population and to assess the risk of harms due to model bias. Methods: We identified patients with ATTR-CM in an integrated health system from 2010-2022 and age- and sex-matched them to controls with heart failure to target 5% prevalence. We compared the performance of a claims-based random forest model (Huda et al. model), a regression-based score (Mayo ATTR-CM), and two deep learning echo models (EchoNet-LVH and EchoGo® Amyloidosis). We evaluated for bias using standard fairness metrics. Results: The analytical cohort included 176 confirmed cases of ATTR-CM and 3,192 control patients, of whom 79.2% self-identified as White and 9.0% as Black. The Huda et al. model performed poorly (AUC 0.49). Both deep learning echo models had a higher AUC than the Mayo ATTR-CM Score (EchoNet-LVH 0.88; EchoGo Amyloidosis 0.92; Mayo ATTR-CM Score 0.79; DeLong P<0.001 for both). Bias auditing met fairness criteria for equal opportunity among patients who identified as Black. Conclusions: Deep learning, echo-based models for detecting ATTR-CM demonstrated the best overall discrimination compared with two other models in external validation, with a low risk of harms due to racial bias.
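
The fairness audit mentions equal opportunity. As a reference point only, the snippet below computes the true-positive-rate gap between self-identified groups; it does not claim to be the exact metric or threshold used in the study.

```python
import numpy as np

def equal_opportunity_gap(y_true, y_pred, group):
    """Gap in true-positive rate (sensitivity) between patient groups; a small
    gap suggests confirmed cases are detected at similar rates across groups."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tprs = []
    for g in np.unique(group):
        mask = (group == g) & (y_true == 1)      # confirmed cases within group g
        if mask.any():
            tprs.append(y_pred[mask].mean())     # fraction of cases flagged by the model
    return float(max(tprs) - min(tprs))
```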

Synthetic Ultrasound Image Generation for Breast Cancer Diagnosis Using cVAE-WGAN Models: An Approach Based on Generative Artificial Intelligence

Mondillo, G., Masino, M., Colosimo, S., Perrotta, A., Frattolillo, V., Abbate, F. G.

medRxiv preprint · Jun 2, 2025
The scarcity and imbalance of medical image datasets hinder the development of robust computer-aided diagnosis (CAD) systems for breast cancer. This study explores the application of advanced generative models, based on generative artificial intelligence (GenAI), for the synthesis of digital breast ultrasound images. Using a hybrid Conditional Variational Autoencoder-Wasserstein Generative Adversarial Network (cVAE-WGAN) architecture, we developed a system to generate high-quality synthetic images conditioned on the class (malignant vs. normal/benign). These synthetic images, generated from the low-resolution BreastMNIST dataset and filtered for quality, were systematically integrated with real training data at different mixing ratios (W). The performance of a CNN classifier trained on these mixed datasets was evaluated against a baseline model trained only on real data balanced with SMOTE. The optimal integration (mixing weight W=0.25) produced a significant performance increase on the real test set: +8.17% in macro-average F1-score and +4.58% in accuracy compared to using real data alone. Analysis confirmed the originality of the generated samples. This approach offers a promising solution for overcoming data limitations in image-based breast cancer diagnostics, potentially improving the capabilities of CAD systems.
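
To illustrate the reported mixing weight, the helper below builds a training set in which a fraction W of the examples is synthetic; the W=0.25 value comes from the abstract, while the sampling scheme itself is an assumption for illustration.

```python
import random

def mix_training_set(real_samples, synthetic_samples, mixing_weight=0.25, seed=0):
    """Build a training set in which a fraction `mixing_weight` of examples is
    synthetic while the total size matches the real set. W=0.25 is the optimum
    reported in the abstract; the sampling scheme is an illustrative assumption."""
    rng = random.Random(seed)
    n_total = len(real_samples)
    n_synthetic = int(round(mixing_weight * n_total))
    mixed = rng.sample(real_samples, n_total - n_synthetic) + rng.sample(synthetic_samples, n_synthetic)
    rng.shuffle(mixed)
    return mixed
```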
