
A fully open AI foundation model applied to chest radiography.

Ma D, Pang J, Gotway MB, Liang J

PubMed · Jun 11, 2025
Chest radiography frequently serves as baseline imaging for most lung diseases [1]. Deep learning has great potential for automating the interpretation of chest radiography [2]. However, existing chest radiographic deep learning models are limited in diagnostic scope, generalizability, adaptability, robustness and extensibility. To overcome these limitations, we have developed Ark+, a foundation model applied to chest radiography and pretrained by cyclically accruing and reusing the knowledge from heterogeneous expert labels in numerous datasets. Ark+ excels in diagnosing thoracic diseases. It expands the diagnostic scope and addresses potential misdiagnosis. It can adapt to evolving diagnostic needs and respond to novel diseases. It can learn rare conditions from a few samples and transfer to new diagnostic settings without training. It tolerates data biases and long-tailed distributions, and it supports federated learning to preserve privacy. All code and pretrained models have been released, so that Ark+ is open for fine-tuning, local adaptation and improvement. It is extensible to several modalities. Thus, it is a foundation model for medical imaging. The exceptional capabilities of Ark+ stem from our insight: aggregating various datasets diversifies the patient populations and accrues knowledge from many experts to yield unprecedented performance while reducing annotation costs [3]. The development of Ark+ reveals that open models trained by accruing and reusing knowledge from heterogeneous expert annotations with a multitude of public (big or small) datasets can surpass the performance of proprietary models trained on large data. We hope that our findings will inspire more researchers to share code and datasets or federate privacy-preserving data to create open foundation models with diverse, global expertise and patient populations, thus accelerating open science and democratizing AI for medicine.
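
The cyclic accrual and reuse of heterogeneous expert labels described above can be pictured as a multi-dataset pretraining loop in which a shared backbone feeds one classification head per source dataset. The sketch below is only a minimal illustration of that idea; the backbone choice, head sizes, optimizer and loss are assumptions, not the released Ark+ code.

```python
# Minimal sketch of multi-dataset pretraining with one head per label space.
# Illustrative only: backbone, head sizes, optimizer and loss are assumptions,
# not the released Ark+ implementation.
import torch
import torch.nn as nn
import torchvision.models as tvm

class MultiHeadModel(nn.Module):
    def __init__(self, num_labels_per_dataset: dict[str, int]):
        super().__init__()
        backbone = tvm.resnet50(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()              # shared feature extractor
        self.backbone = backbone
        # One multi-label head per source dataset (heterogeneous label spaces).
        self.heads = nn.ModuleDict(
            {name: nn.Linear(feat_dim, n) for name, n in num_labels_per_dataset.items()}
        )

    def forward(self, x: torch.Tensor, dataset_name: str) -> torch.Tensor:
        return self.heads[dataset_name](self.backbone(x))

def pretrain_cyclically(model: MultiHeadModel, loaders: dict, epochs: int = 1) -> None:
    """Cycle through the datasets, reusing the shared backbone for every label space."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for name, loader in loaders.items():     # one cyclic pass over all datasets
            for images, labels in loader:
                loss = bce(model(images, name), labels.float())
                opt.zero_grad()
                loss.backward()
                opt.step()
```

Cycling over datasets in this way lets a single backbone accrue knowledge from every label space without forcing the heterogeneous labels into one unified schema.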

A Privacy-Preserving Federated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck

Ciro Benito Raggio, Paolo Zaffino, Maria Francesca Spadea

arXiv preprint · Jun 10, 2025
Shortened abstract: Cone-beam computed tomography (CBCT) has become a widely adopted modality for image-guided radiotherapy (IGRT). However, CBCT suffers from increased noise, limited soft-tissue contrast, and artifacts, resulting in unreliable Hounsfield unit values and hindering direct dose calculation. Synthetic CT (sCT) generation from CBCT addresses these issues, especially using deep learning (DL) methods. Existing approaches are limited by institutional heterogeneity, scanner-dependent variations, and data privacy regulations that prevent multi-center data sharing. To overcome these challenges, we propose a cross-silo horizontal federated learning (FL) approach for CBCT-to-sCT synthesis in the head and neck region, extending our FedSynthCT framework. A conditional generative adversarial network was collaboratively trained on data from three European medical centers in the public SynthRAD2025 challenge dataset. The federated model demonstrated effective generalization across centers, with mean absolute error (MAE) ranging from 64.38 ± 13.63 to 85.90 ± 7.10 HU, structural similarity index (SSIM) from 0.882 ± 0.022 to 0.922 ± 0.039, and peak signal-to-noise ratio (PSNR) from 32.86 ± 0.94 to 34.91 ± 1.04 dB. Notably, on an external validation dataset of 60 patients, comparable performance was achieved (MAE: 75.22 ± 11.81 HU, SSIM: 0.904 ± 0.034, PSNR: 33.52 ± 2.06 dB) without additional training, confirming robust generalization despite protocol and scanner differences and registration errors. These findings demonstrate the technical feasibility of FL for CBCT-to-sCT synthesis while preserving data privacy, and offer a collaborative solution for developing generalizable models across institutions without centralized data sharing or site-specific fine-tuning.
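
Cross-silo horizontal FL of the kind described here is commonly implemented by averaging locally trained weights. The sketch below shows one communication round under that assumption; it is not the FedSynthCT code, and the client interface (train_locally, num_samples) is hypothetical.

```python
# Minimal federated-averaging sketch for cross-silo training.
# Illustrative only: the client API (train_locally, num_samples) is hypothetical,
# and this is not the FedSynthCT implementation.
import copy
import torch

def federated_average(local_state_dicts, num_samples):
    """Average client weights, weighted by each client's local dataset size."""
    total = float(sum(num_samples))
    avg = copy.deepcopy(local_state_dicts[0])
    for key in avg:
        avg[key] = sum(
            sd[key].float() * (n / total)
            for sd, n in zip(local_state_dicts, num_samples)
        )
    return avg

def run_round(global_model: torch.nn.Module, clients) -> torch.nn.Module:
    """One round: broadcast the global model, train locally, aggregate."""
    states, sizes = [], []
    for client in clients:                 # e.g. the three participating centres
        local = copy.deepcopy(global_model)
        client.train_locally(local)        # hypothetical client-side training call
        states.append(local.state_dict())
        sizes.append(client.num_samples)   # hypothetical attribute
    global_model.load_state_dict(federated_average(states, sizes))
    return global_model
```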

The RSNA Lumbar Degenerative Imaging Spine Classification (LumbarDISC) Dataset

Tyler J. Richards, Adam E. Flanders, Errol Colak, Luciano M. Prevedello, Robyn L. Ball, Felipe Kitamura, John Mongan, Maryam Vazirabad, Hui-Ming Lin, Anne Kendell, Thanat Kanthawang, Salita Angkurawaranon, Emre Altinmakas, Hakan Dogan, Paulo Eduardo de Aguiar Kuriki, Arjuna Somasundaram, Christopher Ruston, Deniz Bulja, Naida Spahovic, Jennifer Sommer, Sirui Jiang, Eduardo Moreno Judice de Mattos Farina, Eduardo Caminha Nunes, Michael Brassil, Megan McNamara, Johanna Ortiz, Jacob Peoples, Vinson L. Uytana, Anthony Kam, Venkata N. S. Dola, Daniel Murphy, David Vu, Dataset Contributor Group, Dataset Annotator Group, Competition Data Notebook Group, Jason F. Talbott

arXiv preprint · Jun 10, 2025
The Radiological Society of North America (RSNA) Lumbar Degenerative Imaging Spine Classification (LumbarDISC) dataset is the largest publicly available dataset of adult MRI lumbar spine examinations annotated for degenerative changes. The dataset includes 2,697 patients with a total of 8,593 image series from 8 institutions across 6 countries and 5 continents. The dataset is available for free for non-commercial use via Kaggle and the RSNA Medical Imaging Resource of AI (MIRA). The dataset was created for the RSNA 2024 Lumbar Spine Degenerative Classification competition, in which competitors developed deep learning models to grade degenerative changes in the lumbar spine. The degree of spinal canal, subarticular recess, and neural foraminal stenosis was graded at each intervertebral disc level in the lumbar spine. The images were annotated by expert volunteer neuroradiologists and musculoskeletal radiologists from the RSNA, the American Society of Neuroradiology, and the American Society of Spine Radiology. This dataset aims to facilitate research and development in machine learning and lumbar spine imaging, ultimately leading to improved patient care and clinical efficiency.
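
For readers planning to build models on LumbarDISC, the annotations amount to one severity grade per condition per intervertebral level. The dataclass below is one hypothetical way to hold such a record; the field names and grade vocabulary are assumptions rather than the official Kaggle schema.

```python
# Hypothetical representation of a per-level LumbarDISC-style annotation record.
# Field names and grade vocabulary are assumptions, not the official schema.
from dataclasses import dataclass
from typing import Literal

Grade = Literal["normal_mild", "moderate", "severe"]
Level = Literal["L1/L2", "L2/L3", "L3/L4", "L4/L5", "L5/S1"]

@dataclass
class DiscLevelGrading:
    study_id: str
    level: Level
    spinal_canal_stenosis: Grade
    left_neural_foraminal_narrowing: Grade
    right_neural_foraminal_narrowing: Grade
    left_subarticular_stenosis: Grade
    right_subarticular_stenosis: Grade

example = DiscLevelGrading(
    study_id="case_0001",
    level="L4/L5",
    spinal_canal_stenosis="moderate",
    left_neural_foraminal_narrowing="normal_mild",
    right_neural_foraminal_narrowing="severe",
    left_subarticular_stenosis="normal_mild",
    right_subarticular_stenosis="moderate",
)
```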

HAIBU-ReMUD: Reasoning Multimodal Ultrasound Dataset and Model Bridging to General Specific Domains

Shijie Wang, Yilun Zhang, Zeyu Lai, Dexing Kong

arXiv preprint · Jun 9, 2025
Multimodal large language models (MLLMs) have shown great potential in general domains but perform poorly in some specific domains due to a lack of domain-specific data, such as image-text or video-text data. In some specific domains, abundant graphic and textual data are scattered around, but they lack standardized arrangement. In the field of medical ultrasound, there are ultrasonic diagnostic books, ultrasonic clinical guidelines, ultrasonic diagnostic reports, and so on. However, these ultrasonic materials are often saved in the form of PDFs, images, etc., and cannot be directly used for the training of MLLMs. This paper proposes a novel image-text reasoning supervised fine-tuning data generation pipeline to create domain-specific quadruplets (image, question, thinking trace, and answer) from domain-specific materials. A medical ultrasound domain dataset, ReMUD, is established, containing over 45,000 reasoning and non-reasoning supervised fine-tuning Question Answering (QA) and Visual Question Answering (VQA) data. The ReMUD-7B model, fine-tuned on Qwen2.5-VL-7B-Instruct, outperforms general-domain MLLMs in the medical ultrasound field. To facilitate research, the ReMUD dataset, data generation codebase, and ReMUD-7B parameters will be released at https://github.com/ShiDaizi/ReMUD, addressing the data shortage issue in specific-domain MLLMs.
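
The generated quadruplets (image, question, thinking trace, answer) map naturally onto a JSONL supervised fine-tuning format. The snippet below is a hypothetical illustration of such a record; the exact field names and layout used by ReMUD may differ, so consult the linked repository.

```python
# Hypothetical JSONL layout for (image, question, thinking trace, answer) quadruplets.
# Field names are illustrative; the actual ReMUD format may differ.
import json

record = {
    "image": "ultrasound/liver_0001.png",   # path to the ultrasound frame
    "question": "What abnormality is visible in this liver ultrasound?",
    "thinking": "The lesion is hyperechoic with well-defined margins, "
                "consistent with a benign haemangioma rather than a cyst.",
    "answer": "Hepatic haemangioma.",
}

with open("remud_sft.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```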

Advancing respiratory disease diagnosis: A deep learning and vision transformer-based approach with a novel X-ray dataset.

Alghadhban A, Ramadan RA, Alazmi M

PubMed · Jun 9, 2025
With the increasing prevalence of respiratory diseases such as pneumonia and COVID-19, timely and accurate diagnosis is critical. This paper makes significant contributions to the field of respiratory disease classification by utilizing X-ray images and advanced machine learning techniques such as deep learning (DL) and Vision Transformers (ViT). First, the paper systematically reviews current diagnostic methodologies, analyzing recent advances in DL and ViT techniques through a comprehensive analysis of review articles published between 2017 and 2024, excluding short reviews and overviews. The review not only analyses existing knowledge but also identifies critical gaps in the field, including the lack of comprehensive and diverse datasets for training machine learning models. To address such limitations, the paper extensively evaluates DL-based models on publicly available datasets, analyzing key performance metrics such as accuracy, precision, recall, and F1-score. Our evaluations reveal that current datasets are mostly limited to narrow subsets of pulmonary diseases, which can lead to challenges including overfitting, poor generalization, and reduced applicability of advanced machine learning techniques in real-world settings; DL and ViT models, for instance, require extensive data for effective learning. The primary contribution of this paper is therefore not only the review of the most recent articles and surveys on respiratory diseases and DL models, including ViT, but also the introduction of a novel, diverse dataset comprising 7867 X-ray images from 5263 patients across three local hospitals, covering 49 distinct pulmonary diseases. The dataset is expected to enhance DL and ViT model training and improve the generalization of those models in various real-world medical imaging scenarios. By addressing the data scarcity issue, this paper paves the way for more reliable and robust disease classification, improving clinical decision-making. Additionally, the article highlights critical challenges that still need to be addressed, such as dataset bias and variations in X-ray image quality, as well as the need for further clinical validation. Furthermore, the study underscores the critical role of DL in medical diagnosis and the necessity of comprehensive, well-annotated datasets for improving model robustness and clinical reliability. Through these contributions, the paper provides a foundation for future research on respiratory disease diagnosis using AI-driven methodologies. Although the paper aims to cover all work published between 2017 and 2024, this research has some limitations: foundational work published before 2017 falls outside the review period, and the rapid development of AI may make earlier methods less relevant.
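
The reported performance metrics (accuracy, precision, recall and F1-score) can be reproduced for any classifier on these datasets with a few lines of scikit-learn; the sketch below assumes multi-class ground-truth labels and model predictions are already available, with placeholder values.

```python
# Computing the reported evaluation metrics for a multi-class chest X-ray classifier.
# y_true / y_pred are placeholders; plug in real labels and model predictions.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 2, 1, 1, 0, 2]      # ground-truth class indices (placeholder)
y_pred = [0, 2, 1, 0, 0, 2]      # model predictions (placeholder)

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```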

APTOS-2024 challenge report: Generation of synthetic 3D OCT images from fundus photographs

Bowen Liu, Weiyi Zhang, Peranut Chotcomwongse, Xiaolan Chen, Ruoyu Chen, Pawin Pakaymaskul, Niracha Arjkongharn, Nattaporn Vongsa, Xuelian Cheng, Zongyuan Ge, Kun Huang, Xiaohui Li, Yiru Duan, Zhenbang Wang, BaoYe Xie, Qiang Chen, Huazhu Fu, Michael A. Mahr, Jiaqi Qu, Wangyiyang Chen, Shiye Wang, Yubo Tan, Yongjie Li, Mingguang He, Danli Shi, Paisan Ruamviboonsuk

arXiv preprint · Jun 9, 2025
Optical Coherence Tomography (OCT) provides high-resolution, 3D, and non-invasive visualization of retinal layers in vivo, serving as a critical tool for lesion localization and disease diagnosis. However, its widespread adoption is limited by equipment costs and the need for specialized operators. In comparison, 2D color fundus photography offers faster acquisition and greater accessibility with less dependence on expensive devices. Although generative artificial intelligence has demonstrated promising results in medical image synthesis, translating 2D fundus images into 3D OCT images presents unique challenges due to inherent differences in data dimensionality and biological information between modalities. To advance generative models in the fundus-to-3D-OCT setting, the Asia Pacific Tele-Ophthalmology Society (APTOS-2024) organized a challenge titled Artificial Intelligence-based OCT Generation from Fundus Images. This paper details the challenge framework (referred to as the APTOS-2024 Challenge), including the benchmark dataset; the evaluation methodology, featuring two fidelity metrics: an image-based distance (pixel-level OCT B-scan similarity) and a video-based distance (semantic-level volumetric consistency); and an analysis of top-performing solutions. The challenge attracted 342 participating teams, with 42 preliminary submissions and 9 finalists. Leading methodologies incorporated innovations in hybrid data preprocessing or augmentation (cross-modality collaborative paradigms), pre-training on external ophthalmic imaging datasets, integration of vision foundation models, and model architecture improvements. The APTOS-2024 Challenge is the first benchmark demonstrating the feasibility of fundus-to-3D-OCT synthesis as a potential solution for improving ophthalmic care accessibility in under-resourced healthcare settings, while helping to expedite medical research and clinical applications.
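
The two fidelity metrics could be prototyped roughly as below: a pixel-level distance averaged over B-scans and a semantic distance between volume-level embeddings. This is a hedged sketch under assumed metric choices (MAE and cosine distance over pooled features), not the official challenge evaluation code.

```python
# Hedged sketch of the two kinds of fidelity metrics described for APTOS-2024:
# a pixel-level B-scan distance and a semantic, volume-level distance.
# Metric choices (MAE, cosine distance over pooled features) are assumptions,
# not the official challenge implementation.
import numpy as np

def image_based_distance(pred_volume: np.ndarray, gt_volume: np.ndarray) -> float:
    """Mean absolute error averaged over all B-scans of a (slices, H, W) volume."""
    return float(np.mean(np.abs(pred_volume.astype(np.float32) -
                                gt_volume.astype(np.float32))))

def video_based_distance(pred_feats: np.ndarray, gt_feats: np.ndarray) -> float:
    """Cosine distance between per-volume embeddings (e.g. pooled per-slice features)."""
    p = pred_feats.mean(axis=0)   # pool (slices, dim) features into one vector
    g = gt_feats.mean(axis=0)
    cos = np.dot(p, g) / (np.linalg.norm(p) * np.linalg.norm(g) + 1e-8)
    return float(1.0 - cos)
```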

Reliable Evaluation of MRI Motion Correction: Dataset and Insights

Kun Wang, Tobit Klug, Stefan Ruschke, Jan S. Kirschke, Reinhard Heckel

arXiv preprint · Jun 6, 2025
Correcting motion artifacts in MRI is important, as they can hinder accurate diagnosis. However, evaluating deep learning-based and classical motion correction methods remains fundamentally difficult due to the lack of accessible ground-truth target data. To address this challenge, we study three evaluation approaches: real-world evaluation based on reference scans, simulated motion, and reference-free evaluation, each with its merits and shortcomings. To enable evaluation with real-world motion artifacts, we release PMoC3D, a dataset consisting of unprocessed Paired Motion-Corrupted 3D brain MRI data. To advance evaluation quality, we introduce MoMRISim, a feature-space metric trained for evaluating motion reconstructions. We assess each evaluation approach and find that real-world evaluation together with MoMRISim, while not perfect, is the most reliable. Evaluation based on simulated motion systematically exaggerates algorithm performance, and reference-free evaluation overrates oversmoothed deep learning outputs.
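
A feature-space metric of the kind MoMRISim represents compares reconstructions in the embedding space of a trained network rather than in pixel space. The sketch below illustrates the general idea with a generic ImageNet backbone; it is not the MoMRISim metric, whose feature extractor is trained specifically for motion reconstructions.

```python
# Generic feature-space distance between a motion-corrected image and a reference.
# Illustrative only: MoMRISim uses its own purpose-trained feature extractor,
# not an ImageNet ResNet.
import torch
import torchvision.models as tvm

_backbone = tvm.resnet18(weights=tvm.ResNet18_Weights.DEFAULT)
_backbone.fc = torch.nn.Identity()      # keep pooled features, drop the classifier
_backbone.eval()

@torch.no_grad()
def feature_space_distance(recon: torch.Tensor, reference: torch.Tensor) -> float:
    """L2 distance between backbone features of two (1, 3, H, W) image tensors."""
    f_recon = _backbone(recon)
    f_ref = _backbone(reference)
    return torch.linalg.norm(f_recon - f_ref).item()
```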

Deep learning-enabled MRI phenotyping uncovers regional body composition heterogeneity and disease associations in two European population cohorts

Mertens, C. J., Haentze, H., Ziegelmayer, S., Kather, J. N., Truhn, D., Kim, S. H., Busch, F., Weller, D., Wiestler, B., Graf, M., Bamberg, F., Schlett, C. L., Weiss, J. B., Ringhof, S., Can, E., Schulz-Menger, J., Niendorf, T., Lammert, J., Molwitz, I., Kader, A., Hering, A., Meddeb, A., Nawabi, J., Schulze, M. B., Keil, T., Willich, S. N., Krist, L., Hadamitzky, M., Hannemann, A., Bassermann, F., Rueckert, D., Pischon, T., Hapfelmeier, A., Makowski, M. R., Bressem, K. K., Adams, L. C.

medRxiv preprint · Jun 6, 2025
Body mass index (BMI) does not account for substantial inter-individual differences in regional fat and muscle compartments, which are relevant for the prevalence of cardiometabolic and cancer conditions. We applied a validated deep learning pipeline for automated segmentation of whole-body MRI scans in 45,851 adults from the UK Biobank and German National Cohort, enabling harmonized quantification of visceral (VAT), gluteofemoral (GFAT), and abdominal subcutaneous adipose tissue (ASAT), liver fat fraction (LFF), and trunk muscle volume. Associations with clinical conditions were evaluated using compartment measures adjusted for age, sex, height, and BMI. Our analysis demonstrates that regional adiposity and muscle volume show distinct associations with cardiometabolic and cancer prevalence, and that substantial disease heterogeneity exists within BMI strata. The analytic framework and reference data presented here will support future risk stratification efforts and facilitate the integration of automated MRI phenotyping into large-scale population and clinical research.
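
The association analysis described, compartment measures adjusted for age, sex, height and BMI, corresponds to a standard covariate-adjusted regression. The sketch below shows one hypothetical way to fit such a model with statsmodels on simulated data; the variable names, outcome and effect sizes are placeholders, not cohort fields or study results.

```python
# Hypothetical covariate-adjusted association between VAT and disease prevalence.
# Column names, the simulated outcome and effect sizes are placeholders,
# not UK Biobank / NAKO fields or study findings.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.normal(55, 10, n),
    "sex": rng.choice(["F", "M"], n),
    "height_cm": rng.normal(172, 9, n),
    "bmi": rng.normal(27, 4, n),
    "vat_litres": rng.normal(4.0, 1.5, n),
})
# Simulated binary outcome with a positive VAT effect, for illustration only.
logit_p = -6 + 0.04 * df["age"] + 0.5 * df["vat_litres"] + 0.05 * df["bmi"]
df["has_condition"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# Logistic regression of disease status on VAT, adjusted for age, sex, height and BMI.
model = smf.logit(
    "has_condition ~ vat_litres + age + C(sex) + height_cm + bmi", data=df
)
print(model.fit(disp=0).summary())
```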

Prenatal detection of congenital heart defects using the deep learning-based image and video analysis: protocol for Clinical Artificial Intelligence in Fetal Echocardiography (CAIFE), an international multicentre multidisciplinary study.

Patey O, Hernandez-Cruz N, D'Alberti E, Salovic B, Noble JA, Papageorghiou AT

PubMed · Jun 5, 2025
Congenital heart defect (CHD) is a significant, rapidly emerging global problem in child health and a leading cause of neonatal and childhood death. Prenatal detection of CHDs with the help of ultrasound allows better perinatal management of such pregnancies, leading to reduced neonatal mortality, morbidity and developmental complications. However, there is wide variation in reported fetal heart problem detection rates, from 34% to 85%, with some low- and middle-income countries detecting as few as 9.3% of cases before birth. Research has shown that deep learning-based or more general artificial intelligence (AI) models can support the detection of fetal CHDs more rapidly than humans performing ultrasound scans. Progress in this AI-based research depends on the availability of large, well-curated and diverse data of ultrasound images and videos of normal and abnormal fetal hearts. Currently, CHD detection based on AI models is not accurate enough for practical clinical use, in part due to the lack of ultrasound data available for machine learning as CHDs are rare and heterogeneous, the retrospective nature of published studies, the lack of multicentre and multidisciplinary collaboration, and the utilisation of mostly still images of standard planes of the fetal heart for AI models. Our aim is to develop AI models that could support clinicians in detecting fetal CHDs in real time, particularly in nonspecialist or low-resource settings where fetal echocardiography expertise is not readily available. We have designed the Clinical Artificial Intelligence Fetal Echocardiography (CAIFE) study as an international multicentre multidisciplinary collaboration led by a clinical and an engineering team at the University of Oxford. This study involves five multicountry hospital sites for data collection (Oxford, UK (n=1), London, UK (n=3) and Southport, Australia (n=1)). We plan to curate 14 000 retrospective ultrasound scans of fetuses with normal hearts (n=13 000) and fetuses with CHDs (n=1000), as well as 2400 prospective ultrasound cardiac scans, including the proposed research-specific CAIFE 10 s video sweeps, from fetuses with normal hearts (n=2000) and fetuses diagnosed with major CHDs (n=400). This gives a total of 16 400 retrospective and prospective ultrasound scans from the participating hospital sites. We will build, train and validate computational models capable of differentiating between normal fetal hearts and those diagnosed with CHDs and of recognising specific types of CHDs. Data will be analysed using statistical metrics, namely sensitivity, specificity and accuracy, including positive and negative predictive values for each outcome, compared with manual assessment. We will disseminate the findings through regional, national and international conferences and through peer-reviewed journals. The study was approved by the Health Research Authority, Care Research Wales and the Research Ethics Committee (Ref: 23/EM/0023; IRAS Project ID: 317510) on 8 March 2023. All collaborating hospitals have obtained the local trust research and development approvals.
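
The planned statistical metrics (sensitivity, specificity and positive/negative predictive values against manual assessment) follow directly from a 2x2 confusion table, as in the short sketch below with placeholder labels.

```python
# Sensitivity, specificity, PPV and NPV of model output vs. manual assessment.
# Labels here are placeholders (1 = CHD detected, 0 = normal heart).
from sklearn.metrics import confusion_matrix

y_manual = [1, 0, 1, 1, 0, 0, 1, 0]    # expert assessment (placeholder)
y_model  = [1, 0, 1, 0, 0, 0, 1, 1]    # model prediction (placeholder)

tn, fp, fn, tp = confusion_matrix(y_manual, y_model).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
      f"PPV={ppv:.2f} NPV={npv:.2f}")
```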

ReXVQA: A Large-scale Visual Question Answering Benchmark for Generalist Chest X-ray Understanding

Ankit Pal, Jung-Oh Lee, Xiaoman Zhang, Malaikannan Sankarasubbu, Seunghyeon Roh, Won Jung Kim, Meesun Lee, Pranav Rajpurkar

arXiv preprint · Jun 4, 2025
We present ReXVQA, the largest and most comprehensive benchmark for visual question answering (VQA) in chest radiology, comprising approximately 696,000 questions paired with 160,000 chest X-ray studies across training, validation, and test sets. Unlike prior efforts that rely heavily on template-based queries, ReXVQA introduces a diverse and clinically authentic task suite reflecting five core radiological reasoning skills: presence assessment, location analysis, negation detection, differential diagnosis, and geometric reasoning. We evaluate eight state-of-the-art multimodal large language models, including MedGemma-4B-it, Qwen2.5-VL, Janus-Pro-7B, and Eagle2-9B. The best-performing model (MedGemma) achieves 83.24% overall accuracy. To bridge the gap between AI performance and clinical expertise, we conducted a comprehensive human reader study involving 3 radiology residents on 200 randomly sampled cases. Our evaluation demonstrates that MedGemma achieved superior performance (83.84% accuracy) compared to human readers (best radiology resident: 77.27%), representing a significant milestone where AI performance exceeds expert human evaluation on chest X-ray interpretation. The reader study reveals distinct performance patterns between AI models and human experts, with strong inter-reader agreement among radiologists and more variable agreement between human readers and AI models. ReXVQA establishes a new standard for evaluating generalist radiological AI systems, offering public leaderboards, fine-grained evaluation splits, structured explanations, and category-level breakdowns. This benchmark lays the foundation for next-generation AI systems capable of mimicking expert-level clinical reasoning beyond narrow pathology classification. Our dataset will be open-sourced at https://huggingface.co/datasets/rajpurkarlab/ReXVQA
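
Once the dataset is released at the Hugging Face URL above, scoring a model's overall accuracy could look roughly like the sketch below. The split name and column names ("test", "answer") are assumptions about the eventual release, not a documented schema.

```python
# Hedged sketch: loading ReXVQA from the Hugging Face Hub and scoring accuracy.
# The split and column names ("test", "answer") are assumptions about the
# eventual release, not a documented schema.
from datasets import load_dataset

def evaluate(predict_fn, split: str = "test") -> float:
    ds = load_dataset("rajpurkarlab/ReXVQA", split=split)
    correct = 0
    for example in ds:
        prediction = predict_fn(example)          # your multimodal model call
        correct += int(prediction.strip().lower() == example["answer"].strip().lower())
    return correct / len(ds)

# Example usage with a trivial baseline that always answers "no finding":
# accuracy = evaluate(lambda ex: "no finding")
```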
