
Noise-induced self-supervised hybrid UNet transformer for ischemic stroke segmentation with limited data annotations.

Soh WK, Rajapakse JC

PubMed · Jun 5, 2025
We extend the Hybrid UNet Transformer (HUT) foundation model, which combines the strengths of CNN and Transformer architectures, with a noisy self-supervised approach, and demonstrate it on an ischemic stroke lesion segmentation task. We introduce a self-supervised approach based on a noise anchor and show that it can outperform a supervised approach when annotated data are limited. We supplement our pre-training with an additional unannotated CT perfusion dataset to validate the approach. The noisy self-supervised HUT (HUT-NSS) outperforms its supervised counterpart by 2.4% in Dice score. On average, HUT-NSS further outperforms the state-of-the-art network USSLNet by 7.2% in Dice score and 28.1% in Hausdorff distance on CT perfusion scans from the Ischemic Stroke Lesion Segmentation (ISLES2018) dataset. With limited annotations, HUT-NSS gains 7.87% in Dice score over USSLNet when trained on 50% of the annotated data, 7.47% with 10%, and 5.34% with 1%. The code is available at https://github.com/vicsohntu/HUTNSS_CT.
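The abstract does not spell out the noise-anchor objective, but one minimal reading is a consistency loss between a scan and a noise-perturbed view of it during self-supervised pre-training. The PyTorch sketch below illustrates that interpretation only; the `encoder` argument and the Gaussian perturbation are assumptions, not the paper's exact formulation (see the linked repository for the real one).

```python
import torch
import torch.nn.functional as F

def noise_anchor_loss(encoder, x, sigma=0.1):
    """Illustrative self-supervised consistency loss: pull the representation
    of a noise-perturbed 'anchor' view toward that of the clean scan.
    `encoder` and the additive Gaussian noise are assumptions, not the
    paper's exact noise-anchor formulation."""
    noisy = x + sigma * torch.randn_like(x)   # noise-anchored view of the scan
    z_clean = encoder(x)
    z_noisy = encoder(noisy)
    # stop-gradient on the clean branch, as is common in consistency losses
    return F.mse_loss(z_noisy, z_clean.detach())
```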

Average Calibration Losses for Reliable Uncertainty in Medical Image Segmentation

Theodore Barfoot, Luis C. Garcia-Peraza-Herrera, Samet Akcay, Ben Glocker, Tom Vercauteren

arXiv preprint · Jun 4, 2025
Deep neural networks for medical image segmentation are often overconfident, compromising both reliability and clinical utility. In this work, we propose differentiable formulations of marginal L1 Average Calibration Error (mL1-ACE) as an auxiliary loss that can be computed on a per-image basis. We compare both hard- and soft-binning approaches to directly improve pixel-wise calibration. Our experiments on four datasets (ACDC, AMOS, KiTS, BraTS) demonstrate that incorporating mL1-ACE significantly reduces calibration errors, particularly Average Calibration Error (ACE) and Maximum Calibration Error (MCE), while largely maintaining high Dice Similarity Coefficients (DSCs). We find that the soft-binned variant yields the greatest calibration improvements over the Dice-plus-cross-entropy baseline but often compromises segmentation performance, whereas hard-binned mL1-ACE maintains segmentation performance at the cost of weaker calibration gains. To gain further insight into calibration performance and its variability across an imaging dataset, we introduce dataset reliability histograms, an aggregation of per-image reliability diagrams. The resulting analysis highlights improved alignment between predicted confidences and true accuracies. Overall, our approach not only enhances the trustworthiness of segmentation predictions but also shows potential for safer integration of deep learning methods into clinical workflows. We share our code here: https://github.com/cai4cai/Average-Calibration-Losses
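As a rough illustration of a per-image average calibration error with hard versus soft binning, the sketch below computes a binary, single-class variant in PyTorch. The triangular binning kernel and the normalization are assumptions made for brevity; the authors' repository holds the exact mL1-ACE formulation.

```python
import torch
import torch.nn.functional as F

def l1_ace(probs, targets, n_bins=10, soft=True):
    """Per-image L1 average calibration error for one foreground class
    (a minimal sketch of the idea, not the authors' exact loss).
    probs, targets: flattened per-pixel vectors of shape (N,)."""
    centers = (torch.arange(n_bins, device=probs.device) + 0.5) / n_bins
    if soft:
        # soft binning: a triangular kernel gives fractional bin membership,
        # keeping the bin-assignment step differentiable
        w = torch.clamp(1.0 - n_bins * (probs.unsqueeze(1) - centers).abs(), min=0.0)
    else:
        # hard binning: one-hot membership; gradients flow only through
        # the per-bin confidence term below
        idx = torch.clamp((probs * n_bins).long(), max=n_bins - 1)
        w = F.one_hot(idx, n_bins).float()
    mass = w.sum(dim=0)                                         # pixels per bin
    conf = (w * probs.unsqueeze(1)).sum(dim=0) / mass.clamp(min=1e-8)
    acc = (w * targets.float().unsqueeze(1)).sum(dim=0) / mass.clamp(min=1e-8)
    # mean |confidence - accuracy| over non-empty bins
    return (conf - acc).abs()[mass > 0].mean()
```

In practice such a term would be added to a Dice-plus-cross-entropy loss with a small weighting coefficient, which matches the auxiliary-loss role described in the abstract.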

Guiding Registration with Emergent Similarity from Pre-Trained Diffusion Models

Nurislam Tursynbek, Hastings Greer, Basar Demir, Marc Niethammer

arXiv preprint · Jun 3, 2025
Diffusion models, while trained for image generation, have emerged as powerful foundational feature extractors for downstream tasks. We find that off-the-shelf diffusion models, trained exclusively to generate natural RGB images, can identify semantically meaningful correspondences in medical images. Building on this observation, we propose to leverage diffusion model features as a similarity measure to guide deformable image registration networks. We show that common intensity-based similarity losses often fail in challenging scenarios, such as when certain anatomies are visible in one image but absent in another, leading to anatomically inaccurate alignments. In contrast, our method identifies true semantic correspondences, aligning meaningful structures while disregarding those not present across images. We demonstrate superior performance of our approach on two tasks: multimodal 2D registration (DXA to X-Ray) and monomodal 3D registration (brain-extracted to non-brain-extracted MRI). Code: https://github.com/uncbiag/dgir
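A hedged sketch of the core idea follows: compare the fixed image and the warped moving image in the feature space of a pretrained diffusion model instead of raw intensities. The `feat_fn` callable (which network layer, diffusion timestep, and noise conditioning to use) is deliberately left abstract, since those choices belong to the paper and are not reproduced here.

```python
import torch
import torch.nn.functional as F

def diffusion_feature_loss(feat_fn, fixed, warped):
    """Registration similarity loss in diffusion-feature space (sketch).
    feat_fn: assumed callable mapping an image batch (B, C, H, W) to
    intermediate diffusion-model features (B, C', H', W')."""
    f_fix = feat_fn(fixed)
    f_mov = feat_fn(warped)
    # cosine distance per spatial location, averaged over the image;
    # structures absent from one image yield low similarity everywhere
    # and thus exert little anatomically misleading pull
    return 1.0 - F.cosine_similarity(f_fix, f_mov, dim=1).mean()
```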

petBrain: A New Pipeline for Amyloid, Tau Tangles and Neurodegeneration Quantification Using PET and MRI

Pierrick Coupé, Boris Mansencal, Floréal Morandat, Sergio Morell-Ortega, Nicolas Villain, Jose V. Manjón, Vincent Planche

arXiv preprint · Jun 3, 2025
INTRODUCTION: Quantification of amyloid plaques (A), neurofibrillary tangles (T2), and neurodegeneration (N) using PET and MRI is critical for Alzheimer's disease (AD) diagnosis and prognosis. Existing pipelines face limitations regarding processing time, variability in tracer types, and challenges in multimodal integration. METHODS: We developed petBrain, a novel end-to-end processing pipeline for amyloid-PET, tau-PET, and structural MRI. It leverages deep learning-based segmentation, standardized biomarker quantification (Centiloid, CenTauR, HAVAs), and simultaneous estimation of A, T2, and N biomarkers. The pipeline is implemented as a web-based platform, requiring no local computational infrastructure or specialized software knowledge. RESULTS: petBrain provides reliable and rapid biomarker quantification, with results comparable to existing pipelines for A and T2. It shows strong concordance with data processed in ADNI databases. The staging and quantification of A/T2/N by petBrain demonstrated good agreement with CSF/plasma biomarkers, clinical status, and cognitive performance. DISCUSSION: petBrain represents a powerful and openly accessible platform for standardized AD biomarker analysis, facilitating applications in clinical research.
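Among the quantification standards the pipeline adopts, the Centiloid scale (Klunk et al., 2015) is a linear rescaling of a pipeline's amyloid SUVR so that young controls anchor 0 and typical AD anchors 100. A sketch follows; the anchor values in the usage comment are illustrative placeholders, since every pipeline and tracer must calibrate its own anchors.

```python
def centiloid(suvr, suvr_yc, suvr_ad):
    """Centiloid scaling (Klunk et al., 2015): map a pipeline-specific
    amyloid SUVR onto a common 0-100 scale anchored at the young-control
    mean (0) and the typical-AD mean (100)."""
    return 100.0 * (suvr - suvr_yc) / (suvr_ad - suvr_yc)

# Illustrative anchors only -- real values depend on tracer and pipeline:
# centiloid(1.45, suvr_yc=1.05, suvr_ad=2.05)  -> ~40 Centiloids
```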

Open-PMC-18M: A High-Fidelity Large Scale Medical Dataset for Multimodal Representation Learning

Negin Baghbanzadeh, Sajad Ashkezari, Elham Dolatabadi, Arash Afkanpour

arXiv preprint · Jun 3, 2025
Compound figures, which are multi-panel composites containing diverse subfigures, are ubiquitous in biomedical literature, yet large-scale subfigure extraction remains largely unaddressed. Prior work on subfigure extraction has been limited in both dataset size and generalizability, leaving a critical open question: How does high-fidelity image-text alignment via large-scale subfigure extraction impact representation learning in vision-language models? We address this gap by introducing a scalable subfigure extraction pipeline based on transformer-based object detection, trained on a synthetic corpus of 500,000 compound figures, which achieves state-of-the-art performance on both ImageCLEF 2016 and synthetic benchmarks. Using this pipeline, we release OPEN-PMC-18M, a large-scale, high-quality biomedical vision-language dataset comprising 18 million clinically relevant subfigure-caption pairs spanning radiology, microscopy, and visible light photography. We train and evaluate vision-language models on our curated datasets and show improved performance across retrieval, zero-shot classification, and robustness benchmarks, outperforming existing baselines. We release our dataset, models, and code to support reproducible benchmarks and further study into biomedical vision-language modeling and representation learning.
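To make the detection-based extraction step concrete, here is a minimal sketch using an off-the-shelf DETR checkpoint from Hugging Face Transformers as a stand-in detector. The paper trains its own model on 500,000 synthetic compound figures, so the checkpoint name and score threshold below are assumptions for illustration only.

```python
import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

# Generic DETR checkpoint as a stand-in; the paper's detector is
# purpose-trained on synthetic compound figures.
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

def extract_subfigures(path, threshold=0.7):
    """Detect panels in a compound figure and return them as crops."""
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
    results = processor.post_process_object_detection(
        outputs, target_sizes=target_sizes, threshold=threshold)[0]
    # crop each detected box (xmin, ymin, xmax, ymax) out of the figure
    return [image.crop(tuple(round(v) for v in box.tolist()))
            for box in results["boxes"]]
```

Each crop would then be paired with its caption fragment to form the subfigure-caption pairs the dataset is built from.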

PARADIM: A Platform to Support Research at the Interface of Data Science and Medical Imaging.

Lemaréchal Y, Couture G, Pelletier F, Lefol R, Asselin PL, Ouellet S, Bernard J, Ebrahimpour L, Manem VSK, Topalis J, Schachtner B, Jodogne S, Joubert P, Jeblick K, Ingrisch M, Després P

PubMed · Jun 3, 2025
This paper describes PARADIM, a digital infrastructure designed to support research at the interface of data science and medical imaging, with a focus on Research Data Management best practices. The platform is built from open-source components and rooted in the FAIR principles through strict compliance with the DICOM standard. It addresses key needs in data curation, governance, privacy, and scalable resource management. Supporting every stage of the data science discovery cycle, the platform offers robust functionalities for user identity and access management, data de-identification, storage, annotation, and model training and evaluation. Rich metadata are generated throughout the research lifecycle to ensure the traceability and reproducibility of results. PARADIM hosts several medical image collections and allows the automation of large-scale, computationally intensive pipelines (e.g., automatic segmentation, dose calculations, AI model evaluation). The platform fills a gap at the interface of data science and medical imaging, where digital infrastructures are key to the development, evaluation, and deployment of innovative solutions in the real world.
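As one small example of the kind of de-identification step such a platform automates, a pydicom sketch follows. The tag list here is a minimal illustration, not PARADIM's actual profile, which would implement the DICOM standard's de-identification profiles in full.

```python
import pydicom

def deidentify(path_in, path_out):
    """Minimal DICOM de-identification sketch: blank a few direct
    identifiers and drop private tags. Illustrative only -- a compliant
    pipeline covers the full DICOM de-identification profile."""
    ds = pydicom.dcmread(path_in)
    for keyword in ["PatientName", "PatientID", "PatientBirthDate"]:
        if keyword in ds:
            setattr(ds, keyword, "")   # blank the identifier
    ds.remove_private_tags()           # strip vendor-private elements
    ds.save_as(path_out)
```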

MRI super-resolution reconstruction using efficient diffusion probabilistic model with residual shifting.

Safari M, Wang S, Eidex Z, Li Q, Qiu RLJ, Middlebrooks EH, Yu DS, Yang X

PubMed · Jun 3, 2025
Magnetic resonance imaging (MRI) is essential in clinical and research contexts, providing exceptional soft-tissue contrast. However, prolonged acquisition times often lead to patient discomfort and motion artifacts. Diffusion-based deep learning super-resolution (SR) techniques reconstruct high-resolution (HR) images from low-resolution (LR) pairs, but they involve extensive sampling steps, limiting real-time application. To overcome these issues, this study introduces a residual error-shifting mechanism that markedly reduces sampling steps while maintaining vital anatomical details, thereby accelerating MRI reconstruction. We developed Res-SRDiff, a novel diffusion-based SR framework incorporating residual error shifting into the forward diffusion process. This integration aligns the degraded HR and LR distributions, enabling efficient HR image reconstruction. We evaluated Res-SRDiff using ultra-high-field brain T1 MP2RAGE maps and T2-weighted prostate images, benchmarking it against Bicubic, Pix2pix, CycleGAN, SPSR, I2SB, and TM-DDPM methods. Quantitative assessments employed peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), gradient magnitude similarity deviation (GMSD), and learned perceptual image patch similarity (LPIPS). Additionally, we qualitatively and quantitatively assessed the proposed framework's individual components through an ablation study and conducted a Likert-based image quality evaluation. Res-SRDiff significantly surpassed most comparison methods regarding PSNR, SSIM, and GMSD for both datasets, with statistically significant improvements (p-values ≪ 0.05). The model achieved high-fidelity image reconstruction using only four sampling steps, drastically reducing computation time to under one second per slice. In contrast, traditional methods like TM-DDPM and I2SB required approximately 20 and 38 seconds per slice, respectively. Qualitative analysis showed Res-SRDiff effectively preserved fine anatomical details and lesion morphologies. The Likert study indicated that our method received the highest scores, 4.14 ± 0.77 (brain) and 4.80 ± 0.40 (prostate). Res-SRDiff demonstrates efficiency and accuracy, markedly improving computational speed and image quality. Incorporating residual error shifting into diffusion-based SR facilitates rapid, robust HR image reconstruction, enhancing clinical MRI workflow and advancing medical imaging research. Code available at https://github.com/mosaf/Res-SRDiff.
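Residual shifting (in the spirit of ResShift, which this line of work builds on) replaces the usual diffusion to pure noise with a drift from the HR image toward the LR image, which is why so few reverse steps suffice. The sketch below shows one plausible forward step; the schedule eta_t, the noise scale kappa, and the exact parameterisation are assumptions, not Res-SRDiff's published equations.

```python
import torch

def residual_shift_forward(hr, lr, eta_t, kappa=1.0):
    """One forward 'residual shifting' step (illustrative): the state
    moves from the HR image toward the LR image by the degradation
    residual, plus schedule-scaled Gaussian noise.
    eta_t in [0, 1] is the shifting schedule at step t (assumed form)."""
    residual = lr - hr                       # HR-to-LR degradation residual
    noise = torch.randn_like(hr)
    return hr + eta_t * residual + kappa * (eta_t ** 0.5) * noise
```

At eta_t = 0 the state is the HR image; at eta_t = 1 it is a noisy LR image, so the reverse model only has to undo the degradation rather than reconstruct from pure noise.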

Disease-Grading Networks with Asymmetric Gaussian Distribution for Medical Imaging.

Tang W, Yang Z

PubMed · Jun 2, 2025
Deep learning-based disease grading technologies facilitate timely medical intervention due to their high efficiency and accuracy. Recent advancements have enhanced grading performance by incorporating the ordinal relationships of disease labels. However, existing methods often assume identical probability distributions for disease labels across instances within the same category, overlooking variations in label distributions. Additionally, the hyperparameters of these distributions are typically determined empirically, which may not accurately reflect the true distribution. To address these limitations, we propose a disease grading network utilizing a sample-aware asymmetric Gaussian label distribution, termed DGN-AGLD. This approach includes a variance predictor designed to learn and predict parameters that control the asymmetry of the Gaussian distribution, enabling distinct label distributions within the same category. This module can be seamlessly integrated into standard deep learning networks. Experimental results on four disease datasets validate the effectiveness and superiority of the proposed method, particularly on the IDRiD dataset, where it achieves a diabetic retinopathy accuracy of 77.67%. Furthermore, our method extends to joint disease grading tasks, yielding superior results and demonstrating significant generalization capabilities. Visual analysis indicates that our method more accurately captures the trend of disease progression by leveraging the asymmetry in label distribution. Our code is publicly available at https://github.com/ahtwq/AGNet.
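The core construction, a per-sample asymmetric Gaussian distribution over ordinal grades with learned left and right widths, can be sketched as follows. The parameterisation and softmax normalisation are illustrative assumptions, not the paper's exact definitions; `sigma_l` and `sigma_r` stand in for the outputs of the learned variance predictor.

```python
import torch

def asymmetric_gaussian_labels(y, sigma_l, sigma_r, n_classes):
    """Per-sample asymmetric Gaussian label distribution over ordinal
    grades (illustrative). y: (B,) integer grades; sigma_l, sigma_r: (B,)
    learned left/right widths from an assumed variance predictor."""
    grades = torch.arange(n_classes, dtype=torch.float32)        # (C,)
    d = grades.unsqueeze(0) - y.float().unsqueeze(1)             # (B, C)
    # grades below the true label use sigma_l, grades above use sigma_r,
    # giving each sample its own asymmetric spread
    sigma = torch.where(d < 0, sigma_l.unsqueeze(1), sigma_r.unsqueeze(1))
    logits = -0.5 * (d / sigma) ** 2
    return torch.softmax(logits, dim=1)  # normalised soft labels per sample

# A grading network would then be trained with, e.g., KL divergence
# between its predicted distribution and these soft labels.
```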

Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models.

Lian C, Zhou HY, Liang D, Qin J, Wang L

PubMed · Jun 2, 2025
Medical vision-language alignment through cross-modal contrastive learning shows promising performance in image-text matching tasks, such as retrieval and zero-shot classification. However, conventional cross-modal contrastive learning (CLIP-based) methods suffer from suboptimal visual representation capabilities, which also limits their effectiveness in vision-language alignment. In contrast, although the models pretrained via multimodal masked modeling struggle with direct cross-modal matching, they excel in visual representation. To address this contradiction, we propose ALTA (ALign Through Adapting), an efficient medical vision-language alignment method that utilizes only about 8% of the trainable parameters and less than 1/5 of the computational consumption required for masked record modeling. ALTA achieves superior performance in vision-language matching tasks like retrieval and zero-shot classification by adapting the pretrained vision model from masked record modeling. Additionally, we integrate temporal-multiview radiograph inputs to enhance the information consistency between radiographs and their corresponding descriptions in reports, further improving the vision-language alignment. Experimental evaluations show that ALTA outperforms the best-performing counterpart by over 4% absolute points in text-to-image accuracy and approximately 6% absolute points in image-to-text retrieval accuracy. The adaptation of vision-language models during efficient alignment also promotes better vision and language understanding. Code is publicly available at https://github.com/DopamineLcy/ALTA.
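A common recipe for adapting a frozen masked-modeling vision encoder with a small parameter budget is a residual bottleneck adapter trained with a CLIP-style contrastive loss, which matches the ~8%-of-parameters figure in the abstract in spirit. The sketch below illustrates that general recipe; ALTA's actual adapter design, placement, and loss may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Residual bottleneck adapter (sketch): a small trainable module
    inserted into a frozen pretrained vision encoder, so only a few
    percent of parameters are updated during alignment."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(F.gelu(self.down(x)))  # residual update

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over matched image-report pairs in a batch."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2
```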