Page 25 of 35350 results

MRI super-resolution reconstruction using efficient diffusion probabilistic model with residual shifting.

Safari M, Wang S, Eidex Z, Li Q, Qiu RLJ, Middlebrooks EH, Yu DS, Yang X

PubMed paper, Jun 3 2025
Magnetic resonance imaging (MRI) is essential in clinical and research contexts, providing exceptional soft-tissue contrast. However, prolonged acquisition times often lead to patient discomfort and motion artifacts. Diffusion-based deep learning super-resolution (SR) techniques reconstruct high-resolution (HR) images from low-resolution (LR) pairs, but they involve extensive sampling steps, limiting real-time application. To overcome these issues, this study introduces a residual error-shifting mechanism that markedly reduces sampling steps while maintaining vital anatomical details, thereby accelerating MRI reconstruction. We developed Res-SRDiff, a novel diffusion-based SR framework incorporating residual error shifting into the forward diffusion process. This integration aligns the degraded HR and LR distributions, enabling efficient HR image reconstruction. We evaluated Res-SRDiff using ultra-high-field brain T1 MP2RAGE maps and T2-weighted prostate images, benchmarking it against Bicubic, Pix2pix, CycleGAN, SPSR, I2SB, and TM-DDPM methods. Quantitative assessments employed peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), gradient magnitude similarity deviation (GMSD), and learned perceptual image patch similarity (LPIPS). Additionally, we qualitatively and quantitatively assessed the proposed framework's individual components through an ablation study and conducted a Likert-based image quality evaluation. Res-SRDiff significantly surpassed most comparison methods regarding PSNR, SSIM, and GMSD for both datasets, with statistically significant improvements (p-values ≪ 0.05). The model achieved high-fidelity image reconstruction using only four sampling steps, drastically reducing computation time to under one second per slice. In contrast, traditional methods like TM-DDPM and I2SB required approximately 20 and 38 seconds per slice, respectively. Qualitative analysis showed Res-SRDiff effectively preserved fine anatomical details and lesion morphologies. The Likert study indicated that our method received the highest scores, 4.14 ± 0.77 (brain) and 4.80 ± 0.40 (prostate). Res-SRDiff demonstrates efficiency and accuracy, markedly improving computational speed and image quality. Incorporating residual error shifting into diffusion-based SR facilitates rapid, robust HR image reconstruction, enhancing clinical MRI workflow and advancing medical imaging research. Code available at https://github.com/mosaf/Res-SRDiff.
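
For readers unfamiliar with the first metric named in this abstract, the following is a minimal sketch of the standard PSNR definition, 10·log10(data_range² / MSE). It is not taken from the Res-SRDiff repository; the function name `psnr` and the `data_range` parameter are illustrative choices, and images are assumed to be arrays of equal shape scaled to `[0, data_range]`.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    `data_range` is the maximum possible intensity span (assumed 1.0 here,
    i.e. images normalized to [0, 1]).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)
```

Higher values indicate a reconstruction closer to the reference; SSIM, GMSD, and LPIPS capture structural and perceptual aspects that PSNR alone does not.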

Open-PMC-18M: A High-Fidelity Large Scale Medical Dataset for Multimodal Representation Learning

Negin Baghbanzadeh, Sajad Ashkezari, Elham Dolatabadi, Arash Afkanpour

arXiv preprint, Jun 3 2025
Compound figures, which are multi-panel composites containing diverse subfigures, are ubiquitous in biomedical literature, yet large-scale subfigure extraction remains largely unaddressed. Prior work on subfigure extraction has been limited in both dataset size and generalizability, leaving a critical open question: How does high-fidelity image-text alignment via large-scale subfigure extraction impact representation learning in vision-language models? We address this gap by introducing a scalable subfigure extraction pipeline based on transformer-based object detection, trained on a synthetic corpus of 500,000 compound figures, and achieving state-of-the-art performance on both ImageCLEF 2016 and synthetic benchmarks. Using this pipeline, we release OPEN-PMC-18M, a large-scale, high-quality biomedical vision-language dataset comprising 18 million clinically relevant subfigure-caption pairs spanning radiology, microscopy, and visible light photography. We train and evaluate vision-language models on our curated datasets and show improved performance across retrieval, zero-shot classification, and robustness benchmarks, outperforming existing baselines. We release our dataset, models, and code to support reproducible benchmarks and further study into biomedical vision-language modeling and representation learning.

Disease-Grading Networks with Asymmetric Gaussian Distribution for Medical Imaging.

Tang W, Yang Z

PubMed paper, Jun 2 2025
Deep learning-based disease grading technologies facilitate timely medical intervention due to their high efficiency and accuracy. Recent advancements have enhanced grading performance by incorporating the ordinal relationships of disease labels. However, existing methods often assume the same probability distribution for disease labels across instances within the same category, overlooking variations in label distributions. Additionally, the hyperparameters of these distributions are typically determined empirically, which may not accurately reflect the true distribution. To address these limitations, we propose a disease grading network utilizing a sample-aware asymmetric Gaussian label distribution, termed DGN-AGLD. This approach includes a variance predictor designed to learn and predict parameters that control the asymmetry of the Gaussian distribution, enabling distinct label distributions within the same category. This module can be seamlessly integrated into standard deep learning networks. Experimental results on four disease datasets validate the effectiveness and superiority of the proposed method, particularly on the IDRiD dataset, where it achieves a diabetic retinopathy accuracy of 77.67%. Furthermore, our method extends to joint disease grading tasks, yielding superior results and demonstrating significant generalization capabilities. Visual analysis indicates that our method more accurately captures the trend of disease progression by leveraging the asymmetry in label distribution. Our code is publicly available at https://github.com/ahtwq/AGNet.
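
To make the idea of an asymmetric Gaussian label distribution concrete, here is a small sketch under stated assumptions. It is not the authors' formulation: the function name and the `sigma_left` / `sigma_right` parameters are hypothetical, and the two spreads are passed in by hand, whereas the abstract describes a variance predictor that learns them per sample.

```python
import numpy as np

def asymmetric_gaussian_labels(true_grade, num_grades, sigma_left, sigma_right):
    """Soft label distribution over ordinal grades 0..num_grades-1.

    Grades below `true_grade` decay with `sigma_left`, grades at or above it
    with `sigma_right` (both hypothetical, hand-chosen here).
    """
    grades = np.arange(num_grades, dtype=np.float64)
    # Pick a different spread on each side of the true grade.
    sigma = np.where(grades < true_grade, sigma_left, sigma_right)
    weights = np.exp(-((grades - true_grade) ** 2) / (2.0 * sigma ** 2))
    # Normalize so the weights form a probability distribution.
    return weights / weights.sum()
```

With `sigma_right` larger than `sigma_left`, more probability mass falls on the more severe neighbouring grades than on the milder ones, which is one way a label distribution can encode the direction of disease progression.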

Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models.

Lian C, Zhou HY, Liang D, Qin J, Wang L

PubMed paper, Jun 2 2025
Medical vision-language alignment through cross-modal contrastive learning shows promising performance in image-text matching tasks, such as retrieval and zero-shot classification. However, conventional cross-modal contrastive learning (CLIP-based) methods suffer from suboptimal visual representation capabilities, which also limits their effectiveness in vision-language alignment. In contrast, although the models pretrained via multimodal masked modeling struggle with direct cross-modal matching, they excel in visual representation. To address this contradiction, we propose ALTA (ALign Through Adapting), an efficient medical vision-language alignment method that utilizes only about 8% of the trainable parameters and less than 1/5 of the computational consumption required for masked record modeling. ALTA achieves superior performance in vision-language matching tasks like retrieval and zero-shot classification by adapting the pretrained vision model from masked record modeling. Additionally, we integrate temporal-multiview radiograph inputs to enhance the information consistency between radiographs and their corresponding descriptions in reports, further improving the vision-language alignment. Experimental evaluations show that ALTA outperforms the best-performing counterpart by over 4% absolute points in text-to-image accuracy and approximately 6% absolute points in image-to-text retrieval accuracy. The adaptation of vision-language models during efficient alignment also promotes better vision and language understanding. Code is publicly available at https://github.com/DopamineLcy/ALTA.
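
As background for the cross-modal contrastive learning this abstract refers to, the sketch below shows a generic CLIP-style symmetric contrastive loss over a batch of paired image and text embeddings. It is not ALTA's training code; the function name, the NumPy implementation, and the `temperature` default of 0.07 are assumptions made for illustration.

```python
import numpy as np

def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Generic CLIP-style loss; row i of each matrix is a matched pair."""
    image_emb = np.asarray(image_emb, dtype=np.float64)
    text_emb = np.asarray(text_emb, dtype=np.float64)
    # L2-normalize so the dot product is cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature

    def cross_entropy_diag(scores):
        # Cross-entropy where the correct class for row i is column i.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))
```

The loss is small when each image embedding is most similar to its own report embedding and large when pairs are mismatched, which is what makes the learned space usable for retrieval and zero-shot classification.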

UniBrain: Universal Brain MRI diagnosis with hierarchical knowledge-enhanced pre-training.

Lei J, Dai L, Jiang H, Wu C, Zhang X, Zhang Y, Yao J, Xie W, Zhang Y, Li Y, Zhang Y, Wang Y

PubMed paper, Jun 1 2025
Magnetic Resonance Imaging (MRI) has become a pivotal tool in diagnosing brain diseases, with a wide array of computer-aided artificial intelligence methods being proposed to enhance diagnostic accuracy. However, early studies were often limited by small-scale datasets and a narrow range of disease types, which posed challenges in model generalization. This study presents UniBrain, a hierarchical knowledge-enhanced pre-training framework designed for universal brain MRI diagnosis. UniBrain leverages a large-scale dataset comprising 24,770 imaging-report pairs from routine diagnostics for pre-training. Unlike previous approaches that either focused solely on visual representation learning or used brute-force alignment between vision and language, the framework introduces a hierarchical alignment mechanism. This mechanism extracts structured knowledge from free-text clinical reports at multiple granularities, enabling vision-language alignment at both the sequence and case levels, thereby significantly improving feature learning efficiency. A coupled vision-language perception module is further employed for text-guided multi-label classification, which facilitates zero-shot evaluation and fine-tuning of downstream tasks without modifying the model architecture. UniBrain is validated on both in-domain and out-of-domain datasets, consistently surpassing existing state-of-the-art diagnostic models and demonstrating performance on par with radiologists in specific disease categories. It shows strong generalization capabilities across diverse tasks, highlighting its potential for broad clinical application. The code is available at https://github.com/ljy19970415/UniBrain.

Efficient slice anomaly detection network for 3D brain MRI Volume.

Zhang Z, Mohsenzadeh Y

PubMed paper, Jun 1 2025
Current anomaly detection methods excel with benchmark industrial data but struggle with natural images and medical data due to varying definitions of 'normal' and 'abnormal.' This makes accurate identification of deviations in these fields particularly challenging. For 3D brain MRI data in particular, all the state-of-the-art models are reconstruction-based and use 3D convolutional neural networks, which are memory-intensive and time-consuming and produce noisy outputs that require further post-processing. We propose a framework called Simple Slice-based Network (SimpleSliceNet), which utilizes a model pre-trained on ImageNet and fine-tuned on a separate MRI dataset as a 2D slice feature extractor to reduce computational cost. We aggregate the extracted features to perform anomaly detection tasks on 3D brain MRI volumes. Our model integrates a conditional normalizing flow to calculate the log-likelihood of features and employs a contrastive loss to enhance anomaly detection accuracy. The results indicate improved performance, showcasing our model's remarkable adaptability and effectiveness when addressing the challenges that exist in brain MRI data. In addition, for large-scale 3D brain volumes, our model SimpleSliceNet outperforms the state-of-the-art 2D and 3D models in terms of accuracy, memory usage and time consumption. Code is available at: https://github.com/Jarvisarmy/SimpleSliceNet.

Automated Ensemble Multimodal Machine Learning for Healthcare.

Imrie F, Denner S, Brunschwig LS, Maier-Hein K, van der Schaar M

PubMed paper, Jun 1 2025
The application of machine learning in medicine and healthcare has led to the creation of numerous diagnostic and prognostic models. However, despite their success, current approaches generally issue predictions using data from a single modality. This stands in stark contrast with clinician decision-making which employs diverse information from multiple sources. While several multimodal machine learning approaches exist, significant challenges in developing multimodal systems remain that are hindering clinical adoption. In this paper, we introduce a multimodal framework, AutoPrognosis-M, that enables the integration of structured clinical (tabular) data and medical imaging using automated machine learning. AutoPrognosis-M incorporates 17 imaging models, including convolutional neural networks and vision transformers, and three distinct multimodal fusion strategies. In an illustrative application using a multimodal skin lesion dataset, we highlight the importance of multimodal machine learning and the power of combining multiple fusion strategies using ensemble learning. We have open-sourced our framework as a tool for the community and hope it will accelerate the uptake of multimodal machine learning in healthcare and spur further innovation.

FedBCD: Federated Ultrasound Video and Image Joint Learning for Breast Cancer Diagnosis.

Deng T, Huang C, Cai M, Liu Y, Liu M, Lin J, Shi Z, Zhao B, Huang J, Liang C, Han G, Liu Z, Wang Y, Han C

PubMed paper, Jun 1 2025
Ultrasonography plays an essential role in breast cancer diagnosis. Current deep-learning-based studies train models on either images or videos in a centralized manner, without considering the joint benefits of the two modalities or the privacy issues of data centralization. In this study, we propose the first decentralized learning solution for joint learning with breast ultrasound video and image, called FedBCD. To enable the model to learn from images and videos simultaneously and seamlessly in client-level local training, we propose a Joint Ultrasound Video and Image Learning (JUVIL) model to bridge the dimension gap between video and image data by incorporating temporal and spatial adapters. The parameter-efficient design of JUVIL, with trainable adapters and a frozen backbone, further reduces the computational cost and communication burden of federated learning, improving overall efficiency. Moreover, conventional model-wise aggregation may lead to unstable federated training because of differing modalities and data capacities across clients and differing functionalities across layers. We therefore propose a Fisher information matrix (FIM) guided layer-wise aggregation method named FILA. By measuring layer-wise sensitivity with FIM, FILA assigns higher contributions to the clients with lower sensitivity, improving personalized performance during federated training. Extensive experiments on three image clients and one video client demonstrate the benefits of the joint learning architecture, especially for clients with small-scale data. FedBCD significantly outperforms nine federated learning methods on both video-based and image-based diagnoses, demonstrating its superiority and potential for clinical practice. Code is released at https://github.com/tianpeng-deng/FedBCD.

Adaptive Weighting Based Metal Artifact Reduction in CT Images.

Wang H, Wu Y, Wang Y, Wei D, Wu X, Ma J, Zheng Y

PubMed paper, Jun 1 2025
For the metal artifact reduction (MAR) task in computed tomography (CT) imaging, most existing deep-learning-based approaches select a single Hounsfield unit (HU) window followed by a normalization operation to preprocess CT images. However, in practical clinical scenarios, different body tissues and organs are often inspected under varying window settings for good contrast. Methods trained on a single fixed window remove metal artifacts insufficiently when transferred to other windows. To alleviate this problem, a few works have proposed reconstructing the CT images under multiple-window configurations. Although they achieve good reconstruction performance for different windows, they directly supervise the learning for each window with equal weighting based on the training set. To improve the learning flexibility and model generalizability, in this paper, we propose an adaptive weighting algorithm, called AdaW, for multiple-window metal artifact reduction, which can be applied to different deep MAR network backbones. Specifically, we first formulate the multiple-window learning task as a bi-level optimization problem. Then we derive an adaptive weighting optimization algorithm in which the learning process for MAR under each window is automatically weighted via a learning-to-learn paradigm based on the training set and validation set. Its rationality is substantiated through theoretical analysis. Based on different network backbones, experimental comparisons on five datasets covering different body sites comprehensively validate the effectiveness of AdaW in improving generalization performance, as well as its good applicability. We will release the code at https://github.com/hongwang01/AdaW.
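
The windowing and normalization step mentioned in this abstract can be sketched as follows. This is a generic illustration of HU windowing, not code from the AdaW repository; the function name `apply_hu_window` and its `center` / `width` parameters are hypothetical, and any window values passed to it are up to the caller.

```python
import numpy as np

def apply_hu_window(hu_image, center, width):
    """Clip a CT image (in Hounsfield units) to a window and scale to [0, 1]."""
    low = center - width / 2.0
    high = center + width / 2.0
    # Values outside the window saturate at its edges.
    clipped = np.clip(np.asarray(hu_image, dtype=np.float64), low, high)
    # Linear rescale so `low` maps to 0 and `high` maps to 1.
    return (clipped - low) / (high - low)
```

Calling the function several times with different `center` and `width` pairs yields the multiple-window views of the same slice that a multi-window method would supervise, whether with equal or adaptively learned weights.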