
VGS-ATD: Robust Distributed Learning for Multi-Label Medical Image Classification Under Heterogeneous and Imbalanced Conditions

Zehui Zhao, Laith Alzubaidi, Haider A. Alwzwazy, Jinglan Zhang, Yuantong Gu

arXiv preprint · Jul 23, 2025
In recent years, advanced deep learning architectures have shown strong performance in medical imaging tasks. However, the traditional centralized learning paradigm poses serious privacy risks, as all data are collected and trained on a single server. To mitigate this challenge, decentralized approaches such as federated learning and swarm learning have emerged, allowing models to be trained on local nodes while sharing only model weights. While these methods enhance privacy, they struggle with heterogeneous and imbalanced data and suffer from inefficiencies due to frequent communication and weight aggregation. More critically, the dynamic and complex nature of clinical environments demands scalable AI systems capable of continuously learning from diverse modalities and multiple labels. Yet both centralized and decentralized models are prone to catastrophic forgetting during system expansion, often requiring full retraining to incorporate new data. To address these limitations, we propose VGS-ATD, a novel distributed learning framework. To validate VGS-ATD, we evaluated it in experiments spanning 30 datasets and 80 independent labels across distributed nodes. VGS-ATD achieved an overall accuracy of 92.7%, outperforming centralized learning (84.9%) and swarm learning (72.99%), while federated learning failed under these conditions because of its high computational resource requirements. VGS-ATD also demonstrated strong scalability, with only a 1% drop in accuracy on existing nodes after expansion, compared with a 20% drop for centralized learning, highlighting its resilience to catastrophic forgetting. Additionally, it reduced computational costs by up to 50% relative to both centralized and swarm learning, confirming its superior efficiency and scalability.
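As context for the weight-sharing baselines the abstract contrasts against, below is a minimal sketch of the standard FedAvg-style aggregation step used by federated and swarm learning, where nodes exchange model weights rather than data. This is not the VGS-ATD method; node counts and sample sizes are illustrative assumptions.

```python
# Minimal sketch of the weight-aggregation step used by federated/swarm
# learning baselines (FedAvg-style averaging). NOT the VGS-ATD method;
# node count and per-node sample sizes below are hypothetical.
import torch


def average_state_dicts(state_dicts, weights=None):
    """Average a list of model state_dicts, optionally weighted by node size."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    avg = {}
    for key in state_dicts[0]:
        avg[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return avg


# Example: three nodes holding unequal amounts of local data.
nodes = [torch.nn.Linear(16, 4) for _ in range(3)]
sizes = [120, 300, 80]  # hypothetical per-node sample counts
weights = [s / sum(sizes) for s in sizes]
global_weights = average_state_dicts([n.state_dict() for n in nodes], weights)

global_model = torch.nn.Linear(16, 4)
global_model.load_state_dict(global_weights)
```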

CAPRI-CT: Causal Analysis and Predictive Reasoning for Image Quality Optimization in Computed Tomography

Sneha George Gnanakalavathy, Hairil Abdul Razak, Robert Meertens, Jonathan E. Fieldsend, Xujiong Ye, Mohammed M. Abdelsamea

arXiv preprint · Jul 23, 2025
In computed tomography (CT), achieving high image quality while minimizing radiation exposure remains a key clinical challenge. This paper presents CAPRI-CT, a novel causal-aware deep learning framework for Causal Analysis and Predictive Reasoning for Image Quality Optimization in CT imaging. CAPRI-CT integrates image data with acquisition metadata (such as tube voltage, tube current, and contrast agent types) to model the underlying causal relationships that influence image quality. An ensemble of Variational Autoencoders (VAEs) is employed to extract meaningful features and generate causal representations from observational data, including CT images and associated imaging parameters. These input features are fused to predict the Signal-to-Noise Ratio (SNR) and support counterfactual inference, enabling what-if simulations, such as changes in contrast agents (types and concentrations) or scan parameters. CAPRI-CT is trained and validated using an ensemble learning approach, achieving strong predictive performance. By facilitating both prediction and interpretability, CAPRI-CT provides actionable insights that could help radiologists and technicians design more efficient CT protocols without repeated physical scans. The source code and dataset are publicly available at https://github.com/SnehaGeorge22/capri-ct.
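To make the fusion idea concrete, here is a minimal sketch of combining a CT image encoding with acquisition metadata (tube voltage, tube current, contrast agent type) to regress SNR and run a simple what-if comparison. It is not the authors' implementation: the VAE ensemble and causal machinery are omitted, and all layer sizes, parameter values, and names are assumptions.

```python
# Minimal sketch (not the CAPRI-CT implementation) of fusing an image
# encoding with acquisition metadata to regress SNR; sizes are assumptions.
import torch
import torch.nn as nn


class SNRRegressor(nn.Module):
    def __init__(self, n_contrast_types=4, meta_dim=2, latent_dim=64):
        super().__init__()
        # Image branch: a small CNN encoder standing in for the VAE encoder.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, latent_dim),
        )
        # Metadata branch: continuous parameters plus an embedded contrast type.
        self.contrast_embed = nn.Embedding(n_contrast_types, 8)
        self.meta_mlp = nn.Sequential(nn.Linear(meta_dim + 8, 32), nn.ReLU())
        # Fusion head predicting a scalar SNR.
        self.head = nn.Sequential(nn.Linear(latent_dim + 32, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, image, kvp_ma, contrast_type):
        z_img = self.encoder(image)
        z_meta = self.meta_mlp(torch.cat(
            [kvp_ma, self.contrast_embed(contrast_type)], dim=1))
        return self.head(torch.cat([z_img, z_meta], dim=1)).squeeze(1)


# Counterfactual "what-if": same image, different contrast agent type.
model = SNRRegressor()
img = torch.randn(1, 1, 128, 128)
meta = torch.tensor([[120.0, 200.0]])  # hypothetical kVp and mA values
snr_a = model(img, meta, torch.tensor([0]))
snr_b = model(img, meta, torch.tensor([2]))  # swap contrast type only
```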

MaskedCLIP: Bridging the Masked and CLIP Space for Semi-Supervised Medical Vision-Language Pre-training

Lei Zhu, Jun Zhou, Rick Siow Mong Goh, Yong Liu

arXiv preprint · Jul 23, 2025
Foundation models have recently gained tremendous popularity in medical image analysis. State-of-the-art methods leverage either paired image-text data via vision-language pre-training or unpaired image data via self-supervised pre-training to learn foundation models with generalizable image features that boost downstream task performance. However, learning foundation models exclusively on either paired or unpaired image data limits their ability to learn richer and more comprehensive image features. In this paper, we investigate a novel task termed semi-supervised vision-language pre-training, aiming to fully harness the potential of both paired and unpaired image data for foundation model learning. To this end, we propose MaskedCLIP, a synergistic masked image modeling and contrastive language-image pre-training framework for semi-supervised vision-language pre-training. The key challenge in combining paired and unpaired image data for learning a foundation model lies in the incompatible feature spaces derived from these two types of data. To address this issue, we propose to connect the masked feature space with the CLIP feature space via a bridge transformer. In this way, the more semantically specific CLIP features can benefit from the more general masked features for semantic feature extraction. We further propose a masked knowledge distillation loss to distill semantic knowledge of the original image features in the CLIP feature space back to the predicted masked image features in the masked feature space. With this mutually interactive design, our framework effectively leverages both paired and unpaired image data to learn more generalizable image features for downstream tasks. Extensive experiments on retinal image analysis demonstrate the effectiveness and data efficiency of our method.
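Below is a minimal sketch of the general idea behind bridging a masked-image-modeling feature space to a CLIP feature space with a distillation-style objective. The simple MLP "bridge", feature dimensions, and cosine loss are assumptions for illustration, not the authors' exact bridge transformer or loss.

```python
# Minimal sketch of bridging MIM features into CLIP space and distilling
# toward frozen CLIP targets. Dimensions, the MLP "bridge", and the cosine
# objective are assumptions, not the MaskedCLIP design itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

mim_dim, clip_dim = 768, 512  # hypothetical feature widths

# Stand-in for the bridge transformer: a simple projection.
bridge = nn.Sequential(nn.Linear(mim_dim, clip_dim), nn.GELU(),
                       nn.Linear(clip_dim, clip_dim))


def masked_distillation_loss(masked_feats, clip_feats):
    """Align predicted masked-image features with CLIP-space targets."""
    pred = F.normalize(bridge(masked_feats), dim=-1)
    target = F.normalize(clip_feats.detach(), dim=-1)  # teacher stays frozen
    return (1.0 - (pred * target).sum(dim=-1)).mean()  # cosine distance


# Example batch: features from the MIM encoder and the CLIP image encoder.
masked_feats = torch.randn(8, mim_dim)
clip_feats = torch.randn(8, clip_dim)
loss = masked_distillation_loss(masked_feats, clip_feats)
loss.backward()
```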

Anatomically Based Multitask Deep Learning Radiomics Nomogram Predicts the Implant Failure Risk in Sinus Floor Elevation.

Zhu Y, Liu Y, Zhao Y, Lu Q, Wang W, Chen Y, Ji P, Chen T

PubMed paper · Jul 23, 2025
To develop and assess the performance of an anatomically based multitask deep learning radiomics nomogram (AMDRN) system to predict implant failure risk before maxillary sinus floor elevation (MSFE) while incorporating automated segmentation of key anatomical structures. We retrospectively collected patients' preoperative cone beam computed tomography (CBCT) images and electronic medical records (EMRs). First, the nnU-Net v2 model was optimized to segment the maxillary sinus (MS), Schneiderian membrane (SM), and residual alveolar bone (RAB). Based on the segmentation mask, a deep learning model (3D-Attention-ResNet) and a radiomics model were developed to extract 3D features from CBCT scans, generating the DL Score and Rad Score. Significant clinical features were also extracted from EMRs to build a clinical model. These components were then integrated using logistic regression (LR) to create the AMDRN model, which includes a visualization module to support clinical decision-making. Segmentation of MS, RAB, and SM achieved high Dice coefficients on the test set, with values of 99.50% ± 0.84%, 92.53% ± 3.78%, and 91.58% ± 7.16%, respectively. On an independent test set, the Clinical model, Radiomics model, 3D-DL model, and AMDRN model achieved prediction accuracies of 60%, 76%, 82%, and 90%, respectively, with AMDRN achieving the highest AUC of 93%. The AMDRN system enables efficient preoperative prediction of implant failure risk in MSFE and accurate segmentation of critical anatomical structures, supporting personalized treatment planning and clinical risk management.
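A minimal sketch of the fusion step the abstract describes, in which the DL Score, Rad Score, and a clinical score are combined with logistic regression to predict implant failure risk. The data below are synthetic placeholders, not the study's cohort.

```python
# Minimal sketch of LR fusion of sub-model scores into a failure-risk
# predictor, as described in the abstract. Data are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 200
# Columns: [DL Score, Rad Score, clinical score] per patient (synthetic).
scores = rng.uniform(0, 1, size=(n, 3))
labels = (scores.mean(axis=1) + rng.normal(0, 0.15, n) > 0.5).astype(int)

fusion = LogisticRegression().fit(scores[:150], labels[:150])
probs = fusion.predict_proba(scores[150:])[:, 1]
print("held-out AUC:", roc_auc_score(labels[150:], probs))
```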

Re-identification of patients from imaging features extracted by foundation models.

Nebbia G, Kumar S, McNamara SM, Bridge C, Campbell JP, Chiang MF, Mandava N, Singh P, Kalpathy-Cramer J

PubMed paper · Jul 22, 2025
Foundation models for medical imaging are a prominent research topic, but the risks associated with the imaging features they capture have not been explored. We aimed to assess whether imaging features from foundation models enable patient re-identification and to relate re-identification to demographic feature prediction. Our data included Colour Fundus Photos (CFP), Optical Coherence Tomography (OCT) b-scans, and chest x-rays, and we reported re-identification rates of 40.3%, 46.3%, and 25.9%, respectively. We reported varying performance on demographic feature prediction depending on re-identification status (e.g., AUC-ROC for gender from CFP is 82.1% for re-identified images vs. 76.8% for non-re-identified ones). When training a deep learning model on the re-identification task, we reported image-level performance of 82.3%, 93.9%, and 63.7% on our internal CFP, OCT, and chest x-ray data. We showed that imaging features extracted from foundation models in ophthalmology and radiology include information that can lead to patient re-identification.
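For intuition, here is a minimal sketch of one common way a re-identification rate can be measured from frozen embeddings: nearest-neighbour matching between two sets of images by cosine similarity. The paper's exact matching protocol may differ, and the embeddings below are random placeholders.

```python
# Minimal sketch of top-1 re-identification from frozen embeddings via
# cosine nearest-neighbour matching. The study's actual protocol may differ;
# the "gallery"/"query" features here are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_patients, dim = 100, 512
gallery = rng.normal(size=(n_patients, dim))             # e.g., visit-1 features
query = gallery + 0.5 * rng.normal(size=gallery.shape)   # e.g., visit-2 features


def reid_rate(query, gallery):
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    nearest = (q @ g.T).argmax(axis=1)          # index of best gallery match
    return (nearest == np.arange(len(query))).mean()


print("top-1 re-identification rate:", reid_rate(query, gallery))
```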

Pyramid Hierarchical Masked Diffusion Model for Imaging Synthesis

Xiaojiao Xiao, Qinmin Vivian Hu, Guanghui Wang

arXiv preprint · Jul 22, 2025
Medical image synthesis plays a crucial role in clinical workflows, addressing the common issue of missing imaging modalities due to factors such as extended scan times, scan corruption, artifacts, patient motion, and intolerance to contrast agents. This paper presents a novel image synthesis network, the Pyramid Hierarchical Masked Diffusion Model (PHMDiff), which employs a multi-scale hierarchical approach for more detailed control over synthesizing high-quality images across different resolutions and layers. Specifically, the model uses random multi-scale, high-proportion masks to speed up diffusion model training and to balance detail fidelity with overall structure. The Transformer-based diffusion process incorporates cross-granularity regularization, modeling mutual-information consistency across each granularity's latent space and thereby enhancing pixel-level perceptual accuracy. Comprehensive experiments on two challenging datasets demonstrate that PHMDiff achieves superior performance in both Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM), highlighting its ability to produce high-quality synthesized images with excellent structural integrity. Ablation studies further confirm the contributions of each component. Furthermore, PHMDiff, a multi-scale image synthesis framework operating across and within medical imaging modalities, shows significant advantages over other methods. The source code is available at https://github.com/xiaojiao929/PHMDiff
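A minimal sketch of generating random, high-proportion masks at several scales, in the spirit of the multi-scale masking the abstract describes for accelerating diffusion training. Patch sizes and the mask ratio are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of random multi-scale, high-proportion masking applied to
# an image, illustrating the masking idea only. Patch sizes and ratio are
# assumptions, not PHMDiff's configuration.
import torch
import torch.nn.functional as F


def multiscale_mask(image_size=256, patch_sizes=(8, 16, 32), mask_ratio=0.75):
    """Return one binary mask per scale (1 = kept pixel, 0 = masked)."""
    masks = []
    for p in patch_sizes:
        g = image_size // p                       # grid of p x p patches
        keep = (torch.rand(1, 1, g, g) > mask_ratio).float()
        masks.append(F.interpolate(keep, scale_factor=p, mode="nearest"))
    return masks


image = torch.randn(1, 1, 256, 256)
masked_views = [image * m for m in multiscale_mask()]
```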

MLRU++: Multiscale Lightweight Residual UNETR++ with Attention for Efficient 3D Medical Image Segmentation

Nand Kumar Yadav, Rodrigue Rizk, William CW Chen, KC

arXiv preprint · Jul 22, 2025
Accurate and efficient medical image segmentation is crucial but challenging due to anatomical variability and high computational demands on volumetric data. Recent hybrid CNN-Transformer architectures achieve state-of-the-art results but add significant complexity. In this paper, we propose MLRU++, a Multiscale Lightweight Residual UNETR++ architecture designed to balance segmentation accuracy and computational efficiency. It introduces two key innovations: a Lightweight Channel and Bottleneck Attention Module (LCBAM) that enhances contextual feature encoding with minimal overhead, and a Multiscale Bottleneck Block (M2B) in the decoder that captures fine-grained details via multi-resolution feature aggregation. Experiments on four publicly available benchmark datasets (Synapse, BTCV, ACDC, and Decathlon Lung) demonstrate that MLRU++ achieves state-of-the-art performance, with average Dice scores of 87.57% (Synapse), 93.00% (ACDC), and 81.12% (Lung). Compared to existing leading models, MLRU++ improves Dice scores by 5.38% and 2.12% on Synapse and ACDC, respectively, while significantly reducing parameter count and computational cost. Ablation studies evaluating LCBAM and M2B further confirm the effectiveness of the proposed architectural components. Results suggest that MLRU++ offers a practical and high-performing solution for 3D medical image segmentation tasks. Source code is available at: https://github.com/1027865/MLRUPP
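To illustrate the kind of lightweight channel attention the abstract alludes to, here is a minimal squeeze-and-excitation-style gating block for 3D volumes. The exact LCBAM design is not given in the abstract; this structure and the reduction ratio are assumptions for illustration only.

```python
# Minimal sketch of lightweight channel attention for 3D feature maps,
# in the spirit of (but not identical to) the LCBAM named in the abstract.
import torch
import torch.nn as nn


class LightChannelAttention3D(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)          # global context per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                            # x: (B, C, D, H, W)
        b, c = x.shape[:2]
        gate = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1, 1)
        return x * gate                              # re-weight channels


x = torch.randn(2, 32, 16, 64, 64)
out = LightChannelAttention3D(32)(x)
```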

SarAdapter: Prioritizing Attention on Semantic-Aware Representative Tokens for Enhanced Medical Image Segmentation.

Jiang W, Li Y, Liu Z, An L, Quellec G, Ou C

PubMed paper · Jul 22, 2025
Transformer-based segmentation methods exhibit considerable potential in medical image analysis. However, their improved performance often comes with increased computational complexity, limiting their application in resource-constrained medical settings. Prior methods follow two independent tracks: (i) accelerating existing networks via semantic-aware routing, and (ii) optimizing token adapter design to enhance network performance. Despite their directness, these approaches suffer from unavoidable defects (e.g., inflexible acceleration techniques or non-discriminative processing) that limit further improvement of the quality-complexity trade-off. To address these shortcomings, we integrate the two schemes by proposing the semantic-aware adapter (SarAdapter), which employs a semantic-based routing strategy leveraging neural operators (ViT and CNN) of varying complexities. Specifically, it merges semantically similar tokens into low-resolution regions while preserving semantically distinct tokens as high-resolution regions. Additionally, we introduce a Mixed-adapter unit, which adaptively selects convolutional operators of varying complexities to better model regions at different scales. We evaluate our method on four medical datasets from three modalities and show that it achieves a superior balance between accuracy, model size, and efficiency. Notably, our proposed method achieves state-of-the-art segmentation quality on the Synapse dataset while reducing the number of tokens by 65.6%, signifying a substantial improvement in the efficiency of ViTs for the segmentation task.
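As a rough illustration of token reduction by semantic similarity (the routing principle the abstract describes), below is a minimal bipartite merge of similar tokens by cosine similarity. This simple rule and its threshold are assumptions, not SarAdapter's actual routing strategy.

```python
# Minimal sketch of merging semantically similar tokens to reduce token
# count; similar tokens are averaged, distinct tokens are kept. This merge
# rule is illustrative only, not SarAdapter's routing.
import torch
import torch.nn.functional as F


def merge_similar_tokens(tokens, threshold=0.9):
    """tokens: (N, D). Merge each odd-index token into its most similar
    even-index token when cosine similarity exceeds the threshold."""
    src, dst = tokens[1::2], tokens[0::2]
    sim = F.normalize(src, dim=-1) @ F.normalize(dst, dim=-1).T
    best_sim, best_idx = sim.max(dim=-1)

    merged = dst.clone()
    kept_src = []
    for i, (s, j) in enumerate(zip(best_sim, best_idx)):
        if s > threshold:
            merged[j] = (merged[j] + src[i]) / 2    # average into the match
        else:
            kept_src.append(src[i])                 # keep distinct token
    return torch.cat([merged] + ([torch.stack(kept_src)] if kept_src else []))


tokens = torch.randn(196, 64)                        # e.g., a 14x14 ViT grid
reduced = merge_similar_tokens(tokens)
print(tokens.shape[0], "->", reduced.shape[0], "tokens")
```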

Artificial intelligence in thyroid eye disease imaging: A systematic review.

Zhang H, Li Z, Chan HC, Song X, Zhou H, Fan X

PubMed paper · Jul 22, 2025
Thyroid eye disease (TED) is a common, complex orbital disorder characterized by soft-tissue changes visible on imaging. Artificial intelligence (AI) offers promise for improving TED diagnosis and treatment; however, no systematic review has yet characterized the research landscape, key challenges, and future directions. Following PRISMA guidelines, we searched multiple databases through January 2025 for studies applying AI to computed tomography (CT), magnetic resonance imaging, and nuclear, facial, or retinal imaging in TED patients. Using the APPRAISE-AI tool, we assessed study quality and included 41 studies covering various AI applications. Sample sizes ranged from 33 to 2,288 participants, predominantly East Asian. CT and facial imaging were the most common modalities, reported in 16 and 13 articles, respectively. Studies addressed clinical tasks (diagnosis, activity assessment, severity grading, and treatment prediction) and technical tasks (classification, segmentation, and image generation), with classification the most frequent. Researchers primarily employed deep-learning models such as residual networks (ResNet) and the Visual Geometry Group (VGG) architecture. Overall, the majority of the studies were of moderate quality. Image-based AI shows strong potential to improve diagnostic accuracy and guide personalized treatment strategies in TED. Future research should prioritize robust study designs, the creation of public datasets, multimodal imaging integration, and interdisciplinary collaboration to accelerate clinical translation.
