
GCond: Gradient Conflict Resolution via Accumulation-based Stabilization for Large-Scale Multi-Task Learning

Evgeny Alves Limarenko, Anastasiia Alexandrovna Studenikina

arXiv preprint · Sep 8 2025
In multi-task learning (MTL), gradient conflict poses a significant challenge. Effective methods for addressing this problem, such as PCGrad, CAGrad, and GradNorm, are computationally demanding in their original implementations, which significantly limits their application to modern large models and transformers. We propose Gradient Conductor (GCond), a method that builds upon PCGrad principles by combining them with gradient accumulation and an adaptive arbitration mechanism. We evaluated GCond on self-supervised learning tasks using MobileNetV3-Small and ConvNeXt architectures on the ImageNet-1K dataset and a combined head and neck CT scan dataset, comparing the proposed method against baseline linear combinations and state-of-the-art gradient conflict resolution methods. The stochastic mode of GCond achieved a two-fold computational speedup while maintaining optimization quality, and demonstrated superior performance across all evaluated metrics, achieving lower L1 and SSIM losses than other methods on both datasets. GCond exhibited high scalability, being successfully applied to both compact models (MobileNetV3-Small) and large architectures (ConvNeXt-Tiny and ConvNeXt-Base). It also showed compatibility with modern optimizers such as AdamW and Lion/LARS. Therefore, GCond offers a scalable and efficient solution to the problem of gradient conflicts in multi-task learning.
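For readers unfamiliar with the projection rule that GCond builds on, the sketch below applies a generic PCGrad-style projection to gradients averaged over an accumulation window. It is only an illustration under stated assumptions (two tasks, equal-weight averaging of the de-conflicted gradients); it is not the authors' GCond implementation, whose arbitration mechanism is described only at a high level in the abstract.

```python
# Generic PCGrad-style conflict resolution on accumulated task gradients.
# NOT the authors' GCond code; accumulation window and projection rule are
# assumptions made purely for illustration.
import numpy as np

def accumulate(grads):
    """Average a list of per-micro-batch gradients (accumulation step)."""
    return np.mean(np.stack(grads, axis=0), axis=0)

def resolve_conflict(g1, g2):
    """If the two task gradients conflict (negative dot product),
    project g1 onto the normal plane of g2 (PCGrad rule)."""
    dot = float(np.dot(g1, g2))
    if dot < 0.0:
        g1 = g1 - dot / (np.dot(g2, g2) + 1e-12) * g2
    return g1

def combined_update(task1_grads, task2_grads):
    """Accumulate each task's gradients, de-conflict both directions,
    and return a single update direction."""
    g1 = accumulate(task1_grads)
    g2 = accumulate(task2_grads)
    g1p = resolve_conflict(g1, g2)
    g2p = resolve_conflict(g2, g1)
    return 0.5 * (g1p + g2p)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t1 = [rng.normal(size=8) for _ in range(4)]  # 4 micro-batches, task 1
    t2 = [rng.normal(size=8) for _ in range(4)]  # 4 micro-batches, task 2
    print(combined_update(t1, t2))
```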

XBusNet: Text-Guided Breast Ultrasound Segmentation via Multimodal Vision-Language Learning

Raja Mallina, Bryar Shareef

arXiv preprint · Sep 8 2025
Background: Precise breast ultrasound (BUS) segmentation supports reliable measurement, quantitative analysis, and downstream classification, yet remains difficult for small or low-contrast lesions with fuzzy margins and speckle noise. Text prompts can add clinical context, but directly applying weakly localized text-image cues (e.g., CAM/CLIP-derived signals) tends to produce coarse, blob-like responses that smear boundaries unless additional mechanisms recover fine edges. Methods: We propose XBusNet, a novel dual-prompt, dual-branch multimodal model that combines image features with clinically grounded text. A global pathway based on a CLIP Vision Transformer encodes whole-image semantics conditioned on lesion size and location, while a local U-Net pathway emphasizes precise boundaries and is modulated by prompts that describe shape, margin, and Breast Imaging Reporting and Data System (BI-RADS) terms. Prompts are assembled automatically from structured metadata, requiring no manual clicks. We evaluate on the Breast Lesions USG (BLU) dataset using five-fold cross-validation. Primary metrics are Dice and Intersection over Union (IoU); we also conduct size-stratified analyses and ablations to assess the roles of the global and local paths and the text-driven modulation. Results: XBusNet achieves state-of-the-art performance on BLU, with mean Dice of 0.8765 and IoU of 0.8149, outperforming six strong baselines. Small lesions show the largest gains, with fewer missed regions and fewer spurious activations. Ablation studies show complementary contributions of global context, local boundary modeling, and prompt-based modulation. Conclusions: A dual-prompt, dual-branch multimodal design that merges global semantics with local precision yields accurate BUS segmentation masks and improves robustness for small, low-contrast lesions.
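As a rough illustration of how text prompts can modulate a segmentation branch, the sketch below applies FiLM-style per-channel scaling and shifting driven by a text embedding. The FiLM form, layer names, and dimensions are assumptions made for illustration; the abstract does not specify XBusNet's exact modulation mechanism.

```python
# FiLM-style text conditioning of an image feature map (illustrative only;
# not XBusNet's actual modulation scheme).
import torch
import torch.nn as nn

class TextFiLM(nn.Module):
    """Map a text embedding to per-channel scale and shift for a feature map."""
    def __init__(self, text_dim: int, channels: int):
        super().__init__()
        self.to_scale = nn.Linear(text_dim, channels)
        self.to_shift = nn.Linear(text_dim, channels)

    def forward(self, feat: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W); text_emb: (B, text_dim)
        scale = self.to_scale(text_emb)[:, :, None, None]
        shift = self.to_shift(text_emb)[:, :, None, None]
        return feat * (1.0 + scale) + shift

# Usage: modulate a decoder feature map with a prompt embedding
# (e.g. from a CLIP text encoder describing shape, margin, BI-RADS terms).
film = TextFiLM(text_dim=512, channels=64)
feat = torch.randn(2, 64, 32, 32)   # local U-Net features
prompt = torch.randn(2, 512)        # text prompt embedding
out = film(feat, prompt)
```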

MedSeqFT: Sequential Fine-tuning Foundation Models for 3D Medical Image Segmentation

Yiwen Ye, Yicheng Wu, Xiangde Luo, He Zhang, Ziyang Chen, Ting Dang, Yanning Zhang, Yong Xia

arXiv preprint · Sep 7 2025
Foundation models have become a promising paradigm for advancing medical image analysis, particularly for segmentation tasks where downstream applications often emerge sequentially. Existing fine-tuning strategies, however, remain limited: parallel fine-tuning isolates tasks and fails to exploit shared knowledge, while multi-task fine-tuning requires simultaneous access to all datasets and struggles with incremental task integration. To address these challenges, we propose MedSeqFT, a sequential fine-tuning framework that progressively adapts pre-trained models to new tasks while refining their representational capacity. MedSeqFT introduces two core components: (1) Maximum Data Similarity (MDS) selection, which identifies downstream samples most representative of the original pre-training distribution to preserve general knowledge, and (2) Knowledge and Generalization Retention Fine-Tuning (K&G RFT), a LoRA-based knowledge distillation scheme that balances task-specific adaptation with the retention of pre-trained knowledge. Extensive experiments on two multi-task datasets covering ten 3D segmentation tasks demonstrate that MedSeqFT consistently outperforms state-of-the-art fine-tuning strategies, yielding substantial performance gains (e.g., an average Dice improvement of 3.0%). Furthermore, evaluations on two unseen tasks (COVID-19-20 and Kidney) verify that MedSeqFT enhances transferability, particularly for tumor segmentation. Visual analyses of loss landscapes and parameter variations further highlight the robustness of MedSeqFT. These results establish sequential fine-tuning as an effective, knowledge-retentive paradigm for adapting foundation models to evolving clinical tasks. Code will be released.
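To make the data-selection idea concrete, the sketch below scores downstream samples by cosine similarity of their feature embeddings to the mean pre-training embedding and keeps the top-k. The centroid criterion and feature space are assumptions for illustration; the paper's actual Maximum Data Similarity measure may be defined differently.

```python
# Centroid-based "most similar to pre-training" selection (illustrative only;
# the paper's MDS criterion is not reproduced here).
import numpy as np

def select_most_similar(downstream_feats: np.ndarray,
                        pretrain_feats: np.ndarray,
                        k: int) -> np.ndarray:
    """Return indices of the k downstream samples closest to the pre-training centroid."""
    centroid = pretrain_feats.mean(axis=0)
    centroid /= np.linalg.norm(centroid) + 1e-12
    normed = downstream_feats / (
        np.linalg.norm(downstream_feats, axis=1, keepdims=True) + 1e-12
    )
    scores = normed @ centroid          # cosine similarity per sample
    return np.argsort(scores)[::-1][:k]

# Example: pick the 16 downstream cases most representative of pre-training data.
pre = np.random.randn(1000, 256)
down = np.random.randn(200, 256)
idx = select_most_similar(down, pre, k=16)
```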

A Deep Learning-Based Fully Automated Cardiac MRI Segmentation Approach for Tetralogy of Fallot Patients.

Chai WY, Lin G, Wang CJ, Chiang HJ, Ng SH, Kuo YS, Lin YC

PubMed · Sep 7 2025
Automated cardiac MR segmentation enables accurate and reproducible ventricular function assessment in Tetralogy of Fallot (ToF), whereas manual segmentation remains time-consuming and variable. This study aimed to evaluate deep learning (DL)-based models for automatic left ventricle (LV), right ventricle (RV), and LV myocardium segmentation in ToF, compared with manual reference standard annotations. This retrospective study included 427 patients with diverse cardiac conditions (305 non-ToF, 122 ToF): 395 for training/validation, 32 ToF cases for internal testing, and 12 external ToF cases for generalizability assessment. Imaging used a steady-state free precession cine sequence at 1.5/3 T. U-Net, Deep U-Net, and MultiResUNet were trained under three regimes (non-ToF, ToF-only, mixed), using manual segmentations from one radiologist and one researcher (20 and 10 years of experience, respectively) as reference, with consensus for discrepancies. Performance for LV, RV, and LV myocardium was evaluated using the Dice Similarity Coefficient (DSC), Intersection over Union (IoU), and F1-score, alongside regional (basal, middle, apical) and global ventricular function comparisons to manual results. Friedman tests were applied for architecture and regime comparisons, paired Wilcoxon tests for end-diastole/end-systole differences, and Pearson's r for assessing agreement in global function. The MultiResUNet model trained on the mixed dataset (ToF and non-ToF cases) achieved the best segmentation performance, with DSCs of 96.1% for the LV and 93.5% for the RV. In the internal test set, DSCs for LV, RV, and LV myocardium were 97.3%, 94.7%, and 90.7% at end-diastole, and 93.6%, 92.1%, and 87.8% at end-systole, with ventricular measurement correlations ranging from 0.84 to 0.99. Regional analysis showed LV DSCs of 96.3% (basal), 96.4% (middle), and 94.1% (apical), and RV DSCs of 92.8%, 94.2%, and 89.6%. External validation (n = 12) showed correlations ranging from 0.81 to 0.98. The MultiResUNet model enabled accurate automated cardiac MRI segmentation in ToF, with the potential to streamline workflows and improve disease monitoring. Evidence level: 3. Technical efficacy: Stage 2.
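For reference, the two overlap metrics reported above can be computed from binary masks as in the short sketch below (a standard textbook formulation, not the study's own code).

```python
# Dice similarity coefficient and IoU on binary segmentation masks
# (e.g. LV, RV, or LV myocardium).
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)

# Example on random masks.
a = np.random.rand(128, 128) > 0.5
b = np.random.rand(128, 128) > 0.5
print(dice(a, b), iou(a, b))
```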

Artificial intelligence-assisted assessment of metabolic response to tebentafusp in metastatic uveal melanoma: a long axial field-of-view [<sup>18</sup>F]FDG PET/CT study.

Sachpekidis C, Machiraju D, Strauss DS, Pan L, Kopp-Schneider A, Edenbrandt L, Dimitrakopoulou-Strauss A, Hassel JC

PubMed · Sep 6 2025
Tebentafusp has emerged as the first systemic therapy to significantly prolong survival in treatment-naïve HLA-A*02:01-positive patients with unresectable or metastatic uveal melanoma (mUM). Notably, a survival benefit has been observed even in the absence of radiographic response. This study aims to investigate the feasibility and prognostic value of artificial intelligence (AI)-assisted quantification and metabolic response assessment of [<sup>18</sup>F]FDG long axial field-of-view (LAFOV) PET/CT in mUM patients undergoing tebentafusp therapy. Fifteen patients with mUM treated with tebentafusp underwent [<sup>18</sup>F]FDG LAFOV PET/CT at baseline and 3 months post-treatment. Total metabolic tumor volume (TMTV) and total lesion glycolysis (TLG) were quantified using a deep learning-based segmentation tool on the RECOMIA platform. Metabolic response was assessed according to AI-assisted PERCIST 1.0 criteria. Associations between PET-derived parameters and overall survival (OS) were evaluated using Kaplan-Meier survival analysis. The median follow-up (95% CI) was 14.1 months (12.9 months - not available). Automated TMTV and TLG measurements were successfully obtained in all patients. Elevated baseline TMTV and TLG were significantly associated with shorter OS (TMTV: 16.9 vs. 27.2 months; TLG: 16.9 vs. 27.2 months; p < 0.05). Similarly, higher TMTV and TLG at 3 months post-treatment predicted poorer survival outcomes (TMTV: 14.3 vs. 24.5 months; TLG: 14.3 vs. 24.5 months; p < 0.05). AI-assisted PERCIST response evaluation identified six patients with disease control (complete metabolic response, partial metabolic response, or stable metabolic disease) and nine with progressive metabolic disease. A trend toward improved OS was observed in patients with disease control (24.5 vs. 14.6 months, p = 0.08). Circulating tumor DNA (ctDNA) levels based on GNAQ and GNA11 mutations were available in 8 patients; after 3 months of tebentafusp treatment, 5 showed reduced or stable ctDNA levels and 3 showed an increase (median OS: 24.5 vs. 3.3 months; p = 0.13). Patients with increasing ctDNA levels exhibited significantly higher TMTV and TLG on follow-up imaging. AI-assisted whole-body quantification of [<sup>18</sup>F]FDG PET/CT and PERCIST-based response assessment are feasible and hold prognostic significance in tebentafusp-treated mUM. TMTV and TLG may serve as non-invasive imaging biomarkers for risk stratification and treatment monitoring in this malignancy.
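As background for the PET metrics used above, the sketch below shows how TMTV and TLG are conventionally derived from a lesion mask and an SUV volume given the voxel volume; the voxel spacing in the example is a placeholder, and the study's own segmentation pipeline (RECOMIA) is not reproduced here.

```python
# Conventional TMTV/TLG computation from an SUV volume and a lesion mask.
import numpy as np

def tmtv_tlg(suv: np.ndarray, lesion_mask: np.ndarray, voxel_volume_ml: float):
    """TMTV = total segmented lesion volume (mL);
    TLG = mean lesion SUV x TMTV (equivalently, sum of SUV x voxel volume over lesion voxels)."""
    mask = lesion_mask.astype(bool)
    tmtv = mask.sum() * voxel_volume_ml
    tlg = suv[mask].mean() * tmtv if mask.any() else 0.0
    return tmtv, tlg

# Example with a placeholder 4 x 4 x 4 mm voxel grid (0.064 mL per voxel).
suv = np.random.rand(64, 64, 64) * 10
mask = suv > 8.0
print(tmtv_tlg(suv, mask, voxel_volume_ml=0.064))
```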

VLSM-Ensemble: Ensembling CLIP-based Vision-Language Models for Enhanced Medical Image Segmentation

Julia Dietlmeier, Oluwabukola Grace Adegboro, Vayangi Ganepola, Claudia Mazo, Noel E. O'Connor

arXiv preprint · Sep 5 2025
Vision-language models and their adaptations to image segmentation tasks present enormous potential for producing highly accurate and interpretable results. However, implementations based on CLIP and BiomedCLIP are still lagging behind more sophisticated architectures such as CRIS. In this work, instead of focusing on text prompt engineering as is the norm, we attempt to narrow this gap by showing how to ensemble vision-language segmentation models (VLSMs) with a low-complexity CNN. By doing so, we achieve a significant Dice score improvement of 6.3% on the BKAI polyp dataset using the ensembled BiomedCLIPSeg, while other datasets exhibit gains ranging from 1% to 6%. Furthermore, we provide initial results on four additional radiology and non-radiology datasets. We conclude that ensembling works differently across these datasets (from outperforming to underperforming the CRIS model), indicating a topic for future investigation by the community. The code is available at https://github.com/juliadietlmeier/VLSM-Ensemble.
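To illustrate the kind of ensembling described, the sketch below averages the per-pixel probabilities of two segmentation branches and thresholds the result. The equal-weight average is an assumption; the paper's ensembling scheme may combine the VLSM and CNN outputs differently.

```python
# Probability-level ensembling of two segmentation branches (illustrative only).
import torch

@torch.no_grad()
def ensemble_masks(vlsm_logits: torch.Tensor,
                   cnn_logits: torch.Tensor,
                   threshold: float = 0.5) -> torch.Tensor:
    """Average the sigmoid outputs of both branches, then threshold."""
    probs = 0.5 * (torch.sigmoid(vlsm_logits) + torch.sigmoid(cnn_logits))
    return (probs > threshold).float()

# Example: combine per-pixel logits from the two branches for one image.
mask = ensemble_masks(torch.randn(1, 1, 256, 256), torch.randn(1, 1, 256, 256))
```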

A dual-branch encoder network based on squeeze-and-excitation UNet and transformer for 3D PET-CT image tumor segmentation.

Li M, Zhu R, Li M, Wang H, Teng Y

PubMed · Sep 5 2025
Recognition of tumors is very important in clinical practice and radiomics; however, segmentation currently still needs to be done manually by experts. With the development of deep learning, automatic tumor segmentation is gradually becoming possible. This paper combines the molecular information from PET with the pathology information from CT for tumor segmentation. A dual-branch encoder is designed based on SE-UNet (Squeeze-and-Excitation Normalization UNet) and a Transformer; a 3D Convolutional Block Attention Module (CBAM) is added to the skip connections, and BCE loss is used in training to improve segmentation accuracy. The new model is named TASE-UNet. The proposed method was tested on the HECKTOR2022 dataset, where it obtains the best segmentation accuracy compared with state-of-the-art methods. Specifically, we obtained results of 76.10% and 3.27 for the two key evaluation metrics, DSC and HD95. Experiments demonstrate that the designed network is reasonable and effective. The full implementation is available at https://github.com/LiMingrui1/TASE-UNet.
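For context on the SE component named above, the sketch below is a generic 3D squeeze-and-excitation block (global average pooling followed by a channel-gating MLP); it is a textbook formulation, not the authors' TASE-UNet code, which additionally adds CBAM to the skip connections.

```python
# Generic 3D squeeze-and-excitation block (illustrative textbook version).
import torch
import torch.nn as nn

class SEBlock3D(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)            # squeeze: global spatial average
        self.fc = nn.Sequential(                       # excitation: channel gating
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c = x.shape[:2]
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1, 1)
        return x * w                                   # re-weight channels

# Example: gate a fused PET-CT feature volume.
out = SEBlock3D(32)(torch.randn(1, 32, 16, 64, 64))
```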

A generalist foundation model and database for open-world medical image segmentation.

Zhang S, Zhang Q, Zhang S, Liu X, Yue J, Lu M, Xu H, Yao J, Wei X, Cao J, Zhang X, Gao M, Shen J, Hao Y, Wang Y, Zhang X, Wu S, Zhang P, Cui S, Wang G

PubMed · Sep 5 2025
Vision foundation models have demonstrated vast potential in achieving generalist medical segmentation capability, providing a versatile, task-agnostic solution through a single model. However, current generalist models involve simple pre-training on various medical data containing irrelevant information, often resulting in the negative transfer phenomenon and degenerated performance. Furthermore, the practical applicability of foundation models across diverse open-world scenarios, especially in out-of-distribution (OOD) settings, has not been extensively evaluated. Here we construct a publicly accessible database, MedSegDB, based on a tree-structured hierarchy and annotated from 129 public medical segmentation repositories and 5 in-house datasets. We further propose a Generalist Medical Segmentation model (MedSegX), a vision foundation model trained with a model-agnostic Contextual Mixture of Adapter Experts (ConMoAE) for open-world segmentation. We conduct a comprehensive evaluation of MedSegX across a range of medical segmentation tasks. Experimental results indicate that MedSegX achieves state-of-the-art performance across various modalities and organ systems in in-distribution (ID) settings. In OOD and real-world clinical settings, MedSegX consistently maintains its performance in both zero-shot and data-efficient generalization, outperforming other foundation models.
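To illustrate the adapter-expert idea in general terms, the sketch below softly combines several low-rank adapters through a gating network conditioned on a context vector. The gating design, adapter rank, and naming are assumptions; the abstract describes ConMoAE only at a high level.

```python
# Generic mixture-of-adapters layer (illustrative only; not MedSegX's ConMoAE).
import torch
import torch.nn as nn

class AdapterExpert(nn.Module):
    """A small low-rank bottleneck adapter."""
    def __init__(self, dim: int, rank: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, rank)
        self.up = nn.Linear(rank, dim)

    def forward(self, x):
        return self.up(torch.relu(self.down(x)))

class MixtureOfAdapters(nn.Module):
    """Softly combine adapter experts via a context-conditioned gate."""
    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList([AdapterExpert(dim) for _ in range(num_experts)])
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(context), dim=-1)          # (B, E)
        expert_out = torch.stack([e(x) for e in self.experts], -1)   # (B, D, E)
        return x + (expert_out * weights.unsqueeze(1)).sum(-1)       # residual adapter

# Example: adapt token features with a per-case context embedding.
moa = MixtureOfAdapters(dim=256)
tokens = torch.randn(2, 256)
ctx = torch.randn(2, 256)
print(moa(tokens, ctx).shape)
```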

Prostate MR image segmentation using a multi-stage network approach.

Jacobson LEO, Bader-El-Den M, Maurya L, Hopgood AA, Tamma V, Masum SK, Prendergast DJ, Osborn P

PubMed · Sep 5 2025
Prostate cancer (PCa) remains one of the most prevalent cancers among men, with over 1.4 million new cases and 375,304 deaths reported globally in 2020. Current diagnostic approaches, such as prostate-specific antigen (PSA) testing and trans-rectal ultrasound (TRUS)-guided biopsies, are often limited by low specificity and accuracy. This study addresses these limitations by leveraging deep learning-based image segmentation techniques on a dataset comprising 61,119 T2-weighted MR images from 1151 patients to enhance PCa detection and characterisation. Three distinct segmentation strategies, namely one-stage, sequential two-stage, and end-to-end two-stage methods, were evaluated using various deep learning architectures. The MultiResUNet model, integrated into a multi-stage segmentation framework, demonstrated significant improvements in delineating prostate boundaries. The end-to-end approach, leveraging shared feature representations, consistently outperformed the other methods, underscoring its effectiveness in enhancing diagnostic accuracy. These findings highlight the potential of advanced deep learning architectures in streamlining prostate cancer detection and treatment planning. Future work will focus on further optimisation of the models and assessing their generalisability to diverse medical imaging contexts.
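As an illustration of the sequential two-stage strategy mentioned above, the sketch below localizes a region of interest with a coarse model, crops around it, and refines the mask with a second model. Both model calls are placeholders (any callable returning a probability map), and the cropping margin is an arbitrary choice; this is not the study's implementation.

```python
# Sequential two-stage segmentation: coarse localization, crop, then refinement.
import numpy as np

def two_stage_segment(image: np.ndarray, coarse_model, fine_model, margin: int = 8):
    coarse = coarse_model(image) > 0.5                    # stage 1: rough mask
    if not coarse.any():
        return np.zeros_like(image, dtype=bool)
    ys, xs = np.where(coarse)
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin, image.shape[0])
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin, image.shape[1])
    crop = image[y0:y1, x0:x1]
    fine = fine_model(crop) > 0.5                         # stage 2: refined mask
    full = np.zeros_like(image, dtype=bool)
    full[y0:y1, x0:x1] = fine
    return full

# Example with placeholder "models" that simply threshold intensities.
img = np.random.rand(128, 128)
mask = two_stage_segment(img, lambda x: x, lambda x: x)
```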

Implementation of Fully Automated AI-Integrated System for Body Composition Assessment on Computed Tomography for Opportunistic Sarcopenia Screening: Multicenter Prospective Study.

Urooj B, Ko Y, Na S, Kim IO, Lee EH, Cho S, Jeong H, Khang S, Lee J, Kim KW

PubMed · Sep 5 2025
Opportunistic computed tomography (CT) screening for the evaluation of sarcopenia and myosteatosis has been gaining emphasis. A fully automated artificial intelligence (AI)-integrated system for body composition assessment on CT scans is a prerequisite for effective opportunistic screening. However, no study has evaluated the implementation of fully automated AI systems for opportunistic screening in real-world clinical practice for routine health check-ups. The aim of this study is to evaluate the performance and clinical utility of a fully automated AI-integrated system for body composition assessment on opportunistic CT during routine health check-ups. This prospective multicenter study included 537 patients who underwent routine health check-ups across 3 institutions. The AI algorithm selects the L3 slice and segments muscle and fat areas in an end-to-end manner. The AI models were integrated into the Picture Archiving and Communication System (PACS) at each institution. Technical success rate, processing time, and segmentation accuracy (Dice similarity coefficient) were assessed. Body composition metrics were analyzed across age and sex groups. The fully automated AI-integrated system successfully retrieved anonymized CT images from the PACS, performed L3 selection and segmentation, and provided body composition metrics, including muscle quality maps and muscle age. The technical success rate was 100%, with no failed cases requiring manual adjustment. The mean processing time from CT acquisition to report generation was 4.12 seconds. Segmentation accuracy comparing AI results with human expert results was 97.4%. Significant age-related declines in skeletal muscle area and normal-attenuation muscle area were observed, alongside increases in low-attenuation muscle area and intramuscular adipose tissue. Implementation of the fully automated AI-integrated system significantly enhanced opportunistic sarcopenia screening, achieving excellent technical success and high segmentation accuracy without manual intervention. This system has the potential to transform routine health check-ups by providing rapid and accurate assessments of body composition.
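For readers interested in how the reported metrics are typically derived, the sketch below computes L3-level muscle areas from a CT slice and a muscle mask using Hounsfield-unit ranges commonly cited in the body-composition literature; the thresholds and pixel size are assumptions, as the abstract does not state the system's exact cut-offs.

```python
# L3-level body composition metrics from a CT slice and a muscle mask.
# HU ranges below are commonly used literature values (assumed here):
# normal-attenuation muscle 30 to 150 HU, low-attenuation muscle -29 to 29 HU,
# intramuscular adipose tissue (IMAT) -190 to -30 HU within the muscle compartment.
import numpy as np

def l3_muscle_metrics(ct_hu: np.ndarray, muscle_mask: np.ndarray, pixel_area_cm2: float):
    m = muscle_mask.astype(bool)
    hu = ct_hu[m]
    return {
        "skeletal_muscle_area_cm2": m.sum() * pixel_area_cm2,
        "normal_attenuation_area_cm2": ((hu >= 30) & (hu <= 150)).sum() * pixel_area_cm2,
        "low_attenuation_area_cm2": ((hu >= -29) & (hu < 30)).sum() * pixel_area_cm2,
        "imat_area_cm2": ((ct_hu >= -190) & (ct_hu <= -30) & m).sum() * pixel_area_cm2,
    }

# Example with a placeholder pixel size (~0.98 mm pixels, 0.0097 cm^2 each).
ct = np.random.uniform(-200, 200, size=(512, 512))
mask = np.random.rand(512, 512) > 0.7
print(l3_muscle_metrics(ct, mask, pixel_area_cm2=0.0097))
```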