
Rethinking Decoder Design: Improving Biomarker Segmentation Using Depth-to-Space Restoration and Residual Linear Attention

Saad Wazir, Daeyoung Kim

arXiv preprint, Jun 23 2025
Segmenting biomarkers in medical images is crucial for various biotech applications. Despite advances, Transformer- and CNN-based methods often struggle with variations in staining and morphology, limiting feature extraction. In medical image segmentation, where datasets often have limited sample availability, recent state-of-the-art (SOTA) methods achieve higher accuracy by leveraging pre-trained encoders, whereas end-to-end methods tend to underperform. This is due to challenges in effectively transferring rich multiscale features from encoders to decoders, as well as limitations in decoder efficiency. To address these issues, we propose an architecture that captures multi-scale local and global contextual information and a novel decoder design, which effectively integrates features from the encoder, emphasizes important channels and regions, and reconstructs spatial dimensions to enhance segmentation accuracy. Our method, compatible with various encoders, outperforms SOTA methods, as demonstrated by experiments on four datasets and ablation studies. Specifically, our method achieves absolute performance gains of 2.76% on MoNuSeg, 3.12% on DSB, 2.87% on Electron Microscopy, and 4.03% on TNBC datasets compared to existing SOTA methods. Code: https://github.com/saadwazir/MCADS-Decoder
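
The title names depth-to-space restoration, but the abstract does not spell out the operation. As a rough illustration only, the sketch below shows the generic depth-to-space (pixel-shuffle) rearrangement, which trades channels for spatial resolution. The function name, the nested-list tensor layout, and the channel ordering are assumptions for illustration and are not taken from the paper's code.

```python
def depth_to_space(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r].

    Assumed ordering (generic pixel shuffle): input channel c*r*r + i*r + j
    supplies output pixel (h*r + i, w*r + j) of output channel c.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Four 1x1 feature maps become a single 2x2 map.
features = [[[1]], [[2]], [[3]], [[4]]]
upsampled = depth_to_space(features, 2)
```

No parameters are learned in this step: the spatial size grows by a factor of r in each direction while the channel count shrinks by r*r.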

BrainSymphony: A Transformer-Driven Fusion of fMRI Time Series and Structural Connectivity

Moein Khajehnejad, Forough Habibollahi, Adeel Razi

arXiv preprint, Jun 23 2025
Existing foundation models for neuroimaging are often prohibitively large and data-intensive. We introduce BrainSymphony, a lightweight, parameter-efficient foundation model that achieves state-of-the-art performance while being pre-trained on significantly smaller public datasets. BrainSymphony's strong multimodal architecture processes functional MRI data through parallel spatial and temporal transformer streams, which are then efficiently distilled into a unified representation by a Perceiver module. Concurrently, it models structural connectivity from diffusion MRI using a novel signed graph transformer to encode the brain's anatomical structure. These powerful, modality-specific representations are then integrated via an adaptive fusion gate. Despite its compact design, our model consistently outperforms larger models on a diverse range of downstream benchmarks, including classification, prediction, and unsupervised network identification tasks. Furthermore, our model revealed novel insights into brain dynamics using attention maps on a unique external psilocybin neuroimaging dataset (pre- and post-administration). BrainSymphony establishes that architecturally-aware, multimodal models can surpass their larger counterparts, paving the way for more accessible and powerful research in computational neuroscience.

Open Set Recognition for Endoscopic Image Classification: A Deep Learning Approach on the Kvasir Dataset

Kasra Moazzami, Seoyoun Son, John Lin, Sun Min Lee, Daniel Son, Hayeon Lee, Jeongho Lee, Seongji Lee

arXiv preprint, Jun 23 2025
Endoscopic image classification plays a pivotal role in medical diagnostics by identifying anatomical landmarks and pathological findings. However, conventional closed-set classification frameworks are inherently limited in open-world clinical settings, where previously unseen conditions can arise and compromise model reliability. To address this, we explore the application of Open Set Recognition (OSR) techniques on the Kvasir dataset, a publicly available and diverse endoscopic image collection. In this study, we evaluate and compare the OSR capabilities of several representative deep learning architectures, including ResNet-50, Swin Transformer, and a hybrid ResNet-Transformer model, under both closed-set and open-set conditions. OpenMax is adopted as a baseline OSR method to assess the ability of these models to distinguish known classes from previously unseen categories. This work represents one of the first efforts to apply open set recognition to the Kvasir dataset and provides a foundational benchmark for evaluating OSR performance in medical image analysis. Our results offer practical insights into model behavior in clinically realistic settings and highlight the importance of OSR techniques for the safe deployment of AI systems in endoscopy.

Fine-tuned large language model for classifying CT-guided interventional radiology reports.

Yasaka K, Nishimura N, Fukushima T, Kubo T, Kiryu S, Abe O

PubMed paper, Jun 23 2025
Background: Manual data curation was necessary to extract radiology reports due to the ambiguities of natural language. Purpose: To develop a fine-tuned large language model that classifies computed tomography (CT)-guided interventional radiology reports into technique categories and to compare its performance with that of the readers. Material and Methods: This retrospective study included patients who underwent CT-guided interventional radiology between August 2008 and November 2024. Patients were chronologically assigned to the training (n = 1142; 646 men; mean age = 64.1 ± 15.7 years), validation (n = 131; 83 men; mean age = 66.1 ± 16.1 years), and test (n = 332; 196 men; mean age = 66.1 ± 14.8 years) datasets. In establishing a reference standard, reports were manually classified into categories 1 (drainage), 2 (lesion biopsy within fat or soft tissue density tissues), 3 (lung biopsy), and 4 (bone biopsy). The Bidirectional Encoder Representations from Transformers (BERT) model was fine-tuned with the training dataset, and the model with the best performance in the validation dataset was selected. The performance and required time for classification in the test dataset were compared between the best-performing model and the two readers. Results: Categories 1/2/3/4 included 309/367/270/196, 30/42/40/19, and 75/124/78/55 patients for the training, validation, and test datasets, respectively. The model demonstrated an accuracy of 0.979 in the test dataset, which was significantly better than that of the readers (0.922-0.940) (P ≤ 0.012). The model classified reports in 49.8- to 53.5-fold less time than the readers. Conclusion: The fine-tuned large language model classified CT-guided interventional radiology reports into four categories, demonstrating high accuracy within a remarkably short time.

Multimodal deep learning for predicting neoadjuvant treatment outcomes in breast cancer: a systematic review.

Krasniqi E, Filomeno L, Arcuri T, Ferretti G, Gasparro S, Fulvi A, Roselli A, D'Onofrio L, Pizzuti L, Barba M, Maugeri-Saccà M, Botti C, Graziano F, Puccica I, Cappelli S, Pelle F, Cavicchi F, Villanucci A, Paris I, Calabrò F, Rea S, Costantini M, Perracchio L, Sanguineti G, Takanen S, Marucci L, Greco L, Kayal R, Moscetti L, Marchesini E, Calonaci N, Blandino G, Caravagna G, Vici P

PubMed paper, Jun 23 2025
Pathological complete response (pCR) to neoadjuvant systemic therapy (NAST) is an established prognostic marker in breast cancer (BC). Multimodal deep learning (DL), integrating diverse data sources (radiology, pathology, omics, clinical), holds promise for improving pCR prediction accuracy. This systematic review synthesizes evidence on multimodal DL for pCR prediction and compares its performance against unimodal DL. Following PRISMA, we searched PubMed, Embase, and Web of Science (January 2015-April 2025) for studies applying DL to predict pCR in BC patients receiving NAST, using data from radiology, digital pathology (DP), multi-omics, and/or clinical records, and reporting AUC. Data on study design, DL architectures, and performance (AUC) were extracted. A narrative synthesis was conducted due to heterogeneity. Fifty-one studies, mostly retrospective (90.2%, median cohort 281), were included. Magnetic resonance imaging and DP were common primary modalities. Multimodal approaches were used in 52.9% of studies, often combining imaging with clinical data. Convolutional neural networks were the dominant architecture (88.2%). Longitudinal imaging improved prediction over baseline-only (median AUC 0.91 vs. 0.82). Overall, the median AUC across studies was 0.88, with 35.3% achieving AUC ≥ 0.90. Multimodal models showed a modest but consistent improvement over unimodal approaches (median AUC 0.88 vs. 0.83). Omics and clinical text were rarely primary DL inputs. DL models demonstrate promising accuracy for pCR prediction, especially when integrating multiple modalities and longitudinal imaging. However, significant methodological heterogeneity, reliance on retrospective data, and limited external validation hinder clinical translation. Future research should prioritize prospective validation, integration of underutilized data (multi-omics, clinical), and explainable AI to advance DL predictors to the clinical setting.

Intelligent Virtual Dental Implant Placement via 3D Segmentation Strategy.

Cai G, Wen B, Gong Z, Lin Y, Liu H, Zeng P, Shi M, Wang R, Chen Z

PubMed paper, Jun 23 2025
Virtual dental implant placement in cone-beam computed tomography (CBCT) is a prerequisite for digital implant surgery, carrying clinical significance. However, manual placement is a complex process that should meet the essential clinical requirements of restoration orientation, bone adaptation, and anatomical safety. This complexity presents challenges in balancing multiple considerations comprehensively and automating the entire workflow efficiently. This study aims to achieve intelligent virtual dental implant placement through a 3-dimensional (3D) segmentation strategy. Focusing on the missing mandibular first molars, we developed a segmentation module based on nnU-Net to generate the virtual implant from the edentulous region of CBCT and employed an approximation module for mathematical optimization. The generated virtual implant was integrated with the original CBCT to meet clinical requirements. A total of 190 CBCT scans from 4 centers were collected for model development and testing. This tool segmented the virtual implant with a surface Dice coefficient (sDice) of 0.903 and 0.884 on internal and external testing sets, respectively. Compared to the ground truth, the average deviations of the implant platform, implant apex, and angle were 0.850 ± 0.554 mm, 1.442 ± 0.539 mm, and 4.927 ± 3.804° on the internal testing set and 0.822 ± 0.353 mm, 1.467 ± 0.560 mm, and 5.517 ± 2.850° on the external testing set, respectively. The 3D segmentation-based artificial intelligence tool demonstrated good performance in predicting both the dimension and position of the virtual implants, showing significant clinical application potential in implant planning.

Development and validation of a SOTA-based system for biliopancreatic segmentation and station recognition system in EUS.

Zhang J, Zhang J, Chen H, Tian F, Zhang Y, Zhou Y, Jiang Z

PubMed paper, Jun 23 2025
Endoscopic ultrasound (EUS) is a vital tool for diagnosing biliopancreatic disease, offering detailed imaging to identify key abnormalities. Its interpretation demands expertise, which limits its accessibility for less trained practitioners. Thus, the creation of tools or systems to assist in interpreting EUS images is crucial for improving diagnostic accuracy and efficiency. This study aimed to develop an AI-assisted EUS system for accurate pancreatic and biliopancreatic duct segmentation, and to evaluate its impact on endoscopists' ability to identify biliary-pancreatic diseases during segmentation and anatomical localization. The EUS-AI system was designed to perform station positioning and anatomical structure segmentation. A total of 45,737 EUS images from 1852 patients were used for model training. Among them, 2881 images were for internal testing, and 2747 images from 208 patients were for external validation. Additionally, 340 images formed a man-machine competition test set. During the research process, various newer state-of-the-art (SOTA) deep learning algorithms were also compared. In the station recognition (classification) task, the Mean Teacher algorithm achieved the highest accuracy compared with the ResNet-50 and YOLOv8-CLS algorithms, with an average of 95.60% (92.07%-99.12%) in the internal test set and 92.72% (88.30%-97.15%) in the external test set. For segmentation, the U-Net v2 algorithm was optimal compared with the UNet++ and YOLOv8 algorithms. Ultimately, the EUS-AI system was constructed using the optimal models from the two tasks, and a man-machine competition experiment was conducted. The results demonstrated that the EUS-AI system significantly outperformed mid-level endoscopists, both in position recognition (p < 0.001) and in pancreas and biliopancreatic duct segmentation (p < 0.001, p = 0.004). The EUS-AI system is expected to significantly shorten the learning curve for the pancreatic EUS examination and enhance procedural standardization.
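
Mean Teacher, reported above as the most accurate station classifier, is a semi-supervised scheme in which a teacher model's weights track an exponential moving average (EMA) of a student's weights. The sketch below shows only that generic update on plain floats. The decay value, the parameter names, and the dictionary layout are illustrative assumptions, not details from this study.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights as an EMA of teacher and student weights."""
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# Hypothetical two-parameter "models" for illustration.
teacher = {"w": 0.0, "b": 1.0}
student = {"w": 1.0, "b": 1.0}
teacher = ema_update(teacher, student, decay=0.9)
```

A decay close to 1 makes the teacher change slowly, which smooths out step-to-step noise in the student.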

Chest X-ray Foundation Model with Global and Local Representations Integration.

Yang Z, Xu X, Zhang J, Wang G, Kalra MK, Yan P

PubMed paper, Jun 23 2025
Chest X-ray (CXR) is the most frequently ordered imaging test, supporting diverse clinical tasks from thoracic disease detection to postoperative monitoring. However, task-specific classification models are limited in scope, require costly labeled data, and lack generalizability to out-of-distribution datasets. To address these challenges, we introduce CheXFound, a self-supervised vision foundation model that learns robust CXR representations and generalizes effectively across a wide range of downstream tasks. We pretrained CheXFound on a curated CXR-987K dataset, comprising approximately 987K unique CXRs from 12 publicly available sources. We propose a Global and Local Representations Integration (GLoRI) head for downstream adaptations, by incorporating fine- and coarse-grained disease-specific local features with global image features for enhanced performance in multilabel classification. Our experimental results showed that CheXFound outperformed state-of-the-art models in classifying 40 disease findings across different prevalence levels on the CXR-LT 24 dataset and exhibited superior label efficiency on downstream tasks with limited training data. Additionally, CheXFound achieved significant improvements on downstream tasks with out-of-distribution datasets, including opportunistic cardiovascular disease risk estimation, mortality prediction, malpositioned tube detection, and anatomical structure segmentation. The above results demonstrate CheXFound's strong generalization capabilities, which will enable diverse downstream adaptations with improved label efficiency in future applications. The project source code is publicly available at https://github.com/RPIDIAL/CheXFound.

DCLNet: Double Collaborative Learning Network on Stationary-Dynamic Functional Brain Network for Brain Disease Classification.

Zhou J, Jie B, Wang Z, Zhang Z, Bian W, Yang Y, Li H, Sun F, Liu M

PubMed paper, Jun 23 2025
Stationary functional brain networks (sFBNs) and dynamic functional brain networks (dFBNs) derived from resting-state functional MRI (rs-fMRI) characterize the complex interactions of the human brain from different aspects and could offer complementary information for brain disease analysis. Most current studies focus on either sFBN or dFBN analysis alone, thus limiting the performance of brain network analysis. A few works have explored integrating sFBN and dFBN to identify brain diseases, and achieved better performance than conventional methods. However, these studies still ignore some valuable discriminative information, such as the distribution information of subjects between and within categories. This paper presents a Double Collaborative Learning Network (DCLNet), which takes advantage of both a collaborative encoder and collaborative contrastive learning to learn the complementary information of sFBN and dFBN and the distribution information of subjects between and within categories for brain disease classification. Specifically, we first construct sFBN and dFBN from rs-fMRI data using traditional correlation-based methods. Then, we build a collaborative encoder to extract brain network features at different levels (i.e., connectivity-based, brain-region-based, and brain-network-based features), and design a prune-graft transformer module to embed the complementary information of the features at each level between the two kinds of FBNs. We also develop a collaborative contrastive learning module to capture the distribution information of subjects between and within different categories, thereby learning more discriminative features of brain networks. We evaluate DCLNet on two real brain disease datasets with rs-fMRI data, with experimental results demonstrating the superiority of the proposed method.
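
As a rough illustration of correlation-based network construction, the sketch below builds a stationary network from Pearson correlations over the full time series and a dynamic network by repeating the same computation in sliding windows. The abstract says only that traditional correlation-based methods were used, so the sliding-window approach, the window length, the step, and all function names here are assumptions.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # convention chosen here for constant signals
    return cov / sqrt(var_a * var_b)


def static_fbn(series):
    """Correlation matrix over the full series (one row per brain region)."""
    return [[pearson(a, b) for b in series] for a in series]


def dynamic_fbn(series, window, step):
    """One correlation matrix per sliding window (assumed construction)."""
    length = len(series[0])
    return [
        static_fbn([roi[start:start + window] for roi in series])
        for start in range(0, length - window + 1, step)
    ]


# Toy signals for three hypothetical regions of interest.
rois = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
]
sfbn = static_fbn(rois)
dfbn = dynamic_fbn(rois, window=4, step=2)
```

Here `sfbn` is a single region-by-region matrix, while `dfbn` is a sequence of such matrices, one per window.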

Self-Supervised Optimization of RF Data Coherence for Improving Breast Reflection UCT Reconstruction.

He L, Liu Z, Cai Y, Zhang Q, Zhou L, Yuan J, Xu Y, Ding M, Yuchi M, Qiu W

PubMed paper, Jun 23 2025
Reflection Ultrasound Computed Tomography (UCT) is gaining prominence as an essential instrument for breast cancer screening. However, reflection UCT quality is often compromised by the variability in sound speed across breast tissue. Traditionally, reflection UCT utilizes the Delay and Sum (DAS) algorithm, where the Time of Flight significantly affects the coherence of the reflected radio frequency (RF) data, based on an oversimplified assumption of uniform sound speed. This study introduces three meticulously engineered modules that leverage the spatial correlation of receiving arrays to improve the coherence of RF data and enable more effective summation. These modules include the self-supervised blind RF data segment block (BSegB) and the state-space model-based strong reflection prediction block (SSM-SRP), followed by a polarity-based adaptive replacing refinement (PARR) strategy to suppress sidelobe noise caused by aperture narrowing. To assess the effectiveness of our method, we utilized standard image quality metrics, including Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Root Mean Squared Error (RMSE). Additionally, coherence factor (CF) and variance (Var) were employed to verify the method's ability to enhance signal coherence at the RF data level. The findings reveal that our approach greatly improves performance, achieving an average PSNR of 19.64 dB, an average SSIM of 0.71, and an average RMSE of 0.10, notably under conditions of sparse transmission. The conducted experimental analyses affirm the superior performance of our framework compared to alternative enhancement strategies, including adaptive beamforming methods and deep learning-based beamforming approaches.
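
For context on the coherence metric, the sketch below computes a delay-and-sum output and a commonly used form of the coherence factor, CF = (sum of samples)^2 / (N * sum of squared samples), for the already-delayed channel samples of one pixel. The sample values and function names are made up for illustration, and the paper's exact CF definition may differ.

```python
def delay_and_sum(samples):
    """Sum channel samples that have already been time-aligned for one pixel."""
    return sum(samples)


def coherence_factor(samples):
    """Ratio of coherent to incoherent energy across channels, in [0, 1]."""
    n = len(samples)
    incoherent = n * sum(s * s for s in samples)
    if incoherent == 0:
        return 0.0  # no signal on any channel
    return sum(samples) ** 2 / incoherent


# Hypothetical delayed samples from four receiving elements.
aligned = [1.0, 1.0, 1.0, 1.0]        # correct delays: echoes add up
misaligned = [1.0, -1.0, 1.0, -1.0]   # wrong sound speed: echoes cancel

cf_aligned = coherence_factor(aligned)
cf_misaligned = coherence_factor(misaligned)
```

When the assumed sound speed is wrong, the delays misalign the echoes, the summed output shrinks, and CF drops toward 0, which is the loss of coherence the abstract describes.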