
SW-ViT: A Spatio-Temporal Vision Transformer Network with Post Denoiser for Sequential Multi-Push Ultrasound Shear Wave Elastography

Ahsan Habib Akash, MD Jahin Alam, Md. Kamrul Hasan

arXiv preprint · May 24, 2025
Objective: Ultrasound Shear Wave Elastography (SWE) demonstrates great potential in assessing soft-tissue pathology by mapping tissue stiffness, which is linked to malignancy. Traditional SWE methods have shown promise in estimating tissue elasticity, yet their susceptibility to noise interference, reliance on limited training data, and inability to generate segmentation masks concurrently present notable challenges to accuracy and reliability. Approach: In this paper, we propose SW-ViT, a novel two-stage deep learning framework for SWE that integrates a CNN-Spatio-Temporal Vision Transformer-based reconstruction network with an efficient Transformer-based post-denoising network. The first stage uses a 3D ResNet encoder with multi-resolution spatio-temporal Transformer blocks that capture spatial and temporal features, followed by a squeeze-and-excitation attention decoder that reconstructs 2D stiffness maps. To address data limitations, a patch-based training strategy is adopted for localized learning and reconstruction. In the second stage, a denoising network with a shared encoder and dual decoders processes inclusion and background regions to produce a refined stiffness map and segmentation mask. A hybrid loss combining regional, smoothness, fusion, and Intersection over Union (IoU) components ensures improvements in both reconstruction and segmentation. Results: On simulated data, our method achieves PSNR of 32.68 dB, CNR of 46.78 dB, and SSIM of 0.995. On phantom data, results include PSNR of 21.11 dB, CNR of 42.14 dB, and SSIM of 0.936. Segmentation IoU values reach 0.949 (simulation) and 0.738 (phantom) with ASSD values being 0.184 and 1.011, respectively. Significance: SW-ViT delivers robust, high-quality elasticity map estimates from noisy SWE data and holds clear promise for clinical application.
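
The abstract names the ingredients of the hybrid loss but not its exact form. Below is a minimal PyTorch sketch of how regional, smoothness, and IoU terms might be combined; the fusion term is omitted and all weights `w_*` are illustrative assumptions, not the paper's values.

```python
import torch.nn.functional as F

def soft_iou_loss(pred_mask, gt_mask, eps=1e-6):
    # Differentiable IoU term on probability maps (pred_mask in [0, 1]).
    inter = (pred_mask * gt_mask).sum(dim=(-2, -1))
    union = (pred_mask + gt_mask - pred_mask * gt_mask).sum(dim=(-2, -1))
    return 1.0 - ((inter + eps) / (union + eps)).mean()

def tv_smoothness(stiffness):
    # Total-variation penalty encouraging piecewise-smooth stiffness maps.
    dh = (stiffness[..., 1:, :] - stiffness[..., :-1, :]).abs().mean()
    dw = (stiffness[..., :, 1:] - stiffness[..., :, :-1]).abs().mean()
    return dh + dw

def hybrid_loss(pred_map, gt_map, pred_mask, gt_mask,
                w_region=1.0, w_smooth=0.1, w_iou=1.0):
    # Regional term: reconstruction error computed separately over the
    # inclusion (gt_mask == 1) and background (gt_mask == 0) regions.
    incl = F.mse_loss(pred_map * gt_mask, gt_map * gt_mask)
    bkg = F.mse_loss(pred_map * (1 - gt_mask), gt_map * (1 - gt_mask))
    return (w_region * (incl + bkg)
            + w_smooth * tv_smoothness(pred_map)
            + w_iou * soft_iou_loss(pred_mask, gt_mask))
```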

MSLAU-Net: A Hybrid CNN-Transformer Network for Medical Image Segmentation

Libin Lan, Yanxin Li, Xiaojuan Liu, Juan Zhou, Jianxun Zhang, Nannan Huang, Yudong Zhang

arXiv preprint · May 24, 2025
Both CNN-based and Transformer-based methods have achieved remarkable success in medical image segmentation tasks. However, CNN-based methods struggle to effectively capture global contextual information due to the inherent limitations of convolution operations. Meanwhile, Transformer-based methods suffer from insufficient local feature modeling and face challenges related to the high computational complexity caused by the self-attention mechanism. To address these limitations, we propose a novel hybrid CNN-Transformer architecture, named MSLAU-Net, which integrates the strengths of both paradigms. The proposed MSLAU-Net incorporates two key ideas. First, it introduces Multi-Scale Linear Attention, designed to efficiently extract multi-scale features from medical images while modeling long-range dependencies with low computational complexity. Second, it adopts a top-down feature aggregation mechanism, which performs multi-level feature aggregation and restores spatial resolution using a lightweight structure. Extensive experiments conducted on benchmark datasets covering three imaging modalities demonstrate that the proposed MSLAU-Net outperforms other state-of-the-art methods on nearly all evaluation metrics, validating the superiority, effectiveness, and robustness of our approach. Our code is available at https://github.com/Monsoon49/MSLAU-Net.
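
The abstract does not spell out Multi-Scale Linear Attention, but the generic linear-attention kernel trick it builds on (Katharopoulos et al., 2020) reduces attention cost from O(N²·d) to O(N·d²) and looks roughly like this sketch; MSLAU-Net presumably applies such attention across multiple feature scales, and that multi-scale wiring is not shown here.

```python
import torch

def linear_attention(q, k, v, eps=1e-6):
    # Kernelized linear attention: replace softmax(QK^T)V with
    # phi(Q) (phi(K)^T V), where phi is a positive feature map.
    # q, k, v: (batch, heads, seq_len, dim)
    q = torch.nn.functional.elu(q) + 1.0   # positive feature map phi(.)
    k = torch.nn.functional.elu(k) + 1.0
    kv = torch.einsum('bhnd,bhne->bhde', k, v)            # sum_n phi(k_n) v_n^T
    z = 1.0 / (torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=2)) + eps)
    return torch.einsum('bhnd,bhde,bhn->bhne', q, kv, z)  # normalized output
```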

Symbolic and hybrid AI for brain tissue segmentation using spatial model checking.

Belmonte G, Ciancia V, Massink M

PubMed paper · May 24, 2025
Segmentation of 3D medical images, and brain segmentation in particular, is an important topic in neuroimaging and in radiotherapy. Overcoming the current, time-consuming practice of manual delineation of brain tumours and providing an accurate, explainable, and replicable method of segmenting the tumour area and related tissues is therefore an open research challenge. In this paper, we first propose a novel symbolic approach to brain segmentation and delineation of brain lesions based on spatial model checking. This method has its foundations in the theory of closure spaces, a generalisation of topological spaces, and spatial logics. At its core is a high-level declarative logic language for image analysis, ImgQL, and an efficient spatial model checker, VoxLogicA, which exploits state-of-the-art image analysis libraries in its model checking algorithm. We then illustrate how this technique can be combined with machine learning techniques, leading to a hybrid AI approach that provides accurate and explainable segmentation results. We show the results of applying the symbolic approach to several public datasets of 3D magnetic resonance (MR) images. Three datasets are provided by the 2017, 2019 and 2020 international MICCAI BraTS Challenges, with 210, 259 and 293 MR images, respectively; the fourth is the BrainWeb dataset, with 20 (synthetic) 3D patient images of the normal brain. We then apply the hybrid AI method to the BraTS 2020 training set. Our segmentation results are in line with those of other recent state-of-the-art approaches, in terms of both accuracy and computational efficiency, with the added advantage of being explainable.
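
VoxLogicA's own ImgQL syntax is not reproduced here. As a rough illustration of the closure-space semantics it builds on: the spatial "near" operator on a digital image coincides with morphological dilation under the image's adjacency relation, and its dual "interior" with erosion. A toy NumPy/SciPy analogue (function names are ours, not VoxLogicA's):

```python
import numpy as np
from scipy import ndimage

def near(phi):
    # Closure-space "near": true at a voxel iff the voxel or one of its
    # neighbours satisfies phi, i.e. morphological dilation.
    return ndimage.binary_dilation(phi)

def interior(phi):
    # Dual operator: true where the whole neighbourhood satisfies phi.
    return ~near(~phi)

# Toy query in the spirit of ImgQL: voxels brighter than 0.7 that
# touch a region brighter than 0.9.
img = np.random.rand(32, 32, 32)
candidate = (img > 0.7) & near(img > 0.9)
```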

TK-Mamba: Marrying KAN with Mamba for Text-Driven 3D Medical Image Segmentation

Haoyu Yang, Yuxiang Cai, Jintao Chen, Xuhong Zhang, Wenhui Lei, Xiaoming Shi, Jianwei Yin, Yankai Jiang

arXiv preprint · May 24, 2025
3D medical image segmentation is vital for clinical diagnosis and treatment but is challenged by high-dimensional data and complex spatial dependencies. Traditional single-modality networks, such as CNNs and Transformers, are often limited by computational inefficiency and constrained contextual modeling in 3D settings. We introduce a novel multimodal framework that leverages Mamba and Kolmogorov-Arnold Networks (KAN) as an efficient backbone for long-sequence modeling. Our approach features three key innovations: First, an EGSC (Enhanced Gated Spatial Convolution) module captures spatial information when unfolding 3D images into 1D sequences. Second, we extend Group-Rational KAN (GR-KAN), a Kolmogorov-Arnold Networks variant with rational basis functions, into 3D-Group-Rational KAN (3D-GR-KAN) for 3D medical imaging, its first application in this domain, enabling superior feature representation tailored to volumetric data. Third, a dual-branch text-driven strategy leverages CLIP's text embeddings: one branch swaps one-hot labels for semantic vectors to preserve inter-organ semantic relationships, while the other aligns images with detailed organ descriptions to enhance semantic alignment. Experiments on the Medical Segmentation Decathlon (MSD) and KiTS23 datasets show our method achieving state-of-the-art performance, surpassing existing approaches in accuracy and efficiency. This work highlights the power of combining advanced sequence modeling, extended network architectures, and vision-language synergy to push forward 3D medical image segmentation, delivering a scalable solution for clinical use. The source code is openly available at https://github.com/yhy-whu/TK-Mamba.
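
A sketch of the "swap one-hot labels for semantic vectors" idea: classify each voxel by cosine similarity to precomputed CLIP text embeddings of the organ names. Shapes and the temperature value are assumptions, and the paper's second, description-alignment branch is not shown.

```python
import torch
import torch.nn.functional as F

def text_driven_logits(voxel_feats, class_text_emb, temperature=0.07):
    # voxel_feats: (batch, C, D, H, W) image features projected to the
    # text-embedding dimension C; class_text_emb: (num_classes, C) CLIP
    # text embeddings of organ names (assumed precomputed).
    # Replaces a one-hot classifier head with cosine similarity to text,
    # so semantic relationships between organ labels are preserved.
    v = F.normalize(voxel_feats, dim=1)
    t = F.normalize(class_text_emb, dim=1)
    logits = torch.einsum('bcdhw,kc->bkdhw', v, t) / temperature
    return logits  # feed to cross-entropy against voxel-wise labels
```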

Novel Deep Learning Framework for Simultaneous Assessment of Left Ventricular Mass and Longitudinal Strain: Clinical Feasibility and Validation in Patients with Hypertrophic Cardiomyopathy

Park, J., Yoon, Y. E., Jang, Y., Jung, T., Jeon, J., Lee, S.-A., Choi, H.-M., Hwang, I.-C., Chun, E. J., Cho, G.-Y., Chang, H.-J.

medRxiv preprint · May 23, 2025
Background: This study presents the Segmentation-based Myocardial Advanced Refinement Tracking (SMART) system, a novel artificial intelligence (AI)-based framework for transthoracic echocardiography (TTE) that incorporates motion tracking and left ventricular (LV) myocardial segmentation for automated LV mass (LVM) and global longitudinal strain (LVGLS) assessment. Methods: The SMART system performs LV speckle tracking based on motion vector estimation, refined by structural information from endocardial and epicardial segmentation throughout the cardiac cycle. This approach enables automated measurement of LVM-SMART and LVGLS-SMART. The feasibility of SMART is validated in 111 hypertrophic cardiomyopathy (HCM) patients (median age: 58 years, 69% male) who underwent TTE and cardiac magnetic resonance imaging (CMR). Results: LVGLS-SMART showed a strong correlation with conventional manual LVGLS measurements (Pearson's correlation coefficient [PCC] 0.851; mean difference: 0 [-2 to 0]). With CMR as the reference standard for LVM, the conventional dimension-based TTE method overestimated LVM (PCC 0.652; mean difference: 106 [90 to 123]), whereas LVM-SMART demonstrated excellent agreement with CMR (PCC 0.843; mean difference: 1 [-11 to 13]). For predicting extensive myocardial fibrosis, LVGLS-SMART and LVM-SMART exhibited performance comparable to conventional LVGLS and CMR (AUC: 0.72 and 0.66, respectively). Patients identified as high-risk for extensive fibrosis by LVGLS-SMART and LVM-SMART had significantly higher rates of adverse outcomes, including heart failure hospitalization, new-onset atrial fibrillation, and defibrillator implantation. Conclusions: The SMART technique provides a comparable LVGLS evaluation and a more accurate LVM assessment than conventional TTE, with predictive value for myocardial fibrosis and adverse outcomes. These findings support its utility in HCM management.
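
Assuming the bracketed ranges above are 95% confidence intervals on the mean difference (the abstract does not say), agreement statistics of this kind can be computed as in the following NumPy sketch:

```python
import numpy as np

def agreement(auto, manual):
    # Pearson correlation plus mean difference with a 95% confidence
    # interval, for an automated measurement vs. a manual reference.
    auto = np.asarray(auto, dtype=float)
    manual = np.asarray(manual, dtype=float)
    pcc = np.corrcoef(auto, manual)[0, 1]
    diff = auto - manual
    mean_diff = diff.mean()
    half_width = 1.96 * diff.std(ddof=1) / np.sqrt(len(diff))
    return pcc, mean_diff, (mean_diff - half_width, mean_diff + half_width)
```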

Artificial intelligence automated measurements of spinopelvic parameters in adult spinal deformity-a systematic review.

Bishara A, Patel S, Warman A, Jo J, Hughes LP, Khalifeh JM, Azad TD

PubMed paper · May 23, 2025
This review evaluates advances in deep learning (DL) applications for automatic spinopelvic parameter estimation, comparing their accuracy to manual measurements performed by surgeons. The PubMed database was queried for studies on DL measurement of adult spinopelvic parameters between 2014 and 2024. Studies were excluded if they focused on pediatric patients, non-deformity-related conditions, or non-human subjects, or if they lacked sufficient quantitative data comparing DL models to human measurements. Included studies were assessed based on model architecture, patient demographics, training, validation, and testing methods, and sample sizes, as well as performance compared to manual methods. Of 442 screened articles, 16 were included, with sample sizes ranging from 15 to 9,832 radiographs and reported intraclass correlation coefficients (ICCs) of 0.56 to 1.00. Measurements of pelvic tilt, pelvic incidence, T4-T12 kyphosis, L1-L4 lordosis, and SVA showed consistently high ICCs (>0.80) and low mean absolute deviations (MADs <6°), with a substantial number of studies reporting excellent ICCs of 0.90 or greater for pelvic tilt. In contrast, T1-T12 kyphosis and L4-S1 lordosis exhibited lower ICCs and higher measurement errors. Overall, most DL models demonstrated strong correlations (>0.80) with clinician measurements and minimal differences from manual references, except for T1-T12 kyphosis (average Pearson correlation: 0.68), L1-L4 lordosis (average Pearson correlation: 0.75), and L4-S1 lordosis (average Pearson correlation: 0.65). Novel computer vision algorithms show promising accuracy in measuring spinopelvic parameters, comparable to manual surgeon measurements. Future research should focus on external validation, additional imaging modalities, and the feasibility of integration in clinical settings to assess model reliability and predictive capacity.
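
For reference, the simplest ICC variant such reviews report, the one-way random-effects ICC(1,1), can be computed as below; individual studies may use two-way variants, so this form is purely illustrative.

```python
import numpy as np

def icc_oneway(ratings):
    # One-way random-effects ICC(1,1) for an (n_subjects, k_raters)
    # matrix, e.g. DL model vs. surgeon measurements of pelvic tilt.
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    msb = k * ((row_means - grand) ** 2).sum() / (n - 1)         # between subjects
    msw = ((x - row_means[:, None]) ** 2).sum() / (n * (k - 1))  # within subjects
    return (msb - msw) / (msb + (k - 1) * msw)
```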

A deep learning model integrating domain-specific features for enhanced glaucoma diagnosis.

Xu J, Jing E, Chai Y

PubMed paper · May 23, 2025
Glaucoma is a group of serious eye diseases that can cause incurable blindness. Despite the critical need for early detection, over 60% of cases remain undiagnosed, especially in less developed regions. Glaucoma diagnosis is costly, and models have been proposed to automate diagnosis from images of the retina, specifically the optic cup and the surrounding disc, where retinal blood vessels and nerves enter and leave the eye. However, diagnosis is complicated because both normal and glaucoma-affected eyes can vary greatly in appearance. Some normal eyes, like glaucomatous ones, exhibit a large cup-to-disc ratio, one of the main diagnostic criteria, making the two challenging to distinguish. We propose a deep learning model with domain features (DLMDF) that combines unstructured and structured features to distinguish between glaucoma and physiologic large cups. The structured features were based upon the known cup-to-disc ratios of the four quadrants of the optic disc in normal eyes, physiologic large cups, and glaucomatous optic cups. We segmented each cup and disc using a fully convolutional neural network and then calculated the cup size, disc size, and cup-to-disc ratio of each quadrant. The unstructured features were learned by a deep convolutional neural network. The average precision (AP) was 98.52% for disc segmentation and 98.57% for cup segmentation. These relatively high AP values enabled us to calculate 15 reliable features from each segmented disc and cup. In classification tasks, the DLMDF outperformed other models, achieving superior accuracy, precision, and recall. These results validate the effectiveness of combining deep-learning-derived features with domain-specific structured features, underscoring the potential of this approach to advance glaucoma diagnosis.
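
A sketch of how quadrant-wise structured features like these could be derived from the segmentation masks; the paper's exact 15-feature definition is not given in the abstract, and the quadrant order and disc-center input here are assumptions.

```python
import numpy as np

def quadrant_cdr(cup_mask, disc_mask, center):
    # Per-quadrant cup-to-disc area ratio from binary 2D masks.
    # center: (row, col) of the disc centroid, used to split quadrants.
    cy, cx = center
    h, w = disc_mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    quads = [(ys < cy) & (xs >= cx), (ys < cy) & (xs < cx),
             (ys >= cy) & (xs < cx), (ys >= cy) & (xs >= cx)]
    ratios = []
    for q in quads:
        disc_area = (disc_mask & q).sum()
        cup_area = (cup_mask & q).sum()
        ratios.append(cup_area / disc_area if disc_area else 0.0)
    return ratios  # quadrant order depends on image orientation
```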

AMVLM: Alignment-Multiplicity Aware Vision-Language Model for Semi-Supervised Medical Image Segmentation.

Pan Q, Li Z, Qiao W, Lou J, Yang Q, Yang G, Ji B

PubMed paper · May 23, 2025
Low-quality pseudo labels pose a significant obstacle in semi-supervised medical image segmentation (SSMIS), impeding consistency learning on unlabeled data. Leveraging a vision-language model (VLM) holds promise for improving pseudo label quality by employing textual prompts to delineate segmentation regions, but it faces the challenge of cross-modal alignment uncertainty due to multiple correspondences (multiple images/texts tend to correspond to one text/image). Existing VLMs address this challenge by modeling semantics as distributions, but such distributions lead to semantic degradation. To address these problems, we propose the Alignment-Multiplicity Aware Vision-Language Model (AMVLM), a new VLM pre-training paradigm with two novel similarity metric strategies. (i) Cross-modal Similarity Supervision (CSS) proposes a probability distribution transformer to supervise similarity scores across fine-granularity semantics by measuring cross-modal distribution disparities, thus learning cross-modal multiple alignments. (ii) Intra-modal Contrastive Learning (ICL) takes into account the similarity metric of coarse-fine granularity information within each modality to encourage cross-modal semantic consistency. Furthermore, using the pretrained AMVLM, we propose a pioneering text-guided SSMIS network to compensate for the quality deficiencies of pseudo labels. This network incorporates a text mask generator to produce multimodal supervision information, enhancing pseudo label quality and the model's consistency learning. Extensive experimentation validates the efficacy of our AMVLM-driven SSMIS, showcasing superior performance across four publicly available datasets. The code will be available at: https://github.com/QingtaoPan/AMVLM.
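
AMVLM's CSS and ICL strategies refine the standard symmetric image-text contrastive objective. That baseline objective, shown below for context only (it is not the paper's loss), is what cross-modal alignment concretely optimizes:

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive(img_emb, txt_emb, temperature=0.07):
    # Symmetric InfoNCE over a batch of paired image/text embeddings:
    # each image should match its own text and vice versa.
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(len(img), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```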

FreqU-FNet: Frequency-Aware U-Net for Imbalanced Medical Image Segmentation

Ruiqi Xing

arXiv preprint · May 23, 2025
Medical image segmentation faces persistent challenges due to severe class imbalance and the frequency-specific distribution of anatomical structures. Most conventional CNN-based methods operate in the spatial domain and struggle to capture minority class signals, often affected by frequency aliasing and limited spectral selectivity. Transformer-based models, while powerful in modeling global dependencies, tend to overlook critical local details necessary for fine-grained segmentation. To overcome these limitations, we propose FreqU-FNet, a novel U-shaped segmentation architecture operating in the frequency domain. Our framework incorporates a Frequency Encoder that leverages Low-Pass Frequency Convolution and Daubechies wavelet-based downsampling to extract multi-scale spectral features. To reconstruct fine spatial details, we introduce a Spatial Learnable Decoder (SLD) equipped with an adaptive multi-branch upsampling strategy. Furthermore, we design a frequency-aware loss (FAL) function to enhance minority class learning. Extensive experiments on multiple medical segmentation benchmarks demonstrate that FreqU-FNet consistently outperforms both CNN and Transformer baselines, particularly in handling under-represented classes, by effectively exploiting discriminative frequency bands.
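
Daubechies wavelet-based downsampling, one of the named building blocks, is a one-liner with PyWavelets: one DWT level roughly halves spatial resolution and splits the signal into an approximation band plus three detail bands, which is what gives a frequency-domain encoder its spectral selectivity. A minimal sketch (the stacking convention is an assumption, not the paper's layout):

```python
import numpy as np
import pywt

def db_wavelet_downsample(feature_map, wavelet='db2'):
    # One 2D DWT level: approximation (cA) plus horizontal, vertical,
    # and diagonal detail bands (cH, cV, cD), each at ~half resolution.
    cA, (cH, cV, cD) = pywt.dwt2(feature_map, wavelet)
    # Stack sub-bands as channels so later layers can select frequency
    # bands instead of aliasing them away.
    return np.stack([cA, cH, cV, cD], axis=0)

x = np.random.rand(64, 64).astype(np.float32)
bands = db_wavelet_downsample(x)  # (4, 33, 33) for db2 with symmetric padding
```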

PDS-UKAN: Subdivision hopping connected to the U-KAN network for medical image segmentation.

Deng L, Wang W, Chen S, Yang X, Huang S, Wang J

PubMed paper · May 23, 2025
Accurate and efficient segmentation of medical images plays a vital role in clinical tasks such as diagnostic procedures and treatment planning. Traditional U-shaped encoder-decoder architectures, built on convolutional and transformer-based networks, have shown strong performance in medical image processing. However, the simple skip connections commonly used in these networks face limitations, such as insufficient nonlinear modeling capacity, weak global multiscale context modeling, and limited interpretability. To address these challenges, this study proposes the PDS-UKAN network, an innovative subdivision-based U-KAN architecture designed to improve segmentation accuracy. The PDS-UKAN incorporates a PKAN module (comprising partial convolutions and Kolmogorov-Arnold network layers) into the encoder bottleneck, enhancing the network's nonlinear modeling and interpretability. Additionally, the proposed Dual-Branch Convolutional Boundary Enhancement (DBE) module focuses on pixel-level boundary refinement, improving edge detail preservation in the shallow skip connections. Meanwhile, the Skip Connection Channel Spatial Attention (SCCSA) module is applied in the deeper skip connections to strengthen cross-dimensional interactions between channel and spatial features, mitigating the loss of spatial information due to downsampling. Extensive experiments across multiple medical imaging datasets demonstrate that PDS-UKAN consistently achieves superior performance compared to state-of-the-art (SOTA) methods.
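
The SCCSA module is described only at a high level. A CBAM-style channel-plus-spatial attention block is one common way to realize this kind of cross-dimensional interaction on skip features; the sketch below is an analogue under that assumption, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    # Channel gate (squeeze-and-excitation-style MLP on pooled features)
    # followed by a spatial gate (7x7 conv over pooled channel maps).
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                    # x: (B, C, H, W) skip features
        b, c, _, _ = x.shape
        ca = torch.sigmoid(self.channel_mlp(x.mean(dim=(2, 3))))  # channel gate
        x = x * ca.view(b, c, 1, 1)
        sa = torch.sigmoid(self.spatial_conv(
            torch.cat([x.mean(1, keepdim=True),
                       x.amax(1, keepdim=True)], dim=1)))          # spatial gate
        return x * sa
```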