Page 21 of 50497 results

Self-supervised feature learning for cardiac Cine MR image reconstruction

Siying Xu, Marcel Früh, Kerstin Hammernik, Andreas Lingg, Jens Kübler, Patrick Krumm, Daniel Rueckert, Sergios Gatidis, Thomas Küstner

arXiv preprint, May 29 2025
We propose a self-supervised feature learning assisted reconstruction (SSFL-Recon) framework for MRI reconstruction to address the limitation of existing supervised learning methods. Although recent deep learning-based methods have shown promising performance in MRI reconstruction, most require fully-sampled images for supervised learning, which is challenging in practice considering long acquisition times under respiratory or organ motion. Moreover, nearly all fully-sampled datasets are obtained from conventional reconstruction of mildly accelerated datasets, thus potentially biasing the achievable performance. The numerous undersampled datasets with different accelerations in clinical practice, hence, remain underutilized. To address these issues, we first train a self-supervised feature extractor on undersampled images to learn sampling-insensitive features. The pre-learned features are subsequently embedded in the self-supervised reconstruction network to assist in removing artifacts. Experiments were conducted retrospectively on an in-house 2D cardiac Cine dataset, including 91 cardiovascular patients and 38 healthy subjects. The results demonstrate that the proposed SSFL-Recon framework outperforms existing self-supervised MRI reconstruction methods and even exhibits performance comparable to or better than supervised learning at up to $16\times$ retrospective undersampling. The feature learning strategy can effectively extract global representations, which have proven beneficial in removing artifacts and increasing generalization ability during reconstruction.

Deep Modeling and Optimization of Medical Image Classification

Yihang Wu, Muhammad Owais, Reem Kateb, Ahmad Chaddad

arXiv preprint, May 29 2025
Deep models, such as convolutional neural networks (CNNs) and vision transformers (ViTs), demonstrate remarkable performance in image classification. However, those deep models require large amounts of data to fine-tune, which is impractical in the medical domain due to the data privacy issue. Furthermore, despite the feasible performance of contrastive language image pre-training (CLIP) in the natural domain, the potential of CLIP has not been fully investigated in the medical field. To face these challenges, we considered three scenarios: 1) we introduce a novel CLIP variant using four CNNs and eight ViTs as image encoders for the classification of brain cancer and skin cancer, 2) we combine 12 deep models with two federated learning techniques to protect data privacy, and 3) we involve traditional machine learning (ML) methods to improve the generalization ability of those deep models in unseen domain data. The experimental results indicate that maxvit shows the highest averaged (AVG) test metrics (AVG = 87.03%) in HAM10000 dataset with multimodal learning, while convnext_l demonstrates remarkable test performance with an F1-score of 83.98% compared to swin_b with 81.33% in FL model. Furthermore, the use of support vector machine (SVM) can improve the overall test metrics with AVG of $\sim 2\%$ for swin transformer series in ISIC2018. Our codes are available at https://github.com/AIPMLab/SkinCancerSimulation.

Can Large Language Models Challenge CNNS in Medical Image Analysis?

Shibbir Ahmed, Shahnewaz Karim Sakib, Anindya Bijoy Das

arXiv preprint, May 29 2025
This study presents a multimodal AI framework designed for precisely classifying medical diagnostic images. Utilizing publicly available datasets, the proposed system compares the strengths of convolutional neural networks (CNNs) and different large language models (LLMs). This in-depth comparative analysis highlights key differences in diagnostic performance, execution efficiency, and environmental impacts. Model evaluation was based on accuracy, F1-score, average execution time, average energy consumption, and estimated $CO_2$ emission. The findings indicate that although CNN-based models can outperform various multimodal techniques that incorporate both images and contextual information, applying additional filtering on top of LLMs can lead to substantial performance gains. These findings highlight the transformative potential of multimodal AI systems to enhance the reliability, efficiency, and scalability of medical diagnostics in clinical settings.

A combined attention mechanism for brain tumor segmentation of lower-grade glioma in magnetic resonance images.

Hedibi H, Beladgham M, Bouida A

PubMed paper, May 29 2025
Low-grade gliomas (LGGs) are among the most problematic brain tumors to reliably segment in FLAIR MRI, and effective delineation of these lesions is critical for clinical diagnosis, treatment planning, and patient monitoring. Nevertheless, conventional U-Net-based approaches usually suffer from the loss of critical structural details owing to repetitive down-sampling, while the encoder features often retain irrelevant information that is not properly utilized by the decoder. To solve these challenges, this paper offers a dual-attention U-shaped design, named ECASE-Unet, which seamlessly integrates Efficient Channel Attention (ECA) and Squeeze-and-Excitation (SE) blocks in both the encoder and decoder stages. By selectively recalibrating channel-wise information, the model increases diagnostically significant regions of interest and reduces noise. Furthermore, dilated convolutions are introduced at the bottleneck layer to capture multi-scale contextual cues without inflating computational complexity, and dropout regularization is systematically applied to prevent overfitting on heterogeneous data. Experimental results on the Kaggle Low-Grade-Glioma dataset suggest that ECASE-Unet greatly outperforms previous segmentation algorithms, reaching a Dice coefficient of 0.9197 and an Intersection over Union (IoU) of 0.8521. Comprehensive ablation studies further reveal that integrating ECA and SE modules delivers complementing benefits, supporting the model's robust efficacy in precisely identifying LGG boundaries. These findings underline the potential of ECASE-Unet to expedite clinical operations and improve patient outcomes. Future work will focus on improving the model's applicability to new MRI modalities and studying the integration of clinical characteristics for a more comprehensive characterization of brain tumors.
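
The two metrics reported in this abstract, the Dice coefficient and Intersection over Union (IoU), can be illustrated with a minimal sketch. The function name `dice_and_iou` and the toy masks are hypothetical and are not taken from the paper; the code only shows the standard definitions on binary masks. For a single pair of masks the two are linked by Dice = 2·IoU / (1 + IoU), although averages over a dataset need not satisfy that identity exactly.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equally sized binary masks (nested lists of 0/1)."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    pred_area = sum(1 for p in flat_pred if p)
    target_area = sum(1 for t in flat_target if t)
    union = pred_area + target_area - intersection

    # Convention (an assumption here): two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy example: 3 predicted pixels, 3 true pixels, 2 of them shared.
pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 0],
          [0, 1, 1],
          [0, 0, 0]]
dice, iou = dice_and_iou(pred, target)
```

On the toy masks the intersection is 2 pixels and the union is 4, so IoU is 0.5 and Dice is 2/3.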

Exploring best-performing radiomic features with combined multilevel discrete wavelet decompositions for multiclass COVID-19 classification using chest X-ray images.

Özcan H

PubMed paper, May 29 2025
Discrete wavelet transforms have been applied in many machine learning models for the analysis of COVID-19; however, little is known about the impact of combined multilevel wavelet decompositions for the disease identification. This study proposes a computer-aided diagnosis system for addressing the combined multilevel effects of multiscale radiomic features on multiclass COVID-19 classification using chest X-ray images. A two-level discrete wavelet transform was applied to an optimal region of interest to obtain multiscale decompositions. Both approximation and detail coefficients were extensively investigated in varying frequency bands through 1240 experimental models. High dimensionality in the feature space was managed using a proposed filter- and wrapper-based feature selection approach. A comprehensive comparison was conducted between the bands and features to explore best-performing ensemble algorithm models. The results indicated that incorporating multilevel decompositions could lead to improved model performance. An inclusive region of interest, encompassing both lungs and the mediastinal regions, was identified to enhance feature representation. The light gradient-boosting machine, applied on combined bands with the features of basic, gray-level, Gabor, histogram of oriented gradients and local binary patterns, achieved the highest weighted precision, sensitivity, specificity, and accuracy of 97.50 %, 97.50 %, 98.75 %, and 97.50 %, respectively. The COVID-19-versus-the-rest receiver operating characteristic area under the curve was 0.9979. These results underscore the potential of combining decomposition levels with the original signals and employing an inclusive region of interest for effective COVID-19 detection, while the feature selection and training processes remain efficient within a practical computational time.
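
As a rough illustration of the two-level discrete wavelet decomposition mentioned in this abstract, the sketch below applies an orthonormal Haar transform twice to a small image. The abstract does not name the wavelet family, so the choice of Haar is an assumption, as are all function names; sub-band naming (LH versus HL) also varies between libraries.

```python
def haar_1d(signal):
    """One Haar step on an even-length sequence: returns (approximation, detail)."""
    s = 2 ** 0.5
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def filter_columns(matrix):
    """Apply one Haar step down every column; returns (low, high) matrices."""
    low_cols, high_cols = [], []
    for col in transpose(matrix):
        a, d = haar_1d(col)
        low_cols.append(a)
        high_cols.append(d)
    return transpose(low_cols), transpose(high_cols)


def haar_2d(image):
    """One 2D Haar level on an image with even height and width.

    Returns (LL, LH, HL, HH): the approximation band and three detail bands.
    """
    low_rows, high_rows = [], []
    for row in image:
        a, d = haar_1d(row)
        low_rows.append(a)
        high_rows.append(d)
    ll, lh = filter_columns(low_rows)
    hl, hh = filter_columns(high_rows)
    return ll, lh, hl, hh


def two_level_haar(image):
    """Decompose the image, then decompose its level-1 approximation band again."""
    level1 = haar_2d(image)
    level2 = haar_2d(level1[0])
    return level1, level2


# A constant 4x4 image: all detail bands should vanish at both levels.
image = [[1.0] * 4 for _ in range(4)]
level1, level2 = two_level_haar(image)
```

Features such as the radiomic descriptors listed in the abstract would then be computed per band; that step is not shown here.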

Gaussian random fields as an abstract representation of patient metadata for multimodal medical image segmentation.

Cassidy B, McBride C, Kendrick C, Reeves ND, Pappachan JM, Raad S, Yap MH

PubMed paper, May 29 2025
Growing rates of chronic wound occurrence, especially in patients with diabetes, have become a recent concerning trend. Chronic wounds are difficult and costly to treat, and have become a serious burden on health care systems worldwide. Innovative deep learning methods for the detection and monitoring of such wounds have the potential to reduce the impact to patients and clinicians. We present a novel multimodal segmentation method which allows for the introduction of patient metadata into the training workflow whereby the patient data are expressed as Gaussian random fields. Our results indicate that the proposed method improved performance when utilising multiple models, each trained on different metadata categories. Using the Diabetic Foot Ulcer Challenge 2022 test set, when compared to the baseline results (intersection over union = 0.4670, Dice similarity coefficient = 0.5908) we demonstrate improvements of +0.0220 and +0.0229 for intersection over union and Dice similarity coefficient respectively. This paper presents the first study to focus on integrating patient data into a chronic wound segmentation workflow. Our results show significant performance gains when training individual models using specific metadata categories, followed by average merging of prediction masks using distance transforms. All source code for this study is available at: https://github.com/mmu-dermatology-research/multimodal-grf.

Automated classification of midpalatal suture maturation stages from CBCTs using an end-to-end deep learning framework.

Milani OH, Mills L, Nikho A, Tliba M, Allareddy V, Ansari R, Cetin AE, Elnagar MH

PubMed paper, May 29 2025
Accurate classification of midpalatal suture maturation stages is critical for orthodontic diagnosis, treatment planning, and the assessment of maxillary growth. Cone Beam Computed Tomography (CBCT) imaging offers detailed insights into this craniofacial structure but poses unique challenges for deep learning image recognition model design due to its high dimensionality, noise artifacts, and variability in image quality. To address these challenges, we propose a novel technique that highlights key image features through a simple filtering process to improve image clarity prior to analysis, thereby enhancing the learning process and better aligning with the distribution of the input data domain. Our preprocessing steps include region-of-interest extraction, followed by high-pass and Sobel filtering for emphasis of low-level features. The feature extraction integrates Convolutional Neural Network (CNN) architectures, such as EfficientNet and ResNet18, alongside our novel Multi-Filter Convolutional Residual Attention Network (MFCRAN) enhanced with Discrete Cosine Transform (DCT) layers. Moreover, to better capture the inherent order within the data classes, we augment the supervised training process with a ranking loss by attending to the relationship within the label domain. Furthermore, to adhere to diagnostic constraints while training the model, we introduce a tailored data augmentation strategy to improve classification accuracy and robustness. To validate our method, we employed a k-fold cross-validation protocol on a private dataset comprising 618 CBCT images, annotated into five stages (A, B, C, D, and E) by expert evaluators. The experimental results demonstrate the effectiveness of our proposed approach, achieving the highest classification accuracy of 79.02%, significantly outperforming competing architectures, which achieved accuracies ranging from 71.87 to 78.05%.
This work introduces a novel and fully automated framework for midpalatal suture maturation classification, marking a substantial advancement in orthodontic diagnostics and treatment planning.
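
The Sobel filtering step named in this abstract's preprocessing can be sketched as follows. This is a plain-Python illustration of the standard 3x3 Sobel operator, not the authors' pipeline: the function name, the zeroed border, and the toy image are assumptions. The kernels are applied as a correlation (no kernel flip), which does not change the gradient magnitude.

```python
# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude of a 2D grayscale image given as nested lists.

    Border pixels are left at 0 because the 3x3 window does not fit there.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0.0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out


# Toy image with a vertical step edge between the second and third columns.
step = [[0, 0, 1, 1] for _ in range(4)]
edges = sobel_magnitude(step)
```

Interior pixels next to the step respond strongly, while a constant image produces zeros everywhere.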

CT-denoimer: efficient contextual transformer network for low-dose CT denoising.

Zhang Y, Xu F, Zhang R, Guo Y, Wang H, Wei B, Ma F, Meng J, Liu J, Lu H, Chen Y

PubMed paper, May 29 2025
Low-dose computed tomography (LDCT) effectively reduces radiation exposure to patients, but introduces severe noise artifacts that affect diagnostic accuracy. Recently, Transformer-based network architectures have been widely applied to LDCT image denoising, generally achieving superior results compared to traditional convolutional methods. However, these methods are often hindered by high computational costs and struggle to capture complex local contextual features, which negatively impacts denoising performance. In this work, we propose CT-Denoimer, an efficient CT Denoising Transformer network that captures both global correlations and intricate, spatially varying local contextual details in CT images, enabling the generation of high-quality images. The core of our framework is a Transformer module that consists of two key components: the Multi-Dconv head Transposed Attention (MDTA) and the Mixed Contextual Feed-forward Network (MCFN). The MDTA block captures global correlations in the image with linear computational complexity, while the MCFN block manages multi-scale local contextual information, both static and dynamic, through a series of Enhanced Contextual Transformer (eCoT) modules. In addition, we incorporate Operation-Wise Attention Layers (OWALs) to enable collaborative refinement in the proposed CT-Denoimer, enhancing its ability to handle complex and varying noise patterns in LDCT images more effectively. Extensive experimental validation on both the AAPM-Mayo public dataset and a real-world clinical dataset demonstrated the state-of-the-art performance of the proposed CT-Denoimer. It achieved a peak signal-to-noise ratio (PSNR) of 33.681 dB, a structural similarity index measure (SSIM) of 0.921, an information fidelity criterion (IFC) of 2.857 and a visual information fidelity (VIF) of 0.349. Subjective assessment by radiologists gave an average score of 4.39, confirming its clinical applicability and clear advantages over existing methods.
This study presents an innovative CT denoising Transformer network that sets a new benchmark in LDCT image denoising, excelling in both noise reduction and fine structure preservation.
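
For readers unfamiliar with the PSNR figure quoted in this abstract, the sketch below shows the standard definition, PSNR = 10·log10(data_range² / MSE). The function name and the `data_range` default of 1.0 are assumptions; the abstract does not state which intensity range or windowing the authors used, so values computed this way are not directly comparable to the reported 33.681 dB.

```python
import math


def psnr(reference, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images.

    `data_range` is the assumed maximum possible intensity span of the images.
    """
    flat_ref = [v for row in reference for v in row]
    flat_test = [v for row in test for v in row]
    if len(flat_ref) != len(flat_test):
        raise ValueError("images must have the same number of pixels")

    # Mean squared error between the reference and the test image.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_test)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(data_range ** 2 / mse)


# Toy example: every pixel is off by 0.1 on a [0, 1] intensity scale.
clean = [[0.5, 0.5], [0.5, 0.5]]
noisy = [[0.6, 0.6], [0.6, 0.6]]
value = psnr(clean, noisy)
```

With a uniform error of 0.1 the mean squared error is 0.01, which gives 20 dB on a unit intensity range.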

ADC-MambaNet: A Lightweight U-Shaped Architecture with Mamba and Multi-Dimensional Priority Attention for Medical Image Segmentation.

Nguyen TN, Ho QH, Nguyen VQ, Pham VT, Tran TT

PubMed paper, May 29 2025
Medical image segmentation is becoming an increasingly crucial step in assisting with disease detection and diagnosis. However, medical images often exhibit complex structures and textures, resulting in the need for highly complex methods. Particularly, when Deep Learning methods are utilized, they often require large-scale pretraining, leading to significant memory demands and increased computational costs. The well-known Convolutional Neural Networks (CNNs) have become the backbone of medical image segmentation tasks thanks to their effective feature extraction abilities. However, they often struggle to capture global context due to the limited sizes of their kernels. To address this, various Transformer-based models have been introduced to learn long-range dependencies through self-attention mechanisms. However, these architectures typically incur relatively high computational complexity.
Methods: To address the aforementioned challenges, we propose a lightweight and computationally efficient model named ADC-MambaNet, which combines the conventional Depthwise Convolutional layers with the Mamba algorithm that can address the computational complexity of Transformers. In the proposed model, a new feature extractor named Harmonious Mamba-Convolution (HMC) block, and the Multi-Dimensional Priority Attention (MDPA) block have been designed. These blocks enhance the feature extraction process, thereby improving the overall performance of the model. In particular, the mechanisms enable the model to effectively capture local and global patterns from the feature maps while keeping the computational costs low. A novel loss function called the Balanced Normalized Cross Entropy is also introduced, bringing promising performance compared to other losses. Evaluations on five public medical image datasets: ISIC 2018 Lesion Segmentation, PH2, Data Science Bowl 2018, GlaS, and Lung X-ray demonstrate that ADC-MambaNet achieves higher evaluation scores while maintaining compact parameters and low computational complexity.
Conclusion: ADC-MambaNet offers a promising solution for accurate and efficient medical image segmentation, especially in resource-limited or edge-computing environments. Implementation code will be publicly accessible at: https://github.com/nqnguyen812/mambaseg-model.

RNN-AHF Framework: Enhancing Multi-focal Nature of Hypoxic Ischemic Encephalopathy Lesion Region in MRI Image Using Optimized Rough Neural Network Weight and Anti-Homomorphic Filter.

Thangeswari M, Muthucumaraswamy R, Anitha K, Shanker NR

PubMed paper, May 29 2025
Image enhancement of the Hypoxic-Ischemic Encephalopathy (HIE) lesion region in neonatal brain MR images is a challenging task due to the diffuse (i.e., multi-focal) nature, small size, and low contrast of the lesions. Classifying the stages of HIE is also difficult because of the unclear boundaries and edges of the lesions, which are dispersed throughout the brain. Moreover, unclear boundaries and edges are due to chemical shifts, partial volume artifacts, and motion artifacts. Further, voxels may reflect signals from adjacent tissues. Existing algorithms perform poorly in HIE lesion enhancement due to artifacts, voxels, and the diffuse nature of the lesion. In this paper, we propose a Rough Neural Network and Anti-Homomorphic Filter (RNN-AHF) framework for the enhancement of the HIE lesion region. The RNN-AHF framework reduces the pixel dimensionality of the feature space, eliminates unnecessary pixels, and preserves essential pixels for lesion enhancement. The RNN efficiently learns and identifies pixel patterns and facilitates adaptive enhancement based on different weights in the neural network. The proposed RNN-AHF framework operates using optimized neural weights and an optimized training function. The hybridization of optimized weights and the training function enhances the lesion region with high contrast while preserving the boundaries and edges. The proposed RNN-AHF framework achieves a lesion image enhancement and classification accuracy of approximately 93.5%, which is better than traditional algorithms.