
Deep Learning for Breast Cancer Detection: Comparative Analysis of ConvNeXT and EfficientNet

Mahmudul Hasan

arXiv preprint, May 24, 2025
Breast cancer is the most commonly occurring cancer worldwide, causing 670,000 deaths globally in 2022 according to the WHO. Since routine mammography screening of at-risk age groups began in the 1980s, breast cancer mortality has decreased by 40% in high-income nations, yet the number of new diagnoses continues to rise. Reducing cancer-related deaths requires early detection and treatment. This paper compares two convolutional neural networks, ConvNeXT and EfficientNet, for predicting the likelihood of cancer in screening mammograms. The pipeline consists of image preprocessing, classification, and performance evaluation, with several metrics used to compare the models. On the RSNA Screening Mammography Breast Cancer dataset, ConvNeXT outperforms EfficientNet, achieving a 94.33% AUC, 93.36% accuracy, and 95.13% F-score versus 92.34% AUC, 91.47% accuracy, and 93.06% F-score.
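A minimal sketch of this kind of comparison setup, not the authors' code: both backbones loaded from torchvision with ImageNet weights and re-headed for binary cancer scoring. The specific variants (ConvNeXt-Tiny, EfficientNet-B0), input size, and preprocessing are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_classifier(arch: str = "convnext") -> nn.Module:
    """Return an ImageNet-pretrained backbone with a single-logit (cancer) head."""
    if arch == "convnext":
        net = models.convnext_tiny(weights=models.ConvNeXt_Tiny_Weights.DEFAULT)
        net.classifier[2] = nn.Linear(net.classifier[2].in_features, 1)
    else:
        net = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
        net.classifier[1] = nn.Linear(net.classifier[1].in_features, 1)
    return net

# Example: score a batch of preprocessed mammogram crops (N, 3, 224, 224).
model = build_classifier("convnext").eval()
with torch.no_grad():
    logits = model(torch.randn(4, 3, 224, 224))
    probs = torch.sigmoid(logits).squeeze(1)  # per-image cancer probability
print(probs.shape)  # torch.Size([4])
```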

TK-Mamba: Marrying KAN with Mamba for Text-Driven 3D Medical Image Segmentation

Haoyu Yang, Yuxiang Cai, Jintao Chen, Xuhong Zhang, Wenhui Lei, Xiaoming Shi, Jianwei Yin, Yankai Jiang

arXiv preprint, May 24, 2025
3D medical image segmentation is vital for clinical diagnosis and treatment but is challenged by high-dimensional data and complex spatial dependencies. Traditional single-modality networks, such as CNNs and Transformers, are often limited by computational inefficiency and constrained contextual modeling in 3D settings. We introduce a novel multimodal framework that leverages Mamba and Kolmogorov-Arnold Networks (KAN) as an efficient backbone for long-sequence modeling. Our approach features three key innovations. First, an Enhanced Gated Spatial Convolution (EGSC) module captures spatial information when unfolding 3D images into 1D sequences. Second, we extend Group-Rational KAN (GR-KAN), a Kolmogorov-Arnold Network variant with rational basis functions, into 3D-Group-Rational KAN (3D-GR-KAN) for 3D medical imaging - its first application in this domain - enabling superior feature representation tailored to volumetric data. Third, a dual-branch text-driven strategy leverages CLIP's text embeddings: one branch swaps one-hot labels for semantic vectors to preserve inter-organ semantic relationships, while the other aligns images with detailed organ descriptions to enhance semantic alignment. Experiments on the Medical Segmentation Decathlon (MSD) and KiTS23 datasets show our method achieving state-of-the-art performance, surpassing existing approaches in accuracy and efficiency. This work highlights the power of combining advanced sequence modeling, extended network architectures, and vision-language synergy to push forward 3D medical image segmentation, delivering a scalable solution for clinical use. The source code is openly available at https://github.com/yhy-whu/TK-Mamba.
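A hedged illustration of the label-embedding branch only: organ names encoded with CLIP's text encoder replace one-hot class indices, and per-voxel features are scored by cosine similarity. The prompt template, organ list, and feature dimensionality are assumptions; the second (description-alignment) branch and the Mamba/KAN backbone are not shown.

```python
import torch
from transformers import CLIPTokenizer, CLIPModel

# Hypothetical organ list; the paper works with MSD / KiTS23 label names.
ORGANS = ["liver", "spleen", "left kidney", "right kidney", "pancreas"]

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

with torch.no_grad():
    batch = tok([f"a CT scan of the {o}" for o in ORGANS],
                padding=True, return_tensors="pt")
    text_emb = clip.get_text_features(**batch)                  # (num_organs, 512)
    text_emb = torch.nn.functional.normalize(text_emb, dim=-1)

# A segmentation head can then score per-voxel features against these semantic
# vectors instead of predicting one-hot class indices.
voxel_feat = torch.nn.functional.normalize(torch.randn(8, 512), dim=-1)
logits = voxel_feat @ text_emb.T                                # cosine-similarity logits
print(logits.shape)  # torch.Size([8, 5])
```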

Joint Reconstruction of Activity and Attenuation in PET by Diffusion Posterior Sampling in Wavelet Coefficient Space

Clémentine Phung-Ngoc, Alexandre Bousse, Antoine De Paepe, Hong-Phuong Dang, Olivier Saut, Dimitris Visvikis

arXiv preprint, May 24, 2025
Attenuation correction (AC) is necessary for accurate activity quantification in positron emission tomography (PET). Conventional reconstruction methods typically rely on attenuation maps derived from a co-registered computed tomography (CT) or magnetic resonance imaging scan. However, this additional scan may complicate the imaging workflow, introduce misalignment artifacts, and increase radiation exposure. In this paper, we propose a joint reconstruction of activity and attenuation (JRAA) approach that eliminates the need for auxiliary anatomical imaging by relying solely on emission data. This framework combines a wavelet diffusion model (WDM) with diffusion posterior sampling (DPS) to reconstruct fully three-dimensional (3-D) data. Experimental results show that our method outperforms maximum likelihood activity and attenuation (MLAA) and MLAA with UNet-based post-processing, and yields high-quality, noise-free reconstructions across various count settings when time-of-flight (TOF) information is available. It can also reconstruct non-TOF data, although the reconstruction quality degrades significantly in low-count (LC) conditions, limiting its practical effectiveness in such settings. This approach represents a step towards stand-alone PET imaging by reducing dependence on anatomical modalities while maintaining quantification accuracy, even in low-count scenarios when TOF information is available.
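A schematic of one diffusion-posterior-sampling update under DDPM-style assumptions, for orientation only: the data-consistency gradient of the measurement log-likelihood is added to the unconditional reverse step. The wavelet-coefficient parameterization, the (TOF-)PET Poisson likelihood, and all of the paper's hyperparameters are replaced by toy stand-ins.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def dps_step(x_t, t, eps_model, log_likelihood, guidance=1.0):
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)
    # Estimate the clean sample x0 from the noisy sample x_t (DDPM identity).
    x0_hat = (x_t - torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alpha_bars[t])
    # Data-consistency term: gradient of the measurement log-likelihood w.r.t. x_t.
    grad = torch.autograd.grad(log_likelihood(x0_hat), x_t)[0]
    # Unconditional DDPM reverse-step mean (the stochastic noise term is omitted).
    mean = (x_t - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
    return (mean + guidance * grad).detach()

# Toy usage: a zero noise-predictor and a Gaussian stand-in likelihood.
eps_model = lambda x, t: torch.zeros_like(x)
y = torch.randn(1, 1, 8, 8)
log_lik = lambda x0: -((x0 - y) ** 2).sum()
x = dps_step(torch.randn(1, 1, 8, 8), t=500, eps_model=eps_model, log_likelihood=log_lik)
print(x.shape)
```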

MATI: A GPU-accelerated toolbox for microstructural diffusion MRI simulation and data fitting with a graphical user interface.

Xu J, Devan SP, Shi D, Pamulaparthi A, Yan N, Zu Z, Smith DS, Harkins KD, Gore JC, Jiang X

PubMed, May 24, 2025
We introduce MATI (Microstructural Analysis Toolbox for Imaging), a versatile MATLAB-based toolbox that combines simulation and data-fitting capabilities for microstructural diffusion MRI (dMRI) research. MATI provides a user-friendly graphical user interface that enables researchers, including those without much programming experience, to perform advanced simulations and data analyses for microstructural MRI research. For simulation, MATI supports arbitrary microstructural tissues and pulse sequences. For data fitting, MATI supports a range of methods, including traditional non-linear least squares, Bayesian approaches, machine learning, and dictionary matching, allowing users to tailor analyses to specific research needs. Optimized with vectorized matrix operations and high-performance numerical libraries, MATI achieves high computational efficiency, enabling rapid simulations and data fitting on CPU and GPU hardware. While designed for microstructural dMRI, MATI's generalized framework can be extended to other imaging methods, making it a flexible and scalable tool for quantitative MRI research. MATI offers a significant step toward translating advanced microstructural MRI techniques into clinical applications.
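To make the data-fitting step concrete without reproducing MATI's MATLAB interface, here is a generic non-linear least-squares fit of the simplest diffusion signal model (a mono-exponential ADC decay) in Python; the model, b-values, and noise level are illustrative only.

```python
import numpy as np
from scipy.optimize import curve_fit

def adc_model(b, s0, adc):
    """Mono-exponential diffusion decay: S(b) = S0 * exp(-b * ADC)."""
    return s0 * np.exp(-b * adc)

b_values = np.array([0, 250, 500, 750, 1000], dtype=float)          # s/mm^2
signal = adc_model(b_values, 1.0, 1.0e-3) + np.random.normal(0, 0.01, b_values.size)

# Non-linear least-squares fit of S0 and ADC to the noisy signal.
popt, pcov = curve_fit(adc_model, b_values, signal, p0=[1.0, 1.0e-3])
s0_fit, adc_fit = popt
print(f"S0 = {s0_fit:.3f}, ADC = {adc_fit:.2e} mm^2/s")
```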

Cross-Fusion Adaptive Feature Enhancement Transformer: Efficient high-frequency integration and sparse attention enhancement for brain MRI super-resolution.

Yang Z, Xiao H, Wang X, Zhou F, Deng T, Liu S

PubMed, May 24, 2025
High-resolution magnetic resonance imaging (MRI) is essential for diagnosing and treating brain diseases. Transformer-based approaches demonstrate strong potential in MRI super-resolution by capturing long-range dependencies effectively. However, existing Transformer-based super-resolution methods face several challenges: (1) they primarily focus on low-frequency information, neglecting the utilization of high-frequency information; (2) they lack effective mechanisms to integrate both low-frequency and high-frequency information; (3) they struggle to effectively eliminate redundant information during the reconstruction process. To address these issues, we propose the Cross-fusion Adaptive Feature Enhancement Transformer (CAFET). Our model maximizes the potential of both CNNs and Transformers. It consists of four key blocks: a high-frequency enhancement block for extracting high-frequency information; a hybrid attention block for capturing global information and local fitting, which includes channel attention and shifted rectangular window attention; a large-window fusion attention block for integrating local high-frequency features and global low-frequency features; and an adaptive sparse overlapping attention block for dynamically retaining key information and enhancing the aggregation of cross-window features. Extensive experiments validate the effectiveness of the proposed method. On the BraTS and IXI datasets, with an upsampling factor of ×2, the proposed method achieves a maximum PSNR improvement of 2.4 dB and 1.3 dB compared to state-of-the-art methods, along with an SSIM improvement of up to 0.16% and 1.42%. Similarly, at an upsampling factor of ×4, the proposed method achieves a maximum PSNR improvement of 1.04 dB and 0.3 dB over the current leading methods, along with an SSIM improvement of up to 0.25% and 1.66%. Our method is capable of reconstructing high-quality super-resolution brain MRI images, demonstrating significant clinical potential.
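For reference, the PSNR and SSIM figures quoted above are typically computed per slice against the ground-truth high-resolution image, along these lines (random arrays stand in for the reconstructed and reference MRI slices):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

hr = np.random.rand(240, 240).astype(np.float32)                     # ground-truth HR slice
sr = np.clip(hr + np.random.normal(0, 0.02, hr.shape), 0, 1).astype(np.float32)  # super-resolved slice

psnr = peak_signal_noise_ratio(hr, sr, data_range=1.0)               # reported in dB
ssim = structural_similarity(hr, sr, data_range=1.0)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.4f}")
```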

Generalizable AI approach for detecting projection type and left-right reversal in chest X-rays.

Ohta Y, Katayama Y, Ichida T, Utsunomiya A, Ishida T

PubMed, May 23, 2025
The verification of chest X-ray images involves several checkpoints, including orientation and reversal. To address the challenges of manual verification, this study developed an artificial intelligence (AI)-based system using a deep convolutional neural network (DCNN) to automatically verify consistency between the imaging direction and the examination order. The system classified chest X-ray images into four categories: anteroposterior (AP), posteroanterior (PA), flipped AP, and flipped PA. To evaluate the impact of internal and external datasets on classification accuracy, the DCNN was trained on multiple publicly available chest X-ray datasets and tested on both internal and external data. The results demonstrated that the DCNN accurately classified imaging directions and detected image reversal; however, the classification accuracy was strongly influenced by the training dataset. When trained exclusively on NIH data, the network achieved 98.9% accuracy on that dataset but dropped to 87.8% when evaluated on PADChest data. When trained on a mixed dataset, accuracy improved to 96.4%, but decreased to 76.0% when tested on the external COVID-CXNet dataset. Using Grad-CAM, we visualized the decision-making process of the network, highlighting influential regions such as the cardiac silhouette and arm positioning, depending on the imaging direction. This study demonstrated the potential of AI to automate verification of imaging direction and positioning in chest X-rays, although the network must be fine-tuned to local data characteristics to achieve optimal performance.
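A minimal sketch of the four-class setup described above, not the authors' network: an ImageNet-pretrained backbone re-headed for the AP / PA / flipped-AP / flipped-PA classes. The backbone choice (ResNet-50) and input size are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

CLASSES = ["AP", "PA", "flipped_AP", "flipped_PA"]

# Pretrained backbone with a new 4-class head for projection / reversal detection.
net = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
net.fc = nn.Linear(net.fc.in_features, len(CLASSES))

# Example inference on a batch of preprocessed chest radiographs (N, 3, 224, 224).
net.eval()
with torch.no_grad():
    pred = net(torch.randn(2, 3, 224, 224)).argmax(dim=1)
print([CLASSES[i] for i in pred])
```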

Construction of a Prediction Model for Adverse Perinatal Outcomes in Foetal Growth Restriction Based on a Machine Learning Algorithm: A Retrospective Study.

Meng X, Wang L, Wu M, Zhang N, Li X, Wu Q

PubMed, May 23, 2025
We aimed to create and validate a machine learning (ML)-based model for predicting adverse perinatal outcomes (APO) in foetal growth restriction (FGR) at diagnosis. This was a retrospective, multi-centre study in China of pregnancies affected by FGR. We enrolled singleton foetuses with a perinatal diagnosis of FGR admitted between January 2021 and November 2023. A total of 361 pregnancies from Beijing Obstetrics and Gynecology Hospital were used as the training set and the internal test set, and data from 50 pregnancies from Haidian Maternal and Child Health Hospital were used as the external test set. Feature screening was performed using random forest (RF), the Least Absolute Shrinkage and Selection Operator (LASSO) and logistic regression (LR). Subsequently, six ML methods, including Stacking, were used to construct models predicting the APO of FGR. Model performance was evaluated through indicators such as the area under the receiver operating characteristic curve (AUROC). Shapley Additive Explanation analysis was used to rank model features and explain the final model. Mean ± SD gestational age at diagnosis was 32.3 ± 4.8 weeks in the absent-APO group and 27.3 ± 3.7 weeks in the present-APO group. Women in the present-APO group had a higher rate of pregnancy-related hypertension (74.8% vs. 18.8%, p < 0.001). Among 17 candidate predictors (including maternal characteristics, maternal comorbidities, obstetric characteristics and ultrasound parameters), the integration of RF, LASSO and LR identified maternal body mass index, hypertension, gestational age at diagnosis of FGR, estimated foetal weight (EFW) z score, EFW growth velocity and abnormal umbilical artery Doppler (defined as a pulsatility index above the 95th percentile or absent/reversed diastolic flow) as significant predictors. The Stacking model demonstrated good performance in both the internal test set [AUROC: 0.861, 95% confidence interval (CI), 0.838-0.896] and the external test set [AUROC: 0.906, 95% CI, 0.875-0.947]. The calibration curve showed high agreement between predicted and observed risks, and the Hosmer-Lemeshow test gave p = 0.387 for the internal and p = 0.825 for the external test set. The ML algorithm, which integrates maternal clinical factors and ultrasound parameters, demonstrates good predictive value for APO in FGR at diagnosis, suggesting that ML techniques may be a valid approach for early identification of FGR pregnancies at high risk of APO.
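A hedged sketch of the stacking step with scikit-learn, under assumptions not taken from the paper: synthetic data stand in for the training pregnancies, and the base learners and hyperparameters are placeholders for the six models the authors compared.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for predictors such as BMI, hypertension, GA at diagnosis,
# EFW z-score, EFW growth velocity, and abnormal UA Doppler.
X = np.random.rand(361, 6)
y = np.random.randint(0, 2, 361)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_tr, y_tr)
auroc = roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1])
print(f"internal-test AUROC: {auroc:.3f}")
```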

AMVLM: Alignment-Multiplicity Aware Vision-Language Model for Semi-Supervised Medical Image Segmentation.

Pan Q, Li Z, Qiao W, Lou J, Yang Q, Yang G, Ji B

PubMed, May 23, 2025
Low-quality pseudo labels pose a significant obstacle in semi-supervised medical image segmentation (SSMIS), impeding consistency learning on unlabeled data. Leveraging a vision-language model (VLM) holds promise for improving pseudo-label quality by employing textual prompts to delineate segmentation regions, but it faces the challenge of cross-modal alignment uncertainty due to multiple correspondences (multiple images/texts tend to correspond to one text/image). Existing VLMs address this challenge by modeling semantics as distributions, but such distributions lead to semantic degradation. To address these problems, we propose the Alignment-Multiplicity Aware Vision-Language Model (AMVLM), a new VLM pre-training paradigm with two novel similarity metric strategies. (i) Cross-modal Similarity Supervision (CSS) uses a probability distribution transformer to supervise similarity scores across fine-granularity semantics by measuring cross-modal distribution disparities, thus learning cross-modal multiple alignments. (ii) Intra-modal Contrastive Learning (ICL) takes into account the similarity metric of coarse-fine granularity information within each modality to encourage cross-modal semantic consistency. Furthermore, using the pretrained AMVLM, we propose a pioneering text-guided SSMIS network to compensate for the quality deficiencies of pseudo labels. This network incorporates a text mask generator to produce multimodal supervision information, enhancing pseudo-label quality and the model's consistency learning. Extensive experimentation validates the efficacy of our AMVLM-driven SSMIS, showing superior performance across four publicly available datasets. The code will be available at: https://github.com/QingtaoPan/AMVLM.
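For context, the generic symmetric image-text contrastive (InfoNCE) objective that VLM pre-training typically builds on is sketched below; AMVLM's CSS and ICL losses refine this idea with distribution-based similarity supervision and are not reproduced here.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, tau: float = 0.07):
    """Symmetric image-text contrastive loss on a batch of paired embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / tau              # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0))         # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

loss = clip_style_loss(torch.randn(16, 256), torch.randn(16, 256))
print(loss.item())
```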

Improvement of deep learning-based dose conversion accuracy to a Monte Carlo algorithm in proton beam therapy for head and neck cancers.

Kato R, Kadoya N, Kato T, Tozuka R, Ogawa S, Murakami M, Jingu K

PubMed, May 23, 2025
This study aimed to clarify the effectiveness of the image-rotation technique and zooming augmentation for improving the accuracy of deep learning (DL)-based dose conversion from pencil beam (PB) to Monte Carlo (MC) algorithms in proton beam therapy (PBT). We used data from 85 patients with head and neck cancers. The dataset was randomly divided into 101 plans (334 beams) for training/validation and 11 plans (34 beams) for testing. We trained a DL model that takes a computed tomography (CT) image and the PB dose of a single proton field as input and outputs the MC dose, applying the image-rotation technique and zooming augmentation, and evaluated the dose conversion accuracy in a single proton field. The average γ-passing rates (3%/3 mm criterion) were 80.6 ± 6.6% for the PB dose, 87.6 ± 6.0% for the baseline model, 92.1 ± 4.7% for the image-rotation model, and 93.0 ± 5.2% for the data-augmentation model. The average R90 range differences were -1.5 ± 3.6% for the PB dose, 0.2 ± 2.3% for the baseline model, -0.5 ± 1.2% for the image-rotation model, and -0.5 ± 1.1% for the data-augmentation model. Both dose and range accuracy were improved by the image-rotation technique and zooming augmentation, which greatly improved the DL-based dose conversion from PB to MC. These techniques can be powerful tools for improving DL-based dose calculation accuracy in PBT.
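A small sketch of the two augmentations named above, applied identically to the CT, the PB-dose input, and the MC-dose target of one field. The angle and zoom ranges are illustrative assumptions, not the paper's settings, and in practice the volumes would be cropped or padded back to the original grid after zooming.

```python
import numpy as np
from scipy import ndimage

def augment(ct, pb_dose, mc_dose, rng):
    """Apply the same random in-plane rotation and isotropic zoom to all three volumes."""
    angle = rng.uniform(-15, 15)                  # image-rotation technique (degrees)
    zoom = rng.uniform(0.9, 1.1)                  # zooming augmentation (scale factor)
    out = []
    for vol in (ct, pb_dose, mc_dose):
        v = ndimage.rotate(vol, angle, axes=(0, 1), reshape=False, order=1)
        v = ndimage.zoom(v, zoom, order=1)        # changes shape; crop/pad afterwards in practice
        out.append(v)
    return out

rng = np.random.default_rng(0)
ct, pb, mc = (np.random.rand(64, 64, 32) for _ in range(3))
ct_a, pb_a, mc_a = augment(ct, pb, mc, rng)
print(ct_a.shape)
```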

A deep learning model integrating domain-specific features for enhanced glaucoma diagnosis.

Xu J, Jing E, Chai Y

PubMed, May 23, 2025
Glaucoma is a group of serious eye diseases that can cause incurable blindness. Despite the critical need for early detection, over 60% of cases remain undiagnosed, especially in less developed regions. Glaucoma diagnosis is costly, and models have been proposed to automate it from retinal images, specifically of the optic cup and the surrounding disc, where retinal blood vessels and nerves enter and leave the eye. However, diagnosis is complicated because both normal and glaucoma-affected eyes can vary greatly in appearance; some normal eyes, like glaucomatous ones, exhibit a large cup-to-disc ratio, one of the main diagnostic criteria, making the two difficult to distinguish. We propose a deep learning model with domain features (DLMDF) that combines unstructured and structured features to distinguish glaucoma from physiologic large cups. The structured features were based on the known cup-to-disc ratios of the four quadrants of the optic disc in normal eyes, physiologic large cups, and glaucomatous cups. We segmented each cup and disc using a fully convolutional neural network and then calculated the cup size, disc size, and cup-to-disc ratio of each quadrant; the unstructured features were learned by a deep convolutional neural network. The average precision (AP) was 98.52% for disc segmentation and 98.57% for cup segmentation. These relatively high AP values allowed us to compute 15 reliable features from each segmented disc and cup. In classification, the DLMDF outperformed other models, achieving superior accuracy, precision, and recall. These results validate the effectiveness of combining deep learning-derived features with domain-specific structured features, underscoring the potential of this approach to advance glaucoma diagnosis.
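A schematic of the fusion idea only, not the paper's exact architecture: pooled CNN features are concatenated with the 15 structured features (per-quadrant cup size, disc size, and cup-to-disc ratio) before the classification head. The backbone (ResNet-18) and layer sizes are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

class FundusFusionNet(nn.Module):
    def __init__(self, n_structured: int = 15, n_classes: int = 2):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # global-pooled image features
        self.head = nn.Sequential(
            nn.Linear(512 + n_structured, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, image, structured):
        deep = self.cnn(image).flatten(1)                           # (N, 512) unstructured features
        return self.head(torch.cat([deep, structured], dim=1))      # fuse with structured features

net = FundusFusionNet()
logits = net(torch.randn(2, 3, 224, 224), torch.randn(2, 15))
print(logits.shape)  # torch.Size([2, 2])
```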