Page 36 of 59587 results

Classification based deep learning models for lung cancer and disease using medical images

Ahmad Chaddad, Jihao Peng, Yihang Wu

arXiv preprint, Jul 2 2025
The use of deep learning (DL) in medical image analysis has significantly improved the ability to predict lung cancer. In this study, we introduce a novel deep convolutional neural network (CNN) model, named ResNet+, which is based on the established ResNet framework. This model is specifically designed to improve the prediction of lung cancer and lung diseases from medical images. To address the loss of feature information that occurs during downsampling in CNNs, we integrate the ResNet-D module, a variant designed to enhance feature extraction by modifying the downsampling layers, into the traditional ResNet model. Furthermore, a convolutional attention module is incorporated into the bottleneck layers to enhance model generalization by allowing the network to focus on relevant regions of the input images. We evaluated the proposed model on five public datasets, comprising lung cancer (LC2500 n=3183, IQ-OTH/NCCD n=1336, and LCC n=25000 images) and lung disease (ChestXray n=5856 and COVIDx-CT n=425024 images). To address class imbalance, we used data augmentation to increase the representation of underrepresented classes in the training dataset. The experimental results show that the ResNet+ model achieved accuracy/F1 of 98.14/98.14% on the LC25000 dataset and 99.25/99.13% on the IQ-OTH/NCCD dataset. Furthermore, the ResNet+ model reduced computational cost compared to the original ResNet series in predicting lung cancer images. The proposed model outperformed the baseline models on publicly available datasets. Our code is publicly available at https://github.com/AIPMLab/Graduation-2024/tree/main/Peng.

Calibrated Self-supervised Vision Transformers Improve Intracranial Arterial Calcification Segmentation from Clinical CT Head Scans

Benjamin Jin, Grant Mair, Joanna M. Wardlaw, Maria del C. Valdés Hernández

arXiv preprint, Jul 2 2025
Vision Transformers (ViTs) have gained significant popularity in the natural image domain but have been less successful in 3D medical image segmentation. Nevertheless, 3D ViTs are particularly interesting for large medical imaging volumes due to their efficient self-supervised training within the masked autoencoder (MAE) framework, which enables the use of imaging data without the need for expensive manual annotations. Intracranial arterial calcification (IAC) is an imaging biomarker visible on routinely acquired CT scans linked to neurovascular diseases such as stroke and dementia, and automated IAC quantification could enable their large-scale risk assessment. We pre-train ViTs with MAE and fine-tune them for IAC segmentation for the first time. To develop our models, we use highly heterogeneous data from a large clinical trial, the third International Stroke Trial (IST-3). We evaluate key aspects of MAE pre-trained ViTs in IAC segmentation, and analyse the clinical implications. We show: 1) our calibrated self-supervised ViT beats a strong supervised nnU-Net baseline by 3.2 Dice points, 2) low patch sizes are crucial for ViTs for IAC segmentation and interpolation upsampling with regular convolutions is preferable to transposed convolutions for ViT-based models, and 3) our ViTs increase robustness to higher slice thicknesses and improve risk group classification in a clinical scenario by 46%. Our code is available online.
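
For readers unfamiliar with the metric, the Dice score behind the reported 3.2-point gain measures the overlap between a predicted mask and a reference mask on a 0-to-1 scale (a "Dice point" is usually read as one hundredth of that scale). The following is a minimal pure-Python sketch on flattened binary masks; the function name `dice_score` and the toy masks are hypothetical and are not taken from the authors' code.

```python
def dice_score(pred, target):
    """Dice coefficient between two equal-length binary masks (flattened)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled 1 in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example (hypothetical values): 2 overlapping voxels, 3 positives each.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(f"Dice = {dice_score(pred, target):.3f}")
```

Real evaluations typically compute this per scan on 3D arrays and then average; the flattened-list form here only shows the arithmetic.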

A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs

Niccolò McConnell, Pardeep Vasudev, Daisuke Yamada, Daryl Cheng, Mehran Azimbagirad, John McCabe, Shahab Aslani, Ahmed H. Shahin, Yukun Zhou, The SUMMIT Consortium, Andre Altmann, Yipeng Hu, Paul Taylor, Sam M. Janes, Daniel C. Alexander, Joseph Jacob

arXiv preprint, Jul 2 2025
Low-dose computed tomography (LDCT) imaging employed in lung cancer screening (LCS) programs is increasing in uptake worldwide. LCS programs herald a generational opportunity to simultaneously detect cancer and non-cancer-related early-stage lung disease. Yet these efforts are hampered by a shortage of radiologists to interpret scans at scale. Here, we present TANGERINE, a computationally frugal, open-source vision foundation model for volumetric LDCT analysis. Designed for broad accessibility and rapid adaptation, TANGERINE can be fine-tuned off the shelf for a wide range of disease-specific tasks with limited computational resources and training data. Relative to models trained from scratch, TANGERINE demonstrates fast convergence during fine-tuning, thereby requiring significantly fewer GPU hours, and displays strong label efficiency, achieving comparable or superior performance with a fraction of fine-tuning data. Pretrained using self-supervised learning on over 98,000 thoracic LDCTs, including the UK's largest LCS initiative to date and 27 public datasets, TANGERINE achieves state-of-the-art performance across 14 disease classification tasks, including lung cancer and multiple respiratory diseases, while generalising robustly across diverse clinical centres. By extending a masked autoencoder framework to 3D imaging, TANGERINE offers a scalable solution for LDCT analysis, departing from recent closed, resource-intensive models by combining architectural simplicity, public availability, and modest computational requirements. Its accessible, open-source lightweight design lays the foundation for rapid integration into next-generation medical imaging tools that could transform LCS initiatives, allowing them to pivot from a singular focus on lung cancer detection to comprehensive respiratory disease management in high-risk populations.

Enhanced security for medical images using a new 5D hyper chaotic map and deep learning based segmentation.

Subathra S, Thanikaiselvan V

PubMed paper, Jul 2 2025
Medical image encryption is important for maintaining the confidentiality of sensitive medical data and protecting patient privacy. Contemporary healthcare systems store significant patient data in text and graphic form. This research proposes a new 5D hyperchaotic system combined with a customised U-Net architecture. Chaotic maps have become an increasingly popular method for encryption because of their remarkable characteristics, including statistical randomness and sensitivity to initial conditions. The significant region is segmented from the medical images using the U-Net network, and its statistics are utilised as initial conditions to generate the new random sequence. First, zig-zag scrambling confuses the pixel positions of a medical image, and a further permutation is applied with a new 5D hyperchaotic sequence. Two stages of diffusion, dynamic DNA flip and dynamic DNA XOR, are then used to enhance the encryption algorithm's security against various attacks. The randomness of the new 5D hyperchaotic system is verified using the NIST SP800-22 statistical test, by calculating the Lyapunov exponent, and by plotting the attractor diagram of the chaotic sequence. The algorithm is validated with statistical measures such as PSNR, MSE, NPCR, UACI, entropy, and Chi-square values. Evaluation on test images yields average horizontal, vertical, and diagonal correlation coefficients of -0.0018, -0.0002, and 0.0007, respectively, a Shannon entropy of 7.9971, a Kolmogorov entropy of 2.9469, an NPCR of 99.61%, a UACI of 33.49%, a Chi-square "PASS" at both the 5% (293.2478) and 1% (310.4574) significance levels, a key space of 2^500, and an average encryption time of approximately 2.93 s per 256 × 256 image on a standard desktop CPU. Comparisons with various encryption methods demonstrate that the proposed method provides reliable security against various challenges.
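
NPCR and UACI, two of the measures listed above, compare a pair of cipher images (typically produced from plain images that differ in a single pixel). The sketch below uses the commonly cited definitions for 8-bit images and assumes these are the ones the authors apply; the function names and the four-pixel toy data are hypothetical.

```python
def npcr(cipher_a, cipher_b):
    """Number of Pixels Change Rate: percentage of positions that differ."""
    if len(cipher_a) != len(cipher_b) or not cipher_a:
        raise ValueError("images must be non-empty and the same size")
    changed = sum(1 for a, b in zip(cipher_a, cipher_b) if a != b)
    return 100.0 * changed / len(cipher_a)


def uaci(cipher_a, cipher_b, max_value=255):
    """Unified Average Changing Intensity: mean absolute difference, in percent."""
    if len(cipher_a) != len(cipher_b) or not cipher_a:
        raise ValueError("images must be non-empty and the same size")
    total_diff = sum(abs(a - b) for a, b in zip(cipher_a, cipher_b))
    return 100.0 * total_diff / (max_value * len(cipher_a))


# Toy flattened 8-bit "images" (hypothetical values, far too small to be meaningful).
c1 = [0, 10, 200, 255]
c2 = [0, 20, 100, 0]
print(f"NPCR = {npcr(c1, c2):.2f}%  UACI = {uaci(c1, c2):.2f}%")
```

For two statistically independent, uniformly random 8-bit images the expected values are about 99.61% (NPCR) and 33.46% (UACI), which is the yardstick the reported 99.61% and 33.49% are usually read against.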

Multi-scheme cross-level attention embedded U-shape transformer for MRI semantic segmentation.

Wang Q, Xue Y

PubMed paper, Jul 2 2025
Accurate MRI image segmentation is crucial for disease diagnosis, but current Transformer-based methods face two key challenges: limited capability to capture detailed information, leading to blurred boundaries and false localization, and the lack of MRI-specific embedding paradigms for attention modules, which limits their potential and representation capability. To address these challenges, this paper proposes a multi-scheme cross-level attention embedded U-shape Transformer (MSCL-SwinUNet). This model integrates cross-level spatial-wise attention (SW-Attention) to transfer detailed information from encoder to decoder, cross-stage channel-wise attention (CW-Attention) to filter out redundant features and enhance task-related channels, and multi-stage scale-wise attention (ScaleW-Attention) to adaptively process multi-scale features. Extensive experiments on the ACDC, MM-WHS and Synapse datasets demonstrate that the proposed MSCL-SwinUNet surpasses state-of-the-art methods in accuracy and generalizability. Visualization further confirms the superiority of our model in preserving detailed boundaries. This work not only advances Transformer-based segmentation in medical imaging but also provides new insights into designing MRI-specific attention embedding paradigms. Our code is available at https://github.com/waylans/MSCL-SwinUNet.

A deep learning-based computed tomography reading system for the diagnosis of lung cancer associated with cystic airspaces.

Hu Z, Zhang X, Yang J, Zhang B, Chen H, Shen W, Li H, Zhou Y, Zhang J, Qiu K, Xie Z, Xu G, Tan J, Pang C

PubMed paper, Jul 2 2025
We propose a deep learning model and explore its performance in the auxiliary diagnosis of lung cancer associated with cystic airspaces (LCCA) in computed tomography (CT) images. This study is a retrospective analysis that incorporated a total of 342 CT series, comprising 272 series from patients diagnosed with LCCA and 70 series from patients with pulmonary bulla. A deep learning model named LungSSFNet, developed based on nnUnet, was utilized for image recognition and segmentation by experienced thoracic surgeons. The dataset was divided into a training set (245 series), a validation set (62 series), and a test set (35 series). The performance of LungSSFNet was compared with other models such as UNet, M2Snet, TANet, MADGNet, and nnUnet to evaluate its effectiveness in recognizing and segmenting LCCA and pulmonary bulla. LungSSFNet achieved an intersection over union of 81.05% and a Dice similarity coefficient of 75.15% for LCCA, and 93.03% and 92.04% for pulmonary bulla, respectively. These outcomes demonstrate that LungSSFNet outperformed many existing models in segmentation tasks. Additionally, it attained an accuracy of 96.77%, a precision of 100%, and a sensitivity of 96.15%. LungSSFNet, a new deep-learning model, substantially improved the diagnosis of early-stage LCCA and is potentially valuable for auxiliary clinical decision-making. Our LungSSFNet code is available at https://github.com/zx0412/LungSSFNet.
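
The accuracy, precision, and sensitivity figures follow from a confusion matrix, which the abstract does not report. As a worked illustration only, the sketch below uses hypothetical counts (TP=25, FP=0, FN=1, TN=5) chosen because they happen to reproduce the reported percentages; they are an assumption, not the study's actual results, and the function name is likewise illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, sensitivity) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one case is required")
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, sensitivity


# Hypothetical counts, NOT taken from the paper.
acc, prec, sens = classification_metrics(tp=25, fp=0, fn=1, tn=5)
print(f"accuracy={acc:.2%} precision={prec:.2%} sensitivity={sens:.2%}")
```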

A novel few-shot learning framework for supervised diffeomorphic image registration network.

Chen K, Han H, Wei J, Zhang Y

PubMed paper, Jul 2 2025
Image registration is a key technique in image processing and analysis. Due to their high complexity, traditional registration frameworks often fail to meet real-time demands in practice. To address this, several deep learning networks for registration have been proposed, including supervised and unsupervised networks. Unsupervised networks rely on large amounts of training data to minimize specific loss functions, but the lack of physical information constraints results in lower accuracy compared with supervised networks. However, supervised networks in medical image registration face two major challenges: physical mesh folding and the scarcity of labeled training data. To address these two challenges, we propose a novel few-shot learning framework for image registration. The framework contains two parts: a random diffeomorphism generator (RDG) and a supervised few-shot learning network for image registration. By randomly generating a complex vector field, the RDG produces a series of diffeomorphisms. With the help of the diffeomorphisms generated by the RDG, one can use only a few images (theoretically, one image is enough) to generate a series of labels for training the supervised few-shot learning network. To eliminate physical mesh folding, the loss function in the proposed network is only required to ensure the smoothness of the deformation (no other control for mesh folding elimination is necessary). The experimental results indicate that the proposed method demonstrates superior performance in eliminating physical mesh folding compared to other existing learning-based methods. Our code is available at https://github.com/weijunping111/RDG-TMI.git.
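
Mesh folding means the deformation stops being one-to-one, that is, its Jacobian determinant is no longer positive everywhere. The one-dimensional sketch below is only an analogy and not the authors' RDG: it assumes a hypothetical warp phi(x) = x + a·sin(2πx)/(2π) on [0, 1], whose derivative 1 + a·cos(2πx) stays positive whenever |a| < 1, and checks sampled grids for folding.

```python
import math
import random


def warp(x, amplitude):
    """Hypothetical 1D deformation phi(x) = x + amplitude * sin(2*pi*x) / (2*pi)."""
    return x + amplitude * math.sin(2 * math.pi * x) / (2 * math.pi)


def has_folding(amplitude, samples=1001):
    """True if the warped grid on [0, 1] is not strictly increasing."""
    grid = [i / (samples - 1) for i in range(samples)]
    warped = [warp(x, amplitude) for x in grid]
    # In 1D, folding shows up as a grid point landing at or before its left neighbour.
    return any(right <= left for left, right in zip(warped, warped[1:]))


# Randomly drawn amplitudes with |a| < 1 give fold-free (diffeomorphic) warps.
rng = random.Random(0)
amplitudes = [rng.uniform(-0.9, 0.9) for _ in range(5)]
print([has_folding(a) for a in amplitudes])

# An amplitude above 1 makes the derivative negative near x = 0.5, so the grid folds.
print(has_folding(1.5))
```

In 2D or 3D the same check would be done on the Jacobian determinant of the displacement field; the 1D monotonicity test is used here only to keep the example dependency-free.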

3D MedDiffusion: A 3D Medical Latent Diffusion Model for Controllable and High-quality Medical Image Generation.

Wang H, Liu Z, Sun K, Wang X, Shen D, Cui Z

PubMed paper, Jul 2 2025
The generation of medical images presents significant challenges due to their high resolution and three-dimensional nature. Existing methods often yield suboptimal performance in generating high-quality 3D medical images, and there is currently no universal generative framework for medical imaging. In this paper, we introduce a 3D Medical Latent Diffusion (3D MedDiffusion) model for controllable, high-quality 3D medical image generation. 3D MedDiffusion incorporates a novel, highly efficient Patch-Volume Autoencoder that compresses medical images into latent space through patch-wise encoding and recovers them into image space through volume-wise decoding. Additionally, we design a new noise estimator to capture both local details and global structural information during the diffusion denoising process. 3D MedDiffusion can generate fine-detailed, high-resolution images (up to 512x512x512) and effectively adapt to various downstream tasks as it is trained on large-scale datasets covering CT and MRI modalities and different anatomical regions (from head to leg). Experimental results demonstrate that 3D MedDiffusion surpasses state-of-the-art methods in generative quality and exhibits strong generalizability across tasks such as sparse-view CT reconstruction, fast MRI reconstruction, and data augmentation for segmentation and classification. Source code and checkpoints are available at https://github.com/ShanghaiTech-IMPACT/3D-MedDiffusion.

Mamba-based deformable medical image registration with an annotated brain MR-CT dataset.

Wang Y, Guo T, Yuan W, Shu S, Meng C, Bai X

PubMed paper, Jul 1 2025
Deformable registration is essential in medical image analysis, especially for handling various multi- and mono-modal registration tasks in neuroimaging. Existing studies lack exploration of brain MR-CT registration, and learning-based methods face challenges in improving both accuracy and efficiency. To broaden the practice of multi-modal registration in the brain, we present SR-Reg, a new benchmark dataset comprising 180 volumetric paired MR-CT images and annotated anatomical regions. Building on this foundation, we introduce MambaMorph, a novel deformable registration network based on an efficient state space model, Mamba, for global feature learning, with a fine-grained feature extractor for low-level embedding. Experimental results demonstrate that MambaMorph surpasses advanced ConvNet-based and Transformer-based networks across several multi- and mono-modal tasks, with notable gains in efficacy and efficiency. Code and dataset are available at https://github.com/mileswyn/MambaMorph.

MedGround-R1: Advancing Medical Image Grounding via Spatial-Semantic Rewarded Group Relative Policy Optimization

Huihui Xu, Yuanpeng Nie, Hualiang Wang, Ying Chen, Wei Li, Junzhi Ning, Lihao Liu, Hongqiu Wang, Lei Zhu, Jiyao Liu, Xiaomeng Li, Junjun He

arXiv preprint, Jul 1 2025
Medical Image Grounding (MIG), which involves localizing specific regions in medical images based on textual descriptions, requires models to not only perceive regions but also deduce spatial relationships of these regions. Existing Vision-Language Models (VLMs) for MIG often rely on Supervised Fine-Tuning (SFT) with large amounts of Chain-of-Thought (CoT) reasoning annotations, which are expensive and time-consuming to acquire. Recently, DeepSeek-R1 demonstrated that Large Language Models (LLMs) can acquire reasoning abilities through Group Relative Policy Optimization (GRPO) without requiring CoT annotations. In this paper, we adapt the GRPO reinforcement learning framework to VLMs for Medical Image Grounding. We propose the Spatial-Semantic Rewarded Group Relative Policy Optimization to train the model without CoT reasoning annotations. Specifically, we introduce Spatial-Semantic Rewards, which combine spatial accuracy reward and semantic consistency reward to provide nuanced feedback for both spatially positive and negative completions. Additionally, we propose to use the Chain-of-Box template, which integrates visual information of referring bounding boxes into the <think> reasoning process, enabling the model to explicitly reason about spatial regions during intermediate steps. Experiments on three datasets MS-CXR, ChestX-ray8, and M3D-RefSeg demonstrate that our method achieves state-of-the-art performance in Medical Image Grounding. Ablation studies further validate the effectiveness of each component in our approach. Code, checkpoints, and datasets are available at https://github.com/bio-mlhui/MedGround-R1
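
The abstract does not spell out how the spatial accuracy reward is computed. A common overlap measure for grounding tasks is the intersection over union (IoU) between a predicted and a reference bounding box, and the sketch below assumes that stand-in purely for illustration; the function name and the `(x_min, y_min, x_max, y_max)` box format are hypothetical and not taken from the MedGround-R1 code.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


# Toy boxes (hypothetical coordinates): overlap area 1, union area 7.
predicted = (0, 0, 2, 2)
reference = (1, 1, 3, 3)
print(f"IoU = {box_iou(predicted, reference):.3f}")
```

A reward built on such a score would be continuous in [0, 1], which is one way to give graded feedback to both near-miss and far-off predictions.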