Tensor enhanced chest cancer classification via CNN and Vision Transformer models.
Authors
Affiliations (5)
- Software Engineering, Fatima Jinnah Women University, Rawalpindi, Punjab, Pakistan.
- Faculty of Computer and Information Systems, Islamic University of Madinah, Medina, Saudi Arabia.
- Department of Computer Science and Information, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia.
- Department of Computer Science and Information, Applied College, Taibah University, Madinah, Saudi Arabia.
- Energy, Industry, and Advanced Technologies Research Center, Taibah University, Madinah, Saudi Arabia.
Abstract
Lung diseases, particularly lung cancer, remain a leading cause of mortality worldwide, with lung cancer accounting for approximately 1.8 million deaths annually. Early and accurate diagnosis is critical for improving patient outcomes. This study introduces a unified platform for evaluating multiple convolutional neural network (CNN) architectures and comparing them with a Vision Transformer (ViT) model, using a common tensor-based preprocessing pipeline to classify lung cancer from CT and PET-CT images. To enhance model adaptability, all input images were converted into tensors before training, enabling implicit fine-tuning without altering the original architectures. The YOLOTransfer dataset, comprising diverse, annotated medical images, was used to benchmark model performance. Classical CNN models (AlexNet, VGG-16, ResNet-50, DenseNet, and EfficientNet) were compared against ViT in terms of accuracy, sensitivity, specificity, F1-score, and AUC-ROC. Among all models, ResNet-50 and EfficientNet achieved the highest accuracy, while the Vision Transformer showed competitive results in capturing complex global patterns. The findings highlight the complementary strengths of convolutional and transformer-based architectures for medical image analysis and demonstrate the feasibility of deep learning approaches for lung cancer detection.
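The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: it assumes a binary setting (1 = cancer, 0 = non-cancer), whereas the study may use more classes, and the function names `binary_metrics` and `auc_roc` are hypothetical. AUC-ROC is computed here through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def safe_div(num, den):
        # Return 0.0 rather than raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = safe_div(tp, tp + fn)   # recall on the positive class
    specificity = safe_div(tn, tn + fp)   # recall on the negative class
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    accuracy = safe_div(tp + tn, len(pairs))
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def auc_roc(y_true, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    labels = [1, 1, 1, 0, 0, 0, 0, 1]
    preds = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(labels, preds))
    print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Reporting sensitivity and specificity alongside accuracy matters in a screening context, because a model can reach high accuracy on an imbalanced dataset while still missing a large share of positive cases.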