
A hybrid deep learning approach integrating CNN and transformer for lung cancer classification using CT scans.

March 17, 2026

Authors

Yousafzai SN, Nasir IM, Mansour S, Negm N, Alhashmi AA, Alharbi MA, Kim E

Affiliations (8)

  • Department of Computer Science, HITEC University, Taxila, 47080, Pakistan. [email protected].
  • Faculty of Informatics, Kaunas University of Technology, 51368, Kaunas, Lithuania.
  • Department of Radiological Sciences, College of Health and Rehabilitation Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia.
  • Department of Computer Science, College of Science & Art at Mahayil, King Khalid University, Abha, Saudi Arabia.
  • Department of Computer Science, College of Science, Northern Border University, Arar, Saudi Arabia.
  • Department of Information Science, College of Humanities and Social Sciences, King Saud University, P.O. Box 28095, Riyadh, 11437, Saudi Arabia.
  • Department of Information Systems, Hanyang University, Seoul, 04763, Republic of Korea. [email protected].
  • Department of Artificial Intelligence, Hanyang University, Seoul, 04763, Republic of Korea. [email protected].

Abstract

Lung cancer is among the most lethal cancers, reported to cause the deaths of almost 7.6 million individuals annually worldwide, so timely diagnosis is crucial for improving the likelihood of survival. For tumor identification, CT scans are commonly used to locate affected areas. However, CT imaging faces significant problems such as poor visibility of tumor locations and high false-negative rates, and the small size of medical imaging datasets makes it challenging to capture local lesion features through iterative training when all input features are weighted equally. This work proposes C-Swin, a deep learning model that integrates a Convolutional Neural Network (CNN) with an improved Swin Transformer to extract and fuse fine-grained local and global features. C-Swin comprises a Transformer encoder and a CNN module: the CNN module extracts local features, whereas the Transformer module captures global features. The Transformer encoder uses a hybrid shifted-window attention method to focus on a spatial region of the CT image, reducing background semantic information and improving the accuracy of local feature capture. The proposed method is validated on the publicly accessible IQ-OTH/NCCD Kaggle dataset with three classes. The proposed C-Swin model achieved an average accuracy of 96.26%, precision of 97.48%, recall of 96.39%, and F1-score of 97.42%. These numerical findings demonstrate that the proposed method surpasses various existing methods, with accuracy gains ranging from 2.31% to 6.81%. The C-Swin model is capable of extracting detailed local lesion features, resulting in improved classification performance.
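The shifted-window attention the abstract describes restricts self-attention to small non-overlapping windows of the feature map, then cyclically shifts the map so that alternate layers attend across the previous window boundaries. As a minimal NumPy sketch of that mechanism (not the authors' code; the window size, identity Q/K/V projections, and toy feature map are illustrative assumptions):

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into (num_windows, ws*ws, C) tokens."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(windows):
    """Scaled dot-product self-attention within each window
    (identity Q/K/V projections for brevity)."""
    d = windows.shape[-1]
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores) @ windows

def shifted_window_attention(x, ws):
    """Cyclically shift the map by ws//2 so windows straddle the
    previous partition boundaries, attend, then shift back."""
    H, W, C = x.shape
    shifted = np.roll(x, shift=(-(ws // 2), -(ws // 2)), axis=(0, 1))
    out = window_attention(window_partition(shifted, ws))
    out = (out.reshape(H // ws, W // ws, ws, ws, C)
              .transpose(0, 2, 1, 3, 4)
              .reshape(H, W, C))
    return np.roll(out, shift=(ws // 2, ws // 2), axis=(0, 1))

feat = np.random.rand(8, 8, 16)        # toy 8x8 feature map, 16 channels
out = shifted_window_attention(feat, ws=4)
print(out.shape)                        # (8, 8, 16)
```

Because attention is confined to ws*ws tokens per window, cost grows linearly with image area rather than quadratically, which is what lets the Transformer branch stay tractable on CT-sized inputs while the CNN branch supplies the local lesion features.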

Topics

Journal Article
