Comparison of CNNs and Transformer Models in Diagnosing Bone Metastases in Bone Scans Using Grad-CAM.

Authors

Pak S, Son HJ, Kim D, Woo JY, Yang I, Hwang HS, Rim D, Choi MS, Lee SH

Affiliations (6)

  • Department of Medicine, Hallym University College of Medicine, Chuncheon, Gangwon, Republic of Korea.
  • Department of Nuclear Medicine, Dankook University Medical Center, Cheonan, Chungnam, Republic of Korea.
  • Department of Nuclear Medicine, Hallym University Sacred Heart Hospital, Hallym University College of Medicine, Anyang, Gyeonggi, Republic of Korea.
  • Department of Radiology, Hallym University Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul, Republic of Korea.
  • Rowan, Seoul, Republic of Korea.
  • PE Data Solution, SK hynix, Icheon, Gyeonggi, Republic of Korea.

Abstract

Convolutional neural networks (CNNs) have been studied for detecting bone metastases on bone scans; however, the application of ConvNeXt and transformer models has not yet been explored. This study aims to evaluate the performance of various deep learning models, including ConvNeXt and transformer models, in diagnosing metastatic lesions on bone scans. We retrospectively analyzed bone scans from patients with cancer obtained at 2 institutions: the training and validation sets (n=4626) were from Hospital 1 and the test set (n=1428) was from Hospital 2. The deep learning models evaluated were ResNet18, the Data-Efficient Image Transformer (DeiT), the Vision Transformer (ViT Large 16), the Swin Transformer (Swin Base), and ConvNeXt Large. Gradient-weighted class activation mapping (Grad-CAM) was used for visualization. On both the validation set and the test set, the ConvNeXt Large model (0.969 and 0.885, respectively) performed best, followed by the Swin Base model (0.965 and 0.840, respectively); both significantly outperformed ResNet18 (0.892 and 0.725, respectively). Subgroup analyses revealed that all the models achieved greater diagnostic accuracy for patients with polymetastasis than for those with oligometastasis. Grad-CAM visualization revealed that the ConvNeXt Large model focused more on local lesions, whereas the Swin Base model focused on global areas such as the axial skeleton and pelvis. Compared with traditional CNN and transformer models, the ConvNeXt model demonstrated superior diagnostic performance in detecting bone metastases on bone scans, especially in cases of polymetastasis, suggesting its potential in medical image analysis.
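
For readers unfamiliar with Grad-CAM, the core computation behind the heatmaps mentioned above can be sketched in a few lines. This is a minimal illustration of the general Grad-CAM recipe, not the authors' implementation: the function name grad_cam and the toy inputs are hypothetical, and the feature maps and gradients are assumed to have already been extracted from a chosen layer of a trained model. Each feature map is weighted by the spatial average of its gradient with respect to the class score, the weighted maps are summed, negative values are clipped, and the result is scaled to [0, 1].

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from one layer's outputs.

    activations: list of K feature maps, each an H x W nested list.
    gradients:   gradients of the class score with respect to those
                 feature maps, in the same K x H x W layout.
    Returns an H x W nested list scaled to [0, 1].
    """
    # Channel weights: average each gradient map over all spatial positions.
    weights = []
    for grad_map in gradients:
        values = [g for row in grad_map for g in row]
        weights.append(sum(values) / len(values))

    height = len(activations[0])
    width = len(activations[0][0])

    # Weighted sum of the feature maps across channels.
    cam = [[0.0] * width for _ in range(height)]
    for weight, feature_map in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * feature_map[i][j]

    # ReLU: keep only regions that support the target class.
    cam = [[max(0.0, value) for value in row] for row in cam]

    # Scale to [0, 1] so the map can be overlaid on the input image.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input (hypothetical values): two 2 x 2 feature maps.
toy_activations = [
    [[1.0, 0.0], [0.0, 0.0]],  # channel 0 fires at the top-left
    [[0.0, 0.0], [0.0, 1.0]],  # channel 1 fires at the bottom-right
]
toy_gradients = [
    [[1.0, 1.0], [1.0, 1.0]],      # channel 0 raises the class score
    [[-1.0, -1.0], [-1.0, -1.0]],  # channel 1 lowers it
]
heatmap = grad_cam(toy_activations, toy_gradients)
```

In this toy case only the top-left position stays highlighted, because the channel that fires at the bottom-right has a negative weight and is removed by the ReLU step. In practice the resulting map is upsampled to the scan resolution before being overlaid.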

Topics

  • Bone Neoplasms
  • Neural Networks, Computer
  • Image Processing, Computer-Assisted
  • Bone and Bones
  • Journal Article
  • Comparative Study
