COVID-19CT+: A public dataset of CT images for COVID-19 retrospective analysis.
Authors
Affiliations (8)
- Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, China.
- School of Computer Science and Engineering, Northeastern University, China.
- School of Computer Science and Engineering, University of New South Wales, Sydney, Australia.
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, China.
- Institute of Medical Informatics, University of Luebeck, Luebeck, Germany.
- Shengjing Hospital, China Medical University, Shenyang, China.
- Chengdu University of Traditional Chinese Medicine, Chengdu, China.
- Software College of Northeastern University, Northeastern University, China.
Abstract
Background and objective: COVID-19 is considered the biggest global health disaster of the 21st century, and it has had a huge impact on the world.
Methods: This paper presents a publicly available dataset of CT images covering multiple types of pneumonia (COVID-19CT+). Specifically, the dataset contains 409,619 CT images from 1333 patients, with Subset-A containing 312 community-acquired pneumonia cases and Subset-B containing 1021 COVID-19 cases. To demonstrate that classification methods from different periods perform differently on COVID-19CT+ images, we selected 13 classical machine learning classifiers and 5 deep learning classifiers and tested them on the image classification task.
Results: Two sets of experiments are conducted using traditional machine learning and deep learning methods. The first set classifies COVID-19 in Subset-B versus COVID-19 white lung disease, and the second classifies community-acquired pneumonia in Subset-A versus COVID-19 in Subset-B. Together they demonstrate that methods from different periods perform differently on COVID-19CT+. In the first set of experiments, the accuracy of traditional machine learning reaches a maximum of 97.3% and a minimum of only 62.6%, while deep learning reaches a maximum of 97.9% and a minimum of 85.7%. In the second set, traditional machine learning reaches a maximum accuracy of 94.6% and a minimum of 56.8%, while deep learning reaches a maximum of 91.9% and a minimum of 86.3%.
Conclusions: COVID-19CT+ covers a large number of CT images of patients with COVID-19 and community-acquired pneumonia and is one of the largest datasets available. We expect this dataset to attract more researchers to explore new automated diagnostic algorithms and thereby help improve the accuracy and efficiency of COVID-19 diagnosis.