Large-scale generative tumor synthesis in computed tomography images for improving tumor recognition.
Authors
Affiliations (13)
- Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China.
- Tencent AI Lab, Shenzhen, China.
- Department of Biomedical Informatics, Harvard University, Boston, USA.
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China.
- Department of Radiology, Shenzhen People's Hospital, Shenzhen, China.
- Department of Radiology, The Third Affiliated Hospital of Southern Medical University, Guangzhou, China.
- Department of Radiology, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, China.
- Department of Diagnostic Radiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China.
- Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China.
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong, China.
- Division of Life Science, The Hong Kong University of Science and Technology, Hong Kong, China.
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Hong Kong, China.
- Shenzhen-Hong Kong Collaborative Innovation Research Institute, The Hong Kong University of Science and Technology, Shenzhen, China.
Abstract
AI-driven tumor recognition unlocks new possibilities for precise tumor screening and diagnosis. However, progress is heavily hampered by the scarcity of annotated datasets, whose creation demands extensive effort from radiologists. To this end, we introduce FreeTumor, a generative AI framework that enables large-scale tumor synthesis to mitigate data scarcity. Specifically, FreeTumor leverages both limited labeled data and large-scale unlabeled data for training. By exploiting large-scale data, FreeTumor can synthesize a large number of realistic tumors to augment training datasets. We curate a large-scale dataset comprising 161,310 Computed Tomography (CT) volumes for tumor synthesis and recognition, of which only 2.3% contain annotated tumors. Thirteen board-certified radiologists were engaged to distinguish synthetic from real tumors, rigorously validating the quality of the synthetic tumors. Through high-quality tumor synthesis, FreeTumor notably outperforms state-of-the-art tumor recognition methods, indicating promising prospects for clinical applications.