Development of a large-scale grounded vision language dataset for chest CT analysis.
Authors
Affiliations (8)
- Shanghai Jiao Tong University, Shanghai, China.
- Shanghai AI Laboratory, Shanghai, China.
- University of Science and Technology of China, Anhui, China.
- Fudan University, Shanghai, China.
- Shanghai Jiao Tong University, Shanghai, China. [email protected].
- Shanghai AI Laboratory, Shanghai, China. [email protected].
- Shanghai Jiao Tong University, Shanghai, China. [email protected].
- Shanghai AI Laboratory, Shanghai, China. [email protected].
Abstract
Developing generalist foundation models has recently attracted tremendous attention in the field of AI for medicine. Such models require open-source medical image datasets that incorporate diverse supervision signals across various imaging modalities. In this paper, we introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset built on CT-RATE. Specifically, we leverage the latest universal segmentation model and large language models to extend the original dataset in three aspects: (i) organ-level segmentation masks covering 197 categories, which provide intermediate visual clues for reasoning during interpretation; (ii) 665K multi-granularity grounded reports, in which each sentence is linked to the corresponding anatomical region of the CT volume through a segmentation mask; and (iii) 1.2M grounded VQA pairs, in which questions and answers are linked to reference segmentation masks, enabling models to associate visual evidence with textual explanations. We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models by enabling training to generate text conditioned on given segmentation regions, which is not possible with previous related datasets.