Multimodal data curation via interoperability: use cases with the Medical Imaging and Data Resource Center.
Affiliations (10)
- Division of Imaging, Diagnostics, and Software Reliability, Office of Science and Engineering Laboratories, CDRH, US FDA, Silver Spring, MD, USA.
- Department of Radiology, University of Chicago, Chicago, IL, USA.
- Division of Imaging, Diagnostics, and Software Reliability, Office of Science and Engineering Laboratories, CDRH, US FDA, Silver Spring, MD, USA.
- Center for Translational Data Science, University of Chicago, Chicago, IL, USA.
- Department of Radiology, University of Chicago, Chicago, IL, USA.
- National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Bethesda, MD, USA.
- Department of Medicine, University of California, San Diego, CA, USA.
- Department of Public Health Sciences, University of Chicago, Chicago, IL, USA.
- Department of Radiology, Stanford University School of Medicine, Stanford, CA, USA.
- Office of the Director, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, USA.
Abstract
Interoperability (the ability of data or tools from non-cooperating resources to integrate or work together with minimal effort) is particularly important for curating multimodal datasets drawn from multiple data sources. The Medical Imaging and Data Resource Center (MIDRC), a multi-institutional collaborative initiative to collect, curate, and share medical imaging datasets, has made interoperability with other data commons one of its top priorities. The purpose of this study was to demonstrate interoperability between MIDRC and two other data repositories, BioData Catalyst (BDC) and the National COVID Cohort Collaborative (N3C). Using the interoperability capabilities of these repositories, we built two cohorts for example use cases, each containing clinical and imaging data on matched patients. We characterized the representativeness of each cohort by comparing it with CDC population statistics using the Jensen-Shannon distance. The interoperability process and methods demonstrated in this work can be used by MIDRC, BDC, and N3C users to create multimodal datasets for developing artificial intelligence/machine learning models.
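The Jensen-Shannon distance used for the representativeness comparison is the square root of the Jensen-Shannon divergence, JSD(P, Q) = ½ KL(P ‖ M) + ½ KL(Q ‖ M) with M = ½ (P + Q). The sketch below is a minimal illustration, not the authors' code. It assumes that a cohort and the CDC reference are each summarized as counts over the same categories (for example, hypothetical age-group bins), and that a base-2 logarithm is used so the distance lies in [0, 1]. SciPy offers the same quantity as `scipy.spatial.distance.jensenshannon`, which defaults to the natural logarithm.

```python
import math
from typing import Sequence


def _normalize(counts: Sequence[float]) -> list[float]:
    """Convert non-negative counts (or proportions) into a probability vector."""
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = float(sum(counts))
    if total <= 0:
        raise ValueError("distribution must have a positive total")
    return [c / total for c in counts]


def _kl(p: list[float], q: list[float], base: float) -> float:
    """Kullback-Leibler divergence KL(p || q); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(
    p_counts: Sequence[float], q_counts: Sequence[float], base: float = 2.0
) -> float:
    """Square root of the Jensen-Shannon divergence between two categorical distributions.

    Both inputs must list counts for the same categories in the same order.
    With base=2 the result is bounded in [0, 1]; 0 means identical distributions.
    """
    if len(p_counts) != len(q_counts):
        raise ValueError("distributions must have the same number of categories")
    p = _normalize(p_counts)
    q = _normalize(q_counts)
    # Mixture distribution M; m_i > 0 wherever p_i > 0 or q_i > 0, so KL is finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * _kl(p, m, base) + 0.5 * _kl(q, m, base)
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(jsd, 0.0))
```

A usage example with made-up numbers (not data from the study): `jensen_shannon_distance([120, 340, 410, 130], [0.22, 0.34, 0.29, 0.15])` compares hypothetical cohort counts in four bins against hypothetical reference proportions. Because both inputs are normalized first, raw counts and proportions can be mixed, and values closer to 0 indicate a cohort whose distribution more closely matches the reference.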