Semiautomated Extraction of Research Topics and Trends From National Cancer Institute Funding in Radiological Sciences From 2000 to 2020.
Authors
Affiliations (5)
Affiliations (5)
- University of Washington School of Medicine.
- Department of Radiation Oncology, University of Washington.
- Detroit Medical Center, Detroit, Michigan, USA.
- Department of Radiation Oncology, University of Washington; Department of Radiology, University of Washington, Seattle, Washington.
- Department of Radiation Oncology, University of Washington. Electronic address: [email protected].
Abstract
Investigators and funding organizations desire knowledge on topics and trends in publicly funded research but current efforts for manual categorization have been limited in breadth and depth of understanding. We present a semiautomated analysis of 21 years of R-type National Cancer Institute (NCI) grants to departments of radiation oncology and radiology using natural language processing. We selected all noneducation R-type NCI grants from 2000 to 2020 awarded to departments of radiation oncology/radiology with affiliated schools of medicine. We used pretrained word embedding vectors to represent each grant abstract. A sequential clustering algorithm assigned each grant to 1 of 60 clusters representing research topics; we repeated the same workflow for 15 clusters for comparison. Each cluster was then manually named using the top words and closest documents to each cluster centroid. The interpretability of document embeddings was evaluated by projecting them onto 2 dimensions. Changes in clusters over time were used to examine temporal funding trends. We included 5874 grants totaling 1.9 billion dollars of NCI funding over 21 years. The human-model agreement was similar to the human interrater agreement. Two-dimensional projections of grant clusters showed 2 dominant axes: physics-biology and therapeutic-diagnostic. Therapeutic and physics clusters have grown faster over time than diagnostic and biology clusters. The 3 topics with largest funding increase were imaging biomarkers, informatics, and radiopharmaceuticals, which all had a mean annual growth of >$218,000. The 3 topics with largest funding decrease were cellular stress response, advanced imaging hardware technology, and improving performance of breast cancer computer-aided detection, which all had a mean decrease of >$110,000. We developed a semiautomated natural language processing approach to analyze research topics and funding trends. We applied this approach to NCI funding in the radiological sciences to extract both domains of research being funded and temporal trends.