Original Research Article
Application of text analytics to extract and analyze material-application pairs from a large scientific corpus
- ¹Center for Innovation Strategy and Policy, SRI International, United States
- ²Artificial Intelligence Center, SRI International, United States
When assessing the importance of materials (or other components) to a given set of applications, machine analysis of a very large corpus of scientific abstracts can provide an analyst with a base of insights to develop further. Text analytics reduces the time required to conduct an evaluation and allows analysts to test many different hypotheses. Because the scope and quantity of metadata analyzed can, and should, be large, any divergence between what a human analyst determines and what the text analysis shows prompts the analyst to reassess preliminary findings. In this work, we extracted material-application pairs and ranked them by importance. This method provides a novel way to map scientific advances in a particular material to the applications for which it is used.
Approximately 438,000 titles and abstracts of scientific papers published from 1992 to 2011 were used to examine sixteen materials. The analysis used co-clustering text analysis to associate individual materials with specific clean energy applications, evaluate the importance of each material to specific applications, and assess its importance to clean energy overall. Our analysis reproduced the judgements of experts in assigning material importance to applications. The validated methods were then used to map the replacement of one material with another in a specific application (batteries).
Keywords: machine learning classification, science policy, co-clustering, text analytics, critical materials, big data
Received: 31 Aug 2017; Accepted: 26 Dec 2017.
Edited by: Kevin Boyack, SciTech Strategies, Inc., United States
Reviewed by: Karin Verspoor, University of Melbourne, Australia; Jos Winnink, Leiden University, Netherlands
Copyright: © 2017 Kalathil, Byrnes, Randazzese, Harnett and Freyman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Dr. Christina A. Freyman, SRI International, Center for Innovation Strategy and Policy, 1100 Wilson Blvd., Suite 2800, Menlo Park, 22043, VA, United States, email@example.com