Research Topic

Mining Scientific Papers: NLP-enhanced Bibliometrics

About this Research Topic

During the last decade, scientific papers have become increasingly available in full text and in machine-readable formats, thanks to the growing number of publications on online platforms such as arXiv, CiteSeer, and PLoS. At the same time, research in natural language processing and computational linguistics has produced a number of open-source tools for versatile text processing (e.g. NLTK, Mallet, OpenNLP, CoreNLP, Gate, CiteSpace). The rise of Open Access publishing, standardized formats for representing scientific papers (such as NLM-JATS, TEI, and DocBook), and the availability of full-text datasets for research experiments and information retrieval corpora (e.g. PubMed, JSTOR, iSearch) have made it possible to perform bibliometric studies that consider not only the metadata of papers but also their full-text content.

Scientific papers are highly structured texts and display specific properties related not only to their references but also to their argumentative and rhetorical structure. Recent research in this field has concentrated on building ontologies for citations in scientific papers (e.g. CiTO, Linked Science) and on studying the distribution of references. However, full-text mining has so far rarely been used to provide data for bibliometric analyses. While bibliometrics traditionally relies on the analysis of paper metadata, we explore the ways in which full-text processing and linguistic analysis of scientific papers can contribute to bibliometric studies.

This Research Topic aims to discuss novel approaches and provide insights into scientific writing that can bring new perspectives on both the nature of citations and the nature of scientific papers. Enriching metadata through the full-text processing of papers opens new fields of application for bibliometric studies. Full text is a new field of investigation in which the major problems concern the organization and structure of text, the extraction of information, and its representation at the metadata level. Furthermore, the study of the contexts around in-text citations offers new perspectives on the semantic dimension of citations. Analyzing citation contexts and semantically categorizing publications will allow us to rethink co-citation networks, bibliographic coupling, and other bibliometric techniques.

This Research Topic aims to promote interdisciplinary research in bibliometrics, natural language processing, and computational linguistics in order to study how bibliometrics can benefit from large-scale text analytics and sense mining of scientific papers. We encourage contributions on theoretical findings, practical methods, and technologies for processing scientific corpora, involving full-text processing, semantic analysis, text mining, citation classification, and related topics. We also encourage surveys and evaluations of state-of-the-art methods, as well as more exploratory papers that identify novel challenges and pave the way for future theoretical frameworks.


Keywords: Data mining, Text Mining, Semantic publishing, Scientific papers, Bibliometrics, Scientometrics, Natural Language Processing, Computational Linguistics, Citation Content Analysis, Academic Search


Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.

About Frontiers Research Topics

With their unique mix of varied contributions, from Original Research to Review Articles, Research Topics unify the most influential researchers, the latest key findings, and historical advances in a hot research area! Find out more about how to host your own Frontiers Research Topic or contribute to one as an author.

Submission Deadlines

Manuscript: 30 January 2018
