Original Research Article. Provisionally accepted; the full text will be published soon.

Front. Digit. Humanit. | doi: 10.3389/fdigh.2018.00002

Ensemble NER: Evaluating Named Entity Recognition Tools in the Identification of Place Names in Historical Corpora

Miguel Won (1), Patricia Murrieta-Flores (2*) and Bruno Martins (1)
  • (1) INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal
  • (2) History, Lancaster University, United Kingdom

The field of Spatial Humanities has advanced substantially in recent years. The identification and extraction of toponyms and spatial information mentioned in historical text collections have allowed these data to be used in innovative ways, making it possible to apply spatial analysis and to map these places with Geographic Information Systems. For instance, automated place name identification is now possible with Named Entity Recognition (NER) systems. Statistical NER methods based on supervised learning, in particular, are highly successful with modern datasets. However, major challenges remain when dealing with historical corpora. These challenges include language change over time, spelling variations, transliterations, OCR errors, and sources written in multiple languages, among others. In this article, considering a task of place name recognition over two collections of historical correspondence, we report an evaluation of five NER systems and of an approach that combines them through a voting system. We found that although the individual performance of each NER system was corpus dependent, the ensemble combination achieved consistent measures of precision and recall, outperforming the individual NER systems. Additionally, the results showed that these NER systems are not strongly dependent on pre-processing and translation to modern English.
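As an illustration of the general idea of combining NER systems through voting, the following minimal sketch keeps a place-name span when enough systems agree on it, then scores the result with exact-match precision and recall. The abstract does not specify the voting rule, so the threshold `min_votes`, the span representation as character offsets, and the example data are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from typing import Iterable

# Assumed representation: (start, end) character offsets of a place name.
Span = tuple[int, int]


def vote(predictions: Iterable[set[Span]], min_votes: int) -> set[Span]:
    """Keep every span proposed by at least `min_votes` systems."""
    # Each system contributes a set, so it can vote at most once per span.
    counts = Counter(span for spans in predictions for span in spans)
    return {span for span, n in counts.items() if n >= min_votes}


def precision_recall(predicted: set[Span], gold: set[Span]) -> tuple[float, float]:
    """Exact-match precision and recall against a gold annotation."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical outputs of three systems for one letter (offsets are made up).
system_outputs = [
    {(0, 6), (20, 27)},
    {(0, 6), (40, 45)},
    {(0, 6), (20, 27), (50, 58)},
]
gold = {(0, 6), (20, 27)}

ensemble = vote(system_outputs, min_votes=2)
print(sorted(ensemble))
print(precision_recall(ensemble, gold))
```

Raising `min_votes` tends to favour precision (fewer spurious spans survive), while lowering it tends to favour recall, which is one reason a voting ensemble can behave more consistently across corpora than any single system.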

Keywords: spatial humanities, digital humanities, natural language processing, historical corpora, toponym recognition, Early-Modern English, Republic of Letters

Received: 24 Nov 2017; Accepted: 12 Feb 2018.

Edited by:

Arianna Ciula, King's College London, United Kingdom

Reviewed by:

Adam Crymble, University of Hertfordshire, United Kingdom
Sara Tonelli, Fondazione Bruno Kessler, Italy  

Copyright: © 2018 Won, Murrieta-Flores and Martins. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Dr. Patricia Murrieta-Flores, Lancaster University, History, Bowland College, Lancaster University, City of Lancaster, LA1 4YT, United Kingdom