Editorial: Computational Linguistics and Literature

Szpakowicz, Stan; Feldman, Anna; Kazantseva, Anna

doi:10.3389/fdigh.2018.00024

EDITORIAL article

Front. Digit. Humanit., 27 September 2018

Sec. Digital Literary Studies

Volume 5 - 2018 | https://doi.org/10.3389/fdigh.2018.00024

This article is part of the Research Topic Computational Linguistics and Literature.

Stan Szpakowicz¹*

Anna Feldman²

Anna Kazantseva³

¹School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada
²Department of Linguistics and Department of Computer Science, Montclair State University, Montclair, NJ, United States
³National Research Council Canada (NRC-CNRC), Ottawa, ON, Canada

Editorial on the Research Topic
Computational Linguistics and Literature

Computational Linguistics (or, more technically, Natural Language Processing) has made great strides in the past several years. Machine learning (ML) is the technology of choice; deep learning, in particular, has pushed the envelope. These methods work best in narrow domains, when they are given vast amounts of data and asked for information rather than interpretation. Literary data are not limited by topic, and they are hardly ever plentiful enough. That is why work on such data has so far remained on the periphery of research and development in Computational Linguistics. This Research Topic aims to bring the processing of literary data to the attention of a broader audience.

Early versions of the articles we present here were published at the ACL workshops on Computational Linguistics for Literature held in 2015 and 2016¹. The papers offer a reasonably representative sample of the research brought to those workshops.

Toral et al. tackle a weakness of an ostensibly successful major application, Machine Translation (MT). The raw results—especially long machine-translated texts, novels above all—are so error-ridden that substantial post-editing is required. Here is the big question: is MT even worthwhile in such cases? In an instructive experiment with a chapter of a novel, the authors show that the overall cost of translation from scratch significantly outstrips the cost of post-editing. They also show that neural MT beats the more traditional phrase-based statistical MT in terms of productivity gains.

Dubremetz and Nivre look at three rhetorical figures in which words are repeated to strong stylistic effect: chiasmus, epanaphora, and epiphora². While it is trivial to find repeated lexical material, and indeed such repetition is not infrequent, the actual rhetorical devices are very rare and impossible to recognize automatically without sufficient training data for an ML system. The authors run a multi-faceted experiment with literary texts, scientific texts and quotations. They show that different devices predominate in different types of text. For example, and perhaps a little surprisingly, the seemingly very artistic chiasmus is more likely to appear in scientific texts³.
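As a rough illustration of why repetition alone is easy to find, the following Python sketch flags word pairs that recur in reversed order (A ... B ... B ... A) within a short window. It is not the authors' method: the function name `antimetabole_candidates`, the window size and the tiny stoplist are all assumptions made for this example.

```python
import re
from typing import List, Tuple

# Tiny illustrative stoplist (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "the", "to", "of", "not", "in", "is", "but"}


def antimetabole_candidates(text: str, window: int = 10) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs that occur as a ... b ... b ... a.

    The outer pair of `a` tokens must lie at most `window` tokens apart.
    This only finds candidates; it does not judge rhetorical intent.
    """
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words)
    found = set()
    for i in range(n):
        a = words[i]
        if a in STOPWORDS:
            continue
        # l indexes a later occurrence of the same word, close enough to i.
        for l in range(i + 3, min(n, i + window + 1)):
            if words[l] != a:
                continue
            # Look for a different content word repeated between the two a's.
            for j in range(i + 1, l - 1):
                b = words[j]
                if b == a or b in STOPWORDS:
                    continue
                if b in words[j + 1:l]:
                    found.add((a, b))
    return sorted(found)


print(antimetabole_candidates("Eat to live, not live to eat."))
# prints [('eat', 'live')]
```

On the antimetabole quoted in footnote 3, the sketch returns the pair (eat, live). It would equally flag an incidental reversal such as "tea and coffee, but coffee and tea", which is the kind of over-generation that makes a trained classifier necessary.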

Navarro-Colorado processes a large corpus of Golden Age Spanish sonnets. He uses latent Dirichlet allocation (LDA), an ML algorithm widely accepted as good at determining topics in a text. In addition to a few fairly technical findings, the paper shows that LDA recognizes either a theme in a sonnet, or a poetic motif. The immediate conclusions apply to texts of the same type (moderately short, well-structured poems), but the paper can spur comparable investigations of similar literary data.

Hench and Estes work with highly specialized data: Middle High German epic poetry. We will not steal the authors' thunder; they do a great job of presenting this esoteric matter in a clear and entertaining fashion. In a nutshell, they examine stress patterns in verses, and study how medieval poets used such patterns for semantic and stylistic purposes. Once again, ML is at play. The authors rely on a conditional random field, a method useful in analyzing sequential data.

Author Contributions

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

This Research Topic owes everything to the authors and the reviewers—our thanks to all. We also thank Cecilia Ovesdotter Alm for an initial review of one of the papers.

Footnotes

1. A previous special issue builds upon selected papers from the workshops held in 2012, 2013, and 2014.

2. See the paper for definitions.

3. The paper actually studies a type of chiasmus called antimetabole, e.g., “eat to live, not live to eat”.

Keywords: computational linguistics, literary data, machine learning, machine translation, post-editing, rhetorical figures, themes and motifs in poetry, Middle High German epic poetry

Citation: Szpakowicz S, Feldman A and Kazantseva A (2018) Editorial: Computational Linguistics and Literature. Front. Digit. Humanit. 5:24. doi: 10.3389/fdigh.2018.00024

Received: 23 August 2018; Accepted: 06 September 2018;
Published: 27 September 2018.

Edited and reviewed by: Jean-Gabriel Ganascia, Université Pierre et Marie Curie, France

Copyright © 2018 Szpakowicz, Feldman and Kazantseva. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Stan Szpakowicz
