Edited by: Qing Cai, East China Normal University, China
Reviewed by: Arturo Hernandez, University of Houston, United States; Falk Huettig, Max Planck Institute for Psycholinguistics, Netherlands
This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Most interpreting theories claim that different interpreting types involve different processing mechanisms and procedures. However, few studies have examined their underlying differences. Although some previous results based on quantitative approaches show that different interpreting types yield outputs with varying lexical and syntactic features, the grammatical parsing approach is limited. Language sequences formed without relying on parsing or on a specific linguistic approach or grammar outperform other quantitative approaches at revealing the sequential behavior of language production. As a non-grammatically-bound unit of language sequences, the frequency motif can visualize the local distribution of content and function words, and can also be used to statistically classify languages and identify text types. Thus, the current research investigates the distribution, length and position-dependent properties of frequency motifs across different interpreting outputs in order to uncover sequential generation behaviors. The distribution, length and certain position-dependent properties of these language sequences differ significantly between simultaneous interpreting and consecutive interpreting output. The features of frequency motifs show that both types of interpreting output are produced in a manner that abides by the least effort principle. The current research suggests that interpreting types can be differentiated through this type of sequential language unit and offers evidence for how different task features mediate the sequential organization of interpreting output under different demands to minimize cognitive load.
Interpreting is a particularly demanding language processing task for the cognitive system (
Interpreting types have been explored and discussed in different theoretical models. As defined by
By contrast, SI is modeled into a one-step process consisting of simultaneous efforts:
Although theoretical models repeatedly emphasize the distinctive processes of SI and CI, empirical research has rarely addressed the differences between SI and CI output directly. Among the few exceptions is a debate on whether greater accuracy is achieved in CI or SI.
The relative paucity of direct comparisons between SI and CI leaves open the question of what exactly the different underlying mechanisms of these two tasks are. One possible reason is the lack of operational indicators, which motivates the present study. A previous treebank-based study has demonstrated that SI and CI outputs differ in dependency distance (
Motif is a prototypical example of language sequence. As suggested, motif is a simple and machine-operable technique to determine and process linguistic sequential information, which proves to be a reliable approach to automatic text classification (
Accordingly, an F-motif can be constructed as “a continuous series of equal or increasing frequency values (
First and foremost, as a linear syntagmatic/sequential unit of word frequency, F-motif can visualize the local distribution of function words in the sentences (
For instance, the frequency value of each token in the sentence “On trade issue we have always maintained that trade disputes should be resolved through consultations,” taken from a certain corpus, was determined based on the given file. The result is shown in
Frequency values of ‘On trade issue we have always maintained that trade disputes should be resolved through consultations.’
Word | Frequency |
---|---|
On | 23 |
Trade | 14 |
Issue | 6 |
We | 44 |
Have | 29 |
Always | 3 |
Maintained | 1 |
That | 70 |
Trade | 14 |
Disputes | 1 |
Should | 8 |
Be | 24 |
Resolved | 1 |
Through | 17 |
Consultations | 3 |
Thus, the F-motifs of this sentence were generated according to the definition: (23) (14) (6-44) (29) (3) (1-70) (14) (1-8-24) (1-17) (3). There are six function words (i.e., articles, conjunctions like “that,” prepositions like “on,” pronouns like “we,” and non-lexical verbs such as do, be and have) and nine content words (i.e., nouns like “trade,” “issue,” “disputes,” and “consultations,” lexical verbs like “maintain,” “should,” and “resolved,” adjectives, adverbs like “always,” numerals and ordinals). Even in this short sentence, the frequencies of the content words (the highest is 14 and the mean is 6.25) are much lower than those of the function words (the lowest is 17 and the mean is 34.5). This difference is even more illustrative when discussed in terms of word sequences, here F-motifs. Firstly, any feature of an F-motif is equivalent to that of the corresponding serial word sequence in the text, and thus the distribution of F-motifs is exactly the distribution of the serial word sequences in the given text. Secondly, F-motifs reveal how the sentence is segmented by function words of higher frequency values; in other words, they show the relative position of function and content words in a local context. Thirdly, since the last items of F-motifs are likely to be function words, the length of F-motifs is also closely correlated with the local density of function words. Fourthly, the frequency values at each position in F-motifs of different lengths can reflect the choice of content words at different positions relative to local function words.
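The segmentation in this example can be reproduced with a few lines of code. The following is a minimal Python sketch, not the authors' Perl program: the function name `f_motifs` is hypothetical, and the frequency of “through” is taken as 17, the value implied by the motif (1-17) and by the function-word statistics quoted above.

```python
def f_motifs(freqs):
    """Split a sequence of frequency values into maximal runs of
    equal or increasing values (one run = one F-motif)."""
    motifs, current = [], []
    for value in freqs:
        # A drop in frequency closes the current motif.
        if current and value < current[-1]:
            motifs.append(tuple(current))
            current = []
        current.append(value)
    if current:
        motifs.append(tuple(current))
    return motifs


# Token/frequency pairs from the example sentence
# ("through" = 17 is an assumption consistent with the motif listing).
sentence = [
    ("on", 23), ("trade", 14), ("issue", 6), ("we", 44), ("have", 29),
    ("always", 3), ("maintained", 1), ("that", 70), ("trade", 14),
    ("disputes", 1), ("should", 8), ("be", 24), ("resolved", 1),
    ("through", 17), ("consultations", 3),
]
motifs = f_motifs([freq for _, freq in sentence])

# Mean frequency of the six function words named in the text.
function_words = {"on", "we", "have", "that", "be", "through"}
function_freqs = [f for w, f in sentence if w in function_words]
function_mean = sum(function_freqs) / len(function_freqs)
```

Under these assumptions, `motifs` contains ten F-motifs matching the listing above, and `function_mean` matches the reported mean of 34.5.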
Moreover, when the motifs in language production are studied in a quantitative context, they are reflective of how people deal with the demand in the process of text (or speech) generation. A confirmation is that motifs display a lawful distributional behavior similar to other well-known linguistic units (
Furthermore, interrelations between length and frequency of sequence types are also expected to reveal certain properties of the sequential units and are constantly under investigation. According to synergetic linguistics, language systems present ‘self-organization’ and ‘self-regulation’ features in terms of the distribution of its linguistic units (
So far, the empirical description of the statistics of motif sequences has been used for the comparison of authors (
To sum up, the applicability of motif regularities at the basic linguistic level has been verified. However, previous studies have generally used written materials, while sequential units in spoken contexts have seldom been explored. Since spoken utterances are generally “extemporary” and produced one after another in sequence, as opposed to the planning and revision possible in writing, sequence-related properties may provide better insights into authentic spoken materials. Moreover, synergetic linguistics argues that in the self-organizing language system, the order parameters mediating between the needs of language users and the mechanisms of production and perception are dominated by the requirement to minimize production effort and memorization effort (
The present study will explore whether the language sequences in the output are also sensitive to different interpreting types. The following specific questions will be examined:
Can the frequency distribution of language sequential units of frequency classify interpreting types?
Can the length distribution of language sequential units of frequency classify interpreting types?
What are the position-dependent properties of the language sequential units of frequency in SI and CI output?
What are the psychological motivations underlying the varied distribution of language sequences in SI and CI output?
The current research intends to verify whether distinctive sequential patterns exist in the output across different modes of interpreting. To this end, we built a parallel corpus of transcribed real-world materials comprising two sub-corpora: (1) a CI corpus consisting of the English interpretations and the Chinese source texts of press conferences of the National People’s Congress from 2009 to 2016; (2) an SI corpus made up of 21 English interpretations and the Chinese source speeches of keynote speeches recorded at the Boao Forum for Asia and the Davos Forum from 2009 to 2016, as well as BRICS summits, sessions of the U.N. General Assembly, and China-ASEAN conferences during that period. Across the parallel corpus, the source language is Chinese and the target language is English, and all interpretations were carried out from the interpreters’ mother tongue into their second language. To achieve a valid contrast between SI and CI, files of approximately 57,000 words were selected from each sub-corpus of English interpretations, and their Chinese source texts were selected accordingly.
Sizes of sub-corpora.
Sub-corpora | Chinese/English | No. of files | Running words in texts |
---|---|---|---|
SI | English | 21 | 57199 |
SI | Chinese | 21 | 80802 |
CI | English | 8 | 57154 |
CI | Chinese | 8 | 76314 |
Total | | 58 | 271469 |
Given that the frequency value of words is particularly susceptible to text size, the sub-corpora were segmented to balance the text size. Each English output sample file thus has approximately 4,000 tokens to ensure the validity of comparisons between sub-corpora. The segmentation was made without splitting a complete paragraph, and 28 similarly sized English files were obtained. The Chinese source texts were segmented in accordance with the English segmentation, and 28 Chinese files were obtained. The frequency values of these 56 files of similar size were counted with AntConc, and the F-motifs were determined with respect to the frequency values of words in the given file. The F-motifs of all the files in SI and CI were generated by Perl programs, following the example given in the previous section.
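The preprocessing described above can be approximated as follows. This is a hedged Python sketch rather than the AntConc and Perl workflow actually used: `chunk_paragraphs` and `file_f_motifs` are hypothetical names, tokenization is a naive lowercase whitespace split, and the rule for closing a chunk (stop when adding the next paragraph would move the size further from the target) is an assumption.

```python
from collections import Counter


def chunk_paragraphs(paragraphs, target_tokens=4000):
    """Group whole paragraphs into chunks of roughly target_tokens tokens,
    never splitting a paragraph."""
    chunks, current = [], []
    for paragraph in paragraphs:
        tokens = paragraph.lower().split()  # naive tokenization (assumption)
        if not tokens:
            continue
        size = len(current)
        # Close the chunk if adding this paragraph moves us away from the target.
        if current and abs(size + len(tokens) - target_tokens) > abs(size - target_tokens):
            chunks.append(current)
            current = []
        current.extend(tokens)
    if current:
        chunks.append(current)
    return chunks


def file_f_motifs(tokens):
    """Compute F-motifs for one file, using word frequencies counted
    within that same file."""
    counts = Counter(tokens)
    motifs, current = [], []
    for token in tokens:
        freq = counts[token]
        if current and freq < current[-1]:
            motifs.append(tuple(current))
            current = []
        current.append(freq)
    if current:
        motifs.append(tuple(current))
    return motifs


# Toy usage with a tiny target size so the behavior is visible.
toy_chunks = chunk_paragraphs(["a b a", "c a", "b b c d"], target_tokens=5)
toy_motifs = [file_f_motifs(chunk) for chunk in toy_chunks]
```

Counting frequencies per chunk rather than over the whole corpus mirrors the statement that F-motifs were determined with respect to the frequency values of words in the given file.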
Then, the rank frequency distributions of F-motifs of both output groups were determined by ZM distribution, which is proven to be well-fitted with rank-frequency distribution in most cases and meaningful for investigation concerning motif (
For instance,
The rank frequency distribution of F-motifs modeled by ZM distribution (
The two parameters in the function, i.e.,
The three parameters in the function, i.e.,
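As a rough illustration of the rank-frequency modeling step, the sketch below assumes the common right-truncated form of the Zipf-Mandelbrot (ZM) distribution, in which the probability of rank r is proportional to (r + b)^(-a) for r = 1..n. The function names, the truncation point `n = 1000`, and the plain coefficient of determination are all assumptions; the fitting software used in the study may define its goodness-of-fit measures differently.

```python
from collections import Counter


def zm_pmf(a, b, n):
    """Right-truncated Zipf-Mandelbrot probabilities for ranks 1..n
    (assumed form: P(r) proportional to (r + b) ** -a)."""
    weights = [(r + b) ** (-a) for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def rank_frequencies(motifs):
    """Observed frequencies of F-motif types, sorted from most to least frequent."""
    return sorted(Counter(motifs).values(), reverse=True)


def r_squared(observed, expected):
    """Plain coefficient of determination between observed and expected values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, expected))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Illustrative call; n = 1000 is a hypothetical truncation point.
probs = zm_pmf(a=0.9449, b=5.3162, n=1000)
```

Expected frequencies for comparison with `rank_frequencies` output can be obtained by multiplying each probability by the total number of F-motif tokens.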
Results are presented in three progressive aspects: (1) a classification of SI and CI output via a comparison of F-motif distribution parameters to fit the ZM models; (2) an investigation of the local distribution of function words in SI and CI by comparing the length of F-motifs; (3) identification of word choice preference in SI and CI by comparing the position-dependent frequencies.
The rank frequency of F-motifs in SI and CI are fitted with ZM distribution and the parameters extracted from these models are further analyzed between SI and CI. Fitting the ZM distribution to the data of total F-motif tokens in the output yields excellent results. Models fit are all excellent according to
Parameters of ZM model for F-motifs of SI and CI.
Group | a | b | X2 | P(X2) | df | C | | |
---|---|---|---|---|---|---|---|---|
SI | 0.9449 | 5.3162 | 1148.5143 | 0.0000 | 6469 | 0.0395 | 8225 | 0.9748 |
CI | 0.8787 | 36.3978 | 562.2163 | 0.0000 | 6412 | 0.0544 | 8325 | 0.9446 |
As can be seen in
Rank frequency distribution of the highest 50 frequency F-motifs in SI and CI.
The mean frequency of the 50 most frequent F-motifs is higher in CI (118.16) than in SI (110.56), and that of the 20 most frequent F-motifs is also higher in CI (168.4) than in SI (151.42).
We then applied the Altmann-Fitter to all the 28 texts for analysis, and extracted the parameters for each file, which are listed in
Independent sample
The values of parameters a and b in SI and CI. (The values of parameter a are shown on the left y-axis and the values of parameter b on the right y-axis.)
Since interpreting is a process mediating between source language and target language, the variance in the output might be attributed to the differences in source texts. In order to determine the possible reasons for the divergence, the rank frequency distribution of the F-motif of the source texts was applied to the ZM model, and the results are shown in
The F-motifs of Chinese input also present excellent fit with the ZM model, with a goodness of fit
To further test the possible effect of the input text on the output text in terms of the distribution of F-motifs, a zero-lagged Pearson correlation was calculated. The planned positive correlation was found only for SI group, parameter
Another factor of potential influence on the output of interpreting is the individual styles of interpreters (
SI and CI output can thus be distinguished by their distribution patterns. To further understand the effect of interpreting types on the distribution of function and content words in the target language, the lengths of F-motifs in SI and CI were compared. Two comparisons were performed: (1) a comparison of the parameters of the fitted models; (2) a comparison of the numbers of shorter and longer F-motifs.
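Both comparisons start from the lengths of the F-motifs in each file. A minimal Python sketch of that bookkeeping is given below; the function names are hypothetical, and the cut-offs (1 to 3 words for shorter motifs, 4 to 7 words for longer ones) follow the grouping used in this section.

```python
from collections import Counter


def length_distribution(motifs):
    """Map each F-motif length to the number of motifs of that length."""
    return Counter(len(motif) for motif in motifs)


def short_long_counts(motifs, short_max=3, long_max=7):
    """Count shorter (1..short_max words) and longer
    (short_max+1..long_max words) F-motifs."""
    short = sum(1 for m in motifs if len(m) <= short_max)
    long_ = sum(1 for m in motifs if short_max < len(m) <= long_max)
    return short, long_


# Toy input: F-motifs represented as tuples of frequency values.
toy = [(23,), (14,), (6, 44), (1, 8, 24), (1, 2, 5, 9), (1, 1, 2, 3, 40)]
lengths = length_distribution(toy)
short_count, long_count = short_long_counts(toy)
```

Per-file counts produced this way can then be compared between the SI and CI groups with a two-sample test.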
The lengths of F-motifs in both interpreting types fit well with Hyper-Pascal distribution, with
The length distribution of F-motifs modeled by Hyper-Pascal distribution (
Parameters of Hyper-Pascal model in fitting to length distribution of F-motifs of SI and CI.
Group | | | | X2 | P(X2) | df | C | |
---|---|---|---|---|---|---|---|---|
SI | 1.0708 | 0.1833 | 0.2478 | 25.6751 | 0.1845 | 3.0000 | 0.0132 | 0.9921 |
CI | 1.4574 | 0.2544 | 0.2290 | 28.9415 | 0.6634 | 3.0000 | 0.0148 | 0.9902 |
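For readers unfamiliar with the Hyper-Pascal model, the sketch below assumes its usual three-parameter form, in which successive probabilities satisfy P(x+1) / P(x) = q (k + x) / (m + x). The function name, the truncation at `max_x`, and the parameter values in the usage line are illustrative assumptions, not values or settings confirmed by the study; F-motif lengths start at 1, so a length L is assumed to correspond to x = L - 1.

```python
def hyper_pascal_pmf(k, m, q, max_x):
    """Hyper-Pascal probabilities for x = 0..max_x, normalized over this
    truncated support (the truncation is an assumption of this sketch)."""
    terms = [1.0]  # unnormalized term for x = 0
    for x in range(max_x):
        # Recurrence: term(x + 1) = term(x) * q * (k + x) / (m + x)
        terms.append(terms[-1] * q * (k + x) / (m + x))
    total = sum(terms)
    return [t / total for t in terms]


# Illustrative parameters only; x = 0..6 corresponds to lengths 1..7.
length_probs = hyper_pascal_pmf(k=1.0, m=0.2, q=0.25, max_x=6)
```

When k equals m the recurrence reduces to a geometric sequence in q, which gives a quick sanity check on the implementation.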
We then applied the Altmann-Fitter to all the 28 texts for analysis, and extracted the related information for further comparison (see in
Again, we checked the length distribution of F-motifs of the input texts in Chinese and independent sample
The results of ANOVA test also rule out the possible effect of interpreting style of the three interpreters on the length of F-motif in the output,
Hence, neither the source text nor the interpreting style of varied interpreters underlies the variances in the length differences of F-motif in SI and CI output.
Next, a comparison of the total number of shorter (1, 2, and 3 words) and longer (4–7 words) F-motifs between SI and CI was conducted with an independent sample
The number of shorter (1–3 words) F-motifs and longer (4–7 words) F-motifs in SI and CI output (∗ indicates where a significant difference is detected).
In the previous section, it is found that both the distribution and length of F-motif in interpreting output differ across interpreting groups. More information regarding the function and content word choices can be attained if we re-assess the data from a perspective of the position-related information of the F-motif.
A (reversed) interrelation between the length and frequency of linguistic units/sequences has been confirmed (
Logarithmic mean frequency of words in each position in F-motif of SI and CI (∗ indicates where a significant difference is detected).
The mean frequency values at each position in F-motifs generally show the same patterns across groups: (1) the mean frequencies of the last position in each F-motif length are generally higher in SI than in CI; (2) the mean frequency values of every position except the last in each F-motif length are generally higher in CI than in SI.
More specifically, in shorter F-motifs, the last positions present a significantly higher mean frequency in SI than CI while no significant differences are detected in other positions. For instance, the last positions of two-word (L2P2) and three-word F-motifs (L3P3) show significant differences. In L2P2, SI (
Furthermore, the frequency values of different positions point to different words or word classes in the text. On the one hand, the content word with the highest frequency value across all CI texts is “China,” with a mean frequency of 42.21. The most frequently used content words in SI are “China,” “development” and “economic,” with a mean frequency of 59.5. Thus, the words in the last position of F-motifs of all lengths (except one-word F-motifs) are very likely function words. On the other hand, the words in the third and fourth positions of longer F-motifs are mostly content words according to their frequency values.
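The position-dependent values discussed in this section can be tabulated as sketched below. This is an illustrative Python sketch with a hypothetical function name; it assumes that the "logarithmic mean frequency" is the mean of base-10 logarithms of the frequency values, which the text does not state explicitly, and it labels positions in the L(length)P(position) style used above.

```python
import math
from collections import defaultdict


def position_log_means(motifs):
    """Mean log10 frequency for every (length, position) slot,
    keyed as e.g. 'L2P2' for the second position of two-word F-motifs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for motif in motifs:
        length = len(motif)
        for position, freq in enumerate(motif, start=1):
            key = f"L{length}P{position}"
            sums[key] += math.log10(freq)  # assumes mean of logs, base 10
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Toy input: two two-word motifs and one one-word motif.
means = position_log_means([(1, 10), (10, 100), (1000,)])
```

Computing these means per file gives one value per slot and file, which can then be compared between SI and CI.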
The current research is the very first effort to investigate the different linguistic features of SI and CI output by employing a linguistic sequence that visualizes the local distribution of function words without relying on grammatical parses. This study complements previous treebank-based studies by quantitatively examining the non-grammatically-bound language sequences in different interpreting outputs. It further confirms that the output of different interpreting types differs not only in dependency-parsed information, but also in the local, sequential distribution of function words. Given that the distribution of F-motifs abides by the principle of least effort, the current findings highlight the different mechanisms in SI and CI for minimizing production and memory effort.
Our results indicate that the output texts of SI and CI entail F-motifs of different distributions, lengths and position-dependent frequencies, regardless of the differences in text size, input texts or the interpreting style of individual interpreters. Specifically, firstly, only the distribution of SI output F-motifs is significantly correlated with that of the input; secondly, CI generates more short F-motifs (one-to-three-word motifs) while SI produces more long F-motifs (four-to-seven-word motifs); and thirdly, the mean frequencies of content words in the same position of long F-motifs are higher in CI than in SI.
The present study first compares the ZM parameters fitted to the F-motifs in SI and CI output. Though both fit the same ZM model, significant differences are found even when the influence of the input text and the individual style of interpreters is excluded. The different patterns demonstrate that SI and CI outputs are two distinctive inter-languages and that different operational mechanisms are involved in the processes.
In addition, it is indicated in the correlation tests that only SI output is significantly affected by the input in terms of the frequency of these language sequences. This result is a manifestation that the sequential organization of the output in SI is closely constrained by the input whereas CI reformulation is more independent. It corroborates the findings in previous studies comparing the dependency distance of SI and CI output (
The results of the present study favor this proposition. On the one hand, the input F-motifs have a substantial impact on the output F-motifs of SI, but no such correlation is found for CI. In the quantitative context, the F-motif distribution in the input text can explain about 40% of the variance in that of the SI output. In the local context, this means that the sequential frequency values of the SI output are synchronized with those of the input. However, no significant correlation is found between the distributions of F-motifs in the input and the output of CI. We therefore speculate that SI output is produced closely in line with the input text, so that the linear sequences of word frequency in the output are distributed in alignment with those of the input.
On the other hand, the ZM parameters of F-motifs in SI output vary considerably while those of CI are limited to a small range. Since the parameters of the input F-motifs of both interpreting types fluctuate, it is assumed that the clustering of the parameters of the F-motifs in CI output is attributable to the mediation effect in the interpreting process. In other words, instead of retaining the diverse sequential orders of the source text, CI interpreters may tend to employ more frequently used structures or sequences and thus yield F-motifs with more regular and consistent distributions. This assumption is consistent with the fact that the F-motifs of CI output show a greater central tendency, as the mean frequency of the most frequent F-motifs (top 50) is higher in CI than in SI, and the standard deviation is larger in SI than in CI. In sum, to lessen processing difficulties, SI interpreters tend to follow the sequences of the input whereas CI interpreters not only adopt structures of less complexity but also employ more frequently used language sequences.
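The link between a correlation coefficient and "variance explained" can be made concrete with a small sketch. The Python function below computes a zero-lagged Pearson correlation, that is, input and output values paired file by file with no shift; the function name and the toy numbers are hypothetical and are not data from the study. An explained variance of about 40% corresponds to an r squared of about 0.40, or an absolute r of roughly 0.63.

```python
import math


def pearson_r(xs, ys):
    """Zero-lagged Pearson correlation between two equally long sequences,
    pairing the i-th input value with the i-th output value."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical per-file parameter values (not from the study).
input_param = [1.0, 2.0, 3.0]
output_param = [1.0, 3.0, 2.0]
r = pearson_r(input_param, output_param)
explained = r ** 2  # share of variance explained by a linear relation

# Correlation size implied by roughly 40% explained variance.
r_for_40_percent = math.sqrt(0.40)
```

With the toy values above, r is 0.5 and the explained variance is 0.25, which illustrates why a correlation must exceed about 0.63 to account for 40% of the variance.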
The results of fitting the length distributions of SI and CI F-motifs to models are consistent with the length distribution of length-motifs in written texts, as both fit the Hyper-Pascal model well (
It is postulated that the distinctive types of text (written vs. spoken) contribute to the different length distributions of motifs. Drawing on evidence from spoken language corpora and multiple languages,
F-motif, a sequential unit consisting of words of non-decreasing frequency, can be regarded as word bundles segmented by high-frequency words. As mentioned above, function words and content words are dispersed asymmetrically on a continuum of the frequency value in each text. Most of the function words are of high-frequency and they are either the one-word F-motifs or the words in the last position of F-motif sequences. The longer the F-motif is, the more content words are in the sequence. Thus, shorter F-motifs can be indirectly linked with a dense distribution of function words. It is concluded in consequence that function words are more densely distributed in CI than SI output, which is possibly due to the different mechanisms of producing sequences during the two interpreting types.
It is generally believed that SI interpreters, constrained by the temporal pressure, handle the source speech in piecemeal (
Conversely, CI interpreters receive speakers’ uninterrupted utterances in portions of at least a few sentences. Although interpreters in this working mode are not heavily taxed by the simultaneous presentation of input and output speech, note-taking requires additional time and only part of the information can be taken down. It thus generates added pressure and extra load on working memory (
In the present study, two notable differences of position-dependent frequencies of F-motifs are detected: (1) the mean frequencies of the last-position words in shorter F-motifs are higher in SI than in CI; (2) the mean frequencies of the third and fourth positions in longer F-motifs are higher in CI than in SI.
The frequencies of L2P2 and L3P3 are significantly higher in SI than in CI F-motifs. An exhaustive search for words falling in the frequency range of L2P2 and L3P3 shows that they belong to the same top-frequency function words (in, to, of, and, the) in both groups. Thus, there is no practical difference between SI and CI in terms of content word or function word choices in shorter F-motifs.
Statistically, only the mean frequencies of the third and fourth positions of longer F-motifs (L5P3, L5P4, L6P3, and L6P4) are significantly higher in CI than SI. A further check indicates that the frequency values in these positions mainly point to content words. No significant differences between SI and CI were found in the aspect of function words. In other words, the function word usage in longer F-motifs does not differ between SI and CI. However, the output of the two interpreting types varies in content word choices in longer F-motif. As is emphasized above, the length of F-motif is indirectly related to the distribution of content and function words. Longer F-motif consists of more content words and one function word, where content words are more densely distributed. As a result, the position dependent differences signify that CI interpreters tend to use more frequently used words than in SI when function words are not locally accessible. It has been argued in the previous section that interpreters tend to rely on structural information to memorize input messages and generate more function words in the output sequences to alleviate working memory burden. When there is less structure-related information in the sequence, more pressure is imposed on the CI interpreter, who might resort to high-frequency, polysemous content words to lessen the production load (
To recap, we assume that two processing approaches underlie the differences between SI and CI output in terms of sequential organization. For SI, simultaneity places high demands on the coordination between input and output, guiding the interpreters’ efforts to retain the textual sequences of the input text. Conversely, CI interpreters store and reformulate the messages effectively via structure-related information to lessen the memory and processing load. Thus, they tend to produce more frequently used sequences, in which function words are more densely distributed. Alternatively, CI interpreters adopt frequently used content words if fewer function words are accessible for scaffolding.
Although the sequential linguistic unit of the motif was introduced into linguistics only recently, its application in linguistic research is promising.
Previous studies using different types of motifs have demonstrated their usefulness in classifying texts, genres and language types. However, using F-motifs to investigate interpreting reveals how they reflect human cognitive constraints in producing language sequences. Essentially, types of attention-sharing and overloading of working memory are generally postulated to be the cognitive underpinnings of interpreting (
In the present study, the usage of F-motif is extended to quantitatively investigate the local distribution of function words and the sequential order of high and low frequency words in a given text. The sequential and distributional information can, to some extent, reflect the word choice and the sequential production mechanism of language, particularly spoken language. More importantly, it is shown that F-motif can be used to mirror the different types of cognitive demand involved in different tasks. Firstly, F-motif of interpreting can be modeled with ZM distribution model and its correlation test results with the input texts reflect whether the linear sequences of word frequency of the output are distributed in alignment with or independent of those of the input. Secondly, the length of F-motifs reflects the density of function words in word sequences, and thus mirrors the different mechanisms of producing sequences during the two interpreting types to minimize the storage and producing effort. For example, to alleviate storage burden, CI interpreters can rely more on structure-related information and thus generate the output with more densely distributed function words, which leads to more short F-motifs. Thirdly, in certain positions of long F-motifs, different word choices are also evidenced in these two interpreting types. It is noticed that CI interpreters tend to use more frequently used words than in SI when structural information is not locally accessible. In other words, the position-dependent frequency of F-motifs offers detailed explanation for word choices in sequential language production.
As suggested in a recent commentary, the quantitative studies on interpreting tasks and their underlying cognitive mechanisms under different circumstances serve as an arena for the integration of approaches to the investigation into language foundations and human cognitions (
The current research investigates the distribution, length and the position-dependent properties of a language sequential unit, F-motif, in SI and CI outputs. It is found that the distribution and the lengths of F-motifs differ significantly across SI and CI output. The mean frequencies of the content words in some positions of the longer F-motifs vary between SI and CI, which confirms the requirement of minimum producing and memory effort in interpreting process. The different sequence-related features of SI and CI output are the results of varied cognitive constraints involved in the interpreting processes and the correspondent coping mechanism of interpreters.
The present study may offer a novel method to differentiate interpreting types and to quantify the differences in a reasonable way. Such a quantification can be viewed as an indicator of how far real-world SI and CI outputs differ. Moreover, the sequential delivery of expert interpreters sets an example for novice interpreters, who should be trained specifically for each interpreting type. The length and position-dependent frequencies can be related to specific structural properties of interpreting types, which may offer insights into the development of artificial intelligence for interpreting tasks. Other basic linguistic properties can be further investigated with this approach to better understand sequential processing in interpreting.
JL and QL conceived and designed the experiments. JL, QL, and YL performed the experiments, collected the data, and performed the data analyses. All authors contributed to the interpretation of results and the writing of the manuscript and approved the final version of the manuscript for submission.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Special thanks go to Prof. Haitao Liu for his insightful comments and suggestions and also to Dr. Matthew Reeve for his help in polishing the language.
The Supplementary Material for this article can be found online at: