“Textual Prosody” Can Change Impressions of Reading in People With Normal Hearing and Hearing Loss

Uetsuki, Miki; Watanabe, Junji; Maruya, Kazushi

doi:10.3389/fpsyg.2020.548619

ORIGINAL RESEARCH article

Front. Psychol., 17 December 2020

Sec. Psychology of Language

Volume 11 - 2020 | https://doi.org/10.3389/fpsyg.2020.548619

Miki Uetsuki¹*

Junji Watanabe²

Kazushi Maruya²

¹Department of Community Studies, Aoyama Gakuin University, Kanagawa, Japan
²Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Kanagawa, Japan

Recently, dynamic text presentation, such as scrolling text, has been widely used. Texts are often presented at constant timing and speed in conventional dynamic text presentation. However, dynamic text presentation enables visually presented texts to indicate timing information, such as prosody, and the texts might influence the impression of reading. In this paper, we examined this possibility by focusing on the temporal features of digital text in which texts are represented sequentially and with varying speed, duration, and timing. We call this “textual prosody.” We used three types of textual prosody: “Recorded,” “Shuffled,” and “Constant.” Recorded prosody is the reproduction of a reader’s reading with pauses and varying speed that simulates talking. Shuffled prosody randomly shuffles the time course of speed and pauses in the recorded type. Constant prosody has a constant presentation speed and provides no timing information. Experiment 1 examined the effect of textual prosody on people with normal hearing. Participants read dynamic text with textual prosody silently and rated their impressions of texts. The results showed that readers with normal hearing preferred recorded textual prosody and constant prosody at the optimum speed (6 letters/second). Recorded prosody was also preferred at a low presentation speed. Experiment 2 examined the characteristics of textual prosody using an articulatory suppression paradigm. The results showed that some textual prosody was stored in the articulatory loop despite it being presented visually. In Experiment 3, we examined the effect of textual prosody with readers with hearing loss. The results demonstrated that readers with hearing loss had positive impressions at relatively low presentation speeds when the recorded prosody was presented. The results of this study indicate that the temporal structure is processed regardless of whether the input is visual or auditory. 
Moreover, these results suggest that textual prosody can enrich reading not only in people with normal hearing but also in those with hearing loss, regardless of acoustic experiences.

Introduction

Dynamic text presentation is used in everyday life, such as in electronic advertisements and TV tickers. Text scrolling at a fixed speed and in a fixed direction is often used to show a larger amount of information in a limited space. In addition, text flashing is sometimes used to catch the audience’s attention. The presentation rate of dynamic text is an important factor for readers, as they cannot control the speed in most conventional dynamic text presentation. When the presentation speed is too high, the reading performance becomes poor (Juola et al., 1982; Potter, 1984; Miyake et al., 1994; Wang and Kan, 2004; Bélanger et al., 2012). On the other hand, very low speeds also result in poor reading performance (Legge et al., 1989), as the rhythm in reading is lost because readers cannot extract information beyond individual words (Gibson and Levin, 1975, p. 539). Uetsuki et al. (2017) demonstrated that there is an optimum speed of dynamic text with a rate similar to the oral reading rate of news readers. Chujyo et al. (1993) and Morita et al. (2007) also found that the larger the number of characters displayed, the faster the speed that readers find comfortable. This means that when the size of the presentation window is large, the information obtained from the peripheral vision is also utilized to read scrolling texts while chunking appropriately.

Though dynamic text is useful, conventional dynamic text cannot convey complex paralinguistic information, such as emotions, intonations, speaker’s speed, and duration that can be conveyed by spoken language. While reading silently, we often have a subjective experience of inner speech, which resembles overt speech (Filik and Barber, 2011). Hirose (2003), Ashby and Clifton (2005), Ashby (2006), and Hirotani et al. (2006) reported prosodic processing while reading texts silently. For example, Hirotani et al. (2006) demonstrated that readers pause at punctuation marks during silent reading, suggesting that intonation boundaries and punctuation are associated. Instead, explicit prosody, which is defined as an intrinsic feature of spoken language concerned with phonetic features including intonation, rhythm, pauses, and speed of a speech utterance (Cutler et al., 1997), is one of the most powerful ways to convey paralinguistic information in spoken language. It contains useful information for communicating emotional states and the intentions of speakers beyond linguistic representations. While we can use complex temporal structures (timing information) to add paralinguistic information to spoken language, the temporal structure used in dynamic text presentation is limited to relatively simple ways such as kinetic typography and animated texts. Even with those simple temporal structures, some researchers have noted the possibility that those structures can convey emotionality (e.g., Wong, 1996; Malik et al., 2009). Concerning visual temporal structures, Potter (1984, p. 91) pointed out the possibility that when letters appear sequentially and the speed and timing of their appearance are changed, the temporal information conveyed by the dynamic text presentation might play a similar role as the prosody in the spoken language. If dynamic text presentation contains complex and appropriate temporal structures, the added paralinguistic information may enhance our reading. 
In other words, reading might become smoother or impressions of reading might be enriched by the temporal structure of visually presented texts.

This paper addresses three questions. First, we focus on whether impressions of reading are affected, as they are with prosody, by adding visual timing information (i.e., varying the speed and timings of pauses) to the written language. We call this “textual prosody.” The temporal structures in conventional dynamic text presentation (e.g., scrolling with a constant speed) do not convey timing information. For our purpose, we adopted a special dynamic text presentation format to enable textual prosody. In this format, the letters are statically displayed, but the contrast of letters changes dynamically (Maruya et al., 2012, 2013; Uetsuki et al., 2017). For example, when the characters are displayed, their contrast increases from zero over 2 s, stays at the maximum contrast level for a second, and decreases to zero over 2 s. In other words, the letters appear gradually, remain at high contrast for a while, and disappear gradually. In addition, we recorded the reading speeds of one example reader at each text location and modulated the speed of text appearance based on the recorded reading speed. The complex temporal profile based on actual human behaviors may give a sense of animacy, a feeling that something living is present and behaving with a particular intention (Heider and Simmel, 1944; Michotte, 1963; Dasser et al., 1989; Tremoulet and Feldman, 2000; Gao et al., 2010), and may affect the reader’s impression of content (Maruya et al., 2013). For example, Maruya et al. (2013) demonstrated that readers have warmer and softer impressions when texts are presented at a low speed.
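The contrast profile described above (a 2 s rise, a 1 s plateau, and a 2 s fall) can be sketched as a simple function of time. This is only an illustrative sketch: the linear ramp shape, the function names, and the normalized contrast range of 0 to 1 are assumptions, not details of the software used in the study.

```python
def letter_contrast(t, onset, rise=2.0, hold=1.0, fall=2.0, max_contrast=1.0):
    """Contrast of one letter at time t (seconds), given the letter's onset time.

    Assumption: contrast ramps linearly; the source only states the durations.
    """
    dt = t - onset
    if dt < 0 or dt >= rise + hold + fall:
        return 0.0  # letter not yet shown, or already faded out
    if dt < rise:
        return max_contrast * dt / rise  # gradual appearance
    if dt < rise + hold:
        return max_contrast  # plateau at maximum contrast
    return max_contrast * (1.0 - (dt - rise - hold) / fall)  # gradual fade


def constant_onsets(n_letters, lps):
    """Onset times (seconds) when letters start at a constant rate of lps letters/second."""
    return [i / lps for i in range(n_letters)]
```

Under this sketch, textual prosody corresponds to replacing the evenly spaced onsets with onsets taken from a reader's recorded tracing, while each letter keeps the same contrast envelope.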

Our second research question was whether information conveyed by textual prosody is stored visually or auditorily. Although texts are presented visually, they may convey temporal information as prosody. Normally, auditory prosody information is initially processed through the listener’s ears and stored auditorily. On the other hand, textual prosody is initially processed through the reader’s eyes. It is not clear whether the information conveyed by textual prosody is stored visually or auditorily. To determine this, we examined the characteristics of textual prosody using an articulatory suppression paradigm. Articulatory suppression is a research tool that is often used to explore phonological processing in reading (Morita and Saito, 2007; Leinenger, 2014). In the working memory framework of Baddeley and Hitch (Baddeley and Hitch, 1974; Baddeley, 1986; Norris et al., 2018), the articulatory loop performs subvocal rehearsal and recodes written input into a phonological form that can be retained in the phonological store. Articulatory suppression selectively prevents the articulatory loop from operating (Baddeley, 1986). Auditory material has obligatory access to the phonological store, whereas only a part of visually presented information will enter the phonological store (Hanley and Bakopoulou, 2003). Filik and Barber (2011) demonstrated that articulatory phonology activated during sentence reading contains readers’ accents. If suppression interferes with a reading task, phonological coding is assumed to be necessary for, or at least a part of, the task under investigation (Leinenger, 2014).

Finally, we asked whether the effect of textual prosody depends on acoustic experiences in daily life. People with hearing loss exhibit problems in learning to read as a result of the difficulties they face in developing spoken language (Gallego et al., 2016). There seems to be a sensitive period in early postnatal life, during which the brain is highly efficient in establishing connections between the auditory input of speech and the development of linguistic skills (Kuhl et al., 2005; Markman et al., 2011; Gallego et al., 2016). The higher the sound deprivation in the initial years of life, the greater is the negative impact on the maturation of auditory pathways and reading comprehension (Connor and Zwolan, 2004; Sainz and de la Torre, 2005; Gallego et al., 2016). In regard to the language processing of readers with hearing loss, however, Hanson and Fowler (1987) demonstrated that when the participants were asked to judge whether the simultaneously presented two letter strings were English words or not (a lexical decision task), both the normal hearing and hearing loss readers could respond faster to rhyming word pairs (e.g., MARK-DARK, LOAD-TOAD, DONE-NONE, and SAVE-WAVE) than to non-rhyming word pairs (e.g., MARK-TOAD, LOAD-DARK, BONE-GONE, and HAVE-CAVE). This result provided evidence that readers with hearing loss could access phonological information (see Hanson and Fowler (1987) for discussion of the effect of visual similarity). Thus, textual prosody may enrich readings of people with hearing loss.

Sign language is often used as a means of communication in people with hearing loss. Both spoken and signed languages are acquired naturally. Sign language possesses all the linguistic complexity and levels of structure of spoken language (Newman et al., 2010). Spoken and sign languages share many properties, such as phonology (Sandler and Lillo-Martin, 2006; Villameriel et al., 2019). Additionally, sign language exploits sets of regular prosodic features (Morgan et al., 2007), and the prosody of the sign is based on both the timing of the sign’s complete articulation from beginning to end and the fixed ordering of different segments within the movement (Villameriel et al., 2019). Therefore, the effect of textual prosody could be observed if one utilizes their experience of sign language and lip reading to process textual prosody.

In this study, therefore, we examined whether textual prosody can affect impressions of reading (we focus on readability, favorability, and emotionality) of participants with hearing loss in addition to those of participants with normal hearing. If textual prosody can enrich people’s reading, especially in people with hearing loss, it should be a useful tool to convey speech speeds and timing information visually.

We conducted three experiments in which we manipulated the textual prosody of visually presented language and asked readers to judge their impressions of readability, favorability, or emotionality (hereafter, impressions of reading). The textual prosody was manipulated in three ways: recorded, shuffled, and constant prosody. Recorded prosody reproduces utterance speed and duration in correspondence with letters. Shuffled prosody is inappropriate in the sense that the timings of pauses do not match the boundaries of sentences or words. Constant prosody has constant presentation speed and has no prosody. In Experiment 1, we examined the effects of textual prosody, showing visual timing information in readers with normal hearing. If textual prosody affects the impression of reading, the scores of the impressions would be higher/lower than when there is no textual prosody. In Experiment 2, we examined how textual prosody is stored with an articulatory suppression paradigm. If representation of textual prosody is stored in the articulatory loop, scores of impressions of reading under articulatory suppression condition should be worse than under the no suppression condition. In Experiment 3, we examined whether textual prosody affects the impressions of readers with hearing loss. Although people with hearing loss have experience with prosody information in sign language and lip reading, they have less auditory experience. If textual prosody affects the impressions of readers with hearing loss, the scores of the impressions would be higher in recorded prosody than in constant or shuffled prosodies.

Experiment 1

In this experiment, we presented text to readers with normal hearing at various presentation speeds and with various textual prosodies (timing information of texts). It is assumed that textual prosody may enrich the impression of reading because it offers more paralinguistic information visually. However, prosodic processing occurs even when people read texts silently (Hirose, 2003; Ashby, 2006; Hirotani et al., 2006). If readers adopt their own output of prosodic processing, textual prosody may be ignored and not be utilized. Therefore, we examined whether textual prosody affects the impressions of reading.

Materials and Methods

Participants

This experiment was conducted with two groups to confirm the reproducibility of the results. The first group comprised 29 female college students who volunteered to participate in the experiment. The participants were either in their first or second year in college, and their mean age was 18.62 years (SD: 0.81). The second group comprised 23 female college students who volunteered to participate in the experiment. The participants were either in their second or third year in college, and the mean age of participants was 19.57 years (SD: 0.65). There were no duplicate participants between the first and second groups. The sample size of each group was determined based on prior studies (Maruya et al., 2012, 2013; Uetsuki et al., 2017). There were no participants with hearing loss. This experiment was performed in compliance with the Declaration of Helsinki. The protocol was approved by the Ethics Committee of Hakodate Junior College (approval number: H21-02) or by the Ethics Committee of Aoyama Gakuin University (approval number: Ao18–5). This study was carried out in accordance with the recommendations of Provisions of Experiments, Ethics Committee of Hakodate Junior College and the recommendations of Aoyama Gakuin University Ethics Committee for human research with written informed consent from all participants.

Dynamic Text Presentation Format

We adopted a special dynamic text presentation format to present the textual prosody described in the Introduction. To measure, record, and present reading positions, we utilized a computer program “Yu bi Yomu” on tablet devices (Maruya et al., 2012, 2013; Uetsuki et al., 2017). In this software, onscreen text is barely visible at the initial display and, when a user touches the panel, the contrast of the letter at the finger position increases and then decreases (right illustration in Figure 1). If the user traces the sentence from the beginning, the sentences will appear and disappear sequentially. The user traces characters according to his/her reading. For example, if the user reads a word slowly, they trace the word slowly. This mode is called “Tracing mode.” In addition to this function, the software can present letters so that the timing of contrast change for each letter shifts at constant temporal intervals. The letters appear and disappear as if the contrast change moves with a constant speed (left illustration in Figure 1). This mode is called “Automatic mode.” We made stimulus videos using this software; in our experiment, we presented texts in automatic mode and videos recorded in tracing mode.

FIGURE 1

Figure 1. Dynamic text presentation format. In automatic mode, letters appear on the white background with a constant temporal interval (left). In tracing mode, users can trace letters that are barely visible at initial display (right) and replay it on the white background (left).

Textual Prosody (Visual Timing Information)

We also created three types of textual prosodies (hereafter, “prosody type”) for each presentation speed using the software. One of these types of textual prosodies is “recorded.” This type uses the tracing mode. One of the authors traced the sentences using pauses and changing the tracing speed as if talking. The tracing movement is assumed to exhibit similar speed and timing as that of their actual or inner speech. The software “Yu bi Yomu” can record a video of how the letters appear and disappear and replay it without the barely visible letters of the initial presentation. Here, we expected that if the user traces sentences as if talking and replays it, the movement of contrast change might play a role similar to prosody in speech, such as speed and timing. The author read texts with the intention that the meaning of the text would be conveyed correctly. For example, the author paused according to the syllables or large units of text or at the boundaries of some clauses. He/she also read the words that might convey emotions (e.g., “thank you” and “congratulations”) or might be important more slowly than other words. Only a single reader’s tracing was used as recorded prosody because the more averaged the tracing of multiple readers, the more often the stimulus was displayed at a constant speed. Recorded prosody is appropriate in the sense that the timings of pauses match the boundaries of a sentence or word. The second type of prosody is “shuffled.” This type randomly shuffles the time course of speed and pauses in the recorded type. Accordingly, pauses occur while presenting a letter, or in the middle of a syllable, a word, or a large unit of text. Shuffled prosody is irrelevant in the sense that the timings of pauses do not match the boundaries of sentences or words. The acceleration of speed is non-zero in recorded and shuffled prosody. The third type of prosody is “constant,” in which the presentation speed is constant and the acceleration of speed is zero. This was achieved in automatic mode.
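The relationship between the three prosody types can be sketched in a few lines. The sketch below assumes that a recorded tracing is represented as a list of per-letter intervals (seconds between successive letter onsets) and that shuffling permutes those intervals; the source does not specify the representation or the exact shuffling procedure, so both are hypothetical, as are the function names and the example values.

```python
import random


def shuffled_prosody(intervals, seed=None):
    """Randomly permute the recorded per-letter intervals.

    Assumption: shuffling operates on per-letter intervals. Total duration
    (and hence mean speed) is preserved, but pauses no longer fall on
    word or clause boundaries.
    """
    rng = random.Random(seed)
    out = list(intervals)
    rng.shuffle(out)
    return out


def constant_prosody(intervals):
    """Replace every interval with the mean, giving zero acceleration."""
    mean_interval = sum(intervals) / len(intervals)
    return [mean_interval] * len(intervals)


def onsets(intervals):
    """Convert per-letter intervals into cumulative onset times."""
    t, out = 0.0, []
    for d in intervals:
        out.append(t)
        t += d
    return out


# Hypothetical tracing: a long interval (pause) after the third letter.
recorded = [0.1, 0.1, 0.6, 0.1, 0.3]
shuffled = shuffled_prosody(recorded, seed=1)
constant = constant_prosody(recorded)
```

In this representation all three types share the same total presentation time, so any difference in impressions would be attributable to where the pauses and speed changes fall rather than to overall speed.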

Presentation Speed

We varied the presentation speed, i.e., 3, 6, or 12 letters/second (hereafter, LPS). This was because previous studies (Price et al., 1996; Uetsuki et al., 2017) showed that the impressions of reading were the most enhanced at 6 LPS. Thus, we used 6 LPS, 3 LPS (1/2 times the speed), and 12 LPS (2 times the speed). To create three presentation speed conditions for “recorded prosody,” the author adjusted the tracing so that the presentation speed, obtained by dividing the number of letters by the presentation time, was 3, 6, or 12 LPS, respectively. The author tried to read in the same way under the three conditions, except for speed (that is, he/she tried to pause or change the speed at the same positions under the 3, 6, and 12 LPS conditions). Thus, strictly speaking, recorded prosody is different for the three different presentation speed conditions. We cannot deny the existence of the effects of prosodic variability. However, it is assumed that the effects of prosodic variability are much smaller than the effects of presentation speed.
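The arithmetic behind the speed conditions can be made explicit, reading LPS as the number of letters divided by the total presentation time. The helper names and the 30-letter example below are hypothetical and are used only to illustrate the calculation.

```python
def letters_per_second(n_letters, duration_s):
    """Mean presentation speed in letters per second (LPS)."""
    return n_letters / duration_s


def required_duration(n_letters, target_lps):
    """Total presentation time (seconds) a tracing must take to hit a target LPS."""
    return n_letters / target_lps


# Hypothetical 30-letter text: target tracing durations for each condition.
targets = {lps: required_duration(30, lps) for lps in (3, 6, 12)}
```

For a recorded tracing, only the mean speed is fixed by this constraint; the local speed and pauses remain free to vary, which is why the recorded prosody can differ slightly between speed conditions.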

Text Stimuli

We used four types of Japanese plain text. One was called “Thank you”; it conveys gratitude (“Thank you very much all the time.”). The second one was called “Telegram”; it is a typical telegram that celebrates success on an examination (“You kept at a long and arduous task, and you achieved success. Now you are in a good spring where many flowers bloom. Congratulations for passing the exam.”). The third, “Weather forecast,” presents sentences typical of weather reports on TV news (“Same as yesterday, the area around Japan is in a winter-style air pressure arrangement. It is cloudy in the central city of Tokyo today, and it will rain in some places.”). The fourth, “Earthquake warning,” warns of an imminent earthquake and is very familiar to Japanese people (“This is an emergency earthquake flash report. Please beware of a strong shake.”). Japanese text stimuli are in Table A1.

Procedures

The software was run on a tablet computer (Apple iPad) and connected to a projector (EPSON Inc., EB-535W). The texts were presented on the projector. The participants were divided into two groups and observed the text stimuli. They sat in three rows, at distances of 2–5 m from the screen. The visual angle of a letter was about 1–2.5°. All participants reported that the stimulus texts were adequately visible.

This experiment was conducted for each text. The nine stimuli (three speeds × three prosody types) per text were presented. It was confirmed that the impression of static text did not change before and after reading the dynamic text repeatedly (Maruya et al., 2013). Thus, the influence of repetition should be small. We presented the stimuli in a randomized order within text stimuli, and the order of trials differed for each participant group. The letters were not visible at first. When a trial started, one of the nine text stimuli was presented, and participants silently read it. After reading, they rated their impressions of the text on a scale from −50 to 50 points (semantic differential method, 100 scales; Osgood et al., 1957; Snider and Osgood, 1969). We measured the impression of reading as readability (readable–unreadable), favorability (like–dislike), and emotionality (emotional–businesslike). Participants rated each impression per trial. The participants evaluated the strength of their impressions of dynamic texts with numerical values. Each condition was repeated twice, and the means of the two trials were used for analysis. This experiment included 72 trials in total (three speeds × three prosody types × two repetitions × four texts) and took about 1 h.
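The trial structure described above can be sketched as follows. The sketch assumes that the texts are run as consecutive blocks in a fixed order and that both repetitions are shuffled together within each block; the source states only that order was randomized within text stimuli, so these details and all names are hypothetical.

```python
import itertools
import random

SPEEDS_LPS = (3, 6, 12)
PROSODY_TYPES = ("recorded", "shuffled", "constant")
TEXTS = ("Thank you", "Telegram", "Weather forecast", "Earthquake warning")
REPETITIONS = 2


def build_trials(seed=None):
    """Build the full trial list, randomized within each text block."""
    rng = random.Random(seed)
    trials = []
    for text in TEXTS:
        # 3 speeds x 3 prosody types x 2 repetitions = 18 trials per text
        block = [
            {"text": text, "speed_lps": s, "prosody": p, "repetition": r}
            for s, p, r in itertools.product(
                SPEEDS_LPS, PROSODY_TYPES, range(1, REPETITIONS + 1)
            )
        ]
        rng.shuffle(block)  # randomize order within this text only
        trials.extend(block)
    return trials


trials = build_trials(seed=0)
```

Passing a different seed per participant group would reproduce the property that trial order differed between groups.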

Results

This experiment was conducted with two groups. The data of the two groups were merged because both the groups displayed similar tendencies. Each condition was repeated twice, and the means were calculated as rating values. We also merged the values of the four texts because the tendencies of data in the four texts were similar. A two-way within-subject analysis of variance (ANOVA), with prosody type and presentation speed as factors and the rating value as the dependent variable, was conducted for each judgment. The degrees of freedom were corrected using the Greenhouse-Geisser correction when Mauchly’s sphericity test was significant. When the interaction was significant, we tested simple main effects using Bonferroni corrections.
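The data reduction that precedes the ANOVA, together with the Bonferroni adjustment, can be sketched in plain Python. The sketch assumes that texts were merged by averaging the per-text means within each participant; the source does not say whether values were averaged or pooled, so this choice, the record layout, and the function names are hypothetical. The ANOVA itself and the sphericity correction are not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def condition_means(ratings):
    """Reduce raw ratings to one value per participant x speed x prosody cell.

    ratings: iterable of (participant, text, speed_lps, prosody, repetition, value).
    """
    # Step 1: average the two repetitions within each participant x text x condition.
    per_text = defaultdict(list)
    for participant, text, speed, prosody, _rep, value in ratings:
        per_text[(participant, text, speed, prosody)].append(value)

    # Step 2 (assumption): merge the texts by averaging the per-text means.
    merged = defaultdict(list)
    for (participant, _text, speed, prosody), values in per_text.items():
        merged[(participant, speed, prosody)].append(mean(values))

    return {cell: mean(values) for cell, values in merged.items()}


def bonferroni(p_values):
    """Bonferroni-adjusted p values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

The resulting cell means form the participant × (3 speeds × 3 prosody types) table that a two-way within-subject ANOVA would take as input for each judgment.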

The overall results are shown in Figure 2. The main effect of presentation speed and the main effect of prosody type were observed in judgments of favorability. The interactions of speed and prosody type were significant for judgments of readability and emotionality. Table 1 shows the summaries of ANOVA results. Most of the Bonferroni-corrected simple main effects revealed that the impression scores at 6 LPS were higher than those at 3 or 12 LPS, irrespective of prosody types. Participants felt the texts were more readable, favorable, and emotional at 6 LPS. In terms of textual prosody, recorded prosody was consistently rated more positively than shuffled prosody at 3 and 6 LPS, though not necessarily significantly so. This suggests that recorded prosody is more readable, favorable, and emotional than shuffled prosody at 6 LPS and below. The difference between recorded and constant prosody was not significant except at 3 LPS for emotionality.

FIGURE 2

Figure 2. Rating values of Experiment 1 (normal hearing; all texts). Error bars show 95% confidence intervals. Data of four texts were merged. Blue stars show the effect of prosody, and red stars show the effect of presentation speed. *p < 0.05.

TABLE 1

Table 1. Results of ANOVA of impression of reading in Experiment 1 (normal hearing; all texts).

Discussion

In general, the impression of reading at 6 LPS was the highest among readers with normal hearing. This finding is consistent with the optimum reading speed reported by Uetsuki et al. (2017). It is suggested that impressions tend to be the most positive when reading speed is comfortable. As for textual prosody, impressions in recorded prosody tended to be consistently positive compared to those in shuffled prosody at 3 and 6 LPS, although the differences were not necessarily statistically significant. The effect of constant prosody was not different from that of recorded prosody under most conditions. It may be suggested that prosodic processing may occur (Hirose, 2003; Ashby, 2006; Hirotani et al., 2006) when constant prosody is presented, and that the impressions of texts are affected by the reader’s own prosody.

Except for emotionality, the effect of textual prosody was not so clear at 12 LPS, that is, under fast text presentation conditions. The pauses of textual prosody were relatively shorter and the overall speed was faster (the absolute speed difference between relatively slow and fast reading was small) at 12 LPS. The differences between the three textual prosodies may become smaller at 12 LPS because the absolute amount of duration and speed change was the smallest in faster conditions. It is assumed that this is the reason why the effect of textual prosody was not clear at 12 LPS. On the other hand, the rating values in recorded prosody tended to be higher than those in shuffled prosody at 3 and 6 LPS. It is assumed that the effect of recorded prosody was relatively stronger because the absolute duration and speed change were larger at lower speeds. The differences between the textual prosody types were smaller than those observed between the three conditions in presentation speeds.

The rating values of 3, 6, and 12 LPS for each textual prosody in Figure 2 were in the form of an inverted U shape. The kurtoses of the inverted U shape of shuffled and constant prosodies were smaller than those of recorded prosody. However, the kurtosis of the inverted U shape was larger in recorded prosody, and the rating values were high at certain presentation speeds. For emotionality, the peak of the inverted U shape of recorded prosody might be shifted to a slower presentation rate.

In sum, for readers with normal hearing, some enhancement of impression of reading by textual prosody was observed. For recorded prosody, the degradation of impressions at the slow presentation speed could be somehow alleviated. When the presentation speed was appropriate at 6 LPS, presentation with recorded and constant prosody was preferred. When the temporal structure of recorded presentation was shuffled, participants’ evaluations generally were low, which means that the mere presence of acceleration is not sufficient to cause the enhancement by textual prosody and that an appropriate temporal structure is required.

Experiment 2

Experiment 1 shows that readers with normal hearing prefer text to be presented at the rate of 6 LPS and that textual prosody can affect their impressions of reading. The characteristics of textual prosody, however, are not clear. For example, it is not apparent whether textual prosody is converted to visual or to auditory representations. In this experiment, we examined whether textual prosody is stored as visual or auditory information using an articulatory suppression paradigm. Articulatory suppression selectively prevents the articulatory loop from operating (Baddeley, 1986). If suppression impairs the impression of reading in at least one condition, it is assumed that reading texts with textual prosody requires phonological coding.

Although we used three types of prosodies, only recorded prosody is useful and should be utilized because it pauses according to the syllables or large units of text or at the boundaries of some clauses. Constant and shuffled prosody do not need to be stored because constant prosody has no specific prosody and shuffled prosody is irrelevant. Thus, it is assumed that only recorded prosody should be affected by articulatory suppression and that the impressions may be impaired if textual prosody is stored in the articulatory loop. In contrast, if recorded prosody is stored visually and not stored in the articulatory loop, storing the prosody should be easy and the impressions may not be impaired even under articulatory suppression. We examined this prediction using articulatory suppression, and the results revealed the characteristics of textual prosody. Though memory tests are normally used with articulatory suppression, they could not be used because we had presented the same texts repeatedly. Instead, we measured readers’ impressions of texts under articulatory suppression.