The Language of Gángan, A Yorùbá Talking Drum

It is widely known that Yorùbá drummers communicate through their native drums. This paper investigates the grammar of gángan, which belongs to a family of Yorùbá drums called dùndún. The results of this study show that Yorùbá drummers represent the phonetic realisation of the lexical and grammatical tones of their language with the drum. Statistically, the speech tones and the acoustic correlates of the corresponding drum representations have a significant positive relationship. In both spoken and drum communication, vowel (V) and consonant-vowel (CV) prosodic units have different statuses. To conclude, Yorùbá drummers communicate via the gángan drum by transposing certain phonemic features, and possibly phonological conditions, of their language to musical forms.


INTRODUCTION
Cultures around the world communicate through musical instruments by transposing linguistic features to musical melodies (e.g., Bradley, 1979). To distinguish speech surrogate systems that represent the phonemic aspects of a language from those that represent meaning without reference to phonemic inventories, Stern (1957) refers to the former as "abridged" and the latter as "lexical ideogram". This work focuses solely on an abridged system.
Ranging from a bell to an electric guitar, many musical instruments possess the capacity to be a speech surrogate (Lo-Bamijoko and Joy, 1987; Agawu, 2016). For example, Yorùbá musicians communicate with both native and "imported musical instruments" (Waterman, 2000, p. 199), but the native drums are better known and more studied owing to their ability to encode the tones of Yorùbá (Beier, 1954; Euba, 1967; Euba, 1990; Villepastour, 2010). As a result of this capability, the dùndún drums of the Yorùbá people are popularly known as "talking drums", the communicative capability of which is the focus of this work.
Although the drum communication is based on encoding the phonemic features of Yorùbá with the music instrument, the speech surrogate system is mostly studied from musical and anthropological perspectives. As McPherson (2019) notes, studying musical speech surrogates from a linguistic perspective "can offer valuable insights into what phonological aspects are being represented with the drum", and the data from the system can serve as language-external "evidence for phonological representations and theory". The present study investigates how Yorùbá musicians communicate with gángan, which belongs to a family of Yorùbá drums called dùndún. Unlike previous studies of dùndún, this study utilises linguistic instrumentation and methodology.
Given that the surrogate system is based on Yorùbá, it is essential to understand the basic sound inventory of the language. Before turning to the main focus of this work, the basic sound inventory is presented in Section 2. The drum used in this study is described in Section 3. The data in this work come from a linguistic experiment, so the data source and the method of data collection are discussed in Section 4. The articulatory results of the experiment are presented in Section 5. The acoustic results of drumming lexical tones are presented in Section 6, which also examines the strength of the relationship between Yorùbá words and their corresponding drum renditions. In addition to the lexical tones, the participants also drummed words with grammatical tones. The results of drumming the grammatical tones are presented in Section 7. The final section, Section 8, presents the summary, discussion and conclusion.¹

BACKGROUND ON YORÙBÁ SOUND INVENTORY
Yorùbá is a Volta-Niger language spoken in West Africa and most prominently South Western Nigeria (Blench, 2019). To understand what is encoded with a talking drum, it is imperative to understand certain aspects of the sound patterns of the source language used for the drum communication. For this reason, this section presents a description of the relevant sound patterns in Standard Yorùbá, which is the focus of this work.

Syllable in Speech
There is a consensus that a syllable in Standard Yorùbá consists of a vowel with an onset consonant (CV) (Awóbùlúyì, 1978; Orie, 2000; Orie and Pulleyblank, 2002; Pulleyblank, 2009). The vowel is the peak of the syllable, and the consonant is its onset (1a).
Following the proposal in Orie (2000), an onsetless vowel (V) such as the examples in (1b) does not constitute a syllable in Standard Yorùbá. Orie (2000) presents a wide range of evidence in support of this account. For example, word-initially, any tone can be associated with CV syllables, but onsetless vowels cannot bear an H tone in the same context (2). While there are V prefixes with an M or L tone, the only H-tone prefix has an onset (3). See Orie (2000) for more evidence.
The syllable template is represented using a moraic structure in (4). Under standard moraic theory, a vowel projects a mora, which is a unit of length (Hyman, 1985; McCarthy, 1986; Hayes, 1989). Following the account in Orie (2000), only a vowel with an onset consonant (CV) can project a syllable in Standard Yorùbá. In this account, a V unit constitutes a mora, not a syllable.
In addition to the forms in (1), there are VV units in the language. VV units like (5) constitute a sequence of two vowels, not a long vowel or a diphthong (Orie and Pulleyblank, 2002).
Under the moraic structure, VV units are bimoraic. There is very little (if any) evidence that VV sequences form single syllables (Orie, 2000; Orie and Pulleyblank, 2002). For example, VV sequences like (5a) are derived through intervocalic-consonant deletion and vocalic assimilation in the language (Akinlabi, 1993). The present work is based on Standard Yorùbá, and I adopt the account in Orie (2000) that CV is the only syllable in this variety. I will only refer to the moraic status of VV units in this work. It follows that CV and V are both monomoraic, but only CV is syllabic.

Tone in Speech
Yorùbá is a tone language, which means that pitch contrasts bring about lexical or grammatical distinctions in meaning (Yip, 2002; Hyman, 2018). As shown in (6), the language has three contrastive tones, namely H(igh), M(id) and L(ow) (Akinlabi, 1985; Pulleyblank, 1986).

¹ This is an extension of the preliminary version, which is published in LLA (Akinbo, 2019). While the version in LLA only focuses on lexical tones and syllables, the version here focuses on syllables, lexical tones and grammatical tones, and it compares spoken forms with their corresponding drum renditions. The present paper also improves on the description of the drum. This research is part of the project funded by SSHRC Insight grant (#435-2016-0369).
The tone-bearing unit in the language is a mora (Akinlabi and Liberman, 2000; Pulleyblank, 2004). Initial vowels in Yorùbá do not bear an H tone (Akinlabi, 1985; Pulleyblank, 2004; Pulleyblank, 2009). With sequences of H-L and L-H tones in the language, a contour tone is formed on the second tone.³ However, this does not happen in H-M, M-H, L-M or M-L sequences (Ward, 1952; Akinlabi and Liberman, 1995). For example, words like /kpákò/ H-L "chewing stick" and /ìlú/ L-H "city" are realised as (kpákô) H-HL and (ìlú) L-LH respectively. The other relevant tonal process involves raising the pitch value of an H tone in a sequence of H-L tones and lowering the pitch of an L tone in a sequence of L-H tones (Akinlabi and Liberman, 1995; Laniran and Clements, 2003). For example, the pitch of H in /bájɔ̀/ "Báyọ̀" (a name) is higher than that of H in /bájɔ/ "exit through", and the pitch value of L in /ìlú/ "city" is lower than that of L in /ìlu/ "puncher".
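The contour-formation pattern described above can be sketched as a small rewrite rule. This is only an illustration under stated assumptions: the function name `surface_tones` and the list-of-strings encoding of tones are hypothetical choices made here, not part of the cited analyses, and only the two-tone sequences mentioned in the text are modelled.

```python
def surface_tones(underlying):
    """Map underlying level tones to their surface realisation.

    Assumed encoding (hypothetical): a list of "H", "M" or "L", one per mora.
    Only the pattern described in the text is modelled: the second tone of an
    H-L or L-H sequence surfaces as a contour (HL or LH); H-M, M-H, L-M and
    M-L sequences are left unchanged.
    """
    surface = list(underlying[:1])
    for previous, current in zip(underlying, underlying[1:]):
        if {previous, current} == {"H", "L"}:
            # The preceding tone carries over onto the second mora.
            surface.append(previous + current)
        else:
            surface.append(current)
    return surface
```

Under these assumptions, `surface_tones(["H", "L"])` yields the H-HL pattern and `surface_tones(["L", "H"])` the L-LH pattern, while a sequence such as H-M passes through unchanged.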
Yorùbá also has a "subject marking H tone" (Akinlabi and Liberman, 2000). To realise the H tone, an L-L noun phrase (henceforth NP) surfaces as L-LH (7a), but an H-H NP remains unchanged (7b). While an M-M NP surfaces as M-H (7c), an H-M NP surfaces with an extra H-tone mora (7d).
In genitive constructions, an extra vowel with an M tone is realised on the possessum (Akinlabi and Liberman, 2001). Consider the examples in (8).
The basic description of the Yorùbá syllable and tone patterns has been detailed in this section. Throughout this paper, we will refer to the discussion in this section.
Most of the studies on Yorùbá drum communication are based on natural or semi-natural musical performance with limited or no acoustic evidence (e.g., Beier, 1954; Euba, 1967; Euba, 1990; Villepastour, 2010). Given that the communicative capability of the drums is based on encoding phonemic features of a drummer's language, it would be insightful to analyse the system in the same way as we analyse human language. By using linguistic instruments and methodology, the present work investigates how a Yorùbá drum communicates. The discussion in the present section will help us understand the linguistic features that are encoded with the talking drum. I now turn to the properties of the talking drum.

THE COMPONENTS OF DÙNDÚN
Dùndún is the generic name for a family of Yorùbá drums, and the word dùndún literally means "sweet sound" (Ruskin, 2013, p. 1). The dùndún ensemble comprises ìyáàlù, ìsáájú, kẹríkẹrì, gúdúgúdú and ìkẹ́yìn (Euba, 1990). Ìyáàlù (lit. "mother drum") is the lead drum of the ensemble (Durojaye, 2020). The drums in the ensemble can be distinguished by their structures, relative sizes, performance techniques and musical functions in the ensemble.⁴ With the exception of gúdúgúdú, all the drums in the dùndún ensemble are hourglass-shaped pressure drums.⁵ Counting the bells and cloth around the lead drum, Euba (1990) identifies nine components for the hourglass-shaped drums, but all the hourglass-shaped drums have seven components in common: a wooden resonator, two surface membranes, tension cords that change the membrane pitches, a taut rope for tuning the drum, a strap for carrying the drum, leather rings (ẹ̀gì) that hold the membranes in place and a curved stick for playing the drum. The image in Figure 1 is adapted from Arewa and Adekola (1980), but the image and labels have been modified. This study focuses solely on gángan, which belongs to the family of Yorùbá drums called dùndún.⁶
The hourglass shape of the drums comes from carving the wooden resonator into an hourglass shape with a tunnel-like hole linking the two ends. Traditionally, the ideal wood for making the drum resonator is "igi ọ̀mọ̀", which is a Cordia alliodora (Linn) tree (Lawal et al., 2010; Ọmọ́bọ́lá, 2019). Each opening of the drum resonator is covered with a drum membrane. The ideal material for the drum membranes is soft animal skin, so the source of the membranes varies from one drum maker to another. Some drum makers prefer goatskin while others prefer the skin of a cow's foetus (BattaBox, 2017; dijiaderoGBA, 2018). My main consultant prefers the skin of a goat's foetus.
The drum membranes are stretched over the holed ends of the wooden resonator. The edges of the two drum membranes are connected and tightly held together with strings, which are made from a leather material. These strings form the tension cords or the wall of the drums, and the tensioning of these strings varies the tightness of the drum membrane. In order to communicate with the drum, the membranes of the drum need to be loose or slack on the resonator. However, when the drum has not been used for a while, the drum membrane shrinks and becomes tightly held to the resonator. To make the drum suitable for communication, the drum is tuned by using the taut rope to compress the tension cords for at least 4 hours. The taut rope can be made from leather or wool.
The drum is played with a curved stick. To avoid puncturing the drum, the head of the stick is covered with a flat rubber or leather material. To carry the drum, a strap is made from leather and padded cloth. Depending on the handedness of the drummer, the drum is suspended over the left or right shoulder with the strap and hangs under the opposite armpit.
Previous research suggests that Yorùbá drummers communicate by means of the drum by varying the compression of the tension cords (e.g., Beier, 1954; Euba, 1990; Villepastour, 2010). Unlike the previous studies, the present paper describes an articulatory and acoustic study of how Yorùbá drummers communicate with the talking drum by drawing insight from linguistic analysis of Yorùbá tones and syllables. The study is guided by the following specific questions: 1) How are tonal processes such as low-tone lowering and high-tone raising represented by the talking drum? 2) Are both lexical and grammatical tones represented with the talking drum? 3) How strong is the relationship between the spoken form of Yorùbá words and the corresponding drum rendition? To answer these questions, a linguistic experiment was conducted. The details of the experiment are presented in the following section.

METHODOLOGY
The data in this work were elicited at Diamond FM radio station, University of Ibadan, Nigeria, from five male drummers who are native speakers of Yorùbá. Four of the drummers have between 7 and 18 years of drumming experience, and one has 2 years of drumming experience. The data were recorded in a soundproofed room with a SHURE WH30XLR cardioid condenser (a headset microphone) and a Rode NTG2 supercardioid condenser (a shotgun microphone) at a sampling rate of 48.1 kHz in WAV format. The microphones were attached to a Zoom Q8 camcorder. The headset microphone captured the speech of the drummer, and the shotgun microphone was pointed at the drum. The audio from the two microphones was saved as separate files at the same time as the video files. All the participants in the study used the same drum except for one who insisted on using his own drum.
The stimuli in this work consist of monosyllabic and bisyllabic words, and CVV units. The monosyllabic stimuli are three words, each bearing an H, L or M tone. The bisyllabic (CVCV) and VCV units with level tones cover the nine possible tonal combinations (9). The CVV units cover three tone types: L-H rẹ̀ẹ́ "be tired" (lit. "tired it"), H-M ríi "see it" and M-H jẹẹ́ "eat it".
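The nine tonal combinations for the two-tone stimuli follow from pairing each of the three level tones with each of the others. The enumeration below is a minimal sketch of that arithmetic; the variable names are hypothetical and the listing says nothing about which Yorùbá words were actually used for each pattern.

```python
from itertools import product

# The three contrastive level tones of Yorùbá.
TONES = ("H", "M", "L")

# Every ordered pair of level tones for a two-tone (CVCV or VCV) stimulus.
two_tone_patterns = ["-".join(pair) for pair in product(TONES, repeat=2)]

# 3 tones x 3 tones = 9 patterns.
assert len(two_tone_patterns) == 9
```

Six of these nine patterns pair two different tones, and two of those (H-L and L-H) are the sequences that trigger contour formation on the second tone.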
In musical traditions with speech surrogates, Euba (1990, p. 193) identifies three forms of drumming, namely "direct speech form", "musical speech form" and "song form". In this work, the drummers only drummed the stimuli in speech mode, which "involves the direct reproduction of the pitches and rhythms of spoken language" (Agawu, 2016, p. 128). Simply put, the drummers spoke the stimuli and then drummed them. This was repeated three times for each stimulus. It bears mentioning that the experiment in this work is specifically designed to investigate the drumming of tones and syllables, so issues relating to segmental identity are only discussed in relation to tones.

RESULT 1: ARTICULATION OF YORÙBÁ WORDS WITH TALKING DRUM
The articulation of Yorùbá words with gángan is discussed in this section. Note that, in drumming, only one of the surfaces of a dùndún drum is used. Regardless of the speech tone represented by the drum, the drummers struck the drum membrane once to produce a CV word. For a CVCV or VCV word, the drum membrane was struck twice. When the bimoraic CVV units were produced in natural speech, the drummers struck the drum membrane once. In careful or deliberate speech, the drum membrane was struck twice. These drumming options for the CVV units are presented in Figure 2.
As discussed in Section 3, varying the compression of the tension cords affects the tightness of the drum membrane. To understand the representation of tones with the talking drum, the drumming of the minimal set [rá] "disappear", [ra] "rub" and [rà] "buy" was observed in isolation. To drum the H-tone word [rá] "disappear", the tension cords were tightly compressed, and then the drum membrane was struck. For the M-tone word [ra] "rub", the tension cords were loosely compressed before the drummer struck the drum membrane. The L-tone word [rà] "buy" was drummed by striking the drum membrane without compressing the tension cords. Considering that the pitch value of the initial L drum tone in L-L and L-M sequences is higher than that of the initial L in the L-H sequence (see Section 6.2), the drummers may have compressed the tension cords for the initial L tone in L-M and L-L sequences but not for the initial L in the L-H sequence. In this case, the drummers rendered the L-tone lowering in the L-H sequence by producing the lowest tone possible. However, the compressions in L-M and L-L sequences were not visually perceptible in the video of the drumming experiment.
To drum the H-M sequence of a CVV unit in normal speech, the tension cords of the drum were tightly compressed, and then the drum membrane was struck for the initial H. While the membrane was vibrating from the strike, the tension cords were slightly loosened for the M tone. For a word with an M-H sequence on a CVV unit in normal speech, the tension cords were slightly compressed, and then the drum membrane was struck. While the drum resonated from the initial strike, the drummer tightly compressed the tension cords. The drummers articulated words with an L-H sequence on a CVV unit by striking the drum membrane without compressing the tension cords. While the drum vibrated from the strike, they tightly compressed the tension cords. The summary of the drumming is presented in Table 1.
As shown in the table, the drummers produced an H tone with a tight compression of the tension cords, an M tone with a light compression and an L tone with no or minimal compression. In this case, the tones in the Yorùbá words are distinctly encoded by varying the compression of the tension cords. The summary also shows that the drummers obligatorily struck the drum membrane for CV and word-initial V. However, the drum membrane was optionally struck for word-medial V. The asymmetry between the drumming of CV and V can be accounted for if we consider the status of CV and V in Standard Yorùbá speech. Based on the proposal in Orie (2000), V and CV are both monomoraic, but only CV is syllabic in Standard Yorùbá. The proposal is based on the asymmetry in the distribution of V and CV (Section 2.1). In line with the account of the Standard Yorùbá syllable in Orie (2000), this shows that V and CV have different statuses in both drumming and speech.
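The articulatory pattern summarised above can be restated as a small lookup plus one rule about striking. The sketch below is an illustration only: the names `COMPRESSION` and `drum_gestures`, the (shape, tone) encoding and the `strike_medial_v` flag are hypothetical, and the flag simply stands in for the observed optionality between natural and careful speech.

```python
# Compression of the tension cords for each drum tone, as described in the text.
COMPRESSION = {"H": "tight", "M": "light", "L": "none or minimal"}


def drum_gestures(units, strike_medial_v=False):
    """Return a (strike, compression) pair for each prosodic unit.

    units: list of (shape, tone) pairs, e.g. [("CV", "H"), ("V", "M")],
           where shape is "CV" or "V" (hypothetical encoding).
    strike_medial_v: assumed switch for the optional strike on a
           word-medial V (False ~ natural speech, True ~ careful speech).
    """
    gestures = []
    for index, (shape, tone) in enumerate(units):
        # CV units and word-initial V are always struck.
        obligatory = shape == "CV" or index == 0
        strike = obligatory or strike_medial_v
        gestures.append((strike, COMPRESSION[tone]))
    return gestures
```

For an H-M CVV unit such as ríi, this gives one strike followed by a loosening of the cords in the natural-speech setting, and two strikes when `strike_medial_v=True`.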

RESULT 2: ACOUSTIC RESULTS OF DRUMMING WORDS WITH LEXICAL TONES
This section presents the acoustic results of drumming the words with only lexical tones. In order to determine the strength of the relationship between the speech tones and the corresponding drum representations, this section also compares the acoustic results of speech tones to those of their musical correspondents. Throughout this work, the drum representations of speech tones are called drum tones.

Acoustic Correlates of Tones in Talking Drum
The acoustic correlates of the drum strokes and tensioning are presented in this section. This is based on the words that were produced in Section 5. The acoustic results in this section are based on the data from three drummers who used the same drum.
The results of this study show that the acoustic cues for drum tones are the pitch contours. As a result of this, the discussion mainly focuses on pitch tracks. The musical representations of the contrastive speech tones in Yorùbá, namely H, L and M, have distinctive pitch tracks. As shown in Figure 3, the pitch of H tone is higher than those of M and L tones, and the pitch of M tone is higher than that of L tone.
The drum tones were manually annotated in Praat (Boersma, 2001). Using a script written by Riebold (2013), F0 values of the pitch tracks were extracted at the 50% point for the three drum tones. The mean F0 values, which are presented in Table 2, show that the drum tones have distinctive F0 values.
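To make the measurement point concrete, the sketch below shows what taking F0 at the 50% point of an annotated interval amounts to. It is not the Praat script by Riebold (2013) used in the study; the function name, the list-based pitch track and the linear interpolation between neighbouring samples are all assumptions made for illustration.

```python
def f0_at_midpoint(times, f0_values, start, end):
    """Estimate F0 (Hz) at the 50% point of the interval [start, end].

    times: sample times in seconds, sorted in ascending order (assumed).
    f0_values: F0 in Hz at each sample time, same length as times.
    The value is linearly interpolated between the two nearest samples,
    which is an assumption of this sketch.
    """
    midpoint = start + 0.5 * (end - start)
    if not times or midpoint < times[0] or midpoint > times[-1]:
        raise ValueError("midpoint lies outside the pitch track")
    samples = list(zip(times, f0_values))
    for (t0, f0), (t1, f1) in zip(samples, samples[1:]):
        if t0 <= midpoint <= t1:
            if t1 == t0:
                return f0
            weight = (midpoint - t0) / (t1 - t0)
            return f0 + weight * (f1 - f0)
    # Only reached when the track has a single sample at the midpoint.
    return f0_values[-1]
```

Measuring at the midpoint keeps the value away from the onset of the strike and from the tail of the resonance, which is one plausible reason for choosing that point.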
The correlation between speech tones and drum tones is also shown for the phrases in (10). The speech and musical forms of the utterances were produced at least four times by the participants.
The mean F0 values of the drummed tones are plotted against those of the speech tones. In Figure 4, the y-axis indicates the acoustic measurement of pitch contour in F0 (Hz), and the x-axis indicates the sequences of tones in the Yorùbá phrases in (10). As shown in Figure 4, the pitch contours of the speech are similar to those of their drum renditions. Relative to the pitch values of the tones in speech, the pitch values of tones in drumming are amplified. By this, I mean that the pitches of H and M tones are higher and that the pitch of an L tone is lower in drumming. The difference between the pitch curves of the speech and drum in Figure 4B might be an effect of the intrinsic F0 of vowels (Hombert, 1977).
Using ggpubr (Kassambara, 2018), the correlation coefficients of the speech and drum tones were calculated for the utterances in (10). The results of the calculation are presented in Figure 5.
In Figure 5, the results of the correlation test show that the relationship between the speech tones and the drum tones is strong and positive (R ≥ 0.98) and statistically significant (p ≤ 0.0043).
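For readers who want to see what such a coefficient measures, the sketch below computes a Pearson correlation between paired mean F0 values. It assumes a Pearson coefficient, which the text does not specify, and it is a plain Python illustration rather than the ggpubr call used in the study; the function name is hypothetical.

```python
import math


def pearson_r(speech_f0, drum_f0):
    """Pearson correlation between paired speech and drum F0 values (Hz).

    speech_f0, drum_f0: equal-length sequences, one pair per tone position.
    """
    if len(speech_f0) != len(drum_f0) or len(speech_f0) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_speech = sum(speech_f0) / len(speech_f0)
    mean_drum = sum(drum_f0) / len(drum_f0)
    covariance = sum(
        (s - mean_speech) * (d - mean_drum) for s, d in zip(speech_f0, drum_f0)
    )
    var_speech = sum((s - mean_speech) ** 2 for s in speech_f0)
    var_drum = sum((d - mean_drum) ** 2 for d in drum_f0)
    if var_speech == 0 or var_drum == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_speech * var_drum)
```

Because the coefficient is insensitive to scale, a drum rendition that amplifies the speech contour (higher H and M, lower L) can still correlate almost perfectly with it, which is compatible with the amplification noted for Figure 4.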
In sum, the pitch contours are the acoustic correlates of drum tones, and the pitch values clearly distinguish the drum tones. There is a strong positive relationship between the speech tones and the drum tones, and this relationship is statistically significant.

Acoustic Results of Tonal Co-occurrence
The present section focuses on the tonal co-occurrence and tonal processes in speech and drumming. The discussion in Section 2 shows that a sequence of H-L tones in Yorùbá is realised as H-HL. Similarly, a sequence of L-H tones is realised as L-LH. This pattern of tone transfer is referred to as pitch delay in the linguistic literature (Akinlabi and Liberman, 1995; Yip, 2002).
In Figure 6, the tone transfer is shown with the pitch tracks of the words pákò "chewing stick" and ìlú "city". When we compare the pitch trajectories of these words to those of their drum rendition in Figure 7, we observe that the pitch trajectories of the drum rendition are similar to those of the corresponding speech.
As shown in Figures 6 and 7, in the sequence of H-L tones, an HL contour is formed on the second tone in both speech and drum. Similarly, in the sequence of L-H tones, an LH contour is formed on the second tone in both speech and drum. This contour formation does not occur on the second tone in H-M, M-H, M-L or L-M. Quantitative data on the pitch contours of the second tone in sequences of two tones are presented in Figure 8. In the graph, the y-axis indicates the acoustic measurement of pitch contour in F0 (Hz), and the x-axis indicates the proportional duration of the tones. There are three panels in the graph, where the three tones form one panel each.
If we consider that the second tone in the H-L sequence has to be articulated on the drum with no compression of the tension cords on the L syllable (Section 5), one way of explaining the contour formation on the L drum tone in the H-L sequence is as follows. The drummers strike the drum membrane to encode the first syllable and then compress the tension cords for the articulation of the initial tone as the drum resonates. However, the initial compression is sustained until after striking the drum membrane for the second syllable. As the drum resonates from the second strike, the tension cords are not compressed for the articulation of the second drum tone. By sustaining the initial compression (or lack of compression for the L-H sequence) until after striking the drum membrane for the second syllable, the drummers should be able to articulate the contour formation. While this is assumed to be the explanation for the contour, this hypothesis should be tested experimentally in future research. To check whether the following drum tone has any effect on a word-initial drum tone, the F0 values of word-initial drum tones in sequences of two tones are shown as box plots⁷ in Figure 9, produced with ggplot2 (Wickham, 2016; Kassambara, 2018).
As shown in Figure 9, H, L and M drum tones are clearly distinctive. The F0 value of the H drum tone is higher when it precedes an L drum tone, and that of the L tone is lower when it precedes an H drum tone. The M drum tone has a higher F0 value when it precedes an M drum tone. The H-raising and L-lowering are consistent with the patterns in speech (see the discussion in Section 2). The difference between the F0 values of the H tone before H and L tones is statistically significant, but the difference between the H-tone values before M and L tones is not statistically significant. Similarly, the difference between the F0 values of the H tone before H and M tones is not statistically significant. When we compare the following H to the following L, there is no significant effect on the F0 value of the preceding L. Similarly, comparing the following M to the following L shows no significant effect on the F0 value of the L tone. However, comparing the following H to the following M shows a significant effect on the F0 of the L tone. The comparisons of H, L or M have no significant effects on the F0 value of the preceding M tone.
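The comparisons above rest on grouping each word-initial drum tone by the tone that follows it. The sketch below illustrates that grouping step only; it is not the ggplot2 workflow used for Figure 9, the function name and tuple encoding are hypothetical, and no values from the study are reproduced here.

```python
from collections import defaultdict
from statistics import median


def median_f0_by_context(measurements):
    """Median F0 (Hz) of the initial drum tone for each tonal context.

    measurements: iterable of (initial_tone, following_tone, f0_hz) tuples,
                  one per drummed token (hypothetical encoding).
    Returns a dict keyed by (initial_tone, following_tone).
    """
    grouped = defaultdict(list)
    for initial, following, f0_hz in measurements:
        grouped[(initial, following)].append(f0_hz)
    return {context: median(values) for context, values in grouped.items()}
```

With real measurements, H-raising would show up as a higher median for the ("H", "L") context than for ("H", "H"), and L-lowering as a lower median for ("L", "H") than for the other L-initial contexts; whether a difference is significant still requires a statistical test on the grouped values.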
In sum, the results in this section suggest that the contour formation on the second speech tone in the sequence of H-L and L-H is represented with the talking drum, and that the H-raising and L-lowering in speech are also represented with the drum.

RESULT 3: GRAMMATICAL TONES IN TALKING DRUM
The discussion so far has focused on the drumming of lexical tones. In this section, we focus on the drumming of the subject H tone and the extra vowel with an M tone in the genitive constructions, which are discussed in Section 2. Examples of the subject H tone and the M-tone extra vowel are presented in (11) and (12) respectively.
To investigate whether the subject H tone and the extra vowel with an M tone are encoded in drumming, sentences with NPs in subject and possessum positions were drummed four times by each participant. The NPs with the sequence of H-L and L-H tones are excluded in order to control for the contour formation on the second tone (Section 6.2). The mean F0 trajectories of the subjects and the possessa are respectively plotted against a control group, which is the NPs in isolation. Figure 10 contains NPs with the isolation sequences of L-L and M-M tones. When we compare the occurrence of these NPs in isolation to their occurrence in a subject position, we see that the second half of the pitch trajectory for the NPs is raised when they occur as the subject. The trajectory is consistent with the subject H tone found in speech. This suggests that the drummers not only represented but also exaggerated the subject H tone with the drum.

⁷ The y-axis shows the F0 of the drum tone, while the box represents the distribution of the F0 values of each tone in a given environment. The box contains the middle 50% of the F0 range. The mid-line in the box marks the median. The upper part of the box contains the 25% of the values above the median, and the lower part contains the 25% below it. Each of the lines above and below the box covers the 25% of the F0 values that fall outside the middle 50%. The dots represent the outliers in the data. The numbers on the bars connecting the compared groups are the p(robability) values of the observed differences between the groups. p-values that are <0.05 indicate that the differences are statistically significant.

Akinbo
A Yorùbá Talking Drum
Before discussing the results of the M-tone extra vowel in Figure 11, we need to note that the bimoraic possessum and the M-tone extra vowel were produced as trimoraic (C)VCVV forms in natural speech, so the drummers only struck the drum membrane twice for this constituent. This is consistent with the account in Section 5.
I now turn to the description of the results in Figure 11. The Yorùbá forms that were used for the five patterns in Figure 11 are presented in A. With the exception of the NP with the sequence of M-M tones, the F0 trajectories and the duration of the possessa are consistent with the presence of the M-tone extra vowel in speech. However, the curves in M-L and M-H sequences in isolation are not found in the corresponding speech contours (Akinlabi and Liberman, 1995). For the NP with the sequence of M-M tones, the F0 trajectory is consistent with the spoken form, but the duration of the drummed form is not consistent with the trimoraic status of the possessum with the M-tone extra vowel. This is because the tension cords of the drum are only compressed once for CVV moras with the same tone. So, why are the tension cords not compressed for a longer duration as in the other forms? To answer this question, we have to look at the tone of the object pronouns in Yorùbá (13).
The object pronouns in Yorùbá have an H tone underlyingly. When the pronouns follow a tone-bearing unit with an H tone, they surface with an M tone (Akinlabi, 1985; Akinlabi and Liberman, 2000). Example (13) illustrates this pattern. As argued in Pulleyblank (2004), the tonal alternation in (13) is the effect of the obligatory contour principle (OCP), which requires adjacent tones to be distinct at the melodic level of the grammar (e.g., Leben, 1973; Goldsmith, 1976; McCarthy, 1986; Archangeli and Pulleyblank, 1994). Lowering the H tone of the pronominal clitic to an M tone is the solution that is adopted for the form in (13b).
The co-occurrence of the NP-final vowel and the M-tone extra vowel would have resulted in an OCP violation if the NP-final vowel and the extra vowel were associated with different M tones (14a-b). For this form, the best way to satisfy the OCP involves the adjacent morae associating with one M tone (14c). As proposed in Pulleyblank (2004), this is the solution that is adopted in Yorùbá. Based on this, it is possible that the drummers only encode one M tone, which associates with the three morae in the bisyllabic form.
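The one-tone-to-many-morae solution can be pictured as collapsing runs of identical adjacent tones into a single tone linked to several morae. The sketch below is a deliberately simplified illustration of that idea, not an implementation of the analysis in Pulleyblank (2004); the function name and the (tone, mora count) output format are hypothetical.

```python
from itertools import groupby


def merge_adjacent_tones(mora_tones):
    """Collapse adjacent identical tones into (tone, mora_count) pairs.

    mora_tones: list with one tone label per mora, e.g. ["M", "M", "M"].
    Each output pair stands for a single tone associated with that many
    adjacent morae, so no two neighbouring output tones are identical.
    """
    return [(tone, len(list(run))) for tone, run in groupby(mora_tones)]
```

On this view, an M-M possessum followed by the M-tone extra vowel, `["M", "M", "M"]`, reduces to a single `("M", 3)`, which matches the suggestion that the drummers compress the tension cords only once for the whole form.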
To summarise, the grammatical tones are encoded with the talking drum. This is expected given that the phonetic realisation

SUMMARY, DISCUSSION AND CONCLUSION
The present study shows that the drummers obligatorily struck the drum membrane for a CV syllable or a word-initial V form. When the V form occurs word-medially, the drum was optionally struck. By varying the compression of the tension cords, the drummers represented the tones of the words. The acoustic results show that H, M and L drum tones have distinctive pitches. In sequences of H-L and L-H drum tones, a contour is formed on the second drum tone. The pitch of the H drum tone is significantly raised in a sequence of H-L drum tones. However, the pitch of the L drum tone is significantly lowered in a sequence of L-H drum tones presumably because it is produced with no tension. The pitch tracks of the grammatical tones in the drumming are consistent with those of the grammatical tones in speech. Comparing the pitch contours of the speech tones to those of the corresponding drum tones shows that there is a significantly strong positive relationship between the speech tones and the corresponding drum tones.
The results support previous findings that the lexical and grammatical tones are distinctively represented with the talking drum (Euba, 1990; Villepastour, 2010). Note that Villepastour (2010) considers the tones of segmental morphemes [e.g. the negative marker (kò)] as grammatical tones, but in the phonology literature, the term grammatical tone is confined to grammatical morphemes with tones as their phonetic exponents (Akinlabi and Liberman, 2000; Akinlabi and Liberman, 2001; Rolle, 2018). That V and CV forms have different statuses in drumming is in line with the account in Orie (2000) that there is syllable asymmetry in Standard Yorùbá.
The present study has multiple implications. First, the musical representation of the lexical tones, the grammatical tones and the phonetic details of the tones could serve as language-external evidence for linguistic theory, such as the Theory of Adaptive Dispersion (TAD) (Liljencrants and Lindblom, 1972; Diehl and Lindblom, 2004). In the TAD, "preferred phoneme and feature inventories reflect the listener-oriented selection criterion of auditory distinctiveness", which is "achieved through maximal dispersion of phonemes in the available phonetic space" (Liljencrants and Lindblom, 1972; Diehl and Lindblom, 2004). Although Liljencrants and Lindblom (1972) propose the theory for segmental features, Yoshida (2011) extends it to tone inventories and argues that the motivation for H-raising and L-lowering in languages like Yorùbá and Japanese is the maximisation of contrast. That the drummers encoded H-raising and L-lowering with the drum can also be the effect of contrast maximisation. Encoding these tone processes with the drum could serve as language-external evidence for TAD.
Studies show that vowel and consonant types affect the pitch value of a tone (Hombert, 1977; Whalen et al., 1995; Whalen et al., 1999). If we take into consideration the claim that tone-based speech surrogates like that of Yorùbá do not encode segmental identity (McPherson, 2019), the second implication of the present study is that tone-based speech surrogates create a segment-neutral ground for testing linguistic theories of tone. Third, while the link between music and language in African music is emphasised by musicologists and music teachers (e.g. Westphal, 1948; Nketia, 1963; Nketia, 1970; Ekwueme, 1974; Agawu, 1988; Agawu, 2001; Agawu, 2016), African-music education rarely incorporates linguistics courses, especially phonetics and phonology (Oehrle, 1991; Horton, 1997; Nzewi, 1999). That the drummers encode the phonetic details implies that phonetics and phonology courses could be essential in African-music education. In other words, to play an instrument like gángan successfully, a level of understanding of the source language for the instrument (and thus learning to speak the language) is important.
Researchers have studied the effect of musical knowledge or performance on second language acquisition, and the results of the studies suggest that music experience, rhythmic language and leisurely activities contribute to the success of perceiving and producing tones in second language acquisition (e.g. Orie, 2006; Gottfried, 2007; Wayland et al., 2010; Cooper and Wang, 2012). Considering that abridged surrogate systems such as the one in Yorùbá are based on tone, it would be interesting to investigate whether training second language learners to perceive or encode speech tones with talking drums could aid the acquisition of tones.
Just like the pitch values of H and L tones in Yorùbá speech, the pitch values of the H and L drum notes vary depending on the following drum note in speech mode. This suggests that the pitch values of the gángan notes are also flexible. The talking drum can produce infinitely many pitches between its highest and lowest pitches, but the pitch range can be categorised into three drum tones, namely H, M and L. In fact, Yorùbá musicians teach the talking drum with reference to the three drum tones. If we consider that drummers might not compress the tension cords at the same rate twice for a specific drum tone, the pitch value of a drum note might vary on different occasions.
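The idea that a continuous pitch range is grouped into three drum tones can be sketched as a nearest-reference classification. The reference pitches in `REFERENCE_PITCH_HZ` and the function `classify_drum_tone` are hypothetical; the study does not report such reference values, and a real analysis might instead use a perceptual scale such as semitones.

```python
# Hypothetical reference pitches (Hz) for the three drum tones.
# These values are assumptions for illustration, not measurements.
REFERENCE_PITCH_HZ = {"H": 300.0, "M": 240.0, "L": 180.0}


def classify_drum_tone(pitch_hz, references=REFERENCE_PITCH_HZ):
    """Return the tone label whose reference pitch is closest to pitch_hz."""
    return min(references, key=lambda label: abs(references[label] - pitch_hz))


# Two strokes intended as the same tone may differ in pitch because the
# tension cords are not compressed identically each time, yet both fall
# into the same category.
strokes_hz = [292.0, 305.0, 250.0, 170.0]
labels = [classify_drum_tone(p) for p in strokes_hz]
print(labels)
```

The sketch treats the categories as fixed points for simplicity; the contextual raising and lowering described above would shift the realised pitch of a tone without changing its category.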
It bears mentioning that "the dùndún drum does not have a language of its own; it is the drummer who speaks through the drum" (Eluyefa, 2011, p. 76). In this sense, if a dùndún drummer speaks English, the drummer can also represent English phrases through the drum. In fact, Eluyefa (2011) reports cases of dùndún drummers representing English phrases with the drum. A similar example comes from a performance by the Yorùbá musician King Sunny Ade in Essen, Germany. The musician instructed his drummer to drum the English phrase "hello ladies and gentlemen" with a dùndún drum (footnote 8). Considering that English, unlike Yorùbá, is not a tone language, it would be interesting to investigate the use of a tone-based speech surrogate for non-tonal languages.
A limitation of the present study is that it is based on laboratory-like conditions in the studio. As a result, the results might not represent speech-surrogate data in natural musical performance. The study is also limited to words and short phrases. Future research on the language of the talking drum should compare the representation of Yorùbá phrases in direct speech mode and musical speech mode.
To conclude, this study shows that Yorùbá drummers represent lexical tones, grammatical tones and the phonetic realisation of these tones with a talking drum. The relationship between the drum tones and the corresponding speech tones is strong and positive. Just as in speech, V and CV units have different statuses in drumming.

APPENDIX A: STIMULI FOR GENITIVE CONSTRUCTIONS
The Yorùbá forms, which were drummed for the five patterns in Figure 11, are presented in this appendix.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Behavioural REB, The University of British Columbia. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
SA designed the stimuli, collected the data, analysed the data and wrote the results.