A New Semi-automated Method for Assessing Avian Acoustic Networks Reveals that Juvenile and Adult Zebra Finches Have Separate Calling Networks

Social networks are often inferred from spatial associations, but other parameters, such as acoustic communication, are likely to play a central role in within-group interactions. However, it is currently difficult to determine which individual initiates vocalizations, or who responds to whom. To this end, we designed a method that allows analysis of a group's vocal network while controlling for the spatial network, by positioning each group member in equidistant individual cages and analyzing continuous vocal interactions semi-automatically. We applied this method to two types of zebra finch groups, composed of either two adult females and two juveniles, or four young adults (the juveniles from the first groups). Young often co-occur in the same social group as adults but are likely to have a different social role, which may be reflected in their vocal interactions. We therefore tested the hypothesis that the social structure of the group influences the parameters of the group vocal network. We found that groups including juveniles presented periods with higher levels of activity than groups composed of young adults. Using two types of analyses (Markov analysis and cross-correlation), we showed that juveniles as well as adults were more likely to respond to individuals of their own age-class (i.e., to call one after another, in terms of turn-taking, and within a short time-window, in terms of time delay). When juveniles reached adulthood, they showed adult characteristics of vocal patterns. Together, our results suggest that vocal behavior changes during ontogeny and that individuals are more strongly connected with individuals of the same age-class within acoustic networks.


INTRODUCTION
Social interactions with adults during ontogeny are likely to shape the social developmental trajectories of juvenile individuals. Indeed, some behaviors like courtship, mate choice preferences or foraging skills are partly shaped by social conditions during ontogeny (Freeberg, 1996; Farine et al., 2015) or at adulthood (Freeberg, 2000; Verzijden et al., 2012; Westerman et al., 2012). It has been shown that complex social environments, providing more opportunities for learning, allow individuals to improve their courtship performance or mate choice (during ontogeny, Miller et al., 2008; at adulthood, Oh and Badyaev, 2010; Jordan and Brooks, 2012). For example, in brown-headed cowbirds (Molothrus ater), young males housed with adult females improvise more song elements than males housed with juvenile females (Miller et al., 2008). Adult females seem to be more selective in their interactions with males than juvenile females, and this study suggests the role of social interactions with adults in young male vocal development (Miller et al., 2008).
Social interactions between peers also take place during ontogeny and may shape the social behavior at adulthood (Bertin et al., 2007; Mariette et al., 2013). For example, in zebra finches, the presence of male siblings interferes with the learning of the father's song (Tchernichovski and Nottebohm, 1998). The presence of a female sibling seems to have a positive effect (Adret, 2004). Moreover, it has also been shown that horizontal transmission of the father's song can occur between two young zebra finch males (Derégnaucourt and Gahr, 2013).
Therefore, studying how juveniles fit into social networks may be central to our understanding of individual developmental trajectories.
Most of the time, social interactions and networks are inferred from proximal measures such as spatial co-occurrence or close-contact interactions (Aplin et al., 2013; Farine, 2015; Strandburg-Peshkin et al., 2015). However, in groups whose members are in close proximity, it is likely that not all members interact equally with each other, which makes spatially inferred networks less informative in such cases. Moreover, in many species, acoustic communication is likely to play a central role in social interactions. Yet because acoustic signals can be directed at individuals at short or long distances, spatial proximity may not necessarily correlate with vocal interactions. Therefore, directly characterizing networks of acoustic communication may be extremely useful for understanding social interactions.
Vocal communication has long been studied in the context of pairwise exchange between one sender and one receiver, but communication networks have progressively received more attention (McGregor, 2005). For example, audience effects are defined as the influence of the presence of other conspecifics on a sender's vocal behavior (Evans and Marler, 1994; Vignal et al., 2004). Eavesdropping is defined as extracting information from signaling interactions while not being the main recipient and seems to occur in many species (McGregor and Dabelsteen, 1996). In birds for example, "eavesdroppers" can respond to vocal exchanges even if they were not initially part of the exchange (Mennill et al., 2002). Multiple individuals may also be involved on both sides of the communication process, such as when a group acts collectively as senders, directing acoustic signals to a group of receivers (Harrington and Mech, 1979; Farabaugh, 1982; Mitani, 1984; McComb et al., 1994).
Vocal communication often relies on temporal and structural regularities in the emission of vocalizations, such as turn-taking (Takahashi et al., 2013; Henry et al., 2015). For example, in humans, turn-taking allows interlocutors to enhance mutual attention and responsiveness (France et al., 2001). Some studies showed that the ability to respect conversation rules, in particular turn-taking, may be acquired during development (Hauser, 1992; Miura, 1993; Black and Logan, 1995; Lemasson et al., 2010, 2011; Chow et al., 2015; Takahashi et al., 2016).
The zebra finch (Taeniopygia guttata) is a particularly well-suited model for studying social interactions during ontogeny using an acoustic communication network. The zebra finch is a socially monogamous and highly social passerine native to the semiarid zone of Australia that forages and moves in groups (Zann, 1996). After nutritional independence, juveniles mostly associate with individuals of the same age, with whom some may form affiliative bonds (Zann, 1996). Social experience with peers has developmental consequences, as it affects mating success at adulthood (Mariette et al., 2013). Zebra finches rely heavily on acoustic communication for social interaction (Vignal et al., 2004; Elie et al., 2010; Boucaud et al., 2015; Gill et al., 2015) and start to do so early in life. Indeed, nestlings beg for food, and the structure of these begging calls is plastic in response to social interactions with parents. After fledging, juveniles discriminate the calls of their parents (Jacot et al., 2010; Mulard et al., 2010) and their nest-mates (Ligout et al., 2015) from the calls of other individuals. Young males learn their song by imitating an adult tutor (Slater et al., 1988). At adulthood, both males and females utter a repertoire of single-syllable calls, while only males sing very stereotyped songs of several syllables (Zann, 1996). Among the call categories, distance calls are the loudest and convey information on both the sex and the identity of the bird (Vignal et al., 2004, 2008; Forstmeier et al., 2009; Vignal and Mathevon, 2011; Elie and Theunissen, 2016).
The main objective of the present study was to describe zebra finch vocal interactions within an "acoustic network" during ontogeny by comparing the dynamics of vocal interactions of (1) individuals when they were juveniles among adults and (2) the same individuals once they had become young adults.
To this end, we designed a set-up that allows recording of vocal interactions while controlling the spatial network. Birds were kept in individual cages so that they could not physically interact and inter-individual distances were fixed. We developed in-house software that automatically detects vocalizations from hours of passive recording, assigns each vocalization to the individual that emitted it, and automatically removes non-vocalizations (wing or cage noise) using classification. The resulting vocal signal was analyzed using metrics of vocal activity (number of vocalizations, vocalization rate), vocal timing (cross-correlation), and vocal sequence or turn-taking (Markov analysis).

MATERIALS AND METHODS

Subjects and Housing Conditions
Fifty-six juveniles (28 males and 28 females) aged from 36 to 84 days (mean ± sd: 50.2 ± 10.6, N = 56 birds), as well as eight adult females were recorded in the first phase. In the second phase, we recorded the juveniles from phase 1 when they were young adults (48 young adults, including 23 females and 25 males aged from 158 to 230 days). Both phases took place from May 2011 to February 2012. All birds came from our breeding colony (ENES laboratory, University of Saint-Etienne).
The juveniles were born in a large indoor aviary (6.5 × 5.5 × 3.5 m; temperature: 20-30 °C; daylight: 07:30-20:30) where 28 adult domestic zebra finch pairs were allowed to breed freely and produced 45 broods in total (from April to August 2011). Genetic parents of the broods were not known (because of potential extra-pair copulation and egg dumping), but social parents were known because all juveniles were identified with an individually numbered band before fledging from the nest. After reaching nutritional independence (30-35 days), juveniles were caught in the aviary and transferred to individual cages (40 × 40 × 25 cm) equipped with perches. The eight adult females were also housed in individual cages. In the first phase, adult females were familiar with each other but not with the juveniles, and juveniles were familiar with each other but not with the adult females (juveniles could come from the same nest or not). In the second phase, familiar and unfamiliar young adults (i.e., held in the same or different rooms between the first and second phases) were present in each group. All birds were kept under the same environmental conditions: temperature between 24 and 26 °C; daylight: 07:30-20:30; water, seeds and cuttlefish bones ad libitum, supplemented with salad once a week.

Protocol
Recordings took place in a sound-attenuating chamber (2.22 m height × 1.76 m width × 2.28 m length; Silence Box model B, Tip Top Wood, France) fitting four cages (40 × 40 × 25 cm) with one microphone per cage (Figure 1). Cages were separated by 1 m. Microphones (Sennheiser MD42) were connected to a recorder (Zoom R16) and suspended from the ceiling 20 cm above the top of each cage. A group of four birds was recorded in two morning sessions separated by 1 day. On the day between the two sessions, we moved the cages to a second sound-attenuating chamber mimicking the recording chamber; all groups were thus placed in a sound-attenuating chamber the day before each recording day so that they could habituate to new surroundings. On recording days, we moved the cages to the recording chamber 15 min before starting the recording. This protocol allowed us to study two groups of four birds in parallel. Each time we moved the cages into a chamber, we randomly changed their relative positions to control for the potential effect of neighbors' identity and position in the chamber. On each recording day, we recorded vocal exchanges for 3 h starting at 10:30 ± 01:24 (mean ± sd, N = 77; the recording start time varied randomly across groups and conditions).

Groups' Composition
We recorded birds during two phases. During the first phase, we recorded groups of four birds composed of two adult females and two juveniles of either sex (Figure 1A). During the second phase, we recorded groups of four young adults (2 females and 2 males), using the juvenile birds from the first phase (Figure 1B). The time between the two recording phases was on average 148 ± 28 days (mean ± sd, N = 36) for a given bird.

Vocalization Extractions
Vocalizations from 250 h of recording were automatically extracted using in-house software. These programs were written in Python (http://www.python.org) by authors H.A.S. and M.S.A.F. using open-source libraries. The accuracy of this software was validated and the software used in previous studies (Elie et al., 2011; Perez et al., 2015). Vocalization detection was a pipeline of three stages.
The first stage was a simple threshold-based sound detection on a high-pass filtered energy envelope (1024-sample FFT; 441 Hz sampling; cut-off frequency: 500 Hz). In the second stage, each sound whose peak was detected was reconstructed by exploring both sides of the sound and keeping the region with energy higher than 10% of the peak; each event was thus lengthened or shortened to span a consistent amplitude range, which gave a good estimate of the vocalization duration. The third stage simply merged overlapping waveform segments. Together, the three stages produced start, end, and duration values for each sound event detected in the recording.
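To make the pipeline concrete, the three stages can be sketched as follows. This is a minimal Python illustration of the logic described above, not the authors' software; the function name, the envelope input, and the threshold parameter are our assumptions:

```python
import numpy as np

def detect_events(envelope, fs_env, peak_thresh, keep_frac=0.1):
    """Stage 1: find supra-threshold runs in the (high-pass filtered)
    energy envelope.  Stage 2: grow each event outward while energy
    stays above `keep_frac` (10%) of its peak.  Stage 3: merge
    overlapping segments.  Returns (start_s, end_s, duration_s)."""
    above = envelope > peak_thresh
    events, i, n = [], 0, len(envelope)
    while i < n:
        if above[i]:
            j = i
            while j < n and above[j]:      # contiguous supra-threshold run
                j += 1
            peak = envelope[i:j].max()
            lo, hi = i, j - 1
            while lo > 0 and envelope[lo - 1] > keep_frac * peak:
                lo -= 1                    # extend to the left
            while hi < n - 1 and envelope[hi + 1] > keep_frac * peak:
                hi += 1                    # extend to the right
            events.append((lo, hi + 1))
            i = j
        else:
            i += 1
    merged = []                            # stage 3: merge overlaps
    for s, e in sorted(events):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return [(s / fs_env, e / fs_env, (e - s) / fs_env) for s, e in merged]
```

On a toy envelope, an event around a peak of 10 is extended to every neighboring sample above 1 (10% of that peak) before the start, end, and duration are reported.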
Two additional stages enabled us to assign each vocalization to its emitter and to remove cage or wing noises. The first attributed each vocalization to a bird by removing duplicate detections, i.e., a vocalization emitted by one bird and recorded by its own microphone but also by the microphones of all other birds of the group, using energy and delay differences. This allowed us to determine precisely who vocalized at any moment, even when two birds produced overlapping vocalizations. The second stage removed cage or wing noises using a machine learning process. We trained a supervised classifier on a data set of 4500 sounds randomly extracted from all of our data, each classified by one expert (MSAF) as "vocalization" or "non-vocalization." Classification was performed on the spectrogram of each sound reduced to 50 ms; the aim was to reduce the quantity of information in time and frequency and to sample it so as to obtain the same amount of information for each vocalization, short or long. The spectrogram matrix was first reduced to the frequencies of interest, between 500 Hz and 6 kHz. Two cases then arose: if the vocalization was longer than 50 ms, we extracted the central 50 ms of the spectrogram; if it was shorter than 50 ms, we kept the whole spectrogram, centered it in a 50-ms window, and padded the remainder with zeros. The resulting matrix was flattened into a vector.
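The attribution stage can be caricatured as follows: a call reaches all four microphones almost simultaneously, so near-simultaneous detections are grouped and only the loudest copy is kept. This is a deliberately simplified, energy-only Python sketch (the real pipeline also exploited arrival-time delays, which we reduce here to a grouping tolerance; the data layout is our assumption):

```python
import numpy as np

def attribute_events(detections, max_delay=0.01):
    """Toy 'double vocalization' removal.  `detections` is a list of
    (start_s, energy, bird_id) tuples pooled over all microphones.
    Detections starting within `max_delay` of each other are treated
    as copies of one call, attributed to the bird whose own microphone
    received the most energy."""
    groups = []
    for start, energy, bird in sorted(detections):
        if groups and start - groups[-1][-1][0] <= max_delay:
            groups[-1].append((start, energy, bird))   # same call, other mic
        else:
            groups.append([(start, energy, bird)])     # new call
    # keep, per group, the bird with the loudest copy
    return [max(group, key=lambda d: d[1])[2] for group in groups]
```

Each physical call therefore yields exactly one (bird, time) event, even when two birds overlap but their copies are separated by more than the tolerance.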
We trained a Random Forest classifier (Breiman, 2001) on 1500 sounds. This classifier had an overall error rate below 10% on the remaining 3000 sounds.
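The fixed-length classifier input described above (band-limit to 500 Hz-6 kHz, crop the central 50 ms or zero-pad to 50 ms, then flatten) might look like this in Python; the function name and the spectrogram layout are our assumptions:

```python
import numpy as np

def to_feature_vector(spec, dt_ms, freqs, win_ms=50.0, fmin=500.0, fmax=6000.0):
    """Reduce a spectrogram (frequency x time matrix) to a fixed-length
    vector: keep rows between `fmin` and `fmax`, take the central
    `win_ms` if the call is longer, otherwise center it in a `win_ms`
    window padded with zeros, then flatten."""
    keep = (freqs >= fmin) & (freqs <= fmax)       # frequencies of interest
    spec = spec[keep, :]
    n_cols = int(round(win_ms / dt_ms))            # columns spanning 50 ms
    t = spec.shape[1]
    if t >= n_cols:                                # long call: crop center
        start = (t - n_cols) // 2
        spec = spec[:, start:start + n_cols]
    else:                                          # short call: zero-pad
        out = np.zeros((spec.shape[0], n_cols))
        start = (n_cols - t) // 2
        out[:, start:start + t] = spec
        spec = out
    return spec.ravel()
```

With a 10-ms time step, every call maps to a vector of (kept frequencies) × 5 values whatever its duration, which is what a Random Forest needs as input.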
This procedure allowed us to extract two types of calls from the zebra finch repertoire: tet calls, i.e., soft and short harmonic stacks with almost no frequency modulation (Zann, 1975, 1996; Elie and Theunissen, 2016), and distance calls, i.e., complex sounds consisting of a harmonic series modulated in both frequency and amplitude (Zann, 1996; Elie and Theunissen, 2016). Males can also produce songs, which are stereotyped series of syllables emitted in a short period of time.
Finally, because we were primarily interested in the temporal dynamics of the exchange, we did not distinguish between different types of vocalizations in the following analyses.

[Figure 1 caption: Two types of groups were tested: (A) groups composed of two juveniles and two adult females (Phase 1), and (B) groups of four young adults, already tested as juveniles in the first type of group (Phase 2). The recording duration and the average time between the two phases for a given bird are indicated (mean ± sd, N = 36).]

Data Analysis
We separated the analysis into three parts, described below: vocal activity, plus the cross-correlation and Markov analyses used to build acoustic networks.

Vocal Activity
We computed two types of vocal activity metrics. The first type described the group's general vocal activity. First, we measured the overall vocalization rate, i.e., the total number of vocalizations produced by all individuals in the group divided by the duration of the recording. Then, we measured characteristics of the vocal bursts. To find vocal bursts in a recording, we computed the mean vocalization rate over the whole day and extracted bursts as periods in which the vocalization rate was 10% higher than this mean (using a time step of 1 min with an overlap of 30 s). We then measured the number of bursts, the average vocalization rate in bursts, the mean burst duration, the total duration of bursts in a recording, the inter-burst interval, and the latency to burst (i.e., the time between the recording's start and the beginning of the first burst).
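The burst extraction can be sketched as follows (our Python illustration of the rule above; the sliding-window bookkeeping fills in details the text leaves open):

```python
import numpy as np

def find_bursts(voc_times_s, total_s, win_s=60.0, step_s=30.0, frac=1.10):
    """Slide a 1-min window in 30-s steps, flag windows whose
    vocalization rate exceeds the whole-recording mean rate by 10%,
    and merge consecutive flagged windows into bursts.
    Returns a list of (start_s, end_s) bursts."""
    voc_times = np.asarray(voc_times_s)
    mean_rate = len(voc_times) / total_s            # vocalizations per second
    starts = np.arange(0.0, total_s - win_s + 1e-9, step_s)
    flagged = []
    for s in starts:
        rate = np.sum((voc_times >= s) & (voc_times < s + win_s)) / win_s
        if rate > frac * mean_rate:
            flagged.append((s, s + win_s))
    bursts = []                                     # merge overlapping windows
    for s, e in flagged:
        if bursts and s <= bursts[-1][1]:
            bursts[-1] = (bursts[-1][0], e)
        else:
            bursts.append((s, e))
    return bursts
```

Because consecutive windows overlap by 30 s, adjacent flagged windows merge into a single burst, from which the number, duration, and inter-burst metrics can then be read off.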
Secondly, we measured the number of vocalizations per individual. We did not need to normalize this number of vocalizations by the recording duration because all recordings lasted the same time (3 h).

Cross-Correlation
We first characterized the groups' acoustic networks, based on the temporal proximity of vocal activity (functionally equivalent to spatial proximity in co-occurrence networks). In the network, each node is a bird, and the (undirected) edge between two nodes is weighted by the temporal synchrony between the two corresponding birds.
We assessed the vocal temporal synchrony between two birds by computing the cross-correlation using 500 ms time bins. To do that we split time into 500 ms bins, and each bird's signal was one if the bird vocalized within a bin and zero if it did not. We computed the cross-correlation (cc) between the two birds' signals with the following formula:

cc = mean[(Sbird1(t) − mean(Sbird1)) × (Sbird2(t) − mean(Sbird2))] / (std(Sbird1) × std(Sbird2))

where Sbird1 and Sbird2 are the vocal signals of the two birds as a function of t (time).
The cross-correlation is computed with normalization, i.e., by centering and scaling by the standard deviation (zscoring) of both vocal signals. The result is therefore independent of the total number of vocalizations.
High positive cross-correlation values mean that the two birds tend to vocalize, and to remain silent, at the same times. Negative values mean that when one bird is vocalizing the other tends to be silent, and vice versa.
For each day of recording we computed cross-correlations for all possible dyads of birds.
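Under these definitions, the synchrony measure for one dyad can be computed as follows (a numpy-only sketch; function and variable names are ours):

```python
import numpy as np

def vocal_synchrony(times1_s, times2_s, total_s, bin_s=0.5):
    """Zero-lag cross-correlation between two birds' binarized vocal
    signals: split time into 500-ms bins, set a bin to 1 if the bird
    vocalized in it, z-score both signals, then average their product
    (i.e., a Pearson correlation, independent of call counts)."""
    n_bins = int(np.ceil(total_s / bin_s))
    s1 = np.zeros(n_bins)
    s2 = np.zeros(n_bins)
    s1[(np.asarray(times1_s) / bin_s).astype(int)] = 1   # mark occupied bins
    s2[(np.asarray(times2_s) / bin_s).astype(int)] = 1
    z1 = (s1 - s1.mean()) / s1.std()                     # center and scale
    z2 = (s2 - s2.mean()) / s2.std()
    return float(np.mean(z1 * z2))
```

Two birds calling in exactly the same bins give cc = 1; birds calling only in complementary bins give cc = −1.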

Markov Analysis
We then studied the groups' acoustic networks by analyzing the turn-taking.
To establish turn-taking, we only considered the order in which vocalizations were emitted, without consideration of the time between these vocalizations. For that we used Markov chains.
Vocal sequences (taken over the 3 h of recording) were simply transformed into a sequence of callers' identity numbers (e.g., 1, 1, 2, 3, 1, 1, 3, 1, 3, 4). Modeled as a four-state process (one state per bird), this vocal sequence can be viewed as a stochastic process that "jumps" from state to state (from one bird to another). Under the Markov hypothesis, the caller's identity depends only on the previous caller according to a transition probability (for example, the probability of having bird 1 after bird 2). More precisely, a 4 × 4 Markov matrix depicts the probability of jumping from one identity to another: in this matrix, the entry at line i and column j is the probability, when the caller is i, that the next caller will be j. By construction, this matrix reproduces both the average number of vocalizations for each individual and the first-order transitions.
We compared the maximum transition probabilities between dyads of birds (e.g., between bird i and bird j, the max transition probability is max(proba(i,j); proba(j,i)), with proba(i,j) the probability for j to vocalize just after i). As for the previous analysis, in the network each node is a bird, and the (undirected) edge between two nodes is weighted by the maximum transition probability between the two corresponding birds.
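From a caller sequence like the example above, the transition matrix and the dyadic edge weights can be computed as follows (a Python sketch; the row-normalized-counts estimator is our assumption):

```python
import numpy as np

def transition_matrix(seq, n_birds=4):
    """First-order Markov matrix of a caller sequence: entry (i, j) is
    the estimated probability that bird j calls right after bird i
    (transition counts, normalized per row).  Birds are numbered
    1..n_birds; rows of birds that never call stay at zero."""
    counts = np.zeros((n_birds, n_birds))
    for a, b in zip(seq, seq[1:]):
        counts[a - 1, b - 1] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

def max_transition(P, i, j):
    """Undirected edge weight between birds i and j (1-indexed):
    max(P(i -> j), P(j -> i)), as in the text."""
    return max(P[i - 1, j - 1], P[j - 1, i - 1])
```

For the sequence 1, 1, 2, 3, 1, 1, 3, 1, 3, 4, the probability that bird 1 calls again after itself is 0.4, and the edge weight between birds 1 and 3 is max(0.4, 2/3) = 2/3.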

Statistics
All statistical tests were performed using R software (R Core Team, 2014). Linear mixed models were built with the lmer function and generalized mixed models with the glmer function (lme4 R package; Bates et al., 2014). Model outputs from the "Anova" (car package; Fox and Weisberg, 2011) and "summary" functions are presented.

Model Validation
Before being interpreted, each model was checked, with particular attention to its residuals. For generalized linear models with a Poisson family, overdispersion was tested with the "overdisp.glmer" function of the "RVAideMemoire" package (Hervé, 2014); if a model presented overdispersion, we used a negative binomial family instead. Model validity was also checked with the "plotresid" function from the "RVAideMemoire" package before interpreting the model results.

Model Selection
We chose to build biologically relevant models and we kept the full model as recommended by Forstmeier and Schielzeth (2011).

Model Estimates and Confidence Intervals
When possible we added information about the quantification of the biological effect given by the models. Confidence intervals were computed with the "confint.merMod" function of the lme4 package. We used the "profile" method for the linear mixed models and the "Wald" method for the negative binomial models.

Model Random Factors
We only kept random factors that had a non-null variance in the model. If we were interested in the significance of the random factors included in the model, we used the following method. We first looked at the values of their residuals in the model summary ("summary" function in lme4 package). We then built two different models: one model including the random factor, and one model without the random factor. We compared these models using the "Anova" function, and if these models were not significantly different we assumed that the random factor effect was not significant. All random factors with non-null variance were kept in the models even if they had no significant effect.

Vocal Activity

Group general vocal activity
First, for the group general vocal activity we built a Principal Component Analysis (PCA) over six parameters: the number of bursts, the average vocalization rate in bursts, the burst mean duration, the total duration of bursts, the inter-burst interval, and the latency to burst. We found two axes with eigenvalue above 1 that explained 88.5% of the data variability. The first axis describes the general pattern of how bursts were distributed in time (61.7%), and the second axis the density of vocalizations during the recording both within burst and overall (26.8%) (Figure 2).
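The PCA step can be reproduced in outline as follows (a numpy-only sketch using the correlation matrix, equivalent to a PCA on standardized variables; the eigenvalue > 1 retention rule matches the text, everything else is our assumption):

```python
import numpy as np

def pca_burst_axes(X):
    """PCA on the burst parameters (rows = recordings, columns = the
    six burst metrics).  Eigendecompose the correlation matrix, sort
    axes by eigenvalue, and project the standardized data.  Axes with
    eigenvalue > 1 would be retained, as in the text."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # z-score each parameter
    R = np.corrcoef(X, rowvar=False)           # correlation matrix
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]             # descending explained variance
    vals, vecs = vals[order], vecs[:, order]
    return vals, Z @ vecs                      # eigenvalues, PC scores
```

The eigenvalues sum to the number of parameters, so an axis with eigenvalue above 1 carries more variance than any single standardized parameter, which is the retention criterion used here.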
We built one linear mixed model per PCA axis (PCi) with the following structure: PCi∼GroupType+(1|GroupID)+(1|Day)+(1|StartTime), GroupType having two levels: 2Juv2Ad and 4YAd. The random factors were the group identity (GroupID), the day of recording (Day), and the hour of the recording start (StartTime).
The group type 4YAd always had the same sex ratio (2 females and 2 males). As a second step, we restricted the analysis to the first group type (2Juv2Ad) alone to study the potential influence of group sex ratio [possible sex ratios for juveniles: 2 males (2M), 2 females (2F), or 1 male and 1 female (1F1M)].

Number of vocalizations per individual
We built a generalized linear mixed model (negative binomial family) in which the response variable was the number of vocalizations. The factor Sex had two levels, M or F. We used a negative binomial model because the model using a Poisson distribution presented overdispersion. The model indicated an interaction between GroupType and Sex at the significance threshold, so we studied it using the lsmeans R function.
We built a second model to study the influence of being a juvenile or an adult for GroupType = 2Juv2Ad.
The factor JuvAd had two levels: Juv or Ad.
For groups including juveniles, as several factors were linked, we had to build additional models to deal with confounding effects. We built a model using juvenile data only to test the influence of sex on the number of vocalizations. As the factor SexRatio was strongly linked to the factor Sex, we did not include it in this model.

[Figure 2 caption: Statistical results are shown in Table 1. Boxes are median, first and third quartiles (Q1 and Q3); the upper whisker is at the smaller of the largest value and Q3 + 1.5 × the interquartile range (IQR), and the lower whisker at the larger of the smallest value and Q1 − 1.5 IQR; more extreme points are plotted individually. (B) Variable loadings of the PCA including six parameters on bursts; the first two axes (eigenvalues above 1) explained 88.5% of the data variability. *p < 0.05.]
We then built a model using the females' data only to test the difference between adult and juvenile females (as the males were juveniles only).

Cross Correlation
First, we built a model to compare the cross-correlation between group types (2Juv2Ad and 4YAd). The distance between two birds could be 1 or 2 (1: birds on the same edge of the square; 2: birds on the diagonal). The factor Sex1Sex2 had three levels (FF, MM, or FM) and represented the sexes of the two birds between which the cross-correlation was computed.
As the interaction between group type and sex was significant, we separated the dataset by group type and analyzed each separately. For GroupType = 2Juv2Ad, the factor Sex1Sex2 was strongly linked to the factor JuvAd (three levels: JuvJuv, AdAd, JuvAd), which indicated whether a dyad comprised two juveniles, two adults, or one juvenile and one adult, and to SexRatio (as the sex ratio could differ between groups). We therefore first built the following model including the factors SexRatio and JuvAd:

cc ∼ JuvAd + Dist + SexRatio + JuvAd:Dist + JuvAd:SexRatio + (1|GroupID) + (1|Day) + (1|Bird1ID) + (1|Bird2ID) + (1|StartTime)

We then separated the dataset by sexes to assess the difference between the cross-correlations of two juveniles and two young adults. As we had only one data point per bird in this case, the only remaining random factor was Day. For each value of Sex1Sex2 (MM, MF, FF) we built the following model:

Markov Analysis
We first built a model to compare the maximum transition probabilities between group types (2Juv2Ad and 4YAd). As the interaction between GroupType and Sex1Sex2 was significant, we analyzed the group types separately, as we did for the cross-correlation.

[Table caption: Model statistical results are shown. Linear mixed effect models ("lmer" function from the "lme4" R package) were built. The number of observations in the dataset for each fixed effect is given. We present the results from the R "summary" function.]

RESULTS

Vocal Activity
Group General Vocal Activity

We found an effect of the group type on the second composite score of the PCA, which mainly depicted the vocalization rate in bursts and the total length of bursts. Groups including juveniles and adults presented lower scores on PC2 than groups including only young adults, which means that the vocalization rate in bursts and the total duration of bursting were higher in the former than in the latter (Figure 2, Table 1). We found no effect of group type or sex ratio on the first composite score of the PCA (number of bursts, inter-burst interval, mean length of bursts) (Table 1).

Number of Vocalizations per Individual
We found differences between group types depending on the sex (Figure 3). Juvenile males emitted more vocalizations than all other birds (adults, young adults, and juvenile females). Adults emitted fewer vocalizations than juveniles, and this difference was more pronounced for juvenile males than for juvenile females (Figure 3, Table 2). The vocalization rate in juveniles was 1.34 times [1.03; 1.71] higher than in adults (numbers in brackets are the 95% confidence interval of the effect estimated by the model). Among juveniles, the vocalization rate was 1.39 times [1.18; 1.63] higher in males than in females. Male songs may increase the number of vocalizations. To account for song occurrence, we counted the total number of detected song syllables (from all males) over 10 min (randomly chosen from 1 day) for each group (i.e., we counted songs over 3.5 h of recording in total), which we compared to the total number of detected vocalizations of these males. For juveniles, we found that song syllables represented only 2.3 ± 7% of the total detected vocalizations in males. Individual changes in vocalization rate along ontogeny are shown in Supplementary Figure 1.

[Figure 3 caption: Statistical results are shown in Table 2. Boxes are median, first and third quartiles (Q1 and Q3); the upper whisker is at the smaller of the largest value and Q3 + 1.5 × the interquartile range (IQR), and the lower whisker at the larger of the smallest value and Q1 − 1.5 IQR; more extreme points are plotted individually. ***p < 0.001, **p < 0.01, p < 0.1.]

Cross Correlation
Young adult groups presented significantly higher cross-correlation values than groups of juveniles and adults. We found that cross-correlation values (i.e., temporal synchrony of vocalizations) between one juvenile and one adult (Juv-Ad) were lower than those between two adults (Ad-Ad). Cross-correlation values between two juveniles (Juv-Juv) were intermediate (Figure 4A, Table 3). Supplementary Figure 3 illustrates these results with four examples of groups with juveniles. We also found sex differences between groups: synchrony between 1 male and 1 female increased from juveniles to young adults, whereas it remained the same between 2 males or 2 females (Figure 4B, Table 3). Specifically, female-male dyads increased their cross-correlation value from 0.09 [0.07; 0.12] (juveniles) to 0.13 [0.10; 0.16] (young adults). There was no cross-correlation difference between the sexes within groups including juveniles and adults. Also, there was no difference in cross-correlation between the 2 days of recording.

Markov Analysis
The maximum transition probabilities (i.e., turn-taking) did not differ between group types (Figure 5A, Table 4).
The maximum transition probabilities were higher between two juveniles than between other dyads (AdAd, two adults, or JuvAd, one adult and one juvenile). Thus, juveniles were more likely to vocalize after another juvenile's vocalization in the turn-taking sequence. The average maximum transition probability did not differ between dyads of two adults and dyads of two young adults (Figure 5B, Table 4). Also, there was no difference in transition probabilities between the 2 days of recording.

DISCUSSION
Using our in-house software, we were able to automatically detect vocalizations from hours of passive recordings in groups of four zebra finches. This allowed us to characterize the acoustic network of groups composed of adults and juveniles compared to groups of only young adults. We found that groups including juveniles presented periods with higher levels of activity than groups composed of young adults only, and that within their groups, juveniles vocalized more than adults. Furthermore, we saw that two adults were more likely to vocalize together within a short time window (cross-correlation) than one adult and one juvenile, and that juveniles were more likely to vocalize after one another in turn-taking sequences (Markov analysis). Finally, when juveniles reached adulthood, they showed adult characteristics of vocal patterns (number of vocalizations, cross-correlation, turn-taking).
Groups including juveniles had a higher vocalization rate during bursts, and these bursts lasted longer. At the individual level, juveniles had a higher vocalization rate than adults or young adults. First, juveniles could simply be more behaviorally active than adults; indeed, in several species locomotor activity is higher in young individuals than in older ones (Van Waas and Soffié, 1996; Ingram, 2000). By vocalizing more, juveniles get opportunities to interact vocally in a greater diversity of contexts, which may be important for developing their social skills. In cowbirds, it has been shown that a complex social environment (in which birds regularly changed social groups) can lead to greater social competence and higher mating success (White et al., 2010). Vocalizing more might also allow juveniles to practice conversation rules, and more precisely to learn to respect turn-taking rules. Indeed, some studies show that the ability to respect turns may be acquired during development (Hauser, 1992; Miura, 1993; Black and Logan, 1995; Lemasson et al., 2010, 2011; Chow et al., 2015; Takahashi et al., 2016).
Juvenile males' vocalization rate was higher than that of juvenile females. Two potential interpretations need to be addressed here. First, this result could be due to our method, which cannot discriminate between calls and song syllables. However, as indicated in the results, songs represented an average of only 2.3% of all male vocalizations. This could not account for the difference between juvenile males' and females' numbers of vocalizations, because males gave 24.8% more vocalizations than females. Second, the two adults housed with the juveniles were always adult females. Juvenile males may vocalize more than juvenile females in the presence of adult females (and not adult males). A previous study analyzed the response of zebra finch juveniles (aged 56.5 ± 2.4 days) to the playback of calls of familiar adult females (Mulard et al., 2010). However, the authors found no difference between the sexes in their response to adult female calls (number of calls and latency of response). Still, the vocal response to a playback probably differs from the response during real vocal interactions. Also, contrary to this previous study, our adult females were unfamiliar to the juveniles, and this could explain the differences between our results. It thus remains to be tested whether the difference in vocal activity between juvenile males and females in our results is triggered by the sex and/or the familiarity of the adults interacting with the juveniles.
Cross-correlation is a measure of vocal synchrony between individuals. A high cross-correlation between two individuals (two nodes in the acoustic network) means that these individuals usually vocalize together (or remain silent together) within 500 ms. Akin to spatial connectedness, we considered birds that regularly vocalize together to be connected. In our results, the cross-correlation was lower between one juvenile and one adult than between two juveniles, which was itself lower than between two adults. In our setup, all adults were females (no adult male), so interactions between juvenile males and adults could not be vocal imitation for song learning (as with a male song tutor) but could be social reinforcement of song production by adult females. More generally, interactions between juveniles (female or male) and adults could be social reinforcement of vocalization use.
In our results, interactions between juveniles and adults showed less synchrony than vocal interactions between juveniles, so the latter probably function as stronger reinforcements of vocalization use. In our study, adult females were familiar with each other and not with the juveniles, and juveniles were familiar with each other and not with the adult females. These differences in familiarity may therefore have contributed to the lower cross-correlation between adult females and juveniles, as individuals may respond more to familiar individuals. However, cross-correlation and maximum transition values were similar between young adults in the second phase and adult females in the first phase, even though not all young adults were familiar with each other. Furthermore, we did not observe an increase in average cross-correlation or maximum transition values between the first and second recording days per phase, although all four birds were presumably becoming more
familiar with each other as they remained together in the same room. Overall, familiarity is therefore unlikely to fully explain our results. Instead, our results suggest that (1) individuals interact preferentially within their age group (because the cross-correlation between one adult and one juvenile had the lowest value), and that (2) adults are more precise and regular in their vocalization timing (because they had the highest cross-correlation value). Adults may be less likely to interact with a juvenile when juveniles are less reliable in the timing or information content of their vocalizations, or when the information juveniles provide is irrelevant for adults. For example, in juvenile Richardson's ground squirrels (Spermophilus richardsonii), if an individual frequently calls when no predators are nearby, its calls do not reliably predict the presence of a predator and are ignored by others. Young individuals may call in response to more stimuli, many of which are not threatening to adults (Cheney and Seyfarth, 1990; Hanson and Cross, 1997), and it might be advantageous for adults to ignore the calls of juveniles. In a learning context, chimpanzees (Pan troglodytes) are highly selective in their choice of conspecifics as models for observation: in response to a novel item, they watch and learn from the nut-cracking activity of individuals of the same age group or older, but not younger than themselves (Biro et al., 2003).
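The cross-correlation measure discussed above can be illustrated with a minimal sketch, assuming a simple binning scheme (this is not the authors' actual pipeline, and the function names are hypothetical): call onset times are binarized into 500 ms presence/absence bins, and the Pearson correlation of the two binary series measures how often two birds call, or stay silent, in the same bins.

```python
import numpy as np

def binary_series(onsets, duration, bin_s=0.5):
    """Binarize call onset times (s) into 500 ms presence/absence bins."""
    n_bins = int(np.ceil(duration / bin_s))
    series = np.zeros(n_bins)
    idx = (np.asarray(onsets) / bin_s).astype(int)
    series[idx[idx < n_bins]] = 1.0    # mark bins containing at least one call
    return series

def cross_correlation(onsets_a, onsets_b, duration, bin_s=0.5):
    """Pearson correlation of two binary call series at lag 0: high values
    mean the two birds tend to call (or stay silent) in the same bins."""
    a = binary_series(onsets_a, duration, bin_s)
    b = binary_series(onsets_b, duration, bin_s)
    if a.std() == 0 or b.std() == 0:   # one bird never (or always) calls
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])
```

Under this reading, two birds whose calls always fall in the same 500 ms bins score near 1, whereas a juvenile and an adult calling at unrelated times score near 0 or below.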
Our analysis of turn-taking involving Markov chains showed that the probability of a juvenile vocalization following a juvenile vocalization was higher than any other transition. Contrary to the cross-correlation, turn-taking does not take the delay between vocalizations into account. Therefore, a high Markov probability between juveniles means that juveniles vocalized preferentially after a juvenile vocalization (without an adult vocalization in between), but the delay can take any value (potentially above the 500 ms threshold used in the cross-correlation analysis). Respecting turn-taking requires attention and control and may be harder for juveniles to achieve. Hauser (1992) showed that juvenile vervet monkeys (Chlorocebus pygerythrus) overlap other individuals' calls more often than adults do: 1 of 38 calls was interrupted when the exchange was between adults, compared with 6 of 20 when the interacting individuals were young. This observation suggests that the ability to respect turns may be acquired during development. In Campbell's monkeys (Cercopithecus campbelli), the young are 12 times more likely than adults to break turn-taking by vocalizing twice in succession. Moreover, only adult Campbell's monkeys displayed different levels of interest when hearing playbacks of vocal exchanges that respected or violated the turn-taking rule (Lemasson et al., 2011). In nightingales (Luscinia megarhynchos), overlapping (and therefore breaking the turn-taking rule) may be perceived as a directed aggressive signal (Naguib and Kipper, 2005): alternation allows turns to be taken between two or more interlocutors, whereas overlapping elicits "irritation" or a rupture of the exchange.
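The Markov analysis of turn-taking can be sketched as follows, assuming the vocalization log reduces to an ordered sequence of caller classes (the function name and class labels are illustrative): first-order transition probabilities are simply normalized counts of which class calls after which.

```python
from collections import Counter

def transition_probs(callers):
    """First-order Markov transition probabilities over a sequence of
    caller classes, e.g. 'J' (juvenile) and 'A' (adult).
    Returns a dict {(from, to): P(to | from)}."""
    pairs = Counter(zip(callers, callers[1:]))   # count consecutive pairs
    totals = Counter(callers[:-1])               # occurrences as "from" state
    return {(a, b): n / totals[a] for (a, b), n in pairs.items()}
```

For instance, a high value of P('J' | 'J') relative to P('J' | 'A') would correspond to the result that juveniles preferentially vocalize after other juveniles, regardless of the delay between calls.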
The cross-correlation between one male and one female increased from the juvenile to the young-adult stage, whereas it remained the same between two males or between two females. The young adults had reached sexual maturity (attained between 2 and 3 months of age in zebra finches). In the wild, zebra finch juveniles are fully independent at 35 days and may start forming pairs at 3 months old (Zann, 1996). The tendency to interact with individuals of the opposite sex may therefore increase after sexual maturity. In wild chacma baboons (Papio cynocephalus ursinus), females' reproductive state affects males' tendency to call to them (Palombit et al., 1999). Males grunted more often when approaching estrous or lactating females, and rarely when approaching pregnant females. In addition, affiliative interactions between one male and one female occurred significantly more often when males grunted than when they silently approached females.
In this study we decided to pool all vocalization types, because too many interacting factors prevented us from analyzing the rules of vocalization-type use with a sufficient sample size. Besides, among all the vocalization types that zebra finches can produce, only three were produced under the conditions of our experiment (cages at short distances): tets, distance calls, and songs. It would nevertheless be interesting to study the vocal dynamics separately for each vocalization type, because the dynamics of vocal exchange could change according to call type, as suggested by Gill et al. (2015).
Preventing physical contact and free movement of the birds is also a limitation. However, our approach has the advantage of controlling the position of the birds. In a recent study, devices mounted on the birds were used to assign vocalizations to freely moving individuals (Gill et al., 2015), but this did not give the spatial position of each bird. New technologies are needed to control for these different aspects at the same time.
Taken together, our results suggest that juveniles and adults have separate vocal networks (i.e., individuals of the same age class form distinct connected components within the network), and that juveniles acquire the properties of the adult vocal network during ontogeny. Our findings highlight the benefits of considering acoustic networks, in addition to spatial associations, when inferring social interactions within groups.

ETHICS STATEMENT
Experiments were performed under the authorization no. 42-218-0901-38 SV 09 (ENES Laboratory, Direction Départementale des Services Vétérinaires de la Loire) and were in agreement with French and European legislation regarding experiments on animals.

AUTHOR CONTRIBUTIONS
MF carried out the data extraction, performed the statistical analyses, and drafted the manuscript. HS participated in the data analysis and drafted the manuscript. CV and MM designed and coordinated the study, performed the recordings, and drafted the manuscript. All authors gave final approval for publication.