Born to Cry: A Genetic Dissection of Infant Vocalization

Infant vocalizations are one of the most fundamental and innate forms of behavior throughout avian and mammalian orders. They have a critical role in motivating parental care and contribute significantly to fitness and reproductive success. Dysregulation of these vocalizations has been reported to predict risk of central nervous system pathologies such as hypoxia, meningitis, or autism spectrum disorder. Here, we have used the expanded BXD family of mice, and a diallel cross between DBA/2J and C57BL/6J parental strains, to begin the process of genetically dissecting the numerous facets of infant vocalizations. We calculate heritability, estimate the role of parent-of-origin effects, and identify novel quantitative trait loci (QTLs) that control ultrasonic vocalizations (USVs) on postnatal days 7, 8, and 9; a stage that closely matches human infants at birth. Heritability estimates for the number and frequency of calls are low, suggesting that these traits are under high selective pressure. In contrast, duration and amplitude of calls have higher heritabilities, indicating lower selection, or their importance for kin recognition. We find suggestive evidence that amplitude of infant calls is dependent on the maternal genotype, independent of shared genetic variants. Finally, we identify two loci on Chrs 2 and 14 influencing call frequency, and a third locus on Chr 8 influencing the amplitude of vocalizations. All three loci contain strong candidate genes that merit further analysis. Understanding the genetic control of infant vocalizations is not just important for understanding the evolution of parent–offspring interactions, but also in understanding the earliest innate behaviors, the development of parent–offspring relations, and the early identification of behavioral abnormalities.


INTRODUCTION
Vocal communication is important for social interactions in all mammals, and it begins in rodents on the first postnatal day with the communication between an infant and its parents (Doty, 1974;Ehret and Haack, 1982;Maggio and Whitney, 1986;Oller et al., 2013;Arriaga, 2014). Mouse pups produce high frequency ultrasonic vocalizations (USVs) when stressed by loss of body temperature (Okon, 1970a), by hunger or by separation from the mother (Noirot, 1966;Ehret, 2005). Pups also vocalize in response to intense tactile stimulation (Okon, 1970b). These USVs are critical in triggering parental attention (Noirot, 1964;Smith, 1976;Ehret and Haack, 1982;Cohen-Salmon et al., 1985;D'Amato and Populin, 1987;Hernandez-Miranda et al., 2017) and to solicit food (Branchi et al., 2001). Abnormalities in infant crying patterns or acoustic characteristics can have diagnostic value. For example, the cries of a human baby have been reported to sound different during a benign colic or to predict risk of central nervous system pathologies such as hypoxia, meningitis, or more recently autism spectrum disorder (Sheinkopf et al., 2012;Gelfand, 2016;Chittora and Patil, 2017;Esposito et al., 2017;Smarius et al., 2017). Rodent USVs are prominent soon after birth and are therefore useful as a phenotyping tool (Roubertoux et al., 1996;Scattoni et al., 2009;Roy et al., 2012;Wöhr, 2014;Riede et al., 2015).
Differences in infant vocalization are prominent among genotypes and strains of mice and their intercrosses (Bell et al., 1972;Cohen-Salmon et al., 1985;Hahn et al., 1987Hahn et al., , 1997Roubertoux et al., 1996), but no common DNA sequence variants underlying differences in this innate form of communication have yet been mapped in humans or any other vertebrate. Here, we have used the expanded BXD family of mice to identify novel, naturally occurring gene loci that control several important facets of infant vocalizations. The BXD family consists of ∼150 strains generated by crossing C57BL/6J (B6) and DBA/2J (D2) strains of mice, and then inbreeding their progeny for more than 20 generations to produce so-called recombinant inbred strains (Peirce et al., 2004). Infant vocalizations of the parents of the BXD family differ significantly for a number of quantitative traits (Bell et al., 1972;Hennessy et al., 1980;Cohen-Salmon et al., 1985;Roubertoux et al., 1996;Hahn et al., 1997;Thornton et al., 2005). As shown in the present study, this leads to a wide phenotypic range among the BXD progeny.
The BXD family is widely used for mapping QTLs that influence behavioral, anatomical, and physiological phenotypes (Carneiro et al., 2009;Rosen et al., 2009;Philip et al., 2010;Xue et al., 2015;Delprato et al., 2017). The BXDs are particularly useful for developmental analyses of behavior because each strain and genome can be studied at many time points and using nearly isogenic sets of males and females. This makes it highly practical to study gene-by-environment interactions, sex differences, and indirect genetic effects (IGEs; Ashbrook et al., 2015aBaud et al., 2017). We have previously shown that IGEs of the offspring genome influence maternal behavior (Ashbrook et al., 2015a, and differences in infant vocalizations may be a key factor controlling parental care. Acoustic characteristics such as fundamental frequency and sound amplitude, have different effects on parental care (Lingle et al., 2012), and we therefore expect that these characteristics have different heritabilities (Gustafsson, 1986;Price and Schluter, 1991;Branchi et al., 2001;Thornton et al., 2005). A low heritability usually indicates intense neutralizing selection, whereas high heritabilities are often associated with phenotypes that either do not greatly affect fitness (relaxed selection) or that are controlled by balancing selection (Walsh and Lynch, 2018) that increases variability, perhaps in order to enhance kin and litter recognition (Forstmeier et al., 2009;Blumstein et al., 2013). Hahn et al. (1997) hypothesized that rates of calling has undergone more intense selection than other characters (Hahn et al., 1997), however Spence et al. (2016) have shown that acoustic characteristics usually do not change in isolation but are highly interrelated (Spence et al., 2016). Here, we have expanded the analysis of USVs in order to comprehensively evaluate heritability of spectral, temporal, and sound amplitude characteristics. We have categorized and quantified a series of distinct call categories-also known as the syllabic patterns (Scattoni et al., 2008a)-for genetic analysis. We combine these two methods to estimate heritability for both quantitative and qualitative characteristics of infant vocalizations. We expand on this using a morphological analysis of the vocal organ in order to identify potential sources of acoustic variation.
In this study, we have used a diallel cross of C57BL/6J (B6) and DBA/2J (D2) to estimate heritability and parent-of-origin effects of all USV traits. We follow up with a larger study using 41 member strains of the BXD family, to map loci for both the frequency and amplitude of calls. We ask four questions: • Can we detect parent-of-origin effects, such as parental genetic effects (PGEs), contributing to infant USVs? • How are the functions of different aspects of vocalization reflected in their heritability? • What regions of the genome modulate different facets of USVs, and are these different facets under selective or joint control? • Within these genomic regions, can we identify putative candidate genes with mechanisms that link to different aspects of infant vocalizations?

MATERIALS AND METHODS
There were two major components of this study, estimation of heritability and parent-of-origin effects by a diallel cross, and identification of QTLs using the BXD family (Figure 1). For the first part of the study, we studied a diallel cross of four strains of mice using six litters each: C57BL/6J (B6) (n = 29), DBA/2J (D2) (n = 24), and reciprocal F1 hybrids-B6D2F1 (n = 30) and D2B6F1 (n = 30). The four genotypes of pups consisted of: (a) B6 (B6 dam and B6 sire), (b) D2 (D2 dam and D2 sire), (c) B6D2F1 (B6 dam and D2 sire), and (d) D2B6F1 (D2 dam and B6 sire). For the second part of the study we collected USV data for 41 strains of the BXD recombinant inbred family (n > 310). This family is the most deeply phenotyped mammalian model system, consisting of experimentally tractable and genetically defined mouse strains capturing a large amount of naturally genetic variation, analogous to that found in a normal human population (Wang et al., 2016). BXD strains were chosen pseudorandomly dependent on availability, as no previous data was available for USVs in this population. We used BXD strains 1 (4 litters), 9 (6 litters), 24 (4 litters), 29 (4 litters), 32 (4 litters), 34 (4 litters), 40 (8 litters), 43 (4 litters), 44 (8 litters), 45 (6 litters), 48 (4 litters), 48a (8 litters), 49 (4 litters), 50 (8 litters), 51 (6 litters), 55 (4 litters), 60 (4 litters), 61 (4 litters), 62 (6 litters), 64 (4 litters), 65a (6 litters), 68 (4 litters), 69 (4 litters), 70 (6 FIGURE 1 | Outline of experimental design. The study is separated into two, complementary, parts, one using the BXD family for trait mapping, and the other using the parental diallel cross to study heritability and qualitative traits. The workflow of different aspects of the experiments are shown. (A) BXD strains were derived by crossing C57BL/6J (B6) and DBA/2J (D2) parents. The resulting heterozygous F1 mice were crossed to produce non-reproduceable F2 offspring representing recombinations of the parental B6 and D2 genomes. These F2 offspring were then inbred by sib-mating for 20 generations, at which point each strain represents a reproducible, unique mosaic of B and D alleles. These strains can then be maintained for repeated study of genetically identical individuals. Using a subset of 41 BXD strains we measured four quantitative traits (duration, amplitude, frequency, and number of calls) on postnatal days 7, 8, and 9. The traits were then mapped using GeneNetwork to identify significant quantitative trait loci (QTLs). The candidate genes within these QTLs were then selected based on the presents of non-synonymous single nucleotide polymorphisms (nsSNPs) or insertions/deletions (InDels). These candidate genes were then examined using gene annotation databases, gene expression data and literature searches. (B) A diallel cross of the two parental (B6 and D2) strains was produced, to create B6D2F1 and D2B6F1 offspring. These offspring are genetically autosomally identical except for which parent each allele was inherited from. The same four quantitative traits as above were measured in these animals, as well as the waveform pattern of their vocalizations on day 8, to produce 10 qualitative traits. Heritability was estimated, and dissected into the relative contribution of genetics, environment and parent-of-origin effects.
It is therefore helpful to adjust for kinship when mapping, as described below.
Parents and pups were raised in a closed-barrier specificpathogen-free vivarium at the University of Tennessee Health Science Center (Nash Annex). Infants were weighed and recorded at postnatal day (P)7, P8, and P9, between 2 and 5 PM (CST). Postnatal days 7-9 were chosen as they closely correspond to full term birth in humans. This was estimated using the Translating Time website 1 based on Workman et al. (2013), where mouse post-conception day 30 (approximately postnatal day 9) translates to human post-conception day 280 (approximately full term birth). This work was performed following guidelines approved by the UTHSC Animal Care and Use Committee.

Ultrasonic Vocalization Measurement
Individual pups were separated from their parents and placed into a freshly cleaned plastic container (12 cm × 8 cm × 7 cm) that was located in the center of large polystyrene foam box (28 cm × 21 cm × 18 cm). The end of an UltraSoundGate 116-200 microphone-sensitive to frequencies up to 340 kHzwas placed 8-10 cm above the pup and the temperature within the box was ∼21 • C. Vocalizations were recorded for the first 3 min of isolation at a sampling rate of 300 kHz using 16-bit A-to-D converter (Avisoft Bioacoustics 2 ). Prior to the end of each session, body temperature was measured on the chest using an infrared thermometer (IR-B153 from Braintree Scientific, Inc. 3 ). Recordings were transferred to Avisoft-SAS Lab Pro (version 4.40), and sonograms over the range of 20-250 kHz were generated using the following software parameters: (1) FFT length of 1,024 points (an interval of 4 ms), (2) frame size of 100%, (3) a Hamming smoother with a bandwidth of 317 Hz and a resolution of 244 Hz, (4) temporal resolution 1.024 ms with a Hamming smoother overlap of 75%, and 1/bandwidth of 3.2 ms.

Quantitative Analysis of Vocalizations
Quantitative analysis of vocalization was carried out for: (1) the number of calls, (2) the duration of calls, (3) the mean fundamental frequency, and (4) the mean peak amplitude of calls. These parameters were measured automatically using Avisoft-SAS Lab Pro software for the entire 3-min recording session on P7 to P9. Automated measurements excluded calls with insufficient signal-to-noise ratio. SAS-lab Pro allows the investigator to overlay spectrographic images and fundamental frequency tracking results. The accuracy of call detection by the software was verified manually by an experienced user. The user verified all the calls in all recorded files and, when necessary, missed calls were marked by hand to be included in the automatic parameter analysis. With regard to peak amplitude measurement, this was considered as the point with the highest energy within the spectrum of the call, and was recorded as the relative output voltage of the microphone signal. Due to the consistent placement of the microphone, some clipping of highvolume calls was noted, but this did not correlate with strain, and therefore will simple result in reduced estimates of amplitude. Peak frequency was defined as the frequency at the location of the peak amplitude within the spectrum of the call. All measurements were carried out using the same equipment in the same chamber. No differences in quantitative parameters of calling were detected in a comparison of sexes; therefore, data was collapsed. This is in line with previous findings of no differences between sexes (Hahn et al., 1997). Average values of all parameters were also calculated for each case over the 3 days of testing and used for the statistical comparison.

Qualitative Analysis of Vocalizations
Waveform patterns of calls recorded on P8 in D2, B6, B6D2F1, and D2B6F1 mice were examined in detail. Every call was classified as one of 10 distinct types (representative images in Supplementary Figure 1), based on internal pitch changes, lengths, and shapes, following the USV call categorization method proposed by Scattoni et al. (2008a). Each genotype produced 10 sonogram files from five different litters. We utilized one male and one female per litter. Sonograms were 3 min in length and selected from recordings at P8. We classified a total of 700 calls from B6, 1,305 from D2, 2,047 from B6D2F1 calls, and 1,972 from D2B6F1 calls. Average values of all call categories were used for the statistical comparison of D2, B6, B6D2F1, and D2B6F1 genotypes.

Morphology of the Vocal Organs
Ultrasonic vocalizations are generated in the larynx by a whistle mechanism (Roberts, 1975;Riede, 2011). A specific laryngeal morphology facilitates the whistle (Riede et al., 2017), and laryngeal and respiratory movements determine spectral, temporal, and amplitude characteristics (Riede, 2011(Riede, , 2013. If acoustic variation is caused by morphological differences of the vocal organ, we expect this to be reflected in one or more of the following characteristics: size of laryngeal cartilages (cricoid, thyroid and arytenoid cartilages), and the size of intrinsic larynx muscles (thyroarytenoid, cricothyroid, and lateral cricoarytenoid muscle). The size of these laryngeal structures were estimated through volume estimations using area measurements in serial histological sections. Additionally, we measured body mass, femur length, skull length, and skull width in order to estimate overall body size.
Specimens were fixed in formalin for 3 days and decalcified for 8 h. Body mass, skull width and length and femur length were measured on each pup. The head-neck region was embedded in paraffin. Coronal serial sections (5-µmthick) were taken every 50 microns starting ventral. Sections were stained with hematoxylin-eosin for a general overview. Sections were scanned with ScanScope CS2 (Aperio, Vista, CA, United States). Volume of cricoid cartilage, thyroid cartilage, arytenoid cartilage, thyroarytenoid muscle (TA), lateral cricoarytenoid muscle (LCA), and cricothyroid muscle (CT) were estimated from digital images of sections from 10 equally distant levels from ventral to dorsal. All measurements were made using ImageJ (vers. 1.47; NIH). Volumes were estimated by adding up the products of area measurements (mm 2 ) and the distance (in mm) to the next section level.

Statistical Analyses
Statistical analyses of the diallel cross were performed using R. ANOVA was used to analyze both qualitative and quantitative USV measurements and to evaluate strain-dependent effects. Pairwise comparison between strains was done using Tukey Frontiers in Behavioral Neuroscience | www.frontiersin.org post hoc analysis. The Kruskal-Wallis test was used to analyze body weight differences and USV traits between groups.
Heritability was estimated as the fraction of variance explained by strains in a simple ANOVA model. Sex was included in the initial analysis of vocal traits, but was not a useful predictor, and was not used as a cofactor. To compute an average estimate of heritability for each trait across all 3 days, we weighted the four heritability values by the inverse square of the errors. To calculate empirical confidence intervals, we used R to repeatedly recalculated heritability using 100,000 subsets of 95% of the data (e.g., for the day 7 traits in the BXD ∼295 of 310 samples) and recorded the 95th percentile of these as our 95% confidence intervals.
Different heritability values are given for the diallel cross as it is possible to deconvolute additive, dominance and parent-oforigin effects on heritability, and therefore we can report a narrow sense heritability (h 2 ). In the inbred BXD, offspring and parents are genetically identical, and therefore we cannot differentiate different components of heritability, and instead get a single, broad sense, heritability (H 2 ).
We also calculated H 2 RIx (Belknap, 1998), the heritability of strain means, in contrast to the heritability for individuals. This is defined as H 2 RIx = V a /(V a + V e /n), where V a is the genetic variability (variability between strains), V e is the environmental variability (variability within strains), and n is the number of within line replicates.

Genetic Dissection of Pup Vocalization
Density plots were produced in R using the density function in the stats package and plotted using the ggplot2 package (Wickham, 2016). Values were converted to z-scores, to aid in the comparison between plots. Normal probability plots, a special case of the qq plot, were produced in GeneNetwork, and can be reproduced there.

QTL Mapping
For QTL mapping, the average of each vocalization trait was calculated for each strain. Distributions of these means were generally close to normal and did not require scale transformations (Supplementary Figure 6). However, a few means were high outliers, and these values (>+ 2.5 z) were winsorized (shifting values so as to approximate a more normal distribution; Shete et al., 2004). In all cases, we have explicitly entered both the original and the winsorized values in trait descriptions in www.genenetwork.org, so that interested readers can evaluate impact (for example, see BXD Phenotype ID: 16359, 'Central nervous system, behavior, development, communication: Infant vocalization, mean peak frequency of call on day 7, mixed sexes from 2 to 5 litters [kHz] (Data winsorized, BXD86 from 55.419 to 56.192)'). All mean strain data are publicly available at GeneNetwork.org (GN phenotype identifiers: 16355-16362, 16600-16603, 16610-16612, 16617-16621, 18459-18461).
We initially mapped QTLs using the fast linear regression equations of Haley and Knott (1992). Genome-wide significant (p < 0.05), and suggestive (p < 0.63) thresholds were calculated by 5,000 permutations of the phenotypes, using GeneNetwork. The suggestive threshold corresponds to approximately one false QTL per genome-scan. Likelihood ratio statistic (LRS) scores were converted to log of the odds (LODs) scores by dividing by 4.61, and confidence intervals were defined by a LOD drop of 1.5 on either side of the peak value (Manichaikul et al., 2006). We considered QTL intervals that achieved genome-wide significance for one phenotype, and genome-wide suggestive for others, as highest priority for candidate gene analysis.
The January 2017 BXD genotype file was used 4 . Updated linear mixed model mapping algorithms are now available on GeneNetwork 2 5 (Sloan et al., 2016), that account for kinship among strains. These new algorithms include GEMMA (Zhou and Stephens, 2012), pyLMM 6 (Sul et al., 2016), and R/qtl2 7 . Traits with significant and suggestive QTLs detected using simple interval mapping were remapped using GEMMA to confirm LRS scores and locations of QTLs even in the presence of kinship structure.

Candidate Gene Identification
We examined genes within the QTL 1.5 LOD confidence intervals using QTLminer (Alberts and Schughart, 2010). We focused on genes with non-synonymous single nucleotide polymorphisms (nsSNPs) or insertions/deletions (InDels). We treated these genes as higher priority candidates, although we acknowledge that causal variants will often lie in a non-coding region. For each of these high priority candidates we then examined which Gene Ontology biological processes (Ashburner et al., 2000;Gene Ontology Consortium, 2015) and KEGG pathways (Kanehisa and Goto, 2000;Kanehisa et al., 2012) the gene was annotated as being part of, and highlighted those which may relate to vocalization. We also reviewed known effects of mutations in these high priority candidates using the Mouse Genome Informatics (MGI) Phenotypes, Alleles and Disease Models Search 8 (Bello et al., 2015). To cover more recent findings, which may not yet be reflected in annotation databases, we also checked the NCBI Entrez information 9 about the gene and its human homolog, particularly Related articles in PubMed and GeneRIFs.
We also examined expression of these candidate genes using gene expression databases. Gene expression data in P4 B6 mice is available from the Allen Brain Atlas, Developing Mouse Map (Allen Institute for Brain Science, 2008; Sunkin et al., 2013), and this was used to confirm where in the whole brain these genes are expressed. To get data specific to the BXD family, transcriptome data for many different organs and tissues are available in GeneNetwork, including some tissues with data at different ages. We surveyed all brain regions included in GeneNetwork (P3 neocortex, P3 striatum, P14 neocortex, P14 striatum, adult amygdala, adult cerebellum, adult hippocampal precursor cells, adult hippocampus, adult hypothalamus, adult midbrain, adult neocortex, adult nucleus accumbens, adult prefrontal cortex, adult striatum, adult ventral tegmental area) to find which of these transcripts have cis-eQTLs, and the expression of which correlate with USV traits. Both support the hypothesis that variant in these genes may be influencing vocalization phenotypes.

Quantitative USV Traits
The number of USVs made over a 3-min interval varied by strain (Figure 2A). Averaged across the 3 days of observation, D2 pups made more calls than B6 pups [170 ± 19 (SEM) versus 114 ± 14, p < 0.0005]. This is in line with previous observations (Roubertoux et al., 1996). F1 hybrids made more calls than B6 and D2 (B6D2F1 = 179 ± 16 and D2B6F1 = 219 ± 18; p < 0.01, apart from DBA/2J and B6D2F1 where there was not a significant difference). Using all four cofactors and their interactions (sex, body weight, age and litter identifier), the F ratio associated with strain was 26.913, and while this value was highly significant (p < 2E-12, 3 df) the corresponding estimate of narrow sense heritability (h 2 ) was modest-approximately 0.11. Litter identifiers can be used to estimate parental effects and these effects are strong (F = 10.335, p < 1E-14, 18 df) explaining 25% of the variance. Neither body weight nor sex were a predictor over the 3-day recording interval, explaining < 1% variance (p > 0.05) each. Age did have a small, significant effect (F = 2.914, p < 0.04, 3 df), but only explains 1% of the variance (Supplementary Table 1).
Taken together, these results (summarized in Supplementary  Table 1) show that all four quantitative traits have a strong heritable component (strain was significant for all of them), and therefore are appropriate for further genetic study in the BXD RI family. However, as we can resample the same genome repeatedly in inbred strains, the conventional heritability underestimates the heritability of the strain mean: that is, the heritability in a particular genome (which we can resample in inbred lines) rather than heritability in a particular individual (conventional heritability). This is referred to as H 2 RIx (Belknap, 1998) . Our estimates of H 2 RIx ranged from 0.62 for number of calls on day 9 to 0.98 for duration of calls on day 8 ( Table 1). Scattoni et al. (2008a) have defined ten distinct vocalization "syllables" generated by pups. To explore if these "syllables" show heritability, we performed an initial analysis on P8 in the diallel cross. All stains generated eight syllables; however, relative frequency of syllable types differs markedly among strains (Figure 3 and Supplementary Figure 1). D2 displayed high prevalence in production of complex (29%), flat (27%), upward (23%) and chevron (15%) calls, however, very low prevalence of downward (2%) and short (2%) calls on P8. In contrast, B6 pups produced two syllables (28%), short (20%), frequency steps (8%), flat (8%), downward (19%), complex (12%), and upward (4%) calls. Of note, very few chevron calls (1%) were produced. Reciprocal F1 pups emitted a spectrum of syllables, and the vast majority of calls produced by F1s were complex calls (30%). Surpassing the diversity of B6 and D2, F1s can produce downward (19%), two-syllable (18%), and frequency step (5%), as well as upward (9%), chevron (7%) and flat (16%) calls (Figure 3). For each call type, an ANOVA was performed to quantify the effect of strain, litter, body weight and their interactions and a Tukey post hoc was used to identify differences between strains (Supplementary Table 2). For seven types of calls (chevron, flat, frequency steps, harmonics, short, two-syllable, and upward) there was a significant effect of strain, and B6 and D2 were significantly different (Tukey p < 0.05) for all except harmonics, with only harmonics significantly different between B6D2F1 and D2B6F1 pups. Heritability was estimated for all qualitative traits and ranged from just 0.1 for harmonics to 0.65 for upward calls (Supplementary Table 2). We again also estimated H 2 RIx , which ranged from 0.57 for harmonics to 0.93 for upward calls ( Table 2).

Parent-of-Origin Effects on USVs
The reciprocal F1 females inherit identical nuclear genomes, including their X chromosomes, whereas males inherit identical autosomes, but differ in their sex chromosomes. For this reason, differences between the reciprocal F1 females are highly likely to be caused by either indirect parental effects or genomic imprinting, whereas differences between reciprocal F1 males could also be caused by the specific effects of the Y chromosome. Statistical comparison was performed on quantitative USV traits between reciprocal F1 pups, combined and split by sex (Figure 4). No significant difference was seen between reciprocal F1 females, although there was a suggestive difference in amplitude of calls on P8 (Kruskal-Wallis p = 0.054; Figure 4D). In males there were significant differences in number of calls on P7 and P8 ( Figure 4E) and amplitude of calls on P9 (Figure 4H). In the combined data, number of calls on P7 (Figure 4I), and amplitude of calls on P8 ( Figure 4L) were significantly different (Kruskal-Wallis p < 0.05).

Morphology of the Vocal Organs
We studied the effect of strain, age, sex, and their interactions on morphological characteristics by ANOVA, with Tukey Honest post hoc test. Between the reciprocal F1 strains we saw no differences in morphological characteristics. Therefore, these were combined, and the ANOVA analysis rerun. Body size is significantly influenced by strain (Table 3 and Figure 5A). Furthermore, thyroid cartilage size is significantly influenced by strain ( Figure 5E). We also estimated heritability of morphological features using the same ANOVAs: Body mass h 2 = 0.21, femur length h 2 = 0.27, skull length h 2 = 0.57, skull width h 2 = 0.65, thyroid cartilage volume h 2 = 0.58, first tracheal ring diameter h 2 = 0.18, cricoid cartilage volume h 2 = 0.058, TA muscle volume h 2 = 0.14, LCA muscle volume h 2 = 0.51, CT muscle volume h 2 = 0.30, and arytenoid cartilage volume h 2 = 0.12.

Genetic Dissection of Pup Vocalization
Since we identified a clear effect of strain, and therefore of genotype, on quantitative USV phenotypes above (see the section "Quantitative USV Traits"), we can now examine whether these genetic effects are likely to be caused by a small number of loci with large effects, or a large number of loci of small effect.
To do this, we produced probability density plots for all four quantitative traits, using the strain means and standard deviations of the mean for parents and all available BXD strains (Figure 6). These plots provide a visual assay of the complexity of the trait: at the extremes, a trait strongly modulated by one polymorphic locus would have two modes (representing the two homozygous genotypes-BB and DD), whereas more genetically complex traits modulated by many variants of small effects will be close to normal. Several of these plots appear to be made up of a mixture of two normal distributions (e.g., number of calls on day 7, amplitude on day 9), characterized by a 'shoulder' on the normal distribution, indicative of one locus of moderate to large effect segregating in the family. Other plots appear to be more normally distributed (e.g., duration of calls on day 8), suggesting several variants of smaller effect. Normal probability plots show the same patterns, with breaks (where the actual values suddenly jump, rather than following the x = y line) indicating a variant of large effect (e.g., the circled gaps in Figure 7, for call frequency on day 7). These are automatically generated in GeneNetwork. Normal probability plots for all 12 quantitative USV traits measured in the BXD population are shown in Supplementary Figure 6.
The distribution of call duration among all strains was approximately continuous between 16 and 49 ms ( Figure 8H). B6 pups produced the shortest duration calls among the BXD family on all 3 days (Figures 8E-H), whereas D2 pups were consistently among the top five strains for peak amplitude all 3 days (Figures 8M-P). Differences in the peak frequency are less consistent (Figures 8I-L) but the D2 strain was consistently in the bottom quartile. This could conceivably be an adaptation to impaired hearing of adult D2 mice (Cohen-Salmon et al., 1985;Zheng et al., 1999). The BXD family includes many strains  with phenotypes more extreme that either B6 or D2 parent (Figures 8A-D).

Heritability
Heritability was estimated using a simple ANOVA model. On all three days, number and frequency of calls had the lowest heritability (H 2 = 0.2-0.4) and duration and amplitude of calls had the highest heritability (H 2 = 0.53-0.65), with relatively small confidence intervals (Table 4 and Figure 9). The heritability estimates in the BXD RI are higher than the narrow sense heritability estimates above for the diallel cross, however, the order of the heritabilities (duration and amplitude higher than number and frequency) is the same as the diallel cross. It is unsurprising that heritability estimates are higher in the RI, as the RI are composed of homozygotes, which contribute more to additive genetic variance than heterozygotes (Belknap et al., 1996;Belknap, 1998), and we calculated H 2 , which will include epistatic effects. Again, we calculated estimates for H 2 RIx , which ranged from 0.55 for number of calls on day 9 to 0.89 for peak amplitude on day 8 (Table 5), and they correlated well with estimates from the diallel cross (Spearman's rho = 0.86, Supplementary  Figure 7). Power calculations allow us to estimate our power to detect effects of a given effect size. With a H 2 RIx value around 0.5, we can detect loci accounting for 0.5 of the genetic variance with 80% power (Supplementary Figure 8), while a H 2 RIx of 0.9 gives us the power to detect loci account for ∼0.38 of the genetic variance with 80% power (Supplementary  Figure 9).

QTL Mapping
Quantitative trait loci were mapped for number, duration, frequency, and amplitude of calls using Haley-Knott (HK) regression mapping (Supplementary Figures 2-5), PyLMM, and GEMMA as implemented in GeneNetwork. We found three QTLs which were genome-wide significant in at least one phenotype, and these were examined further to identify other phenotypes that had a genome-wide suggestive QTL at the same location.
The first was on Chr 14, and was significant for mean peak frequency on P7 (GN 16359; Supplementary Figure 4), and suggestive for mean peak frequency for all days combined (GN 16602; Supplementary Figure 4). When kinship was accounted for using GEMMA, this locus was also suggestive for mean peak frequency on P8 (GN 16360). The region appeared to consist of two QTLs: one spanning from 54.4 to 61.7 Mb, the other from 68.8 to 72.3 Mb (represented by two peaks rising above the significance threshold; Figure 10). Both of these positions appeared as significant with the GEMMA and pyLMM algorithms, suggesting they are not an artifact of the mapping method. Eight strains have different alleles at the two different peaks (that is, they have a B allele at the first peak and D allele at the second peak or a D allele at the first peak and a B allele at the second peak; Figure 10 upper rows). This suggests that the two peaks are two independent QTL, as it is not simply a continuation of the same allele with, for example, an unknown or heterozygous region between which reduces the LRS score. However, the two peaks have overlapping 1.5 LOD drop confidence intervals, meaning we cannot say with certainty whether they represent two loci or one. This longest potential confidence interval runs from 40.7 to 73.4 Mb. Having the D allele at this position increases the trait value (Figure 10; those strains with a green row at the top of the figure, representing the strain having the D allele at that position, tend to have a higher call frequency than those strains with a red row at the top of the figure, representing them having the B allele at that position).
The second QTL alters the same traits, and is located on Chr 2, with a remarkably short confidence interval from 179.5 to 180.8 Mb (Supplementary Figure 10). This locus is also confirmed using GEMMA and pyLMM. The D allele increases the trait value (strains with a D allele at this position tend to have a higher call frequency than those with the B allele at this position; Supplementary Figure 10).
The final QTL is on Chr 8, and is significant for peak amplitude on P7 (GN 16357), suggestive for peak amplitude for all days combined (GN 16603), and for peak amplitude on P8 (GN 16358; Supplementary Figure 5). The 1.5 LOD confidence interval is from 3.5 to 16.7 Mb (Supplementary Figure 11). As with the other two loci, the D allele increases trait values, and the QTL is again significant using GEMMA and pyLMM.

Candidate Gene Identification
For each of the QTLs identified above, all variants within the 1.5 LOD confidence interval were examined as potential candidate genes underlying the phenotype. The QTL on Chr 14 includes 407 genes, 237 of which have nsSNPs or InDels. Due to this large number, it is difficult to identify candidates with confidence, however, we highlight some genes of interest for future study in our Supplementary Material. For the remaining two QTL we were able to identify a small list of candidate genes The QTL on Chr 2 contains 44 genes, 31 of which have InDels or nsSNPs. These 31 genes are therefore candidates, as they contain variants which may affect their expression or function. To prioritize which of these 31 genes are most likely to be the gene underlying the phenotype we examined additional information: 8 of the 31 genes (Ss18l1, Gtpbp5, Osbpl2, Lama5, 1600027N09Rik,  Ogfr, Tcfl5, Dido1) have a cis-eQTLs in transcriptome data, indicating that variants within the gene alter its expression; expression of 4 of these 31 genes (Gtpbp5, Slco4a1, Ogfr, Slc17a9) correlate well with at least one of the peak frequency phenotypes, which suggests that altering expression of that gene may alter expression of the phenotype; and 12 of the 31 genes (Cdh4, Taf4, Psma7, Ss18l1, Osbpl2, Adrm1 (Rpn13), Lama5, Ntsr1, Col9a3, Dido1, Slc17a9, Bhlhe23) are associated with brain, behavior, or cranio-facial morphology in previous datasets, e.g., in knockout animals, which suggests that the gene acts in tissues relevant to  Table 3). No gene appears in all three lists, and so it is difficult to narrow the list further from these 17 genes (i.e., 17 genes appear in at least one of the three lists). The QTL on Chr 8 contains 185 genes, however, only 27 of these have InDels or nsSNPs. Therefore, we concentrate on these 27 genes as candidates and they were examined for the same information as the Chr 2 QTL above: 6 of the 27 genes (Abhd13, Dlgap2, Arhgef10, 2900016B01Rik, Kbtbd11, Csmd1) have a cis-eQTLs in transcriptome data; expression of 6 of these 27 genes (Dlgap2, Arhgef10, 2900016B01Rik, Kbtbd11, Myom2, Csmd1) correlate well with at least one of the mean peak amplitude phenotypes; and 11 of the 31 genes (Fam155a, Abhd13, Col4a1, Col4a2, Arhgef7, Tfdp1, Dlgap2, Cln8, Arhgef10, Myom2, Csmd1) are associated with brain, behavior, or cranio-facial morphology FIGURE 6 | Density plots for the four quantitative traits in the BXD family on postnatal days 7, 8, and 9. Density plots were produced in R using the density function in the stats package and plotted using the ggplot2 package. Values were converted to z-scores, to aid in the comparison between plots. Density is shown in gray. The red line on each plot represents the normal distribution, whereas the red and blue bars represent the mean values for the D2 and B6 parents, respectively.

USVs (Supplementary
in previous datasets (Supplementary Table 4). Dlgap2, Arhgef10, and Csmd1 appear in all three of these lists and are therefore are our mostly highly ranked candidates for altering mean peak amplitude of infant USVs. Further details of candidate genes can be found in the Supplementary Material.

DISCUSSION
Essentially all aspects of infant vocalization display striking and heritable differences. In this study, we have initiated a genetic dissection of these innate vocalizations with the ultimate goal of uncovering genes, sequence variants, and brain circuitry that underlie critical and innate behaviors. We computed heritability for 14 facets of vocalizations at a very early stage of development, equivalent to the perinatal period in humans (Workman et al., 2013) 10 . We also evaluated the role of parent-of-origin effects on these behaviors. Our main findings are that the heritability for 10 www.translatingtime.org qualitative USV traits-the use of specific syllables -is generally higher than those of quantitative traits such as amplitude, duration, vocal rate, and fundamental frequency. However, the heritability of all traits was sufficiently high to warrant QTL analysis. Parent-of-origin effects were modest. Using a large cohort of BXD strains, we were able to map three QTLs for major quantitative features of infant vocalization to Chrs 2, 8, and 14.

Heritability
We found that heritability varies significantly among quantitative and qualitative vocalization traits. Given the core function of these vocalizations in survival, it is probable that the underlying genetic mechanisms and CNS circuitry are conserved among mammals (Moore et al., 1997;Lingle and Riede, 2014). Heritability is an estimate of the fraction of variation of a trait generated by genetic factors in a specific population and environment. Heritability should not be confused with biological importance, and in fact, many of the most vital traits have low heritability due to intense stabilizing selection FIGURE 7 | Normal probability plot for frequency of call on day 7, produced using GeneNetwork.org (Trait ID 16359). The two 'jumps' representing major effect loci are circled. ( Price and Schluter, 1991). Crying is one of the most instinctive and innate behaviors and arises spontaneously, and unlike more plastic and learned behaviors, we have good reasons to suspect that infant vocalizations should be under strong stabilizing selection and consequently to have low heritability. Heritability was estimates using an ANOVA model, and this was repeated 100,000 times using random sub-samples of 95% of the data. Max heritability is the large heritability estimate in the 100,000 replicates and min heritability is the lowest heritability estimate in the 100,000 replicates.
Heritability of quantitative USV traits are indeed low: from 0.1 for vocal rate, 0.16 for fundamental frequency range, 0.30 for amplitude, to 0.5 for duration (Supplementary Table 1). Our results are concordant with previous work that focused on some of these traits (Hahn et al., 1997;Thornton et al., 2005) with heritabilities ranging from 0.1 to 0.35. The agreement is impressive despite methodological differences. We analyzed both quantitative and qualitative USV traits using a significantly longer sample period (3 min versus 18 s) and did not stimulate pups by cold temperatures or rough handling. The low heritability of call rate suggests that this attribute is under the strongest stabilizing selection.
In contrast, the higher heritability for call duration implies that this trait accommodates to greater variation without affecting growth and, ultimately, reproductive fitness. It would be of interest to determine if this finding generalizes to wild populations. Variation in syllabic call patterns also has a relatively high heritability and may not affecting viability. However, call syllabic pattern may play a role for kin recognition and quality of the maternal behavior (Brunelli et al., 2015), and there is evidence that vocalizations play an important function in the mutual recognition of mothers and offspring (Mogi et al., 2017). Mice show communal nesting behavior (Heiderstadt et al., 2014;Weidt et al., 2014), and there is a fitness advantage to being able to distinguish one's own offspring in a communal nest. In the extreme case of Mexican freetailed bat, mothers can recognize calls reliably in crèches with up to 4,000 pups per square meter (McCracken and Gustin, 2010).
We show that certain vocalization parameters, such as rate, are likely to be under strong stabilizing selection. This has been previously observed in a study of zebra finch, in which Forstmeier et al. (2009) demonstrated that heritability of nonlearned calls produced by females was considerably higher (0.4) than those of learned songs produced by males (0.1), indicating strong stabilizing selection for male song (Forstmeier et al., 2009).

Parent-of-Origin Effects
Parent-of-origin effects are defined as persistent differences in phenotypes of offspring caused by differences attributable strictly to parental genotype or phenotype. Reciprocal B6D2F1 and D2B6F1 female pups are genetically identical, including their X chromosomes. Only their mitochondrial genomes differ. However, the polarities of parents are reversed (Figure 1). Differences between reciprocal F1 female pups are therefore almost entirely due to parent-of-origin effects, genomic imprinting, or sampling error. Using our diallel cross design we have been able to detect these potential parent-of-origin effects, and detect a potential effect on peak amplitude of call.
Traits in one individual may be influenced by gene variants of other individuals, and these are referred to as social genetic effects or IGEs. The most obvious source of IGEs are PGEs, whereby the genotype of parents influences the phenotype of offspring. Siblings and other conspecific, and even parasites, pathogens, and predictors can be a strong source of IGEs (Baud et al., 2017). In contrast, genomic imprinting is due to epigenetic changes within the individual causing differential gene expression characterized by either complete or partial silencing of one parental allele (Barlow, 2011;Abramowitz and Bartolomei, 2012;Ashbrook and Hager, 2013). As both mothers and fathers had contact with the pups in our study, our observed PGEs could come from either parent.
Among quantitative USV traits only peak amplitude of call displayed a possible parent-of-origin effect. For call number, call duration, mean peak frequency, and all morphological traits, there were no significant parent-of-origin effect in reciprocal F1 females. In contrast, Thornton et al. (2005), detected maternal influences on rate, frequency, and duration of calls. The relative contributions of genetic background and early environmental factors on calling behavior in C57BL/6JOlaHsd and C57BL/6NCrl were studied by using an embryo-transfer procedure (Wöhr et al., 2008). Among several traits-call number, total calling time, call duration, peak frequency, peak amplitude, and frequency modulation-only amplitude was FIGURE 9 | Violin plot of heritability estimates in 100,000 random subsamples 95% of the data for the four quantitative traits on postnatal days 7, 8, and 9. Quartile intervals are shown by black lines, and days are represented by red, green, and blue fill, respectively. dependent on the genotype of the mother (Wöhr et al., 2008), in agreement with our findings.
The parental B6 and D2 mouse strains differ in their adult anxiety-related behavior. D2 has higher basal anxietylike behavior than B6 (Mozhui et al., 2010). Differences in parental anxiety-related behavior may contribute to PGEs. A hypothetical anxiety-related parental effect in modulation of peak amplitude is of interest because it has been demonstrated that call amplitude can be reduced by anxiolytic drugs (Insel et al., 1986).

Sources of Acoustic Variation
We found differences between B6 and D2 strains in four important acoustic features: (a) mean fundamental frequency is higher in B6, (b) sound amplitude is higher in D2 USVs, (c) ultrasonic calls are longer in D2 and (d) D2 pups vocalize at a higher rate (Figure 2). Which morphological or physiological characteristics could cause these differences? Our morphological results indicate that the B6 pups are larger in overall size than D2. Body mass, femur length, and skull dimensions are larger in the B6 (Figure 5). We know that USVs are produced by a laryngeal whistle mechanism, but larger body size does not necessarily translate to lower fundamental frequency. In order to produce an ultrasonic whistle, the glottal airflow interacts with the edge of the entrance hole into a laryngeal airsac ("ventral pouch" hereafter) which is located downstream from the vocal folds (Riede et al., 2017). The ventral pouch is entirely housed inside the laryngeal lumen surrounded by the thyroid cartilage (Riede et al., 2017). The fundamental frequency of an ultrasonic call is determined by two variables, (a) the distance between vocal folds and the ventral pouch edge, and (b) lung pressure (Riede et al., 2017). Both variables are precisely controlled by rodents (Riede, 2013). The fundamental frequency differences between B6 and D2 are therefore most likely explained by either of three variables: (a) laryngeal size difference, (b) laryngeal control differences, and/or (c) by different breathing movements. The greater thyroid cartilage size in B6 (Figure 5) would suggest a larger vocal fold-ventral pouch distance and this should result in lower fundamental frequencies, however, average fundamental frequency is higher in B6. Our findings therefore suggest that the difference in fundamental frequency that we see is caused by breathing movements.
In mammals, sound amplitude is predominantly controlled by lung pressure and secondarily by laryngeal configuration as well as interactions between the laryngeal sound source and the vocal tract filter (Riede and Brown, 2013). In rodent USVs, increased lung pressure seems to be related to an increased sound intensity (Riede, 2011), and control of breathing is essential for infant vocalizations (Hernandez-Miranda et al., 2017).
Longer call durations and a higher vocal rate can only be explained by differences in breathing movement control during vocalization. Cries are produced during the exhalatory phase of breathing (Riede, 2013;Sirotin et al., 2014). Previous work has shown different breathing patterns between mouse strains, including the B6, but the relationship between breathing movements and vocalization frequency has not been fully elucidated (Yamauchi et al., 2008;Chai et al., 2011). A dystonic rat model shows a close association between an overall slower breathing rate and different vocal characteristics compared to wild-type (Riede et al., 2015). Interestingly, rat pups that were selectively bred for high call rates of isolation-induced vocalization, produced calls of significantly increased amplitude but those pup with lower vocal rates also produced calls shorter in duration (Spence et al., 2016), suggesting that these vocal parameters are highly interrelated. A detailed study of the relationship between lung pressure, glottal configuration and rodent USV is necessary, and future studies should include respiratory measurements in the B6 and D2 strain and their hybrids in order to understand how differences in breathing control may translate into vocal differences.
How much are differences in brain development associated with acoustic variation between mouse strains? The brain is not only essential for controlling the breathing characterizes described above, but also for the ability to vocalize in response to appropriate stimuli. Selective breeding for certain traits, for example high call rates, also alters functions of brain systems and emotional behaviors throughout life (Spence et al., 2016). Candidate brain areas include the cerebellum, which plays an important role in vocalization and articulation. In mice, the cerebellum has been shown to be critically involved in generation of USVs (Fujita et al., 2008;Riede et al., 2015) whereas, in humans, the cerebellum is important for speech and language (Vias and Dick, 2017). Another important area is the nucleus tractus solitarius in the hindbrain (Hernandez-Miranda et al., 2017), which has been shown to be essential for infant vocalizations in mice.
Morphological features were only measured in the diallel cross, and not in the BXD lines. However, future studies will be able to employ a battery of physiological methods to understand strain-specific vocal motor control (Hegoburu et al., 2011;Riede, 2011) and new imaging techniques to determine shape differences in the vocal organ (Riede et al., 2017). This highlights one advantage of using inbred lines such as the BXD-future phenotypic measurements can be combined and correlated with the phenotypes we have collected, without all phenotypes needing to be measured in one individual or cohort.

Genetic Results
We identified three loci influencing call fundamental frequency and amplitude. Each covers a region of the genome containing from 44 to 407 protein-coding genes. The great majority of studies concerning the genetics of USVs have exploited gene knockouts, and knockouts of FoxP2, Cadherin-6, Fmr1, Drd2, and vasopressin receptor genes are already known to produce abnormal USVs (Shu et al., 2005;Scattoni et al., 2008a,b;Wang et al., 2008;Nakagawa et al., 2012;Roy et al., 2012;Curry et al., 2013). However, our data suggest that vocalization is a complex trait, influenced by many variants of small effect. Therefore, although knockout studies give us insight into potential mechanisms by which genes can influence infant USVs, they are likely to have a much larger effect that variants causing natural variation within a population, for example, the QTLs for adult male USVs recently mapped in the BXD family . Ehret (2005) identified three gene categories associated with pup USVs: (1) those that contribute to the development of sensory and perceptual systems (input), (2) those involved in regulation of emotion and motivation (processing), and (3) those linked to respiration and laryngeal control (output). Consequently, we concentrated on candidate genes known to be involved in (1) the sensory system, especially hearing, (2) neuronal development and morphology and (3) cranio-facial development and morphology. To find some of the natural variants underlying infant USVs, we performed QTL analysis on infant traits in 41 BXD strains. We mapped number, amplitude, frequency and rate of USVs on postnatal days (P)7, P8, and P9. We detected three QTLs which were significant for at least one of the traits, and found candidate genes within each which fell into the categories described above. We discuss some of these candidates in Supplementary Material.

Limitations and Future Work
As with all experiments, there are limitations to our study which provide areas for future research and ongoing investigation. FIGURE 10 | QTL map of the Chr 14 QTL for trait 16359, frequency of calls on postnatal day 7. In the top section, the location of genes are shown by yellow and purple blocks. Below this, the distribution of haplotype blocks in the 41 BXD strains is shown. Green represents the D allele, whereas red represents the B allele. It is clear that the majority of strains with high values (i.e., a high call frequency, right of the top section) have the D allele (green), whereas the majority of strains with the B allele (red) have low values (i.e., a low call frequency, right of the top section). In the bottom section, the blue line represents the genome scan, showing the likelihood ratio statistic (LRS) associated with each marker across the locus. The top, pink, line marks genome-wide significance (genome-wide p ≤ 0.05), the lower, gray, line the suggestive significance threshold (genome-wide p ≤ 0.63). The green line shows the additive coefficient, showing that the DBA/2J alleles increase trait values. The green axis on the right shows by how much the respective alleles increase trait values. The yellow bars represent the bootstrap values, with the left edge of each bar showing where the peak fell in each of the 2,000 resamples of the data.
A significant limitation is that our qualitative analysis (i.e., syllabic structure) was only carried out on a single postnatal day. This was due to the significant manual labor that was necessary for this analysis by a highly trained investigator, which made the analysis low throughput. Therefore, any conclusions we draw from this qualitative analysis should not be generalized to represent the entire postnatal period. Future work will make use of new, high-throughput, less labor intensive systems, such as the MUPET-Mouse machine learning system (Van Segbroeck et al., 2017;Knoll et al., 2018). These new methods will not only allow us to expand the number of days analyzed, allowing a longitudinal analysis throughout early life, but also the number of strains. The BXD population now consists of over 150 strains, which would allow greater precision of mapping, and potentially > 22,000 different reciprocal crosses to map parentof-origin effects. In light of this, our qualitative analysis is only a first step in a fuller investigation, expanding both longitudinally and laterally.
Similarly, our morphological analysis was only carried out in the diallel cross. This allowed us to demonstrate that morphological features which may be associated with vocalization can be detected and analyzed, but does not allow us to perform any genetic analyses. In future studied, we hope to be able to do this analysis in the BXD population, allowing direct correlation between our findings and morphological features, potentially detecting genomic variants underlying both.
Another large contributor to variance was litter effects. Litter effects encompass a large number of different aspects, from vivarium-related factors (e.g., position in the rack) to social effects (e.g., the total weight of the litter) to maternal effects (e.g., weight of the mother). It is therefore unsurprising that for some traits they can account for up to 51% of the variance (amplitude; Supplementary Table 1). These are all obviously interesting variables in themselves, and worthy of their own deeper investigations. However, because we used the mean of strain values we are able to calculate strain/genotype means, which reduce the 'noise' from strain effects and allow us to map QTL. This is reflected in the H 2 RIx where the number of strains is used to reduce the impact of environmental and stochastic developmental sources of variance (Belknap, 1998).
In our analysis of the BXD population, parent-of-origin effects, particularly maternal effects, are lost within this overarching litter effect. It is only in the diallele cross, and looking at the reciprocal F1 females, that we are able to identify definitive parent-of-origin effects, and even here we are not able to narrow these down further to specific types of parent-of-origin effects. To truly tease apart pre-natal IGEs, postnatal IGEs, imprinting, and other postnatal litter effects a large multicomponent study would be required. In this study, mothers of every strain would have eggs (or embryos) transferred from mothers of every other strain (allowing prenatal parent-of-origin to be identified), and then each of these mother-offspring strain combinations would have to be cross-fostered postnatally to every maternal strain (to allow identification of postnatal parent-of-origin effects, and the epistatic interactions between pre-and post-natal effects). Something similar to this has been done on a much smaller scale by Cowley et al. (1989), although they did not investigate a sufficient numbers of strains to be able to do a genetic analysis. To tease apart strain effects and maternal effects, a home-cage monitoring system would be required to monitor the behavior of all pups, their interactions with each other, and their interactions with the foster mother. Again, perhaps a machine learning approach would allow a feasible analysis of this much home cage data, and to co-analyze this with USV data.
We did not see a correlation between our USV data and maternal care data previously collected in the BXD family (Ashbrook et al., 2015a. Although this may be due to simple logistical differences, i.e., only using ∼18 strains in common between the two experiments, it could also be due to biological reasons. As mentioned above, it is well recorded that the D2 strain have hearing loss at a relatively early age (Cohen-Salmon et al., 1985;Zheng et al., 1999), and therefore approximately half of the BXD population would be expected to have this early age hearing loss. This may result in co-adaptation, such that strains which have hearing loss use other, nonvocal, methods to solicit maternal provisioning. On-going work is investigating hearing in the entire BXD population, and future studies should combine analysis of infant USVs, maternal hearing, and mother-offspring interactions to provide a full picture.

CONCLUSION
We have produced heritability estimates for several features of infant vocalization in mice, and defined QTLs underlying some of these core innate behaviors. All contain plausible candidate genes (Supplementary Material), but the ultimate aim of this strategy is to systematically move from QTLs, to genes, to mechanisms, and, potentially, translation to humans. This is obviously a long and complex process, but here we have taken the first step in defining phenotypes and identifying linked genomic regions. These regions will need to be confirmed with other methods, such as complimentary mouse populations, or, the potentially more useful method, reverse genetics, such as using CRISPR to insert the B or D allele in otherwise genetically identical strains. If large GWASs of infant vocalizations are carried out in humans, then they would provide an excellent dataset for comparison with this work, as we have done for other traits (Ashbrook et al., 2014(Ashbrook et al., , 2015bWilliams and Auwerx, 2015;Huang et al., 2018, unpublished).
Understanding infant vocalizations in mice is not just important for understanding the evolution of mother-offspring interactions, but also in understanding vocalization in human infants, perhaps with applications to the understanding and early identification of developmental disorders, such as autismspectrum disorders.

AUTHOR CONTRIBUTIONS
TR, MS, DH, LL, and RW contributed to the conception and design of the study. SR, BC, and TR collected the data. BC, TR, and DA performed the statistical analyses. DA and SR wrote the first draft of the manuscript. SR, BC, TR, DA, and RW wrote sections of the manuscript. MS and TR contributed to the interpretation of the data. All authors contributed to manuscript revision, read and approved the submitted version.

ACKNOWLEDGMENTS
We thank the support of the UT Center for Integrative and Translational Genomics, the UTHSC Neuroscience Institute and College of Medicine as well as funds from the UT-ORNL Governor's Chair, NIAAA U01 AA013499 and U01 AA016662.