Analogies of human speech and bird song: From vocal learning behavior to its neural basis

Vocal learning is a complex acquired social behavior that has been found in only a few animal groups. Animal vocal learning requires sensorimotor function: by receiving external auditory input and repeatedly practicing vocal imitation, the animal eventually forms a stable pattern of vocal output. In parallel evolutionary branches, humans and songbirds share striking similarities in vocal learning behavior. For example, their vocal learning processes involve auditory feedback, complex syntactic structures, and sensitive periods. At the same time, both have evolved hierarchically organized, specialized forebrain regions for vocal motor control and vocal learning, which are closely associated with the auditory cortex. Comparisons of the location, function, genome, and transcriptome of vocal learning-related brain regions have confirmed that the neural control pathways for songbird singing and human language are analogous to some degree. These common characteristics make songbirds an ideal animal model for studying the neural mechanisms of vocal learning behavior. The neural process of human language learning may be explained through similar neural mechanisms, which could provide important insights for the treatment of language disorders.


Introduction
Vocal learning is a rare behavior in which an animal learns to replicate conspecific, heterospecific, or even artificial sounds through repeated neural activity linking auditory input to vocal motor output (Janik and Slater, 2000; Jarvis, 2007). Vocal learning serves to communicate acoustic information between individuals, including conspecific recognition, information transmission, deceptive hunting, etc. (Carouso-Peck et al., 2021). Human language was once thought to be the only form of complex vocal learning behavior among animals. Subsequent research has shown that a few mammals and some birds (typically songbirds) have vocal learning behaviors similar to human speech (Le Boeuf and Peterson, 1969; Ralls et al., 1985; Poole et al., 2005; Jarvis, 2007; Janik, 2014).
Although mammals and birds belong to different evolutionary lineages, there is growing evidence that the vocal learning processes of the two groups are highly similar (Jarvis, 2004, 2019; Pfenning et al., 2014; Gedman et al., 2022). Comparison of the brain regions and neural pathways associated with vocal learning in mammals and birds (mainly songbirds) has led to the gradual emergence of evolutionary pathways of vocal learning behavior across species (Jarvis, 2004; Bolhuis et al., 2010; Lipkind et al., 2013). We define the similarities between songbird song and human speech, from vocal learning behavior to its neural basis, as analogies. Comparisons of the location, function, and gene expression profiles of the brain regions suggest that the songbird song control pathway and the human language-related pathway are analogous to some degree (Jarvis, 2007; Pfenning et al., 2014; Gedman et al., 2022). In addition, songbird song and human speech show features of convergent evolution, which makes the complexity of vocal learning in songbirds and humans comparable (Corballis, 2009; White, 2010). Songbirds have therefore become an ideal model for the study of vocal learning behavior and can provide an important reference for studying the mechanisms of human language acquisition and the treatment of language disorders. In this article, we provide an overview of vocal learning behavior and neural control pathways in animals, especially the evolutionary analogy between human language and songbird song.

Types of animal vocalizations
Animal vocalizations can be divided into two types. One is the innate call, such as rooster crowing and human laughter and crying, which is controlled by species-specific vocal motor nuclei in the brainstem (including the midbrain and medulla oblongata) without the involvement of auditory feedback. All vocal vertebrates, from fish, amphibians, reptiles, and birds to mammals, including humans, share similar brainstem vocal control pathways (Jarvis, 2007; Vergne et al., 2009; Feng and Bass, 2016; Kelley et al., 2020). The other type is learned vocalization, which is produced by specific vocal control structures in the forebrain that regulate the brainstem vocal motor nuclei via neural projections. Two parallel neural pathways are responsible for forebrain regulation of vocalization: the limbic vocal control pathway controls innate non-verbal and emotional vocalization; the laryngeal motor cortical pathway regulates fine motor control of voluntary vocalization, such as speaking and singing, as well as the spontaneous production of innate vocalization (Ludlow, 2005; Simonyan and Horwitz, 2011).

Vocal learning behavior in mammals, including humans
Species with vocal learning behaviors differ greatly in their ability to imitate and modify sounds. For example, bats, which are small mammals, use complex vocalizations including isolated calls, courtship calls, and territorial calls to support echolocation and social behavior, and they can adapt echolocation and social calls containing individual and gender information to their social environment (Vernes, 2017). Some large mammals are also capable of vocal learning; for example, elephants, sea lions, and seals that have been kept in captivity for a long time can learn simple human language (Le Boeuf and Peterson, 1969; Ralls et al., 1985; Poole et al., 2005). In addition, among marine mammals, cetacean calls and dolphin whistles have their own specific frequencies, and these acoustic signals are used to maintain contact between individuals when they are separated (Janik, 2014). Recent research on one rodent species has shown that naked mole-rats (Heterocephalus glaber) from different colonies or regions can produce sounds carrying unique group information, similar to dialects (Barker et al., 2021). More evidence is needed, however, to establish whether naked mole-rats are capable of vocal learning.
In primates, human vocal learning is undoubtedly the most complex and one of the most important behavioral bases of human language (Tyack, 2020). Through vocal learning, humans can imitate individual and continuous sounds and adjust the pronunciation by auditory feedback system (Tyack, 2020). Thus, human language is an auditory-directed vocal learning behavior, which is a hallmark function that distinguishes humans and other vocal learners from vocal non-learning animals, including primates (Hurford, 2003). However, the evolution of human language does not appear to have involved any single evolutionary mechanism unique to humans (Locke and Bogin, 2006). At the same time, human language, including spoken and signed language, can be regarded as a gesture system, i.e., a way of communicating specific information through visible body and hand movements (Liebal and Call, 2012). In fact, primate gesture systems are so well developed that they can generate and perceive hand movements. Therefore, human language is thought to have evolved from the gesture system, in which simple words were expressed by gestures at the beginning, but as communication became more frequent, complex spoken words replaced gestures as a new form of communication (Rizzolatti and Craighero, 2004). This hypothesis about the gesture system is consistent with Jarvis's hypothesis that spoken language and sign language are equivalent to speech and signing respectively (Jarvis, 2019).

Similarities between bird song and human language
Research in the 1950s established that bird singing is a learned behavior (Nottebohm, 2014). Young birds need to imitate and practice the songs of their parents to form their own songs with complex acoustic characteristics. Vocal learning birds include parrots, songbirds, and hummingbirds. Songbirds in particular, whose songs serve mainly territorial defense and courtship (Thoms and Jürgens, 1987; Langmore, 1998; Rogers et al., 2006), have been widely used as model animals for studying the neural mechanisms of learned vocalization (Kao et al., 2005). A songbird's song usually consists of several syllables, which form a fixed or variable pattern of syllable combinations (Mooney, 2022). The process of song learning in juvenile songbirds is similar to that of language learning in human infants (Figure 1), and it also requires the participation of auditory feedback. It can be divided into a sensory stage (storing the learned song or language in the brain through the interaction between innate factors and the environment) and a sensorimotor stage (refining the template song or language for output) (Prather et al., 2017). Moreover, in both humans and songbirds, the ability for auditory-vocal learning is acquired most rapidly during a critical early juvenile stage, the sensitive period (Doupe and Kuhl, 1999; Brainard and Doupe, 2002). The best time for human children to learn their mother tongue is between 6 and 12 months of age, when they begin to understand the external language and learn pronunciation (Hurford, 1991). In songbirds such as zebra finches (Taeniopygia guttata), juveniles learn the songs of their relatives during the sensitive period and gradually develop their own personalized and lifelong repertoire (Prather et al., 2017). Another important feature of human language is flexible control over complex syntactic structures, such as the repeated reordering of a set of words. Another songbird, the Bengalese finch (Lonchura striata domestica), can control the ordering of its syllables, which is similar to human control of the syntactic structure of language (Veit et al., 2021). On the basis of these features, songbird song behavior has a high degree of similarity with human language function. Surprisingly, a recent study found that the Australian musk duck (Biziura lobata), a member of the order Anseriformes, also shows vocal learning behavior, which provides more diversified information for deciphering the evolution of human language (Ten Cate and Fullagar, 2021).

Neural structures of controlling vocal learning behavior
The neural control of vocal learning behaviors does not rely on a single pathway or even a single brain region but is accomplished through the collaboration of related brain regions forming different pathways.
Neural control pathways of vocal learning behavior in mammals, including humans

The limbic vocal control pathway
The limbic vocal control pathway in mammals, including primates, mainly controls innate vocalizations such as calls, crying, and laughter. The periaqueductal gray (PAG) plays a central role in this pathway, as evidenced by the fact that damage to PAG results in complete vocal inability in cats, monkeys, and humans (Adametz and O'Leary, 1959; Jürgens and Pratt, 1979; Esposito et al., 1999). PAG receives strong projections from the limbic system, anterior cingulate, insula, and orbitofrontal cortex. At the same time, PAG sends strong projections to the nucleus ambiguus (Am) (Figures 2A, B). Am is the only motor neuron group that is directly involved in vocalization and can innervate the soft palate, pharynx, larynx, diaphragm, intercostal muscles, and abdominal muscles, which determine the intra-abdominal, intrathoracic, and subglottic pressures; control of these pressures is necessary for vocalization (Holstege and Subramanian, 2016).

The laryngeal motor cortical pathway
In order to combine individual articulation into sentences, human speech requires the involvement of the laryngeal motor cortical pathway. The human laryngeal motor cortex (LMC), located ventral to the primary motor cortex, is responsible for fine motor control of voluntary vocalization such as speech and singing, as well as regulating the spontaneous production of innate vocalization (Simonyan and Horwitz, 2011). Natural and fluent speech requires flexible control of pitch and pronunciation. In humans, this voice control function is distributed in two LMC subregions of each hemisphere, the dorsal LMC (dLMC, located between the cortical representations of the lips and the hands) and the ventral LMC (vLMC, occupying parts of the subcentral gyrus and the rolandic operculum) (Bouchard et al., 2013;Pfenning et al., 2014;Neef et al., 2021). In particular, the tone modulation of speaking and singing is thought to be mainly controlled by dLMC (Dichter et al., 2018). Studies of persistent stuttering symptoms have shown that the cause of speech fluency disorder is the loss of white matter in the left vLMC, resulting in the separation of vLMC and the left lateral language area (Sommer et al., 2002), which indicates that vLMC mainly controls the fluency of speech, and further supports the conclusion that vLMC is functionally separate from dLMC (Neef et al., 2021).
The laryngeal motor cortical pathway runs from dLMC/vLMC to Am, which coordinates laryngeal muscle movements and respiratory rhythm to precisely control vocalization (Iwatsubo et al., 1990). dLMC/vLMC can also project to PAG and indirectly send commands to Am through the limbic vocal control pathway (Figure 2B; Simonyan and Horwitz, 2011). Bilateral damage to or disease of LMC leaves the patient unable to speak or sing but does not affect non-verbal vocalization, such as crying and laughing (Jürgens, 2002), indicating that LMC is not essential for the production of innate vocalizations but is critical to human spoken vocalization. Unlike human LMC, which is located in the primary motor cortex, non-human primate LMC is located in area 6 of the premotor cortex (it is proposed to be premotor vLMC, similar to human vLMC) (Simonyan, 2014). This difference deserves special attention and may represent the evolutionary direction toward voluntary vocalization in humans (Simonyan and Horwitz, 2011).
In primates, another connection between LMC and the limbic vocal control pathway exists at the brainstem reticular formation (RF) (Figure 2A), particularly in the dorsal and parvocellular reticular nuclei of RF, which further forms direct connections with laryngeal motor neurons in Am, joint motor neurons in the trigeminal motor nucleus, the facial nucleus, the hypoglossal nucleus, and expiratory motor neurons in the thoracic and upper lumbar spinal cord (Thoms and Jürgens, 1987). Because the lack of direct projections from LMC to Am in non-human primates reduces the ability to directly modulate the activity of brainstem laryngeal motor neurons, the functional properties of RF are more important to vocal motor control in non-human primates than in humans (Iwatsubo et al., 1990).

Human language learning pathway
In addition to the vocal motor pathway (VMP), human language learning, including the memory of vocabulary and grammar, relies on an additional forebrain pathway, the cortex-striatum-thalamus loop. This loop consists mainly of the motor language center Broca's area (in the posterior half of the left inferior frontal gyrus), the anterior striatum (ASt), and the anterior thalamus speech area (aT) (Figure 2B), and this language learning pathway is considered to be unavailable to non-human primates (Buckner et al., 1999; Jarvis, 2004; Gajardo-Vidal et al., 2021). Voluntary production of words and sentences through the motor cortex requires a large amount of memory and involves the activity of a large number of neurons, many of which are located in Broca's area (Holstege and Subramanian, 2016). Broca's area is responsible for language acquisition and high-level spoken language function, plays an important role in understanding and producing complex grammar and other language functions, and is a key node for manipulating and transmitting neural information in the large cortical network responsible for key components of language generation (Davis et al., 2008). Broca's area is associated with several linguistic processes, including syntactic processing and unification, which involve the segmentation and concatenation of different types of linguistic information (Burton et al., 2000; Friederici, 2002). Although reading and repeating individual words does not involve semantic and syntactic processing, it does require the association of syllable sequences and motor gestures. Studies have shown that this association is coordinated by interactions between Broca's area, the temporal cortex, which processes auditory information, and the frontal cortex, which is responsible for motor function (Flinker et al., 2015). Broca's area is also interconnected with LMC, so LMC receives instructions from Broca's area (Flinker et al., 2015). However, it was recently reported that damage to Broca's area alone does not affect long-term speech production after left frontal stroke, whereas persistent speech production impairments can result from combined damage to Broca's area and its adjacent white matter (Gajardo-Vidal et al., 2021).
Meanwhile, as a core component of human motor skill learning, ASt receives signal input from Broca's area and remains activated for learning new words during the process of learning mother tongue in early childhood and second language in adulthood, indicating that ASt plays a key role in the process of language learning and memory (Simmonds et al., 2014). And the role of aT in speech, in addition to affecting the clarity of expression, may involve the mutual coordination of respiration and speech production (Bhatnager et al., 1989). Interestingly, the thalamus showed increased activity of predominantly left-sided neurons in response to language (Gogolitsin and Nechaev, 1990), consistent with the left-sided brain characteristic of human language.

Figure 2. Neural pathways involved in innate vocalization and vocal learning. (A) Vocalization-related pathways in non-human primate chimpanzees (based on Kaas, 2012; Simonyan, 2014): the limbic vocal control pathway in gray; the laryngeal motor cortex pathway in blue. (B) Vocalization-related pathways in humans (based on Jarvis, 2007; Simonyan, 2014; Neef et al., 2021): the limbic vocal control pathway in gray; the laryngeal motor cortical pathway in blue; the language learning pathway in red. (C) Vocalization-related pathways in songbird zebra finches (based on Nottebohm, 1991; Jarvis, 2004): the innate brainstem vocal pathway in gray; the vocal motor pathway in blue; the anterior forebrain pathway related to song learning in red. The brain regions of the same color in songbirds and humans are analogous. The yellow part is the auditory system. LMC, laryngeal motor cortex; dLMC, dorsal LMC; vLMC, ventral LMC; RF, reticular formation; PAG, periaqueductal gray; Am, nucleus ambiguus; ASt, anterior striatum; aT, anterior thalamus speech area; HVC, used as a proper name; RA, robust nucleus of the arcopallium; LMAN, lateral part of the magnocellular nucleus of the anterior neostriatum; DLM, medial portion of the dorsolateral nucleus of the anterior thalamus; DM, dorsal medial midbrain nucleus; nXIIts, tracheosyringeal part of hypoglossal nucleus.

It is traditionally believed that the superior temporal regional cortex (the sensory language center Wernicke's area and its surrounding areas) is involved in the perception and memory of speech (Viceic et al., 2006). Although non-human primates also share homologous Broca-like and Wernicke-like areas with humans, damaging Broca-like area of monkeys and chimpanzees does not affect vocalization. The main reason is that their calls rely on the limbic vocal control pathway, rather than Broca-like area, which serves to understand gestures and facial emotions (Graïc et al., 2020). Meanwhile, Wernicke's area supports the brain to understand articulatory phonemes and sequences, a process necessary for language production, including repetition of pronunciation, word extraction and reading aloud (Binder, 2015). Moreover, understanding the grammatical relations of words in sentences is fundamental to human language and unique to humans (Hauser et al., 2002;Marslen-Wilson and Tyler, 2007).
The differences between humans and other primates in the language system lie not only in the function of brain regions but also in the location of the vocal organ, i.e., the larynx. In early childhood, the position of the larynx in humans is not much different from that in chimpanzees, but the human larynx rapidly descends into the neck in early juvenile life. The descended position of the larynx contributes to the development of the human respiratory and digestive tracts and the formation of language function (Nishimura, 2005; Nishimura et al., 2008). However, this may not be the main reason why chimpanzees and other primates lack human-like vocal learning behavior, because descent of the larynx is not unique to humans; the main reason may instead be differences in brain structure rather than differences in the anatomy of the vocal organs (Fitch et al., 2016; Boë et al., 2017; Fitch, 2018; Jarvis, 2019).

Neural control of instinctive vocalization in birds
The instinctive vocalization of all birds relies on a mammalian-like brainstem vocal control pathway, from the dorsomedial nucleus of the intercollicular complex (DM) in the midbrain to the tracheosyringeal part of hypoglossal nucleus (nXIIts) in the brainstem (Figure 2C), similar to the primate midbrain PAG and brainstem Am, respectively (Wild et al., 1997; Jarvis, 2004). The downstream projection pattern of DM in non-vocal learning birds and vocal learning birds is consistent with its role in respiratory-vocal regulation, and its neurons may project to both nXIIts vocal motor neurons and respiratory premotor neurons to realize the coordination between vocalization and respiration (Wild et al., 1997).

Vocal learning pathways in songbirds similar to humans
Comparative studies of gene expression patterns in adult animals and of molecular embryology indicate that avian and mammalian brains share an analogous cortex-basal ganglia-thalamus-cortex circuit associated with vocal learning behavior (Reiner et al., 2004; Jarvis et al., 2005). This suggests that avian and mammalian vocal behaviors have a similar neural structural basis. Hummingbirds show complex vocal abilities in social activities, but so far there has been no in-depth research on their vocal learning (Ferreira et al., 2006; Duque and Carruth, 2022). Parrots can also learn vocalizations, but knowledge of their learning process is still limited (Ten Cate, 2021). However, the richness of songbird species (passerine birds, about 4,000 species), the diversity of their vocal learning characteristics, and the ease of breeding and keeping them in captivity make songbirds the best-studied group of birds for vocal learning behavior (Ten Cate, 2021). The distinctive birdsong of white-crowned sparrows (Zonotrichia leucophrys) was first described by Marler and Tamura (1964). It was not until the 1970s that Nottebohm et al. (1976) from Rockefeller University discovered the neural pathways related to song vocalization and learning in the brain of canaries (Serinus canaria). Pfenning et al. (2014) compared the gene expression profiles of zebra finches and humans, showing that the telencephalon of songbirds is similar to the telencephalon of humans, that avian telencephalic subdivisions are similar to different subdivisions in mammals, and that the brainstem nuclei of songbirds also correspond to the brainstem nuclei of humans. In the telencephalon, the cortex of songbirds closely resembles the human cerebral cortex, and the striatum of songbirds corresponds to the human striatum. Surprisingly, much of the neurobiological knowledge of human vocal learning has been inferred from studies of songbirds (Saito and Maekawa, 1993; Doupe and Kuhl, 1999; Jarvis, 2004; Simmonds, 2015).
Most of the vocal control nuclei in songbirds are located in the cortex, and two nuclei are located in the striatum and thalamus respectively, forming two interrelated song control pathways. One is the VMP, the other is the anterior forebrain pathway (AFP), collectively known as the song control system (Figure 2C).

Analogy of vocal motor pathways in songbirds and humans
The accurate song of songbirds depends on the regulation of VMP, which consists of the song premotor nucleus HVC (used as a proper name) and the robust nucleus of the arcopallium (RA) in the telencephalon and nXIIts in the brainstem (Figure 2C; Marler and Doupe, 2000). HVC is not only the initiating brain region of VMP but also the main input source of AFP. It is responsible for encoding the motif song, receives input from the auditory system, and transmits the integrated auditory information to both RA and the striatal song control nucleus of AFP (Yu and Margoliash, 1996). These functions are performed by two groups of neurons within HVC that project to RA and the striatum, respectively. They play different roles in encoding song or regulating vocal plasticity, and their neural activity during singing also differs (Hessler and Okanoya, 2018). Recordings of local field potential (LFP) signals in male zebra finches during singing indicate that characteristic changes in the time-frequency structure of the HVC LFP may correspond to specific syllables in the motif song. In addition, the HVC LFP signal features are similar to LFP signals associated with motor control in mammals, including humans and non-human primates (Brown et al., 2021). Language-related premotor neural activity was found early on in human Broca's area by electrophysiological recordings (Fried et al., 1981; Jarvis, 2004), and this area also receives signal input from the temporal auditory cortex and transmits the integrated auditory information to LMC and ASt, respectively (Doupe and Kuhl, 1999; Bolhuis and Gahr, 2006; Bolhuis et al., 2010). The idea that songbird HVC shares some similarities with human Broca's area has thus been partially accepted. However, the comparison of lesion experiments suggests that the cortical nucleus in the songbird AFP is more analogous to human Broca's area (Jarvis, 2004).
Recent results from cellular transcriptomics have further revealed the evolutionary features of songbird VMP. Although HVC and RA are not homologous with the mammalian neocortex, their similarity in cell types and connection patterns suggests that VMP may have evolved to functionally resemble the mammalian neocortex (Colquitt et al., 2021). Further gene expression lineage analysis showed that the types of songbird HVC neurons are similar to those of human LMC layers 2-3, and that human LMC layers 2-3 neurons project to LMC layer 5, just as songbird HVC neurons project to RA (Pfenning et al., 2014; Jarvis, 2019; Gedman et al., 2022).
RA is another major song premotor nucleus in the songbird forebrain and encodes important acoustic features of birdsongs (Sizemore and Perkel, 2008). RA is also the nucleus where VMP and AFP intersect: it integrates the input from the upstream nucleus HVC and from AFP, relays the encoded information to the downstream nucleus nXIIts, and regulates the syringeal and respiratory muscles to produce song behavior (Simonyan and Horwitz, 2011). RA dorsal neurons project to DM and modulate respiration and vocalization (Wild et al., 1997); RA ventral neurons project to nXIIts, which modulates syringeal muscle movements and ultimately controls singing (Vicario, 1994). Functionally, both songbird RA and human LMC are vocal motor control brain regions, and damage to RA or LMC leaves songbirds or humans, respectively, unable to vocalize properly (Simonyan and Horwitz, 2011). Transcriptomic studies confirmed that songbird RA shares part of its gene transcriptional profile with human LMC (Pfenning et al., 2014; Gedman et al., 2022). Further gene expression lineage alignment showed that the types of RA neurons are similar to those of human LMC layer 5 (Pfenning et al., 2014; Jarvis, 2019; Gedman et al., 2022). Recently, it has been reported that RA projection neurons exhibit electrophysiological features similar to those of specialized large pyramidal neurons in mammalian primary motor cortex, such as robust high-frequency firing, ultra-narrow spike waveforms, superfast Na+ current inactivation kinetics, and large resurgent Na+ currents (Zemel et al., 2021). In addition, it has been shown that the acoustic characteristics of learned song can be significantly affected by pharmacologically weakening or enhancing the activity of inhibitory interneurons in RA (Miller et al., 2017). This is similar to the extensive involvement of inhibitory interneurons in the regulation of motor planning and execution in the mammalian motor cortex (Merchant et al., 2012).

Analogy of songbird song learning pathway and human language learning pathway
Songbirds also have a song learning pathway, AFP, which is similar to the human language learning pathway. It consists of the lateral part of the magnocellular nucleus of the anterior neostriatum (LMAN), the avian basal ganglia area X, and the medial portion of the dorsolateral nucleus of the anterior thalamus (DLM), which together form the cortex-basal ganglia-thalamus circuit (Figure 2C; Sizemore and Perkel, 2008). AFP is critical to birdsong plasticity, modulates the effects of social signals on song behavior (Kao et al., 2008), and provides an ideal system for studying the role of the cortex-basal ganglia circuit in experience-dependent skill learning (for example, mother tongue learning in infants) (Achiro et al., 2017). Area X is a unique region in the basal ganglia of songbirds that is critical to song learning; it receives afferents from both LMAN of AFP and HVC of VMP and is analogous to the mammalian striatum (Sasaki et al., 2006). All major physiological cell types found in the mammalian striatum exist in avian area X, and the two structures have nearly identical histochemical properties (Farries and Perkel, 2002). However, studies have shown that area X also contains neurons with the characteristics of the pallidum (Carrillo and Doupe, 2004). Two pallidal cell types in area X can be distinguished on the basis of singing-related neural activity, one of which is similar to thalamus-projecting neurons in the primate internal pallidal segment and the other to non-thalamus-projecting neurons in the primate external pallidal segment (Goldberg et al., 2010).
It has also been reported that the electrophysiological activities of two neuron populations in area X, fast-spiking interneurons and external pallidal neurons, respond differently to the three behavioral states of non-singing, undirected singing, and female-directed singing in male zebra finches, suggesting that social context may differentially modulate the activity of multiple neuron types in area X (Woolley, 2016). The results of lesion experiments support the idea that songbird area X is functionally more similar to human ASt (Jarvis, 2007). Recently, it has been reported that damage to area X can cause lasting changes in cells and gene expression in its upstream and downstream nuclei and may trigger neuroprotective mechanisms in the brain regions connected with it (Lukacova et al., 2022). Both songbird area X and human ASt are activated in response to the task demand of attempting completely novel articulatory motor sequences, and their activity declines rapidly during the subsequent "habituation" process (Simmonds et al., 2014). However, songbird area X remains active after adult birdsong has become stereotyped, which may be a difference from humans (Jarvis and Nottebohm, 1997; Hessler and Doupe, 1999; Simmonds et al., 2014). Furthermore, a recent analysis of genome-wide data on human rhythm and songbird vocal learning showed that several sets of genes associated with song behavior and expressed in area X of zebra finches were significantly enriched in the genetic architecture of human beat synchronization, which supports a genetic and evolutionary correlation between the two rhythm-related behaviors, human beat synchronization and songbird singing (Gordon et al., 2021).
The cortical nucleus LMAN receives afferents from DLM and is the output nucleus of the AFP, projecting to RA of the VMP (Luo et al., 2001). LMAN is also necessary for song acquisition in juvenile songbirds and plays a key role in enabling adult songbirds to produce different types of songs in different environments (Bottjer and Altenau, 2010; Achiro et al., 2017). When juvenile zebra finches are learning to sing, the firing patterns of individual neurons in the core and shell subregions of LMAN are related to the acoustic similarity of learned tutor syllables, and the response variability of neurons in the shell, but not the core, decreases over the course of development and song learning (Achiro et al., 2017). Moreover, damage to LMAN results in a gradual decrease in song variability, until the song eventually becomes a single rigid pattern (Woolley et al., 2014). Jarvis (2004) suggested that in humans, not only Broca's area but also the premotor LMC (preLMC) is involved in speech acquisition and advanced speech functions. Although the functional deficits caused by human preLMC damage are more complex, both LMAN damage in songbirds and preLMC damage in humans result in reduced or even absent vocal imitation learning ability. In contrast, recent studies have shown that enhancing the activity of LMAN can induce plastic changes in the acoustic structure of birdsong and cause singing repetitions and pauses similar to the symptoms of human stuttering (Chakraborty et al., 2017; Moorman et al., 2021).
DLM receives afferents from area X and is analogous to the intralaminar nuclei of the mammalian thalamus (Nicholson et al., 2018). It is thought to be functionally similar to the human aT and is involved in the regulation of song behavior (Jarvis, 2004). The electrophysiological properties of most DLM neurons are very similar to those of mammalian thalamocortical neurons, suggesting that thalamic neuron function is conserved across vertebrates (Luo and Perkel, 1999). Surprisingly, Bengalese finch DLM was also found to project to area X, suggesting that songbird area X receives feedback from the thalamic regions to which it projects, and further demonstrating the functional similarity between the songbird and mammalian basal ganglia (Nicholson et al., 2018). Damage to either songbird DLM or the human anterior thalamus can lead to vocal behavior disorders. In humans, there is temporary silence followed by aphasia, which is sometimes more severe than that caused by damage to the ASt or premotor cortex, probably because striatal inputs converge further onto the thalamus (Graff-Radford et al., 1985; Halsema and Bottjer, 1991).

Conclusion and prospect
In summary, the brainstem vocal control pathways that control innate vocalizations exist in almost every species of mammal and bird, but only a few species, including humans and some birds, possess vocal learning abilities and the related neural pathways. Incidental discoveries of individual animals imitating human speech, such as elephants, seals, and parrots, suggest that other species may have vocal learning behaviors that have not yet been deciphered. Meanwhile, the differences in complex vocal behaviors and their neural control between mammals and birds suggest that vocal learning behaviors in different species may have multiple independent origins (Tyack, 2020). In recent decades, it has become clear that humans and songbirds, despite their distant evolutionary relationship, have followed similar evolutionary paths in vocal behavior. The results of genomic and transcriptomic studies suggest a potential analogy between the neural pathways related to vocal learning in the two groups.
Songbird song learning has been studied for decades, from its beginnings in ethology to its interdisciplinary integration with neurobiology. The anatomical structure of the songbird song control system and its role in regulating song behavior have been comprehensively characterized (Nottebohm, 2005; Jarvis, 2019). The effects of neurotransmitters, hormones, neurotrophins, and other bioactive substances on the song behavior of songbirds remain to be further studied (Meng et al., 2016, 2017; Tanaka et al., 2018; Wang et al., 2019, 2020; Jaffe and Brainard, 2020; Macedo-Lima and Remage-Healey, 2020; Miller et al., 2020; Zhang et al., 2022). Some cutting-edge technologies have pushed the field to a deeper level. Many speech disorders may be related to neurotransmitter signaling (Anderson et al., 1999; Craig-McQuaide et al., 2014). A study combining optogenetics and gene manipulation techniques has shown that singing disorders in songbirds may be related to dopaminergic signaling in area X, which may resemble the mechanisms underlying language disorders (Xiao et al., 2021). However, it is still unclear how various neurotransmitters, hormones, and neurotrophins regulate songbird singing behavior through the related neural pathways. Optogenetics, chemogenetics, and other techniques for targeted manipulation of neural pathways can provide a key link between behavior and neural activity (Singh Alvarado et al., 2021). In the meantime, single-cell sequencing has been used to compare the relevant cell types and gene expression patterns of birds and mammals and to reveal their evolutionary analogy (Colquitt et al., 2021). Commonly used songbird models have been gene-edited using CRISPR/Cas9 technology to make them more widely applicable for multi-purpose studies. Epigenomic studies have revealed the effects of experience and of the internal and external environment on the neurogenome or transcriptome of songbirds, and their correlation with singing behavior (Kelly et al., 2018). These lines of research may become the focus of birdsong neurobiology.
More challenging still are several key questions shared by the fields of human language and birdsong research: how vocal learning changes with time and experience at different ages, what the patterns of activity and association among the relevant brain regions are, and how auditory feedback plays its role (Doupe and Kuhl, 1999). However, many of the invasive experiments needed to explore the physiological mechanisms of vocalization, including language learning, cannot be performed in healthy humans. Given that avian song learning may share underlying cellular and molecular regulatory mechanisms with human language learning, drawing on the research models used in avian song behavior studies could shed light on the neural mechanisms of human language learning and on the treatment of language disorders (Medina et al., 2022).

Author contributions
YZ: investigation and writing-original draft. LZ: investigation. JZ: visualization. SW: funding acquisition. WM: conceptualization, validation, supervision, funding acquisition, and writing-review and editing. All authors contributed to the article and approved the submitted version.

Funding
This work was supported by the National Natural Science Foundation of China (32160123, 31660292, and 31860605), the Key Project of Natural Science Foundation of Jiangxi Province (20212ACB205002), the Natural Science Foundation of Jiangxi Province (20202BABL205022 and 20212BAB205003), and the Innovation Foundation of JXSTNU (YC2021-X10).

Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.