ZP4 Is Present in Murine Zona Pellucida and Is Not Responsible for the Specific Gamete Interaction

Mammalian eggs are surrounded by an extracellular matrix called the zona pellucida (ZP). This envelope participates in processes such as acrosome reaction induction, sperm binding, protection of the oviductal embryo, and may be involved in speciation. In eutherian mammals, this coat is formed of three or four glycoproteins (ZP1–ZP4). While Mus musculus has been used as a model to study the ZP for more than 35 years, surprisingly, it is the only eutherian species in which the ZP is formed of three glycoproteins Zp1, Zp2, and Zp3, Zp4 being a pseudogene. Zp4 was lost in the Mus lineage after it diverged from Rattus, although it is not known when precisely this loss occurred. In this work, the status of Zp4 in several murine rodents was tested by phylogenetic, molecular, and proteomic analyses. Additionally, assays of cross in vitro fertilization between three and four ZP rodents were performed to test the effect of the presence of Zp4 in murine ZP and its possible involvement in reproductive isolation. Our results showed that Zp4 pseudogenization is restricted to the subgenus Mus, which diverged around 6 MYA. Heterologous in vitro fertilization assays demonstrate that a ZP formed of four glycoproteins is not a barrier for the spermatozoa of species with a ZP formed of three glycoproteins. This study identifies the existence of several mouse species with four ZPs that can be considered suitable for use as an experimental animal model to understand the structural and functional roles of the four ZP proteins in other species, including human.


INTRODUCTION
The zona pellucida (ZP) is an extracellular coat that surrounds mammalian oocytes and early embryos. This envelope participates in important events during fertilization and early embryo development, such as the species-specific gamete recognition, acrosome reaction induction, preventing polyspermy, and protecting the oviductal embryo (Yanagimachi, 1994;Dean, 2007;Wassarman and Litscher, 2009;Gupta and Bhandari, 2011;Gupta et al., 2012;Tanihara et al., 2013;Shu et al., 2015). The composition of the ZP matrix has been elucidated in many species, and has been seen to be composed of three to four glycoproteins in eutherians (Bleil and Wassarman, 1980;Hedrick and Wardrip, 1987;Lefièvre et al., 2004;Hoodbhoy et al., 2005;Ganguly et al., 2008;Goudet et al., 2008;Izquierdo-Rico et al., 2009;Stetson et al., 2012Stetson et al., , 2015Moros-Nicolás et al., 2018c), and four to seven in marsupials and monotremes (Frankenberg and Renfree, 2018;Moros-Nicolás et al., 2018a;Wu et al., 2018). Indeed, the composition of ZP is more variable than was previously expected. Eutherian mammals can be classified into three categories according to their ZP protein composition: (a) species with four glycoproteins (ZP1, ZP2, ZP3, and ZP4) such as human, rat, hamster, horse, rabbit, cat, cheetah, ferret, tiger, panda, polar bear, and walrus (Lefièvre et al., 2004;Hoodbhoy et al., 2005;Izquierdo-Rico et al., 2009;Mugnier et al., 2009;Stetson et al., 2012Stetson et al., , 2015Moros-Nicolás et al., 2018c); (b) species whose ZP is formed of ZP2, ZP3, and ZP4 (pig, cow, marmoset, tarsier, dog, Weddell seal, and Antarctic fur seal) (Hedrick and Wardrip, 1987;Noguchi et al., 1994;Goudet et al., 2008;Stetson et al., 2012;Moros-Nicolás et al., 2018c); and (c) species whose ZP is formed of ZP1, ZP2, and ZP3 (house mouse) (Bleil and Wassarman, 1980). Studies on the molecular evolution of the ZP family has helped to better understand the species-specific differences in the ZP composition. However, there is no consensus in relation with the ZP nomenclature or the number of ZP subfamilies (Spargo and Hope, 2003;Goudet et al., 2008;Feng et al., 2018;Wu et al., 2018). The first events in the ZP evolution occurred before the evolution of the first amphibians (Spargo and Hope, 2003). The ancestral ZPC gene and the precursor of ZP2, ZP4, ZPD, and ZPAX subfamilies appeared after a gene duplication event (Spargo and Hope, 2003). This precursor duplicated several times over a short period of evolutionary history, and led to the ancestral ZPAX gene and the ancestral of ZP2, ZP4, and ZPD genes. Afterwards, duplication events have occurred in several lineages, the most important during early evolution of the amniotes and giving rise to ZP1 and ZP4 groups within the ZPB subfamily (Hughes and Barratt, 1999;Bausek et al., 2000;Goudet et al., 2008). Thus, it was assumed that ZP1 and ZP4, previously considered orthologs, are in fact paralogues. Some species retain the two copies of the ancestral gene (ZP1 and ZP4, in the four glycoprotein model), and others conserved only one (ZP1 or ZP4, the three glycoprotein model). In this last case, one of the copies (ZP1 or ZP4) was lost after a duplication event due to a pseudogenization process (Goudet et al., 2008).
On the other hand, surprisingly, the pseudogenization of ZP4 has been described only in the house mouse (Mus musculus) and in two South American marsupials (common opossum and gray short-tailed opossum) (Goudet et al., 2008;Moros-Nicolás et al., 2018a). In marsupials, this pseudogenization occurred after the split between the South American and Australasian marsupials dated at 80 MYA and before the divergence of common opossum and gray short-tailed opossum, between 20 and 30 MYA (Meredith et al., 2008;Jansa et al., 2014).
To date, ZP2 and ZP3 proteins are present in all species, which means that the functions of these proteins are essential; indeed, mouse Zp2 and Zp3 are indispensable for fertilization and embryo development (Liu et al., 1996;Rankin et al., 1996Rankin et al., , 2001. Moreover, ZP2 was proven to be the primary sperm receptor in mice and human Burkart et al., 2012;Avella et al., 2014Avella et al., , 2016. Mouse ZP is formed of three proteins: Zp1, Zp2, and Zp3. However, Zp4 is transcribed in Mus musculus oocytes but lacks a protein product due to the presence of several stop codons in its open reading frame (ORF) (Lefièvre et al., 2004;Evsikov et al., 2006;Goudet et al., 2008). Moreover, mass spectrometry analysis has failed to identify this protein (Boja et al., 2003).
Ultrastructural evidences suggest that mouse ZP is composed of filaments. Three different models were described; the first one suggests a filamentous structure where Zp2-Zp3 heterodimers are the basic repeating units of the filaments with cross-linking of filaments by dimeric Zp1 (Greve and Wassarman, 1985;Wassarman, 1988). The second one proposed by Dean in 2004, describes a ZP formed by repeats of Zp3-Zp2 and Zp3-Zp1 heterodimers that form the main fibrillar structure, being bound through the glycoproteins Zp1 and Zp2 (Dean, 2004). The third one, is a variation of the first model, so that the Zp1 glycoprotein is incorporated into the long filaments through its ZP domain; therefore in the mouse, the ZP would be formed by a fibrillar framework constituted by long polymers of Zp1-Zp2-Zp3 which are joined to each other by Zp1 homodimers through disulfide bonds forming a three-dimensional structure (Monné and Jovine, 2011;Stsiapanava et al., 2020). However, the ZP structure of the species with four proteins remains unproven.
The house mouse (Mus musculus) is an index species for biomedical research, and has been used as a model to study the ZP for more than 35 years (Liu et al., 1996;Rankin et al., 1996Rankin et al., , 1999Rankin et al., , 2001Baibakov et al., 2012;Avella et al., 2014). Nevertheless, its ZP composition markedly differs from that seen in other mammals, including human. Thus, the lack of a good experimental animal model is one of the major hurdles to fully understanding the functional role of human ZP proteins (Gupta, 2018). Since rat (Rattus genus) has four glycoproteins and the house mouse (Mus genus) only three, Zp4 was probably lost after their divergence around 12 MYA (Jaeger et al., 1986;Jacobs et al., 1989Jacobs et al., , 1990Goudet et al., 2008). In order to determine more precisely when this pseudogenization took place, the Zp4 gene was sequenced in different species of rodent that belong to the same group as Mus and Rattus, the Murinae subfamily, involving a particularly comprehensive taxonomic sampling within the genus Mus (Musser and Carleton, 2005). This study permitted better understanding of the unusual composition of the mouse ZP and has led to the proposal of a new animal model for studying human ZP. In order to ascertain whether the presence of four ZP glycoproteins in its composition could affect fertilization, an in vitro heterologous fertilization study using closely-related murine species was performed.

Pseudogenization of Zp4 Occurs Only in Mus
Our aim was to amplify and sequence the region of genomic DNA encompassing the exons 1-9 of the Zp4 gene in several species of the subfamily Murinae (Table 1). Sequences were aligned to the corresponding genomic portions of DNA of Mus (chromosome 13) and Rattus (chromosome 17). Sequences of the mRNA obtained from Mus mattheyi, Mus pahari, and Mastomys coucha ovaries were added to determine the limits of the exons. The complete sequences (covering exons 1-9) were obtained only for Rattus rattus and Mus minutoides. For the other species, the coverage ranged between 70% (Apodemus flavicollis and Mus minutoides) and 13% (Lemniscomys striatus and Malacomys longipes).
These new sequences were aligned with Zp4 sequences of other muroid rodents found in Genbank or ENSEMBL ( Table 1). The full length alignment (exons 1-9) comprises 36 sequences and 5,140 bp (genomic DNA) and 1,301 bp (coding portion). Two portions of the sequences were scrutinized: the beginning of the gene (from exons 1-3) and the end of our alignment (exons 8 and 9), in both of which stop codons were found in Mus musculus (Goudet et al., 2008). The first fragment in all our samples was successfully amplified (except Lemniscomys striatus) and the second one in thirteen samples.
The results showed that stop codons are present in the first three exons of Zp4 in eight species of the subgenus Mus: M. caroli, M. cypriacus, M. cookii, M. famulus, M. macedonicus, M. musculus, M. spicilegus, and M. spretus (Figure 1); however, they were also present in exons 8 and 9 in M. musculus and M. spretus.
The phylogenetic tree (Figure 2) confirms the monophyly of this subgenus and indicates that the pseudogenization took place after the divergence of the subgenus Mus and before species diversification. Previous studies reported that the four subgenera of Mus diverged between 6 and 7 MYA (Lecompte et al., 2008;Pagès et al., 2012;Meheretu et al., 2015). Within the subgenus Mus the earliest offshoot is estimated to have appeared at around 5 MYA (Pagès et al., 2012), indicating that the pseudogenization took place between 5 and 7 MYA. Zp1, Zp2, Zp3, and Zp4 Are Expressed in Mus (Coelomys) pahari, Mus (Nannomys) mattheyi, and Mastomys coucha Ovaries To determine whether Zp4 pseudogenization affects only the subgenus Mus, it was necessary to confirm the expression of the four ZP genes and proteins in other subgenera belonging to the genus Mus, in our case Nannomys and Coelomys (Musser and Carleton, 2005;Pagès et al., 2012). Individuals of two species-Mus mattheyi (subgenus Nannomys) and Mus pahari (subgenus Coelomys)-were studied. Furthermore, a species from another genus of the Murinae subfamily, Mastomys coucha, was also analyzed. The species were selected according to their availability and phylogenetic interest for this study.
Using RT-PCR analysis, full-length cDNAs of Mus mattheyi and Mus pahari Zp1 (Supplementary Figure 1) and Zp4 (Supplementary Figure 2) were obtained from total RNA prepared from ovaries. The sequence analysis indicated that they have a complete coding region. The open reading frames (ORFs) encode polypeptides with a theoretical molecular weight of 68.61 and 59.51 kDa (Mus mattheyi Zp1 and Zp4) and 68.37 and 59.54 kDa (Mus pahari Zp1 and Zp4).
These genetic sequences would translate a predictive protein in both species with a high degree of similarity to ZP1 and ZP4 proteins of other mammals (Supplementary Figures 3, 4). In the N-terminal region, a signal peptide is present, whose peptidase cleavage site was predicted by means of the Bendtsen et al. (2004) algorithm. The C-terminal region corresponds to the transmembrane domain (TMD), and is followed by a cytoplasmic tail (Krogh et al., 2001). Moreover, a basic amino acid domain upstream of the TMD may serve as a consensus furin cleavage site (CFCS) (Arg-Arg-Arg-Arg/RRRR). The molecules have a conserved ZP domain, which is present in most sequences of envelope glycoproteins in many species. Upstream of the ZP domain, there is a trefoil domain, characteristic of ZP1 and ZP4 proteins, with six Cys residues, as reported for ZP proteins (Bork, 1993), and between the signal peptide and the trefoil domain a single ZP-N domain at their N-termini, as reported previously (Callebaut et al., 2007;Nishimura et al., 2019) (Supplementary Figures 3, 4). The presence of all these domains indicates that the Zp1 and Zp4 of these rodents could share an apparent similar molecular structure with other ZP proteins.
Thus, taking into consideration that ZP2 and ZP3 are present in all vertebrates described to date and that they have never suffered pseudogenization, the presence of a complete ORF of Zp1 and Zp4 mRNA in these murine species suggests a ZP consisting of four glycoproteins. Nevertheless, partial amplification of Zp2 and Zp3 were made in both species to demonstrate the presence of the four transcripts (Figure 3). Furthermore, Zp1, Zp2, Zp3, and Zp4 mRNAs from Mastomys coucha were also partially amplified (Figure 3).
The mRNA of Zp1, Zp2, Zp3, and Zp4 are effectively translated in proteins, as confirmed by the detection of several peptides belonging to Zp1 and Zp4 in Mus mattheyi and Mus pahari and the four proteins (Zp1, Zp2, Zp3, and Zp4) in Mastomys coucha (Table 2) by MS/MS analyses. A total of 12, 6, and 13 different peptides corresponding to Zp4 were identified in the different analyses, yielding a sequence coverage of 30.99, 16.74, and 49.08% for Mus mattheyi, Mus pahari, and Mastomys coucha, respectively (Figure 4).
Taken together, these data indicate that four ZP proteins are expressed in Mus mattheyi, Mus pahari, and Mastomys coucha ovaries.

Heterologous in vitro Fertilization (Oocyte With Three ZP Proteins vs. Four ZP Proteins)
The next question was whether the ZP composition of the egg could interfere with fertilization, for this reason we performed in vitro fertilization experiments with species differing in their ZP composition. Four rodents with different ZP composition were used: Mus musculus with three ZP proteins (Zp1, Zp2, and Zp3) and Mus mattheyi, Mus pahari, and Mastomys coucha with four ZP proteins (Zp1, Zp2, Zp3, and Zp4). The in vitro fertilization rates in a non-competitive context were analyzed ( Table 3).
Oocytes from the four species were co-incubated with spermatozoa in conspecific or heterospecific reciprocal crosses. Fertilization success in the conspecific crosses was high only in Mus musculus (79.16%), whereas in the other species the rate was zero or very low (0% in Mus mattheyi, 0.81% in Mastomys coucha, and 3.5% in Mus pahari). When Mus musculus spermatozoa participated in the fertilization the rates were 67.5% in co-incubation with ova from Mus pahari, 11.7% with ova from Mus mattheyi, and 0% with ova from Mastomys  coucha. The fertilization rate was very low when Mus mattheyi or Mus pahari spermatozoa were used, except for Mastomys coucha spermatozoa with Mus musculus oocytes (67.14%). Our observations showed that Mus mattheyi and Mus pahari sperm were still able to adhere to the heterologous ZP, but we observed that the number of spermatozoa that adhere to the ZP was much lower than in the control group. Besides, we observed that when the sperm adhered to the ZP, they remained immobile (data not shown). Taking into consideration the results obtained in the in vitro fertilization crosses between Mus musculus vs. Mus pahari (67.5%) and Mastomys coucha vs. Mus musculus (67.14%), it can be concluded that the presence of the 3 or 4 ZP proteins is not a limiting factor for in vitro fertilization.

Analysis of ZP4 Positive Selection
Genes with a role in fertilization show a common pattern of rapid evolution, which can be attributed to positive selection. An analysis of such selection in Zp4 gene was made in order to know the level of interspecific divergence and the existence of positive selection sites. The selection analysis pointed to significant positive selection for both muroid datasets. For the dataset of the 28 muroids, both tests (M1a vs. M2a and M7 vs. M8) were significant (p < 0.05 in the former case and p < 0.01 in the latter), and one site (141 R) showed a posterior probability > 95% (Table 4). For the dataset of the 13 muroids, both tests (M1a vs. M2a, M7 vs. M8) were significant (p < 0.01), and one site (547 L) showed a posterior probability > 95% (Table 4).

DISCUSSION
Our results indicate that the pseudogenization of Zp4 in mice is a relatively recent event that took place during the evolution of the genus Mus, a genus that encompasses more than 40 species divided into four subgenera: Mus, Pyromys, Nannomys, and Coelomys (Musser and Carleton, 2005;Shimada et al., 2010;Suzuki and Aplin, 2012). The subgenus Mus is by far the most extensively studied as it includes the cosmopolitan commensal Mus musculus (the house mouse), which has been used as a model to study the ZP and gamete interaction for the last four decades. This subgenus comprises 14 species (Auffray and Britton-Davidian, 2012). The other subgenera are Nannomys, the African pygmy mouse with 19 recognized species, and two South-East-Asian subgenera Coelomys, the shrew mouse with four species, and Pyromys the spiny mouse with five species (Musser and Carleton, 2005).
DNA sampling among the genus Mus included species belonging to the four subgenera, and the mRNA and protein analyses included species of two of these four subgenera for which no data had previously been available (Nannomys and Coelomys).

Analysis of Zp4 Protein in Mus (Nannomys) mattheyi and Mus (Coelomys) pahari
The DNA and mRNA analyses of Mus mattheyi and Mus pahari Zp4 sequences indicated the presence of a coding sequence for a full-length protein. The alignment showed a high degree of conservation: 87.32 and 88.05% with Mastomys coucha, 76.47 and 77.35% with hamster, 82.94 and 83.49% with rat, and 63.82 and 64.38% with human, for Mus mattheyi and Mus pahari, respectively. At the ZP domain, the 10 cysteines found were conserved in all the species. The furin cleavage site, described in human (Kiefer and Saling, 2002) and rat  coincided with the potential sites for the rodents analyzed (Supplementary Figure 4).
Six (Asn50, Asn74, Asn122, Asn209, Asn226, and Asn299) and five (Asn50, Asn74, Asn122, Asn209, and Asn299) potential N-glycosylation sites were identified in the mature protein in Mus mattheyi and Mus pahari, respectively. Of which, Asn122 and Asn209 were identified by proteomics in Mus mattheyi, indicating that these sites are not glycosylated or not always occupied (Figure 4). The potential N-glycosylation sites Asn50, Asn74, and Asn226 have been identified in rat Zp4 , so they seem to be conserved in Murinae; however, Asn299 has not been detected in the rest of the species analyzed, implying that there are differences in the glycosylation pattern between these species and the rat. Further studies are necessary to identify the glycosylation sites and the type of oligosaccharide chain present.
In mature Zp4 protein, a total of 76 and 73 potential Oglycosylation sites were found in Mus mattheyi and Mus pahari, respectively. Some of the peptides identified contained some of these O-glycosylation sites, 25 in Mus mattheyi and 11 in Mus pahari, so that they would be free or partially occupied in the mentioned proteins (Figure 4). O-glycosylation data are only available for sow and rat ZP4 (Kudo et al., 1998;Hoodbhoy et al., 2005). An O-glycosylated region (Thr296, Ser298, Ser301, Ser304, and Thr312) has been described in rat , and is conserved in Mus mattheyi and Mus pahari (Ser298, Ser301, and Ser304), in addition to Thr296, which is only present in Mus mattheyi (Supplementary Figure 4).
Taking into account that HPLC/MS analysis can be considered a semi-quantitative technique, the fact that the coverage for Zp1 in Mus mattheyi and Mus pahari was 7.22 and 7.98%, respectively, compared to the coverage for Zp4 of 30.99% in Mus mattheyi and 16.74% in Mus pahari, this suggests that Zp1 is less abundant in the ZP than Zp4. These results coincide with those published by Boja et al. (2003) for house mouse (ZP coverage of 56% for Zp1, 96% for Zp2 and 100% for Zp3), and for rat with a protein coverage of 52% for Zp1 and 70% for Zp4 Hoodbhoy et al., 2005). They would also agree with the results published for other species, such as rabbit (Stetson et al., 2012) and cat (Stetson et al., 2015), where the coverage of ZP1 is lower than that of ZP4; however, they do not coincide with the results published for hamster, where the coverage of ZP1 was 12.6% and   Peptides with a score higher than 5 and percentage-scored peak (SPI) intensity of 60%, which are the threshold criteria for a positive identification, are shown in italics. "n" represents the number of times that the peptide has been detected.
that of ZP4 11.2% . Future studies using quantitative proteomics are needed to clarify the ZP protein ratios in different species.

Evolution of ZP Proteins in Muroid Rodents
The DNA analysis in different taxa showed the presence of stop codons in the eight species belonging to subgenus Mus and no stop codons in the other Mus species. The gene expression analysis in Mus pahari (subgenus Coelomys) and Mus mattheyi (subgenus Nannomys) clearly demonstrated the presence of four transcripts (Zp1, Zp2, Zp3, and Zp4) using RT-PCR and four ZP proteins using proteomic analysis. These results agree with previous studies that reported the existence of four proteins in the ZP in other placental species like human (Lefièvre et al., 2004), rat , hamster (Izquierdo-Rico et al., 2009), rabbit (Stetson et al., 2012), cat (Stetson et al., 2015), or ferret (Moros-Nicolás et al., 2018c). These four proteins were also present in some marsupials, even though in this group the evolution of the ZP proteins is more complex as there are several copies of the ZP3 gene (Moros-Nicolás et al., 2018a). The four-protein composition could be considered as the ancestral state in eutherian mammals, and ZP1 or ZP4 being lost in some lineages during their evolution. Until now the ZP composition in the Murinae subfamily was only known in the rat (with four glycoproteins) and house mouse (with three glycoproteins), suggesting that Zp4 was lost after their divergence around 11.2 MYA (Aghová et al., 2018). Using comprehensive taxonomic sampling within the subfamily, especially within the genus Mus (with representative taxa of the four subgenera), we were able to narrowed down the approximate date of the loss of Zp4. First, all other murine genera included in our study seem to have a functional Zp4, suggesting a more recent loss meaning that it occurred within the genus Mus, which diverged around 7.2 MYA (Pagès et al., 2012). Second, we found that all Mus species from the Coelomys, Nannomys and Pyromys subgenera have four glycoproteins, while all species from the Mus subgenus have only three, suggesting that Zp4 pseudogenized early in the Mus subgenus lineage. Previous studies have reported that the four subgenera of Mus diverged between 6 and 7 MYA (Chevret et al., 2005;Lecompte et al., 2008;Pagès et al., 2012), initially the subgenus Coelomys, then Nannomys, and finally Mus and Pyromys (Veyrunes et al., 2006). Within the subgenus Mus, the earliest offshoot is estimated to have appeared at around 5 MYA (Pagès et al., 2012), indicating that Zp4 pseudogenization took place 5-7 MYA.
Pseudogenization events are not rare in the ZP family. ZP1 has been identified as a pseudogene in several species (Goudet et al., 2008;Tian et al., 2009;Stetson et al., 2012;Moros-Nicolás et al., 2018c). The ZPAX gene was lost in mammals before the divergence between marsupials and placentals (Tian et al., 2009). These multiple and independent loss events may be explained in part as the consequence of a duplication event, since ZP1 and ZP4 are paralogue genes (Bausek et al., 2000). After duplication, three evolution events may occurred to the duplicate copies: (a) the ancestral function is partitioned and shared by both copies (subfunctionalization); (b) one gene acquires a new function and the other retains the original one (neofunctionalization), or (c) one gene conserves the original function and the other degenerates to a pseudogene (Cañestro et al., 2013).
The duplication took place in vertebrates before the mammalian divergence (Goudet et al., 2008), and after the duplication event, ZP4 or ZP1 may have pseudogenized in some mammalian lineages, while the remaining ZP protein continued to perform the function of the ancestral gene. In the house mouse, the pseudogenization of Zp4 indicates that Zp1 retained the function of the ancestral gene.   The numbers in bold correspond to positively selected sites with P > 95%.
The evolution of male and female reproductive proteins was probably promoted by positive Darwinian selection. Moreover, comparative sequencing studies among taxonomic groups have led to the discovery that reproductive proteins evolve more rapidly than other genes expressed in other tissues (Swanson et al., 2001;Torgerson et al., 2002). This positive selection has been described in seminal plasma proteins (Kingan et al., 2003;Dorus et al., 2004), oviductal proteins (Moros-Nicolás et al., 2018b), and also in other proteins related with fertilization, such as ZP3, CatSper1 or CD9 (Swanson et al., 2001(Swanson et al., , 2003 These proteins would have been under a selective pressure that may be related to male-female interaction, in this case, sperm-egg interaction. Our analysis identified two sites in Zp4 that are under positive selection (141R and 547L). These results strongly suggest that Zp4 gene has been subjected to positive selection during evolution.
These amino acids appear to be important for the protein function. Future studies using direct mutagenesis will be useful to unravel the specific function of these amino acids in ZP4 protein.

Functional Implication of the Presence of ZP4 in the ZP Matrix
Several studies have analyzed the function of different ZP proteins. Recent studies in human demonstrate that mutations in ZP1 gene are related to infertility (Huang et al., 2014;Sun et al., 2019;Yuan et al., 2019), these mutations could affect the shuttling of glycoproteins to the secretory pathway, which would prevent the formation of the ZP around the ova, but also the formation and development of eggs (Huang et al., 2014;Sun et al., 2019;Yuan et al., 2019). The house mouse has provided interesting information on the functions of the different ZP proteins thanks to the use of animals genetically modified as KO and transgenic (Liu et al., 1996;Rankin et al., 1996Rankin et al., , 1999Rankin et al., , 2001Dean, 2004). It was demonstrated that Zp1 offers stability and structural integrity to the matrix. KO mice for Zp1 have an abnormal ZP, being more porous; however, these mice are fertile, although their litter sizes are low (Rankin et al., 1999). On the other hand, KO mice for Zp2 or Zp3 present oocytes that are not surrounded by a ZP (Liu et al., 1996;Rankin et al., 1996Rankin et al., , 2001. In the case of rodents with four ZP proteins, the roles played by Zp1 and Zp4 remain unresolved. The study of ZP4 was not possible until now in animal models because, while the KO technology was well-developed in the house mouse (Mus musculus), which has a pseudogenized Zp4, the technique was very difficult to perform in rat, hamster and rabbit. However, development of CRISPR-cas9 technology made it possible to develop the KO technique in species that possess ZP4 (Fan et al., 2014;Bae et al., 2020). In fact, we have recently reported the phenotype of the female rabbit without the ZP4 gene (Lamas-Toranzo et al., 2019). The female rabbit is subfertile and it was observed that this protein is crucial for the embryo development but not for fertilization (Lamas-Toranzo et al., 2019). Moreover, the ZP was significantly thinner, more permeable, and exhibited a more disorganized and fenestrated structure suggesting a structural role (Lamas-Toranzo et al., 2019). The development of KO animals for ZP4 in other species with four ZP proteins, like the rat or the mice presented in this work could also be a useful tool to study the function of this gene. Furthermore, transgenic mice showing a humanized ZP4 have provided valuable information (Yauger et al., 2011). Indeed, these transgenic mice are fertile; however, their ZP is not recognized by human sperm, which means that ZP4 is not sufficient to support human sperm binding to the ZP (Yauger et al., 2011).
A previous study has shown that heterologous fertilization between different species of rodents is possible, although the success is directly related to the phylogenetic proximity of the species (Roldan et al., 1985): the heterologous fertilization rate in vivo and in vitro is considerably lower than the homologous fertilization rate (Roldan et al., 1985;Roldan and Yanagimachi, 1989;Dean and Nachman, 2009;Martín-Coello et al., 2009). Furthermore, in those cases in which embryo culture was carried out, cleavage arrest or embryo degeneration was observed (Roldan et al., 1985). In our study, heterologous IVF was possible when the spermatozoa from Mus musculus had to fertilize the oocytes from Mus mattheyi and Mus pahari, demonstrating that the presence of Zp4 is not involved in the species-specific binding of sperm. This also means that a ZP formed of four glycoproteins is neither a physical nor biological barrier for the spermatozoa of species with a ZP formed of three glycoproteins, at least in in vitro conditions, indicating that Zp4 does not produce any steric hindrance that impedes the specific gamete interaction.

CONCLUSION
The present study provides new insights into the molecular evolution of Zp4 in rodents showing that Zp4 pseudogenization is restricted to the subgenus Mus, which diverged around 6 MYA. The use of murine species with four ZP proteins may therefore be suitable for studying the structure and functionality of ZP proteins from most species, including humans. These rodents must be considered a tool of great value due to the possibility of applying transgenesis and KO techniques to study the function of these proteins and because of their short reproduction cycle compared to other species.

Ethical Approvals
All animal procedures were performed following the Spanish Animal Protection Regulation, RD 1201/2005 which conforms to European Union Regulation 2003/65. All animal experiments were approved by the institutional review board of the University of Murcia according to the guide for Care and Use of Laboratory Animals as adopted by the Society for the Study of Reproduction.

Animals
Adult females and males of four species of murine rodents were used: the house mouse (Mus musculus), Matthey's mouse [Mus mattheyi (subgenus Nannomys)], Gairdner's shrewmouse [Mus pahari (subgenus Coelomys)], and the southern multimammate mouse Gairdner's shrewmouse (Mastomys coucha). Mus musculus specimens were of the hybrid strain C57CBAF1, purchased from Harlan Ibérica, (Barcelona, Spain), while Mus mattheyi and Mus pahari were obtained from the "Institut des Sciences de l'Evolution de Montpellier" (Montpellier, France) and Mastomys coucha specimens were obtained from Hobbyzoo (Madrid, Spain), whose species were verified by PCR amplification of the cytochrome b. Animals were kept under standard laboratory mouse conditions in an environmentally controlled room with a 14 h light:10 h darkness photoperiod under constant temperature and relative humidity conditions. Animals were provided with food (Harlan Ibérica, Barcelona, Spain) and water, both available ad libitum. Animals to be used for the experiments were weaned when they were 4 weeks old. Males were kept in individual cages and were used when they were >12 weeks old; after weaning, females were housed together. Mus musculus females were used when they were 6-8 weeks old, and Mastomys coucha, Mus mattheyi, and Mus pahari females were used when they were 6-12 weeks old.

Collection of Mouse Ovaries
Ovaries were obtained from three different species of mouse: Mastomys coucha, Mus mattheyi, and Mus pahari. The animals were sacrificed by CO 2 overdose, and the ovaries were obtained and frozen in liquid nitrogen and kept at −80 • C (for molecular and proteomic analysis) or washed in PBS and used directly (to obtain the ZPs).

Collection of Mouse Oocytes
Before oocytes were obtained from Mastomys coucha, Mus mattheyi, Mus musculus, and Mus pahari, the animals were subjected to a hormonal treatment to induce superovulation. Females were injected intraperitoneally with 7.5 IU of equine Chorionic Gonadotrophin (eCG) (Sigma-Aldrich, St. Louis, USA), followed 48 h later by 5 IU of human Chorionic Gonadotrophin (hCG) (Lepori Pharma, Spain). The animals were sacrificed 14 h after hCG injection by cervical dislocation and their oviducts were removed. Cumulus-oocyte-complexes (COCs) were obtained from the ampulla of the uterine tube and placed in PBS (for proteomic analysis) or HTF medium (for IVF analysis), COCs were removed or not by gently pipetting into 0.5% hyaluronidase (Sigma-Aldrich, St. Louis, USA).

Zona Pellucida Isolation in Mastomys coucha
To obtain the isolated ZPs, animals were subjected to ovarian stimulation and the oocytes were obtained as explained above. Cumulus cells (CCs) were removed by using 0.5% hyaluronidase (Sigma-Aldrich, St. Louis, USA) in PBS, and the ZPs were obtained after vigorous pipetting of each oocyte using a narrowbore micropipette in PBS, followed by four washes in PBS to eliminate the oocyte debris. ZPs were solubilized for 30 min at 65 • C (Accu Block TM , Labnet, USA), and kept at −20 • C until use.

In vitro Fertilization
As mentioned above females of the four species were subjected to a hormonal treatment to induce superovulation. In vitro fertilization was performed as previously described by Hourcade et al. (2010). Males were killed by cervical dislocation. The epididymides and vasa deferentia were removed from males of the four mouse species and placed in 1,000 µl of M2 medium, and adipose tissue and blood vessels were removed. The clean structures were placed in a 500 µl drop of Human Tubular Fluid (HTF) medium (BSA supplemented) covered with mineral oil (Sigma-Aldrich, St. Louis, USA), from which spermatozoa were collected. Concentrations were determined with a Bürker hemocytometer. Spermatozoa were incubated in HTF for 30 min at 37 • C with 5% CO 2 in air for capacitation.
Fourteen hours after hCG injection, females were sacrificed by cervical dislocation and their oviducts were removed. COCs were obtained from the ampulla of the uterine tube and using a wide-bore pipette tip, placed in 500 µl of HTF medium. Each sample was inseminated with a final concentration of 1 × 10 6 spermatozoa/ml, and 30 min after, each well was observed under an inverted microscope to assess sperm-oocyte binding. Gametes were co-incubated for 5 h at 37 • C under 5% CO 2 in air, after which, remaining CCs and attached sperm were removed by washing in HTF medium with a fine pipette; oocytes were then washed three times in potassium simplex optimized medium (KSOMaa) and placed in culture drop for 24 h at 37 • C under 5% CO 2 in air. Nine hours after insemination, the extent of successful fertilization was assessed in a group of presumptive zygotes by pronucleus visualization under a microscope. Another group of presumptive zygotes was confirmed 24 h after, by analyzing 2-cell stage embryos.

Pronucleus Visualization
Nine hours after insemination, oocytes were incubated in a 100 µl drop of 10 pg Hoechst-33342 dye (bisbenzimide trihydrochloride, Sigma, Madrid, Spain), before being placed on a coverslip for viewing by a fluorescent microscope (Nikon Optiphot-2). The DNA + H-33342 complex was excited with UV at 355 nm light and epifluorescence emission at 465, and photographed. For this, a G 365 excitation filter, an FT 395 dichromatic beam splitter, and an LP 420 barrier filter were used. Both epifluorescent and brightfield photographs were taken using a Coolpix MDC Lens, Nikon, Japan.

Samples and Genomic DNA Isolation
A total of 23 species of the subfamily Murinae were included in this study (Table 1). DNA was extracted from ethanol-preserved tissues obtained from the collection of Preserved Mammalian Tissues of the "Institut des Sciences de l'Evolution of Montpellier" and from mice of the "Conservatoire Génétique de Souris Sauvages de Montpellier" (Montpellier, France). Total DNA was extracted using a QIAamp DNA Mini Kit (Qiagen, Hilden, Germany) following the manufacturer's recommendations.

Obtaining Ovarian RNA and cDNA Synthesis
Total RNA was isolated from ovaries of three species: Mus mattheyi, Mus pahari, and Mastomys coucha using RNAqueous R kit (Ambion, Austin Texas, USA) according to the manufacturer's instructions. The first-strand cDNA was synthesized with the SuperScript First-Strand Synthesis System kit for RT-PCR (Invitrogen-Life Technologies, Carlsbad, USA), according to the manufacturer's protocol.

Amplification and Sequencing of ZP Genes
PCR amplifications were made using genomic DNA or ovarian cDNA as templates. Primers were designed from Mus musculus Zp1 (NM_009580), Zp2 (NM_011775), and Zp3 (NM_011776) sequences; in the case of Zp4, primers were designed from conserved regions of Zp4 in Mus musculus (NR_027813) and Rattus norvegicus (NM_172330) (Supplementary Table 1). Overlapping fragments were analyzed to sequence the region of genomic DNA encompassing the exons 1-9 of the Zp4 genes, which is 4,513 bp long for Rattus norvegicus. PCR amplification for DNA amplification was carried out in 50 µl reaction volume containing 5 µl of DNA or 2 µl of DNA, 0.5 µM of each primer, 200 µM of each dNTP, 2 mM MgCl 2 , and 1 IU of Taq DNA Polymerase (Fermentas, Waltham, USA) or 2.5 U of polymerase AmpliTaq Gold (Applied Biosystems, California, USA). PCR was carried out using a Mastercycler personal thermocycler (Eppendorf, Hamburg, Germany) or a T300 thermocycler (Biometra, Germany) following an initial denaturation cycle of 3 min at 95 • C, and then 30 cycles of 1 min at 95 • C, followed by 1 min at annealing temperature (depending on the primers) and then 1 min at 72 • C. The final extension time was 10 min at 72 • C. PCR products were analyzed by electrophoresis on 1.5% agarose gels. Four microliters of the PCR reaction mixture were mixed with loading buffer (Fermentas, Waltham, USA) and separated for 60 min at 90 V before visualizing under UV light using ethidium bromide (Sigma-Aldrich, St. Louis, USA).
Amplicons were carefully excised from the agarose gels and purified with the QIAquick Gel Extraction Kit Protocol (Qiagen) or DNA gel extraction kit (Millipore) or directly purified with the DNA Clean and Concentrator TM-5 (Zymo) according to the manufacturer's instructions. Amplicons were automatically sequenced using a 3,500 Genetic Analyzer (Applied Biosystems, California, USA) or send to Genome Express (Meylan, France). The new sequences were submitted to GenBank under the accession numbers: MH822867, MH822868, and MH822871 for Mus mattheyi, Mus pahari, and Mastomys coucha, respectively, (mRNA) and to EMBL under the accession numbers: LR990796-LR990832 for the DNA sequences.

Phylogenetic Analysis
To complete the dataset, ZP4 sequences from different muroid rodents (Table 1) were retrieved from GenBank and Ensembl gene predictions.
All these predictions were checked manually to detect annotation errors especially close to splicing sites. Similarity searches were performed using BLAST and BLAT against assembled genomes in Ensembl followed by a manual compilation of data to predict further genes or exons missing from the Ensembl predictions. It was also checked that the new sequences corresponded to a syntenic region of the corresponding chromosome. Only the exonic portions were kept for the phylogenetic analysis. Translated sequences were aligned using Muscle in Seaview (Gouy et al., 2010). The pseudogene sequences were added afterwards to the nucleotide alignment and manually aligned. The best-fit model of evolution (SYM+G) was determined using the Akaike information criterion (AIC; Akaike 1973), as implemented in jModelTest v2.1.7 (Darriba et al., 2012). Phylogenetic trees were reconstructed using two probabilistic approaches: maximum likelihood (ML) and Bayesian inferences (BI). The ML phylogeny was reconstructed with PhyML . The robustness of each node was assessed with 1,000 bootstrap replicates. BI was performed using MrBayes v3.2 (Ronquist et al., 2012). Four independent runs of 10,000,000 generations sampled every 500th generation were performed. A burn-in period was determined graphically using Tracer1.7 (Rambaut et al., 2018). It was also checked that the effective sample sizes (ESSs) were above 200 and that the average SD of split frequencies remained <0.05 after the burn-in threshold. We discarded 10% of the trees and visualized the resulting tree with FigTree v1.4 (Rambaut, 2016). The robustness of the nodes was estimated with Posterior Probabilities (PP).

Test for Evidence of Positive Selection
Selection analyses were made with the muroid datasets, modified to remove the pseudogene sequences, leading to the first alignment of 28 taxa with 711 bp (237 codons) and a second alignment with only the complete Zp4 sequences including 13 taxa (1,644 bp). The analyses were performed with CODEML from PAML4 (Yang, 2007). Data were analyzed under different models: M1a (neutral model), M2a (selection), M7 (beta distribution), and M8 (beta distribution and selection). The likelihood ratio test (LRT) of positive selection was performed on two pairs of models, M1a with M2a, and M7 with M8 (Yang, 2007).

Proteomic Analysis
The expression of ZP proteins was studied using proteomic analysis in Mastomys coucha, Mus mattheyi, and Mus pahari ZP. Ovaries were trimmed using small scissors and dissected to remove fat and connective tissue. The solubilized ZP was obtained according to the protocol previously described by our group Jiménez-Movilla et al., 2009). Solubilized ZP was also obtained from oocytes, for which oocyte ZP was solubilized at 65 • C in PBS buffer for 30 min. The analysis was carried out on an HPLC-MS system consisting of an Agilent 1100 Series HPLC (Agilent Technologies, Santa Clara, CA) equipped with a µ-well-plate autosampler and a capillary pump, and connected to an Agilent Ion-Trap XCT Plus mass spectrometer (Agilent Technologies, Santa Clara, CA) equipped with an electrospray (ESI) interface.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
All animal procedures were performed following the Spanish Animal Protection Regulation, RD 1201/2005 which conforms to European Union Regulation 2003/65. All animal experiments were approved by the institutional review board of the University of Murcia according to the guide for Care and Use of Laboratory Animals as adopted by the Society for the Study of Reproduction.

AUTHOR CONTRIBUTIONS
MJI-R, CM-N, and PC performed molecular analysis and performed the bioinformatic analysis. MP-C, RL-B, and AG-A performed the in vitro fertilization analysis. PC and MA designed the study and conceived the project. MJI-R, CM-N, PC, and MA wrote the paper. All authors discussed the results and commented on the manuscript.

ACKNOWLEDGMENTS
The authors thank Liliana López for caring for the mice, François Catzeflis for the ethanol-preserved tissues from the tissue collection of the Institut des Sciences de l'Evolution of Montpellier and the Conservatoire Génétique de la Souris Sauvage de Montpellier for providing mice, Dr. Alejandro Torrecillas Sánchez of the Molecular Biology Section and Carmen Lagares Martínez of the Experimental Animal Service, ACTI, University of Murcia.