Unveiling the Diversity of Immunoglobulin Heavy Constant Gamma (IGHG) Gene Segments in Brazilian Populations Reveals 28 Novel Alleles and Evidence of Gene Conversion and Natural Selection

Even though immunoglobulins are critical for immune responses and human survival, the diversity of the immunoglobulin heavy chain gene (IGH) is poorly known and mostly characterized only by serological methods. Moreover, this genomic region is not well-covered in genomic databases and genome-wide association studies due to particularities that impose technical difficulties for its analysis. Therefore, the IGH gene has never been systematically sequenced across populations. Here, we deliver an unprecedented and comprehensive characterization of the diversity of the IGHG1, IGHG2, and IGHG3 gene segments, which encode the constant region of the most abundant circulating immunoglobulins: IgG1, IgG2, and IgG3, respectively. We used Sanger sequencing to analyze 357 individuals from seven different Brazilian populations, including five Amerindian, one Japanese-descendant and one Euro-descendant population samples. We discovered 28 novel IGHG alleles and provided evidence that some of them may have been originated by gene conversion between common alleles of different gene segments. The rate of synonymous substitutions was significantly higher than the rate of the non-synonymous substitutions for IGHG1 and IGHG2 (p = 0.01 and 0.03, respectively), consistent with purifying selection. Fay and Wu's test showed significant negative values for most populations (p < 0.001), which indicates that positive selection in an adjacent position may be shaping IGHG variation by hitchhiking of variants in the vicinity, possibly the regions that encode the Ig variable regions. This study shows that the variation in the IGH gene is largely underestimated. Therefore, exploring its nucleotide diversity in populations may provide valuable information for comprehension of its evolution, its impact on diseases and vaccine research.

Even though immunoglobulins are critical for immune responses and human survival, the diversity of the immunoglobulin heavy chain gene (IGH) is poorly known and mostly characterized only by serological methods. Moreover, this genomic region is not well-covered in genomic databases and genome-wide association studies due to particularities that impose technical difficulties for its analysis. Therefore, the IGH gene has never been systematically sequenced across populations. Here, we deliver an unprecedented and comprehensive characterization of the diversity of the IGHG1, IGHG2, and IGHG3 gene segments, which encode the constant region of the most abundant circulating immunoglobulins: IgG1, IgG2, and IgG3, respectively. We used Sanger sequencing to analyze 357 individuals from seven different Brazilian populations, including five Amerindian, one Japanese-descendant and one Euro-descendant population samples. We discovered 28 novel IGHG alleles and provided evidence that some of them may have been originated by gene conversion between common alleles of different gene segments. The rate of synonymous substitutions was significantly higher than the rate of the non-synonymous substitutions for IGHG1 and IGHG2 (p = 0.01 and 0.03, respectively), consistent with purifying selection. Fay and Wu's test showed significant negative values for most populations (p < 0.001), which indicates that positive selection in an adjacent position may be shaping IGHG variation by hitchhiking of variants in the vicinity, possibly the regions that encode the Ig variable regions. This study shows that the variation in the IGH gene is largely underestimated. Therefore, exploring its nucleotide diversity in populations may provide valuable information for comprehension of its evolution, its impact on diseases and vaccine research.

INTRODUCTION
Immunoglobulins (Ig) are glycoproteins produced exclusively by activated B-lymphocytes and plasma cells that mediate humoral response against pathogens. Each B-cell clone presents an antigen-specific membrane-bound immunoglobulin that, together with CD79A and CD79B molecules, comprise the Bcell receptor (BCR). After stimulation by antigens, B-cells secrete immunoglobulins (antibodies) with the same antigen-binding sites than the membrane-bound molecules.
All Ig share a similar basic structure composed of four polypeptide chains: two identical heavy chains and two identical light chains. The heavy chain has one variable domain (V H ) and three or four constant domains (C H 1, C H 2, C H 3, and C H 4). Each light chain exhibits one variable domain (V L ) and one constant domain (C L ). The variable region (V H and V L ) is responsible for antigen recognition and binding while the constant regions (C H and C L ) primarily mediate the Ig effector functions, which includes complement activation and Fc Receptor binding (1,2).
During B-cell development, the IGH gene undergoes a somatic rearrangement, in which only one IGHV, one IGHD, and one IGHJ gene segment are combined to form the Ig variable region V H . In contrast, during clonal expansion after activation of the B-cell, IGHC gene segments go through a process called class-switch recombination, which determines the Ig class and subclass: IgM, IgD, IgG (IgG1, IgG2, IgG3, and IgG4), IgA (IgA1 and IgA2), and IgE. The human humoral immune response is mainly mediated by Ig gamma (IgG), which is subdivided into four subclasses, IgG1, IgG2, IgG3, and IgG4, ordered by decreasing abundance in peripheral blood (5). The constant regions of these four subclasses are encoded by the gene segments IGHG1, IGHG2, IGHG3, and IGHG4, respectively, the first three being the ones focused on this study. Each IGHG gene segment consists of three exons that encode the constant heavy domains (CH1, CH2, and CH3) and exon H, which encodes the hinge between the CH1 and CH2 domains (5).
Most of the human IgG diversity in populations has only been characterized by serological methods, which defined the immunoglobulin allotypes at the protein level. Ig allotypes are polymorphic epitopes (resulting from nucleotide variation) on the Ig constant domain that provide binding sites for antibodies (6). Certain IgG allotypes have been associated with susceptibility to cancer, autoimmune and infectious diseases (7)(8)(9).
Although the genetic variability of some IGHG gene segments has been characterized (10,11), it has never been systematically sequenced at the nucleotide level across populations. Thus, the diversity of these gene segments is probably underestimated. Additionally, this genomic region is not well-covered in genome-wide studies and genomic databases for two reasons: first, DNA samples used are often extracted from B-cell lines, which are not suitable for analyzing this region due to the somatic rearrangement within this locus; second, the high sequence similarity of these segments imposes technical difficulties for sequencing and genotyping (12).
Here, we analyzed the diversity of IGHG1, IGHG2, and IGHG3 in seven Brazilian populations: five Amerindian populations that have been genetically isolated for centuries and two urban populations. By analyzing deep sequencing data, we found 28 novel IGHG alleles, characterized the linkage disequilibrium of variants within these segments and analyzed the relationship among alleles. Additionally, we provided compelling evidence of the occurrence of gene conversion between different gene segments and evidence of purifying selection shaping IGHG diversity.
The Amerindians samples were collected between late 1980s and early 1990s. According to public data from the Brazilian Institute of Geography and Statistics (IBGE), there are approximately 900,000 Amerindians individuals in Brazil, distributed across 693 official indigenous lands (https://www. ibge.gov.br). The Guarani speak a Tupi-Guarani language, which belongs to the Tupi language family. The Kaingang speak Jê, which belongs to the Macro-Jê language stock. Analyzing mtDNA segments and the proposed time of origin of Tupi-Guarani and Jê linguistic families, Marrero and colleagues (13) estimated the Guarani population split in three partialities (Guarani Kaiowa, Guarani Ñandeva and Guarani Mbya) 1,800 years ago, while the different Kaingang populations would have split more recently, around 200 years ago. Since then, they are believed to have remained isolated from each other and other urban populations due to strong cultural and language barriers (14). A former study from our group estimated that the gene flow of these Amerindian populations with non-Amerindians was low, being 0% in Guarani Kaiowa, 4% in Guarani Mbya, 14% in Guarani Ñandeva and 7% in Kaingang (15).
The two urban samples were from Curitiba, the capital of Paraná State and the 5th largest city in Brazil. As a result of the Brazilian history of European colonization 500 years ago, and especially the more recent European migrations since the 19 th Century, the population of Curitiba is of predominantly European ancestry. According to the public data from IBGE, 78.7% of the inhabitants of Curitiba self-declared themselves as white, 16.7% as admixed, 3% as black, 1.4% as Asian, and 0.2% as Amerindian (https://www.ibge.gov.br). The population here referred as CTBA only included Euro-descendant individuals. Therefore, we excluded all individuals with known miscegenation with Amerindian and/or other non-European ancestries. Paraná State also hosts the second largest Japanese community in Brazil, one of the largest outside Japan. The Japanese migration started in the twentieth century with the Treaty of Friendship, Commerce and Navigation between Brazil and Japan. All Japanese-descendent individuals of this study (BrJAP) were born in Brazil, with either both parents or all four grandparents born in Japan. They reported no history of admixture with non-Japanese ethnicities.

DNA Extraction
Genomic DNA was extracted from peripheral blood samples by standard salting-out (16) or by the phenol-chloroform-isoamyl method (17). High-quality DNA has been stored at −80 • C since the extraction. DNA integrity was evaluated by 1% agarose gel electrophoresis and purity was accessed by spectrophotometry.

Sequencing and Allele Identification
We aligned all previously known IGHG alleles and designed primers to amplify each segment specifically. To define the best set of primers we used the following approaches: (i) we ruled out unspecific amplification by verifying that all amplicons did not exhibit any variant that was specific of other segments; (ii) we certified that the genotype distribution of all single nucleotide variable sites, in each amplicon, were in accordance to Hardy-Weinberg equilibrium (p > 0.05).
Sequencing was performed using Big Dye R Terminator Cycle Sequencing Standard v3.1 (Life Technologies, Carlsbad, CA, USA), according to manufacturer's instructions. The sequencing reactions were performed in a Mastercycler ep Gradient S thermocycler (Eppendorf, Hamburg, Germany) with a first step at 95 • C for 60 s and 25 cycles of 95 • C for 10 s, 50 • C for 5 s, and 60 • C for 4 min, followed by capillary electrophoresis in a 3500xl Genetic Analyzer Sequencer (Life Technologies, Carlsbad, CA, USA).
After sequencing, the alleles were identified according to the known alleles described at IMGT database (International ImMunoGeneTics Information System) (18). IMGT database provides public access to an integrated information system specialized in immunoglobulins (Ig), T cell receptors (TCR), and major histocompatibility complex (MHC) genes and molecules. All data submitted to the IMGT database are manually checked by experts in the field, which assure the deposit of highquality data.
The nucleotide sequence of each individual was aligned with consensus sequences with Mutation Surveyor R DNA Variant Analysis Software v5.0.1 (Softgenetics), and their variable sites were annotated. Alleles that were different from the ones listed in the IMGT database were considered novel and were subsequently confirmed by sequencing and/or molecular cloning as described below.

Data Analysis
Allelic frequencies were obtained by direct counting using GenAlEx v6.502 software (19). Hardy-Weinberg equilibrium was tested for each gene segment in all populations by Guo and Thompson's method (20), performed in Arlequin v3.5.2 software (21). IGHG haplotypes from different gene segments were estimated via ELB algorithm and this information was used for Gm allotype haplotype inference, according to the correspondence between nucleotide variants and allotypes described by Lefranc et al. (6). Linkage disequilibrium (LD) between single nucleotide variants of each gene segment was estimated with Haploview software (22).
Allele networks were performed with variants from each gene segment through the median-joining (MJ) algorithm (23) with Network v5.0 software. Allele frequencies were compared using the exact test of population differentiation (24) and populationpairwise F ST (25,26) with Arlequin v3.5.2 software (21). Principal component analysis (PCA) was performed using the Minitab 17 Statistics Software (27) for graphical representation of the genetic differences and similarities in the major components of variation among populations. The PCA was performed using inferred allotype haplotype frequencies to compare the frequencies from the study population with others that were previously described serologically. These haplotypes were classified according to Lefranc et al. (6), and detailed information is available in Table S3.

One Novel Single Nucleotide Variant and 28 Novel IGHG Alleles Have Been Discovered
Within all three gene segments in the seven populations analyzed, we found a total of 49 exonic variable sites, of which 26 were nonsynonymous substitutions. Based on the Grantham scale (32), which ranges from 5 to 215 according to the physicochemical distance between amino acid pairs, amino acid replacements were from low to moderate (15 < D < 103) ( Table 1). Of the single variable sites, 21 have not been reported in any of the previously described alleles at the IMGT database ( Table 1, in bold). We also found a novel synonymous IGHG3 single nucleotide variant at the position chr14:106235856 (GRCh37.p13 primary assembly) in the CTBA population. This new variant was submitted to the dbSNP database (34) under reference SNP ID number rs155533833 (NC_000014.8:g.106235856G>A).
A total of 28 novel IGHG alleles have been found in our study: nine in IGHG1 ( Table 2), nine in IGHG2 (Table 3), and ten in IGHG3 ( Table 4). All novel alleles have been confirmed either by sequencing or by molecular cloning followed by sequencing. Novel alleles have been submitted to IMGT Nomenclature Committee (18), which verified the accuracy of our data and assigned official names (reports #2018-2-0824 and #2018-5-1113).
Interestingly, some new alleles of all gene segments were observed at high frequency (f > 0.10; Table 5 Because most of the previous studies only described the immunoglobulin heavy chain diversity serologically, we inferred the serological Gm allotypes from our nucleotide sequence data, based on the nucleotide sequence description for each previously reported allotype (6), to allow comparison with previously reported variants. For example, the most frequent allele haplotype (alleles that are in the same chromosome and inherited together in a block) was the one comprising the gene segments IGHG1 * 02, IGHG2 * 03, IGHG3 * 14 (f = 0.182 to 0.740), which encodes the Gm haplotype "C" Gm21,26,27,28;17,1;(..), the most frequent lgG allotype haplotype in our populations (f = 0.21 to 0.77; Table 6). The correspondence between allele haplotype and allotype haplotypes are in the Table S4. More than one IGHG allele haplotype can define a single Gm allotype haplotype, as is the case of the Gm haplotype "B" Gm5,10,11,13,14,26,27;3;(..), that is encoded in our data by the allele haplotype IGHG3 * 11,IGHG2 * 03,IGHG2 * 03, by IGHG3 * 11,IGHG1 * 14,IGHG2 * 03, or by IGHG3 * 11,IGHG1 * 03,IGHG2 * 08. In order to simplify the interpretation of the data, Gm haplotype identifiers (from A to M) were used as suggested by Lefranc et al. (6).

Lower IGHG Diversity Was Observed in Amerindians and Frequencies Differed Significantly Among Populations
IGHG allelic frequencies varied across populations ( Table 5). A small number of highly frequent alleles were observed for all gene segments in Amerindian populations. Even though Guarani populations share a more recent common ancestor, allelic frequencies significantly differed among them (p < 0.01), with low to moderate F ST values (0.02-0.10) ( Table 7). Allelic frequencies did not differ between the two Kaingang populations Novel alleles (in bold) have been confirmed by sequencing and/or molecular cloning. Their official names have been assigned by IMGT nomenclature committee. IMGT, International ImMunoGeneTics Information System (18). Dots represent the consensus nucleotide. a According to Edelman et al. (33). b llotypes were inferred according to Lefranc et al. (6). # Number of copies observed in this study.
(p = 0.065; F ST = 0.03). More conspicuous differences were found between the Japanese-descendant and Euro-descendant populations compared to each other, and between each of these two populations compared to the Amerindian populations, with F ST values ranging from 0.11 to 0.52, indicating moderate to high genetic differentiation. The principal component analysis (PCA) grouping was consistent with ancestry and geography (Figure 2). Amerindians and Asians formed two separated groups close to each other. Europeans and admixed populations of predominantly European ancestry grouped together, while Africans were more distant.
Genotypic distributions for all gene segments were in accordance with Hardy-Weinberg equilibrium in all population samples (0.08 > p >1).

Distinct Linkage Disequilibrium Patterns Among Populations
Linkage disequilibrium (LD) patterns differed among populations ( Figure S2). Interestingly, each Guarani population exhibited a distinct LD pattern despite their close relationship. In GKW, only five variable sites were observed in all three gene segments, of which three were in absolute LD (D ′ = 1, r 2 = 1). In contrast, more variable sites (21 and 24) were observed for the other two Guarani populations. In addition, many variants that were in LD in GND were not observed in LD in GRC.

Sequence Analysis Suggests That Gene Conversion Between Frequent Alleles of Different Gene Segments Generated Novel Alleles
Median-Joining network (Figure 3) shows that the most frequent alleles IGHG1 * 01, IGHG2 * 03, and IGHG3 * 14 were central nodes in the network, with few nucleotides differing between them and the other alleles. The loops indicate possible recombination sites.
Alignment of all the known alleles of the IGHG1, IGHG2, and IGHG3 gene segments suggests that some novel alleles  discovered in this study could have been generated by gene conversion between alleles of different gene segments (Figure 3). For example, the novel allele IGHG1 * 11, present in BrJAP (f = 0.064), could have been generated by gene conversion between the most frequent IGHG2 allele (IGHG2 * 03; f = 0.579) and the most frequent IGHG1 allele (IGHG1 * 02; f = 0.60). In addition, gene conversion between the frequent IGHG2 * 03 and IGHG3 * 14 alleles (f = 0.735 and f = 0.833, respectively) could explain the origin of allele IGHG2 * 09 (f = 0.14).

Neutrality Tests Suggest Evidence of Natural Selection Shaping IGHG Polymorphism
Neutrality tests performed by Tajima's D, Fu and Li's D and F were non-significant for most populations. However, Fay and Wu's test resulted in significant negative values for most populations, which may indicate positive selection at an adjacent site ( Table 8).
Deviation of neutrality was also tested by analyzing synonymous and non-synonymous substitution rates across all the known and novel alleles of all gene segments (Tables S5-S7). Overall, the rate of synonymous substitutions (dS) was significantly higher than the rate of non-synonymous (dN) substitutions (dN/dS < 1) for IGHG1 and IGHG2 (p = 0.01 and 0.032, respectively) ( Table 9), consistent with purifying selection.

DISCUSSION
Our main goal was to deliver an unprecedented and comprehensive nucleotide sequencing-based characterization of the IGHG gene segments in populations of different ancestries. Before this study, only 30 IGHG alleles have been described for IGHG1, IGHG2, and IGHG3 together (18). Here, we report the discovery of 28 novel alleles, of which 16 were in a single population sample of Japanese descendants (n = 57) and seven in one population sample of Euro-descendants (n = 51). It is interesting that even in Amerindian populations, which exhibited a limited diversity, seven new alleles were found. This is clear evidence that the diversity of IGHG is far from being fully described and possibly a much larger number of novel alleles will be discovered as more populations are interrogated. We focused on the segments that code for the most abundant Ig in serum. Considering the homology and high sequence similarity, a different strategy would be needed for the precise characterization of IGHG4 due to the high frequency of duplications observed for this gene segment (35).
Some of the new alleles were highly frequent. The novel allele IGHG3 * 22, frequent in Guarani Mbya (GRC, f = 0.157), exhibited a lower frequency in Guarani Ñandeva (GND, f = 0.063), and was absent in Guarani Kaiowa (GKW). These three populations share a more recent common ancestor and the differences observed can be explained by its demographic history and genetic drift. Demographic factors played a major role in shaping the diversity of other genes important for immune responses in these same Amerindian populations (36). Genetic drift, particularly founder effect and bottleneck, may explain the lower diversity of IGHG in Amerindians and the fluctuation of their allelic frequencies. On the other hand, the IGHG3 * 22 allele was observed only in one Kaingang individual. This fact suggests   gene flow from Guarani to Kaingang. Although GRC and KRC remain isolated due to strong cultural barriers, their immediate vicinity did result in a low degree of admixture (14). IGHG3 * 11 is the most common IGHG3 allele in Eurodescendants (CTBA, f = 0.588) and was observed at lower frequency in Amerindians: GND (f = 0.094), GRC (f = 0.010), KIV (f = 0.038), KRC (f = 0.020), being absent in GKW. This allele corresponds to the allotype G3m5, 10,11,13,14,26,27 which has been previously shown to be highly frequent in Europeans but absent in non-admixed Amerindians (37)(38)(39)(40)(41)(42). Also, similar allele distribution was observed for IGHG1 * 03 in the study populations. These observations are consistent with previous studies from our group, which estimated the admixture rate of Guarani and Kaingang by analyzing HLA class II genes. In    that study, the estimated admixture rate with non-Amerindians was 14.3% for GND, 3.7% for GRC, 7.2% for Kaingang, and no admixture for GKW (15). The Gm allotype haplotype frequencies inferred from DNA sequencing in our study (in which the most common haplotypes were C and D) were similar to those found in former reports that characterized serologically the Guarani and Kaingang populations from Santa Catarina State, Brazil (42), and other native American populations (41,43,44).
Strong linkage disequilibrium (LD) (Figure S2) was observed in most Amerindian populations, as expected for these historically small populations that suffered strong genetic drift and multiple founder effects since the arrival of the first Americans to the continent and during their migration from the North to the South in the American continent. Interestingly, the patterns of LD differed among Guaranis, despite their shared ancestry. GKW exhibited a reduced number of variable sites, while GRC exhibited a reduced LD in comparison to GND. These differences could also be explained by genetic drift, as certain haplotypes that stochastically increased their frequencies in a population after their divergence may not have increased in the others.
In contrast, the Japanese-descendant and Euro-descendant populations have higher nucleotide and allele diversity and fewer FIGURE 2 | Principal component analysis using Gm allotype haplotype frequencies was consistent with geography and ancestry. For comparisons with previously described population, we inferred the Gm allotype frequencies based on the observed nucleotide sequences, according to Lefranc et al. (6). Circles represent population data from the literature and squares represent populations from the present study. All frequencies reported in the literature are listed in Table S3. SNPs in LD. Even so, SNPs from different gene segments are in LD in these urban populations. In BrJAP, SNPs of allotypes G1m17 (rs1071803) and G2m(..) (rs8009156) are in LD (D = 0.92; r 2 = 0.73) and are present in the allotype haplotypes C and D, reported as the most common in Japanese populations (45).
In the MJ networks (Figure 3), IGHG1 * 02, IGHG2 * 03 and IGHG3 * 14 were connected with most alleles and were present at high frequencies in all populations. This pattern suggests that most of the other known alleles could have been originated from them. In the IGHG2 MJ network, one loop shows two paths where substitutions at position 161 of exon CH3 and 230 of CH2 occurred to generate the IGHG2 * 05, * 07, and * 13 alleles. It can be hypothesized that a mutation occurred in one of them, for example, IGHG2 * 03 at position 161 of exon CH3, generated IGHG2 * 05 and this allele, likewise, might have mutated at position 230 of exon CH2 originating allele IGHG2 * 13. As independent mutations in the same positions are extremely unlikely, the fact that the IGHG2 * 07 allele has a variant in the same position (230 of CH2 exon) indicates that gene conversion between alleles IGHG2 * 13 and IGHG2 * 03 originated the IGHG2 * 07 allele. Moreover, we suggest that the novel alleles IGHG1 * 11 and IGHG2 * 09 resulted from gene conversion between two frequent alleles of different gene segments. Overall, our data point to a major role of recombination and gene conversion originating new IGHG alleles, which is consistent with the tandem positioning and high sequence similarity of these segments, which favor unequal crossing-over (46). Kaingang from Ivaí and Kaingang from Rio das Cobras presented low genetic differentiation (F ST = 0.032), and similar allele frequencies (Table 7), most probably because of their recent common origin and gene flow due to the absence of cultural barriers, in addition to their geographical proximity. The F ST values between the Guarani populations were low to moderate, which is an evidence of genetic drift affecting the IGHG diversity in these populations. These results are compatible with previous reports for mtDNA in the same populations, which indicated that divergence of the three Guarani populations occurred at around 1,800 years before present (ybp), much earlier than the separation of the Kaingang populations that was estimated at of 207 ybp (13).
The PCA results (Figure 2) were consistent with geography and ancestry and showed that our data are consistent with data obtained by serologic methods, previously reported in the literature. The exception was India, which grouped with Europeans and Euro-descendants. In fact, PCA grouping does not necessarily mean common ancestry, as it can also result from migration or stochastic factors, or convergent evolution by natural selection. The grouping solely reflects the similarities of the IGHG allelic frequencies in these populations.
The results of most neutrality tests suggested that natural selection is not the major factor responsible for shaping IGHG diversity in the study populations. In other words, for IGHG the impact of genetic drift due to demographical processes is possibly stronger than the signal left by natural selection. As is known, Amerindians have a long history of migrations and isolation, and went through severe bottlenecks after the European colonization (14). Still, in GKW and KRC for IGHG2 and KRC and CTBA for IGHG3, the results of Tajima's D, and Fu and Li's The mv nodes (median vector) are possible unsampled or extinct ancestral sequences generated by the MJ algorithm to connect the alleles. Alleles IGHG3 * 11 and IGHG3 * 12 (C) were grouped because they do not differ in nucleotide sequence, except for the hinge size. KIV, Kaingang from Ivaí; KRC, Kaingang from Rio das Cobras; GRC, Guarani Mbya; GKW, Guarani Kaiowa; GND, Guarani Ñandeva; BrJAP, Japanese-descendants; CTBA, Euro-descendants from Curitiba; NS, not sampled (alleles not observed in this study). The occurrence of multiple mutations in the same positions in different gene segments is extremely unlikely. In addition, sequence homology and tandem positioning favor unequal crossing over between high frequent alleles. Therefore, based on the multiple alignments, we suggest that    D and F tests indicated diversity sweeps due to bottlenecks or purifying selection. Analyzing all the currently known IGHG alleles, including the 28 novel alleles that we here described, we found that the codon-based dN/dS test showed significant results for purifying selection ( Table 9) for IGHG1 (p = 0.01) and IGHG2 (p = 0.03). We observed that synonymous (dS) substitution rates were higher than non-synonymous (dN) substitution rates. It was previously demonstrated that Gm1 allotypes have a different impact on the IgG1 ability to bind the Fc gamma receptor (FcγR)-like proteins from viruses. Antibodies with G1m1,2,17 allotype exhibit lower affinity to the viral FcγR-like protein of the human cytomegalovirus (HCMV), which decreases susceptibility to this infection (47). Similarly, the FcγR-like protein from herpes simplex virus (HSV) binds with lower affinity to antibodies carrying the G1m3 allotype due to certain residues in the CH1 and CH3 domains (9). In the light of our results, it is plausible to suggest that emerging amino acid replacements that favored binding to viral proteins were negatively selected as a result of their deleterious effect for the individuals carrying the mutations. Higher binding to these viral proteins would favor viral evasion from immune responses and increase the susceptibility to certain viral infections. Moreover, purifying selection against non-synonymous changes could have limited the diversification of IGHG1 and IGHG2. The Fay and Wu H test was significant with negative values for almost every population and gene segment analyzed. This could be interpreted as a result of an excess of derived variants at high frequencies in the gene genealogies. Fay and Wu (30) suggested that this may be a unique pattern produced by hitchhiking of variants in the vicinity that are being favored by positive selection. IGHG gene segments are located downstream of the IGHV, IGHD, and IGHJ gene segments that encode the immunoglobulin variable regions, which specifically bind to antigens (2,4). Therefore, we suggest that selection for variants in the variable region may be impacting the diversity of the constant region by hitchhiking mutations in the IGHG gene segments. This hypothesis is corroborated by the findings of Tanaka and Nei (48), who demonstrated that the non-synonymous mutation rate was higher than the synonymous rate in the gene segments that code for the Ig variable region. Their results were consistent with diversity-enhancing selection or overdominant selection driving the nucleotide diversity in the variable region.

CONCLUSION
Antibodies are pivotal for human survival, at both the individual and the population levels. It is surprising that despite decades of compelling evidence about the importance of the immunoglobulin gene variation for human immunity and the not so recent advent of sequencing technologies, most of the knowledge about IGHG is still based on serologic typing. As we see here, the fact that the regions encoded by IGHG are called "constant" does not mean these segments are not highly polymorphic. In fact, we found 16 novel alleles in a population sample of only 57 Japanese descendants. The IGHG genomic region is not well-covered in genome-wide association studies and whole genome sequencing databases. The homology and high sequence similarity of IGHG segments impose technical difficulties for sequencing, particularly at large scale. Besides, the somatic recombination events characteristic of the IGH locus makes DNA from B-cell lines, used in so many studies, not suitable for IGHG sequencing. Our study is the first to sequence systematically these segments at the nucleotide level in populations. We here present a full characterization of IGHG1-3 diversity in seven Brazilian populations, linkage disequilibrium, haplotypes and evidence of purifying selection and genetic drift. Understanding the IGHG normal variation in populations and its evolution may be the key to better comprehend how the immune system fights invading organisms and non-self-antigens and also may contribute to the development of new vaccines.

ETHICS STATEMENT
This study was carried out in accordance with the recommendations of Brazilian National Human Research Ethics Committee (CONEP) with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Brazilian National Human Research Ethics Committee (CONEP).

AUTHOR CONTRIBUTIONS
DA designed the study. VC-S, DA, LV, RD, and HI performed DNA sequencing and genotyping. VC-S analyzed the data. RW, VC-S, RD, HI, and LV performed molecular cloning and validation of novel alleles. MP-E, DA, DM, and RW contributed with reagents. VC-S, DA, MP-E, DM, and MB drafted the manuscript. All authors significantly contributed with ideas and critically reviewed this manuscript.

ACKNOWLEDGMENTS
We thank Kirsten M. Anderson for kindly reviewing this manuscript.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu. 2019.01161/full#supplementary-material Figure S1 | Location of IGHG1, IGHG2, and IGHG3 primers. Arrows indicate primers and their direction. All primer sequences are listed in Table S2. Primers used for amplification and sequencing are shown in blue, green, and red; primers only used for sequencing are represented in gray. This representation is not to scale.