Original Research ARTICLE
Insights into HLA-G genetics provided by worldwide haplotype diversity
- 1Department of Pathology, School of Medicine of Botucatu, Universidade Estadual Paulista, Botucatu, Brazil
- 2Biological Sciences Institute, Federal University of Goias, Goiânia, Brazil
- 3UMR 216, Institut de Recherche pour le Développement, MERIT, Paris, France
- 4Faculté de Pharmacie, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
- 5Division of Clinical Immunology, Department of Medicine, School of Medicine of Ribeirão Preto, University of São Paulo, Ribeirão Preto, Brazil
- 6Departamento de Química, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, University of São Paulo, Ribeirão Preto, Brazil
Human leukocyte antigen G (HLA-G) belongs to the family of non-classical HLA class I genes, located within the major histocompatibility complex (MHC). HLA-G has been the target of most recent research regarding the function of class I non-classical genes. The main features that distinguish HLA-G from classical class I genes are (a) limited protein variability, (b) alternative splicing generating several membrane bound and soluble isoforms, (c) short cytoplasmic tail, (d) modulation of immune response (immune tolerance), and (e) restricted expression to certain tissues. In the present work, we describe the HLA-G gene structure and address the HLA-G variability and haplotype diversity among several populations around the world, considering each of its major segments [promoter, coding, and 3′ untranslated region (UTR)]. For this purpose, we developed a pipeline to reevaluate the 1000Genomes data and recover miscalled or missing genotypes and haplotypes. It became clear that the overall structure of the HLA-G molecule has been maintained during the evolutionary process and that most of the variation sites found in the HLA-G coding region are either coding synonymous or intronic mutations. In addition, only a few frequent and divergent extended haplotypes are found when the promoter, coding, and 3′UTRs are evaluated together. The divergence is particularly evident for the regulatory regions. The population comparisons confirmed that most of the HLA-G variability has originated before human dispersion from Africa and that the allele and haplotype frequencies have probably been shaped by strong selective pressures.
Human leukocyte antigen G (HLA-G) belongs to the family of non-classical HLA class I genes, located within the major histocompatibility complex (MHC) at chromosomal region 6p21.3. The MHC segment is considered to be the most polymorphic region in the vertebrate genome (1). Although the HLA-G product has the same structure as classical class I molecules, its main function is not antigen presentation. The role of HLA-G in regulating the immune response has been extensively studied since its discovery by Geraghty and colleagues in 1987 (2).
The HLA-G gene has been the target of most recent research regarding the function of class I non-classical genes. The main features that distinguish HLA-G from classical class I genes are (a) limited protein variability, (b) alternative splicing generating several membrane bound and soluble isoforms, (c) short cytoplasmic tail, (d) modulation of immune response (immune tolerance), and (e) restricted expression to certain tissues (3).
The HLA-G molecule does not seem to stimulate immune responses; instead, it exerts inhibitory functions against natural killer (NK) cells (4), T lymphocytes (4), and antigen-presenting cells (APC) (5) through direct interaction with multiple inhibitory receptors such as ILT2/CD85j/LILRB1 (ILT2), expressed by all monocytes, B cells, some lineages of T cells, and NK cells (6); ILT4/CD85d/LILRB2 (ILT4), expressed only by monocytes and dendritic cells (7); and KIR2DL4/CD158d (KIR2DL4), whose expression is restricted to CD56 NK cells (8).
The role of HLA-G in immune tolerance was first studied in trophoblast cells at the maternal–fetal interface (9). Several studies have reported aberrant or reduced HLA-G expression, at both the mRNA and protein levels, in pathological conditions such as preeclampsia (10) and recurrent spontaneous abortion (11) in comparison with normal placentas.
Beyond trophoblast expression, HLA-G is related to a variety of physiological and pathological conditions. In physiological conditions, HLA-G expression has been documented in cornea (12), thymus (13), and erythroid and endothelial precursors (14). On the other hand, HLA-G variation sites and/or expression levels are associated with pathological conditions such as viral infections (15–20), cancer (21–27), recurrent miscarriage (28–37), pregnancy outcome and pregnancy complications (37–45), autoimmune diseases (46–54), transplantation outcome (55–57), and inflammatory diseases (58–61), indicating that HLA-G encodes a critical molecule for the immune system.
HLA-G Genetic Structure
The structure of the HLA-G gene resembles that of the classical class I genes HLA-A, HLA-B, and HLA-C. HLA-G encodes a membrane-bound molecule with the same extracellular domains found in other class I molecules, including the association with β2-microglobulin. However, its main function is not antigen presentation.
The HLA-G gene exon/intron structure and splicing patterns are well defined, but the annotations of the National Center for Biotechnology Information (NCBI)1, the International Immunogenetics Database (IMGT/HLA2), and the Ensembl database3 are inconsistent regarding its structure. The main reasons are that the IMGT/HLA database only presents sequences within 300 bases upstream of the coding sequence (CDS) and does not consider most of the 3′ untranslated region (UTR) segment. Therefore, in the present work, the structure defined by NCBI/Ensembl will be used throughout the text.
According to the NCBI reference sequence NC_000006.12 (GRCh38) and transcripts such as NM_002127.5 (NCBI), ENST00000428701, and ENST00000376828 (Ensembl), the HLA-G gene (NCBI Gene ID: 3135) presents eight exons and seven introns, consistent with a classical class I gene structure, and encompasses a region of 4144 nucleotides between positions 29826979 and 29831122 at 6p21.3 (GRCh38). This gene is surrounded by some of the most polymorphic genes in the human genome (Figure 1), such as HLA-A (115 kb downstream), HLA-B (1526 kb downstream), and HLA-C (1441 kb downstream), and by other non-classical HLA loci such as HLA-E (662 kb downstream) and HLA-F (103 kb upstream). According to the NCBI annotation and hg19, the HLA-G DNA segment encodes a full-length mRNA of 1578 nucleotides and alternative smaller ones, as discussed later. In the full-length mRNA, 1017 nucleotides represent the CDS encoding a full-length protein of 338 amino acids, 178 nucleotides represent the 5′UTR segment, and 383 nucleotides represent the 3′UTR segment.
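The figures quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python, not part of the original analysis; the function names are hypothetical, and it assumes inclusive 1-based coordinates and a CDS whose last codon is the stop codon.

```python
# Figures quoted in the text (GRCh38 coordinates, full-length mRNA segments).
GENE_START = 29_826_979
GENE_END = 29_831_122
UTR5_LEN, CDS_LEN, UTR3_LEN = 178, 1017, 383


def span_length(start: int, end: int) -> int:
    """Length of an inclusive, 1-based genomic interval."""
    return end - start + 1


def protein_length(cds_len: int) -> int:
    """Residues encoded by a CDS, assuming its final codon is the stop codon."""
    if cds_len % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    return cds_len // 3 - 1


gene_len = span_length(GENE_START, GENE_END)   # gene segment length
mrna_len = UTR5_LEN + CDS_LEN + UTR3_LEN       # full-length mRNA
n_residues = protein_length(CDS_LEN)           # full-length protein

print(gene_len, mrna_len, n_residues)
```

Under these assumptions the three values agree with the 4144 nucleotides, 1578 nucleotides, and 338 amino acids stated in the text.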
There is no consensus regarding the exact location where HLA-G transcription starts. Considering the NCBI and Ensembl annotations, and the transcripts NM_002127.5 from NCBI and ENST00000428701 from Ensembl, HLA-G transcription starts 866 nucleotides upstream of the initial translated ATG (third * at Figure 1). However, other transcripts tell a different story: ENST00000376828 indicates that HLA-G transcription might start even earlier, while ENST00000360323 indicates that transcription starts 24 nucleotides upstream of the initial translated ATG. Given this contradictory information, it is possible that the HLA-G gene presents multiple transcription start points depending on the presence of specific transcription factors or other expression-inducing mechanisms, but it probably presents only one translation start point, as described further. Since there is no consensus, in the present work we opt to use the annotation presented by both NCBI and Ensembl, considering NM_002127.5 and ENST00000428701 as references. Considering the transcription start site indicated by NM_002127.5/ENST00000428701 or ENST00000360323, HLA-G presents a large 5′UTR segment. Within this segment, there is an intron (intron 1) of about 688 nucleotides that is spliced out, giving rise to a 5′UTR of about 178 nucleotides composed of DNA segments of two adjacent exons (178 + 688 = 866 nucleotides between the transcription start site and the main ATG). Considering this transcription start point, the HLA-G 5′ sequence presents at least three potential translation start points, i.e., two in the 5′UTR and a third one defining the beginning of the CDS. In the present work, we will consider the adenine of this third ATG, i.e., the first base of the CDS, as nucleotide +1. Although conventional nomenclature would suggest the first transcribed base as nucleotide +1, our decision avoids unnecessary confusion regarding the position of various well-established HLA-G variation sites.
All nucleotides before the CDS will be noted as negative numbers and nucleotides in the CDS segment will be noted as positive numbers, using as a reference sequence the one available at the official human genome hg19 or NC_000006.12.
The first ATG is found between nucleotides −154 and −152 (mRNA) or nucleotides −842 and −840 (DNA). The second one is found between nucleotides −118 and −116 (mRNA) or nucleotides −806 and −804 (DNA). Both of these translation start points are in the same frame and lie in a sequence context that does not resemble the preferred translation initiation sequence (Kozak consensus sequence), so they might not initiate translation (62). Even if the first ATG were used, it would produce a peptide of only eight residues due to a stop codon found downstream in the reading frame. Alternatively, if the second ATG were used, a protein of about 136 amino acid residues would be produced. Although it is in a different frame from the main translation start point (the third one), this 136 amino acid molecule is quite similar to the alpha-1 domains of other human and primate class I molecules. The third and main ATG is compatible with the preferred Kozak sequence (62); it initiates the translation of the full-length protein of 338 amino acid residues and defines the beginning of the CDS segment.
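The relationship between the mRNA and DNA coordinates of the two upstream ATGs, and their reading frames, can be illustrated as follows. This is a minimal sketch with hypothetical function names; it assumes that both upstream ATGs lie in exon 1 (upstream of intron 1), that intron 1 is exactly 688 nucleotides long, and that position numbering has no zero (−1 is immediately followed by +1).

```python
INTRON1_LEN = 688  # intron 1 length quoted in the text ("about 688 nucleotides")


def mrna_to_dna(mrna_pos: int, intron1_len: int = INTRON1_LEN) -> int:
    """Convert a 5'UTR mRNA position located in exon 1 into its DNA position.

    Both positions are counted from the A of the main ATG (+1).
    Assumption: the site lies upstream of intron 1, so the whole intron
    separates it from the CDS at the DNA level.
    """
    if mrna_pos >= 0:
        raise ValueError("only negative (5'UTR) positions are handled")
    return mrna_pos - intron1_len


def same_frame(pos_a: int, pos_b: int) -> bool:
    """True if two upstream mRNA positions share a reading frame."""
    return (pos_a - pos_b) % 3 == 0


def in_frame_with_main_atg(mrna_pos: int) -> bool:
    """True if an upstream mRNA position is in frame with nucleotide +1.

    With no position zero, position -n lies n nucleotides before +1.
    """
    return (-mrna_pos) % 3 == 0


first_atg, second_atg = -154, -118
print(mrna_to_dna(first_atg), mrna_to_dna(second_atg))
print(same_frame(first_atg, second_atg), in_frame_with_main_atg(second_atg))
```

Under these assumptions, the sketch reproduces the DNA positions −842 and −806 given above, and agrees with the statement that the two upstream ATGs share a frame that differs from the frame of the main ATG.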
The HLA-G CDS is composed of segments from six exons, in which the first contains the translation start point and the last contains the stop codon (Table 1, Figure 1). It should be noted that there is no consensus regarding exon and intron nomenclature between NCBI/Ensembl and the IMGT/HLA databases. IMGT/HLA considers as exon 1 the first mRNA segment that is translated, i.e., exon 2 for NCBI/Ensembl (Figure 1). The actual exon 2 encodes the final portion of the 5′UTR, contains the main translation start point, and encodes the HLA-G leader peptide (Figure 1). In addition, exons 3, 4, and 5 encode the alpha-1, alpha-2, and alpha-3 domains, respectively, exon 6 encodes the transmembrane domain, and exon 7 the cytoplasmic tail. A premature stop codon at exon 7 leads to a shorter cytoplasmic tail when compared to other class I molecules (Figure 1, Table 1). The segment downstream of the stop codon at exon 7, extending to exon 8, composes the HLA-G 3′UTR. The HLA-G mRNA 3′UTR is short when compared to other class I genes. This description of the gene structure highlights one of the widespread misconceptions regarding HLA-G: in 1987, Geraghty and colleagues proposed the existence of an exon 7 based on homology with classical class I genes (2). This “exon 7” is in fact part of intron 7 (NCBI) and is absent from most HLA-G transcripts. Although this “exon 7” segment has been found in alternative transcripts (e.g., ENST00000478519), other intron segments are also sometimes kept in rare alternative transcripts (e.g., ENST00000478355), since alternative splicing is an important characteristic of the HLA-G gene, as described further.
The HLA-G gene may produce at least seven protein isoforms generated by alternative splicing of the primary transcript (Figure 1). Four isoforms are membrane bound, presenting the transmembrane domain and the short cytoplasmic tail. HLA-G1 is the full-length membrane-bound isoform with a structure that resembles classical class I molecules. HLA-G2 lacks the alpha-2 domain, HLA-G3 lacks the alpha-2 and alpha-3 domains, and HLA-G4 lacks the alpha-3 domain. Three isoforms are soluble because they lack the transmembrane domain. The soluble HLA-G5 and HLA-G6 isoforms present the same extracellular domains as HLA-G1 and HLA-G2, respectively; however, both transcript variants retain intron 5, leading to a stop codon before the translation of the transmembrane domain and to a tail of 21 amino acids implicated in their solubility. The HLA-G7 transcript variant retains intron 3, leading to a premature stop codon. Therefore, the HLA-G7 isoform presents only the alpha-1 domain linked to two amino acids encoded by the retained intron (intron 3 in the NCBI/Ensembl numbering used here, which corresponds to intron 2 in the IMGT/HLA numbering) (Figure 1) (63–65).
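The domain composition of the seven isoforms described above can be summarized as a small lookup structure. The sketch below is only an illustrative encoding of that description; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Isoform:
    name: str
    domains: Tuple[str, ...]   # extracellular domains present in the isoform
    membrane_bound: bool       # False means soluble (no transmembrane domain)


FULL = ("alpha-1", "alpha-2", "alpha-3")

ISOFORMS = [
    Isoform("HLA-G1", FULL, True),
    Isoform("HLA-G2", ("alpha-1", "alpha-3"), True),    # lacks alpha-2
    Isoform("HLA-G3", ("alpha-1",), True),              # lacks alpha-2 and alpha-3
    Isoform("HLA-G4", ("alpha-1", "alpha-2"), True),    # lacks alpha-3
    Isoform("HLA-G5", FULL, False),                     # soluble counterpart of G1
    Isoform("HLA-G6", ("alpha-1", "alpha-3"), False),   # soluble counterpart of G2
    Isoform("HLA-G7", ("alpha-1",), False),             # alpha-1 only
]

by_name = {iso.name: iso for iso in ISOFORMS}
membrane = [iso.name for iso in ISOFORMS if iso.membrane_bound]
soluble = [iso.name for iso in ISOFORMS if not iso.membrane_bound]
print(membrane, soluble)
```

The short intron-encoded tails of HLA-G5, HLA-G6, and HLA-G7 are deliberately left out of this structure, since only the domain composition is modeled.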
In the next sections, we will address the HLA-G variability and haplotype diversity among several populations around the world.
HLA-G Variability as Described in the 1000Genomes Project
The 1000Genomes Project is a large survey aiming to sequence the entire genome of thousands of individuals in several populations around the world (66). In the initial released data, the phased genotypes of 1092 individuals from 14 populations were available. These data have driven several studies regarding HLA-G variability and evolutionary aspects (67–69).
The initial genotype data published by the 1000Genomes Project were based on exome sequencing or low coverage whole genome sequencing and lack several known HLA-G polymorphisms due to limitations in the genotype detection procedures available at that time. Among the missing polymorphic sites, we may highlight some known indels, such as the traditionally studied 14-bp presence or absence (insertion/deletion) in the HLA-G 3′UTR. In addition, the method used to infer genotypes and haplotypes failed to clearly distinguish triallelic SNPs, reporting them as biallelic ones (e.g., the HLA-G promoter SNP at position −725C/T/G, rs1233334).
Considering these technical limitations, and the fact that most of the bioinformatics tools used in the initial survey have since been improved, we reevaluated the 1000Genomes raw sequencing data for the HLA-G gene using a locally developed pipeline to obtain genotypes and haplotypes, to better understand HLA-G variability around the world, and to retrieve data on some of the missed HLA-G polymorphic sites.
First, by using the Samtools (70) subroutine view, we downloaded the BAM files (binary alignment map) containing the 1000Genomes official alignment data for the HLA-G gene region (between positions 29793317 and 29799834 at chromosome 6) directly from the 1000Genomes server (ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/). The downloaded reads were already trimmed on both ends for primer sequences. The download was performed for each of the initial 1092 samples and included data from both low coverage whole genome and exome sequencing when available. It should be mentioned that we obtained the sequences (reads) from BAM files representing the HLA-G region; thus, the next step of our pipeline used only the reads that were previously mapped to the HLA-G region by the 1000Genomes Consortium. Each BAM file was converted into a Fastq format file retrieving all reads that were previously mapped to the HLA-G region. The BAM to Fastq conversion was made using Bamtools (https://github.com/pezmaster31/bamtools/) and locally developed Perl scripts to filter out duplicated reads and to classify the reads as paired or unpaired.
Both paired and unpaired Fastq files were mapped to a masked chromosome 6 (hg19), in which only the HLA-G region was available and the rest of the chromosome was masked with “N” to preserve nucleotide positions regarding hg19. To date, hg19 presents an HLA-G coding region sequence compatible with the widespread HLA-G allele known as G*01:01:01:05. Mapping was performed using the application BWA, subroutine ALN (71), configured to allow the extension of a deletion up to 20 nucleotides, in order to evaluate the 14-bp polymorphism. The resulting BAM files from the newly mapped reads, from both paired-end and unpaired sequences, were joined using Picard-tools (http://picard.sourceforge.net/index.shtml). Regions containing indels were locally realigned by using the application GATK (72), routines RealignerTargetCreator and IndelRealigner. This local realignment used as reference a file containing known HLA-G indels. The Bamtools software was also used to remove reads mapped with low mapping quality (MQ) scores (MQ < 40). After the procedure described above, 16 samples were discarded because all (or most) of their mapped reads were removed due to poor MQ scores. The GATK routine UnifiedGenotyper was used to infer genotypes and a VCF file (variant call format) was generated.
Given the low coverage nature of the 1000Genomes data, some genotypes called by GATK are highly uncertain, mainly in situations in which a homozygous genotype is inferred at a position with low depth of coverage. In addition, given the polymorphic nature and the high level of sequence similarity of HLA genes, some level of mismapped reads is expected and might bias genotype inference. To circumvent this issue, the VCF file generated by GATK was processed with a locally developed Perl script that applied the rules described below. This script uses the number of different reads detected for each allele at a given position (provided by GATK when the VCF file was generated).
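The idea of masking everything outside the region of interest, so that the mapper can only place reads in that region while coordinates stay unchanged, can be sketched as follows. This is an illustrative Python sketch and not the procedure actually used; the function name and the toy sequence are hypothetical.

```python
def mask_outside(sequence: str, start: int, end: int) -> str:
    """Replace every base outside [start, end] (1-based, inclusive) with 'N'.

    The output has the same length as the input, so positions inside the
    retained region keep their original coordinates.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("region must lie within the sequence")
    left = "N" * (start - 1)
    kept = sequence[start - 1:end]
    right = "N" * (len(sequence) - end)
    return left + kept + right


# Toy example: keep positions 4..7 of a 10-base sequence.
toy = "ACGTACGTAC"
masked = mask_outside(toy, 4, 7)
print(masked)
```

For a real chromosome the same operation would be applied to the chromosome 6 sequence with the HLA-G region boundaries quoted earlier, which is why read positions reported after mapping remain directly comparable to hg19.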
– Homozygosity was only inferred when a minimal coverage of seven reads was achieved; otherwise, a missing allele was introduced in this genotype. This procedure ensures (p > 0.99) that a homozygous genotype is called because of lack of variation at that position and not because the second allele was not sampled.
– Genotypes in which one allele was extremely underrepresented (proportion of reads under 5%) were considered as homozygous for the most represented allele. This procedure minimizes the influence of reads mismapped to the HLA-G region and of the high level of sequencing errors that characterizes next-generation sequencing data. This correction was applied only in situations characterized by high depth of coverage (20 or more reads available for the evaluated position).
– For genotypes in which one allele was mildly underrepresented (with a proportion of reads between 5 and 20%), a missing allele was introduced in place of the underrepresented allele. This procedure is particularly helpful in situations characterized by low depth of coverage (fewer than 20 reads available for the evaluated position), in which a single read may indicate the existence of an alternative allele. Such a read may be mismapped (false positive variant) or may represent a true unbalanced heterozygous genotype (true positive variant). Therefore, the definitive status of this kind of genotype (homozygous or heterozygous) was inferred during a final imputation step.
– Genotypes in which the proportion of reads for the less represented allele was higher than 20% were considered to be heterozygous. This procedure ensures that only high-quality heterozygous genotypes are passed forward to the imputation procedure.
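The four rules above can be sketched as a single decision function. This is a minimal Python illustration, not the Perl script used in the study; the function and constant names are hypothetical. Two points are assumptions made here: proportions of exactly 5% or exactly 20% are treated as "mildly underrepresented", and the p > 0.99 figure is interpreted as 1 − 0.5^7, i.e., the chance that the second allele of a true heterozygote (sampled at 50%) appears at least once among seven reads.

```python
from typing import Optional, Tuple

MIN_HOM_DEPTH = 7     # minimal coverage needed to accept a homozygous call
HIGH_DEPTH = 20       # depth at which a very rare allele is treated as noise
LOW_FRACTION = 0.05   # below this, the minor allele is "extremely underrepresented"
HET_FRACTION = 0.20   # above this, the genotype is called heterozygous

Genotype = Tuple[Optional[str], Optional[str]]  # None marks a missing allele


def prob_second_allele_seen(depth: int) -> float:
    """Chance that a true heterozygote shows its second allele at least once,
    assuming each read samples either allele with probability 0.5."""
    return 1.0 - 0.5 ** depth


def refine_genotype(allele_a: str, reads_a: int,
                    allele_b: str, reads_b: int) -> Genotype:
    """Apply the read-count rules to one biallelic genotype."""
    # Make allele_a the better supported allele.
    if reads_b > reads_a:
        allele_a, reads_a, allele_b, reads_b = allele_b, reads_b, allele_a, reads_a
    depth = reads_a + reads_b
    if depth == 0:
        return (None, None)
    if reads_b == 0:
        # Rule 1: homozygous only with at least seven reads.
        return (allele_a, allele_a) if depth >= MIN_HOM_DEPTH else (allele_a, None)
    minor_fraction = reads_b / depth
    if minor_fraction > HET_FRACTION:
        # Rule 4: well supported heterozygote.
        return (allele_a, allele_b)
    if minor_fraction < LOW_FRACTION and depth >= HIGH_DEPTH:
        # Rule 2: rare allele at high depth is treated as noise.
        # (A nonzero fraction below 5% already implies more than 20 reads.)
        return (allele_a, allele_a)
    # Rule 3: ambiguous support, left for the imputation step.
    return (allele_a, None)


print(refine_genotype("C", 9, "T", 1), prob_second_allele_seen(7))
```

Genotypes returned with a missing allele are the ones whose final status would be decided by the imputation step.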
After applying the rules described above, the HLA-G database presented 8.42% of missing alleles, i.e., alleles that were considered uncertain because of low coverage or unbalanced read proportions. Some single nucleotide variations (SNVs) previously detected (with low quality) became monomorphic because the alternative allele was removed or coded as missing; thus, they were not considered for further analyses. By using the VCFtools package (73), we removed SNVs that were no longer variable or that were represented just once in the dataset (i.e., singletons). In addition, we predicted the functional effect of each SNV by using SnpEff (74), i.e., they were classified as coding synonymous mutations, coding non-synonymous mutations, splice site acceptors, stop-codon generation, and others. Missing alleles were imputed and HLA-G haplotypes were inferred by using the PHASE algorithm (75) as previously described (76, 77). For this purpose, a database containing high-quality genotype information for 133 SNVs for each of the 1076 remaining samples was used. The haplotyping procedure generated 200 haplotypes, with a mean haplotype pair probability of 0.7965 and with 524 samples (48.70%) presenting a haplotype pair with a probability higher than 0.9. The results of the procedure described above are presented separately for each HLA-G region (coding, 3′UTR, and promoter) and, finally, as fully characterized extended haplotypes.
To characterize and explore global patterns of HLA-G diversity, a population genetics approach was performed using the ARLEQUIN software (78, 79). The frequencies of each HLA-G haplotype were computed by the direct counting method, and adherence of diplotype proportions to expectations under Hardy–Weinberg equilibrium was tested by the exact test of Guo and Thompson (80). Intrapopulational genetic diversity parameters were assessed in each population by computation of gene diversity (average expected heterozygosity across variation sites), haplotype diversity, nucleotide diversity, and the number of private haplotypes. Interpopulation genetic diversity was explored by means of pair-wise FST estimates (81), by the exact test of population differentiation (82), and by the analysis of molecular variance (AMOVA) (83), all based on haplotype frequencies. Since the pair-wise FST and the exact test of population differentiation between pairs of the 14 populations represent 91 statistical comparisons, the Bonferroni correction was used to adjust the significance level for multiple testing, resulting in α = 0.0005 (i.e., 0.05/91). Reynolds’ genetic distance was also estimated for each pair of population samples with ARLEQUIN (78, 79, 84). The resulting matrix was used to generate a multidimensional scaling (MDS) plot using the PASW Statistics (17.0.2) software (SPSS Inc.).
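The number of pairwise comparisons and the Bonferroni-adjusted threshold follow directly from the number of populations. A short sketch, using only the figures given in the text (14 populations, nominal α = 0.05), with hypothetical variable names:

```python
from math import comb

N_POPULATIONS = 14      # populations in the initial 1000Genomes release
NOMINAL_ALPHA = 0.05

# Every unordered pair of populations is one comparison.
n_pairs = comb(N_POPULATIONS, 2)

# Bonferroni correction: divide the nominal level by the number of tests.
adjusted_alpha = NOMINAL_ALPHA / n_pairs

print(n_pairs, round(adjusted_alpha, 4))
```

The unrounded threshold is slightly above 0.0005, so using 0.0005 as reported is marginally more conservative.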
HLA-G Coding Region Variability and Haplotypes
In contrast to classical HLA class I genes, HLA-G presents low variability in its coding region. To date, only 50 coding alleles or haplotypes are officially recognized by the IMGT/HLA database2. Most of the SNVs in the HLA-G coding region are either coding synonymous mutations or intronic variants. Therefore, these 50 officially recognized HLA-G alleles encode only 16 different full-length proteins and two truncated molecules (null alleles). This is a distinctive feature of HLA-G and of other non-classical class I genes: only 36% of the known HLA-G alleles (18 distinct molecules among 50 alleles) are associated with different HLA-G molecules, whereas the corresponding proportions for classical class I genes are 75.4% for HLA-A, 77.8% for HLA-B, and 73.5% for HLA-C (IMGT/HLA). The limited HLA-G coding region polymorphism is distributed among the alpha-1, alpha-2, and alpha-3 domains, while for classical class I genes, polymorphisms are found mainly around the region encoding the peptide binding groove, i.e., alpha-1 and alpha-2 domains (1). This is particularly evident for HLA-B, in which there is at least one recognized allele carrying a mutation for each nucleotide of exons 2 or 3, with few exceptions.
Generally, an SNV is considered a polymorphic site if the minor allele presents a frequency of at least 1%. By this criterion, some HLA-G variable sites may not be considered true polymorphisms because they are rarely observed. Considering the 50 HLA-G alleles that have been officially recognized by IMGT/HLA, and taking into account the several studies evaluating the HLA-G coding region polymorphisms in normal or pathological conditions, only 13 alleles encoding four different HLA-G full-length molecules and a truncated one are frequently observed in worldwide populations (3, 19, 23, 34, 36, 37, 68, 69, 76, 85–104).
Among the high-frequency HLA-G coding alleles, we find G*01:01:01:01, G*01:01:01:04, G*01:01:01:05 (present in hg19), G*01:01:02:01, G*01:01:03:01, G*01:01:05, and G*01:01:07, all carrying intronic or synonymous mutations and encoding the same full-length HLA-G molecule known as G*01:01. HLA-G*01:01:01:01 is the reference allele used by IMGT/HLA; it was the first one described (2) and is usually the most common allele in all populations studied so far. Among the frequent ones, we also find the G*01:03:01:01 allele, which is characterized by a non-synonymous mutation at position 292, codon 31, replacing a threonine with a serine and encoding the full-length molecule known as G*01:03. Another group of alleles is represented by G*01:04:01, G*01:04:03, and G*01:04:04, all of them encoding the same molecule known as G*01:04. They are characterized by a non-synonymous mutation at position 755, codon 110, replacing a leucine with an isoleucine, and by other synonymous mutations. The null allele, G*01:05N, which is associated with a truncated HLA-G molecule due to a deletion of a cytosine around codon 130 that changes the reading frame, is also very frequent in some African, Asian, and admixed populations. Finally, the last frequent allele is G*01:06, which is characterized by a non-synonymous mutation at position 1799, codon 258, replacing a threonine with a methionine and encoding a molecule known as G*01:06. Other HLA-G alleles are sporadically found around the world, but only the ones presented above have been described at polymorphic frequencies.
However, the variability in the HLA-G coding region may be higher than the one presented by IMGT/HLA, because IMGT/HLA only presents alleles that were cloned, sequenced, and properly characterized by the researchers. In addition, most of the known alleles are not fully characterized, with only some exons sequenced.
The reevaluation of the HLA-G sequencing data from the 1000Genomes Project indicated that the HLA-G coding region is indeed highly conserved and that just a few new coding alleles are frequently found worldwide. The approach described earlier revealed 81 SNVs in the HLA-G coding region, as described in Table 2. Some of these variation sites are truly polymorphic, while some might be considered as mutations. In addition, some of these new sites are not represented in the IMGT/HLA database and might represent new HLA-G alleles.
Table 2. List of all variation sites found in the HLA-G coding region, their genomic positions on chromosome 6 relative to hg19 and the HLA-G gene, and their allele frequencies considering all populations of the 1000Genomes Project (Phase 1).
As observed in Table 2, most of the 81 variation sites occur in introns (54 sites) or in exons as synonymous changes (16 sites). Thus, 86.4% of all variants are associated with the same HLA-G full-length molecule, unless they somehow influence the HLA-G splicing pattern. Among the ones that might be related to different HLA-G full-length proteins, we find two frameshift mutations, the first associated with the G*01:05N null allele and the second representing a low-frequency variation site not recognized by IMGT/HLA (genomic position 29797195); one variation site associated with a splicing acceptor site (genomic position 29795822, HLA-G position +201); and eight non-synonymous modifications, most of them recognized by IMGT/HLA. Interestingly, one synonymous modification presents a relatively high frequency (2.93%) and is not associated with any known HLA-G allele described so far (HLA-G position +2412, rs17179080, Table 2). Although a triallelic SNV is described at exon 2 (HLA-G position +372), associated with the G*01:04:02 allele, we did not find the third allele in the present data.
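The category counts given in this paragraph can be tallied to confirm that they add up to the 81 sites and to reproduce the 86.4% figure. A minimal sketch with hypothetical variable names, using only the counts quoted above:

```python
# Number of variation sites per functional category, as quoted in the text.
site_counts = {
    "intronic": 54,
    "synonymous": 16,
    "frameshift": 2,
    "splice acceptor": 1,
    "non-synonymous": 8,
}

total_sites = sum(site_counts.values())

# Sites not expected to change the full-length protein sequence.
unchanged = site_counts["intronic"] + site_counts["synonymous"]
unchanged_pct = round(100 * unchanged / total_sites, 1)

print(total_sites, unchanged, unchanged_pct)
```

The remaining 11 sites (frameshift, splice acceptor, and non-synonymous) are the ones that may be associated with different HLA-G molecules.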
As described earlier, haplotypes were inferred considering all variation sites found in the HLA-G region. When the coding region is isolated from these haplotypes, we find 93 different HLA-G coding haplotypes, a number far higher than the number of HLA-G alleles officially recognized. The complete table of haplotypes is available upon request. Table 3 describes all coding haplotypes presenting a minimum global frequency of 1% and the closest known HLA-G allele in terms of sequence similarity. It should be mentioned that positions that do not vary among the haplotypes presented in Table 3 were removed. Although 93 different haplotypes were inferred, only 11 present a frequency higher than 1%. Of those, 10 are compatible with a specific allele described in the IMGT/HLA database and mentioned earlier among the high-frequency alleles that usually occur in any population, and 1 is a new allele that is close to G*01:01:01:01 but presents the frequent nucleotide change at position +2412, not recognized by IMGT/HLA. As previously observed in other studies, the most frequent HLA-G allele is G*01:01:01:01, followed by G*01:01:02:01 and G*01:04:01. These 11 haplotypes or coding alleles represent 88.8% of all HLA-G coding haplotypes and are associated with only four different HLA-G full-length molecules and a truncated one. Moreover, taking into account these 11 haplotypes, at least 60.87% of all HLA-G full-length molecules would be the same (from G*01:01:01:01, G*01:01:02:01, G*01:01:03:03, G*01:01:01:04, and G*01:01:01:01new), and a higher proportion is expected if other rare haplotypes are considered.
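Isolating one gene segment from extended haplotypes amounts to projecting each haplotype onto the variation sites of that segment and counting the distinct results. The sketch below illustrates this with toy data; the function names are hypothetical, the alleles are invented for illustration, and only the position labels (−725, +292, +755, +3001) are taken from sites mentioned in the text.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def collapse_haplotypes(haplotypes: Sequence[Sequence[str]],
                        positions: Sequence[int],
                        keep: Sequence[int]) -> Counter:
    """Project each extended haplotype onto the `keep` positions and count
    how many chromosomes carry each resulting sub-haplotype."""
    index = [positions.index(p) for p in keep]
    return Counter(tuple(hap[i] for i in index) for hap in haplotypes)


def frequent(counts: Counter, min_freq: float = 0.01) -> Dict[Tuple[str, ...], float]:
    """Sub-haplotypes whose relative frequency is at least `min_freq`."""
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items() if n / total >= min_freq}


# Toy data: four chromosomes typed at four sites (alleles are hypothetical).
positions = [-725, 292, 755, 3001]
extended = [
    ("C", "A", "C", "C"),
    ("G", "A", "C", "C"),
    ("C", "T", "C", "C"),
    ("C", "A", "A", "T"),
]

coding = collapse_haplotypes(extended, positions, keep=[292, 755])
print(coding, frequent(coding, min_freq=0.3))
```

In this toy case four distinct extended haplotypes collapse into three coding haplotypes, which mirrors how the 200 extended haplotypes yield a smaller number of distinct haplotypes per segment.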
Table 3. List of HLA-G coding haplotypes presenting a global frequency higher than 1%, considering all populations of the 1000Genomes Project (Phase 1).
The haplotypes listed in Table 3 do present heterogeneous frequencies among the 1000Genomes populations (Table 4). The G*01:01:01:01 allele, for example, is very frequent among Europeans and Asians, presents intermediate frequencies among admixed populations and lower frequencies in African populations, while an opposite pattern is observed for the G*01:05N null allele. In addition, allele G*01:01:03:03 is absent or very rare in African populations, and the G*01:04:04, G*01:01:01:04, and G*01:01:01:01new alleles are absent in Asians.
Table 4. The most frequent HLA-G coding haplotypes and their frequencies among the 1000Genomes Project (Phase 1) populations.
HLA-G 3′ Untranslated Region Variability and Haplotypes
The reevaluation of the HLA-G sequencing data indicated that its 3′UTR presents several high-frequency variation sites in a short segment. The approach described earlier revealed as many as 17 variation sites in this short region, as described in Table 5. Some of these variation sites are polymorphic and have been previously described in several studies that evaluated the HLA-G 3′UTR (38, 69, 76, 88, 105–117), while some might be considered as mutations. In general, nine variation sites can be considered true polymorphisms. It should be noted that the nomenclature used to designate HLA-G 3′UTR haplotypes is based on our previous reports, in which they are designated as UTR-1, UTR-2, and so forth (88). In this matter, the 14-bp insertion (rs371194629), although less frequent and not represented in the hg19 human genome, is considered to be the ancestral allele and should be counted when designating HLA-G 3′UTR positions.
Table 5. List of all variation sites found in the HLA-G 3′ untranslated region, their positions regarding hg19 and the HLA-G gene, and their allele frequencies considering all populations of the 1000Genomes Project (Phase 1).
When the 3′UTR segment is isolated from the 200 extended haplotypes found, we observe 41 different haplotypes for this region. Table 6 presents all haplotypes that reached a global frequency higher than 1% and the complete table of haplotypes is available upon request. Monomorphic positions considering these high-frequency haplotypes are removed from Table 6. Considering the global frequency of each haplotype, it is noteworthy that only nine haplotypes account for more than 95% of all haplotypes found. These haplotypes were named according to the previous studies addressing the HLA-G 3′UTR variability (38, 69, 76, 88, 105–117).
Table 6. The most frequent HLA-G 3′ untranslated region haplotypes presenting frequencies higher than 1% considering all populations of the 1000Genomes Project (Phase 1).
The haplotypes found in the reevaluation of the 1000Genomes data are consistent with the ones found in several other populations, and some haplotypes that were previously considered rare (such as UTR-10 and UTR-18) are actually more frequent than previously thought when all populations are pooled together (global frequency). Some rare SNVs that were previously described using Sanger sequencing, such as the one at position +3001 (69, 110, 111), and others that were described in studies evaluating the 1000Genomes data, such as +3032, +3052, +3092, +3121, and +3227, were also detected in this reevaluation (Table 5). In addition, it should be pointed out that the 14-bp polymorphism, which is absent from the initially released 1000Genomes VCF files, was retrieved from the raw sequence data and its genotypes were inferred for most of the samples.
Similar to the HLA-G coding region, a heterogeneous distribution of these nine 3′UTR haplotypes is observed among the 1000Genomes populations (Table 7). The UTR-1 haplotype, for example, is very common in European populations, but presents lower frequencies in populations from Africa. The UTR-7 haplotype is absent or rare in populations of African ancestry, and haplotypes UTR-6 and UTR-18 are absent or rare in Asia. The 3′UTR haplotype frequencies in admixed populations are close to the ones reported for other admixed populations such as Brazilians (76, 88, 110, 111). In addition, the frequencies observed for the 1000Genomes African populations are close to the ones reported for other African populations described in isolated reports (108, 116, 117). Moreover, the frequencies reported here are close to the ones presented for the same data in another manuscript (69), with some minor differences since this latter manuscript only imputed the 14-bp polymorphism and used the original 1000Genomes VCF data.
Table 7. The most frequent HLA-G 3′ untranslated region haplotypes and their frequencies among the 1000Genomes Project (Phase 1) populations.
HLA-G 5′ Promoter Region Variability and Haplotypes
As previously discussed, there is no consensus regarding where HLA-G transcription starts. Considering NCBI and NM_002127.5, HLA-G transcription starts 866 nucleotides upstream of the initiation codon ATG. However, most of the studies performed so far regarding the HLA-G promoter structure considered the 1500 nucleotides upstream of the main initiation codon ATG as the HLA-G promoter region. Under the NM_002127.5 annotation, only SNVs upstream of −866 should be considered promoter SNVs (or SNVs from the upstream regulatory region), and the ones between −866 and −1 should be considered 5′UTR SNVs. Nevertheless, despite this inconsistency, and because there is no consensus yet regarding the HLA-G transcription start point, in the present work we considered all SNVs upstream of the main translation start point as promoter (5′ upstream regulatory region) SNVs.
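The two labeling conventions discussed above can be made explicit with a short sketch. The function name and the `strict` flag are hypothetical; the only value taken from the text is the −866 boundary implied by NM_002127.5, and positions are assumed to be numbered relative to the ATG, with negative values upstream.

```python
# Transcription start implied by NM_002127.5, relative to the ATG (from the text).
TRANSCRIPTION_START = -866


def classify_upstream_snv(position, strict=False):
    """Label an SNV located upstream of the main translation start point.

    strict=False follows the convention adopted in this work: every
    upstream SNV is treated as a promoter SNV.
    strict=True applies the NM_002127.5 boundary: positions from -866
    to -1 are labeled 5'UTR, positions further upstream are promoter.
    """
    if position >= 0:
        raise ValueError("expected a negative position upstream of the ATG")
    if not strict:
        return "promoter"
    return "promoter" if position < TRANSCRIPTION_START else "5'UTR"
```

For example, the site at −725 is labeled a promoter SNV under the convention used here, but would be labeled a 5′UTR SNV if the NM_002127.5 boundary were applied.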
The approach described earlier revealed 35 SNVs in the HLA-G promoter region, as described in Table 8. Among them, 26 variable sites (74.3%) can be considered true polymorphisms (minor allele frequency above 1%), and at least 11 present frequencies around 50%. In addition, the triallelic SNP at position −725, as well as other known indels in the promoter region, was properly recovered.
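The 1% criterion used above can be illustrated with a minimal sketch. The allele counts below are hypothetical, and treating the second most common allele as the "minor" allele at a triallelic site is an assumption of this example, not a rule stated in the text.

```python
def minor_allele_frequency(allele_counts):
    """Frequency of the second most common allele at a variation site.

    allele_counts maps allele -> observed count. For triallelic sites the
    second most common allele is used (an assumption of this sketch).
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no observed alleles")
    ranked = sorted(allele_counts.values(), reverse=True)
    return ranked[1] / total if len(ranked) > 1 else 0.0


def is_polymorphism(allele_counts, threshold=0.01):
    """True when the minor allele frequency is above the threshold (1%)."""
    return minor_allele_frequency(allele_counts) > threshold


# Hypothetical counts, for illustration only.
common_site = {"C": 1200, "G": 700, "T": 100}
rare_site = {"A": 1995, "G": 5}

# Share of promoter sites classified as polymorphisms (26 of 35, from the text).
share_percent = round(100 * 26 / 35, 1)
```

Here `common_site` passes the criterion and `rare_site` does not, and `share_percent` reproduces the 74.3% quoted above.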
Table 8. List of all variation sites found at the HLA-G 5′ promoter region, their positions regarding hg19 and the HLA-G gene, and their allele frequencies considering all populations of the 1000Genomes Project (Phase 1).
When the promoter region is isolated from the 200 extended haplotypes found, we observe 64 haplotypes for this region. Table 9 presents all haplotypes that reached a frequency higher than 1% and the complete table of haplotypes is available upon request. Monomorphic positions considering these frequent haplotypes were removed from Table 9. Considering the global frequency of each haplotype, it is worth mentioning that only nine haplotypes account for more than 95% of all haplotypes found. These haplotypes were named according to previously published works addressing the HLA-G promoter region variability (76, 118–120). As previously observed for both the coding and 3′UTR regions, promoter haplotype frequencies greatly vary among populations (Table 10).
Table 9. The most frequent HLA-G 5′ promoter region haplotypes presenting frequencies higher than 1% considering all populations of the 1000Genomes Project (Phase 1).
Table 10. The most frequent HLA-G 5′ promoter region haplotypes and their frequencies among the 1000Genomes Project (Phase 1) populations.
HLA-G Extended Haplotypes
As described earlier, 200 extended haplotypes were inferred considering the whole HLA-G sequence encompassing the promoter, coding, and 3′UTR segments. Since there is no official nomenclature for entire MHC genes, the HLA-G extended haplotypes were named according to the nomenclature adopted for each HLA-G segment. As already observed for some populations (76, 88, 118–120), the promoter haplotypes are usually associated with the same coding and 3′UTR haplotypes (Table 11). For example, promoter haplotype 010101a is usually associated with the coding allele G*01:01:01:01 and the 3′UTR haplotype named UTR-1. The same phenomenon is observed for each of the main HLA-G promoter, coding, or 3′UTR haplotypes. Consequently, only 24 extended HLA-G haplotypes present a minimum frequency of 0.5%, together representing more than 85% of all haplotypes, and only 15 present frequencies higher than 1%.
Table 11. The most frequent HLA-G extended haplotypes presenting frequencies higher than 0.5% considering all populations of the 1000Genomes Project (Phase 1).
The extended haplotypes shown in Table 11 were classified according to previously defined HLA-G lineages (76, 118). It becomes clear that most of the extended haplotypes are associated with the same encoded full-length molecule and that functional polymorphisms are mainly present in the regulatory regions. In fact, many polymorphisms in the regulatory regions present high frequencies (around 50%), which is compatible with the evidence of balancing selection acting on the HLA-G regulatory regions (3, 69, 76, 88, 115, 118, 121). For example, lineages HG010101 (a, b or c) and HG010102 are associated with HLA-G coding alleles that usually encode the same HLA-G molecules (with the exception of the G*01:06 and G*01:05N alleles), but their promoter and 3′UTR haplotypes are the most divergent ones when compared to each other.
Recently, the Neanderthal genome sequence corresponding to a sample dating back 40,000 years was published (122). The same pipeline described above was applied to this Neanderthal genome, and we found that this single sample presents an HLA-G haplotype found among modern humans with a frequency of 0.00604 (G010101f/G*01:01:01:04/UTR-6) and another haplotype that was not found in the present series and is composed of a recombined promoter, an unknown HLA-G coding allele close to G*01:01:02:01, and UTR-2.
HLA-G Worldwide Diversity
Human leukocyte antigen G worldwide intrapopulational genetic diversity was evaluated by means of different population genetics parameters (Table 12). Except for the number of private alleles, which is greatly influenced by sample sizes and the number of different samples from the same geographic area (group), African populations exhibited higher levels of genetic diversity in comparison with Europeans and Asians. Admixed populations sampled in America also revealed high levels of diversity. These findings are consistent with the current knowledge that older and admixed populations are prone to exhibit larger diversity than younger and non-admixed populations. Similar observations are made when the promoter (Table 13) and coding (Table 14) regions are considered separately. Since these differences between Africans and non-Africans are not as substantial as those observed for neutral markers (123), such similar levels of diversity may reflect both demographic events and the action of balancing selection. However, when the 3′UTR is considered (Table 15), a different pattern arises regarding gene and nucleotide diversity: Europeans present the highest levels, while Africans present the lowest. This finding does not have a straightforward explanation, although one may suppose that a stronger signature of balancing selection over the HLA-G 3′UTR may have distorted demographic signatures, resulting in a higher diversity in Eurasia. It should be emphasized that, as previously reported for a Brazilian population sample (76) and also for the populations of the 1000Genomes Project (69), both the promoter and 3′UTR diversity have been shaped by a strong balancing pressure.
Table 12. Genetic diversity parameters and probability of adherence of diplotype frequencies to Hardy–Weinberg equilibrium expectations (pHWE), considering whole HLA-G haplotypes.
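For readers unfamiliar with the gene diversity parameter mentioned above, a common choice is Nei's unbiased estimator of expected heterozygosity. The text does not state which estimator was used to produce the tables, so the sketch below is an assumption for illustration only; the function name is hypothetical.

```python
def gene_diversity(haplotype_counts):
    """Nei's unbiased gene diversity: n / (n - 1) * (1 - sum(p_i ** 2)).

    haplotype_counts: iterable with the number of chromosomes carrying each
    haplotype (or allele) in one population sample.
    """
    counts = list(haplotype_counts)
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    sum_squared_freqs = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_squared_freqs)
```

A sample in which a single haplotype is fixed gives a diversity of 0, and the value approaches 1 as more haplotypes occur at similar frequencies.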
Table 13. Genetic diversity parameters and probability of adherence of diplotype frequencies to Hardy–Weinberg equilibrium expectations (pHWE), considering HLA-G promoter haplotypes.
Table 14. Genetic diversity parameters and probability of adherence of diplotype frequencies to Hardy–Weinberg equilibrium expectations (pHWE), considering HLA-G coding region haplotypes.
Table 15. Genetic diversity parameters and probability of adherence of diplotype frequencies to Hardy–Weinberg equilibrium expectations (pHWE), considering HLA-G 3′UTR haplotypes.
The comparison of the three different HLA-G regions (Tables 13–15) also reveals interesting aspects. The average expected heterozygosity (gene diversity) for variation sites at the 3′UTR is ~20% higher (0.2730) than the ones estimated for the promoter (0.2323) and coding (0.2244) regions. As a consequence, nucleotide diversity is about 4.5 times higher for the 3′UTR (2.8640%) than for the promoter (0.6331%) and coding (0.6432%) regions. Nucleotide diversity at the HLA-G 3′UTR is almost 40 times higher than the human genome average (0.075%) (118, 124), resulting in an astonishing average of 8.19 differences when two randomly chosen 3′UTR (286-bp long) haplotypes are compared. Balancing selection favors the maintenance of different alleles in a population, resulting in a proportionally higher average pair-wise difference as compared with the measure of diversity based on the number of polymorphic sites. The worldwide nucleotide diversity at the whole HLA-G locus (0.7548%) is, as expected, slightly higher than that observed for the Brazilian population sample (0.00643, i.e., 0.643%) (76). The direct comparison of haplotype diversity between the three regions could not be performed, since the very different lengths and numbers of variation sites of the three regions (Tables 2, 5, and 8) may bias any conclusions.
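The figures quoted in this paragraph are linked by simple arithmetic: the mean number of pair-wise differences equals the nucleotide diversity (as a proportion) multiplied by the segment length. The sketch below only re-derives the numbers from the values given in the text; the variable and function names are illustrative.

```python
# Values taken from the text (nucleotide diversity in percent).
pi_percent = {"promoter": 0.6331, "coding": 0.6432, "utr3": 2.8640}
expected_heterozygosity = {"promoter": 0.2323, "coding": 0.2244, "utr3": 0.2730}
GENOME_AVERAGE_PI_PERCENT = 0.075
UTR3_LENGTH_BP = 286


def mean_pairwise_differences(pi_pct, length_bp):
    """Expected differences between two random haplotypes of a segment."""
    return pi_pct / 100.0 * length_bp


utr3_differences = mean_pairwise_differences(pi_percent["utr3"], UTR3_LENGTH_BP)
fold_vs_promoter = pi_percent["utr3"] / pi_percent["promoter"]
fold_vs_genome = pi_percent["utr3"] / GENOME_AVERAGE_PI_PERCENT
het_ratio_promoter = expected_heterozygosity["utr3"] / expected_heterozygosity["promoter"]
het_ratio_coding = expected_heterozygosity["utr3"] / expected_heterozygosity["coding"]
```

Rounded, these give 8.19 differences, a roughly 4.5-fold excess over the promoter, a roughly 38-fold excess over the genome average, and heterozygosity ratios of about 1.18 and 1.22.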
Two independent approaches were used to evaluate the extent of differentiation between pairs of populations (interpopulation diversity): FST and the exact test of population differentiation based on haplotype frequencies. Although these analyses have the same purpose and may provide similar results, both were performed to provide more reliable and robust conclusions. The analysis of the pair-wise FST matrix revealed a large range of variation of FST values: from −0.0150, between British from England and Scotland (GBR) and Iberian populations from Spain (IBS), to 0.2037, between Finnish (FIN) and Japanese (JPT) (Table 16). While only 1 out of 6 (16.7%) pairs of admixed populations and 4 out of 10 (40%) pairs of European populations differed significantly at the 5% unadjusted significance level, it is noteworthy that the two African populations, as well as the three Asian populations, differed from one another. IBS presented the lowest number of significant comparisons (2 out of 13), a fact that is clearly related to the lack of statistical power due to the small sample size. On the other hand, JPT (all comparisons), CHB (12 out of 13), CHS (12 out of 13), FIN (12 out of 13), and YRI (11 out of 13) presented the largest number of significant comparisons. An overall stronger differentiation was revealed by the matrix composed of non-differentiation probability values obtained through the exact test of population differentiation (Table 17). While only 3 out of 10 (30%) pairs of European populations differed significantly at the 5% significance level, it is noteworthy that the two African populations, as well as the three Asian populations and the four admixed populations, differed from one another. IBS presented the lowest number of significant comparisons (4 out of 13), while JPT, CHB, CHS, and YRI differed in all pair-wise comparisons including them.
To sum up, both the exact test of population differentiation based on haplotype frequencies and the FST estimate revealed the existence of highly significant differences between the 14 populations. Since the more frequent HLA-G haplotypes are shared between most of the populations, these pair-wise population differences may be due to the existence of many low-frequency haplotypes that are restricted to two or three populations (22.5% of the 200 identified haplotypes) or are private to a single population (63% of the 200 haplotypes).
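The pair counts quoted above (6 admixed pairs, 10 European pairs, 13 comparisons per population) follow from the group sizes, and the idea behind a pair-wise FST can be shown with a simplified heterozygosity-based estimate. The sketch below is an assumption-laden illustration: the group sizes are inferred from the pair counts in the text, and the FST function is a basic (HT − HS)/HT calculation on haplotype frequencies, not necessarily the estimator used for Table 16.

```python
from math import comb

# Group sizes inferred from the text (2 African, 3 Asian, 10 European pairs,
# 6 admixed pairs, 14 populations in total).
GROUP_SIZES = {"African": 2, "Asian": 3, "European": 5, "Admixed": 4}

pairs_within_group = {group: comb(n, 2) for group, n in GROUP_SIZES.items()}
total_populations = sum(GROUP_SIZES.values())
comparisons_per_population = total_populations - 1


def _heterozygosity(frequencies):
    """Expected heterozygosity, 1 - sum(p_i ** 2), without sample correction."""
    return 1.0 - sum(p * p for p in frequencies.values())


def fst_two_populations(freqs_a, freqs_b):
    """Simplified pair-wise FST = (HT - HS) / HT with equal population weights."""
    names = set(freqs_a) | set(freqs_b)
    pooled = {h: (freqs_a.get(h, 0.0) + freqs_b.get(h, 0.0)) / 2 for h in names}
    h_total = _heterozygosity(pooled)
    if h_total == 0:
        return 0.0  # both populations fixed for the same haplotype
    h_within = (_heterozygosity(freqs_a) + _heterozygosity(freqs_b)) / 2
    return (h_total - h_within) / h_total
```

This simplified form is bounded between 0 and 1. Slightly negative values such as the −0.0150 reported for GBR and IBS arise only with estimators that correct for sample size, which this sketch deliberately omits.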
Table 16. Matrix of pair-wise FST values based on whole HLA-G haplotype frequencies (below the diagonal) and probabilities associated with pair-wise FST values (above the diagonal) for the 14 populations analyzed in the present study.
Table 17. Matrix of non-differentiation probabilities obtained by means of exact tests of population differentiation based on haplotype frequencies for the 14 populations analyzed in the present study.
To further explore the genetic relationships between populations, an AMOVA was performed assuming a hierarchical structure in which the 14 populations were divided into four groups: African, Asian, European, and admixed populations (Table 18). Considering the whole HLA-G gene, differences between the four groups account for only 2.45% of the variance, whereas 1.64% of the variance occurs as a consequence of differences between populations that belong to a same group. Almost all the variance (95.91%) is observed within populations. This same pattern is observed when each HLA-G region, i.e., promoter, coding, and 3′UTR, is considered separately, with the exception of the 3′UTR where the variance among groups (0.65%) gets even lower than the variance among populations that belong to a same group (1.32%), and is statistically non-significant.
Table 18. Analysis of molecular variance (AMOVA) for HLA-G haplotype frequencies, according to two different hierarchical structures and four different HLA-G datasets.
Since the group composed of admixed populations represents an assembly of populations whose individuals present varying levels of ancestry that can be assigned to Africans, Amerindians/Asians, and Europeans, this group was removed from a second round of analysis (Table 18). As a result, levels of variance between groups increased, although they remained lower than the ones expected for neutrally evolving sequences (123). Therefore, one may conclude that this analysis reflects the fact that most of the HLA-G diversity, particularly that from the 3′UTR, (a) originated in Africa before Homo sapiens dispersion to other continents and (b) has been maintained in worldwide populations by non-neutral evolutionary forces, particularly balancing selection. These conclusions are corroborated by previous data on HLA-G (68, 69, 76, 89, 121). Moreover, many different low-frequency haplotypes are being generated within populations by mutation and recombination. These features are responsible for the relatively poor resolution of the MDS plot (Figure 2) obtained with the matrix of Reynolds’ genetic distance based on the whole HLA-G gene. Unexpectedly, (a) populations from the same geographic group, for example Asians (CHB, CHS, and JPT), are distributed across large distances in the plot and (b) admixed populations (CLM, MXL, and PUR) that present major European, intermediate Amerindian, and minor African ancestry contributions (66), as revealed by the analysis of Ancestry Informative Markers (data not shown), are clustered together with African populations. These unexpected findings support the hypothesis that a strong signature of balancing selection over HLA-G may have distorted the expected demographic signatures.
Figure 2. Multidimensional scaling (MDS) plot revealing the genetic relationships between the 14 populations of the 1000Genomes Project (Phase 1).
HLA-G Evolution Aspects
The MHC class I molecules evolved by a series of events that include chromosomal duplication, gene recombination, and selection probably driven by pathogens (125–127). Apparently, MHC-G, the HLA-G homologous sequence in non-human primates, is the oldest class I gene and would be responsible for the origin of all class I loci (127). In fact, MHC class I genes from the New World primates, such as the cotton-top tamarin (Saguinus oedipus), are much closer to the human HLA-G than to other human classical class I genes (127). This primate lineage separated from the one that gave rise to the Old World monkeys (or anthropoids) about 38 million years ago. It is noteworthy that the HLA-G and MHC-G molecules are functionally different despite the high identity among exonic sequences (128). New World primates’ MHC-G plays a role in antigen presentation that is uncommon for human HLA-G, and this fact suggests that they are not orthologous as theorized in the past (129, 130). In contrast, the cotton-top tamarin presents two MHC-C molecules with inhibitory properties that interact with KIR receptors (131). The regulation of MHC levels (in this case, MHC-C) in these non-human primates seems to be one of the mechanisms responsible for fetal acceptance as well as for the shorter pregnancy period (132).
Old World primates have a peculiar MHC-G molecule. It presents just the α1 domain due to a stop codon at codon 164 (133), which may not hinder fetal protection against maternal NK cells, unless there is a mechanism by which the stop codon is ignored, allowing translation to continue (a possibility that has not been ruled out). In addition, gorillas and chimpanzees present a conserved MHC-G coding segment with few variations (3, 128, 129). Although their pregnancy period is shorter than that of humans, these species are polygamous, which would expose the female to different allogeneic fetuses during her fertile years. Orangutans, on the other hand, have long-lasting relationships, and five MHC-G variants have been found so far; the polymorphism levels are low but more similar to those of humans (3). Orangutans and humans are separated by about 15 million years of evolution. Possibly, the differences in maternal-fetal relationships among species are responsible for the peculiarities of each MHC-G and for its function and variation levels.
In addition to alignments between human and other primate coding MHC-G sequences, analyses of HLA-G non-coding regions have proved to be highly informative about the evolutionary history of this gene. For example, the 14-bp polymorphism located in HLA-G exon 8 (3′UTR) is exclusively found in the human lineage, suggesting that UTR haplotypes bearing the deletion, such as UTR-1, are more recent than the ones that present the 14-bp fragment (134).
An interesting finding confirmed recently is that one of the most frequent HLA-G coding alleles (global frequency of 0.24257), G*01:01:01:01, which is usually associated with UTR-1 and the promoter haplotype G010101a [described in Ref. (76) and Table 11], is probably part of the most recent haplotype. This was established by the association between G*01:01:01:01/UTR-1 and an Alu insertion (AluyHG) that occurred before human dispersion from Africa, at a location 20 Kb downstream of the HLA-G 3′UTR. The frequency of this Alu element increases with distance from Africa (68).
Given the HLA-G immunomodulatory properties and the unique tissue expression patterns, HLA-G expression levels must be maintained under fine regulatory control. In addition, the lack of variability found in its coding region and the limited number of proteins encoded by this gene lead us to believe that this region is under tight evolutionary forces that limit variation. The differences in mammalian pregnancy and species-specific pathogens must be considered when studying the evolution of immune system molecules.
HLA-G Transcription Regulation
Most of the studies performed to understand HLA-G regulation considered as the HLA-G promoter the 200 nucleotides upstream of the first translated ATG and the region within 1.5 Kb upstream of the CDS. The HLA-G regulation is unique among all class I genes [reviewed at Ref. (67)]. Generally, HLA class I genes present two main regulatory modules in the proximal promoter region (within 200 bases upstream of the CDS), which include [reviewed at Ref. (67)] (a) the Enhancer-A (EnhA), which interacts with the NF-κB family of transcription factors, important elements for inducing HLA class I gene expression (135); (b) the interferon-stimulated response element (ISRE), which is a target site for interferon regulatory factors (IRF) that might act as class I activators (IRF-1) or inhibitors (IRF-2 and IRF-8) (135). The ISRE module is located adjacent to the EnhA element, and both work cooperatively to control HLA class I gene expression; and (c) the SXY module, in which the transcription apparatus is mounted.
However, the HLA-G gene presents regulation peculiarities that differ from other class I genes [reviewed at Ref. (67)]. First, the HLA-G EnhA is the most divergent one among the class I genes; it is unresponsive to NF-κB (136) and might only interact with p50 homodimers, which are not potent HLA class I gene transactivators (137). In addition, the HLA-G ISRE is unresponsive to IFN-γ (138) because its sequence is modified. In fact, the HLA-G locus presents the most divergent ISRE sequence among the class I genes (135, 136), which could explain the absence of IFN-γ-induced transactivation. The ISRE is also a target for other protein complexes that may mediate HLA class I transactivation. However, both HLA-G EnhA and ISRE seem to bind only the expressed factor Sp1, which apparently does not modulate the constitutive or IFN-induced transactivation of HLA-G (136). Some polymorphisms in the promoter region, such as −725 C > G/T, are close to known regulatory elements. In this respect, the −725 G allele has been associated with higher HLA-G expression levels (120).
The SXY module comprises the S, X1, X2, and Y boxes and is an important target for regulatory binding elements and HLA class I gene transactivation. Box X1 is a target for the multiprotein complex regulatory factor X (RFX), including RFX5, RFX-associated protein, and RFXANK (137, 139–141). The RFX members usually interact with the class II transactivator (CIITA), an important element for HLA class II transactivation that is also important for HLA class I gene transactivation (139). The X2 box is a binding target for the activating transcription factor/cAMP response element binding protein (ATF/CREB) transcription factor family (142), and the Y box is a binding target for nuclear factor Y (NFY), which includes subunits alpha, beta, and gamma (NFYA, NFYB, and NFYC) (67, 139). For HLA-G, the SXY module presents sequences compatible only with the S and X1 elements, but divergent from X2 and Y. Because CIITA depends on a functional SXY module, which includes the X2 and Y elements, the HLA-G gene is not transactivated through this module (139, 143–146).
Other regulatory elements within the HLA-G promoter have been described, such as a heat shock element, located at position −469/−454, that binds heat shock factor-1 (HSF-1), an important element involved in the modulation of immune responses (147). HLA-G is also regulated by progesterone, a steroid hormone secreted by the corpus luteum and placenta that is involved in endometrium maintenance and embryo implantation [reviewed at Ref. (67)]. The mechanism involved in HLA-G expression induced by progesterone is primarily mediated by the activation of the progesterone receptor and its subsequent binding to a progesterone response element found in the promoter region (148). The transactivation of HLA-G transcription has also been demonstrated for leukemia inhibitory factor (LIF) (149) and for cell exposure to methotrexate (150). In addition, an increased HLA-G transcription level was demonstrated in the JEG3 choriocarcinoma cell line after treatment with LIF. Furthermore, LIF induces HLA-G expression in the presence of endoplasmic reticulum aminopeptidase-1 (ERAP1), expressed in the endoplasmic reticulum, and repression of ERAP1 culminates in HLA-G downregulation, indicating that ERAP1 has an important role in HLA-G regulation (151). Finally, it is necessary to highlight the methylation status of the HLA-G promoter, since it appears to be very important for HLA-G transcription (152, 153).
Although some HLA-G regulatory elements are known, it is not clear why balancing selection is maintaining divergent lineages since most of the polymorphisms would not theoretically influence HLA-G transcription by the known mechanisms, mainly because they do not coincide with known regulatory elements [reviewed at Ref. (67)]. It should be noted that the same SNVs described for the HLA-G promoter in other manuscripts are also found in the present analysis.
HLA-G Post-Transcriptional Regulation
HLA-G might also be regulated by post-transcriptional mechanisms such as alternative splicing and microRNAs. Several studies have reported polymorphisms influencing splicing, mRNA stability, and also the ability of some microRNAs to bind to the HLA-G mRNA. The HLA-G 3′UTR segment is a key feature for its regulation mainly by the binding of microRNAs and influencing mRNA stability. HLA-G 3′UTR presents several polymorphic sites that influence gene expression [reviewed at Ref. (67)].
The 14-bp presence or absence (insertion or deletion) polymorphism has been implicated in HLA-G transcriptional levels and mRNA stability. The presence of the 14-base segment in trophoblast samples has been associated with lower mRNA production for most membrane-bound and soluble isoforms (98, 154), and the absence of this segment seems to stabilize the mRNA, with a consequent higher HLA-G expression (98, 155, 156). In addition, HLA-G transcripts presenting the 14-base segment can be further processed with the removal of 92 bases from the complete mRNA (98), giving rise to a shorter HLA-G transcript reported to be more stable than the complete isoform (157). The alternative splicing associated with the presence of the 14-base segment is probably driven by other polymorphic sites in linkage disequilibrium with this polymorphic site (3).
The SNP located at position +3142 has been associated with differential HLA-G expression, because it might influence microRNA binding (158). The presence of a guanine at position +3142 is associated with a stronger binding of specific microRNAs, such as miR-148a, miR-148b, and miR-152, decreasing HLA-G expression by mRNA degradation and translation suppression (3, 158, 159). In addition, the 14-bp region might also be a target for specific microRNAs, and other 3′UTR polymorphisms might also influence microRNA binding (159). Another polymorphic site that would influence HLA-G expression is located at +3187. The allele +3187A is associated with decreased HLA-G expression because it extends an AU-rich motif that mediates mRNA degradation (106).
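The associations described in the last two paragraphs can be summarized as a small annotation routine. This is only a sketch of the reported associations (14-bp presence, +3142G, and +3187A, each linked to lower expression); the class and function names are hypothetical, and counting features is not a validated predictor of HLA-G expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utr3Haplotype:
    """Alleles at the three 3'UTR sites discussed in the text."""
    has_14bp: bool    # True when the 14-bp segment is present (insertion)
    allele_3142: str  # nucleotide at position +3142
    allele_3187: str  # nucleotide at position +3187


def low_expression_features(haplotype):
    """List the features reported in the text as linked to lower expression."""
    features = []
    if haplotype.has_14bp:
        features.append("14-bp insertion")
    if haplotype.allele_3142.upper() == "G":
        features.append("+3142G")
    if haplotype.allele_3187.upper() == "A":
        features.append("+3187A")
    return features
```

A haplotype lacking the 14-bp segment and carrying neither +3142G nor +3187A returns an empty list, while one carrying all three returns all three labels.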
UTR-1 (Table 6) is the only frequent 3′UTR haplotype that carries neither the 14-bp sequence nor the +3142G and +3187A alleles, all of which have been associated with lower HLA-G expression. Therefore, it was postulated that this haplotype would be associated with high HLA-G expression; this was confirmed by another study evaluating soluble HLA-G levels and 3′UTR haplotypes (109). In addition, as already introduced, this haplotype (together with the coding allele G*01:01:01:01) is probably the most recent one (109), and its frequency might have increased worldwide due to its high-expressing feature.
Due to the key role of HLA-G in the regulation of immune response and immune modulation, particularly during pregnancy, the overall structure of the HLA-G molecule has been maintained during the evolutionary process. This is evident when the variability of more than a thousand individuals is taken into account and only a few different encoded molecules are frequently found. Most of the variation sites found in the HLA-G coding region are either synonymous or intronic mutations. The HLA-G promoter region presents numerous polymorphic sites, with several examples of variation sites in which both alleles are equally represented. Although it is still unclear why some divergent promoter haplotypes are preferentially selected, just a few divergent and frequent promoter haplotypes are found worldwide. The HLA-G 3′UTR variability is remarkable, considering that most of its SNVs are true polymorphisms with equally represented alleles and that this segment is short. These observations, for both the promoter and the 3′UTR, are compatible with the evidence of balancing selection acting on these regions. Finally, the population comparisons confirmed that most of the HLA-G variability arose before human dispersion from Africa and that the allele and haplotype frequencies might have been shaped by strong selective pressures.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This work was supported by FAPESP/Brazil (Grant# 2013/17084-2). Erick C. Castelli, Eduardo A. Donadi, and Celso T. Mendes Jr are supported by CNPq/Brazil (Grants # 304471/2013-5, 304753/2009-2, and 305493/2011-6). Jaqueline Ramalho is supported by a scholarship from PROPE/UNESP.
2. Geraghty DE, Koller BH, Orr HT. A human major histocompatibility complex class I gene that encodes a protein with a shortened cytoplasmic segment. Proc Natl Acad Sci U S A (1987) 84(24):9145–9. doi:10.1073/pnas.84.24.9145
3. Donadi EA, Castelli EC, Arnaiz-Villena A, Roger M, Rey D, Moreau P. Implications of the polymorphism of HLA-G on its function, regulation, evolution and disease association. Cell Mol Life Sci (2011) 68(3):369–95. doi:10.1007/s00018-010-0580-7
4. Rouas-Freiss N, Goncalves RM, Menier C, Dausset J, Carosella ED. Direct evidence to support the role of HLA-G in protecting the fetus from maternal uterine natural killer cytolysis. Proc Natl Acad Sci U S A (1997) 94(21):11520–5. doi:10.1073/pnas.94.21.11520
7. Shiroishi M, Kuroki K, Ose T, Rasubala L, Shiratori I, Arase H, et al. Efficient leukocyte Ig-like receptor signaling and crystal structure of disulfide-linked HLA-G dimer. J Biol Chem (2006) 281(15):10439–47. doi:10.1074/jbc.M512305200
10. Yie SM, Li LH, Li YM, Librach C. HLA-G protein concentrations in maternal serum and placental tissue are decreased in preeclampsia. Am J Obstet Gynecol (2004) 191(2):525–9. doi:10.1016/j.ajog.2004.01.033
11. Peng B, Zhang L, Xing AY, Hu M, Liu SY. [The expression of human leukocyte antigen G and E on human first trimester placenta and its relationship with recurrent spontaneous abortion]. Sichuan Da Xue Xue Bao Yi Xue Ban (2008) 39(6):976–9.
12. Le Discorde M, Moreau P, Sabatier P, Legeais JM, Carosella ED. Expression of HLA-G in human cornea, an immune-privileged tissue. Hum Immunol (2003) 64(11):1039–44. doi:10.1016/j.humimm.2003.08.346
13. Lefebvre S, Adrian F, Moreau P, Gourand L, Dausset J, Berrih-Aknin S, et al. Modulation of HLA-G expression in human thymic and amniotic epithelial cells. Hum Immunol (2000) 61(11):1095–101. doi:10.1016/S0198-8859(00)00192-0
14. Menier C, Rabreau M, Challier JC, Le Discorde M, Carosella ED, Rouas-Freiss N. Erythroblasts secrete the non-classical HLA-G molecule from primitive to definitive hematopoiesis. Blood (2004) 104(10):3153–60. doi:10.1182/blood-2004-03-0809
15. Cordero EA, Veit TD, da Silva MA, Jacques SM, Silla LM, Chies JA. HLA-G polymorphism influences the susceptibility to HCV infection in sickle cell disease patients. Tissue Antigens (2009) 74(4):308–13. doi:10.1111/j.1399-0039.2009.01331.x
16. Haddad R, Ciliao Alves DC, Rocha-Junior MC, Azevedo R, do Socorro Pombo-de-Oliveira M, Takayanagui OM, et al. HLA-G 14-bp insertion/deletion polymorphism is a risk factor for HTLV-1 infection. AIDS Res Hum Retroviruses (2011) 27(3):283–8. doi:10.1089/aid.2010.0165
17. Kim SK, Chung JH, Jeon JW, Park JJ, Cha JM, Joo KR, et al. Association between HLA-G 14-bp insertion/deletion polymorphism and hepatocellular carcinoma in Korean patients with chronic hepatitis B viral infection. Hepatogastroenterology (2013) 60(124):796–8. doi:10.5754/hge11180
18. Segat L, Zupin L, Kim HY, Catamo E, Thea DM, Kankasa C, et al. HLA-G 14 bp deletion/insertion polymorphism and mother-to-child transmission of HIV. Tissue Antigens (2014) 83(3):161–7. doi:10.1111/tan.12296
19. Simoes RT, Goncalves MA, Castelli EC, Junior CM, Bettini JS, Discorde ML, et al. HLA-G polymorphisms in women with squamous intraepithelial lesions harboring human papillomavirus. Mod Pathol (2009) 22(8):1075–82. doi:10.1038/modpathol.2009.67
20. da Silva GK, Vianna P, Veit TD, Crovella S, Catamo E, Cordero EA, et al. Influence of HLA-G polymorphisms in human immunodeficiency virus infection and hepatitis C virus co-infection in Brazilian and Italian individuals. Infect Genet Evol (2014) 21:418–23. doi:10.1016/j.meegid.2013.12.013
21. Jeong S, Park S, Park BW, Park Y, Kwon OJ, Kim HS. Human leukocyte antigen-G (HLA-G) polymorphism and expression in breast cancer patients. PLoS One (2014) 9(5):e98284. doi:10.1371/journal.pone.0098284
22. Chen Y, Gao XJ, Deng YC, Zhang HX. Relationship between HLA-G gene polymorphism and the susceptibility of esophageal cancer in Kazakh and Han nationality in Xinjiang. Biomarkers (2012) 17(1):9–15. doi:10.3109/1354750X.2011.633242
23. Castelli EC, Mendes-Junior CT, Viana de Camargo JL, Donadi EA. HLA-G polymorphism and transitional cell carcinoma of the bladder in a Brazilian population. Tissue Antigens (2008) 72(2):149–57. doi:10.1111/j.1399-0039.2008.01091.x
24. Cao M, Yie SM, Liu J, Ye SR, Xia D, Gao E. Plasma soluble HLA-G is a potential biomarker for diagnosis of colorectal, gastric, esophageal and lung cancer. Tissue Antigens (2011) 78(2):120–8. doi:10.1111/j.1399-0039.2011.01716.x
25. Dong DD, Yie SM, Li K, Li F, Xu Y, Xu G, et al. Importance of HLA-G expression and tumor infiltrating lymphocytes in molecular subtypes of breast cancer. Hum Immunol (2012) 73(10):998–1004. doi:10.1016/j.humimm.2012.07.321
26. Dunker K, Schlaf G, Bukur J, Altermann WW, Handke D, Seliger B. Expression and regulation of non-classical HLA-G in renal cell carcinoma. Tissue Antigens (2008) 72(2):137–48. doi:10.1111/j.1399-0039.2008.01090.x
27. Kren L, Slaby O, Muckova K, Lzicarova E, Sova M, Vybihal V, et al. Expression of immune-modulatory molecules HLA-G and HLA-E by tumor cells in glioblastomas: an unexpected prognostic significance? Neuropathology (2011) 31(2):129–34. doi:10.1111/j.1440-1789.2010.01149.x
28. Akhter A, Faridi RM, Das V, Pandey A, Naik S, Agrawal S. In vitro up-regulation of HLA-G using dexamethasone and hydrocortisone in first-trimester trophoblast cells of women experiencing recurrent miscarriage. Tissue Antigens (2012) 80(2):126–35. doi:10.1111/j.1399-0039.2012.01884.x
29. Aldrich CL, Stephenson MD, Karrison T, Odem RR, Branch DW, Scott JR, et al. HLA-G genotypes and pregnancy outcome in couples with unexplained recurrent miscarriage. Mol Hum Reprod (2001) 7(12):1167–72. doi:10.1093/molehr/7.12.1167
30. Bhalla A, Stone PR, Liddell HS, Zanderigo A, Chamley LW. Comparison of the expression of human leukocyte antigen (HLA)-G and HLA-E in women with normal pregnancy and those with recurrent miscarriage. Reproduction (2006) 131(3):583–9. doi:10.1530/rep.1.00892
31. Christiansen OB, Kolte AM, Dahl M, Larsen EC, Steffensen R, Nielsen HS, et al. Maternal homozygocity for a 14 base pair insertion in exon 8 of the HLA-G gene and carriage of HLA class II alleles restricting HY immunity predispose to unexplained secondary recurrent miscarriage and low birth weight in children born to these patients. Hum Immunol (2012) 73(7):699–705. doi:10.1016/j.humimm.2012.04.014
32. Fan W, Li S, Huang Z, Chen Q. Relationship between HLA-G polymorphism and susceptibility to recurrent miscarriage: a meta-analysis of non-family-based studies. J Assist Reprod Genet (2014) 31(2):173–84. doi:10.1007/s10815-013-0155-2
33. Kolte AM, Steffensen R, Nielsen HS, Hviid TV, Christiansen OB. Study of the structure and impact of human leukocyte antigen (HLA)-G-A, HLA-G-B, and HLA-G-DRB1 haplotypes in families with recurrent miscarriage. Hum Immunol (2010) 71(5):482–8. doi:10.1016/j.humimm.2010.02.001
34. Vargas RG, Sarturi PR, Mattar SB, Bompeixe EP, Silva Jdos S, Pirri A, et al. Association of HLA-G alleles and 3’ UTR 14 bp haplotypes with recurrent miscarriage in Brazilian couples. Hum Immunol (2011) 72(6):479–85. doi:10.1016/j.humimm.2011.02.011
35. Zhu Y, Huo Z, Lai J, Li S, Jiao H, Dang J, et al. Case-control study of a HLA-G 14-bp insertion-deletion polymorphism in women with recurrent miscarriages. Scand J Immunol (2010) 71(1):52–4. doi:10.1111/j.1365-3083.2009.02348.x
36. Hviid TV, Hylenius S, Hoegh AM, Kruse C, Christiansen OB. HLA-G polymorphisms in couples with recurrent spontaneous abortions. Tissue Antigens (2002) 60(2):122–32. doi:10.1034/j.1399-0039.2002.600202.x
38. Larsen MH, Hylenius S, Andersen AM, Hviid TV. The 3’-untranslated region of the HLA-G gene in relation to pre-eclampsia: revisited. Tissue Antigens (2010) 75(3):253–61. doi:10.1111/j.1399-0039.2009.01435.x
39. Hylenius S, Andersen AM, Melbye M, Hviid TV. Association between HLA-G genotype and risk of pre-eclampsia: a case-control study using family triads. Mol Hum Reprod (2004) 10(4):237–46. doi:10.1093/molehr/gah035
40. Hviid TV, Hylenius S, Lindhard A, Christiansen OB. Association between human leukocyte antigen-G genotype and success of in vitro fertilization and pregnancy outcome. Tissue Antigens (2004) 64(1):66–9. doi:10.1111/j.1399-0039.2004.00239.x
41. Lin A, Yan WH, Dai MZ, Chen XJ, Li BL, Chen BG, et al. Maternal human leukocyte antigen-G polymorphism is not associated with pre-eclampsia in a Chinese Han population. Tissue Antigens (2006) 68(4):311–6. doi:10.1111/j.1399-0039.2006.00667.x
42. Loisel DA, Billstrand C, Murray K, Patterson K, Chaiworapongsa T, Romero R, et al. The maternal HLA-G 1597DeltaC null mutation is associated with increased risk of pre-eclampsia and reduced HLA-G expression during pregnancy in African-American women. Mol Hum Reprod (2013) 19(3):144–52. doi:10.1093/molehr/gas041
43. O’Brien M, McCarthy T, Jenkins D, Paul P, Dausset J, Carosella ED, et al. Altered HLA-G transcription in pre-eclampsia is associated with allele specific inheritance: possible role of the HLA-G gene in susceptibility to the disease. Cell Mol Life Sci (2001) 58(12–13):1943–9. doi:10.1007/PL00000828
44. Tan CY, Ho JF, Chong YS, Loganath A, Chan YH, Ravichandran J, et al. Paternal contribution of HLA-G*0106 significantly increases risk for pre-eclampsia in multigravid pregnancies. Mol Hum Reprod (2008) 14(5):317–24. doi:10.1093/molehr/gan013
46. Brenol CV, Veit TD, Chies JA, Xavier RM. The role of the HLA-G gene and molecule on the clinical expression of rheumatologic diseases. Rev Bras Reumatol (2012) 52(1):82–91. doi:10.1590/S0482-50042012000100009
47. Consiglio CR, Veit TD, Monticielo OA, Mucenic T, Xavier RM, Brenol JC, et al. Association of the HLA-G gene +3142C>G polymorphism with systemic lupus erythematosus. Tissue Antigens (2011) 77(6):540–5. doi:10.1111/j.1399-0039.2011.01635.x
48. Veit TD, Cordero EA, Mucenic T, Monticielo OA, Brenol JC, Xavier RM, et al. Association of the HLA-G 14 bp polymorphism with systemic lupus erythematosus. Lupus (2009) 18(5):424–30. doi:10.1177/0961203308098187
49. Veit TD, de Lima CP, Cavalheiro LC, Callegari-Jacques SM, Brenol CV, Brenol JC, et al. HLA-G +3142 polymorphism as a susceptibility marker in two rheumatoid arthritis populations in Brazil. Tissue Antigens (2014) 83(4):260–6. doi:10.1111/tan.12311
50. Veit TD, Vianna P, Scheibel I, Brenol CV, Brenol JC, Xavier RM, et al. Association of the HLA-G 14-bp insertion/deletion polymorphism with juvenile idiopathic arthritis and rheumatoid arthritis. Tissue Antigens (2008) 71(5):440–6. doi:10.1111/j.1399-0039.2008.01019.x
51. Rizzo R, Hviid TV, Govoni M, Padovan M, Rubini M, Melchiorri L, et al. HLA-G genotype and HLA-G expression in systemic lupus erythematosus: HLA-G as a putative susceptibility gene in systemic lupus erythematosus. Tissue Antigens (2008) 71(6):520–9. doi:10.1111/j.1399-0039.2008.01037.x
54. Wisniewski A, Bilinska M, Klimczak A, Wagner M, Majorczyk E, Nowak I, et al. Association of the HLA-G gene polymorphism with multiple sclerosis in a Polish population. Int J Immunogenet (2010) 37(4):307–11. doi:10.1111/j.1744-313X.2010.00926.x
55. Crispim JC, Mendes-Junior CT, Wastowski IJ, Costa R, Castelli EC, Saber LT, et al. Frequency of insertion/deletion polymorphism in exon 8 of HLA-G and kidney allograft outcome. Tissue Antigens (2008) 71(1):35–41. doi:10.1111/j.1399-0039.2007.00961.x
56. Mociornita AG, Lim-Shon J, Joseph JM, Ross HJ, Rao V, Delgado DH. Can HLA-G polymorphisms predict the development of cardiac allograft vasculopathy? Hum Immunol (2013) 74(4):464–7. doi:10.1016/j.humimm.2012.12.014
57. Twito T, Joseph J, Mociornita A, Rao V, Ross H, Delgado DH. The 14-bp deletion in the HLA-G gene indicates a low risk for acute cellular rejection in heart transplant recipients. J Heart Lung Transplant (2011) 30(7):778–82. doi:10.1016/j.healun.2011.01.726
58. Khosrotehrani K, Le Danff C, Reynaud-Mendel B, Dubertret L, Carosella ED, Aractingi S. HLA-G expression in atopic dermatitis. J Invest Dermatol (2001) 117(3):750–2. doi:10.1046/j.0022-202x.2001.01487.x
60. Aractingi S, Briand N, Le Danff C, Viguier M, Bachelez H, Michel L, et al. HLA-G and NK receptor are expressed in psoriatic skin: a possible pathway for regulating infiltrating T cells? Am J Pathol (2001) 159(1):71–7. doi:10.1016/S0002-9440(10)61675-6
61. Graebin P, Veit TD, Alho CS, Dias FS, Chies JA. Polymorphic variants in exon 8 at the 3’ UTR of the HLA-G gene are associated with septic shock in critically ill patients. Crit Care (2012) 16(5):R211. doi:10.1186/cc11845
62. Nakagawa S, Niimura Y, Gojobori T, Tanaka H, Miura K. Diversity of preferred nucleotide sequences around the translation initiation codon in eukaryote genomes. Nucleic Acids Res (2008) 36(3):861–71. doi:10.1093/nar/gkm1102
64. Ishitani A, Geraghty DE. Alternative splicing of HLA-G transcripts yields proteins with primary structures resembling both class I and class II antigens. Proc Natl Acad Sci U S A (1992) 89(9):3947–51. doi:10.1073/pnas.89.9.3947
65. Paul P, Cabestre FA, Ibrahim EC, Lefebvre S, Khalil-Daher I, Vazeux G, et al. Identification of HLA-G7 as a new splice variant of the HLA-G mRNA and expression of soluble HLA-G5, -G6, and -G7 transcripts in human transfected cells. Hum Immunol (2000) 61(11):1138–49. doi:10.1016/S0198-8859(00)00197-X
66. 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, et al. An integrated map of genetic variation from 1,092 human genomes. Nature (2012) 491(7422):56–65. doi:10.1038/nature11632
68. Santos KE, Lima TH, Felicio LP, Massaro JD, Palomino GM, Silva AC, et al. Insights on the HLA-G evolutionary history provided by a nearby Alu insertion. Mol Biol Evol (2013) 30(11):2423–34. doi:10.1093/molbev/mst142
69. Sabbagh A, Luisi P, Castelli EC, Gineau L, Courtin D, Milet J, et al. Worldwide genetic variation at the 3’ untranslated region of the HLA-G gene: balancing selection influencing genetic diversity. Genes Immun (2014) 15(2):95–106. doi:10.1038/gene.2013.67
72. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res (2010) 20(9):1297–303. doi:10.1101/gr.107524.110
74. Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (2012) 6(2):80–92. doi:10.4161/fly.19695
76. Castelli EC, Mendes-Junior CT, Veiga-Castelli LC, Roger M, Moreau P, Donadi EA. A comprehensive study of polymorphic sites along the HLA-G gene: implication for gene regulation and evolution. Mol Biol Evol (2011) 28(11):3069–86. doi:10.1093/molbev/msr138
77. Castelli EC, Mendes-Junior CT, Veiga-Castelli LC, Pereira NF, Petzl-Erler ML, Donadi EA. Evaluation of computational methods for the reconstruction of HLA haplotypes. Tissue Antigens (2010) 76(6):459–66. doi:10.1111/j.1399-0039.2010.01539.x
79. Excoffier L, Lischer HE. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour (2010) 10(3):564–7. doi:10.1111/j.1755-0998.2010.02847.x
83. Excoffier L, Smouse PE, Quattro JM. Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics (1992) 131(2):479–91.
85. Matte C, Lacaille J, Zijenah L, Ward B, Roger M, ZVITAMBO Study Group. HLA-G exhibits low level of polymorphism in indigenous East Africans. Hum Immunol (2002) 63(6):495–501. doi:10.1016/S0198-8859(02)00391-9
88. Castelli EC, Mendes-Junior CT, Deghaide NHS, de Albuquerque RS, Muniz YCN, Simões RT, et al. The genetic structure of 3’untranslated region of the HLA-G gene: polymorphisms and haplotypes. Genes Immun (2010) 11(2):134–41. doi:10.1038/gene.2009.74
89. Mendes-Junior CT, Castelli EC, Meyer D, Simoes AL, Donadi EA. Genetic diversity of the HLA-G coding region in Amerindian populations from the Brazilian Amazon: a possible role of natural selection. Genes Immun (2013) 14(8):518–26. doi:10.1038/gene.2013.47
91. Abbas A, Tripathi P, Naik S, Agrawal S. Analysis of human leukocyte antigen (HLA)-G polymorphism in normal women and in women with recurrent spontaneous abortions. Eur J Immunogenet (2004) 31(6):275–8. doi:10.1111/j.1365-2370.2004.00487.x
92. Pirri A, Contieri FC, Benvenutti R, Bicalho Mda G. A study of HLA-G polymorphism and linkage disequilibrium in renal transplant patients and their donors. Transpl Immunol (2009) 20(3):143–9. doi:10.1016/j.trim.2008.09.012
93. Sipak-Szmigiel O, Cybulski C, Wokolorczyk D, Lubinski J, Kurzawa R, Baczkowski T, et al. HLA-G polymorphism and in vitro fertilization failure in a Polish population. Tissue Antigens (2009) 73(4):348–52. doi:10.1111/j.1399-0039.2008.01205.x
94. Sipak-Szmigiel O, Cybulski C, Lubinski J, Ronin-Walknowska E. HLA-G polymorphism in a Polish population and reproductive failure. Tissue Antigens (2008) 71(1):67–71. doi:10.1111/j.1399-0039.2007.00942.x
95. Ober C, Rosinsky B, Grimsley C, van der Ven K, Robertson A, Runge A. Population genetic studies of HLA-G: allele frequencies and linkage disequilibrium with HLA-A1. J Reprod Immunol (1996) 32(2):111–23. doi:10.1016/S0165-0378(96)01000-5
97. van der Ven K, Skrablin S, Ober C, Krebs D. HLA-G polymorphisms: ethnic differences and implications for potential molecule function. Am J Reprod Immunol (1998) 40(3):145–57. doi:10.1111/j.1600-0897.1998.tb00406.x
98. Hviid TV, Hylenius S, Rorbye C, Nielsen LG. HLA-G allelic variants are associated with differences in the HLA-G mRNA isoform profile and HLA-G mRNA levels. Immunogenetics (2003) 55(2):63–79. doi:10.1007/s00251-003-0547-z
101. Yan WH, Fan LA, Yang JQ, Xu LD, Ge Y, Yao FJ. HLA-G polymorphism in a Chinese Han population with recurrent spontaneous abortion. Int J Immunogenet (2006) 33(1):55–8. doi:10.1111/j.1744-313X.2006.00567.x
102. Yan WH, Lin A, Chen XJ, Dai MZ, Gan LH, Zhou MY, et al. Association of the maternal 14-bp insertion polymorphism in the HLA-G gene in women with recurrent spontaneous abortions. Tissue Antigens (2006) 68(6):521–3. doi:10.1111/j.1399-0039.2006.00723.x
104. Metcalfe S, Roger M, Faucher MC, Coutlee F, Franco EL, Brassard P. The association between human leukocyte antigen (HLA)-G polymorphisms and human papillomavirus (HPV) infection in Inuit women of northern Quebec. Hum Immunol (2013) 74(12):1610–5. doi:10.1016/j.humimm.2013.08.279
105. Alvarez M, Piedade J, Balseiro S, Ribas G, Regateiro F. HLA-G 3’-UTR SNP and 14-bp deletion polymorphisms in Portuguese and Guinea-Bissau populations. Int J Immunogenet (2009) 36(6):361–6. doi:10.1111/j.1744-313X.2009.00875.x
106. Yie SM, Li LH, Xiao R, Librach CL. A single base-pair mutation in the 3’-untranslated region of HLA-G mRNA is associated with pre-eclampsia. Mol Hum Reprod (2008) 14(11):649–53. doi:10.1093/molehr/gan059
107. Sizzano F, Testi M, Zito L, Crocchiolo R, Troiano M, Mazzi B, et al. Genotypes and haplotypes in the 3’ untranslated region of the HLA-G gene and their association with clinical outcome of hematopoietic stem cell transplantation for beta-thalassemia. Tissue Antigens (2012) 79(5):326–32. doi:10.1111/j.1399-0039.2012.01862.x
108. Sabbagh A, Courtin D, Milet J, Massaro JD, Castelli EC, Migot-Nabias F, et al. Association of HLA-G 3’ untranslated region polymorphisms with antibody response against Plasmodium falciparum antigens: preliminary results. Tissue Antigens (2013) 82(1):53–8. doi:10.1111/tan.12140
109. Martelli-Palomino G, Pancotto JA, Muniz YC, Mendes-Junior CT, Castelli EC, Massaro JD, et al. Polymorphic sites at the 3’ untranslated region of the HLA-G gene are associated with differential HLA-G soluble levels in the Brazilian and French population. PLoS One (2013) 8(10):e71742. doi:10.1371/journal.pone.0071742
110. Lucena-Silva N, Monteiro AR, de Albuquerque RS, Gomes RG, Mendes-Junior CT, Castelli EC, et al. Haplotype frequencies based on eight polymorphic sites at the 3’ untranslated region of the HLA-G gene in individuals from two different geographical regions of Brazil. Tissue Antigens (2012) 79(4):272–8. doi:10.1111/j.1399-0039.2012.01842.x
111. Lucena-Silva N, de Souza VS, Gomes RG, Fantinatti A, Muniz YC, de Albuquerque RS, et al. HLA-G 3’ untranslated region polymorphisms are associated with systemic lupus erythematosus in 2 Brazilian populations. J Rheumatol (2013) 40(7):1104–13. doi:10.3899/jrheum.120814
112. Larsen MH, Zinyama R, Kallestrup P, Gerstoft J, Gomo E, Thorner LW, et al. HLA-G 3’ untranslated region 14-base pair deletion: association with poor survival in an HIV-1-infected Zimbabwean population. J Infect Dis (2013) 207(6):903–6. doi:10.1093/infdis/jis924
113. Hviid TV, Rizzo R, Melchiorri L, Stignani M, Baricordi OR. Polymorphism in the 5’ upstream regulatory and 3’ untranslated regions of the HLA-G gene in relation to soluble HLA-G and IL-10 expression. Hum Immunol (2006) 67(1–2):53–62. doi:10.1016/j.humimm.2005.12.003
114. Ciliao Alves DC, de Oliveira Crispim JC, Castelli EC, Mendes-Junior CT, Deghaide NH, Barros Silva GE, et al. Human leukocyte antigen-G 3’ untranslated region polymorphisms are associated with better kidney allograft acceptance. Hum Immunol (2012) 73(1):52–9. doi:10.1016/j.humimm.2011.10.007
115. Mendes-Junior CT, Castelli EC, Simoes RT, Simoes AL, Donadi EA. HLA-G 14-bp polymorphism at exon 8 in Amerindian populations from the Brazilian Amazon. Tissue Antigens (2007) 69(3):255–60. doi:10.1111/j.1399-0039.2006.00797.x
116. Garcia A, Milet J, Courtin D, Sabbagh A, Massaro JD, Castelli EC, et al. Association of HLA-G 3’UTR polymorphisms with response to malaria infection: a first insight. Infect Genet Evol (2013) 16:263–9. doi:10.1016/j.meegid.2013.02.021
117. Courtin D, Milet J, Sabbagh A, Massaro JD, Castelli EC, Jamonneau V, et al. HLA-G 3’ UTR-2 haplotype is associated with Human African trypanosomiasis susceptibility. Infect Genet Evol (2013) 17:1–7. doi:10.1016/j.meegid.2013.03.004
119. Ober C, Aldrich CL, Chervoneva I, Billstrand C, Rahimov F, Gray HL, et al. Variation in the HLA-G promoter region influences miscarriage rates. Am J Hum Genet (2003) 72(6):1425–35. doi:10.1086/375501
121. Veit TD, Cazarolli J, Salzano FM, Schiengold M, Chies JA. New evidence for balancing selection at the HLA-G locus in South Amerindians. Genet Mol Biol (2012) 35(4 Suppl):919–23. doi:10.1590/S1415-47572012000600005
124. Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature (2001) 409(6822):928–33. doi:10.1038/35057149
125. Vogel TU, Evans DT, Urvater JA, O’Connor DH, Hughes AL, Watkins DI. Major histocompatibility complex class I genes in primates: co-evolution with pathogens. Immunol Rev (1999) 167:327–37. doi:10.1111/j.1600-065X.1999.tb01402.x
127. Watkins DI, Chen ZW, Hughes AL, Evans MG, Tedder TF, Letvin NL. Evolution of the MHC class I genes of a new world primate from ancestral homologues of human non-classical genes. Nature (1990) 346(6279):60–3. doi:10.1038/346060a0
130. Arnaiz-Villena A, Morales P, Gomez-Casado E, Castro MJ, Varela P, Rojo-Amigo R, et al. Evolution of MHC-G in primates: a different kind of molecule for each group of species. J Reprod Immunol (1999) 43(2):111–25. doi:10.1016/S0165-0378(99)00026-1
131. Parga-Lozano C, Reguera R, Gomez-Prieto P, Arnaiz-Villena A. Evolution of major histocompatibility complex G and C and natural killer receptors in primates. Hum Immunol (2009) 70(12):1035–40. doi:10.1016/j.humimm.2009.07.017
133. Castro MJ, Morales P, Fernandez-Soria V, Suarez B, Recio MJ, Alvarez M, et al. Allelic diversity at the primate Mhc-G locus: exon 3 bears stop codons in all Cercopithecinae sequences. Immunogenetics (1996) 43(6):327–36. doi:10.1007/BF02199801
134. Castro MJ, Morales P, Martinez-Laso J, Allende L, Rojo-Amigo R, Gonzalez-Hevilla M, et al. Evolution of MHC-G in humans and primates based on three new 3’UT polymorphisms. Hum Immunol (2000) 61(11):1157–63. doi:10.1016/S0198-8859(00)00188-9
135. van den Elsen PJ, Gobin SJ, van Eggermond MC, Peijnenburg A. Regulation of MHC class I and II gene transcription: differences and similarities. Immunogenetics (1998) 48(3):208–21. doi:10.1007/s002510050425
137. Gobin SJ, Keijsers V, van Zutphen M, van den Elsen PJ. The role of enhancer A in the locus-specific transactivation of classical and non-classical HLA class I genes by nuclear factor kappa B. J Immunol (1998) 161(5):2276–83.
138. Gobin SJ, van Zutphen M, Woltman AM, van den Elsen PJ. Transactivation of classical and non-classical HLA class I genes through the IFN-stimulated response element. J Immunol (1999) 163(3):1428–34.
140. Steimle V, Durand B, Barras E, Zufferey M, Hadam MR, Mach B, et al. A novel DNA-binding regulatory factor is mutated in primary MHC class II deficiency (bare lymphocyte syndrome). Genes Dev (1995) 9(9):1021–32. doi:10.1101/gad.9.9.1021
141. Durand B, Sperisen P, Emery P, Barras E, Zufferey M, Mach B, et al. RFXAP, a novel subunit of the RFX DNA binding complex is mutated in MHC class II deficiency. EMBO J (1997) 16(5):1045–55. doi:10.1093/emboj/16.5.1045
142. Gobin SJ, Biesta P, de Steenwinkel JE, Datema G, van den Elsen PJ. HLA-G transactivation by cAMP-response element-binding protein (CREB). An alternative transactivation pathway to the conserved major histocompatibility complex (MHC) class I regulatory routes. J Biol Chem (2002) 277(42):39525–31. doi:10.1074/jbc.M112273200
143. Gobin SJ, Peijnenburg A, Keijsers V, van den Elsen PJ. Site alpha is crucial for two routes of IFN gamma-induced MHC class I transactivation: the ISRE-mediated route and a novel pathway involving CIITA. Immunity (1997) 6(5):601–11. doi:10.1016/S1074-7613(00)80348-9
144. Lefebvre S, Moreau P, Dausset J, Carosella ED, Paul P. Downregulation of HLA class I gene transcription in choriocarcinoma cells is controlled by the proximal promoter element and can be reversed by CIITA. Placenta (1999) 20(4):293–301. doi:10.1053/plac.1998.0380
145. Rousseau P, Masternak K, Krawczyk M, Reith W, Dausset J, Carosella ED, et al. In vivo, RFX5 binds differently to the human leucocyte antigen-E, -F, and -G gene promoters and participates in HLA class I protein expression in a cell type-dependent manner. Immunology (2004) 111(1):53–65. doi:10.1111/j.1365-2567.2004.01783.x
147. Ibrahim EC, Morange M, Dausset J, Carosella ED, Paul P. Heat shock and arsenite induce expression of the non-classical class I histocompatibility HLA-G gene in tumor cell lines. Cell Stress Chaperones (2000) 5(3):207–18. doi:10.1379/1466-1268(2000)005<0207:HSAAIE>2.0.CO;2
149. Bamberger AM, Jenatschke S, Schulte HM, Loning T, Bamberger MC. Leukemia inhibitory factor (LIF) stimulates the human HLA-G promoter in JEG3 choriocarcinoma cells. J Clin Endocrinol Metab (2000) 85(10):3932–6. doi:10.1210/jcem.85.10.6849
150. Rizzo R, Rubini M, Govoni M, Padovan M, Melchiorri L, Stignani M, et al. HLA-G 14-bp polymorphism regulates the methotrexate response in rheumatoid arthritis. Pharmacogenet Genomics (2006) 16(9):615–23. doi:10.1097/01.fpc.0000230115.41828.3a
151. Shido F, Ito T, Nomura S, Yamamoto E, Sumigama S, Ino K, et al. Endoplasmic reticulum aminopeptidase-1 mediates leukemia inhibitory factor-induced cell surface human leukocyte antigen-G expression in JEG-3 choriocarcinoma cells. Endocrinology (2006) 147(4):1780–8. doi:10.1210/en.2005-1449
152. Onno M, Amiot L, Bertho N, Drenou B, Fauchet R. CpG methylation patterns in the 5’ part of the non-classical HLA-G gene in peripheral blood CD34+ cells and CD2+ lymphocytes. Tissue Antigens (1997) 49(4):356–64. doi:10.1111/j.1399-0039.1997.tb02763.x
153. Moreau P, Mouillot G, Rousseau P, Marcou C, Dausset J, Carosella ED. HLA-G gene repression is reversed by demethylation. Proc Natl Acad Sci U S A (2003) 100(3):1191–6. doi:10.1073/pnas.0337539100
155. Hviid TV, Rizzo R, Christiansen OB, Melchiorri L, Lindhard A, Baricordi OR. HLA-G and IL-10 in serum in relation to HLA-G genotype and polymorphisms. Immunogenetics (2004) 56(3):135–41. doi:10.1007/s00251-004-0673-2
156. Svendsen SG, Hantash BM, Zhao L, Faber C, Bzorek M, Nissen MH, et al. The expression and functional activity of membrane-bound human leukocyte antigen-G1 are influenced by the 3’-untranslated region. Hum Immunol (2013) 74(7):818–27. doi:10.1016/j.humimm.2013.03.003
157. Rousseau P, Le Discorde M, Mouillot G, Marcou C, Carosella ED, Moreau P. The 14 bp deletion-insertion polymorphism in the 3’ UT region of the HLA-G gene influences HLA-G mRNA stability. Hum Immunol (2003) 64(11):1005–10. doi:10.1016/j.humimm.2003.08.347
158. Tan Z, Randall G, Fan J, Camoretti-Mercado B, Brockman-Schneider R, Pan L, et al. Allele-specific targeting of microRNAs to HLA-G and risk of asthma. Am J Hum Genet (2007) 81(4):829–34. doi:10.1086/521200
159. Castelli EC, Moreau P, Oya e Chiromatzo A, Mendes-Junior CT, Veiga-Castelli LC, Yaghi L, et al. In silico analysis of microRNAS targeting the HLA-G 3’ untranslated region alleles and haplotypes. Hum Immunol (2009) 70(12):1020–5. doi:10.1016/j.humimm.2009.07.028
Keywords: HLA-G, haplotypes, polymorphisms, variability, gene structure and diversity, non-classical HLA, 1000Genomes Project, selective pressure
Citation: Castelli EC, Ramalho J, Porto IOP, Lima THA, Felício LP, Sabbagh A, Donadi EA and Mendes-Junior CT (2014) Insights into HLA-G genetics provided by worldwide haplotype diversity. Front. Immunol. 5:476. doi: 10.3389/fimmu.2014.00476
Received: 29 July 2014; Accepted: 18 September 2014; Published online: 06 October 2014.
Edited by: Silvia Gregori, San Raffaele Telethon Institute for Gene Therapy, Italy
Reviewed by: Roberto Biassoni, Istituto Giannina Gaslini, Italy
Thomas Vauvert Hviid, Copenhagen University Hospital Roskilde, Denmark
Copyright: © 2014 Castelli, Ramalho, Porto, Lima, Felício, Sabbagh, Donadi and Mendes-Junior. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Erick C. Castelli, Faculdade de Medicina de Botucatu, Departamento de Patologia, Univ Estadual Paulista, Botucatu, São Paulo 18618-970, Brazil; e-mail: email@example.com