Edited by: Efstratios Stratikos, National and Kapodistrian University of Athens, Greece
Reviewed by: Athanasios Papakyriakou, National Centre of Scientific Research Demokritos, Greece; Lawrence J Stern, University of Massachusetts Medical School, United States
*Correspondence: Frans Bianchi,
This article was submitted to Antigen Presenting Cell Biology, a section of the journal Frontiers in Immunology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Cytolytic T cell responses are predicted to be biased towards membrane proteins. The peptide-binding grooves of most alleles of major histocompatibility complex class I (MHC-I) are relatively hydrophobic; therefore, peptide fragments derived from human transmembrane helices (TMHs) are predicted to be presented more often than would be expected based on their abundance in the proteome. However, the physiological reason why membrane proteins might be over-presented is unclear. In this study, we show that the predicted over-presentation of TMH-derived peptides is general, as it is predicted for bacteria and viruses and for both MHC-I and MHC-II, and is confirmed by re-analysis of epitope databases. Moreover, we show that TMHs are evolutionarily more conserved, because single nucleotide polymorphisms (SNPs) are present relatively less frequently in TMH-coding chromosomal regions than in regions coding for extracellular and cytoplasmic protein regions. Thus, our findings suggest that both cytolytic and helper T cells are more tuned to respond to membrane proteins, because these are evolutionarily more conserved. We speculate that TMHs are less prone to mutations that enable pathogens to evade T cell responses.
Our immune system fights diseases and infections from pathogens, such as fungi, bacteria or viruses. An important part of the acquired immune response, which develops more specialized and more specific recognition of pathogens than the innate immune response, is formed by T cells. These recognize peptides, called epitopes, that are derived from antigenic proteins and presented on Major Histocompatibility Complexes (MHC) class I and II on the cell surface.
The MHC proteins are heterodimeric complexes encoded by the HLA (Human Leukocyte Antigens) genes. In humans, the peptide binding groove of MHC-I is made by only the alpha subunit. There are three classical alleles of MHC-I, hallmarked by a highly polymorphic alpha chain called HLA-A, HLA-B and HLA-C, that all present epitopes to cytolytic T cells. For MHC-II, both the alpha and the beta chains contribute to the peptide binding groove. There are three classical alleles of MHC-II as well, called HLA-DR, HLA-DQ and HLA-DP, that all present epitopes to helper T cells. Each MHC complex can present a subset of all possible peptides. For example, HLA-A and HLA-B have no overlap in which epitopes they bind (
Humans express a limited set of MHC alleles and therefore an individual’s immune system detects only a fraction of all possible peptide fragments. However, at the population level, the coverage of pathogenic peptides that are detected is very high, because of the highly polymorphic MHC genes. It is therefore believed that MHC polymorphism improves immunity at the population level, as mutations in a protein that disrupt a particular MHC presentation at the individual level, so-called escape mutations, will not affect MHC presentation for all alleles present in the population (
Many studies are aimed at identifying the repertoire of epitopes that are presented in any of the different alleles to determine which epitopes will result in an immune response, as this will for instance aid the design of vaccines. These studies have led to the development of prediction algorithms that allow for very reliable
Using these prediction algorithms, we recently showed that peptides derived from transmembrane helices (TMHs) are likely to be more frequently presented by MHC-I than expected based on their abundance (
TMHs are hydrophobic as they need to span the hydrophobic lipid bilayer of cellular membranes. They consist of an alpha helix of, on average, 23 amino acids in length. TMHs can also be predicted with high accuracy from a protein sequence by bioinformatics approaches (
This study had two objectives. First, we aimed to generalize our findings by predicting the antigenic presentation from different kingdoms of life in both MHC-I and -II. From these
To predict how frequently epitopes overlapping with TMHs are presented, a similar analysis strategy was applied as described in (
For MHC-I, 9-mers were used, as this is the length most frequently presented in MHC-I and was used in our earlier study (
We define a protein to be a binder if, for a certain MHC allele, any of its 9-mer or 14-mer peptides have an IC50 value in the lowest 2% of all peptides within a
To obtain experimental confirmation that peptides stemming from TMHs are presented by MHC-I and MHC-II, we mined the IEDB (
The full analysis can be found at (
To determine the evolutionary conservation of TMHs, we first collected human single nucleotide polymorphisms (SNPs) resulting in a single amino acid substitution and determined if this occurred within a predicted TMH or not.
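The position check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: it assumes a per-residue topology string of the same length as the protein, in which `1` marks a residue inside a predicted TMH and `0` marks any other residue. The function name `snp_in_tmh` and the example topology are hypothetical.

```python
def snp_in_tmh(topology: str, position: int) -> bool:
    """Return True if the 1-based residue position lies in a predicted TMH.

    Assumed encoding (hypothetical): '1' = residue inside a TMH,
    '0' = residue outside a TMH.
    """
    if not 1 <= position <= len(topology):
        raise ValueError("position is outside the protein sequence")
    return topology[position - 1] == "1"


# Hypothetical 12-residue protein with one predicted TMH at residues 4-8.
example_topology = "000111110000"
print(snp_in_tmh(example_topology, 5))   # substitution inside the TMH
print(snp_in_tmh(example_topology, 10))  # substitution in a soluble region
```

Using 1-based positions mirrors the usual way amino acid substitutions are reported; a different topology encoding would only require changing the comparison on the last line of the function.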
As a data source, multiple NCBI (
The first query was a call to the
The number of SNPs was limited to the first 250 variations per gene, resulting in ≈ 61k variations. Only variations that are SNPs resulting in a single amino acid substitution were analyzed, resulting in ≈ 38k SNPs. The exact numbers can be found in the supplementary materials,
SNPs were picked based on ID number, which is linked to their discovery date. To verify that these ID numbers are unrelated to SNP positions, the relative positions of all analyzed SNPs in a protein were determined. This analysis showed no positional bias of the SNPs, as shown in
Per SNP, the
Over-presentation of TMH-derived epitopes on most MHC-I and -II alleles
We next wondered if the over-presentation of TMH-derived peptides would also be present for MHC-II.
For MHC-I, we previously showed that the over-presentation of TMH-derived peptides is caused by the hydrophobicity of the peptide binding grooves (
The Immune Epitope Database (IEDB) from the National Institutes of Health contains millions of linear epitope sequences obtained by MHC ligand assays. For the MHC alleles used in this study, we obtained 54,303 and 2,484 linear epitope sequences of human origin for the MHC-I and MHC-II alleles, respectively. There are relatively few epitopes for MHC-II because MHC-II has many more alleles than MHC-I, and we selected only the human epitopes found for the 21 MHC-II alleles used in this study.
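One way to decide whether a linear epitope overlaps a TMH is sketched below. It is an illustration only: it assumes that an epitope counts as overlapping when at least one of its residues lies in a predicted TMH, and that topology is given as a per-residue string with `1` for TMH residues and `0` otherwise. Both the overlap criterion and the function name `epitope_overlaps_tmh` are assumptions, as are the example sequences.

```python
def epitope_overlaps_tmh(protein: str, topology: str, epitope: str) -> bool:
    """Return True if any occurrence of the epitope covers a TMH residue.

    Assumed encoding (hypothetical): topology has one character per residue,
    '1' for a residue inside a predicted TMH and '0' otherwise.
    """
    if len(protein) != len(topology):
        raise ValueError("protein and topology must have the same length")
    if not epitope:
        raise ValueError("epitope must not be empty")
    start = protein.find(epitope)
    while start != -1:
        # Overlap criterion (assumption): at least one TMH residue is covered.
        if "1" in topology[start:start + len(epitope)]:
            return True
        start = protein.find(epitope, start + 1)
    return False


# Hypothetical protein with a predicted TMH at residues 5-10 ("YLLVVA").
protein = "MKTAYLLVVAGGSE"
topology = "00001111110000"
print(epitope_overlaps_tmh(protein, topology, "KTAYL"))  # covers TMH residues
print(epitope_overlaps_tmh(protein, topology, "GGSE"))   # soluble region only
```

An epitope that is not found in the protein is reported as non-overlapping; a stricter analysis could instead treat such cases as unmapped.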
Analysis of the epitope database shows that TMH-derived epitopes are over-presented. The percentage of presented epitopes that overlap with TMHs, for MHC-I and -II alleles. The pair of horizontal red lines in each plot indicates the lower and upper bound of the 99% confidence interval. Note that only one line is visible, as this interval is relatively narrow. Alleles are listed in
In
These findings robustly confirm that epitopes derived from human TMHs are presented in both MHC-I and MHC-II, and support that they are over-presented. See the
We also mined the IEDB database for epitopes for any type of T cell response from the specified alleles. From the total reports, 36% and 7% concerned TMH-derived epitopes in MHC class I and II, respectively (see
These data confirm that TMH-derived epitopes are not only presented on MHC, but also elicit T cell-mediated immune responses.
We addressed the question of whether there is an evolutionary advantage in presenting TMHs. We determined the conservation of TMHs by comparing the occurrences of SNPs located in TMHs or soluble protein regions for the genes coding for membrane proteins. We obtained 911 unique gene names associated with the phrase ‘membrane protein’, which are genes coding for both membrane-associated proteins (MAPs, which have no TMH) and transmembrane proteins (TMPs, which have at least one TMH). These genes are linked to 4,780 protein isoforms, of which 2,553 are predicted to be TMPs and 2,237 proteins are predicted to be MAPs. We obtained 37,630 unique variations, of which 9,621 are SNPs that resulted in a straightforward amino acid substitution, of which 6,062 were located in predicted TMPs. See supplementary
Per protein, we calculated two percentages: (1) the percentage of a protein sequence length bearing TMHs, and (2) the percentage of SNPs located within these predicted TMHs. Each percentage pair was plotted in
Evolutionary conservation of human TMHs.
We determined the probability of finding the observed number of SNPs in TMHs by chance, i.e., when assuming that SNPs are just as likely to occur in soluble domains as in TMHs. We used a Poisson binomial distribution, where the number of trials (
We split this analysis for TMPs containing only a single TMH (so-called single-membrane spanners) and TMPs containing multiple TMHs (multi-membrane spanners). We hypothesized that single-membrane spanners are less conserved than multi-membrane spanners, because multi-membrane spanners might have protein-protein interactions between their TMHs, for example to accommodate active sites, and thus might have additional structural constraints. On the split data, we performed the same analysis as for the total set of TMPs.
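The chance calculation applied to each group can be sketched as the lower tail of a sum of independent trials with unequal success probabilities (a Poisson binomial distribution). The sketch below is illustrative rather than the code used in this study. It assumes, as one plausible reading of the null model, that each SNP lands in a TMH with a probability equal to the fraction of its protein covered by TMHs. The function name `lower_tail_probability` and the example probabilities are hypothetical.

```python
from typing import Sequence


def lower_tail_probability(success_probs: Sequence[float], observed: int) -> float:
    """Return P(X <= observed) for a sum of independent Bernoulli trials.

    Trial i succeeds with probability success_probs[i]. In this sketch a
    'success' is a SNP that falls inside a TMH (assumption, see lead-in).
    """
    if observed < 0:
        return 0.0
    if any(not 0.0 <= p <= 1.0 for p in success_probs):
        raise ValueError("all probabilities must lie in [0, 1]")
    # dist[k] holds the probability of exactly k successes so far.
    dist = [1.0]
    for p in success_probs:
        updated = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            updated[k] += mass * (1.0 - p)   # this trial fails
            updated[k + 1] += mass * p       # this trial succeeds
        dist = updated
    return sum(dist[: observed + 1])


# Hypothetical example: two SNPs in proteins that are 20% and 50% TMH.
print(lower_tail_probability([0.2, 0.5], 1))
```

The exact dynamic-programming recursion is quadratic in the number of SNPs, which is manageable for a few thousand SNPs. When all probabilities are equal, the result reduces to the ordinary binomial lower tail.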
Membrane proteins with multiple TMHs are evolutionarily more conserved than proteins with only a single TMH.
We also determined the probability of finding the observed number of SNPs by chance in single- and multi-spanners. For single-spanners, we found 452 SNPs in TMHs, where ≈ 462 were expected by chance. The probability of observing this or a lower number by chance is 0.319. As this probability was higher than our
Also, for single- and multi-spanners, we determined the relevance of this finding by calculating where and how many fewer SNPs are found in TMHs when compared to soluble regions, as depicted in
Epitope prediction is important for understanding immune system function and for the design of vaccines. In this study, we provide evidence that epitopes derived from TMHs are a major source of MHC epitopes. Our bioinformatics predictions indicate that the TMH-derived epitope repertoire is larger than expected by chance for both MHC-I and MHC-II, regardless of the organism. Moreover, reanalysis of MHC ligands from the IEDB database confirmed the presentation of TMH-derived epitopes. Therefore, it seems likely that TMH-derived epitopes would also result in enhanced T cell responses, although the conservation of TMHs might promote the deletion of T cells responsive to TMH-derived epitopes by central tolerance mechanisms. Finally, our SNP analysis shows that TMHs are evolutionarily more conserved than solvent-exposed protein regions.
Although our data show that TMH-derived epitopes are presented in all classical MHC-I and MHC-II alleles, the molecular mechanisms of how integral membrane proteins are processed for MHC presentation are largely unknown (
A first possibility is that the extraction of TMPs from the membrane is mediated by the ER-associated degradation (ERAD) machinery. For MHC class I (MHC-I) antigen presentation of soluble proteins, the loading of the epitope primarily occurs at the endoplasmic reticulum (ER). The chaperones tapasin (TAPBP), ERp57 (PDIA3), and calreticulin (CALR) (
A second possibility is that TMPs are proteolytically processed by intramembrane proteases that cleave TMHs while they are still membrane embedded. Supporting this hypothesis is the well-established notion that peptides generated by signal peptide peptidases (SPPs), an important class of intramembrane proteases that cleave TMH-like signal sequences, are presented on a specialized class of MHC-I called HLA-E (
A third possibility is that peptide processing and MHC-loading occur in multivesicular bodies (MVBs) (
Alternatively to the enzymatic degradation of lipids in MVBs by lipases (
In general, one might expect that evolutionary selection shapes an immune system where surveillance is directed towards protein regions essential for the survival, proliferation and/or virulence of pathogenic microbes, as these will be most conserved. In SARS-CoV-2, for example, there is preliminary evidence that the strongest selection pressure is directed upon residues that change its virulence (
Evolutionary selection of pathogens by a host’s immune system, however, is more likely to occur for protein patterns that are general than for patterns that are rare. While essential catalytic sites in a pathogenic proteome might be relatively rare, TMHs are common and thus might be a more feasible target for evolution to respond to. Indeed, we have found the signature of evolution where both factors, that is, TMHs and catalytic sites, are likely to co-occur, namely in TMPs that span the membrane at least twice. In contrast to single-spanners, where we found no significant evolutionary conservation, the TMHs of multi-spanners are evolutionarily more conserved than soluble protein regions. Likely, the TMHs in many multi-spanners need to interact with each other for correct protein structure and function, and they might hence be more structurally constrained than the TMHs of single-spanners. Thus, we speculate that the human immune system is more attentive towards TMHs in multi-spanners, as these are evolutionarily more conserved.
There have been other efforts to assess the conservation of TMHs, using different methodologies. One such example is a study by Stevens and Arkin (
Together, two important conclusions can be drawn from this study. First, the MHC over-presentation of TMHs is likely a general feature and is predicted to occur for most alleles of both MHC-I and -II and for humans as well as bacterial and viral pathogens. Second, TMHs are genuinely evolutionarily more conserved than soluble protein motifs, at least in the human proteome.
Publicly available datasets were analyzed in this study. These data can be found here:
RB and FB conceived the idea for this research. MB helped with the proteome analysis of
FB is funded by a Veni grant from the Netherlands Organization for Scientific Research (016.Veni.192.026) and an Off-Road Grant from the Dutch Medical Science Foundation (ZonMW 04510011910005). GB is funded by a Young Investigator Grant from the Human Frontier Science Program (HFSP; RGY0080/2018), and a Vidi grant from the Netherlands Organization for Scientific Research (NWO-ALW VIDI 864.14.001). GB has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 862137).
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
We thank the Center for Information Technology of the University of Groningen for its support and for providing access to the Peregrine high performance computing cluster.
The Supplementary Material for this article can be found online at: