Evolutionary trade-offs constraining the MHC gene expansion: beyond simple TCR depletion model

The immune system is as much shaped by the pressure of pathogens as it is by evolutionary trade-offs that constrain its structure and function. A perfect example comes from the major histocompatibility complex (MHC), molecules that initiate adaptive immune response by presentation of foreign antigens to T cells. The remarkable, population-level polymorphism of MHC genes is assumed to result mainly from a co-evolutionary arms race between hosts and pathogens, while the limited, within-individual number of functional MHC loci is thought to be the consequence of an evolutionary trade-off between enhanced pathogen recognition and excessive T cell depletion during negative selection in the thymus. Certain mathematical models and infection studies suggest that an intermediate individual MHC diversity would thus be optimal. A recent, more direct test of this hypothesis has shown that the effects of MHC diversity on T-cell receptor (TCR) repertoires may differ between MHC classes, supporting the depletion model only for MHC class I. Here, we used the bank vole (Myodes=Cletronomys glareolus), a rodent species with variable numbers of expressed MHC genes, to test how an individual MHC diversity influences the proportions and TCR repertoires of responding T cell subsets. We found a non-linear relationship between MHC diversity and T cell proportions (with intermediate MHC numbers coinciding with the largest T cell proportions), perhaps reflecting an optimality effect of balanced positive and negative thymic selection. The association was strongest for the relationship between MHC class I and splenic CD8+ T cells. The CD8+ TCR richness alone was unaffected by MHC class I diversity, suggesting that MHC class I expansion may be limited by decreasing T cell counts, rather than by direct depletion of TCR richness. In contrast, CD4+ TCR richness was positively correlated with MHC class II diversity, arguing against a universal TCR depletion. It also suggests that different evolutionary forces or trade-offs may limit the within-individual expansion of the MHC class II loci.


1
Supplementary Methods

RNA extraction from splenocyte lysates and liver fragments
Aliquots #3 (ca.1×10 6 ) of spleen cells (see Tissue processing section, main Materials and Methods), were lysed by gentle pipetting with 0.5 ml of RNAzol (Sigma-Aldrich) on the day of collection and preserved until further processing at -20 • C. Upon collection, liver fragments have been kept at 4 • C for tissue saturation with RNAlater (approx.8h) and subsequently stored at -20 • C. Prior to RNA extraction, liver fragments were removed from RNAlater and homogenized in RNAzol, with either Bead Ruptor Elite, OMNI International (3-4 DEPC-treated 2.8mm beads, 3.1 m/s, 3×10s and 10s dwell time) or Bio-Gen PRO200 Homogenizer with DEPC-treated probes (2-3×10s and 10s dwell time).Subsequently, total RNA from spleens and liver cells/tissues was extracted using standard RNAzol protocol and stored at -80 • C. Both methods of homogenization yielded pure (as assessed by the Nanodrop measurements) and high quality (as assessed by gel electrophoresis) RNA.

Cell staining for a detailed cytometric analysis and gating strategy
Cells were first stained with LIVE/DEAD™ Fixable Near IR (780) Viability Kit (Invitrogen, cat# L34992, 1:1000).The staining was carried out at room temperature (RT), for 20 min (in 1ml, in conical tubes), and was followed by a wash with 10ml of PBS with 2% FBS.Subsequent staining with mAbs was performed in 96-well U bottom plates.First, ca.0.5×10 6 cells/well were kept for 15 min in 100μl PBS with 2% FBS to block nonspecific binding, after which FITC-conjugated anti-CD4 mAb (clone GK1.5, eBioscience™, 1:100) was added to cells.Cells were stained for 30 min on ice and washed twice with PBS.Next, fixation/permeabilization was performed with Foxp3/Transcription Factor Staining Buffer Set (eBioscience™) according to the manufacturer's instructions.Briefly, cells were fixed in 200μl fixation/permeabilization working solution for 30 min at RT and then washed twice with working solution of permeabilization buffer.Washed cells were resuspended in permeabilization buffer, supplemented with 2% normal mouse and rat sera (eBioscience™) and blocked for 15 minutes at RT.After blocking, PE-conjugated anti-Foxp3 mAb (clone FJK-16s, eBioscience™, 1:20) and Alexa Fluor 647-conjugated anti CD3 mAb (clone CD3-12, Bio-Rad, 1:100) were added directly to cells and incubated for 40 min, at RT. Afterwards, cells were washes twice with permeabilization buffer and resuspended in PBS with 0.5% FBS.
Cell were analyzed on CytoFLEX cytometer.10 4 events per sample were recorded and analyzed with CytExpert Acquisition and Analysis Software (Beckman Coulter, United States).Lymphocytes were gated according to forward and side scatter patterns, then live cells were gated based on LIVE/DEAD staining, and doublets were excluded based on FSC-A vs FSC-H pattern.Further gating of lymphocyte populations was established using FMO and non-stained control samples (1).Mean values from three technical replicates (Supplementary Dataset 2) of the following gates/quadrats (Figure S1) were used for the subsequent data analyses: -Proportion of CD4+ T cells (CD3+CD4+) in lymphocytesplot #4, upper right quadrant -Proportion of CD8+ T cells (CD3+CD4-) in lymphocytesplot #4, upper left quadrant -Proportion of regulatory Foxp3+ T cells in CD4+ T cellsplot #5, upper right quadrant Moreover, for an additional analysis that looked into proportion of CD4+Foxp3+ T cells in all lymphocytes (rather than in CD4+ T cells alone), mean values from the upper right quadrant of the Figure S7 were taken.Raw values from the three replicates are deposited in an OSF repository, doi: https://osf.io/6wjrb/.

MHC amplicon library preparation
Each PCR reaction contained 1 μl of transcribed cDNA (see Main Methods), 1.25 μM of degenerate primer/0.75μM of non-degenerate primer and HotStart MasterMix (QIAGEN).Moreover, each primer contained a 6-bp tag; a unique combination of forward and reverse tags allowed multiplexing of up to 72 samples (per primer pair) into one pool.PCR cycling conditions were: 95°C for 15 min, followed by 35 cycles at 95°C for 30 s, 55°C for 60 s, 72°C for 25 s, and a final extension at 72 °C for 10 min.Aliquots of PCR products were run on agarose gel to confirm amplification and then pooled in approximately equimolar quantities.Pools were separated on 1.5% agarose gel and appropriate bands were excised with Zymoclean™ Gel DNA Recovery Kit (Zymo Research).Platform-specific adaptors and additional indices marking separate pools were ligated with NEBNext Ultra™ II DNA Library Prep Kit for Illumina (New England BioLabs) with a PCR-free protocol provided by the manufacturer.

Details on MHC genotyping
MHC genotyping for each of the four markers (that is, regions amplified by a primer pair, i.e., MHC class I exon 3, MHC class II exon 2 of DQA, DQB and DRB) was based on cDNAs from two independent RNA extractions: R1 and R2.For each individual, R1 was an extract from spleen cells (i.e., aliquot #3 in the Tissue processing section in the main Methods) or from a liver fragment (when the amount of splenocytes did not suffice for a successful RNA extraction); R2 was an extract from a liver fragment (another liver fragment, if R1 was also liver-based).Subsequently, each of the four markers was amplified in four PCR reactionstwo using R1 material as a template (i.e., R1A, R1B), and two using R2 as a template (i.e., R2A, R2B).
Upon HTS sequencing, all amplicons were processed with tools from the AmpliSAT suite (http://evobiolab.biol.amu.edu.pl/amplisat/index.php).First, sequencing reads were merged with AmpliMERGE and then de-multiplexed, clustered and filtered with AmpliSAS (2).We used the following AmpliSAS parameters for both MHC classes: MHC class I genes: for clusteringsubstitution errors (%) = 1; minimum dominant frequency (%) = 25; for filtering: Minimum variant depth: 3 reads (no minimum amplicon frequency threshold); discard chimeras = YES, discard frameshifts = YES.For each marker AmpiSAS outputs from the four PCRs were juxtaposed and compared (SI_Intermediate_genotyping_files.xlsx,OSF repository doi: https://osf.io/9yhfd/).Use of two independent genetic material extracts and total of four PCR replicates allowed us to accept variants with low read numbers and per amplicon frequencies (as long as they were present in two out of four amplicons), while controlling for possible cross contamination.Furthermore, for MHC class I, additional restrictions were added when calling variants for the final genotypes (SI_Features_MHCI.xlsx,OSF doi: https://osf.io/9yhfd/).This was done to exclude possible non-classical, MHC Ib genes, which code for molecules characterized by tissue-specific expression, low polymorphism and specialized functions other than presentation of peptides to cytotoxic CD8+ T cells.Although some MHC Ib molecules supposedly take part in thymic selection and present some antigens to innate-like T cells with restricted TCR diversity (3), tissue-specific expression could cause systematic differences in gene number estimation between individuals for which only liver was used in MHC genotyping, versus those that used two tissue types (i.e., liver and spleen).To avoid such bias, we excluded (at the population level) variants that consistently were amplified only from liver or spleen, and never from both tissues.Based on this criterion we excluded 18 variants found only in spleen and 13 variants found only in liver (out of 120 variants that met our basic "two-infour" genotyping criteria).Apart from tissue-specificity, presence of certain conserved, peptideanchoring residues was proposed to characterize classical, MHC Ia genes (4).Our MHC class I amplicons covered five (out of nine proposed) such residues: T143, K146, W147, Y159 and Y171.Whereas these observations were based on data from a handful of species, and whether they would hold for classical MHC Ia of bank voles is unknown, we conservatively removed sequences with more than two substitutions at these positions (discarding further three variants).In these final genotypes the mean number of called MHC class I amino acid variants did not differ between the animals for which extraction from one vs two types of tissues were used for MHC genotyping (means: 11.7 vs 12.8, t(22)=-0.99,p=0.33).

Details on MHC supertyping
For MHC supertyping, we used 166 MHC class I, and 59 DQA, 71 DQB, 37 DRB MHC class II sequences, which were either newly identified during the course of this study (Genbank IDs: OR424419-OR424467) or used for supertyping previously in (5) (5,7,8).Z-values describing physicochemical proprieties of amino acids in PSSs were taken from (9), and the grouping was done via clustering with discriminant analysis of principal components (DAPC) (10), implemented in the adegenet package (11,12) . The exact supertyping procedure was described in detail in the Supplementary materials of (5,8), therefor here we only report parameters unique to this study.Number of clusters (k) was selected with help of the find.clusters()function (arguments: n.pca=100, max.n.clust=20, n.iter=5e5, n.start=500) in 25 replicate runs of clustering and upon visual inspection pf the plotted mean BIC values against k.To determine the optimal numbers of retained principal components (PCs Finally, we used posterior assignments from the discriminate function analysis to assign MHC variants to the supertypes.For a few (n=5) MHC class I variants, posterior probability of supertype assignment was slightly lower than <0.5, and so we ascribed them manually to supertypes to which they belonged with the highest probability.Supertype assignments are in Supplementary Dataset 1 associated with this article (the numbering is arbitrary and does not correspond to that of (5,8)).22 supertypes were identified for MHC class I, 12 for DQA, 11 DQB and 17 for DRB, which exactly matches number of supertypes detected in REF ( 5).

TCR amplicon library preparation
The protocol consist of three general stages: 1) RNA reverse transcription, 2) PCR 1 with bank-vole specific, nested primer specific to C region of TCRβ; 3) PCR 2 with indexed primers containing P5 and P7 adaptors.All primer and oligonucleotide sequences are available in (14).
In the first stage, total RNA extracted from 10 5 cells was reverse-transcribed by 5′RACE with SMARTScribe™ Reverse Transcriptase kit (TaKaRA) and a custom-made SmartNNNNa adaptor (IDT), which is a template-switch oligo containing Unique Molecular Identifier (a tag composed of 12 random nucleotides).Reaction setup consisted of two steps.First, mix 1 (20μl of RNA and 2.5μl of 10μM MyglTCRb_1 primer) was kept at 65°C for 2 min (to allow primer annealing).Next, the temperature was lowered to 42°C, and mix 2 was added to each sample (5× First-strand buffer, 1μl 20mM DTT, 2.5μl of 10μM SmartNNNNa, 3μl of 10mM dNTP mix, 1.5μl of SMARTScribe Reverse Transcriptase and 1μl of RNasin® Plus RNase Inhibitor, Promega).Total reaction volume was 40μl.cDNA synthesis was conducted at 42°C for 1h, after which samples were transferred on ice.Products of cDNA synthesis were treated with 5U USER enzyme (uracil-DNA glycosylase, NEB) for 1h at 37°C to degrade any leftover SmartNNNNa adaptor.
On the same day, whole 5'RACE reaction was purified with MinElute PCR purification kit (Qiagen) and eluted in 10μl EB.Entire cDNA was taken as a template for the second stage: PCR 1.Total reaction volume was 25μl, and apart from the cDNA it contained Q5® High-Fidelity 2× Master Mix (NEB), 0.5 μM of Smart20-mod and TCRb_3NN primers.PCR cycling conditions were: 98°C for 1 min, followed by 23 cycles of 98°C for 10s, 65°C for 20s, 72°C for 40s, and a final extension at 72°C for 4 min.PCR 1 products were purified using Agencourt AMPure XP beads (Beckman Coulter) with 1:0.6 DNA to beads ratio and DNA was eluted in 20μl of nuclease-free H2O.
5μl of the purified product of PCR 1 was used as a template in the third stage, PCR 2. Total reaction volume was 20μl, and it contained Q5® High-Fidelity 2× Master Mix (NEB) and a combination of P5_50X and P7_70X primers (1.25μM each) containing indices that would be used for demultiplexing of samples pooled in a sequencing run.PCR cycling conditions were as described for PCR 1, but with 12 cycles and 5min of final elongation.Products of the PCR 2 were run immediately after reaction on 1.5% agarose gel and bands of desired size (~600 bp) were purified with Zymoclean Gel DNA Recovery Kit (Zymo research).DNA was eluted in 15μl of warm EB and its concentration was measured with HS dsDNA Qubit kit (ThermoFisher Scientific).

Details on TCR repertoire analyses and size estimation
Reads processing followed the "Quantitative analysis of the bank vole TCRβ repertoire" described in (14).In brief, non-overlapping, paired-end reads were concatenated, Phred score-filtered and processed to correct sequencing errors with UMIs.Then, CDR3 sequences were extracted and summarized for each sample and Chao1 lower bound richness estimation (3) was calculated based on CDR3 clonotype abundance.
To determine appropriate sequencing depth needed for efficient UMI-based error correction (where certain level of over-sequencing is necessary) we analyzed rarefaction curves for four samples sequenced to a high coverage (>500 000 reads).We checked both the number of retrieved, errorcorrected CDR3 variants (Supplementary Figure S2A) and Chao1 estimates of the TCR repertoire size (Supplementary Figure S2B) retrieved in a samples of 100 000 cells.CDR3 variant numbers started to plateau around 150 000 -200 000 reads, wears Chao1 estimates seemed to stabilize at 250 000-300 000 reads.We thus chose 300 000 reads as an optimal coverage for the subsequent analyses.
It was used as a general minimal coverage (Sup Tab.S1), and as a read sub-sampling threshold, used to control for differences in sequencing depth among amplicons.
Finally, we used Pearson's correlation to checked for relationship between T cell proportions in spleen and Chao1 TCR repertoire richness estimates of given T cell subset.Similarly, individual, betweensubset TCR richness was checked for correlations.Upon confirming the latter (see Results in the main article), we checked whether this could be a technical or sampling artefact related to differences in cell viability or in overall cell numbers collected form a given individual.To that end, we checked whether summed Chao1 estimates from both CD4+ and CD8+ subset correlated with either proportion of live cells (gate P2, Figure S1) or recorded total cell number counted from spleens after cell collection.Neither proposed explanation was supported (correlation with live cells: r25=0.09,p=0.644; correlation with cell counts: r17= 0.19, p=0.442).

Figure S3 .
Figure S3.Clonal proportion of the top n clonotypes ranked by abundance.The brightest bin represent proportion of repertoire space occupied by the top 10 clones, the darkest bin all clones below the top 1000.Two intermediate bins represent clones ranked from 11 to 100 and from 101 to 1000.

Figure S5 .
Figure S5.Predictor effect plots of A) number of MHC class I amino acid variants on the proportion of CD8+T cells among splenic lymphocytes; B) number of MHC class II amino acid variants on the proportion of CD4+T cells among splenic lymphocytes.Line shows fitted values versus the focal predictor on the horizontal axis, when the other predictor (sex) is held fixed.The shaded area is a pointwise confidence band for the fitted values, based on standard errors computed from the covariance matrix of the fitted regression coefficients.Hollow points are partial residuals.

Figure S6 .
Figure S6.Predictor effect plots of A) a proportion of CD4+ T cells among splenic lymphocytes on CD4+Foxp3+ T cells within those CD4+ T cells; B) number of MHC class II supertypes on the proportion of CD4+Foxp3+ T cells among splenic lymphocytes.Blue line shows fitted values versus the focal predictor on the horizontal axis, when the other predictors (A: number of MHC class II supertypes and sex, B:sex) are held fixed.The shaded area is a pointwise confidence band for the fitted values, based on standard errors computed from the covariance matrix of the fitted regression coefficients.Hollow points are partial residuals.

Figure S7 .
Figure S7.Representative Bank vole lymphocyte staining gated as in Figure S1 (parent gate P3), showing anti-Foxp3 and anti-CD4 staining.Upper right quadrant shows the percentage of putative regulatory T cells among all splenic lymphocytes.

Figure
Figure S8.A) Conditional effect plot of MHC class I amino acid variant number on splenic lymphocyte proportions.The three response categories are proportion of CD4+ T cells (teal line), CD8+ T cells (yellow line) and other splenic lymphocytes (grey line).Shaded are shows 90% credible intervals (CI).B) and C) Posterior CI visualizing Markov chain Monte Carlo draws from the posterior distribution of the parameters of a Bayesian model with MHC amino acid variants and sex as predictors."CD4" or "CD8" in the parameter prefixes corresponds to the predicted effect on given T cells subpopulation (third, reference category -"other lymphocytes" -was modelled implicitly).Figure was split into panel B (linear term) and C (quadratic term) to better visualize posterior values at different scales.The points are posterior medians, thick segments are 50% CI, thinner lines are 90% CI (package default).

Figure S9 .
Figure S9.Cook's distances plot for a linear model predicting Chao1 TCR diversity estimates in 10 sorted CD4+ T cells.Explanatory variables were the number of MHC class II supertypes and sex.th observation (Individual s16) was identified as an influential observation in this analysis.

Figure S10 .
Figure S10.Predictor effect plots of number of MHC class II supertypes on the Chao1 TCR diversity estimates in 10 5 sorted CD4+ T cellswithout the influential observation.Teal line shows fitted values versus the focal predictor on the horizontal axis, when other predictor (sex) was held fixed.The shaded area is a pointwise confidence band for the fitted values, based on standard errors computed from the covariance matrix of the fitted regression coefficients.Hollow points are partial residuals.

Figure S11 .
Figure S11.Predictor effect plots of number of MHC class II amino acid variants on the Chao1 TCR diversity estimates in 10 5 sorted CD4+ T cells.Teal line shows fitted values versus the focal predictor on the horizontal axis, when other predictor (sex) was held fixed.The shaded area is a pointwise confidence band for the fitted values, based on standard errors computed from the covariance matrix of the fitted regression coefficients.Hollow points are partial residuals.

Table S2 .
Detailed summary of MHC genotyping and supertyping results.Totaltotal number of variants detected in all analyzed individuals.

Table S4 .
GLMM coefficients for predictors of CD8+ T cell proportion in splenic lymphocytes.Fixed effects were the number of MHC class I amino acid variants and sex.Dispersion parameter for beta family = 243.LRT showed that the model with a quadratic term fitted the data better than the one without it (df=1; χ 2 = 8.806; p=0.003).Significance codes: *** < 0.001 < ** < 0.01 < *< 0.05 < .< 0.1

Table S6 .
GLMM coefficients for predictors of CD4+ T cell proportion in splenic lymphocytes.Fixed effects were the number of MHC class II amino acid variants and sex.Dispersion parameter for beta family = 145.LRT showed that the model with a quadratic term fitted the data marginally better than the one without it (df=1; χ2= 2.870; p=0.090).Significance codes: *** < 0.001 < ** < 0.01 < *< 0.05 < .< 0.1

Table S13 .
Results of Bayesian analysis with Dirichlet regression, where the proportions of T cells among splenic lymphocytes was treated as a compositional proportion with three categories, i.e., CD3+CD4+ as CD4+ T cells; CD3+CD4-as CD8+ T cells; CD3-CD4-as other lymphocytes.The last category was set as a reference and model modelled implicitly.Fixed effects were sex and MHC supertypes.

Table S14 .
Results of Bayesian analysis with Dirichlet regression, where the proportions of T cells among splenic lymphocytes was treated as a compositional proportion with three categories, i.e., CD3+CD4+ as CD4+ T cells; CD3+CD4-as CD8+ T cells; CD3-CD4-as other lymphocytes.The last category was set as a reference and model modelled implicitly.Fixed effects were sex and MHC amino acid variants.