Transmembrane serine protease 2 Polymorphisms and Susceptibility to Severe Acute Respiratory Syndrome Coronavirus Type 2 Infection: A German Case-Control Study

The transmembrane serine protease 2 (TMPRSS2) is the major host protease that enables entry of the severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2) into host cells by spike (S) protein priming. Single nucleotide polymorphisms (SNPs) in the gene TMPRSS2 have been associated with susceptibility to and severity of H1N1 or H7N9 influenza A virus infections. Functional variants may influence SARS-CoV-2 infection risk and severity of Coronavirus disease 2019 (COVID-19) as well. Therefore, we analyzed the role of SNPs in the gene TMPRSS2 in a German case-control study. We genotyped the SNPs rs2070788, rs383510, and rs12329760 in the gene TMPRSS2 in 239 SARS-CoV-2-positive and 253 SARS-CoV-2-negative patients. We analyzed the association of the SNPs with susceptibility to SARS-CoV-2 infection and severity of COVID-19. SARS-CoV-2-positive and SARS-CoV-2-negative patients did not differ regarding their demographics. The CC genotype of TMPRSS2 rs383510 was associated with a 1.73-fold increased SARS-CoV-2 infection risk, but was not correlated with severity of COVID-19. Neither TMPRSS2 rs2070788 nor rs12329760 polymorphisms were related to SARS-CoV-2 infection risk or severity of COVID-19. In a multivariable analysis (MVA), the rs383510 CC genotype remained an independent predictor for a 2-fold increased SARS-CoV-2 infection risk. In summary, our report appears to be the first showing that the intron variant rs383510 in the gene TMPRSS2 is associated with an increased risk of SARS-CoV-2 infection in a German cohort.

INTRODUCTION
The transmembrane serine protease 2 (TMPRSS2) is the major host protease that enables cell entry of several coronaviruses, e.g., HCoV-229E, SARS-CoV, and MERS-CoV (Shulla et al., 2011; Bertram et al., 2013; Shirato et al., 2013). TMPRSS2 is strongly expressed in lung tissue and bronchial transient secretory cells (Lukassen et al., 2020). In a recent study, Hoffmann et al. (2020) demonstrated that SARS-CoV-2 uses the angiotensin-converting enzyme 2 (ACE2) as a receptor for host cell binding, and that the viral spike (S) protein is cleaved into S1 and S2 by TMPRSS2 to allow fusion of the viral and cellular membranes.
Because of its functional impact, the missense variant rs12329760 (c.478G>A, p.V160M), which changes the amino acid valine at position 160 to methionine, might also be of interest with regard to COVID-19. The SNP is located in the scavenger receptor cysteine-rich (SRCR) domain of TMPRSS2, which interacts with external pathogens (Paoloni-Giacobino et al., 1997). The amino acid exchange may create a de novo pocket protein (Paniri et al., 2020). It is an eQTL in whole blood with TT genotype carriers showing the highest expression levels (GTEx portal). The MAF of the T-allele is higher in East Asians (0.36) than in Europeans (0.24), South Asians (0.23), or Africans (0.29; Clarke et al., 2017). In prostate cancer, the rs12329760 T-allele is associated with multiple copies of the TMPRSS2-ERG gene fusion responsible for shorter survival and higher tumor recurrence (FitzGerald et al., 2008). In a preliminary small-scale whole-exome sequencing study comprising 131 Italian hospitalized COVID-19 patients, Latini et al. (2020) observed a lower rs12329760 T-allele frequency (p = 0.02) in the patients compared to corresponding European GnomAD database controls. Based on the above-described associations, we investigated the influence of the variants rs2070788, rs383510, and rs12329760 in the gene TMPRSS2 on SARS-CoV-2 infection risk and severity of COVID-19 in a German cohort.

Study Participants, Recruitment, and Outcome of the Patients
The study was conducted following the approval of the Ethics Committee of the Medical Faculty of the University of Duisburg-Essen (20-9230-BO) and in cooperation with the West German Biobank (WBE; 20-WBE-088). Written informed consent was obtained from the study patients.
Enrolment started on March 11, 2020, and ended on September 30, 2020. Patients were initially recruited upon presentation with COVID-19 typical symptoms, i.e., fever, cough, and dyspnea, or upon admission to the hospital with already confirmed SARS-CoV-2 infection. We included 239 SARS-CoV-2-positive patients. Patients were classified as SARS-CoV-2-positive if they had at least one positive real-time reverse transcription PCR (RT-PCR) test result. Follow-up was completed on October 31, 2020, at which time all patients had either been discharged from the hospital as "cured" or had a fatal outcome of the disease. We also studied 253 SARS-CoV-2-negative patients, who presented with COVID-19 typical symptoms but tested exclusively negative for SARS-CoV-2 by RT-PCR. These patients were hospitalized at the University Hospital Essen or treated as outpatients due to other medical conditions. Clinical outcome was defined according to the criteria of the European Centre for Disease Prevention and Control (ECDC, 2020) as follows: "serious": hospitalized patients admitted to an intensive care unit and all cases of COVID-19-related deaths during the hospital stay; "moderate": outpatients and all other hospitalized patients. In contrast to the ECDC classification, where patients are counted up to three times, every patient in our study was counted only once, according to the worst clinical outcome observed during the hospital stay.

Genotyping of TMPRSS2
PCR was performed with 2 μl genomic DNA and 30 μl Taq DNA-Polymerase 2x Master Mix Red (Ampliqon, Odense, Denmark) under the following conditions: initial denaturation at 95°C for 5 min; 38 cycles of denaturation at 95°C for 30 s, annealing at 60°C (rs12329760) or 65°C (rs2070788 and rs383510) for 30 s, and elongation at 72°C for 30 s; and final elongation at 72°C for 10 min. The following oligonucleotide primers were used for the respective PCR reactions: rs2070788 5' [BIO]AGTTTCTGCTGATGAGGAGCC 3', 3' GAAGTGCTTAGTGGCAGGCA 5'; rs383510 5' [BIO]ATGGCTGTGCTTGGGAAATAAC 3', 3' CTTATTTCCTGGCCGGACGC 5'; and rs12329760 5' CGCCCGTAGTTCTCGTTCC 3', 3' TTCGCCTCTACGGACCAAAC 5'. PCR products for rs12329760 were digested with Hpy8I (Thermo Scientific, Dreieich, Germany), and restriction fragments were analyzed by agarose gel electrophoresis. For each genotype, results from restriction fragment length polymorphism (RFLP)-PCR were validated by Sanger sequencing (Supplementary Figure S1). Genotyping of rs2070788 and rs383510 was performed by Pyrosequencing according to the manufacturer's instructions (Qiagen, Hilden, Germany). In brief, biotinylated PCR amplicons were immobilized on streptavidin-coated sepharose beads (GE Healthcare, Solingen, Germany) with the vacuum tool of the workstation (Qiagen, Hilden, Germany). In the next step, PyroMark denaturation solution (Qiagen, Hilden, Germany) was used to separate the complementary strand from the biotinylated strand. The biotinylated, single-stranded DNA remained immobilized on the vacuum tool. Finally, the single-stranded DNA was released into the sequencing plate containing 0.3 μM sequencing primer (5' TTTATATCCTTCTCAAAAG 3' for rs2070788 and 5' ACGTGGTGGTGCGGGCCT 3' for rs383510).
After incubation at 80°C for 2 min, the hybridized primer and single-stranded template were incubated with the enzymes DNA polymerase, ATP sulfurylase, luciferase, and apyrase, as well as the substrates adenosine 5' phosphosulfate (APS) and luciferin. Deoxyribonucleotide triphosphates (dNTPs) were added sequentially. Each incorporation event is accompanied by the release of pyrophosphate (PPi) in a quantity equimolar to the amount of the incorporated nucleotide. ATP sulfurylase converts PPi to ATP, which drives the luciferase-mediated conversion of luciferin to oxyluciferin, generating visible light. The height of each light signal is proportional to the number of nucleotides incorporated and can be visualized in the Pyrogram. Apyrase continuously degrades unincorporated nucleotides and ATP, and when degradation is complete, another nucleotide is added. Analyses were performed on the PyroMark Q96 MD instrument (Qiagen, Hilden, Germany).
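The relationship between nucleotide dispensation and peak height described above can be illustrated with a minimal sketch. It models an idealized Pyrogram only: the sequences, the dispensation order, and the function names are hypothetical and do not correspond to the assays used in this study, and signal noise, background, and enzyme kinetics are ignored.

```python
def pyrogram_heights(sequence_to_synthesize, dispensation_order):
    """Idealized peak height for each dispensed nucleotide.

    A peak equals the number of identical bases incorporated in one step;
    a dispensed base that does not match yields a height of 0.
    """
    heights = []
    position = 0
    for nucleotide in dispensation_order:
        run = 0
        while (position + run < len(sequence_to_synthesize)
               and sequence_to_synthesize[position + run] == nucleotide):
            run += 1
        heights.append(run)
        position += run
    return heights


def heterozygous_heights(allele_1, allele_2, dispensation_order):
    """Average of the two allele-specific pyrograms (equal allele dosage assumed)."""
    h1 = pyrogram_heights(allele_1, dispensation_order)
    h2 = pyrogram_heights(allele_2, dispensation_order)
    return [(a + b) / 2 for a, b in zip(h1, h2)]


# Hypothetical toy sequences differing at one position (T vs. C).
allele_t = "ATGG"
allele_c = "ACGG"
order = "ATCG"

print(pyrogram_heights(allele_t, order))                 # homozygous T/T
print(pyrogram_heights(allele_c, order))                 # homozygous C/C
print(heterozygous_heights(allele_t, allele_c, order))   # heterozygous T/C
```

In this toy model, a heterozygous sample shows half-height peaks at both variant dispensations, which is the pattern used to call genotypes from a Pyrogram.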
Transmembrane serine protease 2 is located on the reverse strand, and all polymorphisms are reported on the forward strand in this study.

Statistical Analyses
Hardy-Weinberg equilibrium (HWE) was calculated using Pearson's χ² goodness-of-fit test, and samples were considered as deviant from HWE at a significance level of p < 0.05. Linkage disequilibrium (LD) analysis was performed using HaploView V4.2.
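As an illustration of the HWE test, the following sketch computes Pearson's χ² statistic with one degree of freedom from genotype counts. The function name is hypothetical, and the counts in the example are an assumption: they were back-calculated from the rs383510 genotype percentages and group sizes reported in the Results, not taken from the study data.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square goodness-of-fit test for HWE (biallelic locus, 1 df)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumed counts (TT, TC, CC), reconstructed from the reported percentages.
positives = (48, 116, 75)    # 239 SARS-CoV-2-positive patients
negatives = (57, 143, 53)    # 253 SARS-CoV-2-negative patients

for label, counts in (("positive", positives), ("negative", negatives)):
    chi2, p_value = hwe_chi_square(*counts)
    print(f"{label}: chi2 = {chi2:.2f}, p = {p_value:.2f}")
```

With these reconstructed counts, only the SARS-CoV-2-negative group falls below the p < 0.05 threshold.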
For genetic association, we calculated odds ratios (OR) with 95% confidence intervals (CI; Baptista-Pike method) and p values by Fisher's exact test. Values of p are reported two-sided, and values of p < 0.05 were considered significant. Adjustment for multiple comparisons in the genotype-phenotype association analysis was performed using the Holm-Bonferroni method (α = 0.0167). MVA was performed for both SARS-CoV-2 infection risk and COVID-19 severity to assess the independence of the variables age, sex, and TMPRSS2 rs2070788, rs383510, and rs12329760 genotypes by logistic regression (likelihood ratio test, backward elimination).
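The Holm-Bonferroni step-down procedure can be sketched as follows. With three tested SNPs, the first (strictest) threshold is 0.05/3 ≈ 0.0167, which corresponds to the α given above. The function name is hypothetical, and in the example only p = 0.01 for rs383510 is taken from this report; the other two p values are placeholders chosen for illustration.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        threshold = alpha / (m - rank)   # alpha/m, alpha/(m-1), ..., alpha
        if p_values[idx] <= threshold:
            rejected[idx] = True
        else:
            break                        # stop at the first non-significant test
    return rejected


snps = ["rs2070788", "rs383510", "rs12329760"]
p_values = [0.40, 0.01, 0.30]            # 0.40 and 0.30 are hypothetical

for snp, is_rejected in zip(snps, holm_bonferroni(p_values)):
    print(snp, "significant" if is_rejected else "not significant")
```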

RESULTS
From March 11, 2020 to October 31, 2020, we enrolled and studied 239 SARS-CoV-2-positive and 253 SARS-CoV-2-negative patients to determine associations of the SNPs rs2070788, rs383510, and rs12329760 in the gene TMPRSS2 with susceptibility to SARS-CoV-2 infection and severity of COVID-19. The characteristics and genotypes of SARS-CoV-2-positive and -negative patients are summarized in Table 1. Distribution of sex (p = 0.86) and age (p = 0.10) was similar in both groups.
TMPRSS2 rs2070788 genotypes were compatible with HWE in both groups (p = 0.87 and p = 0.73). The MAF of the G-allele in SARS-CoV-2-positive patients (0.48) was similar to the MAF in SARS-CoV-2-negative patients (0.45). For TMPRSS2 rs12329760, we also did not observe a deviation from HWE in either SARS-CoV-2-negative or -positive patients (p = 0.66 and p = 0.50, respectively). The MAF of the T-allele was higher in SARS-CoV-2-positive patients (0.24 vs. 0.20). Neither the rs2070788 nor the rs12329760 variant was correlated with an increased SARS-CoV-2 infection risk or severity of COVID-19 (Table 1).
The observed genotype frequencies for TMPRSS2 rs383510 were also consistent with HWE in SARS-CoV-2-positive patients (p = 0.80). There was a slight deviation from HWE in SARS-CoV-2-negative patients (p = 0.04). We found 20.1% TT, 48.5% TC, and 31.4% CC genotype carriers in the infected patients. The following genotype frequencies were observed in the non-infected group: 22.5% TT, 56.5% TC, and 21.0% CC. The frequency of the T-allele was higher in the SARS-CoV-2-negative patients (0.51) than in the SARS-CoV-2-positive patients (0.44). Carriers of the CC genotype had a 1.73-fold increased risk of SARS-CoV-2 infection (OR: 1.73, 95% CI 1.15-2.59, p = 0.01). We did not observe an association of any allele with severity of COVID-19.
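The OR for CC versus non-CC carriers can be retraced with a short sketch. Two assumptions apply: the 2x2 counts are back-calculated from the reported genotype percentages and group sizes, and the confidence interval uses the Woolf (logit) approximation as a simple stand-in for the Baptista-Pike method named in the Statistical Analyses, so the limits may differ slightly from those reported. Function names are hypothetical.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (logit) 95% CI for the table [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test: sum of all tables no more likely than observed."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = math.comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return math.comb(row1, k) * math.comb(row2, col1 - k) / denominator

    p_observed = prob(a)
    k_values = range(max(0, col1 - row2), min(row1, col1) + 1)
    return sum(prob(k) for k in k_values if prob(k) <= p_observed * (1 + 1e-7))


# Assumed counts: CC vs. TT+TC in positive (75, 164) and negative (53, 200) patients.
a, b, c, d = 75, 164, 53, 200
odds_ratio, lower, upper = odds_ratio_woolf(a, b, c, d)
p_value = fisher_exact_two_sided(a, b, c, d)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f}-{upper:.2f}, p = {p_value:.3f}")
```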
In this study, we did not observe strong linkage disequilibrium between the alleles of the three loci. The polymorphisms rs2070788 and rs383510 showed the highest LD values (r² = 0.49, D' = 0.72).
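For reference, the two LD measures can be computed from haplotype and allele frequencies as in the sketch below. It assumes that the haplotype frequency is already known; HaploView estimates haplotype frequencies from unphased genotypes, a step that is not reproduced here. The function name and the example frequencies are hypothetical and are not study data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2)
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r_squared


# Hypothetical frequencies for illustration only.
d_prime, r_squared = ld_stats(p_ab=0.3, p_a=0.4, p_b=0.5)
print(f"D' = {d_prime:.2f}, r2 = {r_squared:.2f}")
```

D' can be high while r² remains moderate when allele frequencies at the two loci differ, which is why both measures are usually reported together.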

DISCUSSION AND STUDY LIMITATIONS
Currently, the influence of host genetic factors on SARS-CoV-2 infection risk or COVID-19 severity is being investigated in a variety of in silico analyses and with targeted or genome-wide sequencing approaches.
Recently, both blood group A and inborn errors of interferon I immunity were identified as risk factors for severity of COVID-19 in genome-wide association studies (Ellinghaus et al., 2020; Zhang et al., 2020).
In a first small-scale study, Latini et al. (2020) observed a significantly lower frequency of the rs12329760 T-allele in Italian SARS-CoV-2-positive patients (N = 131) compared to a European reference group from the GnomAD database. Interestingly, we found a higher, albeit non-significant, T-allele frequency in the SARS-CoV-2-positive patients (0.24) compared to the SARS-CoV-2-negative patients (0.20). This would be plausible if one assumes that T-allele carriers have higher gene expression (GTEx portal), which could enhance TMPRSS2 activity and thus result in a higher infection susceptibility. Confirming this observation would require analysis of additional patients.
This is the first case-control study analyzing the role of TMPRSS2 rs2070788 with regard to COVID-19. Although an effect seems plausible from a biological point of view, we observed neither differences in allele or genotype frequencies between infected and non-infected patients nor different courses of the disease. The influence of this polymorphism has so far only been shown in a study with Chinese H1N1/H7N9 influenza A virus-infected patients (Cheng et al., 2015). It cannot be excluded that this variant is relevant in a larger cohort or in other ethnic groups. Since the allele frequencies differ markedly between East Asians (0.36) and Europeans (0.46; Clarke et al., 2017), the latter hypothesis could well play a role. The variant rs2070788 was in high LD with rs383510 in the study with Chinese individuals by Cheng et al. (2015), which we did not observe in our German cohort. These findings fit the LD database records (Clarke et al., 2017) and again underscore the hypothesis that the variant rs2070788 might have a greater significance in other ethnicities.
TABLE 1 | Demographics, transmembrane serine protease 2 (TMPRSS2) genotypes, and outcome of the severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2)-positive and SARS-CoV-2-negative patients.
Remarkably, we found a significant association with a 1.73-fold increased infection risk for TMPRSS2 rs383510 CC genotype carriers in our German cohort. Since genotyping errors can be excluded, we have no explanation for the observation that TMPRSS2 rs383510 genotypes slightly violated HWE in the SARS-CoV-2-negative but not in the SARS-CoV-2-positive group. This could hypothetically be attributed to an unrecognized selection bias in the SARS-CoV-2-negative group. Multivariable logistic regression analysis confirmed the CC genotype as an independent risk factor for SARS-CoV-2 susceptibility. The mechanism behind our finding remains ambiguous or even counterintuitive. Considering the data from the GTEx database, it appears that the C-allele results in decreased gene expression in the lung (GTEx portal, 2020). Similarly, for H7N9 influenza A virus-infected patients, it was observed that TT or TC genotype carriers had a higher infection risk, which seems quite plausible when viewed in conjunction with the expression data (Cheng et al., 2015). The GTEx data are based on post-mortem gene expression analysis in 515 individuals. Further functional studies need to be performed to validate the influence of the rs383510 polymorphism on mRNA or protein expression and function, especially in the context of COVID-19. As mentioned above, our findings cannot be transferred to populations with different ethnicities and need to be validated in larger cohorts as well.

Regarding a genetic predisposition to a serious course of COVID-19, we did not observe any association with the analyzed TMPRSS2 variants. On the one hand, this could be due to the biological function of TMPRSS2 as a mediator of cell entry for SARS-CoV-2, so that differences in protein function might only influence the infection risk, rather than the course of the disease. On the other hand, the sample size of our cohort with "serious" COVID-19 was relatively small (N = 75). Further analyses are needed to conclusively clarify the influence of TMPRSS2 variants on disease severity.
In conclusion, we report here for the first time an association of the TMPRSS2 rs383510 CC genotype with SARS-CoV-2 infection susceptibility. Neither TMPRSS2 rs2070788 nor rs12329760 polymorphisms were related to SARS-CoV-2 infection risk or severity of COVID-19 in our German cohort. Before these findings can be generalized, the role of variants in the gene TMPRSS2 in SARS-CoV-2 infections still needs to be elucidated in larger cohorts encompassing various ethnicities.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of the Medical Faculty of the University of Duisburg-Essen (20-9230-BO). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
KrS: resources, methodology, data curation, formal analysis, validation, and writing - review and editing. KB: data curation and investigation. CE: resources and investigation. UD, DF, FH, JR, KaS, SS, and CT: investigation. K-HJ: investigation and validation. WS and AK: conceptualization, validation, supervision, and writing - review and editing. BM: conceptualization, resources, methodology, formal analysis, investigation, supervision, data curation, funding acquisition, project administration, visualization, writing - original draft, and writing - review and editing. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by a grant (to BM) from the Stiftung Universitätsmedizin Essen of the Medical Faculty Essen. The funder of the study had no role in study design, data collection, data analysis, data interpretation, writing of the report, or in decision of submitting the paper for publication. We acknowledge support by the Open Access Publication Fund of the University of Duisburg-Essen.