Infectivity and Progression of COVID-19 Based on Selected Host Candidate Gene Variants

Introduction: Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) has spread around the globe. Susceptibility has been associated with age, biological sex, and pre-existing health conditions. However, host genes are also involved in viral infectivity and pathogenicity, and polymorphisms in these genes could be responsible for the interethnic and interindividual variability observed in the infection and progression of COVID-19. Materials and Methods: Clinical exome data of 103 individuals were analyzed to identify sequence variants in five selected candidate genes (ACE2, TMPRSS2, CD209, IFITM3, and MUC5B) and to assess their prevalence and possible role in COVID-19 infectivity and progression in our population. Results: A total of 497 polymorphisms were identified in the five selected genes in the exomes analyzed. Thirty-eight polymorphisms identified in our cohort have been reported earlier in the literature and have functional significance or an association with health conditions. These variants were classified into three groups: protective, susceptible, and responsible for comorbidities. Discussion and Conclusion: Two risk-inducing polymorphisms described in the literature, rs35705950 in the MUC5B gene and a TMPRSS2 haplotype (rs463727, rs34624090, rs55964536, rs734056, rs4290734, rs34783969, rs11702475, rs35899679, and rs35041537), were absent in our cohort, which may explain the slower infectivity of the disease in this part of India. The 38 functional variants identified could be used as a predisposition panel for COVID-19 infectivity and progression to stratify individuals as “high or low risk,” which would help in planning appropriate surveillance and management protocols. A larger study from different regions of India is warranted to validate these results.


INTRODUCTION
Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) is a novel virus responsible for an outbreak of respiratory illness, named COVID-19, that began in December 2019 and has since spread to several countries around the globe. Susceptibility was initially associated with age, biological sex, and pre-existing health conditions, as is the case for many infectious diseases (Yi et al., 2020). The World Health Organization has declared it a Public Health Emergency of International Concern. Although it is still early to predict susceptible populations, this question needs to be addressed urgently in order to triage and safeguard individuals at high risk of infection and/or of developing progressive disease with an adverse prognosis.
SARS-CoV-2 belongs to the family of RNA viruses known as coronaviruses. Sequencing of patient isolates has indicated that SARS-CoV-2 is similar to the beta (β) coronaviruses identified in bats. Two other coronaviruses have previously caused large-scale outbreaks: Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS). SARS-CoV-2, however, has been found to be more transmissible than these two viruses (Adhikari et al., 2020).
Susceptibility to COVID-19 has been observed to vary geographically, and individuals also develop disease of differing severity, indicating that host genomic variation might play an important role. These variations need to be determined in order to predict disease risk and outcome, and to help in planning new specific interventions and vaccine delivery.
A single-patient follow-up study from Australia indicated that robust multi-factorial immune responses can be elicited against SARS-CoV-2, similar to those seen in avian H7N9 disease, suggesting that early adaptive immune responses might correlate with better clinical outcomes (Thevarajan et al., 2020).
Recent studies have found that the SARS-CoV-2 and SARS-CoV genomes share around 80% homology and that both viruses use the same cell entry receptor, angiotensin-converting enzyme 2 (ACE2) (Gralinski and Menachery, 2020). Although there is no direct evidence that ACE2 variants bind the SARS-CoV-2 S-protein differentially, Cao et al. (2020) reported 11 common variants and one rare variant (rs143695310) associated with high ACE2 expression in tissues. There is only one report from the Indian population on these polymorphisms and their in-silico functional significance (Sharma et al., 2020).
An analysis of data from more than 200,000 exomes suggests that homozygous, heterozygous, or hemizygous loss-of-function mutations of ACE2 are extremely deleterious (Cirulli et al., 2020). Although rare, some of these variants affect the regions that interact with SARS-CoV-2, and further research may identify some of them as conferring resistance or heightened susceptibility to the virus.
Binding of the SARS-CoV S protein to ACE2 triggers subtle conformational rearrangements, which are believed to increase the sensitivity of the S protein to proteolytic digestion at the S1 and S2 subunits (Haga et al., 2008). Cleavage of the S protein by host cell proteases is essential for viral infectivity, and the responsible enzyme is the type II transmembrane serine protease TMPRSS2 (Matsuyama et al., 2010).
TMPRSS2 is expressed in ACE2-positive cells, including those of the human lung, and results obtained with surrogate cell culture systems suggest that TMPRSS2 might play a significant role in coronavirus spread in the human respiratory tract (Bertram et al., 2013). TMPRSS2 facilitates infection via two independent mechanisms: cleavage of ACE2, which might promote viral uptake, and cleavage of SARS-S, which activates the S protein for membrane fusion (Heurich et al., 2014).
Viruses use multiple alternative receptors to enter host cells, and a study by Cai (2020) reported that, apart from ACE2, dendritic cell-specific intercellular adhesion molecule-3-grabbing non-integrin (DC-SIGN), encoded by CD209, may be important for microbial infectivity. DC-SIGN acts as an adhesion molecule and also initiates innate immunity; although the exact mechanism is not clear, it is known to be involved in microbial clearance through the capture, destruction, and presentation of antigens. High expression of DC-SIGN in older individuals and higher gene expression of L-SIGN in Caucasians than in Asians have been reported. Significantly higher DC-SIGN gene expression was also shown in the lungs of smokers, especially former smokers, indicating that this may affect SARS-CoV-2 infection (Cai, 2020).
RNA virus entry into cells by endocytosis is commonly believed to be regulated by interferon (IFN)-induced transmembrane (IFITM) proteins, which detect and eliminate viral invaders before an infection is established (Weidner et al., 2010). Mutations in the genes coding for these proteins can therefore lead to variability in establishing infection in the host. IFITM3 is a cellular restriction factor that inhibits infection by influenza virus and many other pathogenic viruses, and sequence variants of this gene exhibit ethnic differences (Yount et al., 2010; Zhang et al., 2013). This highlights the significant role of IFITM3 genetic variants in the epidemiology of influenza. The allele frequency of the IFITM3 rs12252 polymorphism differs among ethnicities, ranging from 0% among Japanese to 44% among Chinese populations. Zhang et al. (2013) found a significant association between rs12252 and susceptibility to severe, but not mild, influenza among Asians, and evidence of an association between rs12252 C allele homozygosity and susceptibility to mild influenza in patients of Caucasian ancestry.
Respiratory surfaces are continuously exposed to pathogens, and a protective mucus barrier traps and eliminates them via mucociliary clearance. MUC5B is an evolutionarily conserved gene that encodes the principal macromolecules in airway mucus (Roy et al., 2014). Genetic variants of MUC genes are linked to diverse lung diseases. MUC5B deficiency causes microorganisms to accumulate in the upper and lower airways, and MUC5B has also been implicated in the development of idiopathic pulmonary fibrosis (Kaur et al., 2017). MUC5B variants may also be linked to COVID-19 progression. A gain-of-function promoter polymorphism of MUC5B, rs35705950, has been implicated in increased severity of lung disease and is considered the largest genetic risk factor for pulmonary fibrosis (Roy et al., 2014; Helling et al., 2017).
The aim of the present study was to identify variants in these five candidate genes from the clinical exome data available with us for more than 100 individuals, and to attempt to classify them according to their relevance to COVID-19 aetiopathology, especially in the Indian population.

METHODS
We performed a literature survey of global trends in COVID-19 infection and severity in different populations to understand and predict the impact of the pandemic on the Indian population. Scientific reports, publications, and Genome-Wide Association Study (GWAS) results on severe acute respiratory conditions, both viral and non-viral, were reviewed. In view of the limited data on COVID-19 susceptibility, we also included studies on other outbreaks such as H1N1, H7N9, and MERS. From these, we identified five candidate genes whose polymorphisms may be responsible for differences in infectivity, disease progression, and disease outcome: ACE2, TMPRSS2, IFITM3, CD209 (DC-SIGN), and MUC5B.
The Department of Genetics & Molecular Medicine, Kamineni Hospitals, has a genetic counseling and diagnostic facility to which patients from different specialties are referred for genetic evaluation. All patients included in the study received pre- and post-test genetic counseling. Written informed consent to use genetic information for research/academic purposes was obtained before blood samples were collected.
Variant Call Format (VCF) files and annotated data files generated from clinical exome studies of individuals/families were stored in a data bank. The selected candidate gene variants were assessed in our internal cohort of 103 individuals, who had earlier provided consent, as a pilot study of COVID-19 susceptibility and disease severity in Indians.
The variant mining workflow is briefly described below:
1. Files in VCF format were uploaded to BaseSpace Variant Interpreter, an Illumina freeware platform, and variants were identified using its built-in filters.
2. Variant frequencies were tabulated and compared with global frequencies from the TOPMed, ExAC, and 1000 Genomes (1000G) databases.
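The mining steps above were carried out on an Illumina platform. As a hedged, tool-independent illustration of what the tabulation step involves, the Python sketch below parses a multi-sample VCF using only the standard library and counts, for each variant ID, the number of carriers, alternate alleles, and called samples. It assumes the standard VCF column layout with a GT key in the FORMAT column; the function names and the toy input are hypothetical and do not reproduce the platform's filters or the authors' pipeline (in practice, a dedicated parser such as pysam or cyvcf2 would normally be used).

```python
from collections import defaultdict

def parse_vcf_genotypes(lines):
    """Yield (variant_id, sample, GT string) for every sample at every record.

    Assumes standard VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT <samples...>.
    """
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # the header line lists the sample names
            continue
        variant_id = fields[2]
        gt_index = fields[8].split(":").index("GT")
        for sample, data in zip(samples, fields[9:]):
            yield variant_id, sample, data.split(":")[gt_index]

def count_alt_alleles(gt):
    """Count non-reference alleles in a GT such as '0/1', '1|1' or hemizygous '1'.

    Returns None for missing calls such as './.'.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")

def tabulate(lines):
    """Return {variant_id: (carriers, alt_alleles, called_samples)}."""
    carriers, alt, called = defaultdict(int), defaultdict(int), defaultdict(int)
    for variant_id, _sample, gt in parse_vcf_genotypes(lines):
        n_alt = count_alt_alleles(gt)
        if n_alt is None:
            continue  # missing genotype: excluded from all counts
        called[variant_id] += 1
        alt[variant_id] += n_alt
        if n_alt > 0:
            carriers[variant_id] += 1
    return {v: (carriers[v], alt[v], called[v]) for v in called}

# Hypothetical three-sample toy VCF (not study data).
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "X\t100\trsA\tG\tA\t.\tPASS\t.\tGT\t0/1\t0/0\t1/1",
    "X\t200\trsB\tC\tT\t.\tPASS\t.\tGT:DP\t./.:0\t0|1:30\t0/0:25",
]
print(tabulate(toy_vcf))
```

On the toy input, rsA yields 2 carriers, 3 alternate alleles, and 3 called samples, while rsB yields 1 carrier and 1 alternate allele among 2 called samples because the missing call in S1 is excluded. Carrier counts of this kind can then be set against database frequencies.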

RESULTS
Data from 103 exomes (53 females and 50 males) were analyzed, and 497 polymorphisms were identified in the five selected candidate genes (Table 1).
In the ACE2 gene, 17 polymorphisms were identified, of which rs2285666 was the most frequent (9/103), followed by rs113691336, rs971249, and rs4646174, each observed in 4/103 individuals (Table 2). One missense variant, rs4646116, was identified in one individual. This variant causes a lysine-to-arginine substitution at position 26 (Lys26Arg), which lies in the extracellular domain and has been reported to inhibit interaction with the SARS-CoV-2 spike protein.

DISCUSSION
A wide range of interindividual variation is being observed in the infectivity, symptoms, progression, and mortality of COVID-19 across populations. Host genetic factors have frequently been implicated in respiratory infectious diseases, and single nucleotide polymorphisms (SNPs), commonly known as gene polymorphisms, have been considered responsible for both ethnic and interindividual variation (Patarčić et al., 2015). A recent twin study indicated 50% heritability in the response to SARS-CoV-2 infection, justifying the evaluation of sequence variants in candidate genes (Williams et al., 2020). Until exome and genome data are available from individuals with SARS-CoV-2 infection, unaffected contacts, individuals with a range of disease symptoms, and those who succumbed to the disease, analysis of existing data from random populations is the only option to identify variants. Such variants may help to develop a polymorphism panel to identify individuals who are susceptible to infection and those at risk of developing severe COVID-19.
Because host genetic polymorphisms have been shown to be associated with vulnerability to human infection, five candidate genes (ACE2, TMPRSS2, CD209, IFITM3, and MUC5B) were selected in this study based on their relevance to the current pandemic. All variants reported in each gene were identified from the sequence data available to us as VCF files. The data belonged to 103 individuals of Indian origin (who had consented to the use of their data for research) and were selected randomly without any prior bias. The maximum number of variants (n = 390) was identified in the MUC5B gene and the minimum (n = 17) in the ACE2 gene.
ACE2 is a human homolog of ACE, which is well known for its role in the renin-angiotensin pathway. ACE2 was identified in 2000; it consists of 805 amino acids and is a type I transmembrane glycoprotein with a single extracellular catalytic domain (Kuba et al., 2010). It is a carboxypeptidase that catalyzes the generation of the vasodilator peptide angiotensin-(1-7) from angiotensin II, thereby counterbalancing the potent vasoconstrictor effects of angiotensin II (Schindler et al., 2007). It also has various other physiological functions, such as being a key regulator of dietary amino acid homeostasis, with a role in colitis (Hashimoto et al., 2012). In the context of COVID-19, however, ACE2 is the major viral receptor and is important for SARS-CoV-2 entry into the cell, making it extremely relevant for infectivity.
Several polymorphisms in ACE2 have been reported from a large commercial dataset (Cai, 2020). A recent study of the Italian population states that ACE2 variants underlie interindividual variability and susceptibility to COVID-19 (Hashimoto et al., 2012). In our analysis, 17 variants were identified, and the previously reported missense variant rs4646116, which causes a Lys26Arg change in exon 2, was seen with a frequency of 0.0048 in our preliminary data. The frequency of rs4646116 is reported as 0.0021 in TOPMed and 0.0046 in 1000 Genomes; it has been reported in most populations, with Europeans having the highest frequency (Asselta et al., 2020). Another, intronic polymorphism, rs4646171, was identified with a frequency of 0.0048, whereas it is reported at higher frequencies of 0.048 in TOPMed and 0.070 in 1000 Genomes. The most common polymorphism in our cohort was rs2285666, with a frequency of 0.0436, which was higher than in the two databases (Table 2). The A/A genotype of rs2285666 has a 50% lower ACE2 expression level than the G/G genotype and may be protective (Asselta et al., 2020).
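The cohort frequencies quoted above follow from simple allele counting. The sketch below is a hedged illustration, not the authors' calculation: it assumes that each carrier is heterozygous and that every individual contributes two alleles (2 × 103 = 206), which is what the reported values appear to use. One carrier then gives 1/206 ≈ 0.00485 and nine carriers give 9/206 ≈ 0.0437, consistent with the reported 0.0048 and 0.0436 when truncated to four decimal places. Because ACE2 lies on the X chromosome, a stricter denominator for ACE2 would count two X chromosomes per female and one per hemizygous male (2 × 53 + 50 = 156). The helper names are hypothetical.

```python
def allele_frequency(alt_alleles: int, n_chromosomes: int) -> float:
    """Alternate-allele frequency: alternate alleles / chromosomes examined."""
    return alt_alleles / n_chromosomes

def autosomal_chromosomes(n_individuals: int) -> int:
    """Diploid denominator: two alleles per individual."""
    return 2 * n_individuals

def x_linked_chromosomes(n_female: int, n_male: int) -> int:
    """X-linked denominator: two X chromosomes per female, one per hemizygous male."""
    return 2 * n_female + n_male

def carrier_percentage(carriers: int, n_individuals: int) -> float:
    """Percentage of individuals carrying at least one alternate allele."""
    return 100 * carriers / n_individuals

N, FEMALES, MALES = 103, 53, 50

# Diploid denominator (206 alleles), assuming every carrier is heterozygous.
one_carrier = allele_frequency(1, autosomal_chromosomes(N))
nine_carriers = allele_frequency(9, autosomal_chromosomes(N))
print(f"{one_carrier:.5f} {nine_carriers:.5f}")

# Stricter X-linked denominator for ACE2 (156 X chromosomes in this cohort).
print(f"{allele_frequency(1, x_linked_chromosomes(FEMALES, MALES)):.5f}")

# Ratio of the cohort frequency of rs4646171 to the TOPMed value quoted in the text.
print(round(one_carrier / 0.048, 2))

# Carrier percentage, e.g. 28 of 103 individuals.
print(round(carrier_percentage(28, N), 2))
```

The last line shows how carrier percentages used elsewhere in this paper relate to counts: 28 of 103 individuals corresponds to 27.18%. Allele frequency and carrier percentage answer different questions, so the two should not be compared directly with each other.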
Earlier studies reported that the A allele of the intronic ACE2 SNP rs4646127 is responsible for decreased tissue expression of ACE2 and is considerably more common in people of European descent (44.1%) than in people of East Asian descent (1%). Another polymorphism, rs4646174, has been associated with central pulse pressure, brain natriuretic peptide levels, and NYHA classification in patients with chronic heart failure (Malard et al., 2013). In several studies, ACE2 polymorphisms that alter ACE2 expression have been associated with systolic blood pressure, diabetes, cardiovascular disorders, stroke, and other conditions. These associations could partly explain why individuals with comorbidities develop more severe COVID-19 than others.
TMPRSS2 is considered to play a role, along with ACE2, in SARS-CoV-2 entry into human cells. Because the host cell protease TMPRSS2 is involved in fusion of the virus with the cell membrane, it is important to understand the role of its variants in COVID-19 infection and disease progression (Bertram et al., 2013; Heurich et al., 2014). TMPRSS2 is known to cleave the hemagglutinin of influenza A virus, and TMPRSS2 knockout (−/−) mice are resistant to infection, indicating the importance of this gene in the spread and pathogenesis of viral infection (Lambertz et al., 2020).
The role of the TMPRSS2 gene in prostate cancer is well known; however, there are very few reports on the association of TMPRSS2 polymorphisms with respiratory distress. A TMPRSS2 SNP (rs12329760 C>T; Met160Val) located in an exonic splicing enhancer SRp40 site, which is highly conserved across mammals, has been found to be positively associated with TMPRSS2-ERG fusion by translocation in prostate cancer, owing to an increased chance of exon skipping (Bhanushali et al., 2018). This rs12329760 polymorphism was seen in 4.85% of individuals in our study.
TMPRSS2 eQTLs in the lung appear to cluster at the 3′ end of the gene and are potentially associated with expression of alternative transcripts in the lung. Among these common eQTLs, HaploReg annotations indicated rs4818239 as an important SNP (Sharma et al., 2020); it was identified in about 2% of the individuals in our population. A recent Indian study assessed the in-silico functional effects of various variants, some of which were identified in our cohort, including rs4818239 (2/103), rs734056 (5/103), and rs62217531 (1/103) (Sharma et al., 2020). Genetic variants associated with higher TMPRSS2 expression have been shown to confer a greater risk of severe influenza A(H1N1). Notably, rs2070788 and rs383510 were associated with high TMPRSS2 expression and were significantly associated with susceptibility to human influenza A(H7N9), and rs2298662, which is in high linkage disequilibrium (LD) with rs2070788, was associated with respiratory disorders (Cheng et al., 2015). All three variants, rs2070788, rs2298662 (1/103), and rs383510 (1/103), were found at low frequency in our study and may be relevant for COVID-19 infectivity. A haplotype predicted to be associated with higher TMPRSS2 expression is characterized by three SNPs (rs2070788, rs9974589, rs7364083), whose minor allele frequency (MAF) is significantly higher in Europeans and was 9% higher in Italians than in East Asians (Asselta et al., 2020). The polymorphism rs9974995 is nominally associated with phenotypes related to respiratory function or respiratory medication (salmeterol or fluticasone propionate).
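Linkage disequilibrium between two SNPs, such as rs2298662 and rs2070788 above, is usually summarized by Lewontin's D′ or by r², both computed from two-locus haplotype frequencies. The minimal sketch below implements the standard formulas D = p_AB − p_A·p_B, r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)), and D′ = D / D_max. The function name and the haplotype frequencies used are hypothetical and are not taken from Cheng et al. (2015) or from our cohort.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r2, D_prime) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B (strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # deviation from independence
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, r2, d_prime

# Hypothetical examples (not study data):
print(ld_stats(0.30, 0.30, 0.30))  # alleles always co-occur: r2 and D' of about 1
print(ld_stats(0.06, 0.20, 0.30))  # independent alleles: D, r2 and D' close to 0
```

A high r² means one SNP is a good proxy for the other, which is why a variant in strong LD with rs2070788 can carry a similar association signal.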
Higher expression of ACE2 and TMPRSS2 in males, African Americans, and patients with diabetes mellitus provides a rationale for monitoring these subgroups for high infectivity and poor COVID-19 outcomes. The lower expression of ACE2 and TMPRSS2 with inhaled corticosteroid use warrants prospective study of inhaled corticosteroid use as a predictor of decreased susceptibility to SARS-CoV-2 infection and decreased COVID-19 morbidity (Peters et al., 2020). The two polymorphisms rs112132031 and rs75603675, identified in 10.67 and 4.85% of our cohort, respectively, may be relevant for COVID-19 cytokine release syndrome and conjunctival infection according to earlier reports (Peters et al., 2020).
The CD209 gene encodes DC-SIGN (dendritic cell-specific intercellular adhesion molecule-3-grabbing non-integrin), a key effector of innate immunity and antiviral defense; it is a receptor expressed on dendritic cells (DCs) that recognizes oligosaccharides present on several pathogens (Granelli-Piperno et al., 2005). Polymorphisms in this gene may therefore explain susceptibility or resistance to infection, as well as the severity of symptoms, in several infectious diseases. In our cohort, 50 variants were identified in this gene. The GG genotype of rs2287886, located in the CD209 promoter region, was identified in 27.18% of individuals in our cohort. It has been reported to be associated with dengue fever requiring hospitalization, cytomegalovirus disease after allogeneic stem-cell transplantation, predisposition to tick-borne encephalitis, invasive pulmonary aspergillosis, Kawasaki disease, and colorectal cancer (Mezger et al., 2008; Barkhash et al., 2012; Sainz et al., 2012; Alagarasu et al., 2013; Lu et al., 2013; Portman et al., 2013; Czupryna et al., 2017). DCs from individuals with the AA genotype at rs2287886, which is frequent in Central Asian Mongoloid populations, have been shown to express higher levels of CD209 and to be infected by cytomegalovirus at a higher rate than DCs carrying the GG genotype; an analogous situation may exist for Epstein-Barr virus (EBV), whose conserved surface glycoproteins are similar to those of cytomegalovirus (Barkhash et al., 2012). CD209 is also known to mediate the transport of viruses by immature DCs from the periphery to lymph nodes, where CD4+ T cells are activated and infected, suggesting a link between CD209 genetic variation and CD4 count.
In our dataset, rs8105483 was observed in 3.88% of subjects and rs2287886 in 27.18%; together, these variants form a protective haplotype associated with a higher CD4 count (Geijtenbeek et al., 2003; Tailleux et al., 2003; Tassaneetrithep et al., 2003; Barreiro et al., 2006; Hennig et al., 2011). These polymorphisms may similarly be protective in COVID-19.
In addition, the CD209 polymorphisms rs4804800 and rs2287886 have been significantly associated with a high risk of Kawasaki disease (Kuo et al., 2014). Ovsyannikova et al. (2011) showed associations between the promoter SNP rs11881682 and the intronic SNPs rs8105572 and rs7252229 of CD209 and measles-specific IFN-γ ELISpot responses in Caucasian subjects. The CD209 SNPs rs4804800, rs11465384, rs7248637, and rs7252229 have been associated with an increased risk of developing invasive pulmonary aspergillosis (Sainz et al., 2012). The CD209 polymorphism rs7248637, which was associated with dengue in the Colombian population (Avendaño-Tamayo et al., 2019), was observed in 11.65% of subjects in our dataset, and rs11465413, associated with atopic sensitization (Penders et al., 2010), was observed in 9.7%.
The IFITM3 gene encodes an interferon-induced transmembrane protein that inhibits viral entry into the host cell cytoplasm by preventing viral fusion with cholesterol-depleted endosomes (Zhao et al., 2019). It can also inactivate new enveloped viruses budding from the infected cell. It has been shown to be active against multiple viruses, including influenza A virus, SARS coronavirus (SARS-CoV), Ebola virus (EBOV), dengue virus (DENV), and human immunodeficiency virus type 1 (HIV-1) (Brass et al., 2009; Lu et al., 2011). IFITM3 functions through the innate immune system and interferon-gamma signaling pathways. Studies have shown that the first 21 N-terminal amino acids of the IFITM3 protein are required for attenuating vesicular stomatitis virus replication, and that a truncated IFITM3 protein fails to restrict the replication of various influenza virus strains as well as HIV-1 (Jia et al., 2012; Bailey et al., 2014). Williams et al. (2014) and Kim et al. (2019) showed that full-length IFITM3 restricts the entry and replication of H1N1.
Polymorphisms of this gene have been studied in several infections. The polymorphism rs6598045 (c.−188T>C; 4.85% in our cohort) alters the binding capacity of a transcription factor and hence the transcription efficiency of IFITM3, and was reported to be strongly associated with infection by the 2009 pandemic influenza H1N1 virus (Shen et al., 2013). For another functional polymorphism, rs3888188, peripheral blood mononuclear cells carrying the GG genotype had lower IFITM3 mRNA levels than those with the TT or GT genotype, which predisposes to pulmonary tuberculosis in Iranian and Han Chinese populations (Shen et al., 2013).
Previous studies predicted that the rs12252 C allele might produce an alternatively spliced transcript encoding an aberrant IFITM3 protein truncated by 21 amino acids at the N-terminus, which reduces cellular resistance to influenza viruses because it is less able to block the early stage of viral replication (Everitt et al., 2012; Compton et al., 2016). This polymorphism has been associated in the Chinese population with pandemic influenza (H1N1 09pdm), seasonal influenza (H3N2 and influenza B), and avian influenza (H7N9) (Carter et al., 2018). In our cohort, 11.65% of individuals carried this allele. Hospital admissions for seasonal influenza have been associated with rs7948108, which was observed at a low frequency (0.97%) in our cohort.
rs34481144 is considered to directly affect IFITM3 promoter function. The risk allele A is linked to diminished promoter activity through increased binding of the methylation-sensitive transcriptional repressor CTCF. The risk allele ablates a CpG methylation site that, when methylated in the IFITM3 promoter of memory cytotoxic T lymphocytes (CTLs), reduces CTCF binding and increases IFITM3 expression, leading to increased memory CTL survival and more efficient viral clearance from infected airways (Eisfeld and Kawaoka, 2017). This variant was observed in 22% of individuals in our cohort and may be relevant for SARS-CoV-2 clearance and reduced disease progression.
The MUC5B gene encodes a member of the mucin family of proteins, which are highly glycosylated macromolecular components of mucus secretions. MUC5B is the major gel-forming mucin in mucus (Ridley and Thornton, 2018) and a major contributor to the lubricating and viscoelastic properties of whole saliva, normal lung mucus, and cervical mucus. This gene has been found to be upregulated in some human diseases, and pathogenic variants have been reported to cause pulmonary fibrosis, a lung disease characterized by shortness of breath and varying degrees of inflammation and fibrosis, which can progress rapidly, with acute lung injury followed by scarring and end-stage lung disease (Zhang et al., 2019). Many of these features are similar to those reported in COVID-19.
Airway lining mucus serves as the first line of defense during upper respiratory infection. Pathogens trapped in the mucus layer are removed first by the mucociliary clearance mechanism of the underlying airway epithelium and by macrophages, and then by neutrophils recruited into the airways in response to inflammatory mediators released by epithelial cells and macrophages (Kim, 2012). Adult MUC5B-deficient mice displayed bronchial hyperplasia and metaplasia, interstitial thickening, alveolar collapse, immune cell infiltrates, fragmented and disorganized elastin fibers, and collagen deposits; in approximately one-fifth of the mice, these changes were associated with altered pulmonary function leading to respiratory failure, demonstrating that mouse MUC5B is essential for maintaining normal lung function (Valque et al., 2019).
The MUC5B gene had the largest number of variants (n = 390) in our cohort. The SNP rs2672794 is significantly associated with increased susceptibility to coal workers' pneumoconiosis in a Chinese population (Ji et al., 2014). The rs56235854 polymorphism, which is associated with severe asthma (Johnson, 2020), was identified in 1.94% of individuals in our dataset. The MUC5B variants rs2735733, rs2249073, and rs2857476 have been associated with dental caries; these three variants were present in 30.09, 25.24, and 27.18%, respectively, of individuals in our cohort, suggesting that Indians may be highly susceptible to dental caries.
Polymorphisms rs2735727 (12/103) and rs12417955 (4/103), which lead to alternative splicing of MUC5B, and rs56367042 (3/103) are speculated to be involved in the pathogenesis of idiopathic pulmonary fibrosis (Nance et al., 2014). The promoter variant rs7115457 is associated with diffuse panbronchiolitis (Kamio et al., 2005), while another MUC5B promoter polymorphism, rs35705950, is the strongest risk factor, genetic or otherwise, for idiopathic pulmonary fibrosis (IPF), accounting for 30-35% of the risk of developing this disease, which was previously considered to be of entirely unknown cause. This MUC5B variant can potentially be used to identify individuals with preclinical pulmonary fibrosis and is predictive of radiologic disease progression. Excessive production of MUC5B either enhances injury by reducing mucociliary clearance or impedes repair by disrupting normal regenerative mechanisms in the distal lung (Evans et al., 2016). The rs35705950 variant (1000G frequency of the T allele = 0.0467) was not identified in our cohort, and its absence may help protect our population from full-blown COVID-19 symptoms.
A total of 497 polymorphisms were identified in the five genes in the 103 exomes analyzed; 38 of these have been reported earlier in the literature and have functional significance. Two risk variants, rs35705950 of MUC5B, which increases susceptibility to pulmonary fibrosis, and the common "European" haplotype of the TMPRSS2 gene (composed of SNPs rs463727, rs34624090, rs55964536, rs734056, rs4290734, rs34783969, rs11702475, rs35899679, and rs35041537), were totally absent from our cohort of Asian (Indian) individuals. This may be one plausible factor behind the reduced severity of COVID-19 in Asians compared with Europeans. Based on function and expression, we categorized the 38 polymorphisms into three groups: polymorphisms that increase susceptibility to infection (respiratory illnesses), polymorphisms that confer enhanced immunity, and polymorphisms involved in other pathologies that can induce comorbidities and place an individual in a high-risk category. Six protective polymorphisms (three in ACE2, two in CD209, and one in IFITM3; Table 3) were identified in our cohort, and at least one of them was present in 38.83% of the individuals analyzed. Twenty-seven risk susceptibility polymorphisms were identified in four genes (Table 3), and the cumulative proportion of individuals with at least one such polymorphism was 47.57%. Five polymorphisms, three in MUC5B and one each in ACE2 and TMPRSS2, are associated with comorbidities, and the cumulative proportion of individuals in our cohort with at least one of them was 40.77% (Table 3).
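The cumulative figures above count individuals who carry at least one variant from a group, and the paper proposes using such a panel to label individuals as high or low risk. The Python sketch below illustrates both steps under explicit assumptions: the group memberships are a small hypothetical subset drawn from variants discussed in this paper (the actual lists are in Table 3), the four-person cohort is invented, and the risk_label rule (high risk if any susceptibility or comorbidity variant is carried and no protective variant is) is a hypothetical example rather than a rule proposed by the authors.

```python
from typing import Dict, Set

# Hypothetical, illustrative subsets only; Table 3 holds the actual group lists.
GROUPS: Dict[str, Set[str]] = {
    "protective": {"rs2285666", "rs8105483", "rs2287886"},
    "susceptibility": {"rs12252", "rs2070788"},
    "comorbidity": {"rs4646174", "rs12329760"},
}

def fraction_with_any(cohort: Dict[str, Set[str]], variants: Set[str]) -> float:
    """Percentage of individuals carrying at least one variant from `variants`."""
    hits = sum(1 for carried in cohort.values() if carried & variants)
    return 100 * hits / len(cohort)

def risk_label(carried: Set[str]) -> str:
    """Hypothetical rule: 'high' if any susceptibility or comorbidity variant is
    carried and no protective variant is carried; otherwise 'low'."""
    risk = carried & (GROUPS["susceptibility"] | GROUPS["comorbidity"])
    protective = carried & GROUPS["protective"]
    return "high" if risk and not protective else "low"

# Invented four-person cohort: individual ID -> set of variants carried.
cohort: Dict[str, Set[str]] = {
    "P1": {"rs2285666"},
    "P2": {"rs12252", "rs4646174"},
    "P3": set(),
    "P4": {"rs2070788", "rs2287886"},
}

for group, variants in GROUPS.items():
    print(group, fraction_with_any(cohort, variants))
print({pid: risk_label(carried) for pid, carried in cohort.items()})
```

In this toy cohort, 50% of individuals carry at least one protective variant, 50% at least one susceptibility variant, and 25% at least one comorbidity variant, and only P2 is labeled high risk. Because the groups overlap across individuals, the three percentages need not sum to 100%, as is also the case for 38.83%, 47.57%, and 40.77% above.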
A larger study is required to validate our results in the Indian population, and the sequence data for this may already be available with the Council of Scientific and Industrial Research-Institute of Genomics and Integrative Biology (CSIR-IGIB) consortium and with commercial companies that have been testing Indian patients since 2014. These data would also include individuals from different regions of India, unlike the present study, in which the majority of individuals were from South India. The preliminary sequence data analysis of five selected candidate genes presented in this paper highlights the importance of identifying polymorphisms in asymptomatic and symptomatic COVID-19 patients to obtain more meaningful results, which will help in managing this pandemic.

CONCLUSION
This is the first study from an Indian population presenting polymorphisms in five selected candidate genes that may be important for understanding the infectivity and progression of COVID-19 in our population. A larger dataset needs to be analyzed to validate these results and to develop a panel of polymorphisms useful for identifying individuals at risk, as well as those likely to have severe disease symptoms.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
GI co-conceptualized the study, performed data analysis, and wrote the first draft of the manuscript. SS compiled and analyzed the data. SZ co-conceptualized the study and assisted in scientific editing of the manuscript. DS, VM, SP, AS, NA, and NB co-compiled the data. AN co-conceptualized the study. QH conceptualized, supervised the study, and performed scientific editing of the manuscript. All authors approved the final draft of the manuscript and agree to be accountable for the content of the work.