Association Between the LZTFL1 rs11385942 Polymorphism and COVID-19 Severity in Colombian Population

Genetic and non-genetic factors are responsible for the high interindividual variability in the response to SARS-CoV-2. Although numerous genetic polymorphisms have been identified as risk factors for severe COVID-19, these remain understudied in Latin-American populations. This study evaluated the association of non-genetic factors and three polymorphisms: ACE rs4646994, ACE2 rs2285666, and LZTFL1 rs11385942, with COVID severity and long-term symptoms by using a case-control design. The control group was composed of asymptomatic/mild cases (n = 61) recruited from a private laboratory, while the case group was composed of severe/critical patients (n = 63) hospitalized in the Hospital Universitario Mayor-Méderi, both institutions located in Bogotá, Colombia. Clinical follow up and exhaustive revision of medical records allowed us to assess non-genetic factors. Genotypification of the polymorphism of interest was performed by amplicon size analysis and Sanger sequencing. In agreement with previous reports, we found a statistically significant association between age, male sex, and comorbidities, such as hypertension and type 2 diabetes mellitus (T2DM), and worst outcomes. We identified the polymorphism LZTFL1 rs11385942 as an important risk factor for hospitalization (p < 0.01; OR = 5.73; 95% CI = 1.2–26.5, under the allelic test). Furthermore, long-term symptoms were common among the studied population and associated with disease severity. No association between the polymorphisms examined and long-term symptoms was found. Comparison of allelic frequencies with other populations revealed significant differences for the three polymorphisms investigated. Finally, we used the statistically significant genetic and non-genetic variables to develop a predictive logistic regression model, which was implemented in a Shiny web application. Model discrimination was assessed using the area under the receiver operating characteristic curve (AUC = 0.86; 95% confidence interval 0.79–0.93). These results suggest that LZTFL1 rs11385942 may be a potential biomarker for COVID-19 severity in addition to conventional non-genetic risk factors. A better understanding of the impact of these genetic risk factors may be useful to prioritize high-risk individuals and decrease the morbimortality caused by SARS-CoV2 and future pandemics.

Genetic and non-genetic factors are responsible for the high interindividual variability in the response to SARS-CoV-2. Although numerous genetic polymorphisms have been identified as risk factors for severe COVID-19, these remain understudied in Latin-American populations. This study evaluated the association of non-genetic factors and three polymorphisms: ACE rs4646994, ACE2 rs2285666, and LZTFL1 rs11385942, with COVID severity and long-term symptoms by using a case-control design. The control group was composed of asymptomatic/mild cases (n = 61) recruited from a private laboratory, while the case group was composed of severe/critical patients (n = 63) hospitalized in the Hospital Universitario Mayor-Méderi, both institutions located in Bogotá, Colombia. Clinical follow up and exhaustive revision of medical records allowed us to assess non-genetic factors. Genotypification of the polymorphism of interest was performed by amplicon size analysis and Sanger sequencing. In agreement with previous reports, we found a statistically significant association between age, male sex, and comorbidities, such as hypertension and type 2 diabetes mellitus (T2DM), and worst outcomes. We identified the polymorphism LZTFL1 rs11385942 as an important risk factor for hospitalization (p < 0.01; OR = 5.73; 95% CI = 1. 2-26.5, under the allelic test). Furthermore, long-term symptoms were common among the studied population and associated with disease severity. No association between the polymorphisms examined and long-term symptoms was found. Comparison of allelic frequencies with other populations revealed significant differences for the three polymorphisms investigated. Finally, we used the statistically significant genetic and non-genetic variables to develop a predictive logistic regression model, which was implemented in a Shiny web application. Model discrimination was assessed using the area under the receiver operating characteristic curve (AUC = 0.86; 95% confidence interval 0.79-0.93). These results suggest that LZTFL1 rs11385942 may be a potential biomarker for COVID-19 severity in addition to conventional non-genetic risk factors.

INTRODUCTION
SARS-CoV-2 (Severe acute respiratory syndrome coronavirus 2) is a novel coronavirus, first identified in China in late December 2019 (1). The disease caused by this virus, named COVID-19, rapidly spread across the globe being declared a pandemic by the WHO in March 2021 (2). Up to the first week of March 2022, more than 450 million confirmed cases and 6 million deaths were reported worldwide, from which ∼6 million confirmed cases and 139.000 deaths occurred in Colombia (3,4). The clinical course and severity of COVID-19 disease are highly variable among individuals, ranging from asymptomatic cases to severe respiratory failure and death (5).
Different clinical risk factors, including aging, male sex and comorbidities such as cardiovascular disease, hypertension, diabetes mellitus, chronic obstructive pulmonary lung disease, immunosuppression and obesity have been linked to more severe courses of COVID-19 (6,7). Importantly, numerous studies have shown that host genetic factors also play a critical role in SARS-CoV-2 disease progression and severity (8)(9)(10). Early works suggested a potential role of genes related to the renin-angiotensin-aldosterone system (RAAS) (ACE1 and ACE2), the ABO blood group system and the human leukocyte antigen (HLA) (11)(12)(13). The RAAS pathway is a physiological system that plays an important role in the homeostatic control of blood pressure and body water-electrolyte balance (14). Angiotensin I converting enzyme and angiotensin converting enzyme 2, coded by the genes ACE and ACE2, respectively, are critical regulators of this pathway and may also contribute to multiple organ injuries in COVID- 19. In lung vascular endothelium, ACE catalyzes Angiotensin I conversion into Angiotensin II, an active peptide that promotes vasoconstriction, inflammation and thrombosis (15). Conversely, ACE2 converts Angiotensin II into angiotensin-(1-7), molecules that counteract the effects of Angiotensin II, including vasodilatation and vascular protection (16). Polymorphisms that increase ACE expression have been associated with more severe COVID-19 infections. The ACE insertion(Ins)/deletion(Del) polymorphism (rs4646994) is of particular interest as the resulting decrease in ACE activity has been linked to a protective effect in Ins allele carriers (17). Moreover, ACE2 has a dual role as the SARS-CoV-2 receptor, allowing virus internalization, and as RAAS regulator, catalyzing angiotensin II degradation (16,18). Whole exome studies (WES) have identified more than 30 variants in the ACE2 gene, potentially interfering with protein structure, stabilization and expression, and contributing to the high interindividual variability and susceptibility to COVID-19 (19). Among these variants, NM_001371415.1:c.439+4G>A (rs2285666) polymorphism is related to an increase of 50% of ACE2 expression, compared to wild-type G/G genotype carriers, and decreases the risk of severe SARS-CoV2 infection (20). In addition, two large genome-wide association studies, oriented to find genetic susceptibly locus, identified an association signal at chromosome 3p21.31 (rs11385942 and rs10490770) as the one with the most significant association with respiratory failure and mechanical ventilation requirement amongst severe COVID-19 patients (21,22). This locus contains several genes related to cell signaling and solute transportation, including CCR9, CXCR6, LZTFL1, and SLC6A20. LZTFL1 gene, the most promising candidate, codifies for a protein involved in the primary cilia function and the immunological synapse between T-cells and antigen-presenting cells (23).
Despite their relevance, genetic host factors related to COVID-19 severity remain understudied in Latin-American populations, limiting their potential use as predictive biomarkers and the development of predictive models. Furthermore, the study of these factors is particularly relevant considering that Latin-American countries have been severely affected by the COVID-19 pandemic. In this study, we performed an ambispective case-control analysis to evaluate the association between non-genetic factors and genetic factors, including the polymorphisms rs4646994 (ACE), rs2285666 (ACE2), and rs11385942 (LZTFL1), and COVID-19 severity and long-term symptoms in Colombian population. The results of this study support a positive association between the LZTFL1 rs11385942 locus variant and an increased risk of severe SARS-CoV-2 infection. Furthermore, we developed a predictive model integrating non-genetic and genetic factors, potentially useful to identify high-risk individuals and prioritize prevention and mitigation efforts.

Study Population and Sampling
This study enrolled 145 patients between 18 and 60 years with confirmed diagnosis of COVID-19 by positive RT-PCR (reverse transcriptase polymerase chain reaction), antigens or antibodies (IgG and/or IgM for SARS-CoV-2) tests. The control group consisted of 71 patients who were classified as asymptomatic or mild COVID-19, group non-hospitalized. The case group was composed of 74 patients with severe or critical disease, group hospitalized. Subcategorization of the case group was made with patients critically ill who required intensive care unit (ICU), group hospitalized-ICU. Clinical severity was determined according to national guidelines for COVID-19 by the Colombian Health Ministry (24). Cases were recruited among hospitalized patients at the Hospital Universitario Mayor-Méderi (Bogotá, Colombia). Controls were enrolled from a private laboratory (Genética Molecular de Colombia, Bogotá, Colombia). Cases and controls were invited to participate in this study and those who accepted signed an informed consent and underwent buccal swap or peripheral blood sampling. Patients were enrolled between December 2020-July 2021 and all subjects were unvaccinated at the time of recruitment.
The sample size was calculated with a p (sample proportion) of 7% according to the minimum allele frequency (MAF) for the allele with the reported lowest frequency, in our case the polymorphism rs11385942, a confidence level of 95% (α = 0.05, z = 1.96), a margin of error (e) of 5%, and a population size N = 8,000,000 for Bogotá city. Using the formula n = Nz 2 * p(1-p)/α 2 (N-1)+z 2 * p(1-p), implemented in the OpenEpi web-tool, we estimated that the minimum sample size was 101 (25). This value was approximated to 145 individuals considering possible clinical follow up lost. Given this is the first study to assess allele frequency for the polymorphisms of interest in Colombian population, MAF were obtained from the GnomAD database for Latino-American individuals (26). This study followed the guidelines of the Declaration of Helsinki and all experimental procedures were approved by the Ethics Committee of Universidad del Rosario (DVO005 1543-CV1334).

Clinical Data Collection and Follow Up
Data collection and clinical follow up were conducted through phone calls at least 21 days after the diagnosis. Data was obtained through a standardized format that included the following clinical and demographical information: sex, age, blood type, medical history, comorbidities, drugs use, symptoms, long-term symptoms, and any change in disease severity. Furthermore, we performed an exhaustive revision of clinical records of hospitalized patients to validate the information collected previously and verify the clinical classification and severity criteria according to the clinical guidelines mentioned before. One hundred and twenty four patients, 61 cases and 63 controls, completed the clinical follow up and continued in the study.

DNA Extraction and Genotyping
Total genomic DNA was obtained from buccal swab or blood samples using either the Quick-DNA TM Miniprep Plus Kit (Zymo Research) or the Buccal Swab DNA Kit (Promega). The buccal swab samples were collected in a cotton swab and the blood samples were collected in EDTA tubes, 5 mL for patient. Genomic DNA was quantified using a nanodrop spectrophotometer. All samples were aliquoted and stored at 4 • C until analysis. Polymerase chain reaction (PCR) was used to amplify and genotype three polymorphisms of interest: ACE 289bp ALU Ins/Del (rs4646994), ACE2 c.439+4G>A (rs2285666), and LZTFL1 c.323+621dup (rs11385942). Primers were designed using PrimerBlast (27). Primers sequences and PCR conditions are listed in Supplementary Table 1. For ACE rs4646994 genotyping, PCR products were run on a 1% agarose gel stained by ethidium bromide and amplicon sizes were used to determine individual genotypes. Fragments obtained were 191 bp for the Del allele and 480 bp for the Ins allele. For ACE2 rs2285666 and LZTL1 rs11385942, PCR products were purified and sequenced through Sanger method. Sequences were analyzed with the software Geneious Prime v2021.2 (Biomatters) (28). Genotypes were assigned in batches of 20 samples by two independent researchers. In case the results were in disagreement, a third researcher reassessed the results and a final consensus was achieved. These researchers were blind to the casecontrol status of the individuals. Genotypification was attempted in 125 individuals, being successful in 124 (99.2%).

Statistical Analysis and Predictive Model
A bivariate analysis was performed between clinical and demographic variables with the severe COVID-19 outcome (non-hospitalized vs. hospitalized, including UCI and non-UCI patients) or the presence of long-term COVID-19 symptoms using the χ 2 , Mann-Whitney and OR statistics. All the analyses were conducted using this case-control definition unless otherwise stated. Significant thresholds were set as p < 0.05, and a 95% confidence interval for the OR. Long-term COVID-19 symptoms were defined as persistent symptoms beyond 3 weeks from initial symptoms onset (29). An extended analysis of long-term symptoms was performed grouping symptoms into the following categories: (1) frequent (fatigue, headache, attention deficit, alopecia, dyspnea), (2) organ system affected (neurological, psychiatric, osteomuscular, respiratory, and cardiovascular), and (3) others including the ones with low sample and literature prevalence (dysphagia, otorhinolaryngological, ophthalmological and cutaneous manifestations) according to Lopez-Leon et al. (30).
The bivariate association analysis between genetic polymorphisms and severity outcome or the presence of long-term symptoms was performed with the PLINK software (47). Different genetic models, including allelic, genotypic, dominant and recessive, were assessed with the Cochran-Armitage trend, genotypic (2df), dominant gene action (1df), and recessive gene (1df) tests. In addition, a subgroup analysis between control (non-hospitalized) and ICU-hospitalized patients (n = 26) was conducted under the allelic model. The clinical and genetic variables with a significant correlation were used to build a multivariate logistic regression model in order to develop a predictive risk model for severe disease. Different combinations of variables were tested to construct the models, and these were compared using the Akaike Information Criterion (AIC) and the Coefficient of Discrimination D (Tjur's R2) parameters. This last method, Tjur's R2, is used for binomial logistic models and a value approaching 1 indicates that there is a clear separation between the predicted values for the response outcomes (48). For the model construction we evaluated and handled the potentially cofounding and interacting variables. We assessed the variation inflation factor (VIF) to protect our model to be inflated by multicollinearity, all the variables included had a VIF value of 1. Model comparison was assessed by calculating the area under the receiver operating characteristic curve (AUC), direct comparison between the scores obtained from the models, integrated discrimination improvement (IDI) and cross-validation parameters, including concordance, sensitivity, specificity, and net benefit at different cutoff probabilities. Concordance was defined as the correctly estimated outcomes using several cutoff values for the predicted affection probability. The IDI score and cross-validation parameters were calculated with the R packages PredictABEL and rmda, respectively (49,50).
Finally, an open-source and online application was developed for users to easily access and test the model. The predictive model was constructed using R v4.1.2 and the online application was built using the Shiny package for R (51).

Clinical and Demographic Data
In total 145 patients, 71 controls and 74 cases were enrolled in the study. Nine patients from the control group and six patients from the cases group were excluded from the study by loss to follow up. One control was excluded due to familial relationship, one case was excluded by insufficient DNA and four cases were excluded due to direct request from the family. The final number of patients included was 61 controls and 63 cases. Two patients from the cases group died due to COVID-19 complications; nevertheless, clinical follow up was completed with help of relatives. A summary of the study participants is presented in Figure 1. For the control group, 29.5% (n = 18) diagnoses were made by RT-PCR, 63.9% (n = 39) by antigen test, and 6.6% (n = 4) by antibodies. The sampling methods for this group were 67.2% (n = 41) by buccal swabs and 32.8% (n = 20) from peripheral blood. For the case group, 98.4% (n = 62) diagnoses were made by RT-PCR and 1.6% (n = 1) by antigen test and 100% samples were taken from peripheral blood.

Population Genetic Analysis
Next, we compared the allelic frequencies obtained in this study with those of other datasets including populations of European, Asian, African, North American, and Latin-American ancestries (Supplementary Table 3). We found significant statistical differences for the three systems assessed. For ACE rs46469949, East Asia allelic frequencies were the only population with no statistical differences. For ACE2 the rs2285666 allelic frequency found in our study was similar to those reported in Mexican and American populations. Finally, for LZTFL1 rs11385942, the comparison was made against COVID-19 patients obtained from a previous study. We found significant differences with Italian controls but not with Italian cases or Spanish population ( Table 6).

Predictive Model and App Development
Genetic and non-genetic significant variables obtained from the previous analyses were entered into a logistic regression model.
Where the adjusted score is a number between 0 and 1, "age" the age in years, "male" male sex, "comorb" represents the number of comorbidities and "WT/Alt" the risk allele for the LZTFL1 rs11385942 polymorphism. Score distribution using this model for cases and controls is presented in Figures 2A,B. The model achieved good discrimination power (AUC = 0.857; 95% confidence interval 0.79-0.93) (Figure 2C) (Supplementary Table 4). Comparison between the clinical (Age + Sex + Comorbidities) and complete models (Age + Sex + Comorbidities + risk allele) showed a slight increase in the AUC, 0.846 vs. 0.857, respectively. Model comparison was assessed by three additional methods. First, direct comparison between the scores obtained from the clinical and complete model showed a high correlation, nevertheless, for several individuals, the risk scores changed noticeably when the risk allele is included in the model ( Figure 2D). Next, we compared the models using the IDI score (52). This method is defined as the difference in the discrimination slopes between two models, the discrimination slopes are calculated as the difference of predicted probabilities for events and non-events (53). We obtained a positive IDI score (0.026; confidence interval 95% 0.001-0.051, p-value: 0.039) supporting a significant improvement for the complete model. Third, we calculated cross-validation parameters including concordance, sensitivity, specificity and net benefit for different probability cutoffs (54). Net benefit is a decision analytic measure, which puts benefits and harms on the same scale to be compared (55). The results of this analysis showed that the concordance and net benefit were better for most of the probability cutoffs tested (Supplementary Table 5). This improvement was particularly noticeable at probabilities between 0.3 and 0.4. Finally, the complete model was used to design a web-based application using the R package Shiny. The application is open-access and is accessible through a shinyApp server (https:// oscarortega.shinyapps.io/COVID19_UR_Shiny/). The source code of the shiny app is publicly available on Github at https:// github.com/OscarOrt/COVID_19_risk.

DISCUSSION
During the last 2 years, the COVID-19 pandemic has caused vast disruptions in almost any sphere of human activity. Despite   the growing knowledge about the biology and clinical features of this disease, many aspects of its physiopathology and clinical progression remain to be understood. Of particular interest in this process are host risk factors that could contribute to severe courses of COVID-19 and presence of long-term symptoms. These factors include non-genetic and genetic variables. In this study, we aimed to characterize the impact of these variables on COVID-19 outcomes in a sample of Colombian population. We identified several risk factors including the polymorphism LZTFL1 rs11385942 and incorporated these variables into a predictive model. To the best of our knowledge, this is the first study to evaluate the association between genetic risk factors and COVID-19 severity in a Latin-American population using a casecontrol design and illustrates the importance of host genetics in SARS-CoV-2 clinical outcomes. Several non-genetic factors have been associated with poor COVID-19 prognosis, including age, male sex and comorbidities (56). In agreement with such reports, we found a significant association between age, male sex, hypertension and T2DM. Both hypertension and T2DM had been previously identified as independent risk factors for increased morbimortality in COVID-19 patients (57)(58)(59)(60)(61). The mechanism by which hypertension is a risk factor has been attributed to hyperactivation of the RAAS pathway, which increases the inflammatory response, cytokine storm, myocardial remodeling, acute lung injury, and endothelial damage (62). Similarly, it has been proposed that T2DM contributes to thromboembolic complications and organ damage through glucotoxicity, oxidative stress, and increased cytokine production (63). Interestingly, hyperglycemia in nondiabetic patients had a negative impact on patient outcomes (64), highlighting the importance of adequate metabolic control in the management of these patients. Other comorbidities analyzed did not show a statistically significant association individually, probably because the sample size was not large enough to detect such associations. Nonetheless, when grouped, the presence of two or more comorbidities conferred an increased risk of severe COVID-19, an effect possibly explained by the additive effect of risk factors to determine the clinical progression of the disease. The second point worth mentioning about clinical features in the studied population was the prevalence of acute symptoms. Among the most common symptoms reported in the literature are generalized weakness, dry cough, headache, dyspnea, and myalgias (65). In our sample, respiratory and systemic symptoms, including dyspnea, cough, fever and fatigue, were associated with severe disease, whereas flu-like symptoms, such as ageusia, anosmia and headache, were more frequent in patients with a mild form of the disease. Other studies, that included, populations have reported similar findings (66,67). Lower respiratory tract symptoms are often related to severe COVID19, as they are a manifestation of underlying lung compromise.
Another element included in our analysis was the incidence of long-term COVID-19 symptoms, a phenomenon also reported in other viral infections including Spanish Flu SARS CoV-1 and MERS (68). Our findings are consistent with global literature, in which the most common long-term symptoms were fatigue (50-72.8%), joints pain (31.4%), headache (28.9%), chest pain (20-28.9%), dyspnea (28.2%) and palpitations (9%) (68)(69)(70). Remarkably, growing evidence suggests that psychiatric illness is an important COVID-19 sequel, affecting particularly specific populations such as Hispanic and African patients (71,72). Psychiatric long-term symptoms were highly prevalent in hospitalized patients in our study (36.5%). Despite our study being limited by the absence of a standardized mental health scale for patient follow up, our data support these observations (73). The mechanistic basis for these symptoms is attributed to the ability of the virus to infect the central nervous system via the blood-brain barrier and the olfactory bulb, affecting thereafter neurons on the hypothalamus, cortex and brainstem, which could explain many of the neuropsychiatric manifestations (71,74). On the other hand, the absence of association between comorbidities and long-term symptoms has been also observed in the literature (75). Demographic variables such as sex are of much debate, as there is contradictory evidence of higher rates of long-term symptoms in female individuals (76). Finally, several acute symptoms associated with long-term compromise found in this study have been previously reported in the literature and include fatigue, dyspnea and osteomuscular pain and myalgias (77).
Regarding our genetic findings, our study identified the LZFTL1 rs11385942 as a significant genetic factor associated with disease severity, conferring risk for severe/critical clinical outcomes. This polymorphism is located in the 3p21.31 locus, a region previously described as an important risk factor for severe respiratory disease in several studies (21,78). There are six candidate genes in this locus potentially involved in the disease progression presumably by viral entry or clearance and immunological response, these are SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6, and XCR1 (78). The rs11385942 polymorphism is located at intron 5 of LZFTL1 and recent studies have assessed its functional significance in SARS-CoV-2 infection, suggesting a regulatory role. A CRISPRi analysis using lung epithelial cell lines showed that LZTFL1 expression is severely affected by this polymorphism (79). LZTFL1 (leucine zipper transcription factor like 1) protein is highly expressed in lung cells and regulates airway cilia and epithelial-mesenchymal transition, a developmental process critical for the innate immune and inflammatory response. Remarkably, the rs11385942 polymorphism has been associated with higher levels of C5a and soluble terminal complement complex C5b-9 (SC5b-9) plasma levels during SARS-CoV-2 infection, suggesting that enhanced immune system and complement activation might be important pathways in the deleterious effect of this variant (80). Moreover, it has been described that complement activation and membrane attack complex (MAC) formation leads to upregulation of proinflammatory proteins and inflammasomes causing severe lung injury and, in parallel, endothelial cells death, platelet activation and induction of the coagulation cascade leading to thrombus formation, well-known physiopathological findings in severe COVID-19 (81,82). The results of another recent study suggest that rs11385942 is in genetic linkage with the polymorphism rs17713054G>A, the gain-of-function risk A allele upregulates the expression of LZTFL1 by generating a CCAAT/enhancer binding protein beta motif (23). Despite other molecular mechanisms cannot be discarded, this evidence supports LZTFL1 as a candidate effector and provides further support to our findings. Additional studies have found supporting evidence for this association (79,83). In line with these observations, genotypification of the risk allele in this gene could be useful as a molecular predictive biomarker for COVID-19 severe/critical clinical outcomes.
Since the beginning of the pandemic, numerous studies have explored the role of host genetic variability in COVID-19 severity and susceptibility. These studies have included genome-wide association studies (GWAS), which have identified multiple reproducible associations (21,22,(84)(85)(86). Given the underrepresentation of Latin American population in these initiatives, our study allowed us to reproduce the association of the 3p21.32 locus in an ethnically different cohort and suggests that the variation in this region modulates the disease outcome (21). Importantly, detailed exploration of "expanded" phenotypes, other than clinical severity, including symptomatic/paucisymptomatic and Exposed_Positive/Exposed_Negative phenotypes have identified a much larger proportion of protective minor alleles (85). These results suggest that using additional phenotype definitions can identify protective associations. Our patients classified as asymptomatic-mild/severe-critical are more likely enriched for risk alleles conferred by loci such as those analyzed in our study.
It is important to highlight that case-control association studies are potentially influenced by population stratification due to undetected population substructure produced by differences in ancestry generating spurious associations (87). To avoid confounding due to population stratification, analysis using ancestry markers (AIMs) are useful to estimate variability between cases and controls (88). Although our study did not carry out this evaluation, we estimate that sampled population shares a similar gene pool without the influence of factors such as geographic isolation or non-random mating. Additionally, the individuals analyzed come from the Colombian Andean region, a geographical area where high inter-individual variation has not been identified (89), which supports the ethnic similarity of the cases and controls included. Here, LZFTL1 rs11385942 was identified as a significant genetic factor associated with severe COVID-19 (p = 0.01; OR = 5.73; 95% CI = 1. 24-26.46) supporting an important genetic effect. Previously, it has been suggested a need for approaches such as family-based designs or genomic control when the identified genetic effects are very small (OR < 1.20) (90). Finally, although stratification may be less of a concern than originally anticipated and the evidence against a large effect of population stratification, hidden or otherwise, it is important to consider it in false positive or negative association arising from differences in local ancestry (87,88,91).
Two polymorphisms analyzed in our study, ACE rs4646994 and ACE2 rs2285666, are important regulators of the RAAS pathway, a physiological system implicated in COVID-19 susceptibility and severity (92). Despite we did not find evidence of association between these polymorphisms and COVID-19 severity, numerous studies support a biological basis for such relationship (92)(93)(94). The ACE2 rs2285666 T allele is associated with a significant increase in ACE2 expression (95). Interestingly, association studies of this polymorphism with COVID-19 severity have had contradictory results and similar findings to ours have been reported by several authors (43,96,97). Among these, next-generation sequencing analysis in patients hospitalized for COVID-19 indicated no association between ACE2 variants and COVID-19 severity (97). Such discrepancies might be explained by population-specific differences, the additive role of other genes interacting with risk alleles or other mechanisms not assessed such as epigenetic modifiers (98)(99)(100). Concerning ACE rs4646994 the Del allele has been associated with increased ACE expression, higher enzyme activity and elevated production of angiotensin II (101). Despite ACE Del/Del genotype and Del allele have been associated with increased COVID-19 patient severity (101-103), our results failed to replicate these findings in the Colombian population. In agreement with our results, other studies have reported no association between ACE rs4646994 and COVID-19 severity (43,96). Collectively, current evidence contains conflicting results about the role of this polymorphism in SARS-CoV-2 infections. The reasons for these discrepancies are unclear and similar to the ACE2 rs2285666 polymorphism require further exploration. Interestingly, a recent metaanalysis evaluating several polymorphisms related to COVID-19 outcomes found a significant association between the polymorphism ACE rs4646994 and COVID-19 severity (104). Results of individual association studies must be considered carefully and discrepancies in the findings may be the result of underpowered sample sizes, therefore replicates and more robust studies should be considered to validate these associations. On the other hand, we identified ACE rs4646994 Del allele as a protective factor for neurological long-term symptoms, we hypothesize this could be related to an increased catalytic activity resulting in vasoconstriction that counterbalances the intracerebral vasodilation and brain edema due to the anaerobic metabolism in cerebral cells in response to SARS-CoV-2 induced hypoxia (105,106). Whereas, interesting, this hypothesis requires experimental and clinical validation.
Comparison of allelic frequencies obtained in our study with other populations revealed important differences. For ACE rs4646994, Asia was the only region with a similar allele frequency to our studied population (40). This may reflect the ancestral origin of Native American population in Colombia or the admixture between an ancestral population with a higher frequency and Europeans, where allele frequencies are considerably lower (107,108). For the ACE2 polymorphism, the allelic frequency was similar to Mexican population, probably due to a common ancestry and admixture history (44). For the variant LZTFL1 rs11385942, no differences were found with Spanish, European and African populations. Remarkably, Zeberg and Pääbo (109) described that the 3p21.31 region, the locus where the variant is located, was inherited from Neanderthals. The mixture of native Americans and Europeans probably modified the ancestral genetic pool leading to the current allele frequencies. Additionally, it has been proposed that differences in allelic frequencies for the 3p21.31 risk haplotype are produced by natural selection in response to pathogens (109).
Another important determinant of COVID-19 severity is viral genetics (10). It has been identified that specific SARS-CoV-2 variants are associated with differences in severity and mortality, for example, the alpha and gamma variants are related to increased hospitalization, ICU admission and mortality risk (110)(111)(112). While our study did not assess variant differences in cases and controls, genomic surveillance studies conducted during the sample collection period (December 2020-July 2021) in Bogotá, showed that the predominant variants were B.1.621 (Mu) 57.3% (469/819), P.1 (Gamma) 14% (114/819) and B.1.1.7 (alpha) 2.8% (23/819) (113). The most common variant found in this interval of time, Mu, was classified as a variant being monitored (VBM) by the Centers for Disease Control and Prevention (CDC U.S.) without reported major effects on infectivity, transmissibility or severity (114). The coexistence of several variants during this period constitutes a source of variation and might reflect a more complex dynamics of host-pathogen interactions.
Our clinical and genetic association analysis allowed us to identify several risk factors related to disease severity. These factors were incorporated into a predictive risk model using a multivariate logistic regression including demographic, clinical, and genetic traits. To date, ∼50 prediction models and scoring systems, have been published (115). These models are useful tools to facilitate decision-making in healthcare services and rely mostly on clinical features such as age, sex, number of comorbidities, hypertension, T2DM, chronic obstructive lung disease, cancer, cardiovascular disease. However, it is noteworthy that COVID-19 severity is influenced by viral and host genetic factors (10). Recent models, which like ours incorporate a multifactorial approach (genetic and non-genetic factors), included several single nucleotide variants (SNVs) (116). These models have achieved good results in discriminating COVID-19 severity groups and highlighted the role of integrated approaches to predict clinical outcomes. Furthermore, other models aiming to predict adverse outcomes are based on detailed clinical features during diagnosis, admission and hospitalization have been developed, nevertheless, its accessibility and clinical implementation have been limited (117,118). We propose our model as a useful tool to estimate a priori severe or critical illness risk. Notably, despite the minor increase in the AUC when the clinical and complete models were compared, detailed analysis of the discrimination performance and crossvalidation parameters suggest that the incorporation of the risk allele improves the risk prediction model. Further studies involving larger sample sizes might be useful to validate these findings. Likewise, the implementation of our model into a web application might facilitate its usage by healthcare providers in limited-resource settings during the current SARS-CoV-2 pandemics and future health emergencies caused by similar pathogens.
In summary, our study explores the relation between nongenetic and genetic factors, with COVID-19 outcomes in Colombian population, demonstrating a positive association between the LZTFL1 rs11385942 polymorphism and severe disease. By establishing such association, we point up the importance of genetic host factors in SARS-CoV-2 infection. In addition, our work identified previously known nongenetic factors and developed a predictive model which was implemented in a web application, providing a useful tool for risk prediction. Integrative approaches, like ours, may be helpful to better understand COVID-19 clinical progression, refine healthcare efforts and reduce the morbimortality of patients with this disease.

Study Limitations
Our study has potential limitations. First, the sample size was calculated in order to have 80% statistical power based on previous association reports for the variant with the lower allele frequency (LZTF1 rs11385942.), nevertheless, it could have been limited to detect potential small effect sizes for the rs4646994 and rs228566 SNPs in our population. Second, some clinical variables assessed in the clinical follow-up interview were self-reported. Even though most of this information was confirmed in the clinical record, this could have been a potential source of bias. Third, we did not match the case-control groups by age or sex for the statistical analysis. Considering these variables are known risk factors, we aimed to assess their impact on COVID-19 outcome. Fourth, as previously mentioned, analysis of potential population stratification was not performed. In addition, COVID-19 severity is a multifactorial trait and other important variables, including environmental factors, SARS-CoV-2 variants, and additional host genetic polymorphisms, described as risk or protective factors were not evaluated. Assessment of such variables in future studies could help to improve discriminative models and medical risk assessment. Finally, we should highlight that our proposed risk model constitutes a proof-of-concept of the feasibility of this integrative approach and further studies with larger sample sizes and independent replications are required to validate the model.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Ethics Committee of Universidad del Rosario (DVO005 1543-CV1334). The patients/participants provided their written informed consent to participate in this study.