Insilico Functional Analysis of Genome-Wide Dataset From 17,000 Individuals Identifies Candidate Malaria Resistance Genes Enriched in Malaria Pathogenic Pathways

Recent genome-wide association studies (GWASs) of severe malaria have identified several association variants. However, much about the underlying biological functions are yet to be discovered. Here, we systematically predicted plausible candidate genes and pathways from functional analysis of severe malaria resistance GWAS summary statistics (N = 17,000) meta-analysed across 11 populations in malaria endemic regions. We applied positional mapping, expression quantitative trait locus (eQTL), chromatin interaction mapping, and gene-based association analyses to identify candidate severe malaria resistance genes. We further applied rare variant analysis to raw GWAS datasets (N = 11,000) of three malaria endemic populations including Kenya, Malawi, and Gambia and performed various population genetic structures of the identified genes in the three populations and global populations. We performed network and pathway analyses to investigate their shared biological functions. Our functional mapping analysis identified 57 genes located in the known malaria genomic loci, while our gene-based GWAS analysis identified additional 125 genes across the genome. The identified genes were significantly enriched in malaria pathogenic pathways including multiple overlapping pathways in erythrocyte-related functions, blood coagulations, ion channels, adhesion molecules, membrane signalling elements, and neuronal systems. Our population genetic analysis revealed that the minor allele frequencies (MAF) of the single nucleotide polymorphisms (SNPs) residing in the identified genes are generally higher in the three malaria endemic populations compared to global populations. Overall, our results suggest that severe malaria resistance trait is attributed to multiple genes, highlighting the possibility of harnessing new malaria therapeutics that can simultaneously target multiple malaria protective host molecular pathways.

Recent genome-wide association studies (GWASs) of severe malaria have identified several association variants. However, much about the underlying biological functions are yet to be discovered. Here, we systematically predicted plausible candidate genes and pathways from functional analysis of severe malaria resistance GWAS summary statistics (N 17,000) meta-analysed across 11 populations in malaria endemic regions. We applied positional mapping, expression quantitative trait locus (eQTL), chromatin interaction mapping, and gene-based association analyses to identify candidate severe malaria resistance genes. We further applied rare variant analysis to raw GWAS datasets (N 11,000) of three malaria endemic populations including Kenya, Malawi, and Gambia and performed various population genetic structures of the identified genes in the three populations and global populations. We performed network and pathway analyses to investigate their shared biological functions. Our functional mapping analysis identified 57 genes located in the known malaria genomic loci, while our gene-based GWAS analysis identified additional 125 genes across the genome. The identified genes were significantly enriched in malaria pathogenic pathways including multiple overlapping pathways in erythrocyte-related functions, blood coagulations, ion channels, adhesion molecules, membrane signalling elements, and neuronal systems. Our population genetic analysis revealed that the minor allele frequencies (MAF) of the single nucleotide polymorphisms (SNPs) residing in the identified genes are generally higher in the three malaria endemic populations compared to global populations. Overall, our results suggest that severe malaria resistance trait is attributed to multiple genes, highlighting the possibility of harnessing new malaria therapeutics that can simultaneously target multiple malaria protective host molecular pathways.

INTRODUCTION
Malaria is still one of the global health problems with approximately 228 million cases and 405, 000 deaths in 2018 (WHO 2019). African countries disproportionately carry the global burden of malaria, accounting for 93 and 94% of cases and deaths, respectively (WHO 2019). P. falciparum malaria is still one of the leading causes of child mortality in endemic regions, particularly in sub-Saharan Africa. According to the World Health Organization (WHO), malaria killed about 285,000 under five children in 2016 (WHO 2018). About 10-20% of children who recover from severe malaria develop neurological sequalae and sub-optimal neuronal development (Egan et al., 2001). Severe malaria (SM) is defined as demonstration of asexual forms of the malaria parasites in the blood of a patient with a potentially fatal manifestation or complication of malaria in whom other diagnosis have been excluded (WHO 2014). The SM complications include rapid progression to severe malarial anaemia (SMA), hypoglycaemia, cerebral malaria (CM), acidosis, and death (WHO 2014).
P. falciparum has a complex life cycle that alternate between vertebrate and female Anopheles mosquito. During its blood meal, the infected mosquitoes inoculates the transmissive form of the parasite, the sporozoites, into human skin. From the skin, the sporozoites enter into the blood circulation or up-taken by the lymphatic system and invade liver (Miller et al., 2002). After maturation, the parasite buds off the hepatocytes and released into the circulation in the form of merozomes containing hundreds of thousands of merozoites that infect erythrocytes (Miller et al., 2002). The erythrocyte stage also called blood stage lifecycle is a complex multi-step process that involves repeated invasion, growth, replication, and egress events (Cowman and Crabb 2006). The clinical symptoms of the SM have been linked to the blood stage life cycle (Cowman and Crabb 2006).
Even though SM is one of the commonest reasons for admission to hospital and is a major cause of hospital death in children aged 1-5 years in endemic areas, it constitutes only a small subset (1-2%) of the infected children as the majority of malaria infections are mild (Kevin et al., 1995). It has been shown that such clinical variations are partly attributable to human genetic factors (Damena et al., 2019;Damena and Chimusa 2020). Thus, a comprehensive understanding of the human genetic causes of variation in malaria clinical outcomes may potentially provide clues to design new intervention strategies such as therapeutics and vaccines (Kwiatkowski 2005;Teo, Small, and Kwiatkowski 2013).
Aiming at shedding more light to the genetic basis of severe P. falciparum malaria, several genome-wide association studies (GWASs) have been conducted in diverse malaria endemic populations over the last decade (Jallow et al., 2010;Timmann et al., 2012;Band et al., 2015;Ravenhall et al., 2018;Malaria Genomic Epidemiology Network, 2019). The GWASs have replicated some of the well-known malaria resistance genomic risk loci including sickle cell (HBB) and ABO blood group loci and identified new variants in ATP2B4 and Glycophorin regions. Due to the single-marker testing approach commonly used, the GWASs miss candidate variants with weak genetic effects (Chimusa et al., 2015). To address these problem, a number of gene-based and pathway-level statistical analytic methods have been developed and successfully implemented in complex disease studies (Leeuw et al., 2015;Lamparter et al., 2016;Watanabe et al., 2017;Yoon et al., 2018). These methods can improve the study power by aggregating the joint effects of weakly associated markers at gene and pathway levels (Dudbridge 2016).
Furthermore, the methods integrate functional information from advanced biological databases including the Genotype-Tissue Expression (GTEx), (The GTex Consortium 2015), Encyclopedia of DNA Elements (ENCODE) (Hoffman et al., 2013), Roadmap Epigenomics Project Roadmap Epigenomics Consortium, 2015 and chromatin interaction information (Schmitt et al., 2016) to identify and prioritize candidate genes. Owing to the fact that direct functional follow-up of several candidate causal variants and genes is expensive, application of computational method to prioritize genes and their respective biological pathways are proven to be useful in complex diseases studies (Watanabe et al., 2017).
Here, we implement several gene-set, pathway and network analytic methods on summary statistics of severe malaria GWAS from 17,000 individuals meta-analysed across 11 populations and systematically predicted plausible genes and pathways. We further performed rare variant analysis on raw GWAS dataset (N ∼11,000) of Kenya, Gambia, and Malawi populations. Finally, we performed population genetic structure analysis of the identified genes in the three malaria endemic countries and across global populations. Established over the course of long coevolution time, blood stage life cycle of the parasite constitutes the most extensive interplay between host and parasite genomes which leads to the clinical symptoms of SM. Therefore, our results suggest that severe malaria resistance is polygenic and attributed to multiple genes aggregated in pathogenic pathways linked to the erythrocyte stage lifecycle of P. falciparum.

Description of the Study Datasets
We accessed a previous severe malaria GWAS datasets (Band et al., 2015) (N ∼11,000) of three African populations including Kenya, Gambia, and Malawi from European Phenome Genome Archive (EGA) following the standard data access protocols outlined in previous studies (Achidi et al., 2008;Parker et al., 2009). Children with severe malaria cases were recruited on admission to hospital using definitions outlined by the WHO: cerebral malaria (Blantyre coma score <3 in children or Glasgow coma score <11 in adults), severe malarial anaemia (haemoglobin <5 g/100 ml or haematocrit <15%), and other malaria-related symptoms (Trampuz et al., 2003). Control samples were obtained from representative of the ethnic groups of the cases or in some study sites from the local population (Achidi et al., 2008). The samples were genotyped on Illumina Omni 2.5Marray and QC filtered as described in (Jallow et al., 2010).
In addition to the genotype dataset, we obtained a set of severe malaria susceptibility GWAS summary statistics (N 17,000) meta-analysed across 11 population in Africa, Oceania, and Asia from Malaria Genomic Epidemiology Network (2019). The dataset contained information on GWASs of individual study populations and their meta-analysis. We additionally accessed reference dataset from 1,000 Genomes Project (The 1000 Genomes Project Consortium, 2011) and African Genome Variation Project (AGVP) (Gurdasani et al., 2015) ( Supplementary Table S3).

Functional Mapping and Annotations
We used the meta-analysed malaria GWAS summary statistics (N 17,000 samples, 17 million SNPs) across 11 populations (Malaria Genomic Epidemiology Network 2019) for functional mapping and annotations. We implemented FUMA (Watanabe et al., 2017), a pipeline that determines genomic risk loci and prioritize potential causal genes by incorporating information from multiple sources including GTEx (The GTex Consortium 2015), Encyclopedia of DNA Elements (ENCODE) (Hoffman et al., 2013), Roadmap Epigenomics Project (Roadmap Epigenomics Consortium 2015, and chromatin interaction information (Schmitt et al., 2016).
To gain insights into the biological functions of the prioritized genes, we performed gene enrichment analysis using a hypergeometric test in which gene-sets obtained from MsigDB (Liberzon et al., 2011) and WikiPathways (Kutmon et al., 2016) were used as background genes. We further tested differential gene expression values on 54 tissues obtained from the GTEx (Guerini, Pan, and Carafoli 2003) as described in FUMA (Watanabe et al., 2017).

Gene-Based Genome-Wide Association Analysis
Considering the polygenic nature of severe malaria susceptibility trait (Damena and Chimusa 2020), we applied Pascal (Lamparter et al., 2016), a gene-based GWAS analysis. Unlike the FUMA method which only identifies genes encoded by GWAS significant SNPs, Pascal method aggregates SNPs with modest effects and yields score for the corresponding genes. Briefly, we applied sum of chi-squared statistics (SOCS) analysis to the GWAS summary statistics of SM to compute the corresponding gene scores (p-values) using default settings of the Pascal software (Lamparter et al., 2016). LD information for estimation of correlation structure was obtained from African dataset in 1,000 G phase 3 (The 1000 Genomes Project Consortium 2011). We further categorized the prioritized genes into different functional groups using DAVID tools (Jiao et al., 2012). Significant genes after Bonferroni corrections were subjected to differential gene expression analysis implemented in FUMA software using 54 tissues obtained from the GTEx. Pathway scores were computed by combining the scores of genes that belong to the same gene-set using default parameter of the Pascal software (Lamparter et al., 2016).

Gene Burden and Rare-Variants Association Analysis
Given that the GWAS assumption is based on common variant common disease hypothesis, GWAS approach always miss potential association signal from rare variants. To examine the contribution of rare variants, we applied optimal unified sequence kernel association test (SKAT-O) (Ionita-laza et al., 2013), which combines burden and variance-component analyses to the GWAS dataset of Gambia, Kenya, and Malawi populations. Briefly, we aligned the VCF files including Gambia (N 4,920 samples, 1.6 million SNPs), Malawi (N 2,560 samples, 1.6 million SNPs), and Kenya (N 3,143 samples, 1.6 million SNPs) to GWAS dataset to 1,000 Genome v-3 reference haplotypes using Genotype Harmonizer (Deelen et al., 2014) and removed SNPs with position and strand mismatches and phased using SHAPEITv2 (Delaneau et al., 2013). We performed imputation using impute 2 (Howie et al., 2009) and obtained ∼20 million from each population. After removal of SNPs with low genotype rate and imputation accuracy, we retained ∼15,000 SNPs in each population. We then applied SKAT-O test to the quality filtered data following the procedure outlined in SKAT package (Ionita-laza et al., 2013).

Network Analysis
To investigate the functional interactions between all the candidate malaria resistance candidate genes identified by FUMA and Pascal methods, we implemented network analysis. Briefly, we obtained functional interaction network of all the identified candidate malaria resistance genes using Multiple Association Network Integration Algorithm (geneMANIA) tool (Mostafavi et al., 2008). Using this information, we computed network parameters including degree, betweenness, and closeness centrality metrics to evaluate the topology of nodes (genes) and edges (interactions) in the network using networkX (Hagberg 2008) and R igraph packages (Csardi and Tamas, 2006). We excluded genes in the network which had betweenness score of zero and those with degree less than 3. We used the lowest betweenness and degree centrality score as a threshold to further filter the gene list to elucidate both provincial and connector hubs. From the analysis, we defined degree, betweenness, and closeness threshold of 17, 1,399, and 0.21502, respectively, to identify hubs in the network.
Closeness measures the average distance from the node to all other nodes in the network, indicating which nodes represent a greater "risk" (maximally close with lowest sum of edge weights) for eliciting other nodes. Betweenness measures the number of times that a node lies on the shortest path between two other nodes, indicating which nodes serve as a "hub" between other nodes (Csardi and Tamas, 2006). The degree of a node is described as the number of direct connections it has with other nodes within the network. Low degree nodes usually connect to nodes within their local community, whereas high degree nodes usually extend to the neighbouring community. Using the centrality scores, we quantified node centrality to identify hub genes by investigating the contribution of the edges and the weight of the edges towards node centrality. The hubs genes make strong contributions to the subnetwork and/or global network integrity. Connector hubs and provincial hubs refers to nodes that link other nodes across different communities and local communities, respectively.

Population Genetic Structure of Malaria Resistance Candidate Genes
We performed the population genetic structure of the identified genes in malaria endemic populations (Kenya, Gambia, and Malawi) and global populations of 20 ethnic groups obtained from African Genome Variation Project (AGVP) (Gurdasani et al., 2015). We merged the quality filtered GWAS datasets of the three malaria endemic populations using PLINK software (Purcell et al., 2007). We performed basic quality control on both the merged malaria population dataset and AGVP dataset using PLINK software. We removed structural variants and ambiguous SNPs, removed SNPs with MAF below 0.01, deviate from Hardy-Weinberg at p-value below 0.01 and SNP missingness proportion greater than 0.02 (Marees et al., 2018).
We mapped SNPs in dbSNP database to the identified candidate genes using custom python script. We extracted the mapped SNPs from our datasets (Malaria GWAS and AGVP datasets) and retained for the downstream analyses.
We then partitioned the datasets into a total of 23 different Ethnic groups (20 from AGVP and three from malaria endemic populations) based on population or country label information. We clustered the merged malaria GWAS dataset into sub-regions/populations using smartpca software (Patterson, Price, and Reich 2006). To understand the frequency spectrum of the SNPS residing in the identified genes, we created different minor allele frequency (MAF) bins of all the SNPs mapped to our candidate genes for each ethnic group. We repeated the same analysis separately for each gene to obtain gene-specific MAF. We finally computed proportion of pathogenic SNPs contained in each of the candidate gene using ANNOVAR software (Wang, Li, and Hakonarson 2010).

Functional Mapping and Annotations
We applied three functional mapping strategies implemented in FUMA (Watanabe et al., 2017) to the severe malaria GWAS summary statistics (see Materials and Methods). We identified 19 lead SNPs out of 69 significant SNPs across six genomic loci using default settings of the software (Supplementary Datas S1-S3). These SNPs were significantly enriched in ncRNAintronic, intronic, intergenic, and ncRNA-exonic regions ( Figure 1A).
Our functional mapping strategies yielded a total of 57 protein-coding genes ( Table 1; Supplementary Data S4). These include 29, 23, and 14 genes identified by eQTL mapping, chromatin interaction mapping, and positional mapping, respectively ( Figure 1B). Two genes including ATP2B4 and HBD were identified by all the three gene mapping strategies, while five genes including GYPB, HBG2, TRIM6-TRIM34, OR51F2, and TRIM68 were predicted by two of the three mapping strategies (Supplementary Data S4).
All the implicated genes in chr9q34 locus are located outside the genomic risk locus and were identified by eQTL mapping (Supplementary Data S4; Supplementary Figure S2). These include surfeit gene cluster such as SURF2, SURF4, MED22, and SURF6; a metalloprotease gene, ADAMTS13; and a gene encoding ORS blood group system coding gene (GBGT1). In the remaining enriched cytogenic positions, the known genes including ATP2B4 (chr1q32) and FREM3, GYPE, and GYPB (chr4p31) were replicated. Other notable genes include BTG2, a tumour suppressor gene on chr1q32, and B3GALNT1 on chr3q26 (Supplementary Data S4).  Table  S2) and identified additional 125 genes across the genome (Supplementary Data S5). The genes with top scores outside genomic risk loci include CSMD1 (p 1.58e-12) on chr8p23.2 and RBFOX1 (p 9.76e-11) on chr16p13.3. CSMD1 is an important regulator of complement activation and inflammation (Sun et al., 2001;Lee et al., 2019), while RBFOX1 encodes for an mRNA-splicing factor linked to autism spectrum disorders (Hamada et al., 2016). A previous study in Tanzanian population reported association of variants in RBFOX gene with SM (Ravenhall et al., 2018).
Furthermore, protein kinases including FLT4 (p 9.96e-8) on chr5q35.3, PTPRT (p 4.92e-7) on chr20q12-q13, and PRKG1 (p 1.2e-6) on chr10q11.2-q21.1 were among the genes with top scores (Supplementary Data S5). PTPRT is a tyrosine phosphatase receptor involved in STAT3 pathway and was recently reported to be associated with mild malaria susceptibility in Benin populations (Milet et al., 2019). PRKG1 is a cyclic guanosine monophosphate (GMP) dependent protein kinase which plays important roles in relaxation of vascular smooth muscle and inhibition of platelet aggregation (Mrozek et al., 2003). FLT4 acts as a cell-surface receptor for vascular endothelial growth factor C (VEGFC) and vascular endothelial growth factor D (VEGFD), and plays an essential role in the development of the vascular network (Aprelikova et al., 1992). It has been shown that VEGF is expressed in the brain tissues and reported to play protective during CM (Yeo et al., 2008).

Gene-Based Rare Variant Association
Because rare variants are known to play role in the variation of most complex traits, we applied optimal unified sequence kernel association test (SKAT-O), which combines burden and variance-component analyses (Ionita-laza et al., 2013), to the raw genotype GWAS dataset of Gambia, Kenya, and Malawi populations (see Materials and Methods). The SKAT-O analysis identified a total of six and nine nominally significant genes in Gambia and Malawi populations, respectively. These include nine long intergenic non-protein coding RNAs (LincRNAs), MIR4282, GLYR1, NDNF, EPB41L2, ATP8A1, and WASF3 (Supplementary  Table S3). However, none of these genes were significant after correction for multiple testing.

Functional Networks and Subnetworks of Severe Malaria Resistance Candidate Genes
To investigate the functional interaction between all the candidate SM resistance candidate genes identified in this study, we implemented network analysis (see Materials and Methods). Our global network generated 351 functional interactions between 268 genes. Topology analysis identified ABO, HBB, HBD, HBE1, and ATP2B4 as highly influential connector hub genes influencing at least two subnetworks/communities, while TRIM21 and OR5F2 constituted independent communities. MED22 and OR551B6 constituted provincial hub genes ( Figure 3).

Molecular Functions of Genes in Malaria Risk Loci Identified by FUMA Method
To test whether the genes predicted by the three functional mapping strategies overlapped in functional gene sets and pathways, we conducted gene enrichment analysis implemented in FUMA (Watanabe et al., 2017) using MsigDBc5 (Liberzon et al., 2011) gensets as background (see Materials and Methods). The gene enrichment analysis identified several shared biological functions linked to erythrocyte-related pathways including three gene ontology (GO) cellular components, eight GO molecular functions, and 14 GO  biological processes ( Table 2). The implicated cellular components include haptoglobin-haemoglobin complex (p 7.6e-8), haemoglobin-complex (p 7.63e-8), and cytosolic-part (p 6.73e-3). The enriched molecular functions include haptoglobin binding (p 4.87e-8), oxygen carrier activity (3.8e-7), oxygen binding (p 4.87e-5), and other activities related to haemoglobin functions. The shared biological activities include oxygen transport (p 4.22e-6), gas transport (p 4.87e-6), hydrogen peroxide catabolism (p 9.08e-5), protein hetero-oligomerization (p 2.86e-3), protein complexoligomerization (p 4.23e-3), interferon gamma mediated signalling pathways (p 3.09e-2), and blood coagulation (p 3.09e-2). Our differential expression analysis of genes identified by FUMA method did not yield significant enrichments.

Molecular Functions of Candidate Genes Identified by Gene-Based GWAS Analysis Using Pascal Method
In addition to genes within SM genomic risk loci, we performed functional analysis and pathway analysis for the genes identified by the gene-based GWAS using Database for Annotation, Visualization and Integrated Discovery (DAVID) method (Jiao et al., 2012) and Pascal (Lamparter et al., 2016), respectively (see Materials and Methods). The DAVID analysis yielded eight functional categories, the majority of which are linked to malaria pathogenesis (Table 3) including GPCR signalling, membrane/transmembrane proteins, Na+/K+ transporting ATPases, cell adhesions, haemoglobin related functions, calcium signalling, and actin binding activities. The Pascal analysis replicated pathways including haemostasis (p 4.52e-10), G protein-coupled receptor signalling (p 7.88e-15), and calcium signalling (p 1.10e-7) ( Figure 4A). Additional pathways including neuronal system (p 1.25 e-9), axon guidance (p 5.93e-8), chemical transmission across synapses (p 1.75 e-7), immune system (p 2.61e-6), signalling by Rho GTPase (p 1.45 e-5), and tight junction (p 3.57 e-5) were identified by this method. Furthermore, differential expression of genes implicated by Pascal method showed significant enrichment in blood vessels ( Figure 4B).

Population Genetic Structure of Malaria Resistance Candidate Genes
To assess the levels of differentiation of the candidate genes, we performed population genetic structure analyses in malaria endemic populations (Kenya, Gambia, and Malawi) and global populations of 20 ethnic groups obtained from African Genome Variation Project (AGVP) (Gurdasani et al., 2015). We mapped a total of 14,106,476 of SNPs in dbSNP database to the identified candidate genes. Out of these, we retained a total of 15,675 SNPs and 93,5549 SNPs that are in the Malaria GWASs dataset (N 10,578 samples) and AGVP (N 4,932 samples) dataset, respectively. We partitioned these datasets into individual population and computed population structure analyses including PCA, MAF, and proportion of pathogenic SNPs residing in the candidate genes (see Materials and Methods).
The PCA analysis effectively clustered the majority of the populations according to their ancestry ( Figure 5). This suggests that the SNPs residing in the candidate genes are differentiated across geographical locations and ethnic background. We also noted that the minor allele frequencies (MAF) of common SNPs (MAF > 0.05) in the candidate genes are generally higher in the three malaria populations compared to 20 ethnic groups ( Figure 6; Supplementary Figure S3). We further observed that the proportion of pathogenic SNPs in a total of 18 genes is much higher in the three malaria endemic populations compared to other populations (Supplementary Datas S6, S7). These include TRIM family genes such as TRIM21, TRIM22, TRIM68, TRIM6-TRIM34, and TRIM34 in which the pathogenic SNP proportion ranges from 13.3 to 25%, and olfactory receptors genes such as OR51B4, OR51B6, OR51B2, OR56A1, OR51L1, OR52K2, and OR51E2 in which the pathogenic SNP proportion ranges from 27.3 to 100%. FIGURE 3 | Network generated from predominant severe malaria protective candidate genes, comprising 351 interactions between 268 nodes. Topology analysis identified ABO, HBB, HBD, HBE1, and ATP2B4 as highly influential connector hub genes influencing at least two subnetworks/communities, while TRIM21 and OR5F2 constituted independent communities. MED22 and OR551B6 constituted provincial hub genes.

DISCUSSION
In this study, we applied statistical functional analytic method to the largest ever severe malaria susceptibility GWAS dataset and identified the well-known malaria resistance loci and a number of novel genes that can guide future functional experiments. We noted that severe malaria resistance is attributed to multiple genes and pathways linked to malaria pathogenesis during blood stage life cycle of the parasite including merozoite invasion, parasite growth, cytoadherence, and signal transduction. The genes that were identified by our three mapping strategies might have equal importance; genes that were identified by positional mapping may act at protein level through structural changes, while the genes identified by eQTL and chromatin interactions exert their influences through quantitative changes at gene expression levels (Watanabe et al., 2017).  The fact that the functionally mapped genes are clustered on chr11p15 is consistent with our recent work in which we reported the disproportionate concentration of SNP-heritability on chr 11. This might reinforce the need for targeting this chromosome in the future severe malaria resistance studies (Damena and Chimusa 2020). In addition to the sickle trait gene (HBB), our mapping strategies identified other members of beta globin gene cluster that cause various forms of beta-thalassemia (HBE1, HBD, HBG, and HBG2). Our network analysis showed that all these genes constituted hub with which several other genes are connected, which reaffirm the importance of hemoglobinopathies in resistance against severe malaria. Hemoglobinopathies are believed to confer protection against severe malaria by suppressing the parasite growth and by mitigating associated pathogenic effects (Taylor, Cerami, and Fairhurst 2013). The proposed protective mechanisms of hemoglobinopathies against severe malaria have been extensively reviewed elsewhere (Hedrick 2011;Taylor, Cerami, and Fairhurst 2013).
In addition to the beta globin gene cluster, the Tripartite motif (TRIM) containing gene family (TRIM68 and TRIM21) identified in this locus are known to play critical role in down regulating Toll-like receptors (TLR)-and Rig-like receptors (RLR)-induced responses and protect from autoimmune and inflammations (Mccarthy et al., 2014;Foss et al., 2019). Mal-adapted inflammatory reactions is one of the hall mark pathogenic pathways in severe malaria (Milner 2018;Moxon et al., 2020). The olfactory receptor super-family genes (CCKR, OR51F2, and OR51L) identified in this locus might be involved in G proteincoupled receptor (GPCR) signalling activities which is important in blood stage life cycle of P. falciparum (Mbengue, Yam, and Braun-breton 2012). However, it is also possible that these genes were detected because of their abundance and close proximity to globin gene cluster (Alonso, Lo, and Izagirre 2006). We also noted that the genes in the TRM family and olfactory receptor superfamily contain higher proportion of pathogenic SNPs in the three-malaria endemic population compared to the global populations. The majority of the well-known malaria protective genes have deleterious variants and were evolved under balancing selections.
Our eQTL mapping identified ADAMTS13 gene on chr9q34 outside the malaria genomic risk locus which would not have been identified by the conventional SNP mapping approach (Watanabe et al., 2017). ADAMTS13 is a zinc-containing metalloprotease enzyme that cleaves von Willebrand factor vWF, a large protein derived from endothelial surface and megakaryocytes which plays a crucial role in basic haemostasis (Sporn et al., 1989;Zheng, 2016). Following activation of endothelial cells, vWF is directly released into plasma and basement membrane or is stored in Weibel-Palade bodies (WPBs) from where it is released by regulated secretion to promote adhesion of platelets at the sites of vascular injury and facilitate vascular healing (Sporn et al., 1989). However, abnormal accumulations of vWF caused by deficiency of plasma ADAMTS13 trigger intravascular platelet aggregation and micro thrombosis leading to a vascular disease, thrombotic thrombocytopenic purpura (TTP) (Zheng, 2016). Indeed, recent works have linked the platelet-mediated clumping of infected erythrocytes in microvasculature during cerebral malaria with increased level of VWF in plasma caused by mutations in ADAMTS13 genes (Hollestelle et al., 2006;Kraisin et al., 2011;Adams et al., 2014).

FIGURE 5 | (A)
We clustered the merged malaria GWAS dataset containing only SNPs residing in the identified malaria resistance candidate genes (N 10,578 samples, 15,675 SNPs) using smartpca software. The populations were indicated by different colours and symbols. The PCA analysis effectively clustered the majority of the populations according to their ancestry, suggesting that the variants in the candidate genes are differentiated across ethnic background. The three populations and their case/control status were indicated by different colours and symbols. (B) We clustered the AGVP dataset containing only SNPs residing in the identified malaria resistance candidate genes (N 4,932 samples, 93,5549 SNPs) into sub-regions/populations using smartpca software. The populations were indicated by different colours and symbols. The populations were clustered based on their geographical locations. However, the clusters in Malawi and Gambia slightly overlapped, likely because of the similar malaria endemicity in these population.
Frontiers in Genetics | www.frontiersin.org November 2021 | Volume 12 | Article 676960 In the same genomic locus, our eQTL functional mapping identified surfeit gene cluster, metalloprotease genes linked to epithelial adhesions and blood coagulations. SURF4 gene has been implicated in epithelial cell adhesion trait (Ahsan et al., 2017), MED22 gene is linked to VWF factor/factor VIII level measurement (Sabater-Lleal et al., 2019), and SURF6 has been implicated in epithelial ovarian cancer (Chen et al., 2014). Our network analysis showed that MED22 forms a central hub to which the rest of surfeit gene clusters are connected. This may suggest the greater importance of MED22 gene compared with other members of the cluster. In the remaining genomic risk loci, the well-known genes that play role in severe malaria resistance (Timmann et al., 2012;Band et al., 2015) including ATP2B4 (chr1q32), FREM3, GYPE, and GYPB (chr4p31) were replicated and few additional genes were identified. The glycophorin gene clusters, GYPA and GYPB, encode the MNS blood group system and are host-erythrocyte receptors for P. falciparum, suggesting that polymorphisms in these genes play protective role by interfering with the invasion processes (Sun et al., 2001). ATP2B4 variants may impair the parasite lifecycle in erythrocyte by affecting intracellular Calcium homeostasis (Tiffert et al., 2005; Malaria Genomic Epidemiology Network 2019).
Novel genes identified in these loci include BTG2 on chr1q32 and B3GALNT1 on chr3q26. BTG2 is a tumour suppressor gene known to be linked to RBC related traits including MCHC level, RBC distribution, and reticulocyte count (Astle et al., 2017). B3GALNT1 encodes globoside blood group system which is determined by P antigen (Hellberg, Poole, and Olsson 2002). Globoside/P antigen is the most abundant neutral glycolipid in the erythrocyte membrane and has been recognized as a cellular receptor for parvo-B19 virus (Kevin et al., 1994). Individuals lacking this receptor are resistant to parvo-B19 virus and uropathogenic E. coli infections (Lund et al., 1985;Kevin et al., 1994). This may suggest that variants in these genes might influence susceptibility to infectious diseases including malaria. However, further investigations are needed to establish the link between these genes and SM resistance.
We noted that the genes identified in malaria risk loci share cellular components including haptoglobin binding, haemoglobin complex, and cytosolic part and several overlapped molecular functions and biological processes linked with the blood stage life cycle of the parasite. P. falciparum spends most of its lifecycle within RBCs, where it undergoes multiple rounds of invasion, growth, replication, and egress, causing the signs and symptoms of malaria (Rowe et al., 2009;Moxon et al., 2020). The majority of the classical haemoglobin variants confer protection against severe malaria by restricting invasion process and intraerythrocytic growth of the parasite (Taylor, Cerami, and Fairhurst 2013). Haptoglobin is an acute phase glycoprotein present in human plasma. It forms stable complexes with extracellular haemoglobin that is released from lysed RBCs and thereby curtail the haemoglobin-induced oxidative tissue damage (Kristiansen et al., 2001).
P. falciparum ingest the host cell cytosol to obtain nutrients and space for growth in the RBCs (Francis, Sullivan, and Goldberg 1997). A recent study showed that the host cytosol uptake process is mediated by parasite's protein called VPS45 (Jonscher et al., 2019). The fact that the identified candidate genes in this study were enriched in the cytosol part of the cellular component might suggest that the variants of these genes might arrest the nutrient up-take of the parasites and thereby confer protection against their pathogenic effects. In addition to haemoglobin related functions, some of the candidate genes were enriched in pathways linked to malaria pathogenesis including blood coagulation related processes (Francischetti, Seydel, and Monteiro 2008;Sullivan et al., 2016) and interferon gamma mediated signalling pathways (Hunt et al., 2014; Tosevski et al., 2017). This may suggest that the host genetic factors might interfere with parasite development and its pathogenic effects at multiple levels to confer protection against the life-threatening form of malaria. Our gene-based GWAS analysis replicated the well-known malaria resistance candidate genes in the genomic risk loci and identified several genes across the genome. Genes with top scores encode for different malaria relevant functions such as regulation of inflammation (CSMD1), neural adhesion (CNTN4, PCSK5, CDH13, TMEM132), vascular epithelial development (FLT4), and protein kinases (PTPRT, PRKG1). Functional analyses of these genes yielded functional categories linked to the blood stage lifecycle of the parasite and associated pathologies. The top enriched functions include GPCR signalling, membrane/ transmembrane proteins (Na+/K+) transporting ATPases, sodium leak and potassium channel interacting proteins, cell adhesions molecules and calcium/calmodulin-dependent protein kinases (CDPKs), cell adhesion molecules, Rho GTPase activities, tight junction, neuronal system, and axon guidance.
CDPKs have crucial functions in calcium signalling at various stages of the parasite's life cycle and is proposed to be one of the potential drug targets against malaria (Ghartey-kwansah et al., 2020). It has been shown that P. falciparum infection activates host signalling pathway involving protein kinase C (PKC) (Sicard et al., 2011). Similarly, host GPCR signalling pathways have been shown to play vital roles in invasion, intra-erythrocyte parasite development, and egress processes (Millholland et al., 2013;Brochet and Billker 2016), suggesting the existence of substantial interactions between host membrane/transmembrane signalling and parasite signalling elements which might mediate the disease severity. Further studies are needed to decouple the host-parasite interface of signal transduction and explore the potential target for new therapeutics.
Sodium leak and potassium channel interacting proteins (Na + / K + ) transporting ATPases play critical role in maintaining electrochemical equilibrium in normal erythrocytes. However, upon invasion by trophoblast stage of the parasite, the ion pumpleak balance is perturbed, with increased leak rate and decreased pump rate resulting in a remarkable increase in (Na + ) and decrease in (K + ) in the erythrocyte cytosol (Pillai et al., 2013;Desai 2014). This results in formation of a new permeability pathway (NPP) in the erythrocyte membrane which allow the transport of nutrients and waste products necessary for the parasite (Kirk and Lehane 2014). Studies have shown that both parasite driven proteins encoded by clag3.1 and clag3.2 (Ekland, Akabas, and Fidock 2011;Nguitragool et al., 2011) and host benzodiazepine receptor mediate the formation of NPP (Bouyer, Egée, and Thomas 2006;Bouyer et al., 2011). Given that NPP can be a potential target for new therapeutics, further studies are needed to investigate the role of host variants in influencing the NPP channel formation and its ability transporting nutrients to the parasites.
One of the key virulence factors of P. falciparum is its capacity to modify iRBCs to adhere to the endothelium of the vasculature and, thereby, sequester in capillaries and postcapillary venules in vital organs leading to severe disease manifestations (Rowe et al., 2009). Adhesion phenotype is primarily mediated by expression of P. falciparum erythrocyte membrane protein 1 (PfEMP1) on the iRBCs (Heatwoie et al., 1995;Smith et al., 1995). The binding of iRBCs with endothelium involves various adhesion molecules including CD36, ICAM-1, E-selectin, and chondroitin sulfate A (CSA) that are variably expressed in different organs (Raventossuarez et al., 1985;Fried and Duffy 1996;Rowe et al., 1997). The neural adhesion molecules identified in the current study might be involved in receptor activities, and their polymorphisms might play protective roles against SM. Adhesion events have been shown to activate Rho kinase signalling pathway which is strongly implicated in various vascular diseases (Taoufiq et al., 2008). The variants of genes that are enriched in these pathways in the current study might provide protection against severe malaria by weakening the cytoadherence interactions and associated pathologies. The variants of other genes that are enriched in neuronal system, axon guidance, and tight junction might be linked with intracerebral pathogenesis of SM (Nishanth and Schlüter 2019). Furthermore, the candidate malaria resistance genes identified by gene-based GWAS were differentially expressed in blood vessels, suggesting that the majority of the identified genes likely counteract P. falciparum induced endothelial disfunctions in microvasculature and capillaries (Moxon et al., 2020).
The PCA analysis effectively clustered the majority of the populations according to their ancestry. This suggests that the SNPs residing in the candidate genes are differentiated across geographical locations and ethnic background. However, the clusters generated from malaria endemic regions including Malawi and Gambia overlapped, likely because of the similar malaria endemicity in these populations. As expected, the MAF of common SNPs (MAF > 0.05) residing in the candidate genes is higher in the three malaria populations compared to the 20 global populations. However, it should be cautioned that the SNPs used here were not specifically associated with SM and may not represent causal and association SNP level frequency spectrum. Future association studies are needed to identify specific causal SNPs in these genes.

Limitations
Severe malaria is a complex disease with various clinical manifestations including cerebral malaria, severe malarial anaemia, and others which may arise from distinct pathophysiological processes. This implies the existence of subphenotype specific variants that influence the disease outcomes. However, sub-phenotype analyses were not presented in the current study owing to the lack of adequate sub-phenotype information in the MalariaGen datasets. Moreover, our functional analysis, including Quantitative Trait Locus (eQTL) mapping, and chromatin interaction were based on datasets that come from European populations which might negatively affect the findings. Furthermore, given the high genetic diversity of African population, it could have been more appropriate to use population specific reference panel for each of the study population. Our genotype-based rare variant association analysis in the current study was limited because such analyses work best in whole genome dataset. Future whole genome-based studies are needed to better understand the contribution of rare variants in malaria resistance trait. Finally, our findings were not yet supported by experimental functional studies.

CONCLUSION
Our functional mapping analysis identified 57 genes located in the known malaria genomic loci, while our gene-based GWAS analysis identified additional 125 genes across the genome which can potentially guide future experimental studies. The identified genes were significantly enriched in malaria pathogenic pathways including multiple overlapping pathways in erythrocyte-related functions, blood coagulations, ion channels, adhesion molecules, membrane signalling elements, and neuronal systems. Overall, our results suggest that severe malaria resistance trait is attributed to multiple genes that are enriched in overlapping pathways linked to severe malaria pathogenesis, highlighting the possibility of harnessing new malaria therapeutics that can simultaneously target multiple malaria protective host molecular pathways. Further experimental studies are needed to validate the findings in the current study.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, Further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
DD designed, and performed the data analysis and drafted the manuscript, EC, FA, and LG contributed in data analysis and revision of the manuscript, EC supervised the work. FUNDING DD, FA, HG, NK, PK, and LG, funded by The Developing Excellence in Leadership and Genetics Training for Malaria Elimination in sub-Saharan Africa (DELGEME) program (grant #PD00217ML). EC is funded by NIH projects.

ACKNOWLEDGMENTS
This study makes use of data generated by MalariaGEN. A full list of the investigators who contributed to the generation of the data is available from www.malariagen.net. We acknowledge the study participants without whom the original data couldn't have been generated. We acknowledge the Centre for High-Performance Computing (CHPC, www.chpc.ac.za) for provision of computing resources.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2021.676960/ full#supplementary-material Supplementary Data 1 | Describes the Genomic risk loci of severe malaria resistance, Independent significant SNPs and Lead SNPs identified from independent significant SNPs of severe malaria resistance GWAS, respectively.
The genomic locus was defined as the region that contain independent lead SNPs and nominally significant SNPs (p < 0.05) in linkage dis-equilibrium (LD) block with lead SNPs. An independent significant SNP was defined as a genome-wide significant SNP (P-value < 5e -8 ) within the genomic boundary of LD threshold of r 2 > 0.6.
Supplementary Data 2 | Describes the Genomic risk loci of severe malaria resistance, Independent significant SNPs and Lead SNPs identified from independent significant SNPs of severe malaria resistance GWAS, respectively. The genomic locus was defined as the region that contain independent lead SNPs and nominally significant SNPs (p < 0.05) in linkage dis-equilibrium (LD) block with lead SNPs. An independent significant SNP was defined as a genome-wide significant SNP (P-value < 5e -8 ) within the genomic boundary of LD threshold of r 2 > 0.6.
Supplementary Data 3 | Describes the Genomic risk loci of severe malaria resistance, Independent significant SNPs and Lead SNPs identified from independent significant SNPs of severe malaria resistance GWAS, respectively. The genomic locus was defined as the region that contain independent lead SNPs and nominally significant SNPs (p < 0.05) in linkage dis-equilibrium (LD) block with lead SNPs. An independent significant SNP was defined as a genome-wide significant SNP (P-value < 5e -8 ) within the genomic boundary of LD threshold of r 2 > 0.6.
Supplementary Data 4 | Presents the description of prioritized genes from severe malaria resistance GWAS by three functional mapping strategies implemented in FUMA.
Supplementary Data 5 | Describes malaria resistance candidate genes identified by gene-based GWAS analysis using Pascal method.
Supplementary Data 6 | Indicates list of genes in which the proportion of pathogenic SNPs were much higher in the three malaria endemic populations (Kenya, Malawi and Gambia) compared to the global populations.
Supplementary Data 7 | Presents the proportion of pathogenic SNPs in candidate malaria resistance genes for all the 23 global populations as identified by ANNOVAR.
Supplementary Figure 1 | Chromatin interactions and eQTLs of severe malaria resistance candidate genes on chr 11 and chr 9 risk locus, respectively. The most outer layer is the Manhattan plot displaying SNPs with P-value < 0.05. Candidate SNPs are coloured based on the highest r 2 to one of the independent significant loci (red: r 2 > 0.8, orange: r 2 > 0.6). Other SNPs are coloured in grey. The outer circle is the chromosome coordinate and genomic risk loci are highlighted in blue. Genes mapped by either Hi-C or eQTLs are shown on the inner circle. Genes identified by chromatin interaction and eQTLs are coloured orange and green respectively while genes mapped by both are coloured red.
Supplementary Figure 2 | Chromatin interactions and eQTLs of severe malaria resistance candidate genes on chr 11 and chr 9 risk locus, respectively. The most outer layer is the Manhattan plot displaying SNPs with P-value < 0.05. Candidate SNPs are coloured based on the highest r 2 to one of the independent significant loci (red: r 2 > 0.8, orange: r 2 > 0.6). Other SNPs are coloured in grey. The outer circle is the chromosome coordinate and genomic risk loci are highlighted in blue. Genes mapped by either Hi-C or eQTLs are shown on the inner circle. Genes identified by chromatin interaction and eQTLs are coloured orange and green respectively while genes mapped by both are coloured red.