Using multi-tissue transcriptome-wide association study to identify candidate susceptibility genes for respiratory infectious diseases

Objective: We explore the candidate susceptibility genes for influenza A virus (IAV), measles, rubella, and mumps and their underlying biological mechanisms. Methods: We downloaded the genome-wide association study summary data of four virus-specific immunoglobulin G (IgG) level data sets (anti-IAV IgG, anti-measles IgG, anti-rubella IgG, and anti-mumps virus IgG levels) and integrated them with reference models of three potential tissues from the Genotype-Tissue Expression (GTEx) project, namely, whole blood, lung, and transformed fibroblast cells, to identify genes whose expression is predicted to be associated with IAV, measles, mumps, and rubella. Results: We identified 19 significant genes (ULK4, AC010132.11, SURF1, NIPAL2, TRAP1, TAF1C, AC000078.5, RP4-639F20.1, RMDN2, ATP1B3, SRSF12, RP11-477D19.2, TFB1M, XXyac-YX65C7_A.2, TAF1C, PCGF2, and BNIP1) associated with IAV at a Bonferroni-corrected threshold of p < 0.05; 14 significant genes (SOAT1, COLGALT2, AC021860.1, HCG11, METTL21B, MRPL10, GSTM4, PAQR6, RP11-617D20.1, SNX8, METTL21B, ANKRD27, CBWD2, and TSFM) associated with measles at a Bonferroni-corrected threshold of p < 0.05; 15 significant genes (MTOR, LAMC1, TRIM38, U91328.21, POLR2J, SCRN2, Smpd4, UBN1, CNTROB, SCRN2, HOXB-AS1, SLC14A1, AC007566.10, AC093668.2, and CPD) associated with mumps at a Bonferroni-corrected threshold of p < 0.05; and 13 significant genes (JAGN1, RRP12, RP11-452K12.7, CASP7, AP3S2, IL17RC, FAM86HP, AMACR, RRP12, PPP2R1B, C11orf1, DLAT, and TMEM117) associated with rubella at a Bonferroni-corrected threshold of p < 0.05. Conclusions: We have identified several candidate genes for IAV, measles, mumps, and rubella in multiple tissues. Our research may further our understanding of the pathogenesis of infectious respiratory diseases.


Introduction
Although respiratory infections are largely preventable causes of illness and death, they remain the leading cause of death from infectious diseases worldwide, ranking fifth among all total causes of death (GBD 2015LRI Collaborators, 2017. Influenza A virus (IAV), measles virus, rubella virus, and mumps virus infect a wide range of hosts, are highly transmissible and are prone to a mutation that can cause pandemics in a short period of time, placing an enormous burden and pressure on public health systems (Marshall and Plotkin, 2019;Harrington et al., 2021;Iacobucci, 2022;Raghunathan and Orenstein, 2022). The incidence of IAV, measles, mumps, and rubella has decreased drastically following the implementation of vaccination programs. However, the segmented genome of IAV allows for easy recombination and antigenic shift, resulting in novel antigens (Bouvier and Palese, 2008). Thus, the ability of IAV to rapidly evolve can result in highly pathogenic viral strains. Measles and rubella remain endemic in many countries, leading to the importation of cases and occasional local transmission within China, and the resurgence of mumps has recently been reported, with outbreaks still occurring and challenges remaining while controlling these diseases (Kauffmann et al., 2021). There is considerable variation in the severity of respiratory infections caused by viral infections. There are major determinants of this variability, such as intrinsic virus pathogenicity, acquired host factors (e.g., immunity and coexisting conditions), and innate host susceptibility. While viral genetic determinants of respiratory disease severity and host immunity have been well studied, host genetic determinants have been much less explored (Kennedy et al., 2014;Voigt et al., 2018;Sabikunnahar et al., 2022).
Genome-wide association studies (GWAS) are effective methods for understanding the genetic basis of many complex traits in common human diseases (Hayes, 2013). In particular, these have proven to be well-suited for identifying common single-nucleotide polymorphism (SNP) variants with moderate to large effects on phenotypes (Gorlova et al., 2022). However, the specific biological mechanisms and functional consequences of many of the genetic variants identified by the GWAS remain unclear, especially their role in disease severity; that is, GWAS methods may miss small effect-trait associations. Gene expression is an intermediate phenotype between genetic variation and an underlying disease predisposition trait (Albert and Kruglyak, 2015). Many genetic variants affect complex traits by regulating gene expressions. Unfortunately, large-scale expression-trait associations are hampered by sample availability and cost, as well as intrinsic factors and small effects. Therefore, to address these issues, transcriptome-wide association studies (TWASs) were developed, which integrate gene expression into large-scale GWAS (Derks and Gamazon, 2020). Through extensive simulation of existing GWAS data, TWASs have identified candidate genes associated with mental disorders (Thériault et al., 2018), calcified aortic stenosis (Liu et al., 2020), pancreatic cancer (Lin et al., 2017), and inflammatory biological age . However, for many complex traits, biologically relevant tissues are unknown. Most existing studies identify gene-trait associations on the basis of a single tissue, whereby the significant gene effects that are identified get inflated. In addition, it has been shown that eQTLs with larger effects tend to regulate gene expressions in multiple tissues. The TWAS analysis of multiple tissues improves the accuracy of the results.
There have been some studies exploring the association of respiratory infectious diseases with some human tissues. Pulmonary inflammation and airway epithelial injury are hallmarks of human pulmonary infectious diseases. The association of HLA alleles with respiratory infectious diseases in lung cells has been confirmed by Zhang et al. (2022). Moreover, Xiao et al. (2023) found that interleukin-11 (IL-11) as a fibrotic factor may be associated with respiratory diseases leading to respiratory failure or even death. These potential connections deserve our attention.
In this study, we used TWASs to explore genes associated with IAV, measles, rubella, and mumps. Given that respiratory infectious diseases are associated with multiple tissues, we used eQTL reference panels for three tissues, namely, whole blood, lung, and transformed fibroblast cells, and pooled the GWAS data for IAV, measles, rubella, and mumps to identify tissue-specific susceptibility genes. This study lays the foundation for further understanding the pathogenesis of these four viruses.

GWAS data collection and processing
We collected four virus-specific immunoglobulin G (IgG) level data sets (anti-IAV IgG, anti-measles IgG, anti-rubella IgG, and anti-mumps virus IgG levels) from the NHGRI-EBI GWAS directory (https://www.ebi.ac.uk/gwas/downloads/summarystatistics). These data sets were originally obtained from 1,000 healthy individuals (MI), and their serum prevalence rates were 77.7%, 88.5%, 93.5%, and 91.2%, respectively, for these four viruses (Scepanovic et al., 2018). Considering that the main unit of anti-IAV IgG quantitative unit is S/CO (signal/cut-off ratio), the main unit of anti-measles and mumps virus IgG level is IA (index antibody), and the main unit of anti-rubella virus IgG is UI/mL. We also use SD as the change unit. Then, we screened the above data for single-nucleotide polymorphisms (SNPs) for further analysis. We retained the following SNPs: 1) autosomes (1-22), 2) minimum allele frequencies (MAF) greater than 0.001, and 3) more than 70% of observers having these specific SNPs (Yang and Zhou, 2020;Yang and Zhou, 2022). We also plotted Manhattan plots and q-q plots for the processed GWAS data sets for four traits to show the frequency and distribution of their genes ( Figure 1; Supplementary Figure S1).

Transcriptome-wide association analysis
We used FUSION to perform relevant TWAS tissue analysis using GWAS summary data formatted after the processing of whole blood, lung, and transformed fibroblast cells (Gusev et al., 2016). Specifically, FUSION uses the pre-calculated gene expression weight to perform summary statistics with respiratory infectious diseases GWAS and calculates the association between each gene and respiratory infectious diseases. Association statistics are defined as TWAS z-scores and estimated as follows: The weight of Z and W plays a crucial role in the distribution of Z TWAS . In the formula, Z is the estimated value of the SNPs for the disease under study, w t is the weight, and s is the linkage disequilibrium between all SNPs. It is worth noting that Z TWAS is only well calibrated under the zero model of Z~N (0, Σs,s) (mean 0, unit variance 0). The reference panel that we used came from the 1000 Genomes Project in Europe (Speed et al., 2020). We also downloaded from FUSION (http://gusevlab.org/projects/fusion/) pre-computed gene expression weights based on GTEx v7 feature weights and used these for the research of each tissue. Based on the highest cross-validation R 2 , FUSION selects the best expression prediction model among the top eQTL, the best linear predictor (BLUP) (Gusev et al., 2016), the Bayesian linear mixed model (BSLMM) (Zhou et al., 2013), the least absolute shrinkage and selection operator (LASSO) (Tibshirani, 1996;Veturi and Ritchie, 2018), and the elastic net (Hui and Hastie, 2005) to calculate the SNP expression weight within a 1 Mb window for a given gene. We also applied the Bonferroni correction to explain multiple hypotheses and set the p-value threshold to 0.01.
GSTM4 belongs to the mu class of glutathione S-transferase (GST). Compared to other GSTMs (Comstock et al., 1993), GSTM4 exhibits greater consistency in amino acid sequence but has significant differences in physicochemical properties and tissue distribution. GSTM4 does not exhibit activity comparable to standard TPS substrates, and its specific substrate has not been identified. It is well known that the GST gene is highly polymorphic to adapt to the growing number of exotic compounds (Hayes and Strange, 2000). This polymorphism can alter individual susceptibility to the disease and response to therapeutic agents. For example, the T2517C polymorphism in GSTM4 has been shown to be associated with an increased risk of lung cancer (Liloglou et al., 2002). The mechanism underlying this association is not yet clear.
Currently, there are few reports of the relationship between the GSTM4 gene and the disease. Efforts may be needed to verify its role in infectious respiratory diseases.
In addition to the aforementioned genes, we have identified other candidate genes (SOAT1, HCG11, PAQR6, and AP3S2) in different tissues, of which AP3S2 has been reported to be associated with the development of type 2 diabetes (Kazakova et al., 2017). Other genes have also been implicated in cancer. For example, SOAT1 has been linked to the development of stomach cancer (Zhu et al., 2021), and pancreatic cancer (Oni et al., 2020); HCG11 may influence the development of nasopharyngeal carcinoma (Zheng et al., 2022); the PAQR6 gene has also been associated with several cancers (Yang et al., 2021). Although there is little research on these genes and respiratory infectious diseases, the potential links between these genes deserve further attention in future studies.
In summary, our results list some genes for respiratory infections that have not been studied before. Based on the previous knowledge of these genes, this means that they may be involved in host infection through transcriptional processes of susceptible genes and RNA degradation processes. This study is helpful to deepen the understanding of the pathogenesis of respiratory infectious diseases.

Data availability statement
Publicly available data sets were analyzed in this study. These data can be found here: https://www.ebi.ac.uk/gwas/downloads/ summary-statistics.

Author contributions
XZ, YZ, and LJ contributed equally to the conception, design, and execution of the study, as well as to the analysis and interpretation of data. XZ, YZ, LJ, XY, YZo, JT, JL, SY, RY, and PH drafted, revised, and approved the manuscript. RY and PH are the corresponding authors who took responsibility for the final submission. All authors agree to be accountable for all aspects of the work, ensuring the accuracy and integrity of the content.

Funding
This study was sponsored by the National Natural Science Foundation of China (82173585 and 82273741), the Natural Science Foundation of Jiangsu Province (BK20190106), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).