Whole-Exome Sequencing Implicates the USP34 rs777591A > G Intron Variant in Chronic Obstructive Pulmonary Disease in a Kashi Cohort

Genetic factors are important factors in chronic obstructive pulmonary disease (COPD) onset. Plenty of risk and new causative genes for COPD have been identified in patients of the Chinese Han population. In contrast, we know considerably little concerning the genetics in the Kashi COPD population (Uyghur). This study aims at clarifying the genetic maps regarding COPD susceptibility in Kashi (China). Whole-exome sequencing (WES) was used to analyze three Uyghur families with COPD in Kashi (eight patients and one healthy control). Sanger sequencing was also used to verify the WES results in 541 unrelated Uyghur COPD patients and 534 Uyghur healthy controls. WES showed 72 single nucleotide variants (SNVs), two deletions, and small insertions (InDels), 26 copy number variants (CNVs), and 34 structural variants (SVs), including g.71230620T > A (rs12449210T > A, NC_000,016.10) in the HYDIN axonemal central pair apparatus protein (HYDIN) gene and g.61190482A > G (rs777591A > G, NC_000002.12) in the ubiquitin-specific protease 34 (USP34) gene. After Sanger sequencing, we found that rs777591“AA” under different genetic models except for the dominant model (adjusted OR = 0.8559, 95%CI 0.6568–1.115, p > .05), could significantly reduce COPD risk, but rs12449210T > A was not related to COPD. In stratified analysis of smoking status, rs777591“AA” reduced COPD risk significantly among the nonsmoker group. Protein and mRNA expression of USP34 in cigarette smoke extract-treated BEAS-2b cells increased significantly compared with those in the control group. Our findings associate the USP34 rs777591“AA” genotype as a protector factor in COPD.


INTRODUCTION
Chronic obstructive pulmonary disease (COPD) is a common and complicated disease of the lungs. It is characterized by continuous and irreversible airflow destruction due to chronic inflammation (Lozano et al., 2012). As of 2020, COPD is expected to become the fifth most prevalent burden of disease and the third leading cause of death worldwide (Vestbo et al., 2013).
Our epidemiological investigation shows the COPD prevalence in Kashi (Xinjiang, China) to be 17.01% , which is higher than that in other parts of China. COPD development involves environmental factors (e.g., smoking, air pollution), genetic susceptibility, and infection (Zhang et al., 2014). A trend of familial aggregation in COPD patients is also documented . Scholars have found that some genes might be related to COPD genetic susceptibility of the Chinese Han population (Wang C. et al., 2018;Zhang et al., 2020). In contrast, the study for COPD is still lacking in Kashi.
Kashi city is located in the northwestern China, and more than 90% of the total local population in Kashi are Uyghurs . The Uyghurs demonstrate a range of mixed Asian and European anthropological features (Xu and Jin 2008). At the same time, they usually do not marry other ethnic groups. Hence, their genomes are significantly different compared with the Han populations. Furthermore, the history, geographic location, and local customs of Kashi are quite different from other regions of China (Abuzhalihan et al., 2019;Abudureheman et al., 2021). These features make the Kashi Uyghurs a resourceful population for describing the ethnicity-specific variants associated with COPD.
The HYDIN axonemal central pair apparatus protein (HYDIN) gene is located on human chromosome 16 with a length of 423 kb (Laske et al., 2013). Population studies show that HYDIN mutations can cause primary ciliary dyskinesia (PCD) (Shapiro et al., 2018). PCD is a genetic disease with abnormal ciliary motility and is characterized by chronic respiratory infections (Olbrich et al., 2012;Kurkowiak et al., 2015). Respiratory infection is known to play an important role in the pathogenesis and progression of COPD (Leung et al., 2017).
The ubiquitin-specific protease 34 (USP34) gene is a member of the ubiquitin-specific protease family. USPs can regulate cell growth (Liu et al., 2019) and inhibit apoptosis (de Castro et al., 2019). Moreover, USP34 regulates the Wnt pathway (Lui et al., 2011) and plays an important role in DNA damage (Sy et al., 2013). Studies confirm that USP34 also has an influence in the NF-κB pathway (Truong et al., 2021). DNA damage, excessive apoptosis, and the NF-κB pathway are known to be associated with the development of COPD (Neofytou et al., 2012;Sauler et al., 2018;Canadas et al., 2020;Wang et al., 2021).
In the present study, eight people with COPD and one healthy person from three Uyghur families with COPD in Kashi were subjected to whole-exome sequencing (WES) to screen for the susceptibility genes and polymorphisms related to COPD. The two single nucleotide variants (SNVs) (rs12449210T > A in HYDIN and rs777591A > G in USP34) were determined to be studied because they were found for the first time in the Kashi COPD population. Furthermore, the association between the two SNVs and COPD risk have not been previously described. Based on the WES data sets, we continued to evaluate their relationship with clinical characteristics in a case-control study of 1075 people. Then, we hypothesized that the presence of SNVs combined with environmental factors (e.g., smoking) might regulate gene expression, thereby increasing the risk of COPD. Also, protein and mRNA expression of them were determined in cigarette smoke extract (CSE)-treated bronchial epithelial beas-2b (BEAS-2b) cells.

Collection of Information of COPD Families for WES
Eight COPD patients and one healthy control (HC) were selected for WES. All nine people were permanent residents of Kashi and aged >40 years. These Uyghur families were derived from a previous epidemiological investigation of COPD in Kashi. Families had ≥2 generations with COPD in three generations. The inclusion and exclusion criteria for the subjects were described by Gong et al. (2022).
Peripheral blood samples (5 ml) were obtained from each participant and transferred to EDTA-K 2 Vacutainer tubes for DNA extraction. Apart from blood samples, the basic information of patients and HC were also collected: age, sex, body mass index (BMI), pulmonary function, smoking history, and other data. All pulmonary function tests were undertaken using a spirometer (Cosmed, Rome, Italy).

Collection of Samples for a Case-Control Study
Recruited individuals were aged >40 years and from Kashi during 2018-2019. The study cohort was 541 Uyghur unrelated COPD patients and 534 Uyghur HCs from the First People's Hospital of Kashi. For the COPD group, the inclusion criteria were people whose forced expiratory volume in the first second (FEV 1 )/forced vital capacity (FVC) < 0.70 after bronchodilator inhalation denotes airflow limitation. The exclusion criteria are identical to COPD family patients. For the HC group, the exclusion criteria were the same as HC of COPD family.
After providing written informed consent, all individuals were required to provide the same basic information as the family subjects. Similarly, 5 ml peripheral blood samples were obtained from each of the 1075 subjects for DNA extraction.

Genetic Analyses
Genetic analyses, including WES, data analyses, variants selection, filtering strategy, Sanger sequencing, enrichment analyses, chemicals and reagents, Western blotting, and realtime RT-qPCR, had been described in the supplemental methods.

Statistical Analyses
SPSS 18 (IBM, Armonk, NY, United States) and PLINK v1.07 (pngu.mgh.harvard.edu/∼purcell/plink/index.shtml) were employed for statistical analyses. Quantitative data are the mean ± SD or median (interquartile range). The independent t-test was used to compare the difference between age and BMI. We used the chi-square test to ascertain the influence of sex, smoking status, coal consumption, wood consumption, cigarettes per day, and cumulative quantity of active smoking. The difference in quitting smoking years was assessed by Fisher's exact test. For parameters that did not have a normal distribution (e.g., annual household income, FEV 1 %, and FEV 1 /FVC), we used the Mann-Whitney U-test to evaluate differences. We tested for the Hardy-Weinberg equilibrium on each SNV among Frontiers in Cell and Developmental Biology | www.frontiersin.org February 2022 | Volume 9 | Article 792027 samples in the case-control study. Akaike's information criteria (AIC) were used to calculate the genetic model for rs12449210 T > A and rs777591A > G (including genotype, dominant, recessive, allele, and additive). We used odds ratios (ORs) and corresponding 95% confidence intervals (CIs) by logistic regression analysis after correcting for sex, age, BMI, and smoking status to assess the relationship between SNVs and case-control groups. R function genpwr in package devtools was used to perform post hoc power estimation. The homogeneity test of stratified analyses was evaluated by the chi-square test. In the enrichment analyses, verification criteria of multiple SNVs were evaluated using the false discovery rate (FDR) with multiple corrections. The experimental data of Western blotting and real-time RT-qPCR were the mean ± SEM, and differences between groups were evaluated by oneway ANOVA. The replicates for the experiments were at least three times. p < .05 was considered significant.

Family Members With COPD
WES WES was undertaken for Uyghur family members with COPD to identify common and significant variants. The genealogical trees of the three Uyghur families are described in Figure 1A. Eight patients and one healthy subject were included in Supplementary. Table S1. The family members with COPD were aged 48-90 (mean age 63.63) and had always lived in Kashi.
After WES, there were ∼0.26 Tb of sequencing data. The minimum value of sequencing depth was 30×, the average sequencing depth was 116.29 ×. Q20 of all samples was >95%, Q30 was >90%, and the average coverage rate was >99.5%. We detected 100,193 SNVs, 11,009 deletions, and small insertions (InDels), 2915 copy number variants (CNVs), and 29,016 structural variants (SVs). To facilitate observation of all variants in family members with COPD, we used Circos to display all variants ( Figure 1B).

Analysis for SNVs
After WES, we obtained 100,193 SNVs. We undertook statistical analyses of all SNVs (Supplementary Figure S1). We found that the top three regions with the largest number of SNVs were the intron (43.47%), exon (42.52%), and noncoding RNA (5.76%) (Supplementary Figures S1A-C).

FIGURE 1 | Family information and variants. (A):
Genealogical trees of three Uyghur families with COPD. Square male, and circle female. A square with a slash represents a dead male, and a circle with a slash denotes a dead female. A solid square represents a male with COPD, and the solid circle denotes a female with COPD. A solid square with "R" represents a COPD male who refused to join our study, and the solid circle with "R" denotes a COPD female who refused to join our study. (B): Summary of all variations among exomes of family members with COPD. There are nine circles of information represented from the outside to the inside of this. Panel 1. Chromosome. 2. Map of SNVs. The density calculation is based on the number of SNVs in each window using log 10 . The color changes from red, yellow, and blue. 3. Density graph of inserts. The density calculation is based on the number of inserts in each window using log 10 . 4. Density of deletions. The density calculation is based on the number of deletions in each window using log 10 . 5. Density map of the variation sites in coding regions, including SNVs and InDels. The density calculation is based on the various sites in each window using log 10 . 6. Density map of the mutation sites in the noncoding area, including SNVs and InDels. The density calculation is based on the number of mutation sites in each window using log 10 . 7. Map of CNV locations. The area size represents the size of CNV. Red gain, blue loss (when the number of samples >1, there is no circle After drawing a heat map for all 2915 CNVs, we detected the gain and loss of each chromosome (Supplementary Figure S2). Finally, we obtained two CNVs (Supplementary Table S4).
After analyzing the distribution of SVs in different regions of the genome, we found that the top three regions with the largest number of SVs were the intron (55.99%), exon (18.65%), and 3′ UTR (8.24%) (Supplementary Figure S1F). Finally, we obtained 34 SVs (Supplementary Table S5).
In conclusion, we obtained 72 SNVs, two CNVs, 26 InDels, and 34 SVs by the aforementioned filtering strategy. The top 30 q-value GO terms among the categories of "cell component," "molecular function," and "biological process." The abscissa "rich factor" represents the ratio of input or background frequency in the enrichment-analysis result. The bubble size represents the number of genes annotated to this functional entry for the mutant gene, and the color corresponds to the q-value in the enrichment-analysis result.

Enrichment Analyses of Common Mutant Genes in Family Members with COPD
We selected the top 30 q-value GO terms, pathways, and diseases to draw a bubble chart (Figures 2A-C). We also selected the top  . Anti-β-actin antibody was used as a control in Western blotting. *p < .05, **p < .01, and ***p < .001, versus control or 24-h group.
Frontiers in Cell and Developmental Biology | www.frontiersin.org February 2022 | Volume 9 | Article 792027 30 q-value GO terms among "cell component," "molecular function," and "biological process" to draw bubble charts ( Figures 2D-F). The enrichment analyses indicate common mutant genes were related to cilium-or flagellum-dependent cell motility, ciliary cytoplasm, axoneme, calcium-ion binding, olfactory transduction, ABC transporters, obesity-related traits, and congenital disorders of metabolism.

Clinical Characteristics of the COPD and Non-COPD Groups
A total of 1075 individuals were invited to participate ( Table 1). The study cohort comprised 541 Uyghur COPD patients and 534 Uyghur HCs. Significant differences between COPD cases and HCs were observed with regard to age, sex, BMI, smoking status, FEV 1 %, and FEV 1 /FVC (p < .05 for all) but not for annual household income, coal consumption, wood consumption, cigarettes per day, cumulative quantity of active smoking, and quitting smoking years (p > .05 for all) (Supplementary Table S6).

Hardy-Weinberg Equilibrium of rs12449210T > A and rs777591A > G
Rs12449210T > A and rs777591A > G both met the criteria for the Hardy-Weinberg equilibrium (p > .05) ( Table 2). Therefore, rs12449210T > A and rs777591A > G could be analyzed further.

Stratified Analysis of rs12449210T > A and rs777591A > G in the Case-Control Study
We undertook stratified analysis for the relationship between rs12449210T > A and rs777591A > G and COPD risk based on smoking status and FEV 1 %.
For rs12449210T > A (Supplementary Table S7), no differences were observed in smokers and nonsmokers before or after correction of potentially influencing factors.
Rs12449210T > A and rs777591A > G did not show an obvious difference in FEV 1 % (Supplementary Table S8).

Expression of USP34 and Iκbα in CSE-Treated BEAS-2b Cells
We wished to determine the impact of CSE on expression of USP34 and Iκbα. We measured expression in BEAS-2b cells treated with 0%-3% CSE for 24 h. Compared with the control group, 3% CSE obviously increased expression of USP34 and Iκbα protein (Figures 3E,F) and mRNA ( Figures 3A,B).
To determine the impact of different treatment times on expression of USP34 and Iκbα, we measured expression in BEAS-2b cells treated with 3% CSE for 24, 48, and 72 h. Compared with the 24 h group, treatment for 48 and 72 h significantly increased expression of USP34 and Iκbα protein ( Figures 3G,H) and mRNA ( Figures 3C,D).

DISCUSSION
A total of 72 SNVs, two CNVs, 26 InDels, and 34 SVs were obtained by WES from three Uyghur families with COPD. Two SNVs, rs12449210T > A of HYDIN and rs777591A > G of USP34, were chosen to verify in a large Uyghur population. We found that "AA" in rs777591A > G was a protective factor against COPD, whereas rs12449210T > A was not related to COPD susceptibility in Kashi COPD population. mRNA and protein expression of USP34 and Iκbα were obviously increased in CSEtreated BEAS-2b cells in vitro study.
WES has been used widely in the discovery of pathogenic genes (Adalsteinsson et al., 2017;Epi25 Collaborative, 2019;Petrovski et al., 2019). WES could be used to detect high-frequency, lowfrequency, and even rare variants that have important roles in the occurrence of complex diseases and could be employed to discover new variants (Meienberg et al., 2015).
An approach, used in Mendelian diseases of high penetrance, is to use a direct filtering strategy in a few families to identify rare or novel variants (Qiao et al., 2016). Many studies identify and validate the candidate variants that were discovered using this approach. They performed WES in a small sample size of family patients to identify potentially related causative genes. The candidate variants were then assessed in a large sample size of case-control population by Sanger sequencing (Mescheriakova et al., 2019;Cetani et al., 2020;Engelbrecht et al., 2020;Froukh et al., 2020).
We undertook WES on eight Uyghur patients from three families with COPD. We showed a DNA-variation map of Frontiers in Cell and Developmental Biology | www.frontiersin.org February 2022 | Volume 9 | Article 792027 COPD patients (SNVs, InDels, CNVs, and SVs) at the genetic level, and found 72 SNVs of 55 genes that might be associated with COPD. Scholars have found four genes that might be related to COPD using WES in 49 families with COPD: DNAH8, ALCAM, RARS, and GBF1. The main populations they studied were Caucasian and African Americans (Regan et al., 2010;Qiao et al., 2016). Also, the patients they studied were aged <53 years with FEV 1 % ≤40% before bronchodilator inhalation. In our study, we studied a Uyghur population aged >40 years with FEV 1 /FVC <0.70 after bronchodilator inhalation. Those were the differences between ours and previous studies. Furthermore, the final 72 SNVs of 55 genes for COPD identified in our study were different from those determined in previous studies. On the one hand, previous studies selected rare variants, according to the common disease-rare mutation hypothesis (Khera et al., 2018) and most candidate gene methods (Gibson 2012). We did not select rare variants as the research targets. On the other hand, the genetic backgrounds of different ethnicities are different, and even though the same filtering strategy was used, the results might be different . Consequently, rs12449210T > A of HYDIN and rs777591A > G of USP34 in our study were found for the first time in a Uyghur COPD population in Kashi.
We evaluated only one HC as a reference; the selected 72 SNVs might contain the SNVs of healthy people. Hence, it was necessary to verify the relationship between these 72 SNVs and COPD in a case-control study.
In the case-control study of two SNVs (rs12449210T > A and rs777591A > G), they were analyzed through genetic models, tobacco smoking, and pulmonary function. Because the PLINK software defaulted a higher mutation frequency to a wild type, in Asia, the "G" mutant frequency of rs777591A > G was higher than "A" (via NCBI database: https://www.ncbi.nlm.nih.gov/). Therefore, our data determined "AA" in rs777591A > G (USP34) to be a protective factor.
Smoking is the main risk factor for COPD (Vogelmeier et al., 2017); rs777591"AA" showed weak protection. However, in the stratified analysis of smokers and nonsmokers, the protective effect of rs777591"AA" was meaningful only in the nonsmoker group. Based on this analysis, we think rs777591"AA" of USP34 had only a weak protective effect against COPD in the nonsmoker group.
In the present study for the COPD population in Kashi, the allele "A" frequency of rs777591A > G is 35.13%, and the COPD prevalence is 17.01% . In East Asia, the frequency of allele "A" is 32%, but the prevalence of COPD in China (East Asia), Korea (East Asia), and Japan (East Asia) is 13.7% , 7.82% (Kwon and Kim 2016), and 10.9% (Tan and Ng 2008), respectively. Although South Asia has a higher allele "A" frequency (45%), the prevalence of COPD in countries such as India and Nepal is 4.1% and 18% (Tan and Ng 2008), respectively. Thus, despite the similar frequency of allele "A", the incidence of COPD varied considerably. This might be because the pathogenesis of COPD was not caused by genes alone, but by the interaction of genetic factors, environmental factors (e.g., tobacco smoking, air pollution, wood and coal consumption) and economic status (Shetty et al., 2021). The protective effect of "A" in rs777591A > G might only be demonstrated in the Kashi COPD population at present, and further validation in different ethnic groups or different regions is required.
HYDIN mainly encodes the C2b protein on the central microtubules of motile cilia to adjust the amplitude of the cilia swing and the synergy between cilia (Olbrich et al., 2012). The influence of HYDIN on the cilia ultrastructure could lead to abnormal cilia function and, ultimately, reduce the ability to remove foreign bodies (Lechtreck et al., 2008). Some studies show that, in COPD patients, the cilia on the mucosal surface of the respiratory tract are shortened, and cilia motility is reduced, resulting in significant impairment of foreign-body excretion (Hedstrom et al., 2021). Therefore, we speculated that HYDIN might have a key role in the occurrence and development of COPD. In this study, the rs12449210T > A of HYDIN located in 5′ UTR. HYDIN might be related to the abnormal function of respiratory-tract mucosal cilia in COPD, but we showed in a case-control study that HYDIN had little effect on the occurrence and development of COPD. This did not mean that HYDIN was not associated with COPD in other ethnicities.
Studies show that USP34 inhibits nuclear factor-kappa B (NF-κB) signal transduction by deubiquitinating and stabilizing the NF-κB inhibitor Iκbα (Li et al., 2020). Studies on the inflammatory cell surface and bronchial epithelial biopsies in COPD cases show that the NF-κB pathway is highly activated (Di Stefano et al., 2002). NF-κB expression has also been found to be increased significantly in the lung tissues of animals with COPD (Yang et al., 2009). Therefore, USP34 might have a key role in the occurrence and development of COPD. rs777591A > G is located in the intron region at HaploReg v4.1 (ANNOVAR annotated it in 3′UTR of USP34). Although rs777591A > G is located in the noncoding regions, genome-wide association studies (GWASs) indicate that only 7% single nucleotide polymorphisms (SNPs) related to complex diseases were located in the coding regions, whereas 93% were in the noncoding regions . We showed that rs777591"AA" could reduce COPD, so it is a protective factor against COPD. In our in vitro study, mRNA and protein expression of USP34 and Iκbα in CSE-treated BEAS-2b cells were obviously higher than those in controls.
We note important limitations of our experimental setting. First, only one non-COPD sample of a COPD family was used to be a control in WES. This may result in containing familyspecific variants that are not associated with COPD. It would increase costs and reduce experimental efficiency for casecontrol validation. Second, the recruited participants were all Uyghurs, and the results might not be representative for ethnic populations. Also, the sample size of the smoking population in the case-control group was small. In addition, although USP34 mRNA and protein expression levels were increased in CSEtreated BEAS-2b cells, the specific role of USP34 in COPD pathogenesis is unclear.
Frontiers in Cell and Developmental Biology | www.frontiersin.org February 2022 | Volume 9 | Article 792027 Due to the above limitations, we will conduct multicenter studies and expand the sample size to further validate our data. Further experiments (including cellular experiments in genetic models, knockout or overexpressed animal experiments) are also required.

CONCLUSION
In the present study, WES revealed 72 SNVs, two CNVs, 26 InDels, and 34 SVs in three families with COPD. rs777591 "AA" of USP34 was a protective factor against COPD, especially in the nonsmoking population. USP34 mRNA and protein expression levels were increased in CSE-treated BEAS-2b cells. Therefore, our data provide new clues for the relationship between USP34 polymorphisms and COPD susceptibility in a Chinese Uyghur population.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: PRJNA785331

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Ethics Committee of the First People's Hospital of Kashi (2019-95). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
XZ and LL conceived and designed the study. JX and JR carried out the experiments and wrote the first version of the manuscript. XZ, CX, and AZ conducted the variants analyses. AA, MT, and SZ recruited individuals and undertook clinical work. LT and DH carried out statistical analyses. All authors read and approved this version of the manuscript.

FUNDING
This study was supported by the Xinjiang Uygur Autonomous Natural Science Foundation of China (Grant numbers: 2019D01C006 and 2017D01C009).