Validations of Top and Novel Susceptibility Variants in All-Age Chinese Patients With Acute Lymphoblastic Leukemia

Through genome-wide association studies (GWAS), multiple inherited predispositions to acute lymphoblastic leukemia (ALL) have been identified in children. Most recently, a novel susceptibility locus at ERG with a Hispanic-specific effect was identified. In this study, we conducted a replication study in all-age Chinese patients (N = 451), not only to validate the novel ERG locus but also to systematically determine the impact of age on the associations of the top GWAS signals. We found that single nucleotide polymorphisms at ARID5B, IKZF1, CEBPE, and PIP4K2A were significantly associated with ALL susceptibility only in childhood patients without BCR-ABL fusion, whereas the GATA3 signal was significant in adults regardless of BCR-ABL fusion status. Moreover, the novel ERG SNP was validated in pediatric patients lacking both BCR-ABL and ETV6-RUNX1 fusions. Our findings suggest that age modifies genetic predisposition to ALL and highlight the relevance of the ERG SNP in Chinese patients.


INTRODUCTION
Acute lymphoblastic leukemia (ALL) is a leading cause of disease-related death, especially in children aged 2 to 5 years, and clinical statistics show that older age is associated with poorer prognosis and lower survival rates in ALL patients. The inferior prognosis of adult ALL patients is likely multifactorial, including age-related differences in leukemic blasts and host genomic alterations. The mechanisms of leukemogenesis are thought to differ between children and adults; for example, somatic BCR-ABL fusion (also known as Philadelphia chromosome-positive, Ph+) is more frequent in adults (Inaba et al., 2013). Racial disparities in incidence and treatment outcomes have also been observed, highlighting the importance of age- and ethnicity-related studies of leukemogenesis and clinical practice.
By conducting genome-wide association studies (GWASs), we and other independent groups identified multiple top inherited predispositions to ALL susceptibility, including single nucleotide polymorphisms (SNPs) at ARID5B, IKZF1, and GATA3, among others (Papaemmanuil et al., 2009; Trevino et al., 2009; Perez-Andreu et al., 2013; Xu et al., 2013, 2015; Vijayakrishnan and Studd, 2018; Wiemels et al., 2018). The associations of these SNPs are strongly influenced by clinical characteristics such as age, ethnicity, and subtype (Xu et al., 2012; Perez-Andreu et al., 2015; Liao et al., 2016). For instance, we and another independent group noticed that age influenced the associations of ARID5B and GATA3 SNPs with ALL risk in opposite directions in Caucasians and Hispanics (Migliorini et al., 2013; Xu et al., 2013). We also recently identified a novel locus in the ERG gene (rs2836365) whose effect appears to be Hispanic-specific and varies across subtypes (Qian et al., 2018). This locus has since been validated in independent studies (de Smith et al., 2019; Vijayakrishnan et al., 2019), but these studies were performed predominantly in children without BCR-ABL fusion, and Ph+ adolescents/adults and Chinese patients have not been systematically investigated.
In this study, we sought to investigate the association of the reported top GWAS signals (i.e., ARID5B, IKZF1, GATA3, PIP4K2A, CEBPE, and ERG) with ALL susceptibility in all-age Chinese patients, and to estimate the impact of BCR-ABL fusion and age at diagnosis.

Subjects and Genotyping
Peripheral blood was obtained from 456 non-ALL individuals and from 451 all-age B-lineage ALL patients, who were treated with standard protocols (e.g., CCGC-ALL2015, registered at http://www.chictr.org.cn/ with ID ChiCTR-IPR-14005706) in the Department of Hematology/Oncology, West China Hospital and West China Second Hospital (Zhu et al., 2018; Cao et al., 2020). Clinical information for each patient, including gender, age and white blood cell (WBC) count at diagnosis, and molecular subtype, was obtained from the electronic records systems of our hospitals. Fusion-based molecular subtypes were determined by fluorescence in situ hybridization, and hyperdiploidy was determined by flow cytometry-based DNA index. In this study, patients aged ≤14 years were considered children, and the rest were considered adolescents/adults (hereafter referred to as "Adults").
Nine SNPs at six loci (ARID5B, IKZF1, GATA3, PIP4K2A, CEBPE, and ERG) were directly genotyped by Sanger sequencing. To increase statistical power, we also retrieved genotype information for a large Chinese Han population from a public dataset (Cai et al., 2017). After filtering out individuals with missing genotypes for any of the SNPs analyzed in this study, genotype information for 10,640 of the 11,670 individuals in this public database was used for further association analyses. Risk allele frequencies (RAF) were compared between these two control cohorts. Because the prevalence of ALL in China is less than 1 in 10,000, unscreened population controls are very unlikely to include ALL patients; the two control cohorts were therefore combined as non-ALL controls (11,096 in total) for SNPs showing no significant difference in RAF. This study was approved by the Ethics Committee of West China Hospital and West China Second Hospital, and informed consent was obtained from patients or their guardians, as appropriate.
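The comparison of risk allele frequencies between the two control cohorts can be illustrated with a Pearson χ² test on a 2 × 2 table of allele counts (cohort × allele). The manuscript does not state which test was used for this comparison, so the uncorrected Pearson χ² test with 1 degree of freedom, the function name allelic_chi_square, and the allele counts in the usage lines are assumptions made for illustration only, not study data.

```python
import math


def allelic_chi_square(risk_a, total_a, risk_b, total_b):
    """Pearson chi-square test (1 df, no continuity correction) comparing
    risk allele counts between two cohorts.

    risk_*  : number of risk alleles observed in the cohort
    total_* : total number of alleles in the cohort (2 x number of individuals)
    Returns (chi-square statistic, P value).
    """
    # 2 x 2 table: rows = cohorts, columns = (risk allele, other allele)
    table = [[risk_a, total_a - risk_a],
             [risk_b, total_b - risk_b]]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical allele counts for 456 vs 10,640 individuals (illustration only).
chi2, p = allelic_chi_square(228, 912, 5320, 21280)
print(f"chi2 = {chi2:.3f}, P = {p:.3f}")  # identical RAF (0.25): chi2 = 0, P = 1
```

Under the pooling rule described above, a SNP whose P value from such a comparison is not significant would have its two control cohorts merged; working on allele counts (2 per individual) matches the RAF-based framing of the comparison.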

Statistical Analysis
All SNPs passed quality control based on call rate (at least 95% of patients successfully genotyped for each SNP) and Hardy-Weinberg equilibrium (P > 0.05, χ² test). The association of each SNP with ALL susceptibility was estimated by comparing genotype frequencies between ALL cases and non-ALL controls using a logistic regression model. P values, odds ratios (OR) and 95% confidence intervals (95% CI) were estimated using R (version 3.5), and a two-sided P < 0.05 was considered statistically significant.
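The two quality-control criteria can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the genotype encoding (0, 1 or 2 copies of the alternate allele, None for a failed call), the helper names call_rate, hwe_chi_square and passes_qc, and the use of the Pearson goodness-of-fit χ² test with 1 degree of freedom for Hardy-Weinberg equilibrium are all assumptions. The manuscript also does not specify whether Hardy-Weinberg equilibrium was tested in cases, controls, or both.

```python
import math


def call_rate(genotypes):
    """Fraction of samples with a genotype call (None marks a failed call)."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def hwe_chi_square(n_hom_ref, n_het, n_hom_alt):
    """Pearson goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Returns (chi-square statistic, P value).
    """
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    observed = [n_hom_ref, n_het, n_hom_alt]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    # Skip empty expected classes (monomorphic SNP) to avoid division by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def passes_qc(genotypes, min_call_rate=0.95, hwe_alpha=0.05):
    """Apply both criteria: call rate >= 95% and HWE P > 0.05.

    genotypes: per-sample alternate-allele counts (0, 1, 2) or None if missing.
    """
    if call_rate(genotypes) < min_call_rate:
        return False
    called = [g for g in genotypes if g is not None]
    counts = [called.count(k) for k in (0, 1, 2)]
    _, p_value = hwe_chi_square(*counts)
    return p_value > hwe_alpha


# Hypothetical genotype counts (illustration only).
print(hwe_chi_square(25, 50, 25))      # (0.0, 1.0): exactly at equilibrium
print(passes_qc([0] * 50 + [2] * 50))  # False: no heterozygotes, chi2 = 100
```

A SNP is kept only if it clears both thresholds; a SNP genotyped in fewer than 95% of samples is rejected before the equilibrium test is run.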

RESULTS
A total of 451 all-age B-lineage ALL patients were included in this study and divided into two age groups: 294 children (median age = 4.5 years; ≤14 years) and 157 adolescents/adults (median age = 37 years; range 14-68). Baseline characteristics of the ALL patients are summarized in Table 1. BCR-ABL fusion was taken into account in the association tests because it is enriched in adults and has rarely been considered in previous association studies. All nine SNPs were directly genotyped in all cases and in 456 non-ALL controls, with an overall call rate >95%. To increase statistical power, genotypes of these nine SNPs were also retrieved for 10,640 Han Chinese individuals from a public dataset and compared with our non-ALL controls in terms of risk allele frequencies (RAF). No significant difference was observed for any of these SNPs, so we combined the two control cohorts (N = 11,096) for the association analyses (Table 2).
We next performed association analyses in the two age groups separately. Interestingly, ARID5B and GATA3 SNPs were significantly associated with ALL susceptibility only in children and only in adolescents/adults, respectively (Table 3). In addition, the independent IKZF1 signal was associated with ALL in both age groups, with an even higher odds ratio in adult than in childhood Ph− patients (1.55 vs. 1.36), whereas the top IKZF1 SNP (i.e., rs11978267) was significant only in childhood Ph− patients (Table 3). SNPs at the PIP4K2A locus also tended to be associated with ALL susceptibility in children, especially those with the hyperdiploid subtype [e.g., rs4748793, P = 0.02, OR = 1.93 (1.09-3.40)].
We paid particular attention to the novel Hispanic-specific ALL risk signal at the ERG locus (rs2836365), which showed only a marginally significant association with ALL susceptibility in children [P = 0.09, OR = 1.17 (0.98-1.40)] and no association in adults (P = 0.76). Because the rs2836365 risk allele was previously found to be under-represented in patients with ETV6-RUNX1 fusion (Qian et al., 2018), we excluded patients with ETV6-RUNX1 and Ph+ subtypes, after which the association in children reached significance [P = 0.04, OR = 1.23 (1.01-1.52)]. Next, the RAF of rs2836365 was determined in each ALL subtype and age group. Consistent with observations in Hispanics (Qian et al., 2018), the risk allele was enriched in the TCF3-PBX1 subtype (RAF = 0.36, Figure 1), although the difference was not significant, probably because of the small sample size.
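The odds ratios, 95% CIs and P values reported above come from logistic regression fitted in R. As an illustration only, the following sketch fits a single-SNP logistic model by Newton-Raphson in Python and derives the OR, a Wald 95% CI and a two-sided Wald P value. The additive (per-risk-allele) genotype coding, the Wald interval with a normal quantile of 1.96, and the function names are assumptions, since the manuscript does not state the genetic model; in R, a comparable model would typically be fitted with glm(case ~ dosage, family = binomial). The toy data are hypothetical.

```python
import math


def _score_and_information(b0, b1, dosages, is_case):
    """Score vector and Fisher information for logit P(case) = b0 + b1 * dosage."""
    s0 = s1 = i00 = i01 = i11 = 0.0
    for x, y in zip(dosages, is_case):
        mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # fitted P(case | dosage)
        w = mu * (1.0 - mu)
        s0 += y - mu
        s1 += (y - mu) * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (s0, s1), (i00, i01, i11)


def additive_logistic(dosages, is_case, max_iter=100, tol=1e-10):
    """Per-allele OR, Wald 95% CI and two-sided Wald P for one SNP.

    dosages: risk-allele counts (0, 1, 2); is_case: 1 = ALL case, 0 = control.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (s0, s1), (i00, i01, i11) = _score_and_information(b0, b1, dosages, is_case)
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2 x 2 system  information * step = score.
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    # Standard error of b1 from the inverse information at the final estimate.
    _, (i00, i01, i11) = _score_and_information(b0, b1, dosages, is_case)
    se = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    odds_ratio = math.exp(b1)
    ci = (math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se))
    p_value = math.erfc(abs(b1 / se) / math.sqrt(2))
    return odds_ratio, ci, p_value


# Hypothetical toy data (illustration only).
dosages = [0] * 100 + [1] * 100
is_case = [1] * 40 + [0] * 60 + [1] * 60 + [0] * 40
odds_ratio, (ci_low, ci_high), p_value = additive_logistic(dosages, is_case)
print(f"OR = {odds_ratio:.2f} ({ci_low:.2f}-{ci_high:.2f}), P = {p_value:.3g}")
```

As a sanity check, when the predictor takes only the values 0 and 1 the model is saturated, so the estimate reduces to the 2 × 2 cross-product ratio: here OR = (60 × 60) / (40 × 40) = 2.25.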

DISCUSSION
Racial differences in cancer risk and survival are well documented, but their biological basis remains poorly explored (Schafer and Hunger, 2011). Although genomic screening of non-Caucasian populations has increased, the most comprehensive records and novel findings in cancer genomics, including in ALL, are still based predominantly on Caucasian populations, with limited data from Chinese populations.
In this study, we systematically analyzed the associations of the reported top and novel GWAS signals in all-age Chinese ALL patients, demonstrating a large effect of age on genetic predisposition to ALL. The top signal at the ARID5B locus showed the strongest association and was significant only in childhood patients, which further supports its importance in Chinese patients (Wang et al., 2013). No significant association was found in adolescents/adults, consistent with findings in Caucasians (Peyrouze et al., 2012; Burmeister et al., 2014) and with the negative correlation between the ARID5B SNP effect and age at diagnosis in pediatric patients (Xu et al., 2013). In contrast, only the GATA3 SNP was significant in adolescents/adults, validating previous GWAS in Caucasian populations performed by us and another independent group. In addition, we validated the associations of IKZF1-rs11978267 and the CEBPE SNP with ALL risk in childhood patients, supporting the positive replication result reported by Urayama et al. (2018) rather than the race-specific hypothesis (Wang et al., 2013). Moreover, at least two independent association signals are located at the IKZF1 locus, and a greater effect of rs11770117 than of rs11978267 has been noted in Hispanics but not in Caucasians. We therefore also evaluated IKZF1-rs11770117 in our cohort; it was associated with childhood ALL susceptibility independently of rs11978267 (after adjusting for rs11978267) and was marginally significant in adolescents/adults, indicating that different regulatory patterns of IKZF1 may be involved in leukemogenesis in children and adults.
Effects of the reported inherited predispositions to ALL vary among subtypes. For instance, ARID5B and PIP4K2A SNPs are more enriched in the hyperdiploid subtype, which was also validated in our study. However, the Ph+ subtype has rarely been studied because of its low frequency in childhood ALL. In this study, 70 Ph+ patients, mostly adolescents/adults, were included in the association analyses. Although no clear conclusion can be drawn for childhood ALL because of limited statistical power, we observed a significant association of the GATA3 SNP with ALL risk in adults with the Ph+ subtype, indicating its broad effect in both Ph+ and Ph-like ALL in older patients. To our knowledge, this is the first report of an association between the GATA3 SNP and ALL risk in adult Ph+ patients, which therefore requires confirmation in independent validation cohorts. Importantly, we recently identified a novel Hispanic-specific signal at the ERG locus that is strongly related to Native American ancestry (Qian et al., 2018). Because Native Americans are considered to descend from a single founding population that initially split from East Asians (Moreno-Mayar et al., 2018), we sought to evaluate its effect in Chinese ALL patients. Indeed, the distribution of ERG-rs2836365 RAF across subtypes is similar to that in childhood Hispanic patients, being high in the TCF3-PBX1+ subtype and low in the ETV6-RUNX1+ subtype, whereas no difference was observed in adults of any subtype. The association of ERG-rs2836365 reached statistical significance only in Ph−/ETV6-RUNX1− childhood patients, probably because of the small sample size.

Frontiers in Genetics | www.frontiersin.org

FIGURE 1 | The frequency of the ERG SNP in all-age ALL patients. Risk allele frequencies of rs2836365 are shown for all patients and non-ALL controls, and separately for each group defined by age and subtype. Logistic regression tests were conducted by comparing the risk allele frequencies of rs2836365 in each patient group with those in non-ALL controls. *P < 0.05.
Collectively, our results not only describe the associations of the top GWAS hits with ALL susceptibility in all-age Chinese patients (including Ph+ patients) but also provide independent confirmation that the novel signal at the ERG locus confers risk of childhood ALL in an age- and subtype-specific manner. Additionally, the consistent significant associations of ERG and IKZF1-rs11770117, both previously regarded as race-specific, with ALL susceptibility suggest similarities in leukemogenesis between East Asians and Hispanics. However, given the relatively small sample size of our study, additional independent cohorts with larger sample sizes are needed to confirm the effect of the ERG SNP, especially in different molecular subtypes.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation, to any qualified researcher.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of West China Hospital and West China Second Hospital. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
YS, YiZ, and XL designed and supervised this study. FL, DY, JZ, ZD, and YW conducted the experiments and data analyses, and interpreted the data. YY, YQ, WZ, YaZ, BY, LW, and JG collected the clinical information. XL contributed to the conception of the study and drafted the manuscript. All authors contributed to the writing of the manuscript and approved the final manuscript.