MYBL2 Gene Polymorphism Is Associated With Acute Lymphoblastic Leukemia Susceptibility in Children

Purpose Although MYBL2 has been shown to participate in multiple cancers, including leukemia, the role of MYBL2 polymorphisms in acute lymphoblastic leukemia (ALL) remains unclear. In this study, we aimed to evaluate the association between MYBL2 single nucleotide polymorphisms (SNPs) and ALL risk in children.
Methods A total of 687 pediatric ALL cases and 971 cancer-free controls were recruited from two hospitals in South China. We conducted a case-control study by genotyping three SNPs in the MYBL2 gene (rs285162 C>T, rs285207 A>C, and rs2070235 A>G). Associations were assessed by odds ratios (ORs) with corresponding 95% confidence intervals (CIs). Subgroup and stratification analyses were conducted to explore the association of rs285207 with ALL risk in terms of age, sex, immunophenotype, risk level, and other clinical characteristics. False-positive report probability (FPRP) analysis was performed to verify each significant finding. In silico functional analysis was used to evaluate the probability that rs285207 influences the regulation of MYBL2.
Results rs285207 was associated with a decreased ALL risk (adjusted OR = 0.78; 95% CI = 0.63-0.97, P = 0.022) in the dominant model. The association of rs285207 with ALL risk appeared stronger in patients with pre B ALL (adjusted OR = 0.56; 95% CI = 0.38-0.84, P = 0.004), a normal diploid karyotype (adjusted OR = 0.73; 95% CI = 0.57-0.95, P = 0.017), low risk (adjusted OR = 0.68; 95% CI = 0.49-0.94, P = 0.021), a lower WBC count (adjusted OR = 0.62; 95% CI = 0.43-0.87, P = 0.007), or a lower platelet level (adjusted OR = 0.76; 95% CI = 0.59-0.96, P = 0.023). In the FPRP analysis, the significant association between the rs285207 polymorphism and decreased ALL risk remained noteworthy (FPRP = 0.128). Functional analysis showed that IKZF1 bound to a DNA motif overlapping rs285207 and had a higher preference for the risk allele A. For rs285162 C>T and rs2070235 A>G, no significant association with ALL risk was found.
Conclusion The rs285207 polymorphism was associated with a decreased ALL risk in children and might alter IKZF1 binding, which indicates that the MYBL2 gene polymorphism might be a potential biomarker of childhood ALL.


INTRODUCTION
Acute lymphoblastic leukemia (ALL), a malignant proliferation of poorly differentiated lymphocytes, is the most common type of pediatric leukemia and also the most frequently diagnosed malignancy in children (1,2). ALL remains one of the most important causes of childhood morbidity and mortality, despite therapeutic advances and improved cure rates in recent decades (3). The etiology of ALL is still not well understood, although the mechanisms involved have been extensively investigated. In addition to environmental factors, genetic factors remain the predominant contributors to the pathogenesis of childhood ALL (3,4). Single-nucleotide polymorphisms (SNPs) in key cancer-associated genes (CDKN2A, GATA3, FOXO3, etc.) have been reported to influence ALL risk in children (5-7). However, many of the genetic variations influencing ALL remain to be found.
MYBL2 (MYB proto-oncogene like 2), also known as B-MYB, belongs to the MYB family of transcription factors, which was first identified as a vertebrate homolog of the v-myb oncogene causing leukemia in chickens (8,9). The MYB transcription factor family includes MYBL1 (A-MYB), MYBL2 (B-MYB), and MYB (c-MYB) (10,11). However, unlike the other two members, which are usually expressed in certain tissues, MYBL2 is ubiquitously expressed in proliferating cells (12,13). MYBL2 has also been demonstrated to act as an oncogene in numerous studies (14,15). MYBL2 promotes the malignant development of cancer by regulating various cellular processes, including apoptosis, proliferation, differentiation, invasion, metastasis, and replication stress (12,16-20). Aberrant expression or dysfunction of MYBL2 has been validated in a variety of cancers, including adult leukemia, breast cancer, prostate cancer, ovarian cancer, liver cancer, and lung cancer (19-25). However, the association of MYBL2 gene polymorphisms with childhood ALL risk and outcomes has not been reported. In the current study, we explored the association of MYBL2 SNPs with ALL risk in a case-control series of Chinese children.

MATERIALS AND METHODS
Study Subjects
In total, 687 ALL patients and 971 healthy controls were included in the current study. All individuals were enrolled from Guangzhou Women and Children's Medical Center and Nanfang Hospital. Briefly, ALL patients < 18 years old with a confirmed clinical and histological diagnosis were recruited. All patients were newly diagnosed, and a detailed medical history was recorded for each case. In this way, 687 children with ALL were recruited from January 2016 to June 2019. During the same period, 971 cancer-free healthy volunteers were recruited as controls and matched to the cases on age, sex, and residential region. The controls were randomly selected from children undergoing a routine physical examination. All included subjects were ethnic Han Chinese. Those with other malignant disorders or a history of chemotherapy or radiotherapy were excluded. Written informed consent was obtained from each subject before participation. This study was approved by the institutional review boards of both hospitals.

SNP Selection and Genotyping
SNP selection was performed using data from the SNPinfo and NCBI dbSNP databases, and the selection strategy was based on four criteria as described previously (26,27): (1) the minor allele frequency (MAF) reported in HapMap was >5% for Chinese subjects; (2) the SNP was located in or near the MYBL2 gene (i.e., <2 kb upstream or downstream of MYBL2); (3) the SNP affected transcription factor binding site (TFBS) activity, splicing activity, or protein coding; (4) the SNPs were not in high linkage disequilibrium with each other (LD, R² < 0.8). Based on these criteria, three SNPs (rs285162 C>T, rs285207 A>C, and rs2070235 A>G) in the MYBL2 gene were retrieved for further analyses.
Peripheral blood samples were collected from each participant at diagnosis and used for DNA extraction with the TIANamp DNA Kit (TianGen, Beijing, China) according to the manufacturer's instructions. rs285162, rs285207, and rs2070235 were selected for genotyping. The genotyping assays used the TaqMan ProAmp master mix and a predesigned SNP genotyping assay mix containing polymerase chain reaction (PCR) probes and primers (ABI, Massachusetts, USA). The three SNPs were genotyped by quantitative real-time PCR on the QuantStudio™ 6 Flex System (ABI, Massachusetts, USA). In each 384-well plate, five positive and five negative controls were included to ensure the accuracy of genotyping. About 10% of samples were selected randomly for direct sequencing as quality control (28,29), and the results were 100% concordant.
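The four selection criteria above can be read as a simple filter. The following is only a minimal sketch under stated assumptions: the `Candidate` record, its field names, the `pairwise_r2` lookup, and the greedy order-dependent LD pruning are all hypothetical illustrations, not the SNPinfo workflow the authors used. The `r_squared` helper uses the standard definition of LD r² from a haplotype frequency and the two allele frequencies.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one candidate SNP (fields are illustrative)."""
    rsid: str
    maf: float          # minor allele frequency in the reference population
    distance_bp: int    # distance to the gene in bp; 0 if inside the gene
    functional: bool    # predicted to affect TFBS, splicing, or protein coding


def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """LD r^2 from haplotype frequency p_ab and allele frequencies p_a, p_b."""
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def select_snps(candidates, pairwise_r2, maf_min=0.05, max_dist=2000, r2_max=0.8):
    """Apply criteria (1)-(4); LD pruning here is greedy (an assumption)."""
    kept = []
    for snp in candidates:
        # Criteria 1-3: MAF > 5%, within 2 kb of the gene, predicted functional.
        if snp.maf <= maf_min or snp.distance_bp >= max_dist or not snp.functional:
            continue
        # Criterion 4: drop a SNP in high LD with one that is already kept.
        in_high_ld = any(
            pairwise_r2.get(frozenset((snp.rsid, k.rsid)), 0.0) >= r2_max
            for k in kept
        )
        if not in_high_ld:
            kept.append(snp)
    return kept
```

Pairs missing from `pairwise_r2` are treated as unlinked (r² = 0), which is a convenience of this sketch rather than a recommendation.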

Functional Analysis In Silico
The probability that rs285207 A>C might influence the regulation of MYBL2 was evaluated using the Roadmap Epigenome Browser (30,31), TFBIND software (32), and the ENCODE Project (33). Briefly, promoters and enhancers were predicted from histone modifications and DNase hypersensitivity (DHS) in GM12878 (a lymphoblastoid cell line) using Roadmap Epigenomics data. TFBIND was used to assess whether rs285207 altered any transcription factor binding sites (TFBS), and ENCODE ChIP-seq experiments for IKZF1 in GM12878 (Experiment Series: ENCSR816OIY) were then used to assess the binding signals and motifs overlapping rs285207.

Statistical Analyses
For each SNP in controls, Hardy-Weinberg equilibrium (HWE) was assessed with the goodness-of-fit χ² test. The distributions of SNP genotypes and demographic variables between the case and control groups were compared using a 2-sided χ² test. To evaluate the strength of the relation between MYBL2 polymorphisms and ALL susceptibility, odds ratios (ORs) and 95% confidence intervals (CIs) were calculated using logistic regression analyses, adjusting for age and sex. The false-positive report probability (FPRP) was also computed for each significant finding as previously described (34). A prior probability of 0.1 was adopted to detect an OR of 0.67 for protective effects, and an FPRP value below the threshold of 0.2 was considered noteworthy. All statistical analyses were conducted with SAS software (version 9.4; SAS Institute, Cary, North Carolina). All P values were 2-sided, and a P value of <0.05 was considered statistically significant.
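To make the FPRP step concrete, the sketch below uses the commonly cited formula FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where α is the observed P value, π the prior probability, and 1 − β the power to detect the target OR. The power approximation here is an assumption of this sketch (a one-sided normal approximation with the standard error back-calculated from the reported 95% CI); the analysis was run in SAS and may have computed power differently, so the output need not match the reported FPRP exactly. The function names are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def se_from_ci(ci_low: float, ci_high: float, z: float = 1.96) -> float:
    """Back-calculate the standard error of ln(OR) from a 95% CI."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * z)


def approx_power(alt_or: float, se: float, alpha: float) -> float:
    """Approximate power to detect alt_or at level alpha (normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(abs(math.log(alt_or)) / se - z_alpha)


def fprp(p_value: float, power: float, prior: float) -> float:
    """False-positive report probability for a given prior probability."""
    false_pos = p_value * (1 - prior)
    return false_pos / (false_pos + power * prior)


if __name__ == "__main__":
    # Values reported for rs285207 in the dominant model.
    se = se_from_ci(0.63, 0.97)
    power = approx_power(0.67, se, alpha=0.022)
    value = fprp(0.022, power, prior=0.1)
    print(f"power ~ {power:.3f}, FPRP ~ {value:.3f}, noteworthy: {value < 0.2}")
```

Because the prior enters the formula directly, a smaller prior probability yields a larger FPRP for the same P value and power.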

RESULTS
Subject Characteristics
In the present study, a total of 687 cases and 971 controls were included, and their detailed characteristics are summarized in Table 1. Briefly, no significant differences were observed in age (P = 0.494) or sex (P = 0.107) distribution between cases and controls. According to immunophenotype-based classification (35-37), 596 (86.75%) cases were diagnosed with B cell ALL (B ALL), including 227 (33.04%) pro B ALL, 200 (29.11%) common B ALL, 166 (24.16%) pre B ALL, and 3 (0.44%) mature B ALL; 61 (8.88%) were diagnosed with T cell ALL (T ALL); and 30 (4.37%) had no available data (NA). Information about gene fusion, risk level, karyotype, relapse, and the levels of minimal residual disease (MRD) at multiple time points post-therapy is also included in Table 1.

Associations Between MYBL2 Gene Polymorphisms and ALL Susceptibility
According to the SNP selection strategy, three SNPs (rs285162 C>T, rs285207 A>C, and rs2070235 A>G) that overlapped a transcription factor binding site (TFBS) or a splicing regulating site (SRS), or that were non-synonymous SNPs (nsSNP), were selected (Table 2). The genotype frequencies of the MYBL2 gene SNPs in all 687 cases and 971 controls and their associations with ALL risk are shown in Table 3. All three SNPs were in HWE (P for HWE > 0.05) among the controls. Of the three SNPs, a significant difference between ALL cases and controls was observed for rs285207 A>C (crude P = 0.017) in the dominant model. After adjustment for age and sex, the rs285207 C allele was significantly related to a decreased ALL risk in the dominant model (AC/CC vs AA: adjusted OR = 0.78; 95% CI = 0.63-0.97, P = 0.022).
The other two SNPs (rs285162 C>T and rs2070235 A>G), however, were not significantly associated with ALL risk.
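As an illustration of the two calculations reported in this section, the sketch below computes a Hardy-Weinberg goodness-of-fit χ² test (1 degree of freedom) from genotype counts and a crude dominant-model OR (AC/CC vs AA) with a Wald 95% CI. The genotype counts in the usage example are hypothetical, since Table 3 is not reproduced here, and the OR is unadjusted, whereas the reported estimates come from logistic regression adjusted for age and sex.

```python
import math


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int):
    """Goodness-of-fit chi-square test for HWE; returns (chi2, p_value)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def dominant_or(case_counts, control_counts, z: float = 1.96):
    """Crude OR for carriers (AC + CC) vs AA, with a Wald confidence interval.

    Each argument is a tuple of genotype counts ordered (AA, AC, CC).
    """
    a = case_counts[1] + case_counts[2]        # carrier cases
    b = case_counts[0]                         # AA cases
    c = control_counts[1] + control_counts[2]  # carrier controls
    d = control_counts[0]                      # AA controls
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se)
    high = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, low, high


if __name__ == "__main__":
    # Hypothetical genotype counts (AA, AC, CC), for illustration only.
    cases = (300, 250, 50)
    controls = (400, 450, 150)
    chi2, p_hwe = hwe_chi2(*controls)
    odds_ratio, low, high = dominant_or(cases, controls)
    print(f"HWE chi2 = {chi2:.2f}, P = {p_hwe:.3f}")
    print(f"OR = {odds_ratio:.2f} (95% CI {low:.2f}-{high:.2f})")
```

An interval that excludes 1 corresponds to a significant crude association at the 0.05 level; adjusting for covariates would require a logistic regression model instead of this 2×2 calculation.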

Subgroup and Stratification Analyses
To further explore the association between the MYBL2 gene rs285207 A>C polymorphism and ALL susceptibility, subgroup

Functional Analysis
To explore the potential mechanisms by which rs285207 influences ALL risk, we evaluated the probability that the rs285207 polymorphism alters the transcriptional regulation of MYBL2. The Roadmap Epigenomics data showed that rs285207 overlapped DHS marks and histone modifications related to both promoters and enhancers in multiple tissue types (Figure 1A). This observation was further supported by H3K4me1, H3K4me3, H3K9ac, H3K27ac, and DHS data in GM12878 cells (Figure 1B). TFBIND analysis revealed that rs285207 altered the binding affinity of this site for transcription factors including IKZF1, THAP4, and FOXA2. For the rs285207 risk allele A, IKZF1, THAP4, and FOXA2 all had high TFBIND scores (0.85, 0.77, and 0.86, respectively), while none of these three bindings was found for the non-risk allele C. In the ENCODE ChIP-seq analysis, however, only IKZF1 was found to bind to a DNA motif overlapping rs285207 (Figure 2A). IKZF1 had a higher preference for the risk allele A and little or no preference for allele C (Figure 2B). These results suggest that rs285207 A>C might influence the transcription of MYBL2 by disrupting an IKZF1 binding site.

DISCUSSION
In this case-control study of 687 ALL cases and 971 healthy controls from a Chinese population, we investigated the potential association between MYBL2 gene polymorphisms and ALL risk in children. Among the three MYBL2 SNPs examined, we found that rs285207 A>C was significantly associated with a decreased ALL risk in the dominant model. To our knowledge, the present study is the first to explore the association between MYBL2 polymorphisms and ALL risk.
MYBL2 is a member of the MYB transcription factor family, which includes MYBL1, MYBL2, and MYB (10,11). Products encoded by the MYB family have similar protein structures: a DNA-binding domain, a transactivation domain, and a regulatory domain (9,38). All members of the MYB family act by regulating the transcription of target genes through binding to the same DNA consensus sequence (NAACNG) (8,39,40). However, only MYBL2 is ubiquitously expressed in proliferating cells (12,13). The expression of the MYBL2 gene is controlled by other transcription factors or noncoding RNA (41-45), and the MYBL2 protein is activated via phosphorylation (14,21). Once activated, MYBL2 regulates diverse downstream genes or proteins involved in multiple cellular processes, such as BCL2 and MYC in cell survival (46-48), cyclins and FGF4 in the cell cycle (49-51), SOX2 and OCT4 in cell differentiation (52,53), and SNAIL and YAP1 in cell invasion and metastasis (19,54). Overexpression or amplification of MYBL2 has been widely reported in previous studies on cancer. For instance, MYBL2 was overexpressed in castration-resistant prostate cancer and promoted cell growth and metastasis by promoting YAP1 transcriptional activity (19). Liang et al. reported that MYBL2 expression was increased in gallbladder cancer and could serve as a potential prognostic biomarker (55). In addition, MYBL2 polymorphisms have been reported to be associated with cancer (39,56,57). For example, Thorner et al. demonstrated that the rs2070235 polymorphism was related to an increased risk of basal-like breast cancer (39). To date, no study has examined polymorphisms of the MYBL2 gene in ALL.
In the present study, we genotyped three potentially functional SNPs (rs285162, rs285207, and rs2070235) and found, for the first time, that the rs285207 polymorphism was associated with a reduced ALL risk. In the subgroup analysis, the significant association between rs285207 and ALL in patients with a lower platelet level was achieved with high statistical power (>0.8), which further strengthens this result. rs285207 is located 374 bp upstream of the MYBL2 gene, in a region that overlaps the MYBL2 promoter and enhancer. In our analyses of transcription factor binding, rs285207 A>C was also found to disrupt the binding to IKZF1, which showed preferential binding to the risk allele A. IKZF1, also called IKAROS, belongs to the zinc-finger family of transcription factors (58). IKZF1 plays a key role in lymphopoiesis and is also a leukemia predisposition gene (59-61). Therefore, rs285207 might influence the transcription of MYBL2 by altering the binding affinity of IKZF1. The effect of the rs285207 polymorphism on ALL risk might thus be mediated by the expression of the MYBL2 gene, which needs to be validated in future studies.
The other two SNPs, rs285162 and rs2070235, are both located in the coding region of MYBL2. rs285162, which has the potential to regulate splicing, was reported to influence the deviation between allele frequencies (62). rs2070235, which results in a missense variant (Ser > Gly) of the MYBL2 protein, was reported to be associated with an increased risk of breast cancer (39), but not with the incidence of acute myeloid leukemia (63). In addition, rs2070235 was found to be related to a reduced cancer risk in an investigation integrating several cancers (neuroblastoma, colon carcinoma, and chronic myelogenous leukemia) (64). In the current study, neither rs285162 nor rs2070235 was associated with childhood ALL susceptibility. Together with previous reports, this study suggests that the effect of MYBL2 gene polymorphisms is complex and depends on cancer type. Differences in ethnicity and sample composition should also be taken into consideration.
Although the present study is the first to explore the relationship between MYBL2 polymorphisms and ALL risk in children, several limitations should be acknowledged. First, only three SNPs of MYBL2 were investigated, and more potentially functional SNPs need to be examined in future studies. Second, although the sample size was relatively large, larger studies should be conducted in the future. Third, the subjects were all recruited from South China, which might cause selection bias; multicenter studies covering more populations are therefore needed to confirm the role of MYBL2 polymorphisms in ALL. Finally, this study examined only genetic factors; data on environmental factors, including dietary intake, were not available. The functions of MYBL2 SNPs in the progression of ALL also need to be further investigated.
Table footnote: Normal, values within the reference range; Lower, values below the lower limit of the reference range; Higher, values above the upper limit of the reference range. Reference range of WBC (10⁹/L): 5-12; reference range of platelets (10⁹/L): 140-440. Bold values indicate statistically significant results.

CONCLUSION
In conclusion, this case-control study explored the association of MYBL2 polymorphisms (rs285162, rs285207, and rs2070235) with childhood ALL risk. It demonstrated for the first time that rs285207 A>C in the MYBL2 gene was associated with a decreased risk of childhood ALL and might influence the transcription of MYBL2 by altering IKZF1 binding, which suggests that MYBL2 polymorphism might serve as a biomarker for ALL susceptibility. Larger multicenter studies and functional experiments are needed to further elucidate the role of MYBL2 polymorphism and the potential mechanisms in ALL.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
The study was reviewed and approved by the institutional review boards of Guangzhou Women and Children's Medical Center and Nanfang Hospital. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
HG and XY designed the study. HG, NL, YS, and XY wrote the manuscript. NL, CW, HD, and LX treated the patients, collected the data, and commented on the manuscript. CW, HD, and LX performed the genotyping assay. YS and XY performed the statistical analysis and functional analysis. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We are grateful to the individuals who participated in this study.