Genotype-Environment Interaction Analysis of NQO1, CYP2E1, and NAT2 Polymorphisms and the Risk of Childhood Acute Lymphoblastic Leukemia: A Report From the Mexican Interinstitutional Group for the Identification of the Causes of Childhood Leukemia

Background: Acute lymphoblastic leukemia (ALL) is the most common cancer in children. In Mexico and other Hispanic populations, the incidence of this neoplasm is among the highest reported worldwide. Functional polymorphisms in several enzymes involved in xenobiotic metabolism have been associated with an increased risk of developing ALL, and this risk differs by ethnicity. The aims of the present study were to determine whether NQO1, CYP2E1, and NAT2 polymorphisms, or gene-environment interactions involving them, were associated with ALL risk in Mexican children. Methods: We conducted a case-control study including 478 pediatric patients diagnosed with ALL and 284 controls (children without leukemia). The ancestry composition of a subset of cases and controls was assessed using 32 ancestry informative markers. Gene-environment interactions for exposure to hydrocarbons were assessed by logistic regression analysis. Results: The polymorphisms rs1801280 (OR 1.54, 95% CI 1.21–1.93), rs1799929 (OR 1.96, 95% CI 1.55–2.49), and rs1208 (OR 1.44, 95% CI 1.14–1.81) were associated with an increased risk of ALL, and the risks were higher under a recessive model (OR 2.20, 95% CI 1.30–1.71; OR 3.87, 95% CI 2.20–6.80; and OR 2.26, 95% CI 1.32–3.87, respectively). Gene-environment interaction analysis showed that the NAT2 rs1799929 TT genotype confers a high risk of ALL under exposure to fertilizers, insecticides, hydrocarbon derivatives, and parental tobacco smoking. No associations were observed between the NQO1 or CYP2E1 polymorphisms and ALL. Conclusion: Our study provides evidence of an association between NAT2 polymorphisms, and gene-environment interactions involving them, and the risk of ALL in Mexican children.

INTRODUCTION
Acute lymphoblastic leukemia (ALL) is the most frequent type of cancer in the pediatric population worldwide. Moreover, Mexico and other Hispanic populations have some of the highest incidence and mortality rates for this cancer in the world (1,8). The etiology is unclear; however, it has been recognized that interactions between genetic and environmental factors may play a role in the development of the disease. It has been suggested that enzymes involved in the metabolism of carcinogenic agents are associated with ALL susceptibility and that they could also explain the differences in survival rates reported across populations (9)(10)(11).
Hydrocarbons, parental smoking, alcohol drinking, pesticides, and air pollution have been reported as the main xenobiotics associated with an increased risk of ALL in children (12)(13)(14)(15)(16). After these compounds enter the body, they undergo biotransformation and elimination by xenobiotic-metabolizing enzymes (2,17,18). Phase I enzymes catalyze hydroxylation, reduction, and oxidation reactions, which can ultimately transform xenobiotics into more toxic compounds. Phase II enzymes participate in conjugation processes, such as glucuronidation, acetylation, and methylation, converting the metabolites into non-reactive, water-soluble products that are more easily eliminated from the body (18).
Cytochrome P450 Family 2 Subfamily E Member 1 (CYP2E1) is a phase I enzyme, whereas NAD(P)H quinone oxidoreductase 1 (NQO1) and arylamine N-acetyltransferase 2 (NAT2) are phase II enzymes; all three are commonly implicated in xenobiotic metabolism (2). It is well recognized that genetic polymorphisms can influence the activity of these enzymes by acting at the transcriptional level (CYP2E1), on enzyme activity (NQO1), or at the protein level (NAT2) (2,19). In addition, several studies have reported that polymorphisms in genes encoding xenobiotic-metabolizing enzymes could increase the risk of childhood ALL, early relapse, chemotherapy-related toxicity, and treatment resistance, and that they are associated with lower survival rates in patients with this type of cancer (2,17). However, the results are inconsistent across populations, suggesting that susceptibility alleles could be associated with ALL in a race-specific manner, as has been reported for other ALL-predisposing genetic polymorphisms (20,21).
The aims of the present study were to determine whether NQO1, CYP2E1, and NAT2 polymorphisms, or gene-environment interactions involving them, were associated with ALL risk in Mexican children, and to provide a picture of the genetic ancestry and admixture patterns of patients with ALL from central Mexico.

MATERIALS AND METHODS
Subjects
A case-control study was conducted by the Mexican Interinstitutional Group for the Identification of the Causes of Childhood Leukemia (MIGICCL) from August 1, 2014, to July 31, 2016. Cases and controls were younger than 17 years and resided in the Metropolitan Area of Mexico City.

Cases
Cases were recruited from eight public hospitals in Mexico City, which together treat an estimated 69.9% of children with leukemia residing in Mexico City. The diagnosis of ALL was established according to clinical features and bone marrow aspirate findings (cell morphology, immunophenotype, and genetics), as defined by the 2008 WHO classification of lymphoid neoplasms. Children with Down syndrome were excluded from the analysis. For case registration, trained personnel were assigned to each participating hospital to identify incident cases of leukemia through reviews of clinical charts. The parents were then approached and invited to participate.
The participating public hospitals and health institutions were: (1) Hospital de Pediatría, "Dr. Silvestre Frenk Freund" Centro Médico Nacional (CMN) "Siglo XXI", Instituto Mexicano del Seguro Social (IMSS); (2) Hospital Infantil de México Federico Gómez, Secretaría de Salud (SS); (3) Hospital General "Dr. Gaudencio González Garza", CMN "La Raza", IMSS; (4) Hospital General Regional "Dr. Carlos McGregor Sánchez Navarro", IMSS; (5) Hospital Juárez de México, SS; (6) Hospital Pediátrico de Moctezuma, Secretaría de Salud de la Ciudad de México (SSCDMX); (7) Hospital General de México, SS; and (8) Hospital CMN "20 de Noviembre", Instituto de Seguridad Social al Servicio de los Trabajadores del Estado (ISSSTE). For this study, the sample size was calculated as 250 cases and 250 controls, which was sufficient to achieve 80% power to detect the associations. In parallel with this study, case enrollment continued in order to assess the prognostic value of the NQO1, CYP2E1, and NAT2 polymorphisms studied here for ALL survival; those additional cases were also included in the present work to increase the precision of the assessed interactions.
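As a rough illustration of how a power figure of this kind can be checked, the sketch below applies a normal-approximation two-proportion test to the frequency of the homozygous variant genotype (a recessive model). It is not the calculation the authors ran, the function names are hypothetical, and the minor allele frequency of 0.4 in the usage section is an assumed value, not one reported in this study.

```python
from statistics import NormalDist


def recessive_genotype_freq_in_cases(p_controls: float, odds_ratio: float) -> float:
    """Frequency of the homozygous variant genotype expected in cases, given its
    frequency in controls and the genotype odds ratio (recessive model)."""
    odds_cases = odds_ratio * p_controls / (1.0 - p_controls)
    return odds_cases / (1.0 + odds_cases)


def power_two_proportions(p_cases: float, p_controls: float,
                          n_cases: int, n_controls: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test (normal approximation,
    ignoring the negligible opposite tail)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    pooled = (p_cases * n_cases + p_controls * n_controls) / (n_cases + n_controls)
    se_null = (pooled * (1 - pooled) * (1 / n_cases + 1 / n_controls)) ** 0.5
    se_alt = (p_cases * (1 - p_cases) / n_cases
              + p_controls * (1 - p_controls) / n_controls) ** 0.5
    return std_normal.cdf((abs(p_cases - p_controls) - z_alpha * se_null) / se_alt)


if __name__ == "__main__":
    maf = 0.4                      # hypothetical minor allele frequency in controls
    p0 = maf ** 2                  # homozygous variant frequency under HWE
    p1 = recessive_genotype_freq_in_cases(p0, odds_ratio=2.0)
    print(round(power_two_proportions(p1, p0, 250, 250), 3))
```

With these assumed inputs (MAF 0.4, OR 2.0, 250 per group) the approximation gives a power above 0.8; for rarer variants, the same sample size yields considerably lower power.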

Controls
The controls were selected from second-level hospitals of the same health institutions (IMSS, SS, SSCDMX, and ISSSTE) that referred the children with ALL to the third-level care hospitals. The controls were children without leukemia who were treated in various hospital departments, such as ambulatory surgery, pediatrics, ophthalmology and orthopedic outpatient clinics, and the emergency room. Children diagnosed with neoplasms, hematological diseases, allergies, infections, or congenital malformations were not selected as controls. The interviewers were given a list of characteristics of the individual leukemia cases and were charged with identifying one control per case matched by sex, age (±18 months), and health institution. When more than one potential control met the matching criteria, the one closest in age to the case was selected; if candidates were equally close in age, one was selected randomly (by tossing a coin). If no control of the same sex was found after three visits to the same hospital, a control matched by age (±18 months) was selected.
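The matching rules above can be expressed as a small selection function. This is a minimal sketch, assuming ages are recorded in months and candidates have already been screened for the exclusion diagnoses; the `Child` record, `select_control`, and its arguments are hypothetical names, not part of the study's procedures.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

AGE_WINDOW_MONTHS = 18  # ±18 months, as stated in the text


@dataclass
class Child:
    child_id: str
    sex: str            # "F" or "M"
    age_months: int
    institution: str    # e.g. "IMSS", "SS", "SSCDMX", "ISSSTE"


def select_control(case: Child, candidates: List[Child],
                   visits_without_same_sex: int = 0,
                   rng: Optional[random.Random] = None) -> Optional[Child]:
    """Return one matched control for `case`, or None if the search must continue."""
    rng = rng or random.Random()
    # Same institution and within the age window (assumption: the institution
    # criterion also applies to the age-only fallback).
    eligible = [c for c in candidates
                if c.institution == case.institution
                and abs(c.age_months - case.age_months) <= AGE_WINDOW_MONTHS]
    same_sex = [c for c in eligible if c.sex == case.sex]
    if same_sex:
        pool = same_sex
    elif visits_without_same_sex >= 3:
        pool = eligible     # fallback after three unsuccessful visits: age match only
    else:
        return None
    if not pool:
        return None
    best_gap = min(abs(c.age_months - case.age_months) for c in pool)
    closest = [c for c in pool if abs(c.age_months - case.age_months) == best_gap]
    return rng.choice(closest)  # the "coin toss" among equally close candidates
```

Passing a seeded `random.Random` makes the tie-break reproducible, which is convenient for auditing how each control was chosen.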
Trained personnel were assigned to each of the second-level hospitals to search for controls who fulfilled the selection criteria. When a control was identified, the parents were invited to participate.
The Ethics and National Committee of Scientific Research approved this study (number R-2013-785-062). Additionally, we obtained approval from the Institutional Scientific Review Committees of each health institution to carry out the study.

Study Overview
The interview was conducted using the same questionnaire for cases and controls. This questionnaire had been previously standardized and adapted from the questionnaire module of the National Cancer Institute. General information obtained from the interviews included the child's history, socioeconomic status (SES), demographic characteristics, and parental information (age at pregnancy, educational level, etc.).
Study variables were the child's sex, age, and birthweight (<3,500 and ≥3,500 grams), family history of cancer, maternal years of education, home exposure after birth to fertilizers, insecticides, and other hydrocarbon derivatives (benzene, solvents, glues, petroleum products, etc.), and parental tobacco smoking.
Information on parental tobacco smoking was obtained from the in-person interview and was assessed for four exposure periods: (1) preconception, (2) pregnancy, (3) lactation, and (4) the last year before leukemia diagnosis (cases) or interview (controls). The number of cigarettes smoked during each exposure period was calculated. Parental smoking was classified into three groups: (1) both parents were lifelong non-smokers or ex-smokers who had stopped smoking more than 1 year before the birth of the index child, (2) either parent smoked <5 cigarettes a day, and (3) at least one parent smoked 6 or more cigarettes a day. The child was considered exposed to passive smoking when at least one parent smoked in the child's presence at least three times a week during the year before diagnosis (cases) or interview (controls).
Parental years of education were used as an indicator of SES, as previously done by the Childhood Leukemia International Consortium (CLIC) (0-9 years, 9.1-12.9 years [reference category], and ≥13 years of education).
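The smoking and SES categories defined above can be coded as simple classification functions. The sketch below is illustrative, assuming daily cigarette counts per parent for a single exposure period, with lifelong non-smokers and qualifying ex-smokers entered as 0; the function names are hypothetical. As written, the definitions leave a daily consumption of exactly 5 cigarettes unassigned, so the sketch returns None for that case rather than guessing a group.

```python
from typing import Optional


def parental_smoking_group(mother_cigs_per_day: float,
                           father_cigs_per_day: float) -> Optional[int]:
    """Three-level parental smoking variable described in the text.
    Assumes lifelong non-smokers and ex-smokers who quit >1 year before the
    child's birth are entered as 0 cigarettes/day."""
    heaviest = max(mother_cigs_per_day, father_cigs_per_day)
    if heaviest == 0:
        return 1            # neither parent smokes
    if heaviest >= 6:
        return 3            # at least one parent smokes 6 or more per day
    if heaviest < 5:
        return 2            # smoking parent(s) below 5 per day
    return None             # 5 to <6 per day is not assigned by the definition as written


def education_category(years: float) -> str:
    """CLIC-style education categories; values between 12.9 and 13 are assumed
    to fall in the middle (reference) category."""
    if years <= 9:
        return "0-9"
    if years < 13:
        return "9.1-12.9 (reference)"
    return ">=13"
```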

DNA Extraction
Genomic DNA was extracted from saliva or peripheral blood using the ORAGENE Purification Kit (DNA Genotek Inc., ON, Canada) or the Gentra Kit (Gentra Systems Inc., Minneapolis, MN), respectively, according to the manufacturers' instructions. DNA purity and concentration were determined by spectrophotometry (Nanodrop-1000).

Ancestry Informative Markers Selection and Ancestry Composition Analysis
To determine the ancestry composition of the pediatric ALL cases, and to account for the effects of population stratification in our case-control association study, we analyzed a subset of ALL cases and controls using a panel of 32 single nucleotide polymorphisms (SNPs) as ancestry informative markers (AIMs). The SNPs were chosen from AIM sets that have been validated in a large group of Mexican subjects. This panel included one or two SNPs on every chromosome except chromosome 18 (22). ALL cases and controls were paired by sex, age, and area of residence in Mexico City. To estimate individual and global ancestry, we used the STRUCTURE software, considering European and Native American (NA) populations as the two main contributors to the largest Mexican ethnic group (Mexican Mestizo) (3). Principal component analysis (PCA) was used to infer population structure within our samples (23).
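For readers unfamiliar with genotype PCA, the sketch below shows the basic computation on a 0/1/2 allele-count matrix such as one built from an AIM panel. It is an illustrative sketch on simulated data, not the authors' pipeline (ancestry proportions in the study came from STRUCTURE, a different, model-based method); the function name and the per-SNP standardization are assumptions.

```python
import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Principal component scores from an (individuals x SNPs) matrix of
    allele counts coded 0/1/2, with no missing values."""
    g = genotypes.astype(float)
    freq = g.mean(axis=0) / 2.0                  # allele frequency per SNP
    polymorphic = (freq > 0) & (freq < 1)        # drop monomorphic markers
    g, freq = g[:, polymorphic], freq[polymorphic]
    # Center each SNP and scale by its binomial standard deviation.
    z = (g - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated data: two source populations with very different allele frequencies.
    pop_a = rng.binomial(2, 0.2, size=(50, 32))
    pop_b = rng.binomial(2, 0.8, size=(50, 32))
    scores = genotype_pca(np.vstack([pop_a, pop_b]))
    print(scores[:50, 0].mean(), scores[50:, 0].mean())
```

In an admixed sample, individuals would spread along PC1 between the parental clusters rather than forming two separate groups, which is why a model-based tool is used for the ancestry proportions themselves.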

Evaluation of Interactions Among Polymorphisms and Exposure to Hydrocarbons
To search for gene-environment interactions, we used the multifactor dimensionality reduction (MDR) approach, because this method incorporates information from several loci and environmental factors to identify combinations of both types of factors that are associated with disease risk (30).
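The core step of MDR can be sketched in a few lines: each multilocus (or gene-environment) cell is labeled high-risk when its case:control ratio exceeds the overall ratio, and the resulting two-class rule is scored by balanced accuracy. This is a simplified sketch, assuming factors are already coded as small integers; it omits the exhaustive search over factor combinations and the cross-validation performed by the MDR software, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Cell = Tuple[int, ...]   # one level per factor, e.g. (genotype, exposure)


def label_high_risk_cells(cells: Sequence[Cell], status: Sequence[int]) -> Dict[Cell, bool]:
    """Label each observed cell high-risk if its case:control ratio exceeds the overall ratio."""
    cases = Counter(c for c, s in zip(cells, status) if s == 1)
    controls = Counter(c for c, s in zip(cells, status) if s == 0)
    threshold = sum(cases.values()) / sum(controls.values())
    labels = {}
    for cell in set(cases) | set(controls):
        ratio = cases[cell] / controls[cell] if controls[cell] else float("inf")
        labels[cell] = ratio > threshold
    return labels


def balanced_accuracy(cells: Sequence[Cell], status: Sequence[int],
                      high_risk: Dict[Cell, bool]) -> float:
    """Mean of sensitivity and specificity when 'high-risk cell' predicts 'case'.
    Cells not seen during labeling are treated as low-risk (an assumption)."""
    tp = fn = tn = fp = 0
    for cell, s in zip(cells, status):
        predicted_case = high_risk.get(cell, False)
        if s == 1:
            tp += predicted_case
            fn += not predicted_case
        else:
            fp += predicted_case
            tn += not predicted_case
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))
```

Balanced accuracy is used instead of plain accuracy because case and control counts are unequal; a rule that labels everyone as a case would otherwise look artificially good.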

Statistical Analysis
Analyses were performed using SPSS version 21 (IBM Statistical Package for the Social Sciences, Inc., Chicago, IL, USA). Descriptive analyses were conducted first. Odds ratios (OR) and 95% confidence intervals (95% CI) were estimated by unconditional logistic regression. The logistic regression model was constructed as follows: (1) overall interaction in the model was evaluated; (2) each study variable was assessed for a global confounding effect, defined as a difference of more than 10% between the OR before stratification (crude OR) and after stratification (aOR) in the stratified analysis. These variables, along with the matching variables (child's sex, age, and health institution), were included in the multivariate analyses. As a result, the most parsimonious model included: child's sex, age, paternal education level, maternal education level, maternal age at pregnancy, active smoking of the mother after birth, alcoholism of the mother before pregnancy, family cancer history, health institution, birthweight, maternal exposure to hydrocarbons at home during pregnancy, X-ray exposure during pregnancy, alcoholism of the father before pregnancy, and active smoking of the father after birth. Accordingly, adjusted ORs (aOR) were calculated.
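The 10% change-in-estimate criterion can be illustrated with stratified 2×2 tables. The sketch below compares the crude OR with a Mantel-Haenszel summary OR; it is an illustrative stand-in for the stratified step, not the SPSS logistic regression used in the study, and it assumes the 10% difference is measured relative to the crude OR, which the text does not specify. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# (a, b, c, d) = exposed cases, unexposed cases, exposed controls, unexposed controls
Table2x2 = Tuple[float, float, float, float]


def odds_ratio_ci(a: float, b: float, c: float, d: float, z: float = 1.96):
    """Odds ratio with a Woolf (log-based) 95% CI; all cells must be > 0."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se_log), math.exp(math.log(or_) + z * se_log)


def mantel_haenszel_or(strata: Sequence[Table2x2]) -> float:
    """Mantel-Haenszel summary OR across strata of a potential confounder."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def is_confounder(strata: Sequence[Table2x2], threshold: float = 0.10) -> bool:
    """Change-in-estimate rule: crude and stratified ORs differ by more than 10%.
    Assumption: the difference is expressed relative to the crude OR."""
    a, b, c, d = (sum(t[i] for t in strata) for i in range(4))
    crude = (a * d) / (b * c)
    adjusted = mantel_haenszel_or(strata)
    return abs(crude - adjusted) / crude > threshold
```

For example, two strata that each have an OR of 2.0 but very different exposure prevalences can produce a crude OR of 4.5, which this rule flags as confounding.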
The Hardy-Weinberg equilibrium (HWE) test was performed using the FINETTI program (http://ihg.gsf.de/cgi-bin/hw/hwa1.pl). Allele and genotype frequencies were compared between groups using the chi-square test or Fisher's exact test, as appropriate, as implemented in the STATCALC program (Epi Info v.6.02 software, Centers for Disease Control and Prevention, Atlanta, GA). The significance level was set at 5%. To evaluate the ancestry composition of our sample, STRUCTURE software was run assuming two populations (European and NA); each analysis was performed at least three times using >100,000 replicates and 20,000 burn-in cycles under the admixture model. To determine the statistical power of our study, we used Quanto software (http://hydra.usc.edu/gxe), taking into account the minor allele frequency (MAF) of each SNP in the control group, a recessive genetic model, an OR of 2.0, the prevalence of ALL in Mexican children, and the sample size.
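For reference, the classic one-degree-of-freedom chi-square HWE test on control genotype counts can be written directly. This sketch illustrates the general test and is not the FINETTI program itself; exact tests are preferable when some genotype classes are rare, and the function name is hypothetical.

```python
import math


def hwe_chi_square(n_hom_ref: int, n_het: int, n_hom_alt: int):
    """1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)      # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A p-value below 0.05 in controls, as reported for two of the NAT2 SNPs, indicates a departure from HWE that may reflect genotyping error, population structure, or selection.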

RESULTS
Patient Demographic Data
A total of 469 cases were diagnosed with ALL in the participating hospitals during the study period, and 285 controls were recruited. Demographic, clinical, and exposure data of the study population are displayed in Supplementary Table 1. Child's age, parental education level, parental smoking, alcohol drinking by the father, and maternal exposure to hydrocarbons at home showed statistically significant differences between cases with ALL and controls. Compared with controls, cases had lower proportions of active smoking by the mother after birth, alcohol drinking by both parents before pregnancy, and maternal exposure to hydrocarbons at home before pregnancy, but a higher proportion of maternal exposure to hydrocarbons at home during pregnancy (Supplementary Table 1). The associations were very similar whether the analysis included 279 cases or all 469 available cases (Supplementary Table 2); therefore, the total number of cases was included in subsequent analyses.

NAT2 Polymorphisms Are Associated With Acute Lymphoblastic Leukemia
Except for rs1041983 (C282T) and rs1799931 (G857A), all SNPs were in HWE in the control population. The associations between individual SNPs and ALL are described in Table 1.
Table 1 note: OR, crude odds ratio; aOR, odds ratio adjusted for child's sex, age, paternal education level, maternal education level, maternal age at pregnancy, active smoking of the mother after birth, alcoholism of the mother before pregnancy, family cancer history, health institution, birthweight, maternal exposure to hydrocarbons at home during pregnancy, X-ray exposure during pregnancy, alcoholism of the father before pregnancy, and active smoking of the father after birth. Genotyping rate * >98% and ** >96.6%. Bold values represent the most statistically significant results.
No interactions were noted between NQO1 (rs1800566), CYP2E1 (rs3813867), or NAT2 (rs1799930, G590A) and the exposure variables (Supplementary Table 3). Nevertheless, the homozygous genotypes TT of NAT2 rs1041983 (C282T) and AA of rs1799931 (G857A) were associated with a reduced risk of ALL under maternal exposure to hydrocarbons at home before and during pregnancy and after birth (HEMAB); maternal drug consumption, active smoking, alcohol consumption, and insecticide exposure before pregnancy, during pregnancy, and during lactation; and active smoking and alcohol drinking by the father before and during pregnancy. The NAT2 variants rs1799929 and rs1208, which were associated with ALL risk, showed increased ORs under maternal hydrocarbon exposure and drug consumption. Similarly, active smoking and alcohol consumption by the parents during pregnancy increased the risk of ALL in cases homozygous for the variant allele of each of these SNPs. In fact, the combination of rs1799929 (C481T) and active smoking by the father after birth showed the highest OR observed in the present analysis (OR 4.49, 95% CI 2.46-8.17).
The rs1799930 (NAT2), rs1800566 (NQO1), and rs3813867 (CYP2E1) polymorphisms were not associated with ALL, even after adjusting for these variables (Supplementary Table 3).

Frequency of Rapid and Slow Acetylators
Patients carrying the wild-type allele at all NAT2 polymorphisms were classified as rapid acetylators. Those homozygous for a variant allele, or heterozygous at more than one of the polymorphisms, were classified as slow acetylators, and the remainder were considered intermediate acetylators.
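The genotype-based rule above maps directly to a small function. This is a sketch assuming complete genotypes coded as variant allele counts at the six NAT2 SNPs named in this report; it applies the rule as stated here and does not perform haplotype inference. The names are hypothetical.

```python
from typing import Mapping

# NAT2 SNPs named in this report
NAT2_SNPS = ("rs1041983", "rs1801280", "rs1799929", "rs1799930", "rs1208", "rs1799931")


def acetylator_phenotype(variant_counts: Mapping[str, int]) -> str:
    """Classify from per-SNP variant allele counts (0, 1, or 2) at the NAT2 SNPs."""
    counts = [variant_counts[snp] for snp in NAT2_SNPS]
    n_het = sum(c == 1 for c in counts)
    if any(c == 2 for c in counts) or n_het > 1:
        return "slow"          # homozygous variant, or heterozygous at >1 SNP
    if n_het == 1:
        return "intermediate"  # heterozygous at exactly one SNP
    return "rapid"             # wild type at every SNP
```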

Gene-Gene Interaction Between NQO1-CYP2E1-NAT2 Polymorphisms
To investigate interactions among NQO1, CYP2E1, and NAT2 genotypes, we performed an MDR analysis including cases and controls with complete genotyping data. The model with the lowest prediction error and the highest cross-validation consistency (CVC) was selected (Table 4). The best single-factor model was rs1799929, which was statistically significant, with a testing balanced accuracy (TBA) of 0.5856 and a CVC of 10/10 (Table 4), and contributed 3.03% to the risk of ALL. The multilocus model with a CVC of 10/10 and the minimum prediction error (maximum testing accuracy, 0.6377) was a three-factor model including NAT2 rs1801280, rs1799929, and rs1208. These three NAT2 SNPs interact to collectively increase the risk of developing ALL (OR 6.59, 95% CI 4.05-10.71) (Figure 1, Table 4).
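A CVC such as 10/10 is the number of cross-validation folds in which the same factor combination was chosen as best. The sketch below counts it, assuming each fold's best model is available as a list of factor names (a hypothetical input format); in MDR, models are compared within each model size (number of factors), which this sketch leaves to the caller.

```python
from collections import Counter
from typing import Sequence, Tuple


def cross_validation_consistency(
        best_model_per_fold: Sequence[Sequence[str]]) -> Tuple[Tuple[str, ...], int, int]:
    """Return the most frequently selected model, how many folds selected it,
    and the total number of folds."""
    # Sort factor names so the same combination is counted once regardless of order.
    counts = Counter(tuple(sorted(model)) for model in best_model_per_fold)
    model, n_selected = counts.most_common(1)[0]
    return model, n_selected, len(best_model_per_fold)
```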
The interaction entropy analysis showed that NAT2 rs1799929 had the largest effect on susceptibility to ALL (3.0%). Synergistic interactions (red lines) involving the NAT2 SNPs rs1041983-rs1799931 and rs1799929-rs1801280, which reveal an epistatic effect between them, were observed. Redundancy (green lines) was observed between several NAT2 and CYP2E1 SNPs, meaning that these SNPs jointly provide less information than the sum of their individual contributions. Gold lines indicate independence among SNPs (Figure 2).
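The entropy-based quantities behind such interaction graphs can be computed from counts. The sketch below uses standard information-theoretic definitions, assuming the percentage reported for a single SNP is its mutual information with case-control status divided by the entropy of that status; it is an illustration, not the MDR software's implementation, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy in bits of an observed discrete variable."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def percent_entropy_explained(x: Sequence[Hashable], status: Sequence[Hashable]) -> float:
    """Share of the case-control entropy removed by a single factor, in percent."""
    return 100.0 * mutual_information(x, status) / entropy(status)


def interaction_information(a: Sequence[Hashable], b: Sequence[Hashable],
                            status: Sequence[Hashable]) -> float:
    """I(A,B;C) - I(A;C) - I(B;C): > 0 suggests synergy, < 0 redundancy."""
    joint = list(zip(a, b))
    return (mutual_information(joint, status)
            - mutual_information(a, status) - mutual_information(b, status))
```

An XOR-like pattern, where neither factor alone predicts status but the pair does, gives positive interaction information (synergy); two factors carrying the same information give a negative value (redundancy).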

Genetic Structure of Mexican Children With Acute Lymphoblastic Leukemia and Controls
The ancestry composition of the first subset of patients with ALL (n = 166) and controls (n = 167) who met the selection criteria during the study period was evaluated using 32 AIMs previously validated in a Mexican population (22). Cases and controls had a similar genetic background, with a high Native American (NA) contribution in both groups (Figure 3).

Gene-Environment Interaction in Acute Lymphoblastic Leukemia
MDR was used to explore gene-environment interactions. The following maternal exposures in the preconception, pregnancy, and after-birth periods were considered in the analyses: global exposure to hydrocarbons, active and passive smoking, alcohol, drug and medicine consumption, wood smoke, insecticides, benzene, gasoline, and petroleum. The environmental exposures assessed for the father were active and passive smoking and alcohol drinking, mainly for the preconception and pregnancy periods.
For the maternal preconception and after-birth periods, the best combinations were sought among hydrocarbon exposure, active and passive smoking, drug and medicine consumption, alcohol, and exposure to insecticides, benzene, gasoline, petroleum, and wood smoke, together with active and passive smoking and alcohol consumption by the father.
Our analysis suggests that passive smoking by the father before conception (PSFBC) and during pregnancy (PSFDP) were the best single factors with statistical significance (TBA 0.7087, CVC 10/10; and TBA 0.6868, CVC 10/10, respectively). The multifactor model with the minimum prediction error (TBA 0.7648) and a CVC of 10/10 was NAT2_rs1799929-NQO1_rs2811566-CYP2E1_rs1803867-alcohol consumption by the mother before pregnancy (ACMBP)-medicine consumption by the mother before pregnancy (MCMBP)-PSFBC, suggesting that these factors jointly contribute to the etiology of ALL (Figure 4, Table 5). For the period after the child's birth, MDR identified NAT2_rs1799929 as the best single factor (TBA 0.6021, CVC 10/10) and the six-factor interaction model NAT2_rs1041983-NAT2_rs1799929-NAT2_rs1799931-NQO1_rs1800566-insecticide exposure after the child's birth (IEAB)-active smoking by the father after the child's birth (ASFAB) as the best model for ALL (TBA 0.6062, CVC 9/10) (Table 5).
Table note: OR, crude odds ratio; aOR, odds ratio adjusted for child's sex, age (≥6 years), maternal education level, maternal age at pregnancy, active smoking of the mother after birth, alcoholism of the mother before pregnancy, family cancer history, health institution, birthweight, maternal exposure to hydrocarbons at home during pregnancy, X-ray exposure during pregnancy, alcoholism of the father before pregnancy, and active smoking of the father after birth. Missing data for phenotype classification: 2% and 5.6% for cases and controls, respectively.

DISCUSSION
Acute lymphoblastic leukemia, the most prevalent cancer in children, is likely, like other hematologic malignancies, to develop from a complex interaction between genetic and environmental factors. Hydrocarbons such as benzene, as well as pesticides and air pollutants, are among the common xenobiotics that have been implicated in ALL etiology (13,15). Detoxification enzymes play an important role in metabolizing environmental carcinogens, and it is well known that polymorphisms in the genes encoding these enzymes may explain inter-individual differences in leukemia risk (16). We conducted this study to gain further knowledge of how these factors could interact to influence disease risk and outcomes in ALL, and to provide a global picture of the genetic ancestry and admixture patterns of pediatric patients with ALL from central Mexico.
Table note: Best one- to eight-locus models of NAT2-NQO1-CYP2E1 gene-gene interactions, with cross-validation consistency (CVC) and prediction error for each n-locus model, obtained by multifactor dimensionality reduction (MDR) of our data set. TA, training balanced accuracy; TBA, testing balanced accuracy; *best single-factor model; **best multilocus model. • P-values from the whole data set. Bold values indicate the best single-factor* and multilocus** models.
The present report focused on the effects of genetic variation in NQO1, CYP2E1, and NAT2 and their interaction with environmental factors, such as home exposure to fertilizers, insecticides, and hydrocarbon derivatives (benzene, petroleum products, etc.), parental tobacco smoking, and wood smoke, among other variables, on the risk of developing childhood ALL. We found that NAT2 polymorphisms, but not NQO1 or CYP2E1 polymorphisms, were associated with ALL risk. NQO1 modifies internal exposure to bioactivated carcinogens and has previously been described as an anticancer enzyme. A relationship between NQO1 rs1800566 and several types of cancer, including infant leukemia, has been documented (31)(32)(33)(34)(35); however, its association with ALL is controversial. Our results contrast with the findings of some studies (34,35) but are consistent with other reports that did not show a significant association between the NQO1 rs1800566 polymorphism and the risk of childhood ALL when specific ethnic populations were analyzed (32,36). CYP2E1, a key contributor to the metabolism of low-molecular-weight compounds such as ethanol and a bioactivator of many procarcinogens, including benzene, has been associated with an increased risk of ALL (26). Differences in the contribution of these variants among populations could be influenced by age and ethnicity (26,35,36). So far, we have studied only one SNP in each gene.
Because a number of functional polymorphisms of NQO1 and CYP2E1 have been identified, we cannot rule out a contribution of these genes to the risk of ALL in Mexican children. NAT2 is highly polymorphic, and its activity is largely determined by SNPs in its coding region. To date, about 108 NAT2 alleles have been identified by the Gene Nomenclature Committee (37), defining three metabolizer groups: slow, intermediate, and rapid acetylators. The wild-type NAT2*4 allele encodes a protein with high N-acetylation activity, conferring a rapid acetylator phenotype (37,38). Haplotypes containing more than one variant allele at rs1041983, rs1801280 (T341C), rs1799929 (C481T), rs1799930, rs1208 (A803G), and rs1799931 (G857A) defined the slow acetylator phenotype (29,37). Our data showed that the rs1801280 (T341C), rs1799929 (C481T), rs1208 (A803G), and rs1799931 (G857A) polymorphisms confer a higher risk of ALL in Mexican children, which increases in homozygous variant carriers compared with heterozygotes. Notably, these risks were even higher after adjusting for age, sex, and other environmental variables (Table 3).
NAT2*4, considered the most common allele associated with rapid acetylation, was frequent in our control group, and statistically significant differences were observed between cases and controls. Consistent with our findings, NAT2*5B has been reported to be the most common slow acetylator haplotype in Caucasian populations (4,38). However, after correction for multiple comparisons, statistical significance remained only for the NAT2*11A, NAT2*12C, and NAT2*5V alleles, which were less frequent in healthy children than in cases with leukemia. Although similar frequencies of these alleles in healthy subjects have been reported elsewhere (39,40), we cannot rule out that these results were biased by the small size of the control group, as suggested by the wide CIs observed (Table 4).
Previous studies have suggested that the rapid acetylator genotype of NAT2 increases the risk of various types of cancer, particularly leukemia and colorectal and bladder cancer (4,(41)(42)(43)(44)(45). As reported in Caucasian and Middle Eastern populations, rapid acetylator phenotypes were less frequent than slow acetylator phenotypes (4,39,46), but no statistically significant differences were observed between cases and controls. The predicted slow acetylator haplotype TCTGAG (NAT2*5V, CI 1.39-201.58) and the rapid acetylator haplotypes CTTGAG (NAT2*11A, CI 2.05-348.8) and CTTGGG (NAT2*12C, CI 1.32-36.83), which were rare in healthy subjects (<1%), were associated with a higher risk of ALL. These results seem paradoxical; however, we predicted the acetylation phenotype from the inferred NAT2 haplotypes, and haplotype misclassification could have occurred (5,40). Uncommon haplotypes cannot be reliably determined by indirect computational haplotype inference, whereas direct methods for NAT2 haplotyping are costly and have long turnaround times. A new method has recently been described to eliminate potential errors in genotype assignment, but it is based on NAT2*4, *5B, *6A, *7B, *12A, and *13A, the six most common NAT2 haplotypes in diverse populations. Because NAT2 SNP and haplotype frequencies differ among populations, this method is not applicable to very heterogeneous ethnic groups, such as the Mexican population. In addition, genotype-phenotype discordance has been reported (47,48). It is well known that the acetylation phenotype is affected by NAT2 genetic variants as well as by environmental variables.
It has been proposed that diet and epigenetic factors could be potential modifiers underlying the discordant associations reported by studies of ALL. Recent studies have suggested that diet has a significant influence on xenobiotic metabolism by modifying the gut microbiota and, consequently, NAT2 gene expression in the liver. Subjects from the same ethnic group but living in different geographic regions may respond differently to xenobiotic agents (49). Moreover, inconsistent and even contradictory data are not surprising, since multiple genes have been implicated in key pathways associated with leukemogenesis. Furthermore, studies of two or three genotypes in combination have also yielded inconsistent results.
It has also been hypothesized that slow acetylator phenotypes have undergone positive selection in populations with insufficient dietary folate (50). Studies in Mexican children have documented folate deficiency in 11.2% of children aged 4 years, which could explain our results (51). Nevertheless, the frequencies in our control group differ from those reported in the general population from different regions of Mexico (43,50,52). The NAT2 gene has a high frequency of functional variation that differs among ethnically diverse populations; in fact, NAT2 functional variation contributes to high levels of diversity, illustrating how geographically and temporally fluctuating xenobiotic environments may have influenced our genome variability and disease susceptibility (52,53). We ruled out population stratification by performing an ancestry structure analysis in a randomly selected subset of patients and controls. Our data showed that both ALL cases and control subjects belong to the Mestizo group, the main ethnic group of the Mexican population. The ancestry composition observed in the present study is consistent with the history of the Conquest of Mexico, whose population mostly comprises European- and Amerindian-descended groups (54,55).
Association analysis stratified by exposure variables revealed that homozygosity for the risk allele of rs1041983 (C282T) and rs1799931 (G857A) confers protection against ALL under parental exposure to diverse xenobiotics (mother: hydrocarbons, drug consumption, active smoking, alcohol consumption, and insecticides; father: active smoking and alcohol drinking). To our knowledge, no previous studies have explored the relationship between NAT2 genotypes and these environmental factors in ALL. However, a differential contribution of individual NAT2 SNPs to the risk of ALL was recently observed by Zhu et al. in a meta-analysis including 1,522 acute leukemia patients and 2,688 controls (11).
ALL is considered a multifactorial disease in which xenobiotics could be important contributors to pathogenesis, and NQO1, CYP2E1, and NAT2 are enzymes involved in the metabolism of xenobiotics (including benzene, cigarette smoke, chemotherapy agents, and alcohol) that increase the risk of diverse human diseases (2,11,19). We therefore used a multifactor dimensionality reduction (MDR) approach to identify combinations of genetic and environmental factors associated with ALL (29). Our data showed that NAT2_rs1799929, NAT2_rs1208, ACMBP, MCMBP, and PSFBC interact to increase the risk of ALL when considering exposures before pregnancy.
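For readers unfamiliar with MDR, its core step can be sketched in Python. This is a simplified illustration under stated assumptions, not the software or settings used in this study: each subject is a dictionary of hypothetical factor codes (for example, a genotype and a binary exposure), each multifactor cell is labeled high risk when its case:control ratio exceeds the overall case:control ratio, and candidate models are ranked by balanced accuracy, the mean of sensitivity and specificity. The cross-validation that yields the testing balanced accuracy (TBA) and cross-validation consistency (CVC) reported by MDR software is omitted for brevity, so the score returned here corresponds to the training balanced accuracy (TA).

```python
from collections import defaultdict
from itertools import combinations


def high_risk_cells(rows, labels, factors):
    """Label each multifactor cell as high risk (True) or low risk (False).

    rows: list of dicts mapping a (hypothetical) factor name -> category code
    labels: list of 1 (case) / 0 (control)
    factors: tuple of factor names that define the cells
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    counts = defaultdict(lambda: [0, 0])  # cell -> [cases, controls]
    for row, is_case in zip(rows, labels):
        cell = tuple(row[f] for f in factors)
        counts[cell][0 if is_case else 1] += 1
    # High risk if cases/controls in the cell exceeds n_cases/n_controls;
    # the ratios are cross-multiplied to avoid dividing by zero.
    return {
        cell: cases * n_controls > controls * n_cases
        for cell, (cases, controls) in counts.items()
    }


def balanced_accuracy(rows, labels, factors, high):
    """Mean of sensitivity and specificity of the high/low-risk classifier."""
    tp = fn = tn = fp = 0
    for row, is_case in zip(rows, labels):
        # Cells not seen when labeling are predicted low risk.
        predicted_high = high.get(tuple(row[f] for f in factors), False)
        if is_case and predicted_high:
            tp += 1
        elif is_case:
            fn += 1
        elif predicted_high:
            fp += 1
        else:
            tn += 1
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def best_model(rows, labels, candidate_factors, k):
    """Exhaustively search k-factor combinations; return (factors, balanced accuracy)."""
    best = None
    for factors in combinations(candidate_factors, k):
        high = high_risk_cells(rows, labels, factors)
        score = balanced_accuracy(rows, labels, factors, high)
        if best is None or score > best[1]:
            best = (factors, score)
    return best
```

With hypothetical factor names such as `"rs1799929"` and `"ACMBP"`, a call like `best_model(rows, labels, ["rs1799929", "ACMBP"], 2)` would return that pair together with its training balanced accuracy. Exhaustive search is what makes MDR tractable only for small numbers of factors, which is why published analyses typically cap the model order.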
The combination of NAT2_rs1041983, NAT2_rs1799929, NAT2_rs1799931, NQO1_rs1800566, IEAB, and ASFAB was the best model for the period after the child's birth, meaning that these factors in combination increase the risk of ALL (Table 5). Our data suggest that NAT2 polymorphisms affecting the acetylation of aromatic amine-containing compounds, such as drugs, pesticides, and cigarette smoke constituents, increase the risk of developing ALL in Mexican children. Interactions of NAT2 with alcohol drinking and smoking have been reported for various outcomes; for example, it was suggested that NAT2 could be involved in the activation of one or more pro-carcinogens associated with alcohol intake and the risk of oral squamous cell cancer (56). NAT2 is known to contribute to the detoxification of tobacco smoke, pesticides, and even prescription drugs (57)(58)(59); nonetheless, there is no reported evidence of a direct interaction between NAT2 polymorphisms and the environmental factors identified in the present study on the risk of developing hematological diseases. The mechanism linking polymorphisms in these genes and insecticide exposure to the development of ALL is not fully understood. A study of infant leukemia and maternal exposure to dipyrone during pregnancy reported that NAT2 SNPs were associated with this malignancy regardless of maternal exposure to the medication (59). Nevertheless, it is well recognized that studies addressing multi-gene rather than single-gene polymorphisms in xenobiotic-metabolizing genes could improve our knowledge of the genetic risk factors involved in ALL pathogenesis (4,6,25,60). This study included patients from eight public hospitals in Mexico City, which together account for slightly less than 70% of all leukemia cases in the city (61).
It is also the first investigation to evaluate the interaction between NQO1 (rs1800566), CYP2E1 (rs3813867), and NAT2 [rs1041983 (C282T), rs1801280 (T341C), rs1799929 (C481T), rs1799930 (G590A), rs1208 (A803G), and rs1799931 (G857A)] polymorphisms and exposure to common environmental hydrocarbons. Supplementary Table 4 shows the sociodemographic characteristics and the distribution of hydrocarbon exposure variables for the children diagnosed with ALL at the hospital that was not included in the present analysis. A comparison of frequencies showed no important differences, which reduces the possibility of selection bias.
Table 5. Best gene-environment interaction models, with cross-validation consistency (CVC) and prediction error for each n-locus model, obtained by multifactor dimensionality reduction (MDR) analysis of our data set. Only the best models with TBA > 0.57 are shown. *Values from the whole data set statistics. TA, training balanced accuracy; TBA, testing balanced accuracy; PSFBC, passive smoking by the father before conception; MCMBP, medicine consumption by the mother before pregnancy; ACMBP, alcohol consumption by the mother before pregnancy; MEHBP, hydrocarbon exposure by the mother before pregnancy; ASMP, active smoking by the mother during pregnancy; ASMBP, active smoking by the mother before pregnancy; PSFDP, passive smoking by the father during pregnancy; ASFP, active smoking by the father during pregnancy; HEMDP, hydrocarbon exposure by the mother during pregnancy; ASMAB, active smoking by the mother after the child's birth; IEAB, insecticide exposure of the mother after birth; ASFAB, active smoking by the father after birth.

On the other hand, to assess exposure to hydrocarbons, we used a strategy similar to that of previous studies analyzing the association between hydrocarbon exposure and the risk of developing childhood leukemia (62)(63)(64)(65). A disadvantage of this approach is that it relies on what the parents of cases and controls remember; however, our instrument asked the parents of cases whether they identified exposure to hydrocarbons, tobacco, or alcohol consumption as a possible cause of the disease, and no such responses were given. Thus, recall bias in the measurement of these variables appears unlikely. In addition, neither the interviewers nor the interviewees knew the genotyping results, so the interactions found between some exposure variables and the polymorphisms support the strategy used to measure these variables. A logistic regression model was used to evaluate potential confounding variables (Table 1).
This was complemented with an analysis under a recessive model, adjusted for possible confounding variables.
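Under a recessive model, only homozygotes for the risk allele (for example, TT at rs1799929) are coded as carriers, and heterozygotes are grouped with the other homozygotes. A minimal Python sketch of this coding, together with an unadjusted odds ratio and Woolf (log-based) 95% confidence interval, is shown below. It is illustrative only: the function names are hypothetical, and the ORs reported in this study come from logistic regression that can adjust for covariates, which a crude 2×2 calculation does not.

```python
import math


def recessive_code(genotype, risk_allele):
    """Return 1 if the genotype is homozygous for the risk allele (e.g. 'TT'), else 0."""
    return int(len(genotype) == 2 and genotype[0] == genotype[1] == risk_allele)


def two_by_two(genotypes, labels, risk_allele):
    """Count (a, b, c, d): carrier cases, carrier controls, non-carrier cases, non-carrier controls."""
    a = b = c = d = 0
    for genotype, is_case in zip(genotypes, labels):
        carrier = recessive_code(genotype, risk_allele)
        if carrier and is_case:
            a += 1
        elif carrier:
            b += 1
        elif is_case:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale Wald) confidence interval."""
    if 0 in (a, b, c, d):
        raise ValueError("zero cell: use a continuity correction or an exact method")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

For instance, `odds_ratio_ci(*two_by_two(genotypes, labels, "T"))` gives the crude recessive OR for a list of genotype strings; the adjusted estimates require a regression model instead.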
There were some limitations in the present study. First, we tested only one SNP each in the NQO1 and CYP2E1 genes; thus, we cannot rule out an association between these genes and ALL, or an interaction between SNPs in these genes and environmental xenobiotics. Second, subjects were grouped into three NAT2 acetylator phenotypes based on only six NAT2 SNPs (slow acetylators: two slow alleles; intermediate acetylators: one slow and one rapid allele; rapid acetylators: two rapid alleles). Because we did not genotype rs1801279 (191G>A), the assessment of only these six SNPs could result in misclassification of some NAT2 alleles (39). In addition, new NAT2 alleles have been identified, and there is phenotypic heterogeneity within the slow and intermediate acetylator genotype groups due to variation in the enzyme activity conferred by different alleles (66). Genotyping a limited number of SNPs could therefore bias our phenotype results; nevertheless, it has been reported that the NAT2 phenotype can be predicted with high sensitivity and specificity (0.9993 and 0.9880, respectively) in Caucasian, Latin American, and Middle Eastern populations using only rs1041983 (C282T) and rs1801280 (T341C) (42,67). Finally, discrepant NAT2 association findings among studies could be explained by factors such as sample size, age group, genotyping method, and the timing of exposure to risk agents (7,68,69).
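The three-category phenotype rule described above amounts to counting slow alleles. The Python sketch below assumes that each subject's two NAT2 alleles have already been inferred from the genotyped SNPs (that haplotype step is not shown) and uses a commonly cited cluster assignment, treating NAT2*4, *12, and *13 as rapid and NAT2*5, *6, *7, and *14 as slow. This mapping and the function names are assumptions for illustration, not the classification table used by the authors.

```python
import re

# Assumed cluster-level status; alleles outside these clusters raise an error.
RAPID_CLUSTERS = {"4", "12", "13"}
SLOW_CLUSTERS = {"5", "6", "7", "14"}


def allele_status(allele):
    """Map an allele name such as 'NAT2*5B' to 'rapid' or 'slow'."""
    match = re.fullmatch(r"NAT2\*(\d+)[A-Z]?", allele)
    if match is None:
        raise ValueError(f"unrecognized allele name: {allele}")
    cluster = match.group(1)
    if cluster in RAPID_CLUSTERS:
        return "rapid"
    if cluster in SLOW_CLUSTERS:
        return "slow"
    raise ValueError(f"no status assigned for cluster NAT2*{cluster}")


def acetylator_phenotype(allele1, allele2):
    """Two slow alleles -> slow; one -> intermediate; none -> rapid."""
    n_slow = [allele_status(allele1), allele_status(allele2)].count("slow")
    return {2: "slow", 1: "intermediate", 0: "rapid"}[n_slow]
```

Raising an error for unlisted clusters, rather than silently defaulting to one category, mirrors the misclassification concern above: an allele that the SNP panel cannot resolve should be flagged rather than assigned.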

CONCLUSION
To the best of our knowledge, this is the first assessment of the interaction between hydrocarbon exposure and genetic polymorphisms of NAT2, NQO1, and CYP2E1 on the risk of childhood ALL in the Mexican population, and the first report of the ancestry background of Mexican children with ALL. Our study provides evidence that NAT2 polymorphisms might be genetic factors involved in childhood ALL. These results shed light on the contribution of NAT2 polymorphisms to an increased risk of developing ALL in children.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics and National Committee of Scientific Research of the Instituto Mexicano del Seguro Social (approval number R-2013-785-062). Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.