Association Study Reveals Genetic Loci Responsible for Arsenic, Cadmium and Lead Accumulation in Rice Grain in Contaminated Farmlands

Accumulation of toxic heavy metals and metalloids (THMMs) in crop grain remarkably affects food safety and human health. Reducing the content of THMMs in grain requires the identification and manipulation of the genes regulating their accumulation. This study aimed to determine the genetic variations affecting grain THMM accumulation in rice by using association mapping. We used 276 accessions with 416 K single nucleotide polymorphisms (SNPs) and performed genome-wide association analysis of grain THMM concentrations in rice grown in heavily multi-contaminated farmlands. We detected 22, 17, and 21 quantitative trait loci (QTLs) for grain arsenic, cadmium, and lead concentrations, respectively. Both inter- and intra-subpopulation variants accounted for these QTLs. Most QTLs contained no known THMM-related genes and represented unidentified novel genes. We examined the candidate genes in qGAS1, a QTL for grain arsenic concentration with the best P-value detected for the entire population. We speculated that a transport protein of the multidrug and toxin extrusion family could be the candidate gene for this QTL. Our study suggested that the genetic regulation of grain THMM accumulation is very complex and largely unknown. The QTLs and SNPs identified in this study might help in the identification of new genes regulating THMM accumulation and aid in marker-assisted breeding of rice with low grain THMM content.


INTRODUCTION
Farmlands, especially those near mines, are subjected to THMM contamination (Loska et al., 2004;Wei and Yang, 2010;Li et al., 2014). THMMs in the soil, such as arsenic (As), cadmium (Cd), and lead (Pb), are absorbed and accumulated in crop plants, which poses a serious risk for food safety (Clemens and Ma, 2016). Rice (Oryza sativa L.) is a major food crop consumed by more than one-half of the world population. When rice is grown in contaminated farmlands or irrigated with contaminated water, high levels of THMMs accumulate in grain and adversely affect human health (Clemens and Ma, 2016). In Japan, consumption of Cd-contaminated rice caused the Itai-itai disease (Yamagata and Shigematsu, 1970;Horiguchi et al., 1994). In Bangladesh, China, and India, the elevation of As content in rice significantly increased the risk of cancer (Meharg et al., 2009;Melkonian et al., 2013). In China, grain enriched with Pb concentrations were detected in field-collected brown rice and market-collected white rice from mine-impacted and E-waste recycling areas (Fangmin et al., 2006;Fu et al., 2008;Williams et al., 2009;Norton et al., 2014b). Pb toxicity can cause gastrointestinal disturbances, renal damage, and neurological effects and adversely affect the intellectual ability of children (Canfield et al., 2003;Fewtrell et al., 2003).
In plants, the accumulation abilities of THMMs are quantitative traits, and their transport and metabolism are regulated via complex processes (Clemens and Ma, 2016). Identification and manipulation of THMM-related genes could help reduce the THMM content in crops. Arsenite, the primary form of As in paddy fields, is absorbed by aquaporins (water channels) such as OsLsi1 in rice roots and translocated to the xylem by OsLsi2 (Bienert et al., 2008;Ma et al., 2008). Mutations in OsLsi1 decrease arsenite uptake, and those in OsLsi2 remarkably reduce As accumulation in the shoots and grain (Ma et al., 2008). The overexpression of two other aquaporins OsNIP1;1 and OsNIP3;3 can reduce grain arsenite accumulation by disrupting its radial transport in root (Sun et al., 2018). OsLsi2 also functions in arsenite storage in rice nodes, which significantly affects As distribution to the grain (Chen et al., 2015). In the node, arsenite is mainly chelated by phytochelatins, and the complexes are then transported into the vacuole by OsABCC1, which reduces grain As accumulation (Song et al., 2014;Chen et al., 2015). In rice, the uptake of arsenate, another form of As, is mediated by phosphate transporters such as OsPT1, and their disruption decreases arsenate uptake and As accumulation (Kamiya et al., 2013;Wang et al., 2016;Cao et al., 2017;Ye et al., 2017). In cells, arsenate is reduced to arsenite by arsenate reductases (Chao et al., 2014;Shi et al., 2016;Xu et al., 2017). The overexpression of rice arsenate reductases OsHAC1;1, OsHAC1;2, or OsHAC4 decreases grain As accumulation (Shi et al., 2016;Xu et al., 2017).
The uptake of Cd is mediated by OsNramp5 in rice roots; Cd can be transported into the vacuole by OsHMA3 (Ueno et al., 2010;Miyadate et al., 2011;Sasaki et al., 2012Sasaki et al., , 2014. The overexpression of OsHMA3 reduces grain Cd accumulation (Ueno et al., 2010). OsHMA2 can transport Cd from the apoplast to the symplast to facilitate its translocation via the phloem (Satoh-Nagasawa et al., 2012;Takahashi et al., 2012;Yamaji et al., 2013). OsLCT1, a low-affinity cation transporter, also regulates Cd transport into rice grain, and its knockdown impairs grain Cd accumulation (Uraguchi et al., 2011). Further, transporters located in the rice nodes have been shown to control Cd storage in plants, thereby impacting the regulation of Cd distribution in developing grain (Huang et al., 2017). Recently, a defensinlike protein (CAL1) was found to chelate Cd and facilitate its secretion into extracellular spaces (Luo et al., 2018). However, the genes responsible for Pb transport and metabolism in rice are not yet known (Clemens and Ma, 2016).
Genome-wide association study by using natural populations is a powerful and efficient strategy to detect genetic variations and alleles of various phenotypes and agronomic traits (Huang and Han, 2014;Zhang et al., 2016). Variations affecting grain THMM accumulation in rice have also been investigated using association mapping. Norton et al. (2014a) completed the association mapping of grain As concentration in rice using 300 accessions and 37 K SNPs. Very recently, Yang et al. (2018) reported the association mapping of ionomic variations of rice, including As, Cd, and Pb concentrations in grain. However, none of these studies was performed in heavily multi-contaminated soil, and such contamination is known to occur in fields, especially those located near mines. In this study, we used a rice population of 276 accessions with 416 K SNPs to complete the association mapping of grain THMM (As, Cd, and Pb) concentrations in rice grown in heavily multi-contaminated farmlands. We successfully identified 60 quantitative trait loci (QTLs; 55 non-redundant loci) for grain THMM concentrations. The majority of these loci were previously unidentified. We successfully predicted the possible candidate gene for a QTL with the best P-value in the entire population.

Rice Population and Genotyping
The rice population was from a global-wide rice diversity panel consisting of 413 inbred accessions (Zhao et al., 2011). In all, 276 accessions with suitable heading date time and good germination performance were selected for field experiments (Supplementary Table S1). These accessions were previously genotyped using three types of SNP arrays (1536, 44 K, and 700 K SNP chips) (Zhao et al., 2010(Zhao et al., , 2011McCouch et al., 2016). We successfully extracted 415 754 high-quality SNPs (missing rate, <0.2; minor allele frequency, >0.05) from these array data for the 276 accessions. About 45% of the SNPs were located within genes, half of which were distributed within exons; the remaining 55% of SNPs were located in intergenic regions, half of which were distributed in putative regulatory regions (2 kb before a transcriptional start site) (McCouch et al., 2016). We also extracted high-quality SNPs for each investigated subpopulation (190,616,198,790,117,880,and 147,039 SNPs,respectively). The SNP data were processed using PLINK version 1.07 (Purcell et al., 2007). Based on these 416 K SNPs, we conducted PC analysis by using GCTA version 1.91 (Yang et al., 2011). Population structure was analyzed using fastSTRUCTURE, and the optimum subpopulation number was determined to be K = 5 (Raj et al., 2014;McCouch et al., 2016).

Field Experiments in Contaminated Farmlands
The rice plants were cultivated in farms, which were reported to be heavily contaminated by mining tailings from nearby abandoned mines Wu et al., 2013) in Shangyu, Shaoxing City, Zhejiang Province, China (N30 • 00 14 , E120 • 46 39 ) in 2014 and 2015. The soil levels of As, Cd, and Pb were about 3,000, 4.0, and 15,000 mg/kg, respectively, which are considerably higher than the environmental quality standard for soils in China, as well as soil background levels of THMMs in Zhejiang Province Wu et al., 2013).
For each year, each accession was arranged in a randomized complete block design into two-row plots with 12 plants per row at spacing of 20 cm × 17 cm with three replications. The farms were managed under flooded condition. Seeds for each accession were bulk-harvested from the eight central plants from each replication at the maturing stage. Before the grain THMM concentrations were determined, the seeds were dried for 7 days and then stored for 3 months.

Evaluation of Grain THMM Concentrations
The de-hulled seeds (brown rice) of each accession for each year were milled into fine flour, followed by filtering (0.180 mm). For each sample, 0.25 g filtered powder was placed into a 50 ml polypropylene digestion vessel containing 8 ml nitric acid, and then mixed in a graphite furnace (Agilent Technologies, CA) at 65 • C for 30 min. The mixture was digested at 120 • C until 3 ml was left. After cooling, 3 ml hydrogen peroxide was added to each digestion vessel. The mixture was again digested until 1 ml was left. After digestion, the solution was diluted to 50 ml with ultrapure water and mixed to uniformity. The diluted solutions were analyzed for As, Cd, and Pb by using inductively coupled plasma mass spectroscopy (Agilent 7500A; Agilent Technologies, CA). The rice reference material (GBW10045) was used as the standard. Three technological repetitions were performed for each sample, and three biological repetitions were used for each accession. The averages of 2-year repeats were used for data analyses. Correlation tests showed a significant correlation between each pair of 2-year repeats (P = 4.52 × 10 −4 to 2.63 × 10 −53 ; Supplementary Table S2). Two-way ANOVA was employed to examine the genotype and year effects on grain THMM concentrations (Supplementary Table S3).

Genome-Wide Association Study
We examined the LD decay for the 276 lines by using 416 K SNPs and PopLDdecay version 3.4 . The following parameters were used: maximum distance between two SNPs, 500 kb; missing rate, <0.2; minor allele frequency, >0.05. We also calculated LD decays for each subpopulation. The widely accepted linear mixed model was used for genomewide association study and was implemented in GEMMA version 0.97 (Zhou and Stephens, 2012). The population structure was controlled using a relatedness matrix that was calculated using all high-quality SNPs mentioned above. In addition to association mapping for the entire population, we performed association mapping for the respective subpopulations (not including the admixed subpopulation and the very limited aromatic subpopulation). For association mapping for subpopulations, high-quality SNP datasets were obtained for each subpopulation as mentioned above, and a corresponding relatedness matrix was constructed using the respective SNP dataset. Null hypothesis of no association was tested, and the statistical probability values were calculated using Wald test in GEMMA. The threshold for significance was determined according to a previous study (McCouch et al., 2016). In the case of a strong background in some mapping results (for grain Cd and Pb concentrations in the AUS subpopulation), we adjusted the threshold for significance by using Bonferroni correction for multiple comparisons at a false discovery rate of 0.05 (P-value = 0.05/SNP number).

Candidate Gene Analyses
Considering the balance between sensitivity and reliability, the potential candidate genes were predicted within a 400 kb genomic region for each locus (±200 kb of the peak SNPs). For the loci with multiple significant SNPs, the potential candidate genes were predicted within ±200 kb of the borders of the significant SNPs. Pairwise LD was determined by calculation of r 2 (the square of the correlation coefficient between SNP states) using PLINK. LD blocks were analyzed using LDheatmap version 0.99 (Shin et al., 2006), and candidate LD blocks were estimated using r 2 > 0.8. The transcriptomic data of rice under THMM stress were extracted from the supplemental data of the related references and NCBI GEO database (Huang et al., 2012;Yu et al., 2012;Ogo et al., 2014). The expression profile of the candidate genes was analyzed using the Bio-Analytic Resource Plant Biology database 1 . The differences in THMM concentration between the haplotypes were statistically analyzed using Welch's t-test (two haplotypes) or Tukey's test (more than two haplotypes). All SNP variations across genes were extracted from the whole-genome resequencing data of 3 000 Rice Genomes Project (3 K filtered dataset) (Mansueto et al., 2017;Wang et al., 2018).

Grain THMM Concentrations in Rice Grown in Contaminated Farmlands
Significant differences in the concentration of the three THMMs (As, Cd, and Pb) in grain (brown rice) were noted among the 276 global representative rice accessions (Supplementary Table S1) grown in a multi-contaminated farm with high soil As, Cd, and Pb levels, suggesting vast variations of grain THMM accumulation abilities of rice (Figure 1 and Table 1). The means for grain As, Cd, and Pb concentrations were 0.34 ± 0.11, 0.12 ± 0.06, and 0.78 ± 0.45 mg/kg, respectively ( Table 1). The grain As concentration showed minimum variation with a 6.25-fold difference, whereas grain Cd concentration showed FIGURE 1 | Diversity of grain toxic heavy metal and metalloid concentrations of the 276 global rice accessions grown at the multi-contaminated farmland. Histograms of grain (brown rice) As, Cd, and Pb concentrations (averages of 2 years) are shown separately. maximum variation with a 23.57-fold difference ( Table 1). The grain As and Pb concentrations of most accessions were over the maximum permitted level for grain commercialization, but the grain Cd concentration of most accessions were below the permitted level (Figure 1). Flooded condition during cultivation could be responsible for the relatively low accumulation of Cd in grain in our study. The accessions with low grain THMM accumulation are listed in Supplementary Table S4 for rice breeding.
Compared to the normal distribution, the frequency distributions of all three grain THMM concentrations, especially that for Pb, showed unique stronger peaks and longer right tails (Figure 1 and Table 1). The broad-sense heritability (H 2 ) of grain As, Cd, and Pb concentrations were 0.73, 0.35, and 0.21, respectively, suggesting that environmental factors have an important impact on grain THMM accumulation, especially for Cd and Pb (Table 1). Correlation tests showed a significant correlation between each pair of 2-year repeats for grain THMM concentrations (all P ≤ 4.52 × 10 −4 , Supplementary Table S2), enabling the examination of the genetic basis for grain THMM concentrations. Two-way ANOVA was then performed to examine both the genotype effect and year effect on grain THMM concentrations. Significant genotype effects were observed (all P ≤ 2.37 × 10 −4 , Supplementary Table S3), while no significant year effect was found for all three grain THMM concentrations (all P ≥ 0.41, Supplementary Table S3).
These results together suggest that the environmental factors affecting grain THMM concentrations are complex and remain unclear. Correlation analysis performed among the three grain THMM concentrations revealed a significant positive correlation between grain Cd and Pb concentrations (r = 0.63, P = 4.87 × 10 −34 ) and a negative correlation between grain As and Cd concentrations (r = −0.15, P = 0.02). This suggests that grain Cd accumulation could be impacted by similar factors as those for grain Pb accumulation, but different from those for grain As accumulation. No significant correlation was noted between grain As and Pb concentrations.

Variation of Grain THMM Concentrations Among Different Rice Subpopulations
Previous PC analysis revealed a clear population structure of the rice diversity panel (Zhao et al., 2011). We extracted 415,754 high-quality SNPs (missing rate, <0.2; minor allele frequency, >0.05) from three SNP arrays for the selected 276 accessions, with a high density of 0.90 kb/marker ( Table 2). The PC analysis based on these 416 K SNPs also indicated the same population structure for the selected accessions (Supplementary Figure S1). Next, we performed correlation analysis between grain THMM concentrations and the top four PCs. We found a positive correlation between PC3 and all three grain THMM concentrations and a negative correlation between PC4 and grain As concentration, suggesting that genetic diversity plays a role in grain THMM accumulation abilities ( Table 3).
Population structure analysis of these 276 accessions determined the optimum subpopulation number to be five  Table S1; Zhao et al., 2011). The top four PCs clearly separated these subpopulations except the ADM subpopulation (Supplementary Figure S1). Different subpopulations showed variations in grain THMM concentrations (Figure 2). The ARO subpopulation showed the lowest grain As and Pb concentration, and the TRJ subpopulation had the lowest grain Cd concentration; TEJ, ARO, and IND subpopulations had the highest grain As, Cd, and Pb concentrations, respectively (Figure 2). These data indicate a strong relationship of population structure with grain THMM concentrations. Further, we noted remarkable variations in grain THMM concentrations within some subpopulations, for example, AUS, IND, and TEJ subpopulations showed large variations of grain As, Cd, and Pb concentrations, respectively (Figure 2), suggesting the existence of major QTLs within the rice subpopulations.

Genome-Wide Association Mapping in the Rice Population
The LD decay analysis yielded similar results as those reported previously (Supplementary Figure S2) (McCouch et al., 2016). We used linear mixed model for genome-wide association mapping, which has been widely used and validated to be efficient (Yang et al., 2014). The impact of population structure in association mapping was determined by estimating the genetic relatedness. The association mapping performed using 416 K high-quality SNPs identified a total of 22 QTLs (21 nonredundant loci) reaching the thresholds (P < 1.00 × 10 −5 ; Figures 3-6 and Tables 4-6). The detection power was the highest for grain As concentration, and 10 QTLs were detected in the rice population (Figure 3 and Table 4). Seven QTLs were found to be responsible for grain Cd concentration, and the remaining five were responsible for grain Pb concentration (Figures 4, 5 and Tables 5, 6). A QTL in chromosome 5 was detected for both grain As and Pb  qGPB3; Figures 3, 5, 6 and Tables 4, 6).

Genome-Wide Association Mapping in the Rice Subpopulations
Although the impact of population structure has been considered during association mapping, we still performed association mapping for each of the subpopulations considering a strong relationship between population structure and grain THMM concentrations. The association mapping was not possible for the ARO subpopulation because of very limited accessions and was not performed for the ADM subpopulation. For the AUS, IND,TEJ,and TRJ subpopulations,190,616,198,790,117,880,and 147,039 high-quality SNPs (missing rate, <0.2; minor allele frequency, >0.05) were successfully extracted, respectively. We detected four QTLs in each of the AUS, IND, and TRJ subpopulations for grain As concentration (totally 12 QTLs; Figure 3 and Table 4). For grain Cd concentration, four, four, and two QTLs were detected in the IND, TEJ, and TRJ subpopulations, respectively (totally 10 QTLs; Figure 4 and Table 5). For grain Pb concentration, four, two, two, and FIGURE 4 | Genome-wide association mapping results for grain Cd concentrations. Manhattan plots of association mapping results (P-values) are shown at the (left), and the corresponding Quantile-Quantile plots are shown at the (right). Association mappings were performed for the entire population with all 276 accessions, as well as for each subpopulation with adequate accessions (AUS, aus; IND, indica; TEJ, temperate japonica; TRJ, tropical japonica). The red dashed lines indicate the threshold for significance (log 10 -transformed P-values, <5.0 or 6.6 for AUS subpopulation).
eight QTLs were identified in the AUS, IND, TEJ, and TRJ subpopulations, respectively (totally 16 QTLs; Figure 5 and Table 6). Consequently, 8, 10, 6, and 14 QTLs were discovered in the AUS, IND, TEJ, and TRJ subpopulations, respectively. Three QTLs (qGAS16, qGAS17, and qGPB6) were co-localized with the QTLs detected in the entire rice population (Figure 6). Additionally, two QTLs for grain As concentration, one in the AUS subpopulation (qGAS18) and the other in the TRJ subpopulation (qGAS22), were co-localized on the chromosome 11 (Figure 6).

Candidate Gene Prediction for Grain THMM Accumulation
We determined whether the identified QTLs harbored known THMM-related genes. qGAS8 and qGAS17 contain OsPIP2;7, encoding an aquaporin involved in arsenite permeability and transport (Mosa et al., 2012). qGAS12 contains OsARM1 and OsPT23. The former encodes a MYB transcription factor responsible for As root-to-shoot translocation , and the latter encodes a phosphate transporter involved in the uptake of arsenate (Yu et al., 2012;Clemens and Ma, 2016). Next, we predicted the possible candidate genes in the regions of QTLs (±200 kb of the significant SNPs; Supplementary  Table S5). Genes encoding proteins with transport functions are known to play important roles in THMM accumulation in plants (Clemens and Ma, 2016). We found that 61.8% of the identified QTLs contained genes encoding transporter-related proteins ( Supplementary Table S5). Notably, high concentrations of THMMs in the soil used in this study could induce toxic effect and stress for rice plants. Interestingly, OsPIP2;7 and OsARM1 harbored in the identified QTLs (qGAS8, qGAS12, FIGURE 5 | Genome-wide association mapping results for grain Pb concentrations. Manhattan plots of association mapping results (P-values) are shown at the (left), and the corresponding Quantile-Quantile plots are shown at the (right). Association mappings were performed for the entire population with all 276 accessions, as well as for each subpopulation with adequate accessions (AUS, aus; IND, indica; TEJ, temperate japonica; TRJ, tropical japonica). The red dashed lines indicate the threshold for significance (log 10 -transformed P-values < 5.0 or 6.6 for AUS subpopulation). and qGAS17) for grain As accumulation were reported to be involved in As tolerance and detoxification (Mosa et al., 2012;Wang et al., 2017). By examining the THMM stressresponsive genes previously identified using transcriptome analysis in rice (Huang et al., 2012;Yu et al., 2012;Ogo et al., 2014), we found that 12 and six QTLs for grain As and Cd concentrations contain genes responding to As and Cd stress, respectively, including OsARM1 (Supplementary  Table S6).
Subsequently, we focused on qGAS1, which had the best P-value among the QTLs detected in the entire population. Analysis of LD blocks revealed two blocks in the region of qGAS1 (Figure 7). Almost all significant SNPs were located in the second block (Figure 7). We found four transporter-related genes in this block. According to protein homology, LOC_Os01g55500 encodes a nucleobase-ascorbate transporter, LOC_Os01g55600 and LOC_Os01g55610 encode nitrate transporters, and LOC_Os01g56050 encodes a MATE (also known as multi-antimicrobial extrusion) transport protein. The expression profiles of the four genes showed that they were mainly expressed in seeds/inflorescences, except LOC_Os01g55500 (which was mainly expressed in the shoot apical meristem; Figure 8). Next, we performed haplotype analysis for these genes (Figure 9A). LOC_Os01g55500 and LOC_Os01g56050 showed different grain As concentrations between some of the haplotypes (Figure 9B). Taken together, our findings suggest that LOC_Os01g56050 might be a possible candidate gene for qGAS1.
In our SNP dataset, only three SNP markers were found on the promoter (within 1 kb) of LOC_Os01g56050, which divided the gene alleles into three haplotypes ( Figure 9A). Haplotype A with the highest grain As concentration was primarily distributed in the TEJ subpopulation, whereas haplotype C with the lowest grain As concentration was primarily distributed in the TRJ subpopulation (Figures 9B,C). We then extracted all the SNPs on LOC_Os01g56050 from the whole-genome resequencing data of rice population . The eight SNPs found on this gene also divided the gene alleles into three major haplotypes Threshold for significance, P-value < 1.00 × 10 −5 . a Chr., chromosome. b SSN, significant SNP number. c SNP with the best P-value. d Allele, minor/major alleles. e MAF, minor allele frequency. f Effect, allele effect with respect to the minor allele. Threshold for significance, P-value < 1.00 × 10 −5 (<2.62 × 10 −7 for AUS subpopulation). a Chr., chromosome. b SSN, significant SNP number. c SNP with the best P value. d Allele, minor/major alleles. e MAF, minor allele frequency. f Effect, allele effect with respect to the minor allele. Table S7). Two of the SNPs caused nonsynonymous changes between the haplotypes (valine 213 to isoleucine 213 and lysine 459 to isoleucine 459 ; Supplementary Table S7). The other SNPs were located in the introns or caused synonymous changes (Supplementary Table S7). Both valine and isoleucine are hydrophobic amino acids and have similar biochemical characters, whereas lysine and isoleucine have distinct biochemical characters (lysine is a basic amino acid). Furthermore, lysine 459 is a relatively conserved residue in the MATE domain (Supplementary Figure S3). Therefore, the Threshold for significance, P-value < 1.00 × 10 −5 (<2.62 × 10 −7 for AUS subpopulation). a Chr., chromosome. b SSN, significant SNP number. c SNP with the best P-value. d Allele, minor/major alleles. e MAF, minor allele frequency. f Effect, allele effect with respect to the minor allele.

DISCUSSION
Although the genetic analysis of grain THMM accumulation is important, the association mapping of the causal genetic loci has not yet been comprehensively conducted in rice grown in heavily multi-contaminated farmlands. We completed the association mapping of three THMM concentrations in rice grain by using 276 accessions and 416 K SNP markers. In addition to the 22 QTLs (21 non-redundant loci) identified in the entire population, we discovered 48 QTLs (47 non-redundant loci) in the subpopulations. In all, we detected 60 QTLs (55 non-redundant loci) for grain THMM concentrations in rice. The genes involved in As accumulation have been investigated widely, but only three of 22 QTLs for grain As concentration contain known As-related genes. No known Cd-/Pb-related genes were found in the 17/21 QTLs for grain Cd/Pb concentration. These findings suggest that the genetic regulation of grain THMM accumulation is very complex and remains largely unknown.
Notably, the detection powers for grain Cd and Pb concentrations in the entire population were lower than those for grain As concentration (Figure 3 and Table 4). This could be because the heritability of As accumulation is considerably higher than that of Cd (flooded condition) and FIGURE 8 | Expression profiles of four potential candidate genes for the qGAS1 locus. The relative expression level indicates the fold change of the signal for the gene relative to the control signal. The different stages of panicle and seed development are based on the study of Itoh et al. (2005). SAM, shoot apical meristem.
Pb accumulation (Table 1), as revealed by previous studies (Norton et al., 2014b;Pinson et al., 2015). We observed a significant positive correlation between grain Cd and Pb concentrations, but failed to detect some common QTLs governing both their concentrations. This implies that both grain Cd and Pb accumulation are impacted by similar non-genetic factors. Cd and Pb accumulation in plants is known to be strongly affected by environments (Norton et al., 2014b;Pinson et al., 2015;Clemens and Ma, 2016). Therefore, environmental factors, rather than genotypes, might be attributed to this positive correlation. Both the broad-sense heritabilities of grain Cd and Pb concentrations were lower than 40% (Table 1), supporting the large impact of environmental factors on Cd and Pb accumulation in rice grain. Compared with those under unflooded field condition, the average grain concentrations of As were higher and those of Cd were lower under flooded field condition (Pinson et al., 2015). Consistent with this, we found a negative correlation between grain As and Cd concentrations under our flooded field condition, which was also responsible for the relatively low accumulation of grain Cd in the rice accessions grown in the soil with high Cd concentrations (Figure 1 and Table 1). We only found a few QTLs involved in the accumulation of two kinds of THMMs, suggesting that the accumulation of different THMMs in grain is differently regulated.
Because of the relatively small number of accessions, the detection powers for association mapping could have decreased in the respective subpopulations. Unexpectedly, we detected several strong peak signals, such as qGPB19 for grain Pb concentration in TRJ subpopulation, which were not detected in the entire population (Figures 3-6 and Tables 4-6). This suggests the existence of major intra-subpopulation QTLs for grain THMM concentrations, which are not detected in the entire population. Subpopulation-specific QTLs might be determined by the presence/absence or different frequencies of specific alleles in one or more subpopulations. A weak population structure in the subpopulations could also benefit the association mapping (Yano et al., 2016). We did not perform association mapping for each PC, because the subpopulations were well consistent with the PCs (Supplementary Figure S1). Norton et al. (2014a) performed association mapping of grain As [as well as copper, molybdenum, and zinc (Zn)] concentration by using 37 K SNPs. Nine of the QTLs identified in our study were co-localized with those identified in their study (Figure 6). Therefore, 13 loci for grain As concentration were newly identified in our study. Considerably more SNP markers and different field environments might partly account for these differences in QTLs. The association study of grain Cd and Pb accumulation was not performed by Norton et al. (2014a). We also compared our results with those of Yang et al. (2018), which were obtained in normal farmlands. Six of the QTLs for grain As concentration were co-localized with those identified in their study (Figure 6). Surprisingly, none of the QTLs we identified for grain Cd and Pb concentration were co-localized with those in their study. This suggests that environmental factors considerably affect the variations of grain Cd and Pb concentrations, or low and high Cd/Pb concentrations could be regulated by different mechanisms. Contrarily, four QTLs (2 non-redundant loci) for grain As concentration were detected (co-localization) in all three studies (Figure 6). These commonly detected QTLs should be useful for breeding rice varieties with low accumulation of THMMs.
High soil THMM concentrations could impose THMM stress on rice plants and induce stress responses in this study. Some of the QTLs identified for grain THMM accumulation might also be related to THMM tolerance. In line with this assumption, two candidate genes (OsPIP2;7 and OsARM1) in the QTLs for grain As accumulation have been reported to function in As tolerance and detoxification (Mosa et al., 2012;Wang et al., 2017), and 18 QTLs for grain As and Cd concentrations contain genes responding to As or Cd stress. In this study, the soil also contained a high level of Zn (about 500 mg/kg) Wu et al., 2013). This could trigger plant responses to the toxic effects, but whether high Zn level directly or indirectly influences THMM accumulation in grain is not yet known. The unclear complex effects of multiplex soil contamination on plants limit the strength of this research. However, multiple contaminations widely exist in farmland near mines and E-waste recycling areas. In the future, the possible multiplex toxic effects of multiple THMMs in soil and the possible interaction effects between THMM accumulation in plants need to be investigated.
The qGAS1 with the best P-value in the entire population was also not detected by Norton et al. (2014a). We examined whether transporter-related genes could be the candidate gene for qGAS1 (Figures 7-9). LOC_Os01g56050 encodes a transport protein belonging to the MATE family, which is involved in multiple functions and processes, including metal detoxification (Omote et al., 2006;Takanashi et al., 2014). Several plant MATE transporters have been reported to mediate metal (aluminum, iron, and Cd) translocation and tolerance (Li et al., 2002;Rogers and Guerinot, 2002;Magalhaes et al., 2007;Yokosho et al., 2009Yokosho et al., , 2011Maron et al., 2010). The substrates of MATE transporters are usually organic cations (Omote et al., 2006). The MATE transporter encoded by LOC_Os01g56050 might transport As-ligand complexes such as phytochelatins to decrease As accumulation in grain. The confirmation of LOC_Os01g56050 as the causative gene for qGAS1 needs to be further confirmed using CRISPR/Cas9 editing or transgene lines.
We identified several rice accessions with low grain THMM concentrations, which can be used for rice breeding (Supplementary Table S4). However, few accessions showed low accumulation of all three THMMs in grain, suggesting the need of pyramiding breeding for low grain THMM accumulation rice, i.e., introduction of respective loci for low accumulation of different THMMs into one accession. For these accessions with low grain THMM concentrations, we didn't find a common geographical origin or a clear phylogenetic cluster, suggesting the complex regulation of THMM accumulation. In conclusion, our study reveals the complex regulation of THMMs in rice grain and novel loci and genes governing THMM accumulation. Both genetic and environmental factors contribute to the variation of grain THMM concentrations. The accumulation of different THMMs in grain is regulated by different genetic loci/genes involved in different pathways. Our results can help reduce THMM contents in rice grains and promote marker-assisted breeding of rice with low THMM content.