Induced Mutagenesis in UGT74S1 Gene Leads to Stable New Flax Lines with Altered Secoisolariciresinol Diglucoside (SDG) Profiles

Flax secoisolariciresinol (SECO) diglucoside (SDG) lignan is an emerging natural product purported to prevent chronic diseases in humans. SECO, the aglycone form of SDG, has shown higher intestinal cell absorption but it is not accumulated naturally in planta. Recently, we have identified and characterized a UDP-glucosyltransferase gene, UGT74S1, that glucosylates SECO into its monoglucoside (SMG) and SDG forms when expressed in yeast. However, whether this gene is unique in controlling SECO glucosylation into SDG in planta is unclear. Here, we report on the use of UGT74S1 in reverse and forward genetics to characterize an ethyl methane sulfonate (EMS) mutagenized flax population from cultivar CDC Bethune and consisting of 1996 M2 families. EMS mutagenesis generated 73 SNP variants causing 79 mutational events in the UGT74S1 exonic regions of 93 M2 families. The mutation frequency in the exonic regions was determined to be one per 28 Kb. Of these mutations, 13 homozygous missense mutations and two homozygous nonsense mutations were observed and all were transmitted into the M3 and M4 generations. Forward genetics screening of the population showed homozygous nonsense mutants completely lacking SDG biosynthesis while the production of SMG was observed only in a subset of the M4 lines. Heterozygous or homozygous M4 missense mutants displayed a wide range of SDG levels, some being greater than those of CDC Bethune. No additional deleterious mutations were detected in these mutant lines using a panel of 10 other genes potentially involved in the lignan biosynthesis. This study provides further evidence that UGT74S1 is unique in controlling SDG formation from SECO and this is the first report of non-transgenic flax germplasm with simultaneous knockout of SDG and presence of SMG in planta.

Flax secoisolariciresinol (SECO) diglucoside (SDG) lignan is an emerging natural product purported to prevent chronic diseases in humans. SECO, the aglycone form of SDG, has shown higher intestinal cell absorption but it is not accumulated naturally in planta. Recently, we have identified and characterized a UDP-glucosyltransferase gene, UGT74S1, that glucosylates SECO into its monoglucoside (SMG) and SDG forms when expressed in yeast. However, whether this gene is unique in controlling SECO glucosylation into SDG in planta is unclear. Here, we report on the use of UGT74S1 in reverse and forward genetics to characterize an ethyl methane sulfonate (EMS) mutagenized flax population from cultivar CDC Bethune and consisting of 1996 M2 families. EMS mutagenesis generated 73 SNP variants causing 79 mutational events in the UGT74S1 exonic regions of 93 M2 families. The mutation frequency in the exonic regions was determined to be one per 28 Kb. Of these mutations, 13 homozygous missense mutations and two homozygous nonsense mutations were observed and all were transmitted into the M3 and M4 generations. Forward genetics screening of the population showed homozygous nonsense mutants completely lacking SDG biosynthesis while the production of SMG was observed only in a subset of the M4 lines. Heterozygous or homozygous M4 missense mutants displayed a wide range of SDG levels, some being greater than those of CDC Bethune. No additional deleterious mutations were detected in these mutant lines using a panel of 10 other genes potentially involved in the lignan biosynthesis. This study provides further evidence that UGT74S1 is unique in controlling SDG formation from SECO and this is the first report of nontransgenic flax germplasm with simultaneous knockout of SDG and presence of SMG in planta.

INTRODUCTION
Flax lines with high linolenic acid (Duguid et al., 2014), low linolenic acid (Green and Marshall, 1984;Rowland, 1991;Dribnenki et al., 1996), and low cyanogenic glucosides (Duguid, personal communications) are some of the major achievements in linseed genetics and breeding. Whereas, the first two attributes are associated with fatty acid content, the last one refers to flax seed content in linustatin and neolinustatin, two cyanogenic glucosides that are toxic to mammals (EFSA, 2007). The nutritional benefits of flaxseed linolenic acid in human and animal health are now well established, especially for the positive roles it plays in reducing incidence of cardiovascular diseases (Simopoulos, 2008). Flaxseed is also the richest source of lignans (Thompson et al., 2006;Pan et al., 2009;Touré and Xueming, 2010) for which a wide variety of purported health benefits have been reported (Webb and McCullough, 2005;Adolphe et al., 2010;Buck et al., 2011;Wang et al., 2015). In planta, flax lignans are usually found as secoisolariciresinol diglucosides (SDGs) ester-linked within an oligomeric matrix called the lignan macromolecule (Struijs et al., 2009;Kosińska et al., 2011). Its monomeric aglycone (SECO) and intermediate monoglucoside (SMG) forms do not naturally accumulate in planta.
Lignan metabolism in mammals has been the focus of many detailed studies (During et al., 2012;Udani et al., 2013;Setchell et al., 2014;Mukker et al., 2015). After consumption, the lignan macromolecule complex is hydrolyzed, SDG is deglucosylated into SECO and absorbed in the gut (Clavel et al., 2007). Nonabsorbed SECO (50-72% of ingested SECO) is subsequently metabolized into the enterolignans, enterolactone (ENL), and enterodiol (END) mainly in the colon by the intestinal microflora (Clavel et al., 2006(Clavel et al., , 2007Landete et al., 2015). Enterolignans have been reported to elicit modest estrogen-like activity by binding estrogen receptors (ERs), ERα or ERβ (Satake et al., 2015). Nonetheless, low concentrations of intact lignans have been detected in the sera of mammals fed lignan-rich diets, suggesting that non-metabolized lignans are taken up by the mammalian digestive system and manifest ER-independent activities in vivo and in vitro (During et al., 2012;Satake et al., 2015). As a β-glucoside however, SDG metabolism may vary between individuals due to differences in their gut microflora (Rowland et al., 2000;Clavel et al., 2005;Landete et al., 2015), thereby affecting its bioavailability in some individuals (Possemiers et al., 2007) similar to what has already been shown with isoflavone diazein, which is metabolized only by one third of humans into its bioactive S-equol form (Setchell and Cole, 2006;Landete et al., 2015). Recently, During et al. (2012) reported a linear increase in the uptake of lignan aglycone forms, including pinolariciresinol (PINO), SECO, and ENL, by human intestinal Caco-2 cells by simple diffusion or low affinity transporter. Less than 0.1% SDG absorption was observed, compared to 2% for SECO and PINO, and 7% for ENL, evidencing the effect of glucosylation on absorption and bioavailability. However, no SMG of flax lignan has been available so far for such testing. Due to its lower molecular weight of 524.6 g/mol and the lack of a second bulky and polar glucoside moiety, it is logical to conceive that SMG uptake by Caco-2 cells through diffusion would exceed that of SDG which has a molecular weight of 686.7 g/mol. Currently, SECO and SMG are only obtained after acid hydrolysis of SDG , a process that is not environmentally friendly. Because SDG deglucosylation is a pre-requisite to any absorption or conversion into END and ENL in vivo, production of seeds with altered SDG glucosylation in vivo is of interest for designing a highly bioavailable functional flaxseed food. It would be advantageous to create flax lines with reduced lignan glucosylation in the forms of SECO or SMG in planta.
Glycosylation is a biological process that leads to chemical complexity and diversity of plant natural products (Osmani et al., 2009;Wang and Hou, 2009). This synthesis modification is catalyzed by enzymes of the glycosyltransferases (GT) superfamily in which family 1 GT, refers to uridine glycosyl transferases (UGTs) (Caputi et al., 2012) that are characterized by a PSPG box consisting of a 44 amino acid consensus signature motif (Gachon et al., 2005). UGTs transfer UDP-activated sugar moieties to specific acceptor molecules (Witte et al., 2009) and their glycosylation activities on acceptors can modulate their pharmacological activity (Osmani et al., 2009). Recently, UGT74S1 (JX011632), a family 1 GT, was identified in flax and characterized to sequentially glucosylate SECO into SMG and SDG (Ghose et al., 2014; Figure 1). UGT74S1 has also been reported to be a single-copy gene and the key player in controlling SDG formation in flax (Fofana et al., 2017). Furthermore, Trp 355 and His 352 were found to be critical for the UGT74S1 glucosylation activity toward SECO whereas Gln 337 and Ser 357 appeared essential for SMG conversion into SDG in vitro . Until now however, no study has assessed the direct effect of UGT74S1 mutations in planta. Moreover, despite growing interests in flax lignans, breeding for SDG per se has so far received little attention (Pavelek et al., 2015), with most studies being limited to observed variations among flax varieties and accessions within collections (Saastamoinen et al., 2013). So far, no flax lines with altered lignan glucosylation have been reported.
Flaxseed is a health-functional crop and the commercial production of genetically modified flax remains an impediment to conquering such markets (Fofana et al., 2010). Mutation breeding is a well-established method in flax improvement and has led to the development of cultivars with reduced linolenic acid (Green and Marshall, 1984;Rowland et al., 1995;Ntiamoah and Rowland, 1997). Targeting Induced Local Lesions in Genomes (TILLING) is a reverse genetic tool used in functional genomics and crop improvement (McCallum et al., 2000;Sikora et al., 2011). Traditional mutagenesis, including radiation or chemical mutagenesis, is followed by the detection of mutations in the causal gene(s) responsible for the phenotype(s) of interest (McCallum et al., 2000;Slade and Knauf, 2005;Uchida et al., 2011;Chantreau et al., 2013). Recently, for example, Chantreau et al. (2013) developed a flax EMS mutagenized population and characterized the roles of coumarate-3-hydroxylase (C3H) and cinnamyl alcohol dehydrogenase (CAD) in fiber biosynthesis in flax stems by TILLING via forward and reverse genetics. The chemical mutagen, ethyl methane sulfonate (EMS), generates a wide range of mutant alleles, primarily point mutations (Comai and Henikoff, 2006) and, current advances in next generation sequencing have greatly improved the efficiency of inducedmutation detection (Galindo-González et al., 2015). With the release of the flax genome sequence (Wang et al., 2012), the DNA sequence of almost all flax genes is known. However, the gene functions and their link to phenotypic traits remain largely incomplete or completely unknown.
The present study was undertaken to (1) characterize an EMS mutagenized flaxseed population through the identification of lines carrying mutations in UGT74S1 and displaying altered lignan glucosylation phenotypes, and (2) determine the extent of the role played by UGT74S1 in SDG formation through SECO glucosylation. Using a next generation sequencing platform for the amplicon re-sequencing of the UGT74S1 gene in an EMS mutagenized flax population, we identified stable mutant lines with altered SDG lignan profile and demonstrated that the lossof-function of SECO glucosylation into SDG was attributable solely to UGT74S1.

Plant Materials
A total of 1996 M2 EMS mutagenized flax families were obtained from Dr. Gordon G. Rowland (Crop Development Centre, University of Saskatchewan, Saskatoon, SK) who multiplied them before distribution. These M2 families were originally developed by Dr. Michael K. Deyholos' laboratory (University of Alberta, Edmonton, AB) as part of the Genome Canada's Total Utilization Flax GENomics (TUFGEN) project. To generate these lines, seeds of flax (Linum usitatissimum L.) cultivar CDC Bethune (Rowland et al., 2002) were treated with 0.5% (v/v) EMS for 4 h, planted and grown to seed set. M2 seeds from M1 plants were harvested separately as previously described (Galindo-González et al., 2015). Five seeds from each of the 1996 M2 families were hand sown in 30 cm rows in 2011 at the Harrington Research Farm of Agriculture and Agri-Food Canada (Charlottetown, PEI) along with 10 rows of untreated CDC Bethune. Individual EMS plants within a row are referred to as lines and, all plants within a row constitute a family. Leaves of one line from each M2 family and 20 CDC Bethune plants were collected, frozen in liquid nitrogen and stored at −80 • C for DNA extraction and reverse genetic analysis. All M2 plants were destroyed before flowering to avoid any potential cross pollination and seed production in the field. Re-sampling from the original M2 seed stock was also performed, whenever necessary, by planting in greenhouse to generate subsequent mutant generations.

Reverse Genetics of UGT74S1
Genomic DNA Extraction and Targeted UGT74S1 Gene Amplification Leaf samples of each 1996 M2 lines and 20 CDC Bethune plants were arranged in individual wells of 21 × 96-well plates containing sterile tungsten beads (Sigma Aldrich, CA, USA), freeze-dried and genomic DNA was extracted using the Qiagen DNeasy Plant Mini Kit (Qiagen, Mississauga, ON) following the manufacturer's recommendations. DNA was quantified using Picogreen (Life technologies, CA, USA) and the quality was verified by agarose gel electrophoresis.
For the amplification and re-sequencing, the genomic DNA region of UGT74S1 (Supplementary Figures 1A,B) corresponding to locus Lus10017825.g (scaffold35, position 7,827-9,972) of the flax draft genome (Phytozme.jgi.doe.gov) was targeted using universal and standard tagging approaches. The first exon of the gene was amplified and tagged using the universal tagging approach, where the gene specific primers flax1_F1b (GAAGGTGACCAAGTTCATGCACCCCACTTCTCCAATTC TC) and flax1_R1 (AATGCGTTACAAGCACATCTCGATTATC ATTGCTCGAAACGTG) contained a 5 ′ tail identical to primers TRS_F (GAAGGTGACCAAGTTCATGC) and TRS_R (AATGCGTTACAAGCACATCTC), respectively. From the TRS_F and TRS_R primers, 96 versions carrying an additional 8-base barcode at the 5 ′ -end were used (Supplementary Table 1). The second exon was amplified and tagged by a more conventional approach using only two primers per PCR reaction. For each gene-specific primer, 96 versions of UGT74S1 gene-specific primer (flax1_F2b GTCAATCGAGCCAACTCTCG) carrying an additional 8base barcode at the 5 ′ end (Supplementary Table 2) were used in combination with gene-specific primer flax1_R2b (ACAAAACCTATCTTCCCAAGTCA). The 20 µl PCR reactions were set up using the 5 × MyTaq Mastermix (Bioline, Berlin, Germany). The standard tagging reactions contained 5 pM of each primer, while the universal tagging reactions contained 2 pM of each gene-specific primer and 20 pM of each TRS primer. Cycling conditions consisted of 2 min at 96 • C for pre-denaturation followed by 35 cycles of 15 s at 96 • C, 30 s at 55 • C, and 2 min at 72 • C.

Library Construction and Pyrosequencing
After PCR amplifications, amplicons from each exonic region in each 96-well plate were pooled, purified using MinElute gel purification kit (Qiagen) to remove non-specific amplification products, size-selected and quantified. The two gel-purified amplicon pools derived from each individual exonic DNA plate were mixed in equimolar amounts to create 21 pools, which were further split into a 10 and an 11 pool sets. DNA from each of the 11 or 10 pool sets were ligated to 11 or 10 different 454 multiplex identifier (MID) adaptors and two Roche/454-compatible sequencing libraries were constructed and sequenced by pyrosequencing at LGC Genomics GmbH (Berlin, Germany). All sequencing reactions were based upon FLX-Titanium chemistry (Roche/454 Life Sciences, Branford, CT, USA; www.454.com) and all methods were performed according to manufacturer's protocols. The resulting sequences were processed using the standard Roche software for base calling, adaptor and quality trimming (Genome Sequencer FLX System Software manual version 2.3).

Bioinformatics of Sequence Reads and EMS-Induced SNP Detection in M2 Plants
The sequencing reads were deconvoluted using an in-housedeveloped tag sorting and clipping program, allowing no more than a single nucleotide mismatch. The barcodes were sorted and trimmed, and individual FASTQ files were generated for each sample.
To confirm the reference identity of the wild type target sequence, reads from the 20 wild type CDC Bethune plants were aligned to the genomic reference sequence of UGT74S1 (Lus10017825.g) using Bowtie2 v2.1.0 (Langmead and Salzberg, 2012) with default parameters in "local" alignment mode. The alignments were coordinate-sorted and saved in a standard Binary Sequence Alignment/Map (BAM) format. Subsequently, GATK v1.6-11 (http://www.broadinstitute.org/gatk/) was used to call SNPs by running the GATK walkers on the alignment files. Using GATK, IndelRealigner and UnifiedGenotyper were run to realign reads around indels called and to generate raw SNP calls, respectively. To identify and retain high quality SNPs, QualFilter (QUAL ≥ 30.0), QDFilter (QD ≥ 5.0, Qualityby-Depth), DPFilter (DP ≥ 10, sample read depth at locus), StrandFilter (FS ≤ 60.0, Phred-scaled p-value using Fisher's Exact Test to detect strand bias), and ABHetFilter (ABHet ≤ 0.9, testing allele balance for heterozygous) were applied using Variant Filtration. The final results were stored in raw and filtered standard variant call format (VCF) files. The VCF files from GATK were further subjected to homopolymer filtering to reduce false-positive calls in homopolymer-dense regions and, this was followed by a final manual inspection of the SNP calls to generate consensus sequences.
To detect EMS-induced SNP in M2 plants, reads from all 1996 M2 EMS lines were aligned using Bowtie2 v2.1.0 as described earlier, and raw and filtered SNPs were called. SNPs causing codon changes were identified using SnpEff v3.2 (http://snpeff.sourceforge.net/). Exon positions described for UGT74S1 (Ghose et al., 2014) were used to set up a SnpEff database for automatic annotation of VCF files (http://snpeff.sourceforge.net/SnpEff_manual.html#output) and an allele table was constructed for all samples.
A consensus sequence was generated using sequences derived from all samples. Final VCF files from the 1996 M2 lines were used to produce one putative genomic sequence FASTA file per sample. Regions not covered by the PCR amplicons were soft-masked using sequence changed into lowercase, likewise for regions with a read coverage below 10× (for bases with Phred score ≥20). Heterozygous SNP calls were annotated with standard ambiguous nucleotide sequence code (http://www.bioinformatics.org/sms2/iupac.html). The genomic sequences were subsequently used to generate putative coding sequence (CDS) FASTA files and haplotype sequences were generated. Predicted peptide sequences were obtained from the CDS using a standard DNA codon table.
The mutation rate was determined from UGT74S1 sequence for all the EMS lines sequenced, as performed routinely in EMS TILLING population. The mutation rate in the exonic region was calculated as followed: [(exon 1 + exon 2) × total number of original EMS lines sequenced]/total number of variants observed. Mutational effects for each of the missense and nonsense mutations were predicted using the SuSPect method, a standalone web server (Yates et al., 2014; http://www.sbg.bio.ic. ac.uk/phyre2/html/help.cgi?id=help/investigator). The wild type UGT74S1 protein sequence was uploaded and a heat map was generated for each amino acid position in the protein against a substitution by any of the 20 possible amino acids with colorcoded effect level.

Forward Genetics Screening for Altered SDG Lignan Phenotypes in M2 Families Carrying UGT74S1 EMS-Induced Mutations
A total of 93 M2 families were identified to carry at least one mutation in UGT74S1, of which 18 carried sense mutations and were not considered for further analysis. Among the remaining 75 M2 families carrying EMS-induced missense and nonsense mutations, six had few or no seeds available and could also not be further tested. Because seeds were not collected from the M2 plants grown in the field that were used in reverse genetics, a new set of original M2 seeds were resampled from the remaining 69 M2 families, grown in a greenhouse and subjected to further analyses.
In a first step, forward genetics was performed using a subsample of original M2 seeds from each of the 69 M2 families in which missense or nonsense mutations were detected for assessing the effect of mutations on SDG production. Total lignan was extracted from 50 mg of seeds per family and hydrolyzed by alkaline hydrolysis as described by Ghose et al. (2014). The wild type CDC Bethune was used as control. Nonhydrolyzed lignan polymer and hydrolyzed lignan were purified from families 1230 and 2340 which had nonsense mutations and separated by UPLC to assess the potential effect of mutations on the lignan macromolecule and its constituent phenolic acid glucoside profile. SDG was hydrolyzed from all 69 M2 families, purified, separated by UPLC, confirmed by mass spectrometry and quantitated as previously reported (Ghose et al., 2014). Three independent extractions were performed and analyzed separately. SDG level from each M2 family was compared to that of CDC Bethune.

Validation of EMS-Induced SNPs in M2 Plants
After initial SDG lignan determination in the 69 M2 families, we sought to validate the mutations detected by amplicon resequencing in the M2 seeds. Five M2 seeds from each of the 69 M2 families were re-sampled, planted and grown as individual lines in the greenhouse along with CDC Bethune. Each germinated plant was assigned a number and grown to maturity. Hereafter, individual mutant lines are designated by the family number from which they were derived, followed by a number and, if needed, a letter identifier. M3 seeds and leaf tissues were collected separately from each individual M2 plant (or line). At this point, we retained plants (lines) of 28 M2 families with nonsense or missense mutations and that produced enough M3 seeds for further testing (Supplementary  Table 3). Genomic DNA was extracted using the Mag-Bind Plant DNA plus Kit (Omega Bio-tek, Norcross, GA, USA) following manufacturer recommendations. SNP re-validation was performed by genotyping 1 or 2 plants (or lines) per family using Kompetitive allele specific PCR (KASP, LGC Genomics, Beverly, MA) assays. A total of 20 KASP assay, were designed to target 16 missense and four nonsense mutations (Supplementary  Table 4). Each reaction was carried out using 11-22 technical replicates of each DNA sample and two no template controls to ensure proper clustering as recommended by the supplier. All reactions were performed on a CFX 96 Real Time PCR system (BioRad, Mississauga, ON, Canada) in a total volume of 10 µL containing 5 ng of DNA, 5 µL of 2X KASP master mix, and 0.14 µL of KASP-specific assay mix. The first step of the touchdown PCR included an initial denaturation at 94 • C for 15 min, followed by 10 cycles consisting of a denaturation at 94 • C for 20 s, a touchdown annealing step starting at 61 • C for 60 s and achieving a final annealing at 55 • C with 0.6 • C decreases per cycle. The second step consisted of 26 cycles, each including a denaturation at 94 • C for 20 s and a combined annealing and extension step at 55 • C for 60 s. Completed KASP reactions were read at 37 • C for 1 min. Two to four additional KASP recycling programs were performed for each assay to improve sample grouping, obtain tighter data point clusters and increase confidence in genotype assignment. A single recycling program consists of three cycles with denaturation at 94 • C for 20 s followed by an annealing/extension step at 57 • C for 60 s.

Stability of EMS-Induced Mutations in M3 Plants
To determine the stability and heritability of the EMS-induced mutations confirmed in the M2 flax lines by KASP genotyping, 3-45 viable M3 seeds per M2 plants were planted and grown to maturity in single pots in a greenhouse and from which M4 seeds were harvested. Leaf tissue was collected from each M3 plant, genomic DNA extracted, and KASP genotyping was performed as described above.

Forward Genetics Screening for Altered SDG Lignan Phenotypes in M4 Plants Carrying UGT74S1 EMS-Induced Mutations
After re-validation of the EMS-induced mutations in M2 and M3 plants, the stability of changes in the SDG lignan profiles were assessed in the M4 seeds produced by single seed descent of M3 plants derived from eight families (Table 1) as previously described (Ghose et al., 2014).

Confirmation or Detection of EMS-Induced Mutations in UGT74S1 and Other SDG Biosynthetic Genes Using Targeted Gene Panel Ampliseq Sequencing
The flax SDG biosynthetic pathway is relatively well-known (Figure 1). To determine whether or not random EMS mutations were also induced in other SDG-related biosynthetic genes of the mutant lines, Ampliseq sequencing of 11 genes, including UGT74S1, was performed using the Ion PGM TM system (Life Technologies/Thermo Fischer, CA, USA), as reported by Nishio et al. (2015). The plant material used consisted of a total of 84 M4 plants from 14 lines in six families previously shown to carry homozygous mutations (Supplementary Table 5). Nine CDC Bethune plants were used as positive control. DNA was extracted

Custom Target Gene Panel Design
A custom Ion AmpliSeq TM panel targeting 11 genes, including six dirigent proteins (DIR 1-DIR 6), two PINO reductases (PLR 1 and 2), reported to be involved in SECO biosynthesis (Halls et al., 2005;Nakatsubo et al., 2008;Gasper et al., 2016) and three UGTs (UGT74S1, UGT74S3, and UGT74S4) recently shown to play different roles in SECO glucosylation (Ghose et al., 2014, Fofana et al., 2017 was designed using Ion AmpliSeq Designer (IAD) software v.2.0.3 (Thermo Fisher Scientific, CA, USA). For each target, the 50 bp upstream and downstream of the genes were added to cover the complete coding sequence. The custom panel was designed in such a way that 11,226 of the 11,248 bp potential targeted regions were covered, accounting for 99.6% coverage (22 bp missing) and included 49 primer pairs divided into two pools of 27 and 22 primer pairs each, respectively (Supplementary Table 6). The targeted 11,248 bp genomic regions were uploaded as a reference BED file sequence. This Ampliseq custom panel was used to screen the 84 EMS mutagenized and nine wild type CDC Bethune plants.

Library Preparation and Sequencing
Genomic DNA and library preparation were performed following Life Technology/Thermo Fisher recommendations.

Mapping Sequencing Data and Variant Call
The Ampliseq sequence data was processed using the Ion Suite Software v 5.04 and the Ion Server, and mapped to the flax Ampliseq panel targeted genomic regions uploaded as the reference BED file. Variants were called using Torrent Variant Caller plug-in software set to default parameters under high stringency (minimum allele frequency 0.1, minimum SNP quality 10 and minimum coverage of 5 for SNP identification; minimum allele frequency 0.1, minimum SNP quality 10 and a minimum coverage of 10 for INDEL identification) as performed by Nishio et al. (2015). The selected variant positions were detected, genotyped and reported as wild type, heterozygous or homozygous. For each variant, its mutational effect was predicted using the phyre 2 investigator software as previously reported .

Targeted Amplicons of UGT74S1
The two genomic amplicons containing the UGT74S1 gene were successfully amplified and sequenced from the 2016 plants.
Amplicon 1 included 55 nucleotides of the 5 ′ UTR, the 651 nucleotides of exon 1, and 274 nucleotides of the single intron. Amplicon 2 covered the last 93 nucleotides of the intron, the 756 nucleotides of exon 2, and 248 nucleotides of the 3 ′ UTR. It is worth mentioning that only a portion of the intron, represented by 367 out of the 739 nucleotides of the intronic region was amplified and sequenced (Figure 2).

EMS-Induced Mutations Detected in UGT74S1
Following SNP variants call and filtrations, a total of 80 positions displayed SNP variants, including 40 positions in exon 1, 7 intronic positions, and 33 positions in exon 2 ( Table 2). Only 3.12% of the calls were missing for the 80 SNP positions of the 2016 individuals. Since EMS mutagenesis usually causes transition mutations, GC content was determined in the amplicons. Exon 1 had a GC content of 54% and a higher proportion of SNP positions than exon 2 which had a GC content of 46% (Table 2). In the M2 lines sequenced, heterozygosity at  SNP positions was observed four times more frequently than homozygosity, showing that most of the EMS-induced mutations were not fixed in the M2 population. Among the exonic mutations, 22.5% were homozygous, thus fixed in these plants.
Because not all substitutions have the same mutational effects on gene functionality, the mutational effect for each mutation was determined. Ten mutated amino acids at 12 positions showed high to very high mutational effects, while another 16 amino acid changes were found with low mutational effects at 16 positions (Table 3). Interestingly, two of the mutations with high mutational effects appeared to be nonsense homozygous mutations, thereby expected to have a deleterious effect on SECO glucosylation. Altogether, 73 SNPs were observed in the exonic regions in 93 single plants originating from different M2 families (Figure 2, Table 3). Seven of the 93 M2 plants had two exonic SNPs each (labeled with * in Table 3). Because six SNP positions had both a homozygous and a heterozygous SNP variants, 79 variants were identified at the 73 exonic SNP positions and, of those, 55 were missense, six were nonsense, and 18 were sense mutations. Only 18 of the 79 SNP variants were homozygous including 13 of the missense, 2 of the nonsense, and 3 sense mutations ( Table 3). The mutation rate in the exonic regions was calculated to be 1 per 28 kb [i.e., (651 + 756 bp) ×1996/100 = 28,084 bp/1 or 1 mutation in 28 kb]. Eight of the missense mutations and one nonsense mutation from 12 M2 plants were located in the PSGP region spanning amino acids 334-377 (Figure 2).

Effect of UGT74S1 Mutations on SDG Content in Bulked M2 Seeds
To establish a genotype to phenotype relationship, the effect of each mutation on SDG lignan production was assessed in planta by determining the total SDG lignan content in the original mature bulked M2 seeds in comparison with that of the wild type CDC Bethune. UPLC chromatograms of non-hydrolyzed oligomeric lignan polymers from the M2 family 1230 carrying a homozygous nonsense mutation in UGT74S1 displayed a narrower peak whereas that of the M2 family 2340 that carried a heterozygous nonsense mutation showed additional peaks overlapping the main peak ( Supplementary Figure 2A). Variations were also observed in the profiles of SDG and other phenolic acid glycosides present in the lignan macromolecule (Supplementary Figure 2B). By determining the total hydrolyzed SDG lignan content in the original M2 bulked mature seeds from each of the 69 M2 families, we observed 21 with reduced SDG content, eight of which displayed almost no SDG. In contrast, 43 families had a higher SDG content compared to CDC Bethune (Figure 3). In many cases, no direct correlation could be established between the mutation and alteration of the lignan profile. In fact, some M2 families such as 828, 919, 1230, and 2340, carrying homozygous nonsense mutations showed increased SDG lignan content compared to the wild type. Similarly, eight M2 families (882, 1403, 1571, 1572, 2092, 2229, 2566, and 2881) carrying homozygote missense mutations showed higher or inconsistent lignan content (Figure 3). These unexpected observations prompted us to re-assess and re-validate the genotype of single M2 plants derived from each of the 69 M2 families after re-sampling within the original seed lots.

Validation of EMS-Induced Mutations in M2 Plants
To validate some of the EMS-induced mutations identified in UGT74S1 by amplicon resequencing, plants from 28 M2   NA, Not applicable for no amino acid change; *Lines with mutations at two loci. Heterozygous SNP (single strand nucleotide change) and homozygous SNP (change of double strand nucleotide as marked in bracket) at each position are indicated and separated by a "/". Positions of amino acids changed by EMS mutagenesis within the PSPG motif are indicated in bold.

Stability of EMS-Induced Mutation Effect on SDG Content in M4 Seeds Derived from Single Seed Descent of M3 Plants
By phenotyping the M4 seeds derived from individual M3 plants for SDG, we identified knock-out mutants devoid of SDG (Figure 4). However, most of the heterozygous and homozygous missense mutations showed a wide range of SDG content, some having SDG content greater than CDC Bethune and others, such as M4 plants from family 2004 (homozygous missense mutation) were complete knock-outs ( Figure 5). Surprisingly, disappearance of the SDG peak did not result in the appearance of SECO peaks in any of the knock-out lines. Instead, the mass spectra analysis of peak #10 revealed the presence of SMG in these knock-out lines but not in the others (Figure 5). Interestingly, EMS mutations led to variations in the levels of the phenol acid glucosides encountered in the lignan macromolecule (Supplementary Figure 3).

UGT74S1 Is Unique in Controlling SECO Glucosylation into SDG
To ascertain whether mutations occurred in other lignan-related biosynthetic genes that may have contributed to the observed altered lignan phenotypes, plants with altered lignan profiles were subjected to targeted Ampliseq gene sequencing of 11 genes known to play diverse roles in SDG lignan biosynthesis (Figure 1).
In the current study, we mined an EMS mutagenized flaxseed population and used an NGS-based reverse genetics platform to identify flax lines carrying mutations in the lignan glucosylation gene UGT74S1 and to determine allele-specific mutational effects on SDG lignan production in planta. The study demonstrated that EMS-induced mutations in UGT74S1 resulted in alteration of SDG accumulation in seeds and, as such, it constitutes the first report of non-transgenic flax germplasm with altered lignan content. Whereas, TILLING terminology is relatively new, generating mutant alleles in flax using EMS is not (Green and Marshall, 1984;Rowland et al., 1995). The novelty of the chemical mutagenesis in the plant reverse genetics field resides in the much larger screening throughput capacity that we can now achieve (Sikora et al., 2011;Vicente-Dólera et al., 2014). In the present report, 1996 M2 families were screened by reverse genetics and, as expected, only G to A or C to T transition mutations were observed (Sikora et al., 2011). The calculated mutation frequency of one mutation per 28 kb is close to the range recently reported rate of one mutation per 30-49 kb in flax treated with different doses of EMS (Chantreau et al., 2013). A high proportion of 91% of the SNP variation was located in exons, in agreement with their higher G/C content which are more prone to EMS alkylation causing mainly transition mutations (Sikora et al., 2011). In this study, 22.5% of exonic mutations were homozygous, with potential loss (Uchida et al., 2011) or gain-of-function effects (Bailly et al., 2014). Indeed, whereas DNA mutations can affect any of the amino acids, not all substitutions have the same mutational effects on gene functionality (Rogers et al., 2000).  , 2015). In agreement with these findings, the EMS treatment randomly mutagenized nine amino acids (A342F, S343N, W355Stop, S357L, E360K, A361T, L362F, G365R, and G375D) within the PSPG in planta and, as previously observed in vitro, homozygous nonsense EMS-induced mutation of W 355 to stop codon completely abolished SDG formation in planta, but retained trace amounts of SMG. Homozygous nonsense mutation of W 205 to a stop codon outside the PSGP as well as the homozygous missense G282E mutation also led to lossof-function of UGT74S1 for SECO glucosylation into SDG, but again a small amount of SMG remained. In a recent study, we reported that UGT74S1 is the key player in controlling SECO glucosylation into SDG, while also showing that UGT74S3 and UGT74S4, a pair of duplicated genes most closely related to UGT74S1, were able to glucosylate SECO into SMG at low efficiency but unable to synthesize SDG from SMG (Fofana et al., 2017). These findings suggested that these two paralogs may play roles in other biological processes other than SDG formation from SECO (Fofana et al., 2017). This was corroborated herein where we showed that plants from four UGT74S1 EMS lines lacking mutations in UGT74S3 and UGT74S4 failed to produce SDG but could still produce trace amounts of SMG. This clearly demonstrates occurrence of low background activity of UGT74S3 and UGT74S4 catalyzing trace SMG formation despite UGT74S1 being knocked out, in agreement with our previous report (Fofana et al., 2017). Moreover, some residual glucosylation activity may also be associated with each amino acid substitution effect within the mutated UGT74S1, as reported by in vitro sitedirected mutagenesis data . Additionally, the pool of phenolic acid glucosides was found to be increased in all mutant lines, as compared to the lignan component of the polymer. Additional biochemical studies using the phenolic acids encountered in the lignan macromolecule as substrates are required to characterize the exact role(s) of these two UGT paralogs in planta. Targeted gene panel sequencing by Ampliseq methodology confirmed the mutations detected in UGT74S1 by amplicon re-sequencing and KASP assays. It also identified the P471S mutation in UGT74S3 from UGT74S1 mutant lines G282E-Homo-2004-9 andG282E-Homo-2004-14, each showing small amounts of SMG, thereby demonstrating the residual activity from UGT74S4 and/or the residual glucosylation activity associated with the G282E substitution in the mutated UGT74S1. The M4 plants from lines 2004-9 and 2004-14 also carried mutations, albeit with low effects, in both DIR4 and DIR5 genes at positions Y35H, K38H, R40Q, and L15S, but still produced trace amounts of SMG. This observation demonstrates that SECO was produced in these lines and that DIR4 and DIR5 are not rate limiting in the pinoresinol formation. Considering that not all the flax genome was scanned in the current study, one may wonder whether some transcription factors controlling lignan biosynthetic genes may have also been affected by EMS. Whereas, this assumption cannot be totally ruled, it seems to be highly unlikely or these mutations, if any, may be of very low or neutral effect because the lignan phenotypes observed in the lines herein described are in agreement with that from the in vitro data , in the absence of implication of any transcription factors.
In conclusion, this study found EMS mutagenesis to be successful in altering SECO glucosylation toward SDG formation in planta. The wide range of lignan profiles observed from the current investigation constitutes a useful non-GMO genetic resource for flax breeders in developing new cultivars with high SDG content or new cultivars with SMG as a new trait in flax. It also offers plant biologists and natural health product scientists an opportunity to better understand the behavior of plants carrying the new trait in field environments. Using Ampliseq sequencing of UGT74S1 along with 10 other genes potentially playing a role in the SDG production, we validated our SNP detection approach and provided further evidence that UGT74S3 and UGT74S4 are not critical for SDG production in planta and that UGT74S1 is unique in controlling SDG formation from SECO.
FIGURE 5 | Variations in SDG and SMG lignan contents in M3 and M4 seeds from EMS mutagenized flax lines following lignan complex hydrolysis. SDG and SMG UPLC peaks were monitored using mass spectra and quantified. EMS lines are named as EMS-induced amino acid changed followed by the homo or heterozygous status and the line number. Single asterisk (*) indicates premature stop codon. The profile of M3 seeds derived from M2 line 1230-H2 is indicated by a double asterisk (**). Lignan was extracted from CDC Bethune and from M4 seeds derived from their respective M3 plants.
FIGURE 6 | Distribution of EMS-induced mutations on five genes of the 11 custom gene panel submitted to Ampliseq sequencing. UTRs are indicated by white boxes, exons by solid black boxes, and introns by gaps with downward arrows. The PSPG region within exon 2 of UGT74S1 and UGT74S3 is colored in magenta. The magenta box in DIR genes indicates DIR protein domains. The PSPG and dirigent regions are conserved domains for UGT and dirigent proteins, respectively. Green arrows, missense mutations; orange arrows, sense mutations; red arrows, nonsense mutation; black arrows, mutations in introns; asterisks (*) indicate position of homozygous mutations only; The black lines on the right top of each gene indicate the gene drawing scale. Lines in which mutations were found are shown and indicated by EMS family number (four Arabic numbers) followed by a dash and line numbers (1 or 2 digits arabic or alpha numeric numbers). NA, not applicable: indicates that no such mutation was found in that gene. The two lines 2004-9 and 2004-14 from family 2004 and carrying mutations in more than one genes are highlighted in red.

AUTHOR CONTRIBUTIONS
BF: Conception, coordination, design, experiments, data analysis, interpretation, and writing of the manuscript; KG: Performed experiments, data analysis, interpretation, drafting and revision of the manuscript; JM: UPLC and mass spectrometry data acquisition and analysis of lignan extracts, revision of manuscript; DM and AS: EMS population nursery; SC: coordination, administration, and revision of the manuscript; MD and GR: EMS mutagenesis and M2 population generation, revision and proof reading of the manuscript. All authors read, commented and approved the manuscript.

FUNDING
This study was conducted as part of the Total Utilization Flax Genomics (TUFGEN) project funded by Genome Canada/Genome Prairie with financial contribution from the Flax Council of Canada to BF.