A bird’s-eye view: exploration of the flavin-containing monooxygenase superfamily in common wheat

The Flavin Monooxygenase (FMO) gene superfamily in plants is involved in various processes most widely documented for its involvement in auxin biosynthesis, specialized metabolite biosynthesis, and plant microbial defense signaling. The roles of FMOs in defense signaling and disease resistance have recently come into focus as they may present opportunities to increase immune responses in plants including leading to systemic acquired resistance, but are not well characterized. We present a comprehensive catalogue of FMOs found in genomes across vascular plants and explore, in depth, 170 wheat TaFMO genes for sequence architecture, cis-acting regulatory elements, and changes due to Transposable Element insertions. A molecular phylogeny separates TaFMOs into three clades (A, B, and C) for which we further report gene duplication patterns, and differential rates of homoeologue expansion and retention among TaFMO subclades. We discuss Clade B TaFMOs where gene expansion is similarly seen in other cereal genomes. Transcriptome data from various studies point towards involvement of subclade B2 TaFMOs in disease responses against both biotrophic and necrotrophic pathogens, substantiated by promoter element analysis. We hypothesize that certain TaFMOs are responsive to both abiotic and biotic stresses, providing potential targets for enhancing disease resistance, plant yield and other important agronomic traits. Altogether, FMOs in wheat and other crop plants present an untapped resource to be exploited for improving the quality of crops.


Introduction
Flavin-containing monooxygenases (FMOs; EC 1.14.13.8) constitute a ubiquitous class of ancient, highly-conserved enzymes that have been well-characterized in bacteria, fungi, and insects, catalyzing the transformations of many molecules across all domains of life (Eswaramoorthy et al., 2006;Huijbers et al., 2014;Mascotti et al., 2015).FMOs are dimeric in nature, with Rossman folds in a bi-lateral distribution containing FAD-and NAD(P)H-binding domains (Schlaich, 2007;Rossner et al., 2017;Thodberg and Jakobsen Neilson, 2020).Typically, FMOs act on small, nucleophilic substrates that contain either a sulfur (S) or nitrogen (N) atom, resulting in the oxygenation of a wide range of compounds; this process often aides in the detoxification of xenobiotic compounds in many organisms (Eswaramoorthy et al., 2006).While FMOs are well characterized in bacteria, fungi, and insects, less is known of many of the FMOs in vascular plants, despite abundant putative plant FMOs having been observed (Schlaich, 2007).
Multiple studies from the last two decades indicate that a distinct subset of plant FMOs have evolved to participate in biosynthesis of auxin, a crucial hormone for plant growth (Kendrew, 2001;Schlaich, 2007;Yamamoto et al., 2007;Zhao, 2014).This specialized group of plant FMOs has been coined 'YUCCA', named after the phenotype of a dominant Arabidopsis thaliana fmo deletion mutant resembling the yucca plant (Zhao et al., 2001) and many of which have been functionally characterized in this model plant (Cao et al., 2019;Qin et al., 2020).Due to the significance auxin biosynthesis has on plant growth, development, and yield in crop plants, many studies since then have focused on identifying and characterizing the orthologous 'YUCCA' FMOs across a range of commercially valuable plant species, such as apple, peach, soybean, and rice (Yamamoto et al., 2007;Qin et al., 2020;Song et al., 2020;Luo et al., 2022;Zhang et al., 2022).Additionally, a subset of FMOs in A. thaliana have been characterized for their activity in the S-oxygenation of sidechains in various glucosinolate (GSL) compounds (Li et al., 2011).GSLs are potent, specialized defense metabolites found exclusively in the order Brassicales known to play a major role against insect herbivory (Kong et al., 2016;Cang et al., 2018).Grouped as "FMO-GSOXs", these FMOs, which participate in processing metabolites, play a significant defensive role in plants that produce GSLs.Of the 29 total identified FMOs in A. thaliana, eleven are annotated to be YUCCAs, twelve are purportedly FMO-GS-OXs, one has been reported to be crucial for systemic acquired resistance (SAR) against microbial pathogens (Hartmann et al., 2018;Czarnocka et al., 2020), and five do not presently have designated functions.A recent review (Mitchell and Weng, 2019) provided a brief synopsis of these three groups of plant FMOs.Together these studies reveal that FMOs play diverse biological responses in plants, much of which have yet to be characterized.
Outside A. thaliana, much less is known about non-YUCCA FMOs, particularly in cereal crops such as Triticum aestivum (common bread wheat), an essential food source for over 50% of the world's population (Curtis and Halford, 2014).Yang et al. (Yang et al., 2021) focused on 'YUCCA' genes in wheat and identified 63 TaYUCCA FMO genes which they assigned to six subclades in a gene genealogy comparison; for a subset of these genes, they analyzed transcriptional activity using public transcriptomic data in an attempt to reveal function.Gaba et al. (2023) recently surveyed the genomes of ten different wild and cultivated rice species for FMO genes and revealed how little is presently known about FMOs outside the YUCCA group.Their study emphasized the need for more in-depth studies of other FMOs in crop plants.
Here, we address the extent to which FMOs have expanded across vascular plants by surveying comprehensively FMOencoding genes across a broad range of plant taxa.We identified 170 likely wheat FMOs (TaFMOs) in the cultivar 'Chinese Spring' (the fully-sequenced reference genome, RefSeq v2.1), and sought to expand knowledge of the evolution, distribution, gene expansions, and potential functional diversity for different groups of TaFMO genes by using thorough phylogenetic, gene genealogy and transcriptome analyses using data obtained from publicly available studies.In addition, we take into account existing literature pertaining to TaYUCCAs (Li et al., 2014;Yang et al., 2021) and present a more cohesive picture of the biological roles that TaFMOs may play across a broad range of conditions, with a particular focus on various pathogen defense responses.

Genome-wide identification of FMO genes
We obtained a protein-family hidden Markov model (HMM) profile for the FMO-like family (PF00743 v22) from the Protein Family (Pfam) database (https://pfam.xfam.org/),and conducted a local HMM search (HMMER v3.3.2;http://hmmer.org/)with significance parameters of E-value < 1e -2 against the International Wheat Genome Sequencing Consortium (IWGSC) fully-sequenced reference genomes for version RefSeqv2.1, to identify all putative wheat flavin-containing monooxygenase genes (TaFMOs) for the cultivar 'Chinese Spring'.Results from HMMER were further consolidated with the use of query-based searches in both the online database Wheat Proteome (Duncan et al., 2017) with the search term 'flavin-containing monooxygenase' and the search term 'flavin monooxygenase' in the UniProt database (https://www.uniprot.org/).Similarly, all FMO candidate genes from other plant species spanning a vast range of green plants and algae, were found through a HMMbased search.Table 1 lists the total number of FMOs per plant species.We used the summary of relationships of major angiosperm lineages a v a i l a b l e a t N C B I C o m m o n T r e e b u i l d e r ( h t t p s : / / www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi) to display the distribution of FMOs in vascular plants (Figure 1; for sources of sequence and protein data and gene search details, see Supplementary Tables S1, S2; Appendix S1, Supplementary Figure S1).

Characterization of the TaFMO family genes
The full wheat cDNA, CDS, and peptide sequences of candidate TaFMO genes identified were retrieved from IWGSC (The International Wheat Genome Sequencing Consortium (IWGSC) et al., 2018).Gene Ontology (GO) terms were retrieved from the Uniprot database (https://www.uniprot.org/;accessed January 2023).The structure of exons and introns were determined using the CDS and gene sequences of TaFMOs analyzed with the Gene Structure Display Server (GSDS) version 2.0 (gsds.cbi.pku.edu.cn;Hu et al., 2015).A small subset of putative TaFMOs were represented by splice variants and included for downstream analyses, on the basis that each variant could encode for fully functioning TaFMO proteins acting in different tissues, conditions, and developmental stages.
To assess whether the enzymatic activity of each candidate TaFMO could be carried out, the presence of three previously welldescribed FMO motifs (Eswaramoorthy et al., 2006;Mishina and Zeier, 2006;Schlaich, 2007;Thodberg and Jakobsen Neilson, 2020) known to be crucial for cofactor binding and required for proper FMO function-the FAD-and NAD(P)H-binding motifs crucial for oxygenation activity, the FMO-identifying motif facilitating in NAD (P)H-binding at the core pocket of the enzyme, and the ATGcontaining motif thought to govern hydroxylation activity-were assessed with multiple sequence alignments (MSA) using MAFFT with the E-INS-I setting; see Appendix S2 (Katoh et al., 2019).Candidates that did not possess one or all motifs were retained in downstream phylogenetic analyses but flagged as 'non-canonical' (nc) FMOs with the assumption that oxygenation activity may be compromised, as we used conservation of all motifs as a proxy for predicting FMO function.We predicted the putative protein structure of all TaFMOs using the Protein Homology/Analogy Recognition Engine V 2.0 (Phyre 2 ; Kelley et al., 2015) and then visualized them with PyMOL (Version 2.0 Schrödinger and DeLano, 2020).We used the Simple Modular Architecture Research Tool (SMART; https://smart.embl.de/;Letunic et al., 2021) program to gather protein domains for display (Figure 2).Possible transmembrane domains and signal peptides for all TaFMO were predicted using the programs DeepTMHMM and SignalP -6.0 (Hallgren et al., 2022;Teufel et al., 2022).

Phylogenetic analyses of the TaFMO gene family and sub-families
We inferred a maximum likelihood (ML) phylogeny to define relationships between total validated TaFMOs and validated FMOs from the model plant A. thaliana, the cereal crop Oryza sativa subsp.japonica, and sister group of all other angiosperms, Amborella trichopoda.We selected a red algae having a single FMO as an outgroup.We first aligned full-length FMO peptide sequences using MAFFT with the E-INS-i setting (Katoh et al., 2019).We validated the full-length alignment of the total FMOs across plant species using the software HOMO v2.0 (Jermiin, 2017) by using it to determine whether all FMO sequences met the phylogenetic assumption of evolution under reversible, stationary, and globally homogenous conditions (Jermiin, 2017;Jermiin et al., 2020).Following this, poorly aligned regions of the MSA were masked using the AliStat program (Wong et al., 2014) to ensure downstream analysis focused on regions where

Species
Common Name

Zea mays
Corn 37 Sun and Bakkeren 10.3389/fpls.2024.1369299 Frontiers in Plant Science frontiersin.orgamino acid residues were readily alignable across sequences (see Appendix S2) (Wong et al., 2014).We then manually inspected alignments using Mesquite v 3.7 (Maddison and Maddison, 2023) and adjusted accordingly.Lastly, we used IQ-TREE v 2.2.0 to infer the ML tree based on amino acid alignments; we inferred the optimal substitution model using ModelFinder (JTT+I+G4), considering the Bayesian Information Criterion, BIC (Kalyaanamoorthy et al., 2017), and calculated branch support from 1000 standard bootstrapping A maximum-likelihood (ML) consensus genealogy tree of total protein sequences of candidate TaFMOs (black gene IDs), OsjFMOs (yellow), AtFMOs (green), AmtriFMOs (purple), and GasuFMOs (grey).One GasuFMO (red microalgae) was used as outgroup.ML geneology was constructed from IQ-TREE (1000 standard bootstrap replicates, JTT+I+G4 substitution model).Three major subclades A (orange shading), B (purple) and C (green) were confidently supported across vascular plant FMOs.Outer circles: (I) shows the domain architecture the candidate FMO belongs to (from HMMER), as annotated in Figure 1; non-canonical 'nc' and truncated FMO protein candidates are marked with a dark grey box.(II) indicates predicted subcellular localization of TaFMO proteins.(III) represent transcript values (in FPKM) of each TaFMO candidate gene expressed under 5 tissue types (root, stem, leaves, spike, grain) from RNAseq experiments compiled on the Wheat Proteome database (Duncan et al., 2017).(IV) indicates SMART-predicted protein domains inclusive in each candidate FMO.Group A contains the least number of TaFMOs, totaling 55 splice variants pertaining to 39 genes (see S1 notes for explanation of exclusion of one TaFMO in the analysis), nine of which are categorized as 'nc'.Group B contains a total of 63 splice variants pertaining to 57 genes, 14 of which are 'nc'.Lastly, Group C contains 80 splice variants pertaining to 74 genes, 16 of which are 'nc'.replicates (Felsenstein, 1985).The tree figures were made with iTOL (Letunic and Bork, 2021).We constructed separate ML phylogenies between all TaFMO and FMOs from closely-related species in the grass order, Poalesspecifically, H. vulgare (barley), T. urartu (red einkorn wheat) and O. sativa ssp.japonica (rice) to improve our understanding of the relatedness of TaFMO in the three major clades A, B, and C (Supplementary Figures S2-5).For detailed notes on alignment and modelling, see Appendix S2.

Gene duplication and homeolog analysis, and protein motif analysis
We retrieved chromosomal locations of all TaFMOs in wheat from the RefSeqv2.1 gff3 file.We inferred homeologs based on wellsupported phylogenetic clustering; separate sub-phylogenies were inferred for the three major partitions (clades A, B, and C) and a 70% bootstrap support value was used as a threshold to acknowledge clades with moderate support.Gene duplication and syntenic gene blocks of all TaFMOs (including 'nc' TaFMOs) were inferred through the MCSanX algorithm (Wang et al., 2012) via TBtools software v1.116 (Chen et al., 2022) and NOTUNG (Chen et al., 2000) to resolve tandem duplications and relationships between TaFMO homoeologous groups.Genes with no discernible relatives were denoted as 'orphan' TaFMO (Tables 2,  3).A comprehensive synchronized map of all TaFMO homeologs (Figure 3) connected by designated group colors assigned in Supplementary Figures S2-4 was generated using the shinyCircos software in RStudio (Yu et al., 2018;R Core Team, 2021).See Appendix S3 for more details.A de novo search for putative conserved protein motifs among all TaFMOs, HvFMOs, OsjFMOs, AtFMOs, AmTriFMOs, and PpFMOs was conducted with MEME Suite (Bailey et al., 2015): see Appendix S4.

TaFMO promoter regulatory elements analysis and transposable element identification
We searched promoter regions 1.5 kb upstream from the start codon of validated TaFMOs for plant cis-acting regulatory elements (PlantCARE: http://bioinformatics.psb.ugent.be/webtools/plantcare/html/; Lescot, 2002) and visualized putative cis-acting regulatory elements (CAREs) governing the different clades of TaFMOs (Supplementary Tables S3, 4, Appendix S5).We also compiled a list of putative transposable elements (TEs) detected by the CLARI-TE program (https://github.com/jdaron/CLARI-TE)either spanning introns and exons of TaFMOs, or flanking TaFMOs within a 2 kb region on either side (Table 4), from the TE annotation file provided in IWGSC RegSeqv2.1 (International Wheat Genome Sequencing Consortium et al., 2018;Zhu et al., 2021).The TE positions flanking or inside genes were transposed into TaFMO gene structure analysis.Refer to Appendix S6 for details.

Transcriptome, proteome, and tissue localization data retrieval and display
Using public transcriptomic data across a range of conditions, we extracted candidate TaFMO gene expression information, obtaining transcript values in FPKM for five general tissues (root, stem, leaves, spike, and grain) and proteome data (peptide spectral counts) for different tissues from the developmental atlas in the Wheat Proteome database (Duncan et al., 2017).We acquired transcript data for a broad range of different conditions in values of log 2 transcripts per million (tpm) for TaFMO expression under various abiotic stressors (cold, drought, heat) and biotic stressors (i.e., infection by Septoria blotch, Fusarium graminearum, or Puccinia striiformis f.sp.tritici) from the Wheat Expression Database (Ramıŕez-Gonzaĺez et al., 2018) in order to survey the differential expression of TaFMO candidates, and infer their possible biological functions (see Supplementary Tables S5-8 for an overview of studies and sources used).

FMOs are a largely expanded gene family in flowering plants
The FMO gene family is highly expanded in the vascular plants, as previously reported (Eswaramoorthy et al., 2006;Mishina and Zeier, 2006;Schlaich, 2007;Thodberg and Jakobsen Neilson, 2020).Figure 1 exhibits the general number of potential FMO proteinencoding genes among select vascular plant reference genomes.The three most commonly present types of domain architectures that represent these plant FMOs are a single FMO-like domain (FMOlike), two consecutive FMO-like domains (FMO-like x 2), and a pyruvate redox 3 (Pyr3) domain.Some FMOs are predicted as fusion proteins, such as found in both wheat (Ta) and barley (Hv) with a tyrosine protein kinase conjugated to an FMO, alluding to a possible function in environmental sensing and downstream signaling coupled with oxidation processes (Tichtinsky et al., 2003;Shumayla and Upadhyay, 2023).The phenomenon of FMO fusion proteins does not present for lycophytes, moss, or algae, where total numbers of FMOs in the genome are substantially reduced relative to flowering vascular plant species.The total number of FMOs in plant genomes also appears to be largely reflective of species ploidy levels for recently diverged taxa, as exemplified in Triticum urartu (2n, 41 putative TuFMOs), T. dicoccoides (4n, 84 putative TdFMOs) and T. aestivum (6n, 170 putative TaFMOs).However, in plants considered diploid, the number of FMOs varies substantially, such as in Arabidopsis (29 AtFMOs), rice (31 OsjFMOs), and barley (59 HvFMOs), up to 64 potential FMOs in sunflower (H.annuus), which has double the amount of most diploid species in the eudicots (lower, green clade in Figure 1).We report several additional FMO candidates for rice compared to a recent report of 28 OsjFMOs by (Gaba et al., 2023), and for barley previously reported having 41 HvFMOs (Thodberg and Jakobsen Neilson, 2020), highlighting the complexity of conducting genome-wide searches for highly expanded gene families.An HMM search conducted against the RefSeqv2.1 wheat reference genome for cultivar 'Chinese Spring' yielded a total of 171 high confidence (HC) TaFMO genes, and 82 low confidence (LC) TaFMO genes.We selected 170 HC TaFMO genes including those deemed 'unmapped' to specific chromosomes after consolidation with other search methods (see Supplementary Figure S1, Appendix S1 for additional information).A total of 34 TaFMO candidate genes (40 including transcript splice variants) are denoted as 'non-canonical' (nc), based on alterations in-or the absence of-protein motifs previously reported as crucial for proper FMO function, discussed below (Table 2).The 'nc' TaFMOs possessing alterations in important protein motifs but were still found in synteny with other TaFMO homoeologous triads, were retained in downstream analyses; further experimental validation is required to prove their functional and biological relevance as an FMO in wheat.Other more highly truncated 'nc' TaFMOs unlikely functional in the canonical sense, could still highlight trends in motif diversification/loss for certain subclades when included in a phylogenetic analysis.The same logic was applied to sort for FMOs of Arabidopsis thaliana (At), Oryza sativa ssp.japonica (Osj), Hordeum vulgare (Hv), and Triticum urartu (Tu).

TaFMOs are consistently divided into three major groups across vascular plants, and span all chromosomes and sub-genomes
A maximum-likelihood phylogeny was constructed with protein sequences representing the total 170 TaFMO gene candidates (198 independent splice variants); FMOs from Amborella trichopoda (Amtri), rice Oryza sativa sp.Japonica (Osj), and A. thaliana (At) were included to probe for phylogenetic partitioning of TaFMOs among flowering plants.A red microalgae (G.sulphuraria) served as an outgroup, whose genome encodes for only one known FMO (Figure 1).We observed three major phylogenetic groupings, indicated as A, B, and C in Figure 2. Domain architecture analysis (Figure 2, circle I) shows that Group A FMOs mostly share the same domain architecture type as the FMO from the outgroup red algae ('FMO-like x 2'), with two TaFMOs as exceptions having the 'pyruvate redox 2' architecture (TraesCS3B03G0026700.1, TraesCS3B03G0026800.2).Group B (purple) and Group C (green) FMOs form a clade across vascular plants with Group B mostly composed of an 'FMO-like' architecture, whereas group C has a mix between the 'FMO-like' and 'pyruvate redox 3' architectures.The distribution of total FMOs across an array of vascular plants (or within one species) has been documented by Thodberg and Jakobsen Neilson (2020); Gaba et al. (2023), andYoshimoto et al. (2015), and the phylogenies presented in these studies also find Group B (AtFMO1-containing clade) and Group C ('YUCCA' clade) to be more closely related to each other than to Group A (the AtFMO GSOX-containing clade).

Sub-groups reveal non-uniform patterns of TaFMO gene expansion
Tables 2, 3 summarize the synteny status of Groups A, B, and C TaFMO homeologs and homeolog sub-genome ratios (denoted as a TaFMO ratio of 1: 1: 1 from A, B, and D sub-genomes), respectively.Group A TaFMOs show substantially less syntenic triads with only 8.2% of triads in Group A exhibiting an even ratio of 1: 1: 1; even more, around 42% of triads in Group A were missing either an A-, B-, or D-copy, a higher frequency than the 13.2% expected for all wheat proteins in the genome (Table 3).Group C, subclade C1, showed the most conserved 'clean' homoeologous triads (56.3%) such that there were rarely any singleton/orphan or 'nc' TaFMOs, and each TaFMO triad had corresponding orthologues from other grass species (Supplementary Figure S4); the high evolutionary conservation of these TaFMO triads is perhaps due to functional constraints influencing gene retention.In contrast, TaFMOs in subclade C2 (along with barley FMOs) appear to have large, uneven homeolog expansions via tandem duplications and intraor interchromosomal dispersion of genes (Supplementary Figure S4).
In Group B, 36.4% of TaFMOs belong to even triads which more closely resemble the anticipated percentage (35.8%) of all predicted wheat proteins belonging to even homoeologous triads (Table 3) (Ramıŕez-Gonzaĺez et al., 2018).Group B appears to have many instances of sub-genome homeolog expansion and uneven tandem duplications (Table 2).Subclade B1 appears to contain three pairs of TaFMO WGD homeologs that underwent multiple duplications, with more HvFMO orthologues present than OsjFMO and TuFMO, (Supplementary Figure S3), while subclade B2 shows FMO expansions for all grass species surveyed, but none for Arabidopsis (Supplementary Figure S3).
Such divergent patterns of gene retention and expansion among homoeologous triads between the three major groups of TaFMOs might implicate differing biological roles (Birchler and Yang, 2022).make a point that genes involved in defense mechanisms are more likely to diverge after duplication events; the retention of the active sites for some of the highly duplicated Group B triads could infer a role in defense-related functions.

Variations in canonical motifs delineate the FMOs in wheat and related grasses
We detected fifteen of the most conserved protein motifs in FMOs across Arabidopsis and the grasses rice (Osj), barley (Hv), red einkorn (Tu) and wheat (Supplementary Figure S5).FMOs in plants are reported to contain three crucial protein motifs essential for 'canonical' FMO activity; these are the FAD-binding motif (GxGxxG), the NAD(P)H-binding motif (GxGxxG), and the FMO-identifying motif [FxGxxxHxxxY/F;(Eswaramoorthy et al., 2006;Hansen et al., 2007;Huijbers et al., 2014)].While the FADand NAD(P)H-binding sites are active sites surrounding the enzyme pocket where hydroxylation takes place, the FMOidentifying motif is considered a linker region between the two active sites (Malito et al., 2004).The last and less widely talked about motif is the 'F/LATGY' motif, more recently referred to in some studies as the 'ATG-containing' or 'TGY' motif (Yoshimoto and Saito, 2015;Gaba et al., 2023); this motif is thought to be a crucial  factor governing N-hydroxylation (Stehr et al., 1998;Fennema et al., 2016).
The FAD-binding motif at the N-terminus is very well conserved with regards to three glycine residues across all groups (Figures 4A, B).However, the residues surrounding this motif for Group A and subclade B2 from Group B is recorded to be distinct from that in subclade B1 and Group C (Figures 4A, B; Supplementary Figures S2-4); the FAD-binding site variant 1 (for Group A and B2) and the other variant are highlighted in orange in Figure 4A, B. An extra copy of the FAD-binding variant 1 also exists between the NAD(P)H-binding site and the ATG-motif in most of the examined grass species (Hv, Tu, Ta) in subclade C2, but not for rice.This may indicate a novel function for FMOs specific to Triticeae.The NAD(P)H-binding motif across all FMOs (Supplementary Figure S5) is well-conserved for the first glycine residue, but not for the other two glycine residues (Gxgxxg; Figure 4A).The FMO-identifying motif for FMOs in Group A has the first residue swapped from the canonical F (Phenylalanine) for W (Tryptophan), which is the case for most of the plant species examined (Figure 4A).Additionally, the ATG-containing motif for Group A FMOs across all flowering plants surveyed are highly conserved, represented as HCTGY or YCTGY.The differences of the Group A ATG-motif and FMO-identifying motifs relative to Groups B and C may allude to specificity for S-oxygenation of compounds and specialized metabolites yet uncharacterized in cereals, where Group A FMOs from A. thaliana (FMO-GSOX enzymes) are well-characterized for their S-oxygenation activity during synthesis of specialized metabolites (Hansen et al., 2007).The ATG-containing motif is most varied for Group B FMOs, where in addition to the canonical '(F/L)ATGY' residues reported by (Schlaich, 2007), other variations are present (LATGF, FLATGF, FATGY, LATGY, FGTGF) in the B2 expanded subclade comprising other grass species (Tu, Hv, Osj).By contrast, Group C FMOs mostly present with 'LATGY' motifs (and some 'MATGY' motifs in barley and wheat) in subclade C1 and 'FATGY' motifs in subclade C2 (Figure 4A).
Beyond the documented protein motifs discussed above, one novel motif is reported for each Group A and Group C FMOs (referred to here as motif-A1 and motif-C1, respectively; see Supplementary Figure S5 for these and other motifs referred to in this section).Meanwhile, Group B FMOs have four clade-specific motifs (motif-B1-4) (Figure 4).Groups A, B, and C FMOs all share one novel motif each with each other (motif-AB1, motif-AC1, motif-BC1).Additionally, there is a novel motif common to almost all FMOs located at the N-terminus following the FADbinding site for most FMOs, reported here as motif-ABC1 (Figure 4A).A look at predicted protein models highlights that many of these novel motifs may present as-yet-unclassified linker Genomic distribution of TaFMO groups A, B, and C candidates presented on a Circos plot (shinyCircos).Colors of gene labels correspond to the Group A (orange), B (blue) and C (green) partitions of TaFMO candidates.Red-labelled genes are orphaned/singleton TaFMOs, and grey-labelled genes are 'nc' TaFMOs.Cytobands exhibit chromosome tracks for chromosomes 1-7 (and unmapped chromosome, ChrUn) for each sub-genome A, B, and D in wheat.Links inside the circos plot are representative of whole-genome-duplication homeologs (syntenic triads) of wheat TaFMO genes delineated through a genome-wide synteny analysis; links are discretely colored in correspondence to annotated subclades for each of the TaFMO candidates, seen in Supplementary Figures S2-4, S2 notes.
regions contributing to the integrity of the or motifs that govern substrate specificity and binding affinity (Figure 4B).We hypothesize that differences in tertiary protein structures, including these clade-specific motifs, likely play a crucial role in substrate specificity (Li et al., 2008;Zhu et al., 2019).While no plant FMOs have been crystalized to date, the use of 3D protein homology modelling and ligand-docking simulations-in corroboration with functional analyses-may shed some light on which compounds TaFMO, and other cereal FMOs, may be acting upon.

Possible roles of group A and C TaFMO: from pathogen defense to grain development
Group A FMOs contain all the Arabidopsis FMO-GSOX (glucosinolate oxidase) genes.These FMOs have been experimentally validated to participate in glucosinolate (GSL) biosynthesis by catalyzing the S-hydroxylation of methylthioalkyls to methylsulfinylalkyls, where GSLs are a class of specialized sulphur-containing metabolites involved in plant-herbivory defenses (Hansen et al., 2007;Li et al., 2011;Kong et al., 2016;Cang et al., 2018;Schall et al., 2020).GSLs are predominantly found in the mustard seed family (Brassicaceae), although there is evidence that 500 neighboring eudicot species outside this family may also contain one or more of the 120 documented GSLs (Flamini, 2012).The possible metabolites that these Group A FMOs help modify in cereals are not well-understood, as cereals are not reported to produce GSLs (Hansen et al., 2007;Thodberg and Jakobsen Neilson, 2020).More recently, other A. thaliana FMOs have been documented to synthesize trimethylamine Noxide (TMAO) (Fennema et al., 2016), the levels of which are increased in plants under low temperatures, drought, and high salt stress conditions (Cataláet al., 2021).Group A FMOs segregate into two major subclades, A1 and A2 (Figures 2, Supplementary Figure S2).While little is known for Group A FMOs in barley and red einkorn, it is reported that five rice FMOs from subclade A1 ( O s 0 1 t 0 3 6 8 0 0 0 -n c , O s 0 2 t 0 5 8 0 6 0 0 -n c , O s 0 7 t 0 1 1 1 7 0 0 , Os07t0111900, Os07t0112000) are co-expressed with WRKY13 (a transcription factor involved in pathogen defense) upon attack by rice sheath-infecting fungi (John Lilly and Subramanian, 2019).For TaFMOs in subclade A1, the literature supports involvement in root drought tolerance, tiller growth, and nutrient accumulation in the grain (Table 2; Supplementary Figure S6).The rice FMO Os10t0553800 in subclade A2 may play an important role in rice germination seedling establishment during floods, via epigenetic methylation responses to anaerobic conditions (Castano-Duque et al., 2021).By contrast, several TaFMOs in subclade A2 are implicated in disease resistance against both biotrophic (Navathe et al., 2022) and necrotrophic pathogens (Nussbaumer et al., 2015), in addition to root drought tolerance (Supplementary Figure S6).The amino oxidase (AOX) protein domain which is seen in almost all FMOs in Group A at the Nterminus but rarely for Group B (and which is missing from Group C), is larger in subclade A2 AtFMOs (~150 amino acid residues), but not for wheat (~60 residues) or other plant FMOs (Figure 2).The AOX domain in Group A TaFMOs appears to encompass the motif-A1 mentioned previously unique to Group A and may participate in substrate binding specificity for novel compounds in wheat.
Group C contains the "YUCCA" (YUC) genes that are known to participate in auxin (idole-3-acetic acid, IAA) biosynthesis (Kendrew, 2001;Cao et al., 2019), where auxin is involved in many plant development processes (Schlaich, 2007;Chandler, 2015).We identified 74 TaFMOs in Group C, which can be split into two main subclades, C1 and C2.Several TaFMOs in subclade C1 are implicated in the response to drought stress (Table 2; Supplementary Figure S7), found to be upregulated during root drought treatment (Grzesiak et al., 2019), or associated with an increased reactive oxygen species (ROS, H 2 O 2 ) burst in response to root drought conditions (Kamruzzaman et al., 2022).In subclade C2, several TaFMOs are implicated in plant development, including in grain (Mangini et al., 2021) and seed development (Li et al., 2014), and tiller number increase (Yu et al., 2021).To date, only three homoeologous genes involved in seed development have been functionally characterized via gene cloning (Li et al., 2014).A catalogue of 63 'TaYUC' gene candidates involved in auxin biosynthesis by Yang et al. (2021) included several TaFMO genes outside Group C (dispersed throughout subclade B2; Supplementary Table S4), hinting towards a flexibility in TaFMO involvement in auxin biosynthesis-regulated plant development.In the absence of a full gene family survey and experimental characterization, adopting a more wheat-specific nomenclature for the FMO gene family via the guidelines put forward by the Wheat Initiative community (Boden et al., 2023), may minimize confusion on gene names in wheat to streamline future research efforts.

Group B TaFMOs: major players in broad-spectrum disease resistance
Group B FMOs can be split into two distinct subclades, B1 and B2 (Figure 2).This group contains the Arabidopsis AtFMO1 and AtFMO2 genes.AtFMO1 hydroxylates the N atom of an L-lysine catabolic product to form N-hydroxylated pipecolic acid (NHP), crucial for establishing systemic acquired resistance (SAR) to combat microbial pathogen invasions (Conrath, 2006;Mishina and Zeier, 2006;Chen et al., 2018;Hartmann et al., 2018), and a lack of NHP (due to defects in its biosynthesis) results in immunocompromised Arabidopsis plants.Various studies have recently contributed to better understanding the mechanism of NHP-mediated SAR regulation in other plants, including cereal crops, by searching for functional orthologs to AtFMO1 and their possible involvement in disease resistance (Holmes et al., 2019;Lenk et al., 2019;Schnake et al., 2020;Vlot et al., 2021;Zhang et al., 2021).
In subclade B1, AtFMO1 and AtFMO2, the only Group B FMOs in Arabidopsis, form a clade with 18 TaFMOs, where only three candidates are 'nc' (Table 2, Figure 4B).All FMOs from subclade B1 possess the most minimal number of known protein domains via SMART analysis, whereas multiple protein domains are present for TaFMO from all other clades (Figure 2).Of note, AtFMO1 harbors an additional K-oxygenase domain (IPR025700) in the first half of its protein sequence, which is not seen in any of the B1 TaFMOs or AtFMO2 but is present in most B2 TaFMOs and TaFMOs in Groups A and C. The K-oxygenase domain is involved in siderophore biosynthesis, and in ornithine and lysine hydroxylation (Visca et al., 1994;Krithika et al., 2006); therefore, presence of this Koxygenase may play a critical role in hydroxylation of L-lysine and/or L-ornithine derived compounds.
One of the B2a-2 homoeologous triads (TraesCS5A03G0836200.1, TraesCS5B03G0873900.1, TraesCS5D03G0799100.1) shows greater upregulation of transcription after Pst infection at one day post-infection (dpi), which is sustained through 11 dpi; TraesCS5A03G0836200.1 is also reported by (Vikas et al., 2022) in a genome-wide association study (GWAS) to be likely involved in seedling resistance to the biotrophic fungal pathogen Puccinia triticina (Pt), the wheat leaf rust pathogen closely related to Pst.Another TaFMO in sub-clade B2b-4c(2), TraesCS7B03G1279500.1, might be involved in Puccinia graminis f.sp tritici (Pgt, stem rust) resistance (Sahu et al., 2021).In a QTL analysis by (Matros et al., 2017), the two triads in subclade B2a-2 were also detected as significantly enriched in SNPs associated with ornithine metabolism in wheat.Ornithine is an amino acid bearing great resemblance to lysine and has functional significance in plant abiotic stress tolerance (Kalamaki et al., 2009) through its role in osmoregulation during drought and salinity stress (Roosens et al., 1998;Xue et al., 2009).These two triads are among the only Group B TaFMOs to be transcriptionally responsive in the abiotic conditions presented, downregulated during drought stress, heat stress, and combined drought + heat stress one hour, but bounce back to nearly control levels after six hours of treatment for certain splice variants (Figure 5 -IV) (Liu et al., 2015).Transcripts of the TaFMO triad in B2a-1 (except for the 'nc' 5D homeolog) have been detected in wheat roots (Figure 2, Table 2) (Grzesiak et al., 2019).reported that genes of this triad were upregulated in response to water deprivation in wheat root tissues, though no upregulation of transcription in leaf tissues was detected in the drought study by Liu et al. (2015) (Figure 5 -IV), highlighting a need for careful interpretation of tissue-specific expression patterns of TaFMOs.Furthermore, a GWAS analysis indicates that the B-copy gene of this triad, TraesCS5B03G0877300.1, influences root-growth and biomass yield-related traits (Zhao et al., 2023).Trends for other candidates of interest are found in Appendix S7.
In summary, several lines of evidence hint at roles in responsiveness to various microbial diseases for several TaFMOs in subclade B2.Additionally, certain subclade B2 TaFMOs may operate during plant development (tiller increase) and drought, heat, and salinity tolerance.Thodberg et al. (2020) showed that a novel Group B FMO in fern (FOS1) participates in both the Nhydroxylation of a novel class of metabolites not previously reported in vascular plants and in the biosynthesis of cyanogenic glycoside.Likewise, the exploration and characterization of these B2 TaFMOs may identify novel functions and metabolites towards plant development, biotic, and abiotic stressors.

TaFMO UTRs are populated with elements involved in complex regulation
Untranslated regions in transcripts (UTRs) have been reported in plants as important regulatory regions often harboring important cis-acting regulatory elements (CAREs) that influence gene expression, modulate translational efficiency, and increase the coding capacity of genes that have different splice variations (Mignone et al., 2002;Srivastava et al., 2018).We found a wide variety of CAREs in the promoter regions 1.5 kb upstream of the start codon of TaFMO genes (Figures 6; Supplementary Figures S8,  S9).The presence and sizes of 5'-and 3'-UTRs among TaFMOs varied; we observed that for all TaFMOs, the 3'-UTRs were longer than the 5'-UTRs, when UTRs were present (Figures 7; Supplementary Figures S10, S11).Srivastava et al. ( 2018) reported elongated UTRs (particularly 3'-UTRs) in rice compared to Arabidopsis.In wheat, 3'-UTRs have also been reported to have a critical involvement in mediating drought stress (Ma et al., 2023) and in establishing resistance against stripe rust (Zhang et al., 2019), possibly through regulating mRNA stability.Of note, all the tandemly duplicated wheat TaFMOs in subclade C2 lack any UTRs (Supplementary Figure S11) and show virtually no (or low) transcript responsiveness to the biotic and abiotic conditions explored (Supplementary Figure S7 -III, IV), with minimal presence of expressed peptides detected in various tissue types.Transcription of group B TaFMOs under various conditions explored from literature data.(Columns I) reports of previous evidence of transcript detection or cloning under various conditions described in the legend.(Columns II) Protein expression (spectral peptide counts) for each candidate in various tissues from https://wheatproteome.org/ (Duncan et al., 2017).(Columns III) Heatmap of TaFMO transcripts in response to four types of biotic stresses; response to Fusarium graminearum infection(Fg) (Schweiger et al., 2016), mock inoculation versus pathogen infection in a timecourse of 3, 6, 12, 24, 36, and 48 hours post infection (hpi); Magnaporthe oryza (Mo) (Islam et al., 2016), asymptomatic (control) versus symptomatic (infection) conditions at grain-filling stage; Zymoseptoria tritici (Zt) (Yang et al., 2013;Rudd et al., 2015), mock inoculation versus pathogen infection in a time-course of 1, 4, 9, 14, 21 days post infection (dpi); Puccinia striiformis f.sp.tritici (Pst) (Cantu et al., 2013;Dobon et al., 2016), uninfected control versus pathogen infection in a time-course of 1, 2, 3, 5, 7, 9, 11 dpi.(Columns IV) Heatmap of TaFMO transcript response to grain development over a time-course (4, 8 12 days post anthesis) (Li et al., 2013;Yang et al., 2015), heat and drought stress (1 and 6 hours) (Liu et al., 2015), and cold stress (control versus 2-week cold stress) (Li et al., 2015).Heatmap data is displayed as Log2 transcripts per million (tpm), with blue being 0 and higher-fold expression approaching bright yellow (see legends).
Focusing on the Group B TaFMOs, analysis of the CAREs revealed involvement responsiveness to major phytohormones and conditions (Figure 6; Appendix S5).Jasmonic acid (JA)responsive CAREs were present in all TaFMO promoters at one or multiple sites; JA is a phytohormone known for involvement in many plant processes ranging broadly from photosynthesis to heat stress-responsiveness (Per et al., 2018), but most widely documented for defense against necrotrophic pathogens (Qi et al., 2016) and induced systemic resistance (Heil, 2002).In wheat, JAsignaling has also been implicated in seedling salt-stress tolerance (Qiu et al., 2014).Other notable phytohormone-responsive CAREs in Group B are for salicylic acid (SA), gibberellic acid (GA) and ethylene (ETH) (Supplementary Tables S3, 4, Figure 6).SA is a major phytohormone involved in SAR and pathogen defense signaling pathways (Dey et al., 2014;Ding et al., 2018).SAresponsive CAREs were found in higher abundance and closer to the start codons of the triads from B2a-2 (Figure 6, Supplementary Table S4), whose genes were more transcriptionally active in Zt-and Pst-infected wheat (Figure 5 -III); this may indicate that number and location of these CAREs influence which TaFMOs are activated under pathogen stress.GA is a phytohormone that was documented in wheat for its involvement in enhancing disease resistance against F. graminearum infection, by modulating both primary and specialized metabolism during early plant defense signaling (Buhrow et al., 2021); GA-responsive CAREs are found predominantly in subclade B2, further supporting that TaFMOs in this subclade are likely involved in microbial disease resistance.Auxin (IAA)-responsive CAREs are found to be abundant among many Group B TaFMOs (Supplementary Table S4), challenging the idea of Group C TaFMOs-the 'YUCCA' FMO clade-being largely responsible for auxin biosynthesis and growth regulation in plants for underreported crop species (Kendrew, 2001).

Transposable elements, gene structure diversity, and differential TaFMO expression
We explored transposable element (TE) distribution in or near the identified TaFMO genes and affecting intron-exon structure to find support for relationships among TaFMO family members (Figure 7,Supplementary Figures S10,11;Appendix S6).Focusing again on the Group B TaFMOs, we observed much variation in the number and spatial distribution of introns and exons; high similarity of gene structures was found among certain syntenic triads highlighting distinct TaFMO subfamilies that possibly remain conserved for functional importance.In contrast, extensive intronexon shuffling following gene duplication events (Kolkman and Stemmer, 2001;Franca et al., 2012) may indicate lower selective pressure and neofunctionalization, such as in the case for subclades B2b-4c1 and B2b-4c2 (Figure 7).
Nearly all TaFMOs (90%) are flanked by TEs within a 2 kb region upstream and/or downstream of the start or stop codons (Figure 7, Table 4) in accordance with a report of TEs representing over 80% of the wheat genome (Choulet et al., 2014;Bariah et al., 2020).44.7% of all TaFMO genes show disruption by TE insertion, mostly occurring inside introns, inside 5'-or 3'-UTRs, or, very rarely, inside exons (Figure 7).TE insertions are present in 56.1% of Group B TaFMO, with the largest TE interruption in TraesCS2B03G0193800 increasing its total gene length to upwards of 29 kb (Figure 7).Most TE disruptions in subclade B2 occur in the B-and D-copy homeologs Plant cis-acting regulatory elements (CARE) detected in a 1.5 kb region upstream of start codons for all TaFMO in Group B X-axis denote the number of base pairs away from the start codon, where 5' is 1,500 bp upstream ATG start codon (3' and 0 bp is right before the ATG start codon).Polytomy regions are denoted by a red box.Presence of regulatory elements specific to different tissues are described in (I).Other responsive promoter elements are denoted in the figure legend and displayed as rectangles of corresponding color to the condition in (II).and appear to be correlated with differential expression patterns among homoeologous triads.Varying locations of TE insertions the highly tandemly duplicated B-copy TaFMOs in B2b-4c(1) correlate with no transcriptional expression compared to the Aand D-copies, and only an A-copy TaFMO in B2b-4c(2) lacking TE insertions was transcriptionally active (Figure 5 -III, IV; Figure 7).In contrast, one of the triads in B2a-2 (TraesCS5A03G0836200, TraesCS5B03G0873900, TraesCS5D03G0799100) shares the same TE insertion in its three members, which are all more transcriptionally active than gene members in the other B2a-2 triad lacking this TE insertion, particularly upregulated towards Pst infection and cold stress responses (Figure 5 -III, IV; Figure 7).Yang et al. (2021) reported on the gene structure of a candidate 'YUCCA' FMO in wheat, TaYUCCA5-D (TraesCS1D03G0254900.1, Group B), containing a first intron size of approximately 13 kb.We now know this is due to a TE insertion (Figure 7), underscoring a need for expanding on factors driving gene structure diversity in large gene-family analyses.We see that TE insertions in the intronic and 3'-UTR regions may play a role in influencing gene expression of triads on a sub-genome level, possibly leading to eventual neofunctionalization of FMO family members.Several studies indicate that TEs are important drivers in polyploid plant evolution (Vicient and Casacuberta, 2017;Gill et al., 2021), such as in wheat where distinct TE families are correlated with up-or down-regulation of genes in their vicinity (Bariah et al., 2023).Thus, exploring classes of TE elements that lie in or around TaFMOs could aid in predicting gene expression regulation in the different wheat sub-genomes.

FIGURE 1
FIGURE 1Total number of predicted FMOs and their main domain architectures as reported from Hidden Markov Modelling (HMM).Branch colors indicate major subdivisions amongst vascular plants.Total numbers of FMOs are reported as a number at the tips of each bar, and multivariate bars shows colors corresponding to the proportion of FMOs for each of the various kinds of domain architectures seen in the legend.

FIGURE 3
FIGURE 3 FIGURE 4 (A) Consensus MEME motifs for conserved and unique motifs across Groups A, B, and C TaFMO proteins.(B) Predicted 3D protein structures for Groups A, B, and C TaFMO with highlighted motifs.

TABLE 1
List of plant species in FMO search (HMM profile PF00743) and number of total FMOs found.
-data not available.†denotes n>1, where n represents either the A, B, or D subgenome copy duplicated as 2 or more copies.$denotes orphaned/single TaFMO which were not found in synteny with any known triads.¥denotes TaFMO which exhibited any A, B,and/or D subgenome duplication ratios not mentioned above.‡unable to consolidate exact relation in synteny and phylogenetic analyses presented in this study.§ reported % of genes in wheat falling into each category by IWGSC 2018.

TABLE 4
Transposable element status for TaFMO in Group A, B, and C.