Identification and Comparative Analysis of H2O2-Scavenging Enzymes (Ascorbate Peroxidase and Glutathione Peroxidase) in Selected Plants Employing Bioinformatics Approaches

Among major reactive oxygen species (ROS), hydrogen peroxide (H2O2) exhibits dual roles in plant metabolism. Low levels of H2O2 modulate many biological/physiological processes in plants; whereas, its high level can cause damage to cell structures, having severe consequences. Thus, steady-state level of cellular H2O2 must be tightly regulated. Glutathione peroxidases (GPX) and ascorbate peroxidase (APX) are two major ROS-scavenging enzymes which catalyze the reduction of H2O2 in order to prevent potential H2O2-derived cellular damage. Employing bioinformatics approaches, this study presents a comparative evaluation of both GPX and APX in 18 different plant species, and provides valuable insights into the nature and complex regulation of these enzymes. Herein, (a) potential GPX and APX genes/proteins from 18 different plant species were identified, (b) their exon/intron organization were analyzed, (c) detailed information about their physicochemical properties were provided, (d) conserved motif signatures of GPX and APX were identified, (e) their phylogenetic trees and 3D models were constructed, (f) protein-protein interaction networks were generated, and finally (g) GPX and APX gene expression profiles were analyzed. Study outcomes enlightened GPX and APX as major H2O2-scavenging enzymes at their structural and functional levels, which could be used in future studies in the current direction.


INTRODUCTION
Reactive oxygen species (ROS), once perceived as toxic byproducts, were known to cause oxidative damage in cells Suzuki and Mittler, 2006). Later, novel regulatory roles of these species were revealed in a wide range of biological processes such as cell signaling, growth, development, programmed cell death, and plant responses to various biotic/abiotic stress factors (Mullineaux and Karpinski, 2002;Uzilday et al., 2014). H 2 O 2 is an endogenous ROS species known to play a dual role in plants, where it is beneficial at low concentrations but lethal at higher levels (Petrov and Van Breusegem, 2012). Nevertheless, at steady state levels, H 2 O 2 acts as signaling molecule inducing the signal transduction mechanism to produce various cellular responses. Interestingly, pre-treatment of plants with H 2 O 2 makes them more tolerant to biotic/abiotic stresses (Hossain et al., 2015). H 2 O 2 was also noted for its regulatory functions in photosynthesis, cell cycle, development, senescence, and apoptosis Petrov and Van Breusegem, 2012). H 2 O 2 has been accepted as a central component of signal transduction pathways in plant-adaptation to altered environmental conditions as it is both the only ROS with high permeability across membranes (that enables the transport of signals to distant sites) and its high stability when compared to other ROS with ∼1 ms half-life (Bienert et al., 2007;Dynowski et al., 2008;Petrov and Van Breusegem, 2012). On the other hand, when the delicate balance between production and scavenging of H 2 O 2 is disturbed, its overproduction results in significant damage to cell structures (Anjum et al., 2015;Sofo et al., 2015). To overcome H 2 O 2 -related cellular damage, aerobic organisms have developed various antioxidant machineries with enzymatic and non-enzymatic components. Ascorbate peroxidase (APX), glutathione peroxidase (GPX), and catalase (CAT) are the main enzymes responsible for suppressing toxic levels of H 2 O 2 (Apel and Hirt, 2004). However, APX may have pivotal roles in ROSscavenging because even very low concentrations are sufficient for H 2 O 2 decomposition (Anjum et al., 2014;Sofo et al., 2015). APX (EC, 1.11.1.11) belongs to the plant-type heme peroxidase superfamily in plants (Lazzarotto et al., 2011). Genome-wide studies demonstrated that APX in higher plants is encoded by multigenic families. Arabidopsis was reported to contain nine APX genes; whereas, rice has eight and tomato seven (Chew et al., 2003;Teixeira et al., 2004;Najami et al., 2008). Different isoforms are classified into sub-families according to their subcellular localization. Transmembrane domains in Nand C-terminal regions, as well as organelle-specific target molecules are the primary determinants in target localization of APXs (Ishikawa et al., 1998;Negi, 2011). Among nine APX genes identified in Arabidopsis, three were found to be encoded in cytosol whereas the other six were distributed in stroma, thylakoid, and peroxisome (Chew et al., 2003;Mittler et al., 2004). In rice, chloroplastic isoforms were expressed by three genes, cytosolic and peroxisomal forms were both encoded by two genes, and one gene was for the mitochondrial APX (Teixeira et al., 2006;Anjum et al., 2014). APX activity was also reported to increase under various stress conditions. For example, APX is differentially upregulated in response to heavy metal, drought, water, and heat stress (Sharma and Dubey, 2005;Koussevitzky et al., 2008;Yang et al., 2008;Anjum et al., 2014). In a previous study, Arg-38, Glu-65, Asn-71, and Asp208 residues were reported to be conserved among the entire APX family and known to be important in ligand (heme)-binding (Welinder, 1992). In addition to enzymatic properties, structural investigations on catalytic domains of the enzymes have been also performed. Three-dimensional structures of cAPX, sAPX, and their substrates showed the relationship between loop structure and stability in the absence of ascorbate (AsA; Yabuta et al., 2000;Anjum et al., 2014). The mitochondrial and chloroplastic APXs (<30 s) have shorter half inactivation times (>1 h) compared to cytosolic and peroxisomal isoforms, which makes them more sensitive in either low concentrations or the absence of AsA (Caverzan et al., 2012;Anjum et al., 2014). Another important enzyme in H 2 O 2 -scavenging is the GPX from the non-heme containing peroxidase family . In Arabidopsis, eight GPX genes were reported (Milla et al., 2003;Koua et al., 2009). Based on in silico analysis, GPXs were predicted in chloroplast, mitochondria, cytosol, and ER localizations (Rouhier and Jacquot, 2005), and demonstrated high level of sequence similarity with strictly conserved cysteines and motifs (Dietz, 2011). Plant GPXs have cysteine residue in their active site (Koua et al., 2009), which is functional in both glutathione (GSH) and thiol peroxidase classes of the non-heme family. GPXs were also reported to be involved in stress responses. Many studies have demonstrated the significant increase in mRNA levels of GPXs under various abiotic/abiotic stress conditions such as oxidative stress, pathogen attack, metal, cold, drought, and salt (Navrot et al., 2006;Diao et al., 2014;Fu, 2014;Gao et al., 2014). For example, GPX genes were found to be upregulated under excess H 2 O 2 and cold stresses in rice (Passaia et al., 2013). Transcriptome analysis indicated high level of GPX transcripts in dehydrated Glycine max samples (Criqui et al., 1992;Ferreira Neto et al., 2013). Several transgenic studies also supported the proposed function of GPXs. For example, the overexpression of GPX in its transgenic tomato resulted in higher tolerance against abiotic stress (Herbette et al., 2011). In addition to stress response, GPXs are also thought to regulate cellular redox homeostasis by modulating the thiol-disulfide balance . GPX expression was found to be highly upregulated to maintain redox homeostasis under oxidative stress which helped Brassica rapa to adapt long-term spaceflight (Sugimoto et al., 2014).
A scan of contemporary literature reveals a paucity of information on the identification and comparative analysis of GPX and APX in model and economically important food crops. Given the above, employing bioinformatics approaches, efforts were made in this study (a) to identify potential GPX and APX genes/proteins from 18 different plant species, (b) to analyze their exon/intron organization, (c) to provide detailed information about their physico-chemical properties, (d) to identify conserved motif signatures of GPX and APX, (e) to construct their phylogenetic trees and 3D models, (f) to generate protein-protein interaction networks, and finally (g) to analyze GPX and APX gene expression profiles.

Analysis of GPX and APX Genes/Proteins
Physicochemical properties of GPX and APX proteins were determined by using ProtParam tool (Gasteiger et al., 2005). Sub-cellular localization was predicted by CELLO (Yu et al., 2006) and WoLF PSORT (Horton et al., 2007) servers. Exonintron organization of GPX/APX genes was analyzed by using a GSDS server . The Conserved motif structure of GPX/APX sequences was analyzed using the MEME tool with the following parameter settings: maximum number of motifs to find, 5; minimum width of motif, 6 and maximum width of motif, 50 (Bailey et al., 2009). Protein sequences were aligned by ClustalW (Thompson et al., 1994) and phylogenies were constructed by MEGA 6 (Tamura et al., 2013) with the maximum likelihood (ML) method for 1.000 bootstraps. The gene duplication events were detected using the following criteria: (a) length of alignable sequence covers >75% of the longer gene, and (b) similarity of aligned regions is >75% (Gu et al., 2002). The expression data of APX and GPX genes at anatomical and developmental levels were retrieved from the Genevestigator database (Hruz et al., 2008). 3D models of APX/GPXs were predicted by using the Phyre 2 server (Kelley and Sternberg, 2009). Model validation was performed by Rampage Ramachandran plot analysis (Lovell et al., 2003). 3D structure comparisons were done by calculating RMSD values of models using the CLICK server employing α-carbon superposition (Nguyen et al., 2011). Putative interaction partners of APX/GPXs were predicted with the STRING server (Franceschini et al., 2013) and an interactome network was generated using cytoscape (Smoot et al., 2011).

RESULTS AND DISCUSSION
H 2 O 2 plays double roles in plants and modulates various crucial metabolic processes (Petrov and Van Breusegem, 2012). However, its increased levels can cause severe damage to cell structures; hence, steady-state level of cellular H 2 O 2 is required to be tightly regulated (Anjum et al., 2014(Anjum et al., , 2015Sofo et al., 2015). GPX and APX are two major ROS-scavenging enzymes which catalyze the reduction of H 2 O 2 to prevent H 2 O 2 -derived cellular damage. In order to understand the structural, functional as well as evolutionary aspects of GPX and APX, employing bioinformatics approaches, this study attempted to present comparative analyses of putative GPX and APX homologs identified from18 plant species.

Retrieval of GPX Genes/Proteins
Eight potential Arabidopsis GPX protein sequences, namely GPX1-8, obtained from the UniProtKB/Swiss-Prot database of NCBI were used as queries in Phytozome database to retrieve the very close homologs of GPX sequences in 18 plant species. In the selection of GPX homologs from blastp hits, very strict criteria (only the highest hit sequence) was applied to avoid the redundant sequences and alternative splices of the same gene. A total of 87 GPX sequences were identified from the protein datasets of 18 plant species. These include; 8 genes for A. thaliana, 4 genes for B. distachyon, 8 genes for B. rapa, 1 gene for C. reinhardtii, 6 genes for C. sativus, 5 genes for E. grandis, 5 genes for G. max, 6 genes for G. raimondii, 5 genes for M. truncatula, 5 genes for O. sativa, 5 genes for P. vulgaris, 2 genes for P. patens, 5 genes for P. trichocarpa, 5 genes for P. persica, 5 genes for S. lycopersicum, 4 genes for S. bicolor, 5 genes for V. vinifera, and 3 genes for Z. mays (Table 1). Then, genomic, transcript, CDS, and protein sequences of identified 87 GPX sequences were retrieved for further analyses.

Sequence Analysis of GPX Genes/Proteins
A total of 87 GPX homologs were identified in the protein datasets of 18 plant species using Arabidopsis GPX1-8 for homology search. Identified GPX homologs belonged to the GSHPx (PF00255) protein family. They encoded a polypeptide of 166-262 amino acids residues (average length 197.5) and 18.4-29.7 kDa molecular weight with 4.59-9.60 pI value. The sequence variations in analyzed GPXs primarily derived from the "transit peptide" residues between organelle and non-organelle related GPXs ( Table 1). Studies of molecular cloning and sequencing in A. thaliana have reported that chloroplastic GPX1 and GPX7 consisted of 236 and 233 amino acids, respectively; the first 1-64 residues in GPX1 and 1-69 residues in GPX7 from N-terminal site contained the transit peptides (Mullineaux et al., 1998;Lin et al., 1999;Mayer et al., 1999). Arabidopsis GPX2 and GPX4 were reported to be 169 and 170 residues, respectively with cytosolic localization: thereby, they did not contain any transit peptide (Lin et al., 1999). Arabidopsis GPX3 and GPX6 were 206   More than one localization specified in a single column also shows the significance of other entries in order.
and 232 residues, respectively, with mitochondrial localizations; the first 1-12 amino acids in GPX3 and 1-54 residues in GPX6 covered the transit peptide (Lin et al., 1999;Mayer et al., 1999). Arabidopsis GPX5 was 173 residues with probable ER or Plasma membrane localization, without transit peptide (Erfle et al., 2000). Arabidopsis GPX8 comprised of 167 amino acids with cytosolic or nuclear localization, without transit peptide (Theologis et al., 2000). In the present study, alignment analysis revealed that in chloroplastic/mitochondrial-related GPXs, the transit peptide sequences formed the first 50-90 amino acid residues from the N-terminal site while in extra cellular/plasma membrane-related GPXs, residues of the first 20-50 amino acid from N-terminal region contained the putative transit peptide. However, cytosolic sequences lacked of any putative transit residues (Supplementary Figure S1). Thus, analyzed GPX sequences were roughly categorized in three main groups based on their sequence length; the chloroplastic/mitochondrial related GPXs comprised the longer sequences (i), extra cellular/plasma membrane related GPXs formed the medium-sized sequences (ii), and cytosolic related GPXs included the shorter sequences (iii). In addition, the regions corresponding to the transit peptide sites in analyzed sequences did not demonstrate any particular patterns. The less-conserved transit peptide residues could be related with the functional diversities of GPXs at various targets. However, despite the variations in sequence length and transit peptide residues, transcripts of GPX homologs mainly contained the six exons. Therefore, it is reasonable to claim that analyzed GPX sequences could have highly-conserved protein sequences, preserved during the formation of various GPXs. The alignment analysis of 87 GPX protein homologs also confirmed this claim, demonstrating the presence of more conserved residues in the main sites of the active enzyme (Supplementary Figure  S2). Moreover, to discern the conserved motif patterns in GPX sequences, the most conserved five motif sequences were searched in sets of 87 GPX homologs using MEME tool ( Table 2). Motif 1 and 3 were the 50 amino acid residues, while the motif 2 was 41, motif 4 was 15, and motif 5 was 6 residues in length.
Motif 1 and 3 were related with the GSHPx (PF00255) protein family and present in almost all GPX homologs. The presence of long conserved residues and their relation with the GSHPx family could indicate the highly conserved structures of GPX sequences at these sites between/among species.

Phylogenetic Analysis of GPXs
The evolutionary relationships between identified GPX sequences were analyzed by MEGA 6 using the Maximum Likelihood (ML) method with 1000 bootstraps. The constructed phylogeny included all 87 GPX homologs to discover the phylogenetic distribution of sequences (Figure 1). The tree was divided into six major groups based on the tree topology, and each group was indicated with a different color segment. The red segment included cytosolic, extra cellular, and plasma membrane FIGURE 1 | Phylogenetic tree of glutathione peroxidase (GPX) homologs from 18 plant species. Tree was constructed by MEGA 6 using Maximum likelihood (ML) method with 1000 bootstraps. Segment classification based on the consensus of two subcellular localization servers, CELLO and WoLF PSORT as well as tree topology for ambiguous sequences. Red segment includes cytosolic, extra cellular, and plasma membrane related GPXs, green segment contains mitochondrial and chloroplast related GPXs, blue segment only have cytosolic GPXs, cyan segment includes cytosolic, and chloroplast/mitochondrial related GPXs, yellow segment contains cytosolic/nuclear related GPXs, and non-colored segment has lower plant Chlamydomonas GPX with chloroplastic/mitochondrial relation.
related GPXs, the green segment contained mitochondrial and chloroplast related GPXs, the blue segment only had cytosolic GPXs, the cyan segment included cytosolic and chloroplast/mitochondrial related GPXs, the yellow segment contained cytosolic/nuclear related GPXs, and the noncolored segment had lower plant Chlamydomonas GPX with a chloroplastic/mitochondrial relation. Annotation of each segment based on the consensus of two subcellular localization servers, CELLO and WoLF PSORT, as well as tree topology for ambiguous sequences. Mainly cytosolic, nuclear, extra cellular and plasma membrane related GPXs were clustered together, while chloroplast/mitochondrial related GPXs also cluster together. Therefore, the presence or absence of transit peptide residues was the main contributing entity in the phylogenetic distribution of GPX sequences. In addition, the presence of sequences with different subcellular localizations in the same group inferred the possibility of gene duplication events in the formation of various GPX sequences. Duplications in plant genomes could be either as small-scale such as tandem and segmental duplications, or as large-scale such as whole-genome duplications (Ramsey and Schemske, 1998). The segmental duplications are observed in different chromosomes whereas tandem duplications occur in the same chromosome (Liu et al., 2011). Several segmental duplications were identified between GPX pairs ( Table 3). The presence of segmental duplications, particularly between sequences with various subcellular localizations may partly explain the possibility of gene duplication events in GPX formations.

Expression Profile Analysis of GPXs
The potential expression profile of GPX genes was analyzed at 105 anatomical parts and 10 developmental stage levels using model organism A. thaliana GPXs from Genevestigator platform (Figure 2). Eight Arabidopsis genes, namely GPX1 (AT2G25080), GPX2 (AT2G31570), GPX3 (AT2G43350), GPX4 (AT2G48150), GPX5 (AT3G63080), GPX6 (AT4G11600), GPX7 (AT4G31870), and GPX8 (AT1G63460) were retrieved from the "Affymetrix Arabidopsis ATH1 Genome Array" platform using the Genevestigator interface, and conditions and genes with similar profiles were comparatively analyzed using the Hierarchical clustering tool with the Euclidean distance method. At an anatomical part level (Figure 2A), analyzed GPX1-8 genes were expressed in almost all 105 anatomical tissues in Arabidopsis plants. However, various root and leaf protoplast cells, seed-related tissues, and active growth zones demonstrated significantly higher GPX activity. This indicates that stress factors and/or active metabolism could lead to the up-regulation of various GPX genes in different tissues. Many studies have also showed that balance between production and scavenging of H 2 O 2 could be disturbed by various biotic/abiotic stress factors or perturbations such as drought, salinity, pathogen attack, oxidative state of the cells (Apel and Hirt, 2004;Anjum et al., 2014Anjum et al., , 2015Sofo et al., 2015). Besides, a number of studies were also available demonstrating the functional roles of GPXs in plant stress resistance/tolerance. For example, a GPX gene in Pennisetum glaucum enhanced the drought and salinity stress tolerance (Islam et al., 2015). Citrus GPX3 was reported to be
Silencing of mitochondrial GPX1 in O. sativa demonstrated the impaired photosynthesis in response to salinity (Lima-Melo et al., 2016). Glutathione transferases and peroxidases were reported as key components in Arabidopsis salt stress-acclimation . GPX1 and GPX3 in legume root nodules played a protective function against salt stress, oxidative stress, and membrane damage (Matamoros et al., 2015). Therefore, GPXs, which are the antioxidant enzymatic components of the cells, are consequently induced to suppress/eliminate the toxic levels of H 2 O 2 . The increased GPX activities in analyzed Arabidopsis tissues could thereby be derived from the increased H 2 O 2 or H 2 O 2 -related products. In addition, clustering analysis of GPX genes showed that GPX2, 3, 5, 6, and 8 demonstrate relatively similar expression profiles compared to those of GPX1, 4, and 7. At the developmental level (Figure 2B), the expression profiles of Arabidopsis GPX1-8 genes were analyzed at 10 developmental stages, including senescence, mature siliques, flowers and siliques, developed flower, young rosette, germinated seed, seedling, bolting, young flower, and developed rosette. GPX1-8 were relatively expressed in all developmental stages. However, the expression of GPXs in the senescence stage demonstrated slightly different patterns, particularly the mitochondrial GPX6 gene had the highest expression profile compared to other developmental stages. This may have been caused by senescence-related cellular deteriorations, leading to the substantial metabolic or physiological changes that significantly affect the overall metabolic energy consumption. Therefore, it seems that the expression profiles of GPXs are highly associated with the metabolic state of the cells.

3D Structure Analysis of GPXs
3D models of putative GPXs were constructed by using Phyre 2 server for eight identified Arabidopsis GPX1-8 gene sequences (Figure 3). These sequences were: AT2G25080.1 Genes and conditions with similar profiles were comparatively analyzed using hierarchical clustering tool with Euclidean distance method at Genevestigator platform.
Frontiers in Plant Science | www.frontiersin.org (GPX6), AT4G31870.1 (GPX7), and AT1G63460.1 (GPX8). In modeling, three templates such as 2F8A:A (GPX1, GPX3, GPX6, and GPX7), 2V1M:A (GPX2 and GPX5), and 2P5Q:A (GPX4 and GPX8) were used to maximize the alignment coverage, percentage identity and confidence for submitted sequences. Predicted models demonstrated the ≥98% of residues in allowed region in Ramachandran plot analysis, indicating that constructed models were fairly in good quality. To find out the structural divergence/similarity in generated models, the structures were superposed to calculate the percentage of structural overlap and RMSD values ( Table 4). GPX1-GPX3, GPX4-GPX8, and GPX6-GPX7 pairs demonstrated the highly conserved structural overlap (100%) with 0.14, 0.00, and 0.03 RMSD values, respectively. The each designated pair also belonged to either chloroplastic/mitochondrial or cytosolic form, indicating their functional similarities with minor specifications. In addition, GPX1-GPX6 and 7, GPX2-GPX5, and GPX3-GPX6 and 7 pairs showed very high structural similarity with ≥94 structural overlaps. Despite the highly conserved structures of Arabidopsis GPX members, some minor variations were also present. It seems that these divergences in GPXs may not change the protein-3D structure, however, they could attribute the new functional roles to catalytic activities.

Interaction Partner Analysis of GPXs
The interactome network was constructed for 10 putative interactors of Arabidopsis cytosolic GPX2, using Cytoscape with STRING data (Figure 4). APX1 (L-ascorbate peroxidase), GSH2 (glutathione synthetase), GSTF6 (glutathione S-transferase F6), GSTT1 (glutathione S-transferase THETA 1), PER1 (1-Cys peroxiredoxin PER1), AT1G65820 (glutathione S-transferase), GSTF12 (glutathione S-transferase phi 12), GSTF2 (glutathione S-transferase F2), GSTF8 (glutathione S-transferase F8), and GSTU19 (glutathione S-transferase U19) proteins were predicted as the main interaction partners of Arabidopsis cytosolic GPX2. APX1 is a type of H 2 O 2 -scavenging enzyme and a central component in the reactive oxygen gene network (Storozhenko et al., 1998;Fourcroy et al., 2004). GSH2 involves in the second step of the glutathione synthesis pathway from L-cysteine and L-glutamate (Wang and Oliver, 1996). GSTF6 functions in camalexin biosynthesis, is involved in the conjugation of reduced glutathione to various exogenous/endogenous hydrophobic electrophiles, and has a detoxification role for certain herbicides (Su et al., 2011). GSTT1, GSTF8, and GSTU19 are reported to have glutathione S-transferase or peroxidase activity. They further conjugate the reduced glutathione to various exogenous/endogenous hydrophobic electrophiles and play a detoxification role for certain herbicides (Wagner et al., 2002). PER1 is an antioxidant protein involved in the inhibition of germination under stress (Haslekås et al., 1998). AT1G65820 is a glutathione S-transferase. GSTF12 is involved in the transport of anthocyanins and proanthocyanidins into the vacuole (Kitamura et al., 2004). GSTF2 plays a role in binding and transport of small bioactive products and defenserelated compounds under stress (Smith et al., 2003). The analysis indicated that cytosolic GPX2 enzyme is closely related with various pathways involving in antioxidant and secondary metabolite metabolisms, thereby supporting the functional role of GPXs in H 2 O 2 -scavenging and plant defense.
18 plant species. In the selection of APX homologs from blastp hits, a very strict criterion (only the highest hit sequence) was applied to avoid redundant sequences and alternative splices of the same gene. A total of 120 APX sequences were identified from the protein datasets of 18 plant species. These were 8 genes for A. thaliana, 7 genes for B. distachyon, 8 genes for B. rapa, 4 genes for C. reinhardtii, 5 genes for C. sativus, 7 genes for E. grandis, 7 genes for G. max, 8 genes for G. raimondii, 7 genes for M. truncatula, 6 genes for O. sativa, 6 genes for P. vulgaris, 5 genes for P. patens, 7 genes for P. trichocarpa, 6 genes for P. persica, 7 genes for S. lycopersicum, 8 genes for S. bicolor, 6 genes for V. vinifera, and 8 genes for Z. mays (Table 5). Then, genomic, transcript, CDS, and protein sequences of 120 identified APX sequences were retrieved for further analyses.

Sequence Analysis of APX Genes/Proteins
A total of 120 APX homologs were identified in protein datasets of 18 plant species using Arabidopsis APX1-6, APXT, and APXS sequences by homology search. Identified APX sequences contained the peroxidase (PF00141) protein family domain. They encoded a protein of 197-478 amino acids residues (average length 323.9) and 23.7-52.1 kDa molecular weight with 5.03-9.23 pI value. The sequence variations in analyzed APXs demonstrated a correlation with their putative localizations,   More than one localization specified in a single column also shows the significance of other entries in order.
thereby indicated the presence of transit residues ( Table 5). Molecular cloning studies from A. thaliana have demonstrated that APX1, APX2, and APX6 are polypeptides of 250, 251, and 329 amino acids, respectively, with cytosolic localizations but without transit peptide (Davletova et al., 2005;Jones et al., 2009;Aryal et al., 2011). APX3 and APX5 consisted of 287 and 279 amino acids, respectively, with peroxisomal localizations; however, sites of transit peptide residues are not precisely specified (Panchuk et al., 2002;Narendra et al., 2006;Bienvenut et al., 2012). APX4 is a 349 amino acids protein with chloroplastic localization, including 1-82 residues as transit peptide from the N-terminal site (Kieselbach et al., 2000;Panchuk et al., 2005;Aryal et al., 2011). APXT is a 426 amino acids chloroplastic protein, including 1-78 residues of transit peptide (Theologis et al., 2000;Panchuk et al., 2005). APXS consists of 372 amino acids with chloroplastic and/or mitochondrial localizations, but the exact site of the transit peptide is not specified (Jespersen et al., 1997;Mayer et al., 1999;Chew et al., 2003). In the present study, multiple-alignment of APX sequences revealed that chloroplastic/mitochondrial-related APXs contained the transit peptide residues in approximately 1-90 amino acids from the N-terminal site while cytosolic APXs did not have any putative transit peptide (Supplementary Figure S4). Thus, the analyzed APX sequences were gathered in two main groups based on subcellular localizations, such as chloroplastic/mitochondrial APXs (i) and cytosolic APXs (ii). In addition, the regions corresponding to the transit peptide sites in analyzed sequences did not demonstrate any particular pattern. This could indicate that less conservancy in transit peptides may be associated with the functional diversities of APXs at various targets. Besides, APX transcripts mainly consisted of 8-12 exons, supporting the relatively less conserved structure of APXs compared to GPXs. However, alignment analysis also demonstrated the presence of a considerable degree of conserved residues in the main sites of enzyme (Supplementary Figure S5). Moreover, to analyze the availability of any conserved motif pattern/s in APX sequences, the most conserved five motif sequences of APX homologs were searched using MEME tool ( Table 6). Motif 1 was 29 residues long, motif 2 and 4 were 21 residues, motif 3 was 32 residues, and motif 5 was 25 residues in length. However, only motifs 2 and 3 had a relation with the peroxidase (PF00141) protein family, and in this case were present in most of the sequences. This could indicate the highly conserved structures of APX sequences at those sites with peroxidase activity.
Furthermore, alignment analysis also demonstrated that Asp (D) and Gly-Gly (GG) residues are strictly conserved in all aligned sequences, indicating their potential functions in enzyme activity and/or stability (Supplementary Figure S6). To infer a functional relationship between these conserved residues and APX sequences, we searched for the known binding residues of model organism Arabidopsis APXs in the UniProtKB database (http://www.uniprot.org/uniprot/). The following residues were reported as potential active and metal binding residues for Arabidopsis GPX1-6, APXT, and APXS: APX1 (Arg-38, His-42, His-163, Thr-164, Thr-180, Asn-182, Ile-185, Asp-187), APX2 (Arg-39, , APX3 , , APX6 (Arg-119, His-123, His-224), APXT . These active and metal binding residues did not correspond to any of the strictly conserved residues in analyzed APX sequences but they were found to be conserved at considerable rates. However, when taken into consideration that some of the strictly conserved residues in analyzed GPX sequences correspond to the catalytic sites of the enzymes, we can make an inference that these strictly conserved residues in APX sequences may also be associated with the peroxidase activity of the enzyme.

Phylogenetic Analysis of APXs
To analyze the evolutionary relationship between identified APX homologs, the phylogenetic tree was constructed by MEGA 6 using the Maximum Likelihood (ML) method with 1000 bootstraps (Figure 5). The constructed tree was divided into five major groups based on the tree topology, and each group was indicated with a different color segment. Blue, red, and green segments included the chloroplast/mitochondria-related APXs with relatively longer, medium and short sequences, respectively, whereas cyan and yellow segments mainly contained longer and shorter cytosolic APX sequences, respectively. Annotation of each segment was based on the consensus of two subcellular localization servers, CELLO and WoLF PSORT, as well as tree topology for ambiguous sequences. Overall, it was observed that cytosolic-related APXs clustered together, while alternatively chloroplast/mitochondrial-related APXs were together. In addition, in clustering of sequences at sub-branches was primarily based on the sequence length and monocot/dicot separation. However, there were considerable variations between sequences, even those belonging to the same subcellular localization. It is thought that these sequence variations could be attributed to the various functional diversities of APXs and/or be associated with different subcellular localizations. Moreover, some sequences were also available with different subcellular localizations in the same clade, indicating the possibility of gene duplication events in formation of some APX genes. The gene duplication events were searched based on the previously designated protocol (Gu et al., 2002). In doing so, several segmental and tandem duplications were identified between some APX pairs ( Table 7). The identified segmental or tandem duplications in APX genes were observed between either chloroplastic and chloroplastic, or cytosolic and cytosolic forms. This could indicate the possibility of gene duplication events in the formation of close APX homologs. FIGURE 5 | Phylogenetic tree of ascorbate peroxidase (APX) homologs from 18 plant species. Tree was constructed by MEGA 6 using Maximum likelihood (ML) method with 1000 bootstraps. Segment classification based on the consensus of two subcellular localization servers, CELLO and WoLF PSORT as well as tree topology for ambiguous sequences. Blue, red, and green segments include chloroplast/mitochondrial related APXs with mainly longer, medium and short sequences, respectively, while cyan and yellow segments contain longer and shorter cytosolic APX sequences, respectively.
At the anatomical level ( Figure 6A), APX genes were expressed in almost all analyzed tissues of Arabidopsis with various folds. It was clear that the expression levels of genes were closely related with the expressed tissue type/s. For example, both cytosolic APX1 and chloroplastic/mitochondrial SAPX had significantly higher expression in actively growing zones, as well as many root and root protoplast-related structures. APX3, APX4, APX6, and TAPX were expressed in various shoot, bud, leaf, flower and seed related tissues at considerable rates. All these indicated that stress factors, actively growing tissues as well as normal physiological and metabolic changes could induce the expression of APX genes in tissue-dependent way. All these Glycine max (L.) Merr. Glyma.06G114400-Glyma.04G248300 Glyma.11G078400-Glyma.12G032300

Bradi1g16510-Bradi1g65820
Gossypium raimondii Ulbr. Gorai.009G104500-Gorai.009G420500 Zea mays L. GRMZM2G006791-GRMZM2G120517 GRMZM2G054300-GRMZM2G137839 metabolic activities or their related consequences could exert the stresses to the cells. Many studies have further demonstrated that abiotic/abiotic stress factors such as heavy metal, drought, water, heat, cellular H 2 O 2 level, oxidative state of the cell could increase the expression of APX genes to either suppress or eliminate the stressors (Ishikawa and Shigeoka, 2008;Koussevitzky et al., 2008;Yang et al., 2008;Petrov and Van Breusegem, 2012). For example, overexpression of Solanum melongena APX6 in transgenic O. sativa seedlings demonstrated high flood tolerance, reduced oxidative injury and fast plant growth rates (Chiang et al., 2015a). APX regulation by nitric oxide (NO) as a redox indicator in oxidative stress or as part of hormone induced signaling pathway in lateral root development were demonstrated (Correa-Aragunde et al., 2015). S-nitrosylation of Arabidopsis APX1 at Cys32 increased the H 2 O 2 scavenging activity of enzyme, resulting in improved oxidative stress tolerance (Yang et al., 2015). Overexpression of APX and Cu/Zn SOD increased the drought resistance and recovery capacity from drought stress in Ipomoea batatas (Lu et al., 2015). Overexpressed Brassica campestris APX gene in transgenic Arabidopsis enhanced the heat tolerance via elimination of H 2 O 2 (Chiang et al., 2015b). Therefore, increased APX activity in cells is an indicator of the presence of stress factors. At the developmental level (Figure 6B), the expression profile of Arabidopsis APXs was analyzed at 10 developmental stages: senescence, mature siliques, flowers and siliques, developed flower, young rosette, germinated seed, seedling, bolting, young flower, and developed rosette. In all developmental stages, APXs were relatively expressed. However, the expression pattern in senescence was slightly different from other developmental stages, notably cytosolic APX6 showed the highest expression. Interestingly, the Arabidopsis GPX6 gene also demonstrated the highest expression fold at senescence stage, inferring the possibility of functional similarities of these two enzymes. Overall, the expression profile and fold of APXs in various tissues and stages show that cells are constantly put under stress even with normal physiological and metabolic changes, requiring plants to eliminate these stressors.
datasets of these species identified a total of 120 putative APXs (Tables 1, 5). Sequences of GPX homologs contained the GPX (PF00255) protein family domain while APX homologs included the peroxidase (PF00141) domain. GPX genes encoded a protein of 166-262 residues with 18.4-29.7 kDa molecular weight and 4.59-9.60 pI value, while APXs encoded a polypeptide of 197-478 residues with 23.7-52.1 kDa molecular weight and 5.03-9.23 pI value. GPX transcripts mainly contained six exons; whereas, APX usually had 8-12 exons, implicating the relatively less conserved structure of APXs compared to GPXs. Sequence variations in GPX and APX homologs primarily derived from the "transit peptide" residues between organelle and non-organelle related sequences. Besides, regions corresponding to transit peptide sites in APX/GPX sequences did not demonstrate any particular pattern, indicating the less conserved structure of transit peptides thereby the functional diversities of APXs/GPXs at various targets. In addition, multiple-alignment analyses demonstrated the presence of a considerable degree of conserved residues in main sites of both enzymes. In GPX phylogeny, cytosolic-, nuclear-, extra cellular,-and plasma membrane-related GPXs were relatedly clustered while chloroplast/mitochondrial-related GPXs grouped together. APX phylogeny also showed similar clustering pattern, in which cytosolic-related APXs were relatedly clustered while chloroplast/mitochondrial-related APXs were together. This indicates that presence/absence of "transit peptide" residues was the main determinant in phylogenetic distribution of APX/GPX sequences. Moreover, presence of sequences with different subcellular localizations in the same phylogenetic group inferred the possibility of gene duplication events in formation of some APX/GPX sequences. Several segmental duplications were identified in some GPX pairs, while several segmental and tandem duplications were available in some APX pairs. Expression profiles of GPX and APX genes in model organism Arabidopsis indicated that stress factors, actively growing tissues, even normal physiological, and metabolic changes could induce the expression of APX/GPX genes. Interactome analyses of Arabidopsis cytosolic APX1 and GPX2 also implicated that both enzymes are closely related with antioxidant and redox homeostasis, secondary metabolite metabolisms and stress adaptation thereby supporting the functional roles of APXs/GPXs in H 2 O 2 -scavenging and plant defense. Despite of some minor variations, APX and GPX members, they topologically demonstrated highly conserved structure.

CONCLUSIONS
The presence or absence of transit peptide residues are the main contributing factors in subcellular localization and phylogenetic distribution of APX/GPXs. The APX/GPX expression is highly associated with the metabolic state of the cells. In addition, there are grounds for belief that these two enzymes work together in various pathways such as antioxidant and secondary metabolite metabolisms, redox homeostasis, stress adaptation, and photosynthesis/respiration. This also supports the functional role of these enzymes in H 2 O 2 -scavenging, thereby implicating their importance in the plant defense. However, further molecular and physiological studies are required to elucidate the various functional roles of APX/GPX isoforms.

AUTHOR CONTRIBUTIONS
IK and EF contributed to the study conception and design. KK performed experiments and collected data. Data analysis and interpretation were performed by RV. IK, EF, KK, and RV prepared, and NA performed critical reading and revision of the manuscript. IO and EF supervised and MO coordinated this work. All the authors read and approved the final version.