In silico study for diversing the molecular pathway of pigment formation: an alternative to manual coloring in cotton fibers

Diversity of colors in flowers and fruits is largely due to anthocyanin pigments. The flavonoid/anthocyanin pathway has been most extensively studied. Dihydroflavonol 4-reductase (DFR) is a vital enzyme of the flavonoid pathway which displays major impact on the formation of anthocyanins, flavan 3-ols and flavonols. The substrate specificity of the DFR was found to play a crucial role in determination of type of anthocyanidins. Altering the flavonoid/anthocyanin pathway through genetic engineering to develop color of our own choice is an exciting subject of future research. In the present study, comparison among four DFR genes (Gossypium hirsutum, Iris × hollandica, Ang. DFRI and DFRII), sequence alignment for homology as well as protein modeling and docking is demonstrated. Estimation of catalytic sites, prediction of substrate preference and protein docking were the key features of this article. For specific substrate uptake, a proline rich region and positions 12 plus 26 along with other positions emphasizing the 26-amino acid residue region (132–157) was tested. Results showed that proline rich region position 12, 26, and 132–157 plays an important role in selective attachment of DFRs with respective substrates. Further, “Expasy ProtParam tool” results showed that Iris × hollandica DFR amino acids (Asn 9: Asp 23) are favorable for reducing DHQ and DHM thus accumulating delphinidin, while Gossypium hirsutum DFR has (Asn 13: Asp 21) hypothesized to consume DHK. Protein docking data showed that amino acid residues in above mentioned positions were just involved in attachment of DFR with substrate and had no role in specific substrate uptake. Advanced bioinformatics analysis has revealed that all above mentioned positions have role in substrate attachment. For substrate specificity, other residues region is involved. It will help in color manipulations in different plant species.


Introduction
More than 200,000 different types of compound have been shown to be produced collectively by higher plants, and some of these are able to generate bright colors in flowers, fruit or foliage . The human eye can detect light, as reflected or transmitted by a compound under wavelengths 380 and 730 nm, while insects recognize the light of shorter wavelengths (Davies, 2004). In plants, three major classes of pigments for coloration exist which includes carotenoids, flavonoids/anthocyanins, and betalains.
Diversity of red, purple and blue colors of flowers as well as fruits are due to the nature of these polyphenolic pigments. In plants, these secondary metabolites are found ubiquitously. Among factors influencing flower color, the flavonoid/ anthocyanin biosynthesis has been most extensively studied. The various factors which influence the final color formation includes anthocyanin structures, vacuolar pH, copigments and metal ions.
Dihydroflavonol 4-reductase (DFR) is a vital enzyme of the flavonoid pathway which shows a major impact on the formation of anthocyanins, flavan 3-ols, and flavonols. In ornamental flower plants, the color has been modified by altering the expression levels of DFR genes (Aida et al., 2000). The substrate specificity of the DFR plays a crucial role in determining which anthocyanidins, a plant will accumulate (Forkmann and Heller, 1999).
DFR is unique in a sense that it uptakes flavoniod substrates depending on the B-ring hydroxylation pattern. The DFRs of several plants accept dihydroflavonols having one (dihydrokaempferol, DHK), two (dihydroquercetin, DHQ), or three (dihydromyricetin, DHM) hydroxyl groups on the B-ring. NADPH is used as a cofactor of DFR, which helped in catalyzing the reduction of dihydroflavonols to leucoanthocyanidins which were common precursors of anthocyanin (Helariutta et al., 1993;Tanaka et al., 1995;Dellus et al., 1997).
Substrate specificity of DFR determines the conversion of metabolic flux toward desired anthocyanidin biosynthesis. For example in Petunia hybrid, the transformation of a DFR from maize evaded the gap in the pathway of anthocyanin formation and successfully generated orange color flower by the reduction of DHK and the accumulation of pelargonidin anthocyanins. Similarly, petunia or viola F3 ′ 5 ′ H gene in combination with the petunia DFR gene was transformed in cultivars of white carnation that specifically missing DFR gene, hence resulted in violet carnations by the generation of delphinidin based anthocyanins (Holton, 1996;Fukui et al., 2003).
The two most common fiber colors in colored cotton cultivars (Gossypium hirsutum) are "brown and green." They are ecofriendly, economical, and valuable for human health (Zhu et al., 2006) Four key genes of the flavonoid biosynthetic pathway (GhC4H, GhCHS, GhF3 ′ H, and GhF3 ′ 5 ′ H, GhDFR) have been reported in Gossypium hirsutum.
The idea of pigment engineering, for diversity of colors in the cotton fiber, can be opened up by investigating the mechanism of pigmentation. The key is to explore and understand the molecular basis of pigment formation and its deposition in fibers . The purpose of this research was to investigate the putative involvement of flavonoid biosynthesis pathway structural gene (DFR) in the pigmentation and coloration of cotton fiber.
To achieve this objective a comparison was made in retrieving sequences of two DFR genes from blue flower species and two from white including sequence alignment for homology and protein modeling. Further more the presence of proline rich region along with positions 12 and 26 was reported to be important in determining substrate specificity for DFR of A. angustifolia (Gosch et al., 2014). Other positions emphasizing the 26-amino acid region (132-157) were also mentioned in the literature which played a role in utilizing specific dihydroflovonols: DHK, DHQ, and DHM (Johnson et al., 2001;Xie et al., 2004). It was tested in Gossypium hirsutum as well as in Iris × hollandica by protein-ligand docking analysis. Analysis of gene cluster encoding dihydroflavonol 4reductases in the Lotus japonicus genome showed that three out of six DFR proteins exhibit catalytic activity, their substrate preferences settled with the variation of a specific active site residue (Aspartic acid or Asparagine) and found to be involved in controlling the substrate specificity (Shimada et al., 2005).
The structure of the DFR proteins (Gossypium hirsutum as well as in Iris × hollandica) have not yet determined experimentally (NMR or X-ray), and for that reason models were built by following homology modeling and threading protocol. Moreover, for the estimation of respective catalytic sites and prediction of substrate preference, protein docking was performed. The aim of this study was to identify the region that involves in the substrate specificity of DFR to give readers an idea of manipulating the pathways involved in color modification, so that Gossypium hirsutum fiber pigmentation may be altered, similarly as reported in the case of some ornamental flower plants.

Materials and Methods
In silico analysis was carried out in order to evaluate substrate specificity of different types of DFRs (dihydromyricetin reductase, dihydroquercetin reductase and dihydrokaempferol 4-reductase). For this assessment, dihydroflavonol 4-reductase (DFR) sequence information available in NCBI database was used. Protein sequence alignment, designing of protein structures, ligand structure retrieval and molecular docking was also carried out.
To evaluate substrate specificity, Asn as well as Asp percentage estimation in DFR sequences for Iris × hollandica and Gossypium hirsutum was done by using "Expasy ProtParam tool."

Refinement and Evaluation of DFR Protein Model
The model was further refined by using online tool, ModRefiner accessed on Zhang Lab website (http://zhanglab.ccmb.med. umich.edu/ModRefiner/). This tool minimized the energy of the model and took the residues of protein in the allowed region. The models were evaluated and validated by producing Ramachandran plot. These plots were plotted by using the online RAMPAGE tool (http://mordred.bioc.cam.ac.uk/~rapper/ rampage.php). Ramachandran plots of proteins determined their stability.

Ligand Preparation
The structures of biological compounds of flavonoid pathway dihydrokaempferol (PubChem: 122850), dihydroquercetin (PubChem: 439533), and dihydromyricetin (PubChem: 161557) were downloaded from the PubChem database in 2D format. For preparation of ligand structures which were used in docking, hydrogen atoms were added to each ligand and their energy was minimized by the means of MMFF94X force field at 0.05 gradients. Then, these ligand structures were saved in.mol2 file format. A database of ligands (DFR acceptor compounds) was created in MOE software.

Protein-ligand Docking of DFR Protein and Ligand Molecules
Three-dimensional model of Gossypium hirsutum and Iris × hollandica were constructed by using I-TASSER server. The water molecules were removed with the help of MOE software. After the removal of water molecules, hydrogen atoms were added to the receptor proteins. Optimization of receptor molecule was achieved by energy minimization and 3D protonation  (with help of AMBER99 force field option of MOE). The gradient was 0.05 and receptor was minimized unless root mean square gradient fall below 0.05. After 3D protonation of the receptor protein, the hydrogen molecules were hidden. This energy minimized, 3D protonated receptor molecules were then used for docking analysis. Box of 26 residues as reported by Shimada et al. (2005) was aligned with Iris × hollandica and Gossypium hirsutum. These aligned residues were used as pocket site. Molecular docking was carried out against the databases mentioned previously, after the modeling and preparation of ligand and receptor molecules. The receptor residues in region FIGURE 4 | Three dimensional protein model of and Iris × hollandica protein, predicted by I-TASSER.
132-157 of Gerbera DFR correspond to 148-174 residues in G. hirsutum as well as 127-153 in Iris × hollandica were selected and docked with ligands. Docking output database file having receptor ligand complex was saved in.mdb format. The docked complexes were categorized with increasing S-value (Final score to indicate binding free energy). The complexes with minimum S were taken to evaluate the interactions of ligand with the active site residues of the receptor proteins. Best hydrogen bonding plus π-π interactions were evaluated by the using ligX option of MOE.

Comparison of DFR Reported Residues Involved in Substrate Specificity
For comparison, at position 12 and 26, DFR sequences were aligned by using the CLC Genomics Workbench 8 as mentioned earlier. From sequence alignment results it was evaluated that at position 12, Ang. DFRI, Ang. DFRII, Gossypium hirsutum DFR, and Iris × hollandica DFR have proline, serine, proline and glycine respectively. This result showed same residue (proline) in both Gossypium hirsutum and Angelonia DFR I whereas, Angelonia DFRII and Iris × hollandica DFR had serine and glycine which were functionally similar residues (Figure 1). In present study, the sequence alignment showed proline at 12 position in Gossypium hirsutum from which it is hypothesized that it would reduce dihydrokaempferol. Whereas, this particular region is absent in both Iris × hollandica and Ang.II DFRs, resulting in delphinidin accumulation which is responsible for production of blue color (Figure 2). For further confirmation DFRs from five more species along with these four plant species were carried out to evaluate results. However, with respect to Ang. DFRI, sequence alignment of other five species (Rosa chinensis, Vaccinium macrocarpon, Gerbera hybrid, Petunia × hybrida, and Ampelopsis grossedentata) showed the deletion of proline rich region and thus unable to reduce DHK.
On 26th position regarding Ang.DFRI has arginine and all other species DFR have glycine (position 25 in Ang.DFRII,13 Rosa chinensis, 16 Vaccinium macrocarpon, 13 Gerbera hybrid, 15 Petunia×hybrida, 12 Ampelopsis grossedentata, and 30 in Gossypium hirsutum DFR (Figure 2). In DFRs, where proline is absent at position 12, it would show glycine on that particular or nearby position. Thus, it is speculated that when a proline rich region is absent, the DFRs would reduce DHQ or DHM as substrate. Results showed that proline rich region in general and position 12 in particular plays main role in selective attachment of these reductases with their relative specific substrates (DHK).

Role of Asn and Asp Type DFRs in Substrate Specificity
Recent analysis on DFRs (Iris × hollandica and Gossypium hirsutum) was done by "Expasy ProtParam tool" in order to get information about the presence of total Asn and Asp residues in them ( Table 1A). Results showed that Iris × hollandica DFR has (Asn 9: Asp 23) while Gossypium hirsutum DFR has (Asn 13:Asp 21) ( Table 1B). This is postulated that Iris × hollandica DFR (Asp-type) has more likeness to reduce DHM in comparison to DHK hence it has a key role in accumulating delphidin by utilizing DHM as substrate.

Modeling, Refinement, Evaluation and Validation of DFR Protein
For the construction of 3D Model of DFR protein, retrieved sequences of both Iris × hollandica DFR and Gossypium hirsutum DFR were submitted to online I-TASSER server (Figures 3, 4).
Further refinement was achieved by using the ModRefiner tool. The constructed models were subjected to RAMPAGE to create Ramachandran Plot for model evaluation. Figures 5,  6 showed a Ramachandran plot of the Dihydroflavonol 4-Reductase protein's model. Plot displayed the presence of 343 (97.2%) residues in favored region, 8 (2.3%) residues in allowed region while 2 (0.6%) residues in outlier region in case of Gossypium hirsutum DFR (Figure 6) and 338 (94.2%) residues in favored region 17 (4.7%) residues in allowed region plus 4 (1.1%) residues in outlier region in Iris DFR (Figure 5). These computed results validated the models because for a fine model more than 90% residues should be in both favored and allowed region.

Protein-ligand Docking Results
Docking results in case of Iris × hollandica showed that position 130 has Asp as well as Gln in all: DHK, DHQ, and DHM. Additional Glutamine at position 135 also showed attachment with DHK. However, Lys at positions 132 (of Iris × hollandica DFR) has been engaged in DHM and DHQ binding. Ala in position 126 (DHM) and His 218 (DHQ) had an impact on substrate binding (Figure 7).
As far as Gossypium hirsutum DFR is concerned, Ala at position 153 is present in all types of dihydroflavonols. Asp at position 151 is involved in Gossypium hirsutum DFR binding with DHK (Figure 8). Serine is positioned at 239 which showed interaction in DHM while lle at 240 is involved in DHQ.
From all data gathered from these proteins docking it was concluded that all above mentioned residues at defined FIGURE 7 | Two and three dimensional interaction diagrams of DFR Iris × hollandica with dihydroflavonols. Interaction diagrams were attained by using ligand interaction analysis feature of MOE. particular positions were just involved in attachment with dihydroflavonols (DHK, DHQ, and DHM) and they have no role in the specificity of substrates. Present work showed that 26 residue region was highly variable in DFRs from different plant species.

Discussion
The substrate specificity of DFR has been widely studied in a range of plants (Hua et al., 2013) Dihydroflavonol 4-reductases from seven different species (Ang.DFRII, Rosa chinensis, Vaccinium macrocarpon, Gerbera hybrid, Petunia×hybrida, Ampelopsis grossedentata, and Iris ×hollandica) displayed deletion of the proline rich region and had glycine mostly in close proximity or at position 12 in comparison of position 26 of Ang. DFRI. Therefore, most of these DFRs could be able to reduce DHQ or DHM but not DHK. Results matched with pervious studies done by Johnson et al. (2001) who reported inability of petunia DFR to reduce DHK but with the transformation of maize DFR in petunia generated orange colored flowers showing DHK reduction (Meyer et al., 1987). Although between Ang. DFRI and DFRII sequences, 99% identity exist, yet position 12 and 26 was reported to be crucial in substrate specificity. Findings were also similar to Gosch and his co-workers who documented proline rich region near N-terminus and same residue at position 12 while glycine at position 26 (in all DFRs except Ang.DFRI) andfurther emphasized on alteration of AngI/II DFR functional activity by interfering with this region (Gosch et al., 2014). However, the region was a proline rich motif that acts as a NADPH binding site (Petit et al., 2007). Presence of proline at 12 position in Gossypium hirsutum DFR showed ability to reduce dihydrokaempferol. Recent results indicate that the reduction of dihydrofavonols by DFR can be a significant enzymatic step for flower color determination, by DHK hydroxylation medicated by F3 ′ H or F3 ′ 5 ′ H. Those DFRs which have the ability to consume DHQ along with DHM as substrates could produce red or blue flowers. Absence of proline rich region in DFRs would reduce DHQ and DHM as substrate. Results were in harmony with Gosch et al. (2014) who reported blue color generation by the replacement of arginine residue by glycine in Ang.DFRI and mutation of proline at 12 position. He further explained that variability among amino acid sequences in the particular region or variation in substrate specificity by a single mutation in a particular area in DFRs from different plant species or even in different cultivars of the same species could have different substrate preferences. In another report, aspartic acid at position 134 in petunia showed its ability to utilize DHM thus had blue colored flowers (Johnson et al., 2001). These contradictions in results may be due to species and amino acid residue differences. All mentioned Docked protein showed that respective residues were involved in substrate attachment and not in specificity of particular substrate. Moreover, the complete loss of Ang.DFRII activity with no production of blue color flowers was observed by site mutagenesis created by the replacement of glycine with arginine at position 12 (Gosch et al., 2014). Recent work showed increased aspartic acid residues in Iris × hollandica DFR which favors DHM as substrate in comparison to Gossypium hirsutum DFR having asparagines residues hence unable to reduce DHM and able to reduce DHK. Current findings were in support with Shimada et al. (2005) who reported that out of six DFR proteins, the DFR2 and DFR3 (Asn-type DFRs) showed increased uptaking of DHK than DHQ as substrate whereas DFR5 (Asp-type) reduces very less DHK in the Lotus japonicus. Previous research enlightened that amino acid residue differences caused different substrate preferences therefore, transformation of Iris DFR in rose reduced DHM as substrate and generated blue hued flowers whereas Gossypium hirsutum DFR reduced DHK had brown shade color lint (Katsumoto et al., 2007). Similar findings had been documented in M. Truncatula where DFR (Asp-type) reduces DHK minutely (Xie et al., 2004). It is anticipated that by the transformation of Iris × hollandica DFR in Gossypium hirsutum blue pigmentation could be achieve in fiber. Moreover, from all these in silico analysis it is hypothesized that by the transformation of specific DFR which reduces its relative substrate different colored phenolic pigments can be deposited in cotton fiber and hence colored cotton of desired colors will be generated in future. This approach presents first report to generate ecofriendly colored cotton.

Author Contributions
Ammara A participated in the experimental design; material collection including manuscript right up, Aftab A participated in the bioinformatics analysis, SD helped in writing the manuscript. All other authors read and approved the final manuscript.