Genome-Wide Identification, Localization, and Expression Analysis of Proanthocyanidin-Associated Genes in Brassica

Proanthocyanidins (PA) is a type of prominent flavonoid compound deposited in seed coats which controls the pigmentation in all Brassica species. Annotation of Brassica juncea genome survey sequences showed 72 PA genes; however, a functional description of these genes, especially how their interactions regulate seed pigmentation, remains elusive. In the present study, we designed 19 primer pairs to screen a bacterial artificial chromosome (BAC) library of B. juncea. A total of 284 BAC clones were identified and sequenced. Alignment of the sequences confirmed that 55 genes were cloned, with every Arabidopsis PA gene having 2–7 homologs in B. juncea. BLAST analysis using the recently released B. rapa or B. napus genome database identified 31 and 58 homologous genes, respectively. Mapping and phylogenetic analysis indicated that 30 B. juncea PA genes are located in the A-genome chromosomes except A04, whereas the remaining 25 genes are mapped to the B-genome chromosomes except B05 and B07. RNA-seq data and Fragments Per Kilobase of a transcript per Million mapped reads (FPKM) analysis showed that most of the PA genes were expressed in the seed coat of B. juncea and B. napus, and that BjuTT3, BjuTT18, BjuANR, BjuTT4-2, BjuTT4-3, BjuTT19-1, and BjuTT19-3 are transcriptionally regulated, and not expressed or downregulated in yellow-seeded testa. Importantly, our study facilitates in better understanding of the molecular mechanism underlying Brassica PA profiles and accumulation, as well as in further characterization of PA genes.


INTRODUCTION
In oilseed brassicas, a yellow-seeded form is preferred over a black-or brown-seeded counterpart mainly because of a thinner seed coat and higher oil content (Friedt and Snowdon, 2009;Velasco and Ferna'ndez-Martı'nez, 2009). Importantly, proanthocyanidins (PAs) play a critical role in this differential pigmentation process (Auger et al., 2010;Fang et al., 2012;Lu et al., 2012).
Proanthocyanidins (PAs) are end-products of a well-studied branch of the flavonoid biosynthetic pathway in higher plants (Winkel-Shirley, 2001;Lepiniec et al., 2006;Saito et al., 2013). In Arabidopsis, a close relative of the Brassica species, 19 single-copy genes have been associated with PA (Appelhagen et al., 2014(Appelhagen et al., , 2015Ichino et al., 2014). These genes can be divided into three classes based on their functions: structural, transcriptionally regulatory, or genes responsible for PA modification, transport, and oxidation. PA genes have also been cloned from a dozen other plant species (Hichri et al., 2011;Falcone Ferreyra et al., 2012) such as maize, and soybean (Yang et al., 2010;Senda et al., 2012). In contrast to single-copy genes in Arabidopsis, several plant species have multiple homologs for a given PA gene. For example, there are nine CHS homologs in soybean (Yi et al., 2010).
In Brassica species homologous cloning is used to isolate PA genes by such as DFR/TT3 (Yan et al., 2008;Akhov et al., 2009), ANS/TT18 (Yan et al., 2011), ANR/BAN (Nesi et al., 2009), TT10 , TT2 (Wei et al., 2007), TT8 (Padmaja et al., 2014), TT12 (Chai et al., 2009), TT16 (Deng et al., 2012;Chen et al., 2013), TTG1 (Zhang et al., 2009;Yan et al., 2014) and TTG2 (Li et al., 2015). However, homologous cloning has drawbacks. It needs prior knowledge of sequences of homologous gene, and is slow and difficult to amplify all members of a gene family, particularly in polyploid species, e.g., Brassica juncea, an allotetraploid species. To address these limitations, nextgeneration sequencing has been widely adopted. Up to date the genomes of over 100 plant species, including B. rapa (Wang et al., 2011), B. olearcea , and B. napus (Chalhoub et al., 2014) have been sequenced. Very recently, the genome sequence of B. nigra has also been released (http://www.ncbi.nlm.nih.gov/ genome/10988). Whole-genome sequence annotation facilitates in genome-wide identification of PA genes (Velasco et al., 2007;Guo et al., 2014). However, the PA genes of Brassica species have not been analyzed in great detail. Furthermore, the complete genome sequencing of Brassica juncea has not been achieved to date. Yang et al. (2014) has conducted a survey of genome sequences in B. juncea. Genome survey sequencing (GSS) can provide information about gene content, functional elements and molecular markers (Jiao et al., 2012;Hirakawa et al., 2015), as well as compare genes of related species for the phylogenetic reconstruction of other non-model species.
Reverse transcription-polymerase chain reaction (RT-PCR), real-time fluorescent quantitative PCR, and transcriptome sequencing (RNA-seq) can analyze the spatial and temporal expression pattern, functions and interactions among various genes . RNA-seq is widely used to estimate transcript amounts and to obtain a quantitative account of transcript amounts in organisms, organs, tissues, or specific cell types, frequently comparing transcript amounts among different samples (Martin et al., 2013;Weber, 2015).
In the present study, GSS was conducted on the inbred line of B. juncea var. Purple-leaf Mustard (PM), and a total of 69,193 coding genes, including 72 PA genes, were predicted by annotation of GSS. Approximately 19 primer pairs specific for PA genes were then designed to screen a bacterial artificial chromosome (BAC) library of B. juncea, which was constructed from the same inbred line. In total, 284 BAC clones were identified and 55 B. juncea PA genes were confirmed by sequencing of fragments amplified from representative BAC clones. Its genomic or chromosomal positions were predicted by mapping to the sequenced B. rapa, B. nigra, or B. napus genomes, which was used as reference genomes to perform phylogenetic analysis on the full-length gene sequences and the end sequences of gene-carrying BACs. The expression level of PA genes were estimated in the seed coat and compared between the yellowand brown-seed coat by fragments per kilobase of exon model per million mapped reads (FPKM) analysis of RNA-seq data in B. juncea and B. napus. Identification, mapping, and expression analysis of the PA genes in the present study may facilitate in better understanding the genetic mechanism underlying proanthocyanidin biosynthesis, profile, and accumulation in various Brassica species.

Plant Accessions
The inbred line of B. juncea var. PM was used for GSS and construction of the BAC library. RNA was extracted from the seed coat of the inbred line of B. juncea var. Sichuan Yellow (SY, yellow-seeded) and its brown-seeded near-isogenic lines (NILA and NILB), the black-seeded B. napus cv. Xiangyou 15 and two of its F 7 recombinant inbred linesRIL52 and RIL55 15 days after pollination (DAP, torpedo to late torpedo stage) (Liu et al., 2009;Nesi et al., 2009). The plants were grown in a greenhouse under a photoperiod of 16 h/8 h (day/night cycle) at 22 • C.

Genome Sequencing, Sequence Assembly, Gene Prediction, and Annotation
Paired-end (PE) libraries were prepared using total DNA from PM, which were then constructed according to the instructions provided by Illumina (San Diego, CA, USA) with a 500-bp insert size and 125-bp read length. Sequence analyses were conducted using the Illumina HiSeq 2000 platform.
The obtained reads were subjected to quality control as follows: bases with quality scores <10 were filtered out by FastQC-0.11.3 (Schmieder and Edwards, 2011). Adaptor sequences in the reads were trimmed using fastx clipper of the FASTX-Toolkit 0.0.13 (http://hannonlab.cshl.edu/fastx_ toolkit). After trimming, reads including N nucleotide lengths of <100 bases were excluded, and the remaining high-quality data was used for de novo sequence assembly by SOAP (Schmieder and Edwards, 2011). Protein-encoding sequences in the assembled genomic sequences of PM were predicted by Augustus 2.7 (Stanke and Waack, 2003) using the A. thaliana training set under the default parameters. Reciprocal best-hit analysis (Moreno-Hagelsieb and Latimer, 2008) was performed to compare the results of the prediction by using B. rapa training sets.

Construction, Pooling, and Screening of the BAC Library
The B. juncea BAC library named ZBjuH was constructed from the inbred line of the PM that were treated with the restriction endonuclease HindIII (Luo and Wing, 2003). This library consists of 71,808 clones with an average insert size of 126 kb genomic DNA, and an estimated 10.8-fold coverage of the B. juncea genome. The clones were arranged in 187 384-well plates. The clones were organized into three-dimensional BAC pools of plates, rows, and columns. The superplate consisted of 19 DNA samples, each representing 10 BAC plates, except for superplate 19, which only consists of 7 384-well plates. The first dimension consisted of the BAC clone plate of 187 DNA samples. The second and third dimensions consisted of 8 and 12 DNA samples, respectively, for the pooled 16 rows and 24 columns of the BAC clones. Screening of single BAC clones was performed in a fivestep PCR process ( Figure S1). The PCR primers were designed according to the conserved sequences of the PA genes that were annotated from the B. juncea GSS (Table S1). PCR reactions were performed in a total volume of 10 µL with a reaction mixture as follows: 10 × PCR buffer (1.0 µL), dNTP mix (10 mM each, 0.15 µL), 1 U Taq DNA polymerase (Takara, Japan), 1 µL template, 10 mM forward primer (0.5 µL), 10 mM reverse primer (0.5 µL) and ddH 2 O up to 10 µL. A "touchdown" PCR amplification program is used as follows: 94 • C for 5 min; 6 cycles of 30 s at 94 • C, 40 s at 62 • C with a 1 • C decrease in the annealing temperature per cycle, and 1 min at 72 • C; 30 cycles of 30 s at 94 • C, 45 s at 56 • C, and 1 min at 72 • C; and a final extension at 72 • C for 10 min. The PCR products were observed by electrophoresis on 1.5% agarose gels using ethidium bromide and UV visualization. The BAC clones from which the fragment of expected size was amplified were considered positive BAC clones.

Grouping and Sequencing for Full-Length Gene of Positive BAC Clones
Gene fragments amplified from the positive BAC clones were sequenced and aligned with annotated PA genes using DNAMAN4.0 (LynnonBiosoft, USA) to confirm whether the cloned and the annotated gene were the same copy. When a cloned gene harbored a single nucleotide difference (SNP) and/or insertion or deletion (Indels) in its sequence from the corresponding annotated gene, the cloned and the annotated genes are considered different. For each PA gene, one or two BAC clones were selected for sequencing of the full-length genes by the high-quality, longer read Sanger method (Life Technologies, Shanghai). PA Genes in B. napus, B. nigra, and B. rapa The sequences of cloned B. juncea PA genes were mapped to the released B. napus (http://www.genoscope.cns.fr/blat-server/ cgi-bin/colza/webBlat), B. nigra (http://www.ncbi.nlm.nih.gov/ genome/10988), or B. rapa (http://brassicadb.org/brad/blastPage. php) reference genome to search for homologous B. napus, B. nigra or B. rapa PA genes with an identity ≥90%. Phylogenetic analysis of homologous PA genes in B. juncea, B. rapa, B. napus, and Arabidopsis was performed by using neighbor-joining (NJ) method as provided in MEGA 5.2 (Tamura et al., 2011), and the reliability of the phylogenetic trees was evaluated by the bootstrap method, with 1000 replications. The B. juncea PA genes on the same branch (clade) of the phylogenetic tree were classified into a homologous group.

Sequencing and Mapping of BAC Ends
The BACs used for full-length sequencing of the gene were also sequenced for end-sequencing on an ABI 3730X DNA analyzer (Life Technologies, Shanghai). The sequencing primers were modified pIndigoBAC536 cloning vector-derived sequencing primers M13R (5 ′ -CAGGAAACAGCTAT-GACC-3 ′ ) and S2 (5 ′ -CGAATTCGAGCTCGGTACCC-3 ′ ). The sequence obtained by using the primer M13R was designated as left end (L) of the BAC clone, whereas the sequence by S2 was considered the right end (R). BAC end-sequences (BESs) were also mapped to the recently sequenced B. napus (http://www.genoscope.cns. fr/blat-server/cgi-bin/colza), B. nigra (http://www.ncbi.nlm.nih. gov/genome/10988) or B. rapa (http://brassicadb.org) reference genome to assign a genomic location when at least 100 bp aligned to the reference genome, with at least 75% identity. If hits were obtained at multiple locations in any one of the reference genomes, then a BES was assigned to the position of the hit with the highest identity. The position of a BES was indicated by the first and the last assigned nucleotide (nt) on each reference genome.

Expression Analysis of PA Genes in Seed Coat
Isolation, reverse transcription and RNA-seq analysis of RNA from fresh seed coats were performed as described by Liu et al. (2013). The expression level of every PA gene in the seed coat was calculated using the FPKM method (Mortazavi et al., 2008). To compare transcript abundance of cloned PA genes in seed coat between the yellow-seeded inbred SY and its brown-seeded near-isogenic lines (NILA and NILB), the respective mapped reads from the SY/NILA and the SY/NILB pairs for each gene were counted using TopHat v2.0.9 (Kim et al., 2013). Fold changes for each gene between NILs and SY were computed as the ratio of the FPKM values. When the FPKM value of NILs or SY was 0, the substitute 0.001 was used for estimation of fold change. To display changes of PA gene expression in seed coat, the heatmap was constructed by using Heml software ("Normalization:" Logarithmic Base 2, "DEMO:" Canvas) (Deng et al., 2014).
The primers used in RT-PCR expression analysis are listed in Table S2. The following cycling parameters were used for amplification of the PA genes: 1 cycle of 4 min at 94 • C; 38 cycles of 50 s at 94 • C, 50 s at 58 • C, 1 min at 72 • C; one cycle of 6 min at 72 • C. The PCR products were verified by gel electrophoresis as earlier described.

Identification and Cloning of PA Genes in B. juncea
A total of 56.2 Gb high-quality sequencing data were assembled into 835 Mb of genomic sequence, with contig and scaffold N50 sizes of 2584 bp and 16,777 bp in B. juncea (Table S3). A total of 233,309 coding genes were predicted by Augustus 2.7 (Table S3) and annotated by alignment of the deduced amino acid sequence to B. rapa genes (http://brassicadb.org/brad/). Approximately 69,193 records were screened out, with sequence identity greater than 70% and alignment length greater than 100 amino acids, which correspond to 32,798 B. rapa genes ( Table S4). For a B. rapa gene, an averaged 2.1 homologs, at most 11 homologs, were detected in the B. juncea genome. Among the 69,193 predicted B. juncea genes, 72 were identified as PA genes ( Table S5). The number of B. juncea genes that were homologous to a given Arabidopsis PA gene varied from two (DFR, TT1, TT2, TT8, TTG1, and TT12) to six (TT4, TT6, and ANR) ( Table S5). Furthermore, two annotated B. juncea genes of TT6 and TT7 were located within the same scaffold (Table S5).
A total of 284 positive BAC clones were identified using 19 PA gene-specific primer pairs from ZBjuH BAC library ( Table 1). The amplified fragments were sequenced, and 284 clean sequences with sizes between 192 and 1487 bp were obtained. Alignment showed that these fragments represented 55 B. juncea PA genes, corresponding to 16 Arabidopsis PA genes, with each Arabidopsis PA gene having 2-7 B. juncea homologs ( Table 1). All cloned B. juncea PA genes, except for BjuTT4-2, BjuTT4-7, and BjuTT16-6, showed genomic sequences that were similar to the corresponding predicted PA genes. These amplified sequences were not evenly distributed among genes. For 6 genes, only one sequence was each identified, whereas at least 10 sequences were detected for 7 other genes. The remaining 42 genes were each carried by 2-9 BAC clones (Table 1), which is consistent with coverage of the genome by the BAC library used. No BAC clones were identified for six the annotated genes (TT4_g135394, TT5_g158015, ANR_g228640, ANR_g226654, TT19_g144296, and TT19_g167454) ( Figure S3).
One or two BAC clones were chosen for each of the above mentioned PA gene groups of BAC clones and sequenced by walking to obtain full-length gene sequence. Alignment of the resultant full-length gene with its respective GSS sequence indicated that two predicted genes was in fact from the same gene because each of them was only a portion of the same gene (Table S6). Finally, 55 PA genes were confirmed in B. juncea by BAC sequencing ( Table 2).

Genomic Locations of PA Genes in Brassica Species
BLAST of these cloned 55 B. juncea PA genes against the B. rapa or B. napus reference genome identified 31 and 58 homologous genes in B. rapa and B. napus, respectively ( Table 2). The neighbor-joining tree of the PA genes from B. juncea, B. rapa, B. napus, and Arabidopsis showed that TT4 genes were clustered into five homologous groups, TT5, TT6, and TT16 each into three groups; TT10, TT18, TTG2, and TT19 each into two groups; and the remaining TT3, TT7, ANR, TT1, TT2, TT8, TTG1, and TT12 genes were clustered into only one homologous group, indicating that these genes were highly conserved in terms of genomic sequence ( Figure S2).
Mapping of these cloned 55 B. juncea PA genes to the B. rapa, B. nigra, or B. napus reference genome indicated that 30 and 29 PA genes were homologous to the genes located in A-genome chromosomes except A04 of B. rapa and B. napus, respectively, whereas 23 of the other 25 genes were located in the B-genome chromosomes except B05 and B07 of B. nigra, the remaining two gene (BjuTT5-4 and BjuTT2-2) were anchored on scaffold_30.1 and scaffold_500.1 of B. nigra, respectively, which have not yet been mapped onto a chromosome (Table 3, Figure 1). These PA genes have >95 identity (Table 3). Moreover, 23 of these A-genome PA genes were, respectively, located on the same chromosomes in B. rapa and B. napus, but additional genes may be located in either the same or different A-genome chromosomes or C-genome chromosomes because their positions have not been mapped to the B. napus reference genome ( Table 3). The B-genome and the C-genome contributed 25 and 29 PA genes to B. juncea and B. napus genome, respectively, which is approximately equal to the number of PA genes from the A-genome.
To confirm the above genomic locations, the BAC clones used for sequencing full-length genes were also sequenced for BESs. The resulting BESs between 587 and 1233 bp in length were also mapped in a similar way. Mapping of the BESs to the B. rapa reference genome showed that both BESs of 23 A-genome B. juncea PA genes were mapped around the genomic position as mapped by the full-length sequence of the corresponding genes. However, one BES of the BACs carrying two A-genome genes, i.e., BjuTT2-1 and BjuTTG1-1 was mapped to an unfixed scaffold, whereas one BES of the BACs carrying the remaining five Agenome genes, i.e., BjuTT5-2, BjuTT6-1, BjuANR-2, BjuTT10-1, and BjuTTG2-1 was mapped to an unexpected genomic position ( Table 4). Mapping of the BESs to the B. napus reference genome generated a more complicated picture. For only 15 A-genome B. juncea PA genes, both BESs were mapped around the genomic position as mapped by the full-length sequence of the corresponding genes. One or both BESs of the BACs carrying 7 Agenome genes, i.e., BjuTT5-1, BjuTT6-2, BjuTT7-1, BjuTT16-2, BjuTT1-1, BjuTT2-1, and BjuTTG1-1 were mapped to an unfixed scaffold, whereas one or both BESs of the BACs carrying the remaining 8 A-genome genes were mapped to an unexpected Agenome chromosome, or a C-genome chromosome in B. napus reference genome ( Table 4). Mapping of the BESs to the B. nigra reference genome showed that both BESs of 19 B-genome B. juncea PA genes were mapped around the genomic position as mapped by the full-length sequence of the corresponding genes, one BES of the BACs carrying three B-genome genes, i.e., BjuTT4-6, BjuTT18-4, and BjuTT7-2 was mapped to an unexpected genomic position in the B. nigra reference genome, and then one BES of the BACs carrying the remaining three B-genome genes, i.e., BjuTT5-4, BjuTT1-2, and BjuTT2-2 was mapped to an unfixed scaffold ( Table 4).

Expression of PA Genes in Seed Coat of B. juncea and B. napus
Fragments Per Kilobase of a transcript per Million (FPKM) analysis indicated that 55 annotated B. napus PA genes (excluding BnaCnng01290D and BnaA09g29340D), and all cloned B. juncea PA genes except BjuTT5-1 and BjuTT5-4 were expressed in seed coat (Figure 2, Table S7). However, transcript abundance significantly varied among PA genes, as well as accessions. In general, the expression level of structural and transporter Frontiers in Plant Science | www.frontiersin.org Frontiers in Plant Science | www.frontiersin.org  The BAC clone in bold was used to sequence full-length sequence of the gene.

DISCUSSION
In the present study, we identified 55, 58, and 31 PA genes in B. juncea, B. napus, and B. rapa through a combination of experimental and bioinformatics approaches, analyzed their phylogenetic relationship and genomic locations in Brassica, and detected and compared their expression in seed coats of different accessions by RNA-seq. Cloning of these genes not only lays a foundation for the elucidation of the molecular mechanism underlying PA accumulation/profile and seed pigmentation  in Brassica species, but also facilitates in the functional characterization of each PA gene. The PA genes in Arabidopsis (16) were almost doubled in B. rapa (31) and nearly quadrupled in B. juncea (55) and B. napus (58). The ancestral A, B, and C genomes of the Brassica species contributed a comparable number of PA genes. These findings are consistent with mesopolyploid nature of B. rapa and the allopolyploid nature of B. juncea and B. napus, implying that polyploidization plays an important role in expansion of PA genes. However, the number of PA genes in allopolyploid B. juncea and B. napus does not amount to the sum of PA genes from both ancestral species due to gene loss by genomic fractionation during allopolyploidization. Bra036307 and Bra009770 might have been lost in B. juncea and B. napus, respectively.
Phylogenetic analysis and genomic localization of B. juncea PA genes indicated that 30 and 29 B. juncea PA genes were homologous to genes located in the A-genome chromosomes of B. rapa and B. napus, respectively ( Figure S2, Table 3). However, both BESs of 23 and 15 A-genome B. juncea PA genes were mapped around the B. rapa and B. napus genomic position, as mapped by the full-length sequence of the corresponding genes, respectively ( Table 4). The other BESs were mapped to other chromosomes or not detected in the B. rapa and B. napus reference genome. These findings indicate that although B. rapa, B. juncea, and B. napus have the common A-genome, the chromosomes of each of these species do not harbor the same structure (Zou et al., 2016). On the other hand, assembly of the present reference genomes of Brassica species need improving.
For 6 of the annotated PA genes in B. juncea GSS, no BAC clones were identified. Sequence analysis revealed that the annotated genes ANR_g228640, ANR_g226654, and TT19_g167454 were false genes or artifacts that arose by misassembled sequences because these annotated genes only contain a part of the protein domains of the corresponding genes and its alignment ratios were significantly lower than other predicted genes (Table S5). No BACs carrying the annotated gene TT4_g135394, TT5_g158015, or TT19_g144296 were detected, most probably because the sequenced fragments amplified from positive BACs were too short to distinguish different members of a gene family (Table 1), or maybe because the primers used in screening the BAC library were not appropriate. In contrast, the cloned BjuTT4-1, BjuTT4-7, and BjuTT16-5 genes were not predicted from our GSS dataset, illustrating that these genes were missed in our genome sequence survey of B. juncea genome, most probably because of insufficient sequencing depth or assembly errors.
In Arabidopsis, three additional PA genes TT15 (DeBolt et al., 2009), TT9 (Ichino et al., 2014), and TT13/aha10 (Appelhagen et al., 2015) have recently been cloned. Their Brassica homologs were not investigated in the present study. In our next study, we will clone and analyze these genes to complete the set of PA genes in Brassica spp. Initial screening of our BAC library identified seven BAC clones for each of these three genes. Sequencing of the fragments amplified from these BACs is underway.
RNA-seq and FPKM analyses showed that BnaCnng01290D, BnaA09g29340D, BjuTT5-1 and BjuTT5-4 were not expressed in the seed coat, indicating that these genes might not be involved in seed pigmentation. Interestingly, the BjuTT3, BjuTT18, and BjuANR genes were not expressed in yellowseeded testa, but expressed very high in brown-seeded testa of B. juncea (Figure 2, Table S7), which is consistent with previous results (Yan et al., 2008(Yan et al., , 2011Akhov et al., 2009;Liu et al., 2009Liu et al., , 2013Jiang et al., 2013), suggesting that seed color is determined by expression of genes that encode enzymes that catalyze PA biosynthesis. Concomitant with the absence of expression of these enzyme-encoding genes in yellow-seeded testa, the early stage genes, BjuTT4-2 and BjuTT4-3, which encode chalcone synthase, and transporter genes, BjuTT19-1 and BjuTT19-3, which encode glutathione transferase, were remarkably downregulated or not expressed in yellow-seeded testa (Table S7). These findings illustrate that these genes are co-regulated with BjuTT3, BjuTT18, and BjuANR, and their expression is not essential to the production of biosynthetic substrates and epicatechin transport in yellow-seeded testa. Other BjuTT19 and BjuTT4 genes did not show differential expression between yellow-and brown-seeded testa (Figure 2, Table S7), implying that these genes are not involved in seed pigmentation and that their biological roles require further investigation.
To answer the questions why all the BjuTT3, BjuTT18, and BjuANR genes are not fully expressed in yellow-seeded testa and why these genes are mutated, transcriptionally regulated, or both, we also cloned full-length genomic sequences of these genes from SY and compared them with the corresponding sequences from PM. Comparative analysis showed no differences, except for a 33-bp and 2-bp difference in BjuTT18-2 and BjuTT3-1. In Arabidopsis, the genes TT3, TT18, and ANR are transcriptionally regulated by TT2-TT8-TTG1 complex (Xu et al., 2013). Comparison between SY and PM uncovered a 1275-bp insertion in exon 7 of BjuTT8-1 and a C-T transition in exon 7 of BjuTT8-2 of SY, which is almost in agreement with findings from Padmaja et al. (2014) who speculated that the TT8 gene controls seed pigmentation in B. juncea.

CONCLUSIONS
A total of 55 genes homologous to 16 Arabidopsis proanthocyandin-associated genes were identified and cloned  from B. juncea. Approximately 58 and 31 PA genes were detected in B. napus and B. rapa genome databases. Around 30 of these cloned B. juncea genes were located in the A-genome chromosomes, except A04, whereas the remaining 25 were mapped to the B-genome chromosomes, except B05 and B07. A majority of these genes were expressed in the seed coat of B.juncea and B. napus. Tissue-specific expression of the TT4, TT5, and TT19 genes were observed in B. juncea and B. napus. BjuTT3, BjuTT18, BjuANR, BjuTT4-2, BjuTT4-3, BjuTT19-1, and BjuTT19-3 were transcriptionally regulated in the seed coat and not expressed or downregulated in yellow-seeded testa. In summary, the present study facilitates in better understanding the molecular mechanism underlying PA accumulation/profile and seed pigmentation, as well as in further characterization of the structure, variations, and functions of PA genes in Brassica spp.

AUTHOR CONTRIBUTIONS
ZL and CG designed the research. XL, YL, MY, and DS performed the research and analyzed the data. XH took part in screening of the BAC library. SL and SC provided the genes primers and assisted with sequencing of the BAC clones. ZL and XL wrote the manuscript. All authors read and approved the final manuscript.

FUNDING
This work was supported by the National Natural Science Foundation of China (No.31101176 and No.31271762).
Figure S1 | The PCR screening systems and process of BAC library. Screening of a specific clone by five-step PCR is shown: Step 1, screening of 19 superplates: A positive signal was detected in the superplate3 (plates SP 021-030). Steps 2-4, screening against superplate3 by 3D-PCR: Positive signals were identified in 1D, 2D, and 3D; these consisted of Plate034, C15&16, and RI&J, respectively, indicating that Plate034, column 15/16, and row I/J contained the specific BAC DNA.
Step 5, screening of four candidate BACs: A positive signal was detected in the one of the four candidate BACs (ZBjuH034I15, ZBjuH034I16, ZBjuH034J15, and ZBjuH034J16). Consequently, a BAC clone containing the specific sequence was identified as the clone of ZBjuH034J15.  Table S1 | Sequences of the primer pairs used in screening for proanthocyanidin-associated genes of Brassica juncea BAC library. Table S2 | Sequences of the primer pairs used in expression analysis of proanthocyanidin-associated genes of Brassica juncea.