Genome-Wide Identification, Evolution, and Expression Analysis of the ATP-Binding Cassette Transporter Gene Family in Brassica rapa

ATP-binding cassette (ABC) proteins can act as transporters of different substrates across biological membranes by hydrolyzing ATP. However, little information is available about ABC transporters in Brassica rapa, an important leafy vegetable. In the present study, we carried out genome-wide identification, characterization and molecular evolution analyses of ABC gene family in B. rapa and 9 other plant species. A total of 179 B. rapa ABC genes (BraABCs) were identified. Among them, 173 BraABCs were identified on 10 chromosomes. Based on phylogenetic analysis and domain organization, the BraABC family could be grouped into eight subfamilies. BraABCs in the same subfamily showed similar motif composition and exon-intron organization. Common and unique cis-elements involved in the transcriptional regulation were also identified in the promoter regions of BraABCs. Tissue-expression analysis of BraABCs demonstrated their diverse spatiotemporal expression profiles. Influences of the whole genome triplication (WGT) on the evolution of BraABCs were studied in detail. BraABCs were preferentially retained compared with their neighboring genes during diploidization after WGT. Synteny analysis identified 76 pairs of syntenic BraABC paralogs among the three subgenomes of B. rapa, and 10 paralog pairs underwent positive selection with ω (= Ka/Ks) ratios greater than 1. Analyses of the expression patterns of syntenic BraABC paralogs pairs across five tissues and under stress treatments revealed their functional conservation, sub-functionalization, neo-functionalization and pseudogenization during evolution. Our study presents a comprehensive overview of the ABC gene family in B. rapa and will be helpful for the further functional study of BraABCs in plant growth, development, and stress responses.


INTRODUCTION
ATP-binding cassette (ABC) transporters constitute one of the largest gene families that are ubiquitously present in all living organisms, from prokaryotes to humans (Dassa and Bouige, 2001). The majority of ABC proteins that have been characterized are ATP-dependent, and membrane-bound transporters are able to translocate a wide range of molecules through intra-and extracellular membranes (Higgins, 1992). Plant genomes are characterized by a large number of ABC genes (ABCs) encoding for more than 100 ABC transporters, which is more than most other organisms, that are involved in a broad range of biological functions (Kang et al., 2011).
The possession of a nucleotide-binding domain (NBD) is usually used to define ABC proteins (Verrier et al., 2008). There are several highly conserved motifs within the NBD domain, including the Walker A and Walker B sequences, the ABC signature motif, the H loop and the Q loop (Higgins and Linton, 2004). Apart from the NBD domain, ABC proteins also contain hydrophobic transmembrane domains (TMDs). Several transmembrane α-helices are generally present in TMDs. NBDs usually act as energy providers for substrate translocation or non-transport processes by ATP-binding and ATP-hydrolyzing. TMDs function as recognizers and channels for substrates to translocate the lipid bilayer (Sánchez-Fernández et al., 2001). In eukaryotes, there are two common arrangements for ABC transporters: full-sized transporters and half-sized transporters. The structure of canonical ABC transporters (full-sized ABCs) contains four domains: two NBDs and two TMDs in a single polypeptide. Half-sized transporters consist of only two domains (1 TMD and 1 NBD). Half-sized ABCs have to form homo-or heterodimers to conduct the function of the substrate pump. ABC proteins lacking a TMD are usually not involved in transmembrane transport.
Several categorization methods have been proposed to classify ABC proteins, of which the Human Genome Organization (HUGO) scheme has currently been widely adopted to classify human and plant ABC proteins (Dean et al., 2001;Verrier et al., 2008). According to the HUGO scheme, eukaryotic ABC proteins are divided into eight subfamilies based on the NBD phylogenetic relationship, homologous relationship and domains organization: from ABC subfamily A (ABCA) to ABC subfamily H (ABCH). ABCH was identified in arthropod genomes, but is absent in fungi, mammals and plants (Annilo et al., 2006;Dermauw and Van Leeuwen, 2014). Recently, ABC subfamily I (ABCI) containing the "prokaryotic"-type ABCs was found in plants but is absent in most animal genomes (Verrier et al., 2008).
In total, nine subfamilies (ABCA-ABCI) have been identified, and eight of them (ABCA-ABCG and ABCI) are present in plant genomes (Kang et al., 2011). Proteins of the ABCA-ABCD subfamilies have a forward direction for domains organization (TMD-NBD). Relatively, the proteins of the ABCG and ABCH subfamilies contain the reverse domain organization (NBD-TMD). While ABCE and ABCF proteins consist of only two NBDs, these proteins are characterized as soluble proteins. ABCI proteins usually possess only one single domain, mainly NBD or TMD. The genetic evidence indicates that some of these single domains encoded by ABCI proteins can assemble into functional multi-subunit ABC transporters (Verrier et al., 2008).
The functions of ABC proteins in humans have been extensively studied. In addition to acting as active transporters, some ABCs function as ion channels, receptors or are even involved in mRNA translation and ribosome biogenesis. In humans, ABCs are also identified as having multidrug resistances in cancer cells; for example, proteins within the ABCB and ABCG subfamilies can protect cells from hydrophobic xenobiotics (Sarkadi et al., 2006). In plants, ABCs were initially identified as transporters participating in detoxification processes (Martinoia et al., 1993). Subsequently, an increasing number of plant ABCs have been functionally characterized as far away from detoxification functions. Plant ABCs participate in many important physiological and developmental processes; furthermore, the transport substrates of plant ABCs are divergent and include conjugated compounds, phytohormones, primary products, lipids and lipophilic compounds (Kang et al., 2011).
The evolution of the angiosperm genome is characterized by polyploidization through whole-genome duplication (WGD), followed by diploidization, which is typically accompanied by considerable homoeologous gene loss (Stebbins, 1950). A number of polyploidy events have been identified along different lineages of flowering plants. After polyploidy, duplicated genes experience different fates over evolutionary time. Most likely, one copy of paralogous genes loses or becomes pseudogenes Conery, 2000, 2003). However, numerous functional importance duplicated gene pairs can survive by undergoing tens of millions of years of natural selection (Schnable et al., 2009;Schmutz et al., 2010). Gene categories whose products include transcription factors, ribosomal proteins or protein kinases are preferentially retained (Blanc and Wolfe, 2004;Maere et al., 2005). This can be explained by the gene dosage hypothesis, positing that genes whose products interact with one another or in networks tend to be dosage sensitive and would be over-retained Birchler and Veitia, 2007). In sum, duplicated genes undergo one of the following evolutionary fates: functional conservation (in which both copies maintain the same function), sub-functionalization (the ancestral function is subdivided between copies), neo-functionalization (one copy acquires a new function) or pseudogenization (one copy becomes unexpressed or functionless) (Lynch and Conery, 2000).
Brassicaceae is a large plant family containing ∼338 genera and 3,700 species (Bailey et al., 2006). Owing to its remarkable species, genetic and physiological diversity, as well as significant economic potential, Brassicaceae has become a model for polyploidy and evolutionary studies. In Brassicaceae, Arabidopsis underwent three WGD events after its lineage diverged from the monocot lineage: the paleohexaploidy (γ) duplication shared with most dicots and two subsequent genome duplications (α and β) since its divergence from Carica papaya (Bowers et al., 2003). B. rapa, a diploid Brassica species, not only shared the three paleo-polyploidy events with Arabidopsis but also underwent a further whole genome triplication (WGT) since its divergence from Arabidopsis 13 to 17 million years ago (Town et al., 2006). After the WGT, B. rapa has undergone considerable fractionation: the 41,174 genes in the B. rapa genome are considerably fewer than would be expected from a simple WGT of the ∼27,000 genes in the Arabidopsis genome (Wang et al., 2011). The extent of gene loss varies among the three subgenomes. The LF (least fractionated) subgenome retains ∼70% of the genes found in Arabidopsis, whereas the MF1 (medium fractionated) and MF2 (most fractionated) subgenomes retain ∼46 and 36%, respectively (Wang et al., 2011). B. rapa, because of its close relationship with Arabidopsis and representative WGT event, has thus become an excellent model for studying the evolution of genome structure and gene fate after WGD.
As a superfamily, ∼22 out of 130 Arabidopsis ABCs (AtABCs) have been functionally characterized in the model plant. B. rapa is an important leafy vegetable in the Brassicaceae family; however, little information about the ABC gene family is available in B. rapa. The objective of the present study was to conduct a genome-wide identification and characterization of the ABC gene family in B. rapa. We carried out comprehensive studies of the ABC gene family in B. rapa for the first time, including the phylogenetic relationships, chromosomal locations, gene structures, cis-elements, and expression profiles of BraABCs in different tissues. Influences of WGT on the evolution of BraABCs were also analyzed, including the retention and conservation of BraABCs after WGT and syntenic BraABC paralog pairs among three subgenomes of B. rapa. Expression patterns of syntenic paralog BraABCs in different tissues and under various abiotic stresses were also determined to investigate their molecular evolutionary fates. Our findings provide a basis for better understanding of the function and evolutionary history of BraABCs and will help to further investigate the detailed molecular and biological functions of BraABCs.

Data Sources and ABC Sequences Retrieval
The B. rapa annotated genome sequences were downloaded from the Brassica database (BRAD; http://brassicadb.org/brad/; Cheng et al., 2011). Several datasets and multiple steps were used to conduct a whole-genome search. Hmmsearch from the HMMER program (3.0) (Eddy, 1998) was used to search the Brassica rapa genome to identify putative ABCs containing the ABCtransporter domain (PF00005), the ABC-2 transporter domain (PF01061), the ABC transporter transmembrane region domain (PF00664), the CYT domain (PF01458) or the mce related protein domain (PF02470) at a threshold of E < 1E −4 . The Hidden Markov Model (HMM) profiles of these domains were downloaded from the Pfam 27.0 database (http://Pfam.sanger.ac. uk/; Finn et al., 2013). Then, we manually verified the candidate ABCs using the SMART database (http://smart.embl-heidelberg. de/; Letunic et al., 2012), and sequences with obvious errors and/or lengths of less than 100 amino acids were manually removed.

Multiple Alignment and Phylogenetic Analysis of the ABC Gene Family
Amino acid sequences of the ABC-transporter domain (PF00005) were extracted according to their locations in ABC proteins. Multiple alignments of ATPase domain sequences were conducted using ClustalW (Thompson et al., 1997) of the MEGA6 (Tamura et al., 2013) software with the default options. The phylogenetic trees were constructed using the maximum likelihood (ML) method in MEGA6 with the Jones, Taylor, and Thornton (JTT) amino acid substitution model. One thousand bootstrap replications were performed in each analysis. The remaining parameters were set to the defaults.

Conserved Motifs Identification and Exon-Intron Structure Analysis
Conserved motifs of ABC proteins were identified using the local MEME software version 4.9.0 (Bailey et al., 2009), with the following parameter settings: maximum number of motifs = 15, and optimum motif width = 10-200 residues; the other parameters were the defaults. The coding sequences (CDSs) and the DNA sequences were used to predict the exon-intron structure of ABCs using the Gene Structure Display Server 2.0 (GSDS 2.0) (http://gsds.cbi.pku.edu.cn; Hu et al., 2015).

Cis-Elements Analysis and Proteins Interaction Network Prediction
All of the promoter sequences (2,000 bp upstream of initiation codon "ATG") of BraABCs were extracted from the B. rapa genome according to the Generic File Format (GFF) file. Then, the cis-regulatory elements of promoters for each gene were identified by PLACE Web Signal Scan-PLACE (https://sogo.dna.affrc.go.jp/cgi-bin/sogo.cgi?lang=en&pj=640& action=page&page=newplace). The interaction network of BraABC proteins was constructed by applying STRING software (Search Tool for the Retrieval of Interacting Genes/Proteins, http://string-db.org/).

Gene Chromosomal Location, Gene Synteny Analysis, and Syntenic BraABCs Paralogs Pair Identification
Chromosomal locations of BraABCs were drawn from top to bottom on B. rapa chromosomes according to their positions in genome annotation. The Perl in-house program was used to draw the location images of BraABCs on chromosomes.
Syntenic relationships of ABCs between Arabidopsis and B. rapa were identified by searching "syntenic gene" in the Brassica database (BRAD) (Cheng et al., 2011). Then, the Circos program (Krzywinski et al., 2009) was applied to present the syntenic relationships on their chromosomes.
After the identification of the BraABCs, syntenic BraABCs paralogs pairs among the three B. rapa subgenomes were identified by searching "syntenic gene" in BRAD (Cheng et al., 2011). The non-synonymous substitution rate (Ka), the synonymous substitution rate (Ks) and the ω (= Ka/Ks) of paralog pairs were estimated by KaKs_Calculator 2.0 (Wang et al., 2010). The duplication date of paralog pairs was estimated according to formula T = Ks/2λ, assuming a clock-like rate (λ) of 1.5 synonymous substitutions per 10 8 years for B. rapa (Koch et al., 2000).

Expression Pattern Analysis
To analyze the expression patterns of BraABCs in different tissues, we used the reported Illumina RNA-seq data by Tong et al. (2013). Of the dataset, five tissues, including the roots, stems, leaves, flowers and siliques of B. rapa accession Chiifu-401-42 were analyzed. Gene expression levels were quantified by FPKM (Fragments Per Kilobase of transcript per Million fragments mapped) values. An expression cluster of BraABCs in the five tissues was generated by Cluster 3.0 (http://bonsai.hgc. jp/~mdehoon/software/cluster/software.htm), and the cluster results were shown using Tree View software (http://jtreeview. sourceforge.net/). The RNA-seq dataset was also used to analyze the expression profiles of syntenic paralog pairs in different tissues.
To investigate the expression patterns of syntenic BraABC paralog pairs under normal and abiotic stress conditions, the cultivar Chiifu-401-42 was cultivated in potting soil at 24/16 • C, with a photoperiod of 14/10 h for day/night in a growth chamber. Five-leaf-stage plants were used to conduct different abiotic stress treatments under continuous time (6, 12, and 24 h). Hormone treatment was performed with 100 µM ABA; for drought stress treatment, the pots were irrigated with 15% (w/v) polyethylene glycol PEG and kept standing in the irrigation solution for 30 min under normal growth conditions; salt treatment was performed with 250 mM NaCl. After the treatments, the total RNAs were isolated from young leaves using an RNA kit (TaKaRa, Dalian, China). The RNAs were reverse transcribed into cDNA with a PrimeScript RT reagent Kit (TaKaRa, Dalian, China). Six pairs of syntenic paralogs were randomly selected to conduct qRT-PCR. The B. rapa actin gene Bra028615 was used as an internal control. Primers for qRT-PCR were designed using Beacon Designer 7. The qRT-PCR experiments were performed with three biological and technical replicates. The SYBR Premix Ex Taq kit R (TaKaRa, Dalian, China) was used to detect gene expression according to the manufacturer's recommendations on the One-step Real-Time PCR System Time PCR Detection System (Applied Biosystems, Foster City, CA, USA). The cycling profile was as follows: 94 • C for 30 s, 94 • C for 10 s for 40 cycles, and 58 • C for 30 s, followed by a melting curve analysis at 65 • C for 10 s for 61 cycles. The relative gene expression levels were analyzed using the comparative Ct value method (Heid et al., 1996).

Genome-Wide Identification, Phylogenetic Analysis, and Classification of ABC Genes in B. rapa
To explore all of the putative ABCs in the B. rapa genome, the HMM profiles of the ABC-transporter domain, the ABC-2 transporters domain, the ABC transporter transmembrane region domain, the CYT domain and the mce related protein domain were employed as queries to search against the protein database of B. rapa using the Hmmsearch program. A total of 179 BraABCs, which contained at least one of the above domains, were identified ( Table 1). Among the 179 BraABC proteins, 162 proteins contained one or more NBD domains, and an additional 17 proteins without NBD domains were assigned to ABCI, based on the domain composition of the ABC proteins (Verrier et al., 2008).
Understanding the evolutionary relationship between BraABC proteins and AtABC proteins could contribute to classifying and assessing the potential functions of BraABCs. The NBD domains of the 162 BraABC proteins and 117 AtABC proteins (117 AtABC proteins contained the NBD domain) were aligned, and an unrooted phylogenetic tree was constructed by MEGA6 using the Maximum Likelihood (ML) method with 1,000 bootstraps. The AtABC proteins of each subfamily were grouped together, whereas the proteins of ABCI were dispersive in the phylogenetic tree. Overall, BraABC proteins followed the phylogenetic pattern of Arabidopsis (Supplementary Figure 1). Based on the phylogenetic relationship with AtABCs, BraABCs were also divided into eight subfamilies: ABCA, ABCB, ABCC, ABCD, ABCE, ABCF, ABCG and ABCI. ABCG, ABCB, ABCI, and ABCC were the largest subfamilies containing 63, 38, 30, and 21 members, respectively. ABCA, ABCE, and ABCF included 11, 7, and 7 members, respectively. Only 2 members were identified in ABCD ( Table 1). The protein domain organizations of BraABCs in each subfamily were consistent with that of Arabidopsis (Verrier et al., 2008).
Similarly, Hmmsearch searches were also conducted in the selected angiosperms: P. trichocarpa, G. max, A. lyrate, C. papaya, V. vinifera, B. distachyon, O. sativa, and A. trichopoda. The number of identified ABCs in these species ranged from 113 to 271 (Figure 1). There were 271 ABCs existing in the G. max genome, while the members in other species ranged from 113 in C. papaya, 130 in Arabidopsis, 132 in A. lyrate, 137 in A. trichopoda, 138 in B. distachyon, 141 in O. sativa, 179 in B. rapa, 181 in V. vinifera and 204 in P. trichocarpa. The identified ABCs in these species are listed in Supplementary Table 1. To gain insight into the evolutionary history and relationship of ABCs in plants, we constructed a phylogenetic tree using ABCs of six representative species, including Arabidopsis (Verrier et al., 2008), B. rapa, C. papaya, V. vinifera, O. sativa, and A. trichopoda. In contrast with the ABCA-ABCG members, some ABCI proteins lacked the NBD domain, and some ABCI    members were scattered in the phylogenetic tree, although they contained the NBD domain. Therefore, our phylogenetic tree was constructed using the NBD domains of ABCA-ABCG members from these species. As shown in Figure 2, seven distinct clades were clearly distinguished based on the tree topology, namely ABCA, ABCB, ABCC, ABCD, ABCE, ABCF, and ABCG, suggesting that these groups have diverged prior to the divergence of angiosperm. Next, the copy number of ABCs in each subfamily of these species was quantified (Figure 1). In each studied species, the copy numbers of different subfamilies varied greatly. For the copy numbers of subfamilies in all of the studied species, we found that ABCB, ABCC, ABCG, and ABCI were the larger subfamilies. ABCA, ABCD, ABCE, and ABCF contained a small number of members, and the copy numbers of ABCD, ABCE, and ABCF were relatively constant. B. rapa has undergone a further WGT since its divergence from Arabidopsis. Compared with Arabidopsis, the expansion of the ABC gene family in B. rapa was largely the result of the expansion of ABCs in ABCB, ABCC, ABCG, and ABCI (Figure 1).
The 179 BraABCs were designated as BraABCA1 through BraABCI30 according to their respective subfamily and their location on the chromosome ( Table 1). The comprehensive information of BraABCs, including the locus name, gene location, predicted protein length, exon number, molecular weight (MW), isoelectric point (PI) and transmembrane helices number (TMs), is listed in Table 1. The size of the BraABC proteins varied from 118 (BraABCB31) to 5,408 (BraABCI6) amino acids, with an average of ∼984 amino acids. There were few differences in the protein lengths of ABCC, ABCE and ABCF, whereas the protein lengths of ABCB, ABCG and ABCI changed greatly. The MWs of the BraABC proteins ranged from 12.87 kDa (BraABCB31) to 611.88 kDa (BraABCI6). The variation range of PI was from 4.79 (BraABCI18) to 10.52 (BraABCI22), indicating that these BraABC proteins might exist and function in different parts of the cells. Most of the BraABC proteins (162 out of 179, 90.5%) presented multiple exons and sixty proteins (33.5%) that contained more than 10 exons. The majority of the BraABC proteins contained transmembrane helices (TMs), except for all of the ABCE and ABCF proteins as well as parts of the ABCI proteins. The ubiquity of TMs was in accordance with the transmembrane transport functions of ABCs. Pairwise comparisons of the full-length BraABC protein sequences within each subfamily were conducted. The protein sequences within ABCA, ABCB, and ABCI exhibited higher identities than other subfamilies (Supplementary Figure 2). In contrast, the sequence identities of ABCD and ABCG were lower, indicating that the degree of sequence divergence of the two subfamilies was higher than other subfamilies. The other families, such as ABCC, ABCE, and ABCF exhibited medium sequence identities.
Simultaneously, we further identified tandem arrays of BraABCs along the 10 B. rapa chromosomes. A tandem array was defined as multiple members of BraABCs occurring within the same or neighboring intergenic regions. As shown in Figure 3, 11 BraABCs clusters (genes marked in purple) containing 28 tandemly duplicated genes were identified on chromosomes A03, A04, A05, A06, and A09.

Motif Composition and Exon-Intron Organization of BraABC Proteins
To better understand the conservation and diversity of motif compositions and gene structures of BraABCs, the conserved motifs and exon-intron organization of BraABCs were analyzed.
The conserved motifs of BraABC proteins in ABCA-ABCG were analyzed using MEME software, and we identified 15 conserved motifs (Figure 4). The lengths of the conserved motifs were between 26 and 154 amino acids. The number of the conserved motifs in each BraABC protein varied from 1 to 8. Several motifs were widely spread among BraABC proteins; for instance, motifs 1 and 3 were present in the most proteins of ABCA-ABCG. In contrast, other motifs were specific to only one or two subfamilies. For example, motifs 4, 5, 8, and 12 were specific to ABCB and ABCC, motifs 10 and 15 were specific to ABCB, and motif 14 was specific to ABCA. ABCG contained the specific motifs 2, 6, 7, 9, 11, and 13, and these motifs were probably required for specific protein functions. The varieties of motif compositions in different subfamilies suggested sources of functional differentiation in BraABCs during the evolutionary process. BraABC proteins grouped into the same subfamily exhibited similar motif characteristics, suggesting functional similarities for members in the same subfamily. Logos of the 15 identified motifs are shown in Supplementary Figure 3.
We also analyzed the exon-intron organizations of BraABCs using GSDS 2.0. Similar to the motif composition, most of the BraABCs within the same subfamily generally exhibited similar gene structure in terms of the numbers and lengths of introns and exons. Moreover, BraABCs that had a closely phylogenetic relationship and shared a more similar gene structure (Supplementary Figure 4).

Conservation of BraABCs following the Whole Genome Triplication Event
To explore the influences of WGT on the evolution of BraABCs, we studied the conservation of BraABCs after WGT. After diverging from Arabidopsis, the gene number of the B. rapa genome was ∼42,000, which is considerably lower than the simple triplication of the ∼30,000 genes in the Arabidopsis genome. Therefore, substantial genes must have been lost (fractionation) in B. rapa after triploidization (Wang et al., 2011). To evaluate the fractionation extent of BraABCs, we compared the retention of BraABCs relative to the set of 3,580 neighboring genes (10 on either side), flanking the 179 BraABCs (Supplementary Table 2). In comparison, 47.2% of BraABCs were retained in one copy, which was greater than the retention of neighboring genes (38.7%). A similar percent (33.7 and 30.6%, respectively) of BraABCs and neighboring genes were retained in the two copies. And more BraABCs (19.1%) were retained in the three copies than their neighboring genes (9.2%) ( Figure 5A).
The triplicated B. rapa genome can be divided into three subgenomes: LF, MF1, and MF2. The subgenomes were grouped based on the level of gene loss relative to Arabidopsis since the WGT 13-17 MYA. Compared with their neighboring genes, more BraABCs were significantly retained in all three subgenomes. Consistent with the whole genome level, BraABCs distributed in the LF subgenome were more than two other subgenomes ( Figure 5B). Given the well-conserved synteny between Arabidopsis and B. rapa (Cheng et al., 2012), in our study, there were four collinear relations for ABCs between B. rapa and Arabidopsis: a single copy of BraABC to a single copy of Arabidopsis (SB to SA); a tandem array of BraABCs to a tandem array of Arabidopsis (TB to TA); a single copy of BraABC to a tandem array of Arabidopsis (SB to TA); and a tandem array of BraABCs to a single copy of Arabidopsis (TB to SA). The four syntenic relations were presented using the Circos program ( Figure 5C). On the whole, BraABCs were preferentially retained, consistent with the gene balance hypothesis and demonstrating the significance of ABCs in B. rapa.

Cis-Elements of BraABCs Involved in Transcriptional Regulation
To identify cis-elements related to transcriptional regulation, 2 kb promoter regions for each of the 179 BraABCs were identified. Gaps existed in the promoter regions of 13 BraABCs after assembling the B. rapa genome. Therefore, these BraABCs were excluded in the following study. Next, we applied the Plant Cisacting Regulatory DNA Elements (PLACE) website to analyze the cis-elements and identified a total of 291 different cis-elements in all of the studied BraABCs.
A total of 13 common cis-regulatory elements were identified in all of the promoter regions of the BraABCs, which were highly conserved among all of the studied BraABCs ( Table 2). Two common cis-regulatory elements, WRKY71OS and GT1CONSENSUS were responsive to plant hormones, including ABA, GA, and SA, suggesting that ABA, GA, and SA could affect the expression levels of BraABCs. Some common cis-regulatory elements were responsive to stresses, such as pathogens (WRKY71OS) and wounds (WBOXNTERF3). Of the 13 common cis-regulatory elements, GATABOX, IBOXCORE, and GT1CONSENSUS were required for transcriptional regulation by light. DOFCOREZM participated in carbon metabolism, and GATABOX played a role in molecular light switching; therefore, we speculated that BraABCs were likely to participate in energy metabolism. Two common cis-elements, POLLEN1LELAT52 and GTGANTG10, were required for transcriptional regulation in pollen, suggesting that BraABCs might be involved in the reproductive process.
We also identified unique cis-element sequences that are present in the promoter regions of unique BraABCs. In total, 40 unique cis-elements were identified in the promoter regions of the 36 BraABCs (Supplementary Table 3). Interestingly, 3 unique cis-elements (ACGTSEED3, ABREMOTIFIOSRAB16B, and VOZATVPP) were presented in only BraABCF3. BraABCG18 and BraABCB10 both contained two unique cis-elements. There was only one unique cis-element present in the other 33 BraABCs. The unique cis-element of these genes indicated their expression specificity.

Gene Expression of BraABCs in Different Tissues
To characterize the expression patterns of individual BraABCs in different tissues, we used publicly available RNA-Seq data of different tissues as a resource (Tong et al., 2013). A heat map displaying the expression profiles of BraABCs in roots, stems, leaves, flowers and siliques was generated, and the BraABCs were clustered by their expression patterns. As shown in Figure 6A, the majority of BraABCs presented different expression patterns, whereas a few of them exhibited similar expression patterns. A total of 165 BraABCs (92.2%) were determined as being expressed in at least one tissue, and 117 BraABCs (65.4%) expressed in all tissues, including 3 ABCA genes, 26 ABCB genes, 19 ABCC genes, 2 ABCD genes, 5 ABCE genes, 6 ABCF genes, 37 ABCG genes and 19 ABCI genes. Relatively, the expression of 14 BraABCs was not detected in any of the five tissues. Some BraABCs also exhibited tissue-specific expression, and there were  (Figure 6B), suggesting that these genes may play specific roles in the relevant tissues. As shown in Figure 6A, some genes of ABCA, ABCB, and ABCG exhibited variable expression profiles throughout the five tissues, such as BraABCA10, BraABCB13, and BraABCG25. In contrast, the majority of genes in ABCC, ABCD, ABCE, ABCF, and ABCI presented relatively consistent expression levels across the five tissues, such as BraABCC18, BraABCD2, BraABCE6, BraABCF1, and BraABCI14. As shown in Figure 6A, we found the expression profiles of BraABCs in the roots exhibited large differences compared with the other four tissues, whereas similar expression patterns of BraABCs were observed for the stems and leaves, and the flowers and siliques.

Syntenic BraABC Paralog Pairs among the Three Subgenomes of B. rapa
The three subgenomes of B. rapa are the evolutionary products of WGT and a lot of synteny blocks among them. Syntenic paralogs are genes that are located in the syntenic fragments. Syntenic BraABC paralog pairs among LF, MF1, and MF2 were identified by searching "syntenic gene" in the BRAD (Cheng et al., 2011). In total, 76 pairs of BraABC syntenic paralogs were identified among the three subgenomes (Table 3). Synonymous (Ks) and non-synonymous (Ka) values were calculated to explore the selective pressures on these paralog pairs. We found that the ω (= Ka/Ks) ratios of 10 syntenic paralogs (13.16%) were more than 1, indicating positive selection on these BraABC paralogs. The ABCC21_ABCC3 pair had the highest ω ratios, with a ratio of 1.9264. The rest of the paralogs were under purifying selection, with ω ratios lower than 1. Among them, the ω ratios of 22 paralog pairs were lower than 0.1, suggesting that these gene pairs underwent strong purifying selection stress, making the functions of these paralog pairs trend toward relative similarity. Furthermore, the duplication time of these paralogs pairs were estimated by using a relative Ks measure as a proxy for time. The estimated duplication time of BraABC paralog pairs indicated that it spanned from 3.45 to 42.2 MYA, with an average duplication time of ∼15.13 MYA, which was consistent with the recent WGT time of B. rapa (13-17 MYA) (Wang et al., 2011).

Evolutionary Fates Prediction of Syntenic BraABC Paralog Pairs
Profiling gene expression may provide an ample amount of information about the mode and tempo of duplicated genes.
To predict the evolutionary fates of syntenic BraABC paralog pairs, we took advantage of the deep RNA-Seq data of B. rapa to investigate their expression profiles in different tissues, including roots, stems, leaves, flowers and siliques (Tong et al., 2013). Generally, if a Pearson correlation coefficient (PCC) of the two factors is greater than 0.6, we can infer that they have a close positive correlation. Among the 76 pairs of paralogs, 27 pairs had positive expression correlations across the five tissues, which were greater than 0.6 measured by the PCCs (Figure 7). Such high positive correlations among these paralog pairs probably indicated functional conservation or sub-functionalization for these paralog pairs after duplication. Simultaneously, 40 pairs of syntenic paralogs were negatively correlated with negative PCC values or exhibited little correlation with small positive PCC values in gene expression, suggesting neo-functionalization for these paralog pairs. Furthermore, there were 9 pairs of paralogs in which one copy of each pair did not express in any of the 5 tissues. Therefore, there were no available (NA) results of PCCs for these paralog pairs. Among the 9 paralog pairs, 8 pairs were composed of genes from two different subfamilies, and one copy of each of these paralog pairs was the ABCI gene. Interestingly, all of these ABCI genes were not expressed in any of the studied tissues, indicating that they probably became pseudogenes. Therefore, pseudogenization of these 9 pairs of paralogs may occur during evolution. From the cluster of different tissues, the leaves clustered with the stems and the flowers clustered with the siliques in the heat map of BraABC paralog pairs indicated a similar expression profile (Figure 7). Taking an overall view of the cluster for BraABC paralog pairs, paralog pairs with similar evolutionary fates tended to further cluster together, confirming that the expression patterns of paralog pairs were related to their evolutionary fates after duplication. Approximately 42.1% (32 out of 76) of the paralog pairs presented no less than a two-fold expression difference (absolute value of log 2 FPKM ratio ≥ 1) in all of the studied tissues. In brief, the expression correlation analysis of syntenic paralog pairs showed their functional roles in functional conservation, sub-functionalization, neofunctionalization and pseudogenization in the BraABC gene family.
We also investigated the expression patterns of syntenic BraABC paralog pairs under ABA, drought and salt stresses with qRT-PCR. Three pairs of conserved or sub-functional paralogs and three pairs of neo-functional paralogs were randomly selected to study their expression patterns. Primers for qRT-PCR are listed in Supplementary Table 4. As shown in Figure 8,   a range of expression profiles of the selected BraABCs were observed after exposure to ABA, drought and salt stress. Some paralog pairs exhibited similar expression patterns, whereas others presented different expression profiles. The paralog pairs with positive expression correlations measured by PCCs across the five tissues also had positive correlations under the stress treatments ( Figures 8A-C), which also indicated the functional conservation or sub-functionalization in these paralog pairs. Similarly, paralog pairs which exhibited negative correlations across the studied tissues also presented negative correlations under the stress treatments (Figures 8D-F), suggesting that these paralog pairs were neo-functional. On the whole, the expression correlations of the selected syntenic BraABC paralog pairs across the five tissues were consistent with those under the stress treatments.

Interaction Network Analysis among BraABC Proteins
The interaction network of BraABC proteins, including the functional and physical interactions, were examined using STRING software and the corresponding database to retrieve the protein interactions. As shown in Figure 9, a dense interaction network formed among the BraABC proteins. Most of the BraABC proteins were involved in interaction relations with other proteins, except for BraABCA4, BraABCA9, BraABCG18, and several ABCI proteins. Interaction relations not only existed among proteins within the same subfamily but also among proteins from different subfamilies. Though interacting frequently with proteins in other subfamilies, some ABCB proteins did not interact with proteins of ABCB, shown as being surrounded by the yellow curve in Figure 9. The majority of proteins located in the center of the network were ABCG proteins, indicating that these proteins had more complex interaction relations with other BraABC proteins. The gene dosage hypothesis predicts that genes will be preferentially retained if their products are dose sensitive, interacting either with other proteins or in networks Birchler and Veitia, 2007). Consequently, ABCG genes were probably more preferentially retained compared with genes of other subfamilies during evolution. Indeed, ABCG had the largest number of genes among all of the BraABC subfamilies.

DISCUSSION
Due to their sessile nature, plants have evolved complex movement systems to establish steep concentration gradients of solutes across cellular membranes to adapt to the variable environment. The particularly large complement of ABC proteins play a major role (George and Jones, 2012). ABC proteins are involved in various biological processes and are ubiquitous in all living organisms. To date, most research into the functions of ABCs has focused on Arabidopsis and rice, whereas studies are limited in terms of other non-model plants like B. rapa (Kang et al., 2011). In our research, we conducted comprehensive studies of ABCs in B. rapa, including whole genome-wide identification, evolutionary relationships, chromosomal locations, structural investigation, conservation after WGT and expression patterns in different tissues and under different stress treatments.
In this study, we systematically identified 179 BraABCs, representing 0.43% of the annotated protein coding genes in B. rapa (Wang et al., 2011). Gene duplication events are important to the rapid expansion and evolution of gene families. Furthermore, segmental duplication and tandem duplication are known to be major duplication modes for gene family expansion (Cannon et al., 2004). Previous studies revealed that B. rapa not only shared three paleo-polyploidy events with Arabidopsis but also underwent a further WGT event since its divergence from Arabidopsis 13 to 17 MYA (Town et al., 2006). In our study, the average duplication time of syntenic BraABC paralog pairs was 15.13 MYA, which was close to the recent WGT date of B. rapa. A total of 87 BraABCs were contained in the 76 pairs of BraABC syntenic paralogs among three subgenomes stemming from WGT. Simultaneously, 28 BraABCs were found to be located as 11 tandem arrays on chromosomes (Figure 3). Taken together, syntenic paralog analysis and chromosomal distribution revealed that the expansion of the ABC gene family in B. rapa could be attributed to the recent WGT and tandem duplication. Similarly, the expansion of other gene families in plants, such as the PG of Populus, the bZIP of apple and the ST of pear are also associated with WGD/segmental and tandem duplications (Yang et al., 2013;Li et al., 2015;Zhao et al., 2016).
The phylogenetic tree demonstrated that BraABCs can be divided into eight subfamilies (ABCA-ABCG and ABCI) (Supplementary Figure 1). All subfamilies, except ABCI, were clustered together in the phylogenetic tree constructed from ABCs of B. rapa and Arabidopsis, whereas the ABCI genes containing the NBD domain were scattered in the phylogenetic tree. This topology is similar to that of the phylogenetic tree constructed from the ABCs of maize (Pang et al., 2013). The dispersed distribution in the phylogenetic tree and non-uniform domain organization of ABCI genes indicated that the functions of ABCI are more divergent than other subfamilies. The subfamily classification was further supported by the results of domain organization, exon-intron structure and motif composition.
The evolution complexity of angiosperm genomes has been characterized by polyploidization through WGD followed by diploidization. Studies of Arabidopsis and its close relative B. rapa can provide fundamental insights into the evolutionary patterns of plant genomes. After triploidization, there is considerable gene loss (fractionation) in B. rapa, and is consistent with the widespread gene loss after WGD events in other eukaryotes (Sankoff et al., 2010). In theory, the number of ABCs in the newly formed hexaploid B. rapa genome would be three-fold as in Arabidopsis, whereas only 179 BraABCs were identified, indicating that some BraABCs were lost following WGT. The gene balance hypothesis predicts that genes whose products participate in macromolecular complexes, signaling networks or transcription, are more likely to be retained, avoiding network imbalances associated with the loss of one member Birchler and Veitia, 2007). In B. rapa, the ABCs were biased and preferentially retained. The preferential retentions of BraABCs were consistent with the gene balance hypothesis and demonstrate that BraABCs play key roles in the adaptation of B. rapa. In the interaction network of BraABC proteins (Figure 9), the ABCG proteins were highly connected with other BraABC proteins. Simultaneously, ABCG had the largest number of genes among all subfamilies, suggesting that ABCG genes were probably more preferentially retained during evolution. In turn, the preferential retention of ABCG genes supports the reliability of the gene balance hypothesis.
Evidence has demonstrated that ABC genes are involved in plant growth, development and stress tolerance. AtABCB1, a member of AtABC, can participate in auxin transport, and AtABCB1 overexpressing plants develop longer hypocotyls (Sidler et al., 1998;Noh et al., 2001). AtABCG25 is involved in abscisic acid transport and response (Kuromori et al., 2010). OsABCC1, a rice transporter, is involved in detoxification and reduce arsenic in rice grains (Song et al., 2014). Overexpression of AtABCG36 improves drought and salt stress resistance in Arabidopsis (Kim et al., 2010). Gene expression patterns provide important clues for gene functions. Thus, we conducted expression analysis for all BraABCs in the roots, stems, leaves, flowers and siliques tissues using public RNA-seq data (Tong et al., 2013). Most of the BraABCs (92.2%) were expressed in at least one of the five tissues. Some BraABCs exhibited tissuespecific expression patterns (Figure 6B), indicating that they might play specific roles in the relevant tissues. Cis-elements in the promoters of genes can be bound by transcription factors and involved in gene expression regulation. To further understand the possible transcriptional regulation mechanisms of BraABCs, we scanned the common cis-regulatory elements, which were conserved in the promoter regions of all of the studied BraABCs. Among the identified 13 common cis-elements, WRKY71OS was responsive to ABA. However, in our qRT-PCR experiments, the expression profiles of some BraABCs and the presence of WRKY71OS in their promoters were not in good agreement under ABA treatment. Some BraABCs was down-regulated under ABA treatment or nearly unresponsive to ABA at some time points, such as BrABCB13, BrABCB12, BrABCB11, BrABCB5, and BrABCG45 (Figure 8). This suggests that the induction of these BraABCs might result from the functions of a complex array of cis-elements, and some unidentified ciselements might play a vital role in regulating the expression of these BraABCs under ABA treatment. Furthermore, this indicates that BraABCs have both shared, as well as distinct regulatory modules in response to ABA treatment. Moreover, of the identified common cis-elements, DOFCOREZM and GATABOX were related to carbon metabolism and molecular light switching, respectively, indicating that BraABCs probably participated in energy metabolism. Combining gene expression results with common cis-elements of BraABCs, we further confirmed the important roles of BraABCs during growth, development and stress response.
Altogether, 76 pairs of BraABC syntenic paralogs were identified among the three subgenomes. The average duplication time of the 76 pairs of paralogs was 15.13 MYA, which was consistent with the generated time of the three subgenomes (Wang et al., 2011). According to the ratio of non-synonymous to synonymous substitutions ω (= Ka/Ks), the selection type acting on the coding sequences can be measured. Ka/Ks ratios of 22 paralogs pairs were < 0.1, indicating that these paralog pairs were under strong purifying selection pressures. Furthermore, their functions were potentially more constrained with limited functional divergence occurring after duplication. It is worth noting that the ω ratios of 10 pairs of paralogs were > 1, representing positive selection and fast evolutionary rates in these BraABC paralogs at the protein level. This is different from other gene families in plants, such as PG of Populus, BURP of Medicago and ACD of tomato, which contain a few or even no paralog pairs undergoing positive selection (Yang et al., 2013;Li et al., 2016;Paul et al., 2016), and a relatively large percentage of BraABC paralogs pairs underwent positive selection in our study. We deduced genes of these paralog pairs might evolve some new functions to adjust to their living environment.
Four possible fates have been suggested for the evolution of gene duplication: functional conservation, sub-functionalization, neo-functionalization and pseudogenization (Lynch and Conery, 2000). Expression correlation analysis of syntenic BraABC paralog pairs across different tissues and under stress treatments could help reveal their functional roles in evolutionary fates.
Our results indicate that all four fates turned up among the 76 pairs of paralogs. The high positive correlations of the 27 paralog pairs suggest the conservation of ancestral gene functions or sub-functionalization through the division of labor by retaining different parts of their ancestral functions. In contrast, 40 pairs of paralogs had small positive PCCs or negative PCCs, indicating that neo-functionalization and added functional diversity were likely to occur for the BraABC gene family. In particular, among the 9 pairs of paralogs undergoing pseudogenization, 8 pairs were composed of genes from two different subfamilies. Furthermore, ABCI genes were contained in each of these 8 pairs of paralogs. Interestingly, all of these ABCI genes had probably already become pseudogenes. Genes of ABCA-ABCG included multidomains, whereas the ABCI genes generally contained only one domain, mainly NBD or TMD. We speculated that the ABCI genes in these eight paralog pairs were generated from the domain loss of genes in ABCA-ABCG. In short, the functions of genes in the BraABC gene family were enhanced and expanded through gene duplications. However, further functional analyses of BraABCs are still needed to determine which evolutionary fate the syntenic paralogs pairs undergo during the process of sequence and functional evolution.
In conclusion, we performed the first genome-wide analysis of the ABC gene family in B. rapa, and a total of 179 BraABCs were identified. Phylogenetic relationships, gene structures and cis-regulatory elements of BraABCs were studied in detail. Expression patterns of BraABCs were also characterized in different tissues and under different stresses. Furthermore, the conservation of BraABCs and the evolutionary fates of syntenic BraABC paralog pairs following WGT were studied. Our results could help to further investigate the biological functions of BraABCs in plant growth, development, and stress response.

AUTHOR CONTRIBUTIONS
CY, WD, and XH conceived and designed the experiments. CY and SL performed the experiments. CY, WD, YL, and XH analyzed data. CY and XH wrote the manuscript. All authors read and approved the manuscript.