Edited by: Wenqin Wang, Shanghai Jiao Tong University, China
Reviewed by: Changjiang Yu, Qingdao Institute of Bioenergy and Bioprocess Technology (CAS), China; Yong Zhou, King Abdullah University of Science and Technology, Saudi Arabia
This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Yellow lupine (
The main constraint on large-scale cultivation of yellow lupine is its excessive shedding of generative organs, which causes significant yield losses. Current research therefore focuses on developing varieties and cultivation conditions that prevent massive flower and pod drop and thereby stabilize yield across environmental conditions (Lucas et al.,
Advances in high-throughput techniques have opened new opportunities for deeper exploration of the complex networks of factors that regulate biological processes. However, these techniques generate tremendous amounts of data that cannot be analyzed without powerful computers and programming skills. For example, databases such as NCBI SRA hold only raw data, which leaves the information inaccessible to a wider scientific audience. Given the current trend toward analyzing large amounts of biological data in an evolutionary context, it is important to give users the most convenient access possible. One of the best solutions is a database with a user-friendly interface and downloadable data in the form of analysis-ready tables.
Exemplary databases of this type for other plant species usually contain data on one type of RNA, either encoding proteins (Kawahara et al.,
In case of
Detailed analysis of the data concerning miRNA and siRNA in yellow lupine flowers has already been published (Glazinska et al.,
LuluDB was created on the basis of NGS sequencing analysis of sRNA, transcriptomes, and degradome libraries obtained from generative organs of yellow lupine cv. Taper: flowers in various stages of development, developing pod walls and seeds, flower pedicels, and pods undergoing abscission and control ones. Through this experimental design, we aimed at examining global changes in expression during flower development, and wanted to determine the differences in their development depending on the location in the inflorescence, which is associated with the tendency to fall off/transform into pods (van Steveninck,
List of samples deposited to date in the LuluDB database.
Sample ID | Sample name | Description | sRNA | Transcriptome | Degradome | Reference |
UF1 | Upper flowers stage 1 | Flowers from upper part of raceme in stage 1 | • | • | | (Glazinska et al., |
UF2 | Upper flowers stage 2 | Flowers from upper part of raceme in stage 2 | • | • | | (Glazinska et al., |
UF3 | Upper flowers stage 3 | Flowers from upper part of raceme in stage 3 | • | • | • | (Glazinska et al., |
UF4 | Upper flowers stage 4 | Flowers from upper part of raceme in stage 4 | • | • | | (Glazinska et al., |
LF1 | Lower flowers stage 1 | Flowers from lower part of raceme in stage 1 | • | • | | (Glazinska et al., |
LF2 | Lower flowers stage 2 | Flowers from lower part of raceme in stage 2 | • | • | | (Glazinska et al., |
LF3 | Lower flowers stage 3 | Flowers from lower part of raceme in stage 3 | • | • | • | (Glazinska et al., |
LF4 | Lower flowers stage 4 | Flowers from lower part of raceme in stage 4 | • | • | | (Glazinska et al., |
FPNAB | Flower pedicels non-abscissing | Pedicels of non-abscissing flowers | • | • | | (Glazinska et al., |
FPAB | Flower pedicels abscissing | Pedicels of abscissing flowers | • | • | | (Glazinska et al., |
PW1 | Pod walls stage 1 | Pod walls in early stage of development | • | • | | This study |
PW2 | Pod walls stage 2 | Pod walls in middle stage of development | • | • | | This study |
PW3 | Pod walls stage 3 | Pod walls in late stage of development | • | • | • | This study |
PS1 | Pod seeds stage 1 | Seeds in early stage of development | • | • | | This study |
PS2 | Pod seeds stage 2 | Seeds in middle stage of development | • | • | | This study |
PS3 | Pod seeds stage 3 | Seeds in late stage of development | • | • | • | This study |
PNAB | Pods non-abscissing | Non-abscissing pods | • | • | | This study |
PAB | Pods abscised | Abscissing pods | • | • | | This study |
After sequencing and preliminary analysis, the sequences of the identified coding RNAs and ncRNAs were first deposited in raw form in the NCBI SRA database, and the analysis-ready data were then uploaded to LuluDB.
The database contains sequences of 456 known and 32 novel miRNAs, as well as 318 phased siRNAs identified in yellow lupine along with information about their expression and target transcripts. In our previous paper (Glazinska et al.,
Expression of small RNAs in individual samples is stated in RPM (reads per million). For both miRNAs and siRNAs, potential target transcripts were identified by degradome data analysis carried out with CleaveLand4 (Addo-Quaye et al.,
We have identified lncRNAs by performing a BLASTn search within CantataDB (Szcześniak et al.,
LuluDB contains 267,349 protein-coding RNA sequences with annotations from commonly used tools and databases: Blastp, Blastx, Eggnog, KEGG, CantataDB, miRBase, NCBI protein, Pfam, Rfam, and GO. Because the reference yellow lupine genome sequencing is still in progress (Iqbal et al.,
Expression levels, given in FPKM (fragments per kilobase of exon per million fragments mapped), are shown only for the relevant libraries.
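The two expression units used in the database, RPM for small RNAs and FPKM for transcripts, follow standard definitions that can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the pipeline used to build LuluDB; the function names and the example numbers are hypothetical, and using the assembled transcript length in place of exon length is an assumption that suits a de novo assembly.

```python
def rpm(read_count, total_mapped_reads):
    """Reads per million: a raw read count scaled by library size."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return read_count * 1_000_000 / total_mapped_reads


def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase per million mapped fragments.

    Assumption: transcript_length_bp is the length of the assembled
    transcript, used here in place of summed exon length.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million fragments)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
example_rpm = rpm(50, 10_000_000)
example_fpkm = fpkm(200, 2_000, 20_000_000)
```

Because RPM ignores sequence length, it is suited to small RNAs of nearly uniform size, whereas FPKM additionally corrects for the fact that longer transcripts yield more fragments.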
You can easily navigate to major database components from the top of the home page (
Screenshot of LuluDB home page and of the interface to submit BLAST searches.
One of the most crucial elements of the home page is the Browse section. This page contains links to various parts of the database, such as miRNA, phased siRNA, lncRNA, as well as protein-coding RNAs (
Screenshot of LuluDB page concerning example miRNA.
The phased siRNA section is structured in a similar manner (
On the lncRNA main page, there is a list of transcripts identified as lncRNAs, composed of ID from Cantata, LuluDB transcript ID, and Trinity ID (
In the protein-coding transcript section, the list of transcripts can be searched by TRINITY ID, LuluDB ID, ORF type (“complete,” “internal,” “3prime_partial,” or “5prime_partial”), or annotations to various databases (
Summary of protein-coding transcripts deposited to date in LuluDB annotated in various open access databases.
RFAM | 4,433 |
PFAM | 198,225 |
CantataDB | 31,718 |
Nr | 288,854 |
SwissProt | 216,711 |
KEGG | 247,375 |
GO | 534,413 |
miRBase | 2,565 |
Results of sRNA and RNA deep sequencing were validated by qPCR. Validation of the flower data was already presented in our recent paper (Glazinska et al.,
Relationship between next-generation sequencing and qPCR results.
In order to show the functionality of LuluDB and present an exemplary analysis pipeline, we used the database interface to identify transcripts encoding homologs of
Firstly, we downloaded the CDS sequences of selected
All sequences identified by BLASTn are annotated as
In
In many plant species
In this work, we have demonstrated that members of all
Phylogenetic tree and domain structure of members of DCL families identified in
In plants, members of DCL protein families contain six conserved domains: an N-terminal helicase domain (composed of DEXD/H-box and helicase-C subdomains), followed by DUF283 (domain of unknown function, also known as the Dicer-dimer or Dicer dimerization domain), PAZ (Piwi-Argonaute-Zwille), two tandemly arranged RNase III domains, and up to two C-terminal dsRBD (dsRNA-binding) domains (Carmell and Hannon,
The presence of all of the abovementioned domains in DCL1 is highly conserved across plants, including legumes, which is consistent with its central role in sncRNA biogenesis (Gasciolli et al.,
In our analyses, only the
In addition to analyzing the putative amino acid sequences of DCLs, we have also explored the nucleotide sequences of the transcripts that encode these proteins. The mRNA and CDS sequences deposited in the database make it possible to identify untranslated regions, e.g., the 5′UTR, which often contain regulatory sequences that offer clues about the factors affecting expression of the studied genes. We performed sequence analysis of 5′UTR regions of genes encoding DCL1, 3, and 4 (
We have analyzed selected 5′UTRs by querying PlantCare, a database of plant
Heatmap presenting expression of RNAs coding for DCLs identified in yellow lupine, created using the “ComplexHeatmap” R package.
In soybean,
Literature data contain evidence that miR162 regulates the
After typing the phrase “miR162” in the search bar of the browse/miRNA section, we are presented with a list of eight lupine miRNAs annotated as miR162, which means that they are identical to miR162s from other plant species deposited in miRBase (
Homologs of miR162 identified in yellow lupine. Left: aligned miRNA sequences with the sequence logo on the top. Right: Heatmap presenting expression of homologs of miR162 identified in yellow lupine, created using the “ComplexHeatmap” R package.
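A sequence logo such as the one in the left panel summarizes how conserved each position is across the aligned sequences. A minimal sketch of that calculation in Python is given below (the figure itself was produced with other tools). The function name is hypothetical, the input sequences are toy data rather than LuluDB miRNAs, and no small-sample correction is applied, which dedicated logo software usually adds.

```python
import math
from collections import Counter


def column_information(aligned_seqs, alphabet="ACGU"):
    """Return the information content (bits) of each alignment column.

    Sequences must be pre-aligned and of equal length. Characters outside
    the alphabet (e.g., gaps) are ignored when computing frequencies.
    """
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    max_bits = math.log2(len(alphabet))
    info = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        total = sum(counts[base] for base in alphabet)
        if total == 0:
            info.append(0.0)  # column contains only gaps/unknown symbols
            continue
        entropy = 0.0
        for base in alphabet:
            if counts[base]:
                p = counts[base] / total
                entropy -= p * math.log2(p)
        info.append(max_bits - entropy)
    return info


# Toy alignment (not real miRNA sequences): position 3 is split G/A.
toy_alignment = ["UCGA", "UCGA", "UCAA", "UCAA"]
toy_info = column_information(toy_alignment)
```

Fully conserved columns reach the maximum of 2 bits for a four-letter alphabet, while a column split evenly between two nucleotides carries 1 bit.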
Analyses of the data for miR162 present in LuluDB indicate that this miRNA can also regulate the expression of
It is noteworthy that the download feature is useful for both analysis and presentation of the data. The downloaded files include the miRNA ID, its sequence, the list of target genes, and the expression of the miRNA in different organs. Redirection links to pages for individual targets show that the majority of them were identified in degradomes and that they are in fact transcripts annotated as
Regarding the already mentioned novel miRNA ID486, which is most likely responsible for regulation of
List of target genes for novel miR486 from
Ll_transcript_256739 | Flowers | GCAGAGTCTGCACACAAACGAA | 903 | 924 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_256747 | Flowers | GCAGAGTCTGCACACAAGCGAA | 903 | 924 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_256729 | Flowers | GCAGAGTCTGCACACAAACGAA | 1115 | 1136 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_256731 | Flowers | GCAGAGTCTGCACACAAACGAA | 903 | 924 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_256736 | Flowers | GCAGAGTCTGCACACAAACGAA | 1115 | 1136 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_479081 | Pods | GCAGAGTCTGCACACAAACGAA | 1131 | 1152 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_479083 | Pods | GCAGAGTCTGCACACAAACGAA | 1491 | 1512 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_479084 | Pods | GCAGAGTCTGCACACAAACGAA | 1020 | 1041 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_479065 | Pods | GCAGAGTCTGCACACAAACGAA | 1131 | 1152 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_479068 | Pods | GCAGAGTCTGCACACAAACGAA | 919 | 940 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_256739 | Flowers | GCAGAGTCTGCACACAAACGAA | 903 | 924 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_256726 | Flowers | GCAGAGTCTGCACACAAACGAA | 1115 | 1136 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_256727 | Flowers | GCAGAGTCTGCACACAAACGAA | 1115 | 1136 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_256729 | Flowers | GCAGAGTCTGCACACAAACGAA | 1115 | 1136 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_256734 | Flowers | GCAGAGTCTGCACACAAACGAA | 1115 | 1136 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_256736 | Flowers | GCAGAGTCTGCACACAAACGAA | 1115 | 1136 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_479081 | Pods | GCAGAGTCTGCACACAAACGAA | 1131 | 1152 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_479083 | Pods | GCAGAGTCTGCACACAAACGAA | 1491 | 1512 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_479084 | Pods | GCAGAGTCTGCACACAAACGAA | 1020 | 1041 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_479065 | Pods | GCAGAGTCTGCACACAAACGAA | 1131 | 1152 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_479068 | Pods | GCAGAGTCTGCACACAAACGAA | 919 | 940 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_219270 | Flowers | TCAGGGTCTGCGCGCAAACAAA | 2411 | 2432 | Cleavage | Nucleolar protein 12 |
Ll_transcript_219271 | Flowers | TCAGGGTCTGCGCGCAAACAAA | 1000 | 1021 | Cleavage | Nucleolar protein 12 |
Ll_transcript_219276 | Flowers | TCAGGGTCTGCGCGCAAACAAA | 657 | 678 | Cleavage | Nucleolar protein 12 |
Ll_transcript_219278 | Flowers | TCAGGGTCTGCGCGCAAACAAA | 1089 | 1110 | Cleavage | Nucleolar protein 12 |
Ll_transcript_386361 | Pods | TCAGGGTCTGCGCGCAAACAAA | 2294 | 2315 | Cleavage | Nucleolar protein 12 |
Ll_transcript_386366 | Pods | TCAGGGTCTGCGCGCAAACAAA | 2412 | 2433 | Cleavage | Nucleolar protein 12 |
Ll_transcript_386378 | Pods | TCAGGGTCTGCGCGCAAACAAA | 1255 | 1276 | Cleavage | Nucleolar protein 12 |
Ll_transcript_386378 | Pods | TCAGGGTCTGCGCGCAAACAAA | 1255 | 1276 | Cleavage | Nucleolar protein 12 |
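Rows in the layout shown above are straightforward to process programmatically once downloaded. The following Python sketch parses pipe-delimited rows and collects the unique target transcripts per annotation, which collapses the repeated rows. The column names are assumptions inferred from the cell contents (the table header is not reproduced here), the function names are hypothetical, and the exact format of files exported by LuluDB may differ.

```python
from collections import defaultdict

# Assumed column meanings; the original header is not shown in the text.
FIELDS = ("transcript_id", "organ", "sequence", "start", "end",
          "mode", "annotation")


def parse_rows(text):
    """Parse 'a | b | c |' style rows into dictionaries, skipping bad lines."""
    records = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != len(FIELDS):
            continue  # blank line or unexpected number of columns
        record = dict(zip(FIELDS, cells))
        try:
            record["start"] = int(record["start"])
            record["end"] = int(record["end"])
        except ValueError:
            continue  # e.g., a header row with non-numeric coordinates
        records.append(record)
    return records


def unique_targets_by_annotation(records):
    """Group unique (transcript, organ) pairs under each annotation."""
    grouped = defaultdict(set)
    for record in records:
        grouped[record["annotation"]].add(
            (record["transcript_id"], record["organ"]))
    return {name: sorted(pairs) for name, pairs in grouped.items()}


# Three rows copied from the table above; the second repeats the first.
SAMPLE_ROWS = """\
Ll_transcript_256739 | Flowers | GCAGAGTCTGCACACAAACGAA | 903 | 924 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_256739 | Flowers | GCAGAGTCTGCACACAAACGAA | 903 | 924 | Cleavage | Endoribonuclease Dicer homolog 2 |
Ll_transcript_219270 | Flowers | TCAGGGTCTGCGCGCAAACAAA | 2411 | 2432 | Cleavage | Nucleolar protein 12 |
"""

sample_records = parse_rows(SAMPLE_ROWS)
sample_summary = unique_targets_by_annotation(sample_records)
```

A simple consistency check on such data is that the coordinate span (end minus start plus one) equals the length of the listed sequence, which holds for the 22-nt sites in the rows used here.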
Analysis of miR486.
Neither the miR162 homologs nor the novel miR486 shows differential expression between pedicels of abscissing and non-abscissing flowers, or between abscissing and non-abscissing pods, which may indicate that they are not directly linked to generative organ abscission in yellow lupine. However, changes in their accumulation during development suggest that they regulate sRNA biogenesis in a stage-dependent manner in both flowers and pods.
The data in the database have already been used to identify new mechanisms of sRNA-mediated regulation of gene expression in yellow lupine; e.g., we described the involvement of sRNAs in
To date, LuluDB has been designed to provide information about the regulation of transcripts by miRNA and siRNA, confirmed by degradome analysis. However, it also contains an “lncRNA” category, which has not been explored by our group and which gives the opportunity to perform preliminary
LuluDB is equipped with user-friendly, intuitive tools for searching and investigating our NGS data, including more advanced bioinformatic analyses.
Diagram depicting the possible ways to access and explore the database.
Yellow lupine plants used for RNA extraction were cultivated in Nicolaus Copernicus University's experimental field in Piwnice near Torun (Poland, 53°05′42.0″N 18°33′24.6″E), as described in detail in Glazinska et al. (
RNA isolation from all of the collected samples was carried out using miRNeasy Mini Kit (Qiagen, Venlo, the Netherlands) with on-column DNA digestion with the RNase-Free DNase Set (Qiagen, Venlo, the Netherlands), as described in detail in Glazinska et al. (
Total RNA isolated in the same way was used to create transcript libraries using the NEBNext Ultra Directional RNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA), which were sequenced on the HiSeq4000 platform in 100-bp paired-end mode as described in detail in Glazinska et al. (
Degradomes were obtained using total RNA pooled from samples UF3/LF3 and PW3/PS3 to meet the amount of material required for sequencing. The protocol for degradome library preparation and detailed information can be found in Glazinska et al. (
The
Annotation of transcriptomes was performed with Trinotate (v 3.0.2). BLASTX with “max_target_seqs 1” option was used to identify the sequence similarity between lupine transcripts and proteins annotated in Swiss-Prot, a non-redundant and manually curated dataset from the UniProt database. Open reading frames were predicted with TransDecoder (v 5.0.1) (Haas et al.,
To identify phylogenetically conserved mature miRNAs with sequences and lengths identical to known plant miRNAs, we searched miRBase for similarity at the mature miRNA level. Short reads from RNA-Seq were compared against mature miRNAs from miRBase (Kozomara et al.,
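The criterion stated at the start of this paragraph, identity in both sequence and length to a known mature miRNA, amounts to an exact string match once RNA and DNA alphabets are reconciled. The Python sketch below illustrates that criterion only; it is not the tool or the parameters used in this study, the function names are hypothetical, and the sequences are toy data rather than miRBase entries.

```python
def to_dna(seq):
    """Normalize a sequence to upper-case DNA letters (U -> T)."""
    return seq.upper().replace("U", "T")


def find_conserved(read_counts, known_mature):
    """Return reads that are identical in sequence and length to a known miRNA.

    read_counts: dict mapping read sequence -> count
    known_mature: dict mapping miRNA name -> mature sequence (RNA or DNA)
    """
    # Several named miRNAs can share one mature sequence, so index by sequence.
    index = {}
    for name, seq in known_mature.items():
        index.setdefault(to_dna(seq), []).append(name)

    hits = {}
    for read, count in read_counts.items():
        names = index.get(to_dna(read))
        if names:
            hits[read] = {"count": count, "matches": sorted(names)}
    return hits


# Toy data for illustration only (not real miRBase sequences or read counts).
toy_known = {
    "toy-miR-1": "UCGAUAAACC",
    "toy-miR-2": "UCGAUAAACC",
    "toy-miR-3": "GGGAAA",
}
toy_reads = {"TCGATAAACC": 12, "TCGATAAAC": 5, "CCCC": 1}
toy_hits = find_conserved(toy_reads, toy_known)
```

Here the read that is one nucleotide shorter than the known sequence is not reported, reflecting the requirement that lengths as well as sequences be identical.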
MiRNA and siRNA expression was analyzed using the Stem Loop RT-qPCR technique according to Glazinska et al. (
The RNA-Seq data and small RNA-Seq data have been uploaded to SRA database and are available under BioProject ID PRJNA419564 and Submission ID SUB3230840.
LuluDB was developed using Hypertext Markup Language (HTML), Sassy Cascading Style Sheets (SCSS), Cascading Style Sheets (CSS), PHP 5.6, Yii 2.0 PHP framework (
All datasets analyzed for this study are included either in the article/
PG: conceptualization, funding acquisition, supervision, and writing—review and editing. PG, MW, and JK: data curation. PG, MK, and WG: investigation, visualization, and writing—original draft. JK and MW: software.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at:
Alignment of sequences of corresponding DCL1-coding transcripts expressed in flowers and pods.
Screenshot of LuluDB page concerning:
Screenshot of LuluDB page concerning protein-coding RNA sequence.
Juxtaposition of NGS and qPCR expression levels of eight transcripts used for validation. Homologs of the same transcript found in flowers and pods are shown separately.
Details of data deposition in NCBI SRA.
A list of DCL sequences from
Results of LuluDB search by built-in BLASTn using
Results of LuluDB search by built-in BLASTn using
Results of LuluDB search by built-in BLASTn using
Results of LuluDB search by built-in BLASTn using
List of regulatory sequences identified within 5′UTRs of
A detailed list of miRNAs in LuluDB annotated as miR162.
List of target transcripts for members of
List of primers and UPL probes used for RT-qPCR reaction.