MIRNA-DISTILLER: A Stand-Alone Application to Compile microRNA Data from Databases

MicroRNAs (miRNAs) are small non-coding RNA molecules of ∼22 nucleotides which regulate large numbers of genes by binding to seed sequences at the 3′-untranslated region of target gene transcripts. The target mRNA is then usually degraded or its translation is inhibited, thus resulting in posttranscriptional down regulation of gene expression at the mRNA and/or protein level. Due to the bioinformatic difficulties in predicting functional miRNA binding sites, several publicly available databases have been developed that predict miRNA binding sites based on different algorithms. The parallel use of different databases is currently indispensable, but cumbersome and time-consuming, especially when working with numerous genes of interest. We have therefore developed a new stand-alone program, termed MIRNA-DISTILLER, which compiles miRNA data for given target genes from public databases. Currently implemented are TargetScan, microCosm, and miRDB, which may be queried independently, pairwise, or together to calculate the respective intersections. Data are stored locally for application of further analysis tools including freely definable biological parameter filters, customized output lists for both miRNAs and target genes, and various graphical facilities. The software, a data example file, and a tutorial are freely available at http://www.ikp-stuttgart.de/content/language1/html/10415.asp


INTRODUCTION
MicroRNAs (miRNAs) are single-stranded, non-coding RNA molecules consisting of about 22 nucleotides which generally bind to the 3′-untranslated regions (UTRs) of mRNAs. This usually leads to a down regulation of target gene expression (Bartel, 2009). Following binding to seed sequences of 7-8 nucleotides at the 3′-UTR of target gene transcripts, the target mRNA is either degraded or translation is inhibited, typically resulting in lower expression of protein, although activation of translation has also been reported (Vasudevan et al., 2007; Guo et al., 2010). The large number of miRNAs, estimated at over 1000 different molecules per mammalian species (Berezikov et al., 2005), and their poorly defined binding specificities allow for a vast number of potential miRNA-target gene interactions, which have been estimated to affect expression of one third or even more of all genes in the genome (Lim et al., 2005).
However, systematic investigation is still hampered by the lack of efficient experimental and reliable bioinformatics methods for target identification. Several dedicated public databases are now available, each of them using a specific bioinformatics or computational strategy to predict miRNA-target sites. The different algorithms result in predictions with little overlap regarding number and identity of miRNAs for a given target, or of targets for a given miRNA (Sethupathy et al., 2006). Because of these basic limitations of in silico prediction, the parallel use of several databases is necessary to enhance the sensitivity and/or specificity of prediction, and is therefore indispensable in this field. Currently there are at least two databases, miRecords and miRò (Laganà et al., 2009; Xiao et al., 2009), which compile miRNA predictions from different databases. Although these tools are useful for some applications, we noticed a lack of practical solutions for the management of large prediction data sets adapted to the needs of the wet-lab researcher.
We have therefore developed MIRNA-DISTILLER, a stand-alone program which automatically extracts selected data for miRNAs predicted to interact with a given set of target genes from several publicly available databases (in the current version including TargetScan, microCosm, and miRDB). The program calculates intersections and stores data locally for application of further analysis tools (Figure 1). For example, adjustment of binding scores provided by each database and freely definable filter definitions for additional miRNA properties (e.g., tissue-specific expression profiles) allow effective selection of candidate miRNAs for a given gene of interest. The MIRNA-DISTILLER thus facilitates miRNA bioinformatics and helps to focus experimental validation on the most promising candidates.

ISOLATION AND QUANTIFICATION OF miRNAs
For profiling expression of miRNAs, four pools of liver total RNA, each consisting of 10 different liver donor tissues, were prepared. The origin of the human liver samples has been described before (Klein et al., 2010). In brief, liver tissues were previously collected from patients undergoing liver surgery at the Department of General, Visceral, and Transplantation Surgery (A. K. Nuessler, P. Neuhaus, Campus Virchow, University Medical Center Charité, Humboldt University Berlin, Germany). The study was approved by the ethics committees of the medical faculties of the Charité, Humboldt University, and of the University of Tuebingen and conducted in accordance with the Declaration of Helsinki. Written informed consent was obtained from each patient. All tissue samples had been examined by a pathologist and only histologically normal liver tissue was collected and stored at −80°C. Total RNA was extracted using the mirVANA kit from Ambion (Austin, TX, USA). Reverse transcription of total RNA (1 μg) was performed by using stem-loop RT primer pool A v 2.0 from Applied Biosystems (Foster City, CA, USA) following the protocol provided by the manufacturer. qPCR for miRNA expression was performed with microfluidic cards (TaqMan® Low Density Array A v 2.0 for humans; Applied Biosystems) and detected with the ABI 7900HT Fast real time RT-PCR system. Relative quantification (RQ) was calculated by normalization to the endogenous control MammU6 as $RQ = 2^{-\Delta Ct}$, where ΔCt is the difference between the Ct of the miRNA and the Ct of MammU6.
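The relative quantification described above can be illustrated with a short calculation. This is a minimal sketch, assuming that ΔCt is the Ct of the miRNA minus the Ct of the endogenous control; the function name and the Ct values are hypothetical and are not data from this study.

```python
def relative_quantity(ct_mirna: float, ct_control: float) -> float:
    """Return RQ = 2 ** (-delta_ct), with delta_ct = Ct(miRNA) - Ct(control)."""
    delta_ct = ct_mirna - ct_control
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values for illustration only.
example_rq = relative_quantity(ct_mirna=24.0, ct_control=22.0)
print(example_rq)  # 0.25
```

A miRNA that crosses the threshold two cycles later than the control is thus reported at one quarter of the control level.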

PROGRAM DESCRIPTION
GENERAL IMPLEMENTATION
MIRNA-DISTILLER was programmed in Visual Basic .NET (Microsoft). The software, a data example file, and a tutorial are freely available at http://www.ikp-stuttgart.de/content/language1/html/10415.asp

SYSTEM REQUIREMENTS
The MIRNA-DISTILLER is an executable stand-alone software which runs without installation on the Microsoft Windows operating systems Windows Vista and Windows 7 (XP and previous versions need to have the .NET Framework 2.0 or higher installed).
Database selection: The current program version (MIRNA-DISTILLER Version 1.0.0.5) connects to TargetScan (Lewis et al., 2005; Grimson et al., 2007; Friedman et al., 2009), microCosm (Griffiths-Jones et al., 2006, 2008), and miRDB (Wang, 2008). These databases have been selected because they are among the most comprehensive ones and because they share a similar search format, which allowed development of a search mask that accepts the official gene name, thus avoiding the need to copy and paste sequences or gene IDs.

QUERY INTERFACE
On the main interface, genes of interest are entered by their official gene name with the "New Gene" button into the "Genes" tab (Figure 2). Any number of genes, from a single gene to several hundred, can be entered. The program then automatically downloads miRNA prediction data and corresponding binding scores from each selected database for local storage. Selecting one of the previously loaded genes in the left window provides a summary table for the predicted miRNAs, where miRNAs predicted by all three databases are highlighted in blue. The total number of miRNAs predicted by each database is visualized in a bar diagram.
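The intersection step can be sketched as plain set operations. The following is an illustrative sketch only, not the program's Visual Basic .NET implementation; the function name `database_intersections` and the miRNA lists are hypothetical.

```python
from itertools import combinations


def database_intersections(predictions):
    """Map every combination of databases to the miRNAs they all predict."""
    names = list(predictions)
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Intersect the miRNA sets of all databases in this combination.
            result[combo] = set.intersection(*(predictions[n] for n in combo))
    return result


# Hypothetical predictions for one gene; not real database output.
predictions = {
    "TargetScan": {"miR-122", "miR-21", "miR-27a"},
    "microCosm": {"miR-122", "miR-27a", "miR-148a"},
    "miRDB": {"miR-122", "miR-21"},
}
overlaps = database_intersections(predictions)
print(overlaps[("TargetScan", "microCosm", "miRDB")])  # {'miR-122'}
```

Single-database entries simply return that database's own list, so the same structure covers independent, pairwise, and three-way queries.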
The gene-linked "miRNA Ranking" tab by default provides a list with all miRNAs ranked according to the number of their target genes (Figure 3). Additional sorting options, including alphabetical, are also supported. Once the miRNA data have been retrieved for a given gene set, the gene targets predicted for each miRNA can also be displayed with corresponding binding scores and listed according to the database. This feature thus allows the user to identify possible target genes for a given miRNA.
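The ranking and the reverse lookup described above amount to counting and inverting a gene-to-miRNA mapping. A minimal sketch under that assumption follows; the function names, gene labels, and miRNA assignments are hypothetical.

```python
from collections import Counter


def rank_mirnas(gene_to_mirnas):
    """Rank miRNAs by number of target genes, ties broken alphabetically."""
    counts = Counter(m for mirnas in gene_to_mirnas.values() for m in mirnas)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def targets_of(mirna, gene_to_mirnas):
    """Return the loaded genes for which the given miRNA was predicted."""
    return sorted(g for g, mirnas in gene_to_mirnas.items() if mirna in mirnas)


# Hypothetical gene set; placeholder gene labels, not real predictions.
gene_to_mirnas = {
    "GENE_A": {"miR-122", "miR-21"},
    "GENE_B": {"miR-122", "miR-27a"},
    "GENE_C": {"miR-122", "miR-21"},
}
print(rank_mirnas(gene_to_mirnas))
# [('miR-122', 3), ('miR-21', 2), ('miR-27a', 1)]
print(targets_of("miR-21", gene_to_mirnas))  # ['GENE_A', 'GENE_C']
```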

FILTERING OPTIONS
Two different filters are available, to be used separately or in combination. The "Actual Minimum Score" filter limits all data according to the selected minimum score. "Filter Definitions" allows the definition of user-specific filters based on additional miRNA properties, e.g., those obtained from the researcher's own investigations. A typical application of this filter is to remove miRNAs not expressed in the tissue of interest, based on experimental data.
User-specific filtering of the data with the "Filter Definitions" option supports a broad spectrum of adaptations by using classical operators (e.g., bigger, smaller, is part of). The combination of user-defined miRNA properties and filter definitions supports flexible data management.
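The two filter types can be sketched as follows. This is a hedged illustration, not the program's own code: the `Prediction` record, the function names, the property name `liver_rq`, and all values are hypothetical, and the sketch assumes that a higher score means a stronger prediction. Score scales differ between databases, so a real threshold would have to be chosen per database.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    mirna: str
    gene: str
    database: str
    score: float


# Operator names taken from the text; the mapping itself is an assumption.
OPERATORS = {
    "bigger": operator.gt,
    "smaller": operator.lt,
    "is part of": lambda value, reference: value in reference,
}


def min_score_filter(predictions, min_score):
    """Keep predictions whose score reaches the selected minimum."""
    return [p for p in predictions if p.score >= min_score]


def property_filter(predictions, properties, name, op, reference):
    """Keep predictions whose miRNA has a property satisfying the operator."""
    compare = OPERATORS[op]
    kept = []
    for p in predictions:
        value = properties.get(p.mirna, {}).get(name)
        if value is not None and compare(value, reference):
            kept.append(p)
    return kept


# Hypothetical predictions and user-supplied liver expression values.
predictions = [
    Prediction("miR-122", "GENE_A", "miRDB", 85.0),
    Prediction("miR-21", "GENE_A", "miRDB", 60.0),
    Prediction("miR-9", "GENE_A", "miRDB", 90.0),
]
properties = {
    "miR-122": {"liver_rq": 12.5},
    "miR-21": {"liver_rq": 3.1},
    "miR-9": {"liver_rq": 0.0},
}

strong = min_score_filter(predictions, 80.0)
expressed = property_filter(strong, properties, "liver_rq", "bigger", 0.0)
print([p.mirna for p in expressed])  # ['miR-122']
```

Chaining the two functions mirrors the combined use of both filters: the score filter removes weak predictions first, and the property filter then drops miRNAs without measurable expression in the tissue of interest.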

EXAMPLE OF GENE-miRNA INTERACTIONS
We analyzed 62 selected ADME genes (i.e., genes involved in the absorption, distribution, metabolism, or excretion of drugs and foreign chemicals) for miRNA binding sites. In order to restrict the prediction data to liver-expressed miRNAs, we measured miRNA expression in pooled human liver samples. The ADME genes are listed in Table 1 with their total number of predicted miRNA binding sites compiled with the MIRNA-DISTILLER from all three databases. After defining and applying a filter for liver-specific miRNAs based on the experimental expression data, between 10 and 67% (average 25%) of the predictions remained, thus effectively reducing the number of relevant predictions for each gene.

FIGURE 2 | Main interface. The left window lists the genes of interest that have been entered before, the middle window contains the predicted miRNAs with corresponding scores for each selected database and the intersection list, and the right window displays the selected intersection graphically.

FIGURE 3 | miRNA Ranking. All miRNAs are ranked by their occurrence in the preloaded gene set. It is also possible to sort the miRNAs alphabetically. Furthermore, the program can list the predicted target genes for all previously selected miRNAs.

EXAMPLE OF PUBLISHED GENE-miRNA INTERACTIONS
To further demonstrate the usefulness of parallel database searches as a proof of concept, 42 published targets of miR-122, the major liver-specific miRNA (Gramantieri et al., 2007; Filipowicz and Grosshans, 2011), were loaded into the MIRNA-DISTILLER from miRTarBase (http://mirtarbase.mbc.nctu.edu.tw/index.html). This database has stored more than 3000 miRNA-target interactions collected and curated manually from literature to represent functionally relevant miRNAs. Thus, the miRNA-target interactions in miRTarBase have been experimentally validated by different experimental approaches (Hsu et al., 2011). For these validated miR-122 target genes we would expect the ranking of miRNAs provided by MIRNA-DISTILLER to show miR-122 as the most common miRNA among this set of genes. As shown in Figure 4, miR-122 was indeed found at the top of the ranking list and was predicted to bind to 34 of the 42 genes. Some but not all of the binding sites were predicted by more than one database.

LINKAGE TAB
The "Linkage" tab (Figure 5) is an additional feature that automatically displays a graphical matrix (heatmap) that compares any two genes with respect to their predicted miRNAs, facilitating identification of potentially co-regulated genes. The calculation of this graph takes filter settings automatically into account.
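A pairwise comparison of this kind can be sketched as a small matrix computation. The text does not state how the shared fraction is normalized, so the sketch below assumes the number of shared miRNAs divided by the size of the union of both lists (the Jaccard index); the function name and the data are hypothetical.

```python
def linkage_matrix(gene_to_mirnas):
    """Return {(gene_a, gene_b): fraction of miRNAs shared by both genes}."""
    genes = list(gene_to_mirnas)
    matrix = {}
    for a in genes:
        for b in genes:
            shared = gene_to_mirnas[a] & gene_to_mirnas[b]
            union = gene_to_mirnas[a] | gene_to_mirnas[b]
            # Assumed normalization: shared miRNAs over all miRNAs of the pair.
            matrix[(a, b)] = len(shared) / len(union) if union else 0.0
    return matrix


# Hypothetical filtered predictions; placeholder gene labels.
gene_to_mirnas = {
    "GENE_A": {"miR-122", "miR-21", "miR-27a"},
    "GENE_B": {"miR-21", "miR-27a", "miR-148a"},
    "GENE_C": {"miR-9"},
}
linkage = linkage_matrix(gene_to_mirnas)
print(linkage[("GENE_A", "GENE_B")])  # 0.5
print(linkage[("GENE_A", "GENE_C")])  # 0.0
```

Passing already filtered miRNA lists into the function reproduces the behavior that the matrix reflects the current filter settings.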
Frontiers in Genetics | Non-Coding RNA

ADVANTAGE OVER ALREADY EXISTING TOOLS
MIRNA-DISTILLER is focused on high-throughput screening of miRNA binding sites for a defined gene set in order to find new possible miRNA-gene interactions. Several web-based tools for compiling miRNA data from different databases already exist, e.g., the above-mentioned miRecords (Xiao et al., 2009) and miRò (Laganà et al., 2009). Although these tools also retrieve and compile data from various miRNA databases, they lack some features we consider highly useful in our program. In particular, the other tools cannot store data locally and combine them with the user's own experimental data for specific retrievals and continued work with the data. Immediate graphical display of results has also not been implemented. In this sense our program is unique, as none of the already existing tools has implemented similar options. MIRNA-DISTILLER is therefore a useful tool for miRNA researchers who need to manage and interactively work with very large data sets.

FIGURE 4 | Illustration of the linkage graph. The Linkage tab automatically calculates the fraction of concordant miRNAs predicted for any two genes and displays them as a graphical matrix. This function facilitates identification of potentially co-regulated genes. The calculation of this graph takes filter settings automatically into account.

CONCLUSION
MIRNA-DISTILLER facilitates systematic analyses of miRNA predictions for a given gene set by combining several useful tools in one application. It allows simultaneous data download from up to three common publicly available databases, their comparison, and local data storage for further analysis, including several flexible filter options. The user-friendly interface permits automatic data processing for large numbers of genes and, in combination with the provided tools, helps to prioritize potential miRNA binding sites. In this way, MIRNA-DISTILLER helps to collect and analyze complex miRNA data and to generate hypotheses for experimental validation.

ACKNOWLEDGMENTS
We would like to thank Kathrin Klein and Reiner Hoppe (IKP Stuttgart) for evaluating the program during initial phases of development, and Adrian Schröder (Center for Bioinformatics at the University of Tübingen, Germany) for testing the program. Funding: This work was supported by the German Federal Ministry of Education and Research (BMBF Virtuelle Leber grant 0315756 to Ulrich M. Zanger) and by the Robert Bosch Foundation, Stuttgart, Germany. www.frontiersin.org