Introduction
Planarians are highly regenerative flatworms. They can regrow any missing organ by activating injury and regeneration-specific gene expression programs (Reddien, 2018). The planarian genome encodes over 20,000 genes (Rozanski et al., 2019), and therefore understanding the genetic basis for regeneration in this seemingly simple organism is a complex task. The use of RNA-sequencing (RNA-Seq) for gene expression analysis has transformed the study of planarian gene function, by facilitating the collection of thousands of data points in a single experiment. The accessibility of RNA-Seq technologies is demonstrated by the rapid growth in the number of RNA-Seq libraries from planarians, with over 3,450 planarian RNA BioSamples deposited to the publicly available Sequence Read Archive (Leinonen et al., 2011). The availability of transcriptional profiling has driven the discovery of key factors in planarian regeneration and stem cell (neoblast) biology. This includes the discovery of transcription factors associated with lineage selection and maturation (Forsthoefel et al., 2012; Tu et al., 2015), injury response genes (Wenemoser et al., 2012; Wurtzel et al., 2015), and regulators of neoblast differentiation (Lapan and Reddien, 2012; Zhu et al., 2015).
RNA-Seq data allows, in principle, highly reproducible research using common bioinformatic processing tools that are available to the scientific community. Compared to other research modalities, the format of high-throughput sequencing data is standardized, machine readable, and found in accessible repositories (Cock et al., 2010; Leinonen et al., 2011). However, despite these properties, using or comparing RNA-Seq data across projects and research groups presents several challenges. Data processing and analysis differ between scientists and teams, and continuously evolve based on the available tools, protocols, and technology. Moreover, RNA-Seq data is commonly found in raw fastq format, requiring computational proficiency and resources for even basic analysis. Even processed RNA-Seq data, often available as supplementary information in manuscripts, can be difficult to use. For example, different transcriptome assemblies are used in the planarian community for RNA-Seq data mapping, which limits comparisons of available data. Major planarian computational resources, such as PlanMine and Planosphere, have become instrumental for the research community by providing powerful tools for studying gene function and genome browsing using a gene- or transcript-focused interface (Rozanski et al., 2019; Nowotarski et al., 2021). Yet, a major strength of RNA-Seq data is the ability to examine changes in expression of multiple transcripts simultaneously.
Here, we curated published planarian RNA-Seq data and implemented a computational pipeline, based on broadly used RNA-Seq analysis tools, for analyzing gene expression. We developed a web-application, PLANAtools, that facilitates browsing, mining, downloading, and visualizing planarian RNA-Seq data interactively, requiring minimal computational skills and resources from the user.
Results
Data retrieval and processing
A list of deposited Schmidtea mediterranea bulk RNA-Seq metadata was retrieved from the NCBI Sequence Read Archive (Leinonen et al., 2011). The list was curated manually and each library was associated with its controls and replicates. Each set of replicates and their controls was processed as a single biological experiment. Raw RNA-Seq libraries were retrieved from the SRA using the fasterq-dump tool (Leinonen et al., 2011) or directly through the bowtie2 --sra-acc parameter (Langmead and Salzberg, 2012). Raw RNA-Seq data was mapped to the planarian dd_v6 transcriptome assembly (Rozanski et al., 2019) using bowtie2 with the --fast parameter. Paired-end RNA-Seq data was mapped as single-end by considering only the first read in each read pair. The mapped data was converted to Binary Alignment Map (BAM) format using SAMtools (Li et al., 2009). For each biological experiment (i.e., replicates and controls), a read count table was produced using featureCounts v2 with parameters [-M -s 0] (Liao et al., 2014). Data normalization and differential gene expression calling were then performed with DESeq2, using the DESeqDataSetFromMatrix and rlog functions (Love et al., 2014). Differential gene expression analysis was not performed on conditions for which no biological replicates were available. Processed data tables were then indexed using the R data.table package and stored as binary files using the saveRDS function.
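The retrieval, mapping, and counting steps described above could be strung together roughly as follows. This is a minimal sketch in Python, not the authors' pipeline code: the function name build_experiment_commands, the accession placeholders, the index prefix, the file layout, and the use of a SAF annotation file for featureCounts are all assumptions made for illustration. Only the bowtie2 --fast setting and the featureCounts -M -s 0 parameters are taken from the text. The sketch assumes single-end libraries whose reads land in <accession>.fastq, and it only assembles command lines without executing anything.

```python
def build_experiment_commands(accessions, index_prefix, annotation_saf, workdir="work"):
    """Assemble command lines for one biological experiment.

    accessions: SRA run accessions of all replicates and controls
    (hypothetical placeholders in the usage below).
    Returns a list of argument lists; nothing is executed here.
    """
    commands = []
    bam_files = []
    for acc in accessions:
        fastq = f"{workdir}/{acc}.fastq"  # assumed fasterq-dump output name
        sam = f"{workdir}/{acc}.sam"
        bam = f"{workdir}/{acc}.bam"
        # 1) Retrieve the raw reads from the SRA.
        commands.append(["fasterq-dump", acc, "--outdir", workdir])
        # 2) Map the reads as single-end against the transcriptome index.
        commands.append(["bowtie2", "--fast", "-x", index_prefix, "-U", fastq, "-S", sam])
        # 3) Convert the SAM alignments to BAM.
        commands.append(["samtools", "view", "-b", "-o", bam, sam])
        bam_files.append(bam)
    # 4) One count table per experiment, covering all of its libraries.
    counts = f"{workdir}/experiment.counts.txt"
    commands.append(
        ["featureCounts", "-M", "-s", "0", "-F", "SAF",
         "-a", annotation_saf, "-o", counts] + bam_files
    )
    return commands


# Hypothetical accessions and file names, for illustration only.
example = build_experiment_commands(
    ["SRR0000001", "SRR0000002"], "dd_v6_index", "dd_v6.saf"
)
```

Each returned argument list could be passed to subprocess.run(cmd, check=True) to execute the step. The resulting count table would then be loaded into DESeq2 in R, which falls outside this sketch.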
Application implementation
PLANAtools is accessible from https://wurtzellab.org/planatools. It was developed in R using the Shiny framework and packaged as a Docker image. The current release includes gene expression analyses of 168 assays from over 40 manuscripts. Data browsing was implemented to accomplish four main uses (Figure 1): 1) Showing gene expression in published datasets as a heatmap or volcano plot, based on transcripts selected by the user. Transcript selection is performed using free text search of transcript IDs or their putative annotation, as previously described (Dagan et al., 2022). 2) Showing the differentially expressed genes of an assay as a heatmap, a volcano plot, or a table, for the datasets that are available and analyzed with PLANAtools. 3) Presenting gene expression across datasets by comparing the expression of a single gene in control and treatment samples across assays. 4) Intersecting a list of queried genes with the list of differentially expressed genes in an assay. The resulting visualizations are interactive online and downloadable as vector images for further processing. The processed gene expression tables and differential gene expression analyses are also available for download.
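Two of the uses listed above, free text transcript selection and intersecting a queried gene list with the differentially expressed genes of an assay, can be illustrated with a short Python sketch. It is not the PLANAtools implementation, which is written in R. The function names, the toy transcript IDs and annotations, and the cutoffs (adjusted p-value below 0.05 and absolute log2 fold change of at least 1) are assumptions chosen for illustration, and all values in the table are made up.

```python
def search_transcripts(table, query):
    """Return IDs whose transcript ID or annotation contains the query (case-insensitive)."""
    q = query.lower()
    return [
        row["id"]
        for row in table
        if q in row["id"].lower() or q in row["annotation"].lower()
    ]


def intersect_with_de(query_ids, table, padj_cutoff=0.05, min_abs_log2fc=1.0):
    """Keep queried IDs that pass the (assumed) differential expression cutoffs."""
    de_ids = {
        row["id"]
        for row in table
        # A missing adjusted p-value (None) is never called significant.
        if row["padj"] is not None
        and row["padj"] < padj_cutoff
        and abs(row["log2fc"]) >= min_abs_log2fc
    }
    return [tx for tx in query_ids if tx in de_ids]


# Toy differential expression table; IDs, annotations, and numbers are invented.
toy_table = [
    {"id": "toy_tx_1", "annotation": "piwi-like", "log2fc": -2.5, "padj": 0.001},
    {"id": "toy_tx_2", "annotation": "collagen", "log2fc": 0.2, "padj": 0.8},
    {"id": "toy_tx_3", "annotation": "piwi-like 2", "log2fc": 1.4, "padj": 0.2},
    {"id": "toy_tx_4", "annotation": "tubulin", "log2fc": 3.1, "padj": None},
]

hits = search_transcripts(toy_table, "PIWI")  # ["toy_tx_1", "toy_tx_3"]
overlap = intersect_with_de(["toy_tx_1", "toy_tx_3", "toy_tx_4"], toy_table)  # ["toy_tx_1"]
```

In the toy table only toy_tx_1 passes both cutoffs, so it is the only queried transcript that remains after the intersection.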
FIGURE 1
Usage
PLANAtools is a hub for gene expression analyses that facilitates data visualization, browsing, and acquisition. In addition, PLANAtools provides links to the original publication, to the raw deposited data in SRA (Leinonen et al., 2011), and to major organism-specific gene resources, such as FlyBase, PlanMine, and WormBase (Rozanski et al., 2019; Thurmond et al., 2019; Harris et al., 2020). A video tutorial on the website demonstrates the major features of the application.
Summary
PLANAtools provides non-experts access to planarian gene expression data, gene annotations, and external data sources. In contrast to most planarian resources, it is not gene- or transcript-centric. Instead, it allows an overview of gene expression using interactive visualizations and direct downloads of the original data. Therefore, it bridges the gaps in the ability to access, use, and re-use planarian gene expression resources. Periodic updates to PLANAtools are planned, pursuant to the availability of newly deposited planarian gene expression datasets.
Statements
Data availability statement
Publicly available datasets were analyzed in this study. This data can be found here: https://wurtzellab.shinyapps.io/planatools/.
Author contributions
MH and OW designed and implemented the research. MH and OW wrote the manuscript.
Funding
This work is supported by the European Research Council Horizon 2020 research and innovation programme (No. 853640) and the Israel Science Foundation (grant 2039/18). OW is a Zuckerman Faculty Scholar.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Cock, P. J. A., Fields, C. J., Goto, N., Heuer, M. L., and Rice, P. M. (2010). The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 38, 1767–1771. doi: 10.1093/nar/gkp1137
Dagan, Y., Yesharim, Y., Bonneau, A. R., Frankovits, T., Schwartz, S., Reddien, P. W., et al. (2022). m6A is required for resolving progenitor identity during planarian stem cell differentiation. EMBO J. 41, e109895. doi: 10.15252/embj.2021109895
Fincher, C. T., Wurtzel, O., De Hoog, T., Kravarik, K. M., and Reddien, P. W. (2018). Cell type transcriptome atlas for the planarian Schmidtea mediterranea. Science 360, eaaq1736. doi: 10.1126/science.aaq1736
Forsthoefel, D. J., James, N. P., Escobar, D. J., Stary, J. M., Vieira, A. P., Waters, F. A., et al. (2012). An RNAi screen reveals intestinal regulators of branching morphogenesis, differentiation, and stem cell proliferation in planarians. Dev. Cell 23, 691–704. doi: 10.1016/j.devcel.2012.09.008
Galili, T., O'Callaghan, A., Sidi, J., and Sievert, C. (2018). Heatmaply: an R package for creating interactive cluster heatmaps for online publishing. Bioinformatics 34, 1600–1602. doi: 10.1093/bioinformatics/btx657
Harris, T. W., Arnaboldi, V., Cain, S., Chan, J., Chen, W. J., Cho, J., et al. (2020). WormBase: A modern model organism information resource. Nucleic Acids Res. 48, D762–D767. doi: 10.1093/nar/gkz920
Langmead, B., and Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359. doi: 10.1038/nmeth.1923
Lapan, S. W., and Reddien, P. W. (2012). Transcriptome analysis of the planarian eye identifies ovo as a specific regulator of eye regeneration. Cell Rep. 2, 294–307. doi: 10.1016/j.celrep.2012.06.018
Leinonen, R., Sugawara, H., and Shumway, M. (2011). The sequence read archive. Nucleic Acids Res. 39, D19–D21. doi: 10.1093/nar/gkq1019
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., et al. (2009). The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079. doi: 10.1093/bioinformatics/btp352
Liao, Y., Smyth, G. K., and Shi, W. (2014). featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930. doi: 10.1093/bioinformatics/btt656
Love, M. I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550. doi: 10.1186/s13059-014-0550-8
Nowotarski, S. H., Davies, E. L., Robb, S. M. C., Matentzoglu, N., Doddihal, V., et al. (2021). Planarian anatomy ontology: A resource to connect data within and across experimental platforms. Development 148, dev196097. doi: 10.1242/dev.196097
Reddien, P. W. (2018). The cellular and molecular basis for planarian regeneration. Cell 175, 327–345. doi: 10.1016/j.cell.2018.09.021
Rozanski, A., Moon, H., Brandl, H., Martin-Duran, J. M., Grohme, M. A., Huttner, K., et al. (2019). PlanMine 3.0-improvements to a mineable resource of flatworm biology and biodiversity. Nucleic Acids Res. 47, D812–D820. doi: 10.1093/nar/gky1070
Thurmond, J., Goodman, J. L., Strelets, V. B., Attrill, H., Gramates, L. S., Marygold, S. J., et al. (2019). FlyBase 2.0: The next generation. Nucleic Acids Res. 47, D759–D765. doi: 10.1093/nar/gky1003
Tu, K. C., Cheng, L. C., Vu, H. T. K., Lange, J. J., McKinney, S. A., Seidel, C. W., et al. (2015). Egr-5 is a post-mitotic regulator of planarian epidermal differentiation. eLife 4, e10501. doi: 10.7554/eLife.10501
Wenemoser, D., Lapan, S. W., Wilkinson, A. W., Bell, G. W., and Reddien, P. W. (2012). A molecular wound response program associated with regeneration initiation in planarians. Genes Dev. 26, 988–1002. doi: 10.1101/gad.187377.112
Wurtzel, O., Cote, L. E., Poirier, A., Satija, R., Regev, A., and Reddien, P. W. (2015). A generic and cell-type-specific wound response precedes regeneration in planarians. Dev. Cell 35, 632–645. doi: 10.1016/j.devcel.2015.11.004
Wurtzel, O., Oderberg, I. M., and Reddien, P. W. (2017). Planarian epidermal stem cells respond to positional cues to promote cell-type diversity. Dev. Cell 40, 491–504.e5. doi: 10.1016/j.devcel.2017.02.008
Zhu, S. J., Hallows, S. E., Currie, K. W., Xu, C., and Pearson, B. J. (2015). A mex3 homolog is required for differentiation during planarian stem cell lineage development. eLife 4, e07025. doi: 10.7554/eLife.07025
Keywords
planarian, database, gene expression, regeneration, RNAseq, data visualization
Citation
Hoffman M and Wurtzel O (2023) PLANAtools—An interactive gene expression repository for the planarian Schmidtea mediterranea. Front. Cell Dev. Biol. 11:1149537. doi: 10.3389/fcell.2023.1149537
Received
22 January 2023
Accepted
16 March 2023
Published
23 March 2023
Volume
11 - 2023
Edited by
Stefano Tiozzo, Université Paris-Sorbonne, France
Reviewed by
Maja Adamska, Australian National University, Australia
Peter Ladurner, University of Innsbruck, Austria
Copyright
© 2023 Hoffman and Wurtzel.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Omri Wurtzel, owurtzel@tauex.tau.ac.il
This article was submitted to Evolutionary Developmental Biology, a section of the journal Frontiers in Cell and Developmental Biology