Original Research ARTICLE
Weighted gene co-expression analyses point to long non-coding RNA hub genes at different Schistosoma mansoni life-cycle stages
- 1Butantan Institute, Brazil
- 2Programa de Pós-graduação em Bioinformática, Instituto de Matemática e Estatística da Universidade de São Paulo, Brazil
- 3Institute of Chemistry, University of São Paulo, Brazil
Long non-coding RNAs (lncRNAs; >200 nt) are expressed at lower levels than protein-coding mRNAs, and in all eukaryotic model species where they have been characterized, they are transcribed from thousands of different genomic loci. In humans, some four dozen lncRNAs have been studied in detail, and they have been shown to play important roles in transcriptional regulation, acting in conjunction with transcription factors and epigenetic marks to modulate the tissue-type-specific programs of transcriptional gene activation and repression. In Schistosoma mansoni, around ten thousand lncRNAs have been identified in previous studies. However, the limited number of RNA-seq libraries previously assessed, together with the use of old and incomplete versions of the S. mansoni genome and protein-coding transcriptome annotations, has hampered the identification of all lncRNAs expressed in the parasite. Here we have used 633 publicly available S. mansoni RNA-seq libraries from whole worms at different stages (n=121), from isolated tissues (n=24), from cell populations (n=81) and from single cells (n=407). We have assembled a set of 16,583 lncRNA transcripts originating from 10,024 genes, of which 11,022 are novel S. mansoni lncRNA transcripts, while the remaining 5,561 transcripts comprise 120 lncRNAs that are identical to, and 5,441 lncRNAs that have gene overlap with, S. mansoni lncRNAs reported in previous studies. Most importantly, our more stringent assembly and filtering pipeline has identified and removed a set of 4,293 lncRNA transcripts from previous publications that were in fact derived from partially processed mRNAs with intron retention. We have used weighted gene co-expression network analyses and identified 15 different gene co-expression modules.
Each parasite life-cycle stage has at least one highly correlated gene co-expression module, and each module comprises hundreds to thousands of lncRNAs and mRNAs with correlated expression patterns at different stages. Inspection of the most highly connected genes within the modules’ networks has shown that different lncRNAs are hub genes at different life-cycle stages, which places them among the most promising candidate lncRNAs for functional characterization.
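The idea of a hub gene in a weighted co-expression network can be illustrated with a minimal sketch. It is not the authors' pipeline: the unsigned adjacency |cor|^beta, the soft-threshold power beta=6, the toy expression matrix, and the gene names (`lnc_hub`, `mRNA_a`, `mRNA_b`, `mRNA_out`) are all assumptions made for illustration. Genes are ranked by intramodular connectivity, that is, the sum of their adjacencies to the other genes in the same module.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency |cor|^beta from a samples x genes matrix.

    beta=6 is an assumed soft-threshold power, not a value from the article.
    """
    cor = np.corrcoef(expr, rowvar=False)   # gene-by-gene Pearson correlation
    adj = np.abs(cor) ** beta               # soft thresholding
    np.fill_diagonal(adj, 0.0)              # ignore self-connections
    return adj

def rank_hub_genes(adj, gene_ids, module_mask):
    """Rank the genes of one module by intramodular connectivity (descending)."""
    mask = np.asarray(module_mask, dtype=bool)
    sub = adj[np.ix_(mask, mask)]           # adjacency restricted to the module
    k_within = sub.sum(axis=1)              # intramodular connectivity per gene
    ids = np.asarray(gene_ids)[mask]
    order = np.argsort(-k_within)
    return [(str(ids[i]), float(k_within[i])) for i in order]

# Hypothetical toy data: 4 samples (rows) x 4 genes (columns).
x = np.array([1.0, -1.0, 1.0, -1.0])
y = np.array([1.0, 1.0, -1.0, -1.0])
z = np.array([1.0, -1.0, -1.0, 1.0])
expr = np.column_stack([x + y, x, y, z])    # first gene co-varies with x and y
gene_ids = ["lnc_hub", "mRNA_a", "mRNA_b", "mRNA_out"]
module_mask = [True, True, True, False]     # assumed module membership

adj = soft_threshold_adjacency(expr, beta=6)
ranked = rank_hub_genes(adj, gene_ids, module_mask)
for gene, k in ranked:
    print(f"{gene}\t{k:.3f}")
```

In this toy module, `lnc_hub` ranks first because it correlates with both other members, whereas `mRNA_a` and `mRNA_b` correlate only with it and not with each other.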
Keywords: Parasitology, RNA-Seq (quantification) analysis, Single-cell sequencing data, Schistosoma mansoni, Long non-coding RNAs (lncRNAs), Weighted gene coexpression network analysis
Received: 04 Apr 2019;
Accepted: 09 Aug 2019.
Copyright: © 2019 Maciel, Morales-Vicente, Silveira, Ribeiro, Olberg, Pires, Amaral and Verjovski-Almeida. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Prof. Sergio Verjovski-Almeida, Butantan Institute, São Paulo, São Paulo, Brazil, email@example.com