Gene Expression Profiling in the Injured Spinal Cord of Trachemys scripta elegans: An Amniote with Self-Repair Capabilities

Slider turtles are the only known amniotes with self-repair mechanisms of the spinal cord that lead to substantial functional recovery. Their strategic phylogenetic position makes them a relevant model to investigate the peculiar genetic programs that allow anatomical reconnection in some vertebrate groups but are absent in others. Here, we analyze the gene expression profile of the response to spinal cord injury (SCI) in the turtle Trachemys scripta elegans. We found that this response comprises more than 1000 genes affecting diverse functions: reaction to ischemic insult, extracellular matrix re-organization, cell proliferation and death, immune response, and inflammation. Genes related to synapses and cholesterol biosynthesis are down-regulated. The analysis of the evolutionary distribution of these genes shows that almost all are present in most vertebrates. Additionally, we failed to find genes that were exclusive of regenerating taxa. The comparison of expression patterns among species shows that the response to SCI in the turtle is more similar to that of mice and non-regenerative Xenopus than to Xenopus during its regenerative stage. This observation, along with the lack of conserved “regeneration genes” and the current accepted phylogenetic placement of turtles (sister group of crocodilians and birds), indicates that the ability of spinal cord self-repair of turtles does not represent the retention of an ancestral vertebrate character. Instead, our results suggest that turtles developed this capability from a non-regenerative ancestor (i.e., a lineage specific innovation) that was achieved by re-organizing gene expression patterns on an essentially non-regenerative genetic background. Among the genes activated by SCI exclusively in turtles, those related to anoxia tolerance, extracellular matrix remodeling, and axonal regrowth are good candidates to underlie functional recovery.

We tested 4 pipelines: Trans-abyss, Cap3 + Abyss, Oases and Trinity, aiming to determine which one performs better in our dataset, namely optimizing two parameters: the completeness of the reconstructed transcripts and the number of transcripts that are effectively recovered The test was conducted as follows: a group of 62 full length T. scripta mRNAs (downloaded from GenBank which were obtained by Sanger sequencing from individually isolated mRNA molecules) was used as the gold standard. For each of the assemblers we measured the number of mRNA that were recovered and the % of completeness. A Q scores, which was defined as the weighted averaged of the quantity and quality of reconstructed mRNAs (Q=∑NiPi., where P is the proportion of the reference mRNA that is well reconstructed (i.e. completeness) and Ni is the number of mRNAs reconstructed at a Pi proportion, further details described in Greif et al, 2013). As shown in the figure presented below, Trinity was the assembler which performed the best. In effect, even if the combination of Abyss+CAP3 reconstructed a higher proportion of mRNAs, the completeness of its reconstruction are considerably lower than that of Trinity. In turn, Oases and Abyss(alone) clearly down-performed in comparison to the other two. This final assembly obtained by Trinity, and used for downstream analysis, consisted of 244912 contigs with a N50 of 1143 bp. The assembly was done using both paired-end and single reads (orphan pairs after filtering) as described by the developers of Trinity.

Orthology assignment
Identification of orthologs was performed using a strategy that combined an automatic detection step using the program fastortho (http://enews.patricbrc.org/fastortho/) after which we integrated other sources of information such inspection of phylogenetic trees and online web tools containing databases of vertebrate orthologous genes. Valentin-Kahan et al., 2017 For fastortho program, the inputs were multifasta files containing all annotated genes (amino acid sequences) from each species. Fastortho uses blastp with a default e-value of 1e-5. From an initial group of homologous genes, fastortho attempts to distinguish between orthologs and paralogs using the Markov Cluster algorithm (MCL; Van Dongen 2000; http://micans.org/mcl/) in the same way as Orthomcl (Li, Stoeckert and Roos, Genome Res. 2003.). For powering the entries in the similarity matrix we used the default inflation value=1.5 (MCL, powers both the matrix and the individual entries).
The set of COGs (Clusters of Orthologous Groups) obtained by using fastortho was submitted to additional checks and refinements. First we addressed special situations; these are the cases of several genes that were missing in the genome of the turtle Pelodiscus sinensis that we suspected could be the result of incomplete genome assembly/annotation. This was done by searching for their presence in the genome of another turtle species (Chelonia mydas). The results of this study are presented in table S4a. Blastn, and tblastn were used to search in the genomes of Gallus gallus and Anolis carolinensis. Some genes we suspected their absence could also be attributed to incomplete assembly/annotation (for these two species we also used HCOP, see later).
We also checked in detail problematic COGs having inconsistencies and gene absences. The former refers to situations with higher or lower % of amino acid identity than what would be expected according to the phylogeny. For instance, turtles and human with significantly higher amino acid identity % than turtles and chicken. These "problematic" COGs, were analyzed by visual inspection of reciprocal hits in NCBI blastp and inspection of the phylogenetic tree built with the group of homologs (also examining the phylogenetic trees that are prebuilt in databases of homologs). In many cases the inconsistencies could be attributed to the fact that the annotation in one of the problematic species includes only a gene segment (having lower or higher conservation than the average of the gene). In other cases the inconsistencies were due to incorrect orthology assignment, and these were manually corrected. Other type of COGs considered "problematic" are those having gene absences. An important source of information for checking these latter COGs consisted in searching the presence of homologs in databases of othologous genes. This search was done on the basis of standard gene symbols (gene names). For this purpose it was necessary, in some cases, to annotate the C. picta sequence since had no gene symbol associated to them. To conduct databese searches we used HCOP (http://www.genenames.org/cgi-bin/hcop), which is a meta-database of orthologous genes. HCOP integrates data from different sources, including other databases of orthologs such as OrthoDB (http://www.orthodb.org/), OrthoMCL (http://orthomcl.org/orthomcl/), NCBI Homologene (https://www.ncbi.nlm.nih.gov/homologene), Ensembl, Phylomedb (http://phylomedb.org/) among others. HCOP also includes data from species specific databases (which contain manually curated information) such as Zebrafish (ZFIN http://zfin.org/), Xenopus (Xenbase, http://www.xenbase.org/), Chicken (Birdbase http://birdgenenames.org/), etc.
In a few cases there were conflicting results between the fastortho results (which in turn rely on blastp) obtained by us, and those indicated by HCOP. That is, our results indicated that a given gene was absent in a given species, nevertheless an otrtholog to the gene in question was reported in the database. Whenever these contradictions appeared we proceeded as follows: the gene was marked as uncertain in figure 6 (violet) and table S4, and we also used psi-blast, a more sensitive homology search program, to look for it in the corresponding virtual proteome. Another source of data used to search for missing genes consisted in examining the genomes of other species that are comparatively close to the species where the gene is missing (such as C. midas for the case of P. sinesis, rats in the case of mice). Valentin-Kahan et al., 2017 All the information concerning the conflicting evidence is detailed in table S4 for each problematic gene.           Note that the non-matching positions between the species are scattered along the sequences, therefore a substantial proportion of RNseq reads will fail to map onto the C. picta genomic sequence. This is shown in panel B that corresponds to screenshots of mapping between reads