Non-coding RNAs Potentially Controlling Cell Cycle in the Model Caulobacter crescentus: A Bioinformatic Approach

Caulobacter crescentus represents a remarkable model system to investigate global regulatory programs in bacteria. In particular, several decades of intensive study have revealed that its cell cycle is controlled by a cascade of master regulators, such as DnaA, GcrA, CcrM, and CtrA, that activate the functions required to progress through DNA replication, cell division and morphogenesis of polar structures (flagellum and stalk). Several post-translational (phosphorylation and proteolysis) and transcriptional mechanisms are involved in accomplishing this task. Surprisingly, the role of non-coding RNAs (ncRNAs) in regulating the cell cycle has not been investigated. Here we describe a bioinformatic analysis suggesting that ncRNAs may play a crucial role in regulating the cell cycle of C. crescentus. We used available prediction tools to identify which target genes may be regulated by ncRNAs in this bacterium. Furthermore, we predicted whether ncRNAs with a cell cycle-regulated expression profile may be directly regulated by DnaA, GcrA, or CtrA, which act at the onset, during, and at the end of S phase (and in swarmer cells), respectively, or whether any of them carries CcrM methylation sites in its promoter region. Our analysis suggests the existence of a potentially very important network of ncRNAs regulated by, or regulating, well-known cell cycle genes in C. crescentus. We hypothesize that ncRNAs are intimately connected to the known regulatory network and play a crucial modulatory role in cell cycle progression.


INTRODUCTION
In the last two decades, bacterial non-coding RNAs (ncRNAs), and small RNAs (sRNAs, 50-400 nts) in particular, have emerged as central regulators of important cellular processes (Dutta and Srivastava, 2018). Most sRNAs are post-transcriptional regulators that positively or negatively affect the translation and/or stability of their targets. As a result, sRNAs play key roles in the adaptive response to the environment, and to stress conditions in particular. In enterobacteria, most functional sRNA/target pairs require the RNA chaperone Hfq, which both stabilizes the sRNAs and facilitates RNA duplex formation; however, this role of Hfq in sRNA control of gene expression is not conserved in all bacteria that possess an Hfq homolog (Vogel and Luisi, 2011). While sRNAs are best characterized in enterobacteria (Escherichia coli and Salmonella) and a few other well-studied genera (e.g., Staphylococcus, Sinorhizobium, Bacillus, and Listeria), transcriptomic analyses performed in extremely diverse species indicate that ncRNAs exist in virtually all bacteria, and their characterization in these other species remains a challenge (Barquist and Vogel, 2015). The ability to understand and control ncRNAs that target specific functions could pave the way for new, precisely targeted antimicrobial strategies, a pressing need given that many pathogens are increasingly resistant to known antibiotics.
Caulobacter crescentus is a model system to investigate global regulatory mechanisms such as the bacterial cell cycle and differentiation/morphogenesis (Lasker et al., 2016). This alphaproteobacterium produces a motile, non-replicative swarmer cell and a sessile, replicative stalked cell at every round of cell division (Figure 1A), which has made it a model organism in many leading laboratories around the world. The regulatory program implementing the cell cycle and differentiation can be easily investigated in C. crescentus, as large numbers of pure swarmer cells in the G1 phase can be isolated and followed in synchrony (Schrader and Shapiro, 2015). This differentiation program is under the control of a set of regulators that implement a finely orchestrated genetic circuit (Figure 1B).
FIGURE 1 | Cell cycle progression in Caulobacter crescentus. (A) At every division, cells divide into two different cell types, a swarmer cell (G1) and a stalked cell. The swarmer cell cannot replicate its DNA, but it is the only motile form, as it possesses a flagellum and pili. When the swarmer cell finds a suitable environment, it differentiates into a stalked cell, losing the flagellum, retracting the pili and synthesizing a stalk at the former flagellated pole. After differentiation, cells initiate a single round of DNA replication (S phase); a new swarmer pole is then formed and cells divide at the end of a short G2 phase. The single circular chromosome of C. crescentus is indicated as a red circle. (B) This remarkable cell cycle progression is under the control of several master regulators (DnaA, CtrA, CcrM, and GcrA) responsible for the cell cycle-regulated expression of hundreds of genes. Protein levels are represented in different colors. (C) Based on Table 1 data, ncRNAs were organized by their expression levels (Schrader et al., 2014; Zhou et al., 2015).
In C. crescentus, swarmer cells differentiate into stalked cells and fire a single round of DNA replication thanks to the protein DnaA, which binds the unique origin of replication and activates the DNA polymerase complex (Collier, 2012; Felletti et al., 2018). Besides controlling chromosome replication, DnaA controls the transcription of the gene encoding GcrA, which in turn controls many essential genes during S phase, including ctrA (Holtzendorff et al., 2004; Hottes et al., 2005; Collier et al., 2006). The latter gene encodes a response regulator that activates cell division and the expression of genes essential for cell differentiation, such as those for flagellum/pilus assembly, chemotaxis, stalk biogenesis and many others (Reisenauer et al., 1999; Jones et al., 2001; Laub et al., 2002; Biondi et al., 2006). DnaA, GcrA and CtrA constitute an essential transcriptional cascade that requires multiple regulatory levels to ensure the correct timing of ensuing events. For example, GcrA activity depends on methylation of the chromosome by the methyltransferase CcrM (Fioravanti et al., 2013; Murray et al., 2013; Mohapatra et al., 2014; Haakonsen et al., 2015), while CtrA activity is regulated by phosphorylation (Biondi et al., 2006) and terminated by ClpXP-dependent degradation (Ryan et al., 2002, 2004; Joshi et al., 2015). Surprisingly, among these well-known regulatory layers, the activity and role of ncRNAs in C. crescentus have been poorly investigated, although ncRNAs are ideal candidates for such regulation, as they can generate dynamic patterns not easily achievable through transcriptional regulation alone. In 2008, a paper entitled "Small non-coding RNAs in Caulobacter crescentus" described 27 ncRNAs in this organism (Landt et al., 2008).
Only a few of them, however, were associated with specific functions and, apart from a 2010 study describing CrfA (Landt et al., 2010), an sRNA involved in adaptation to carbon starvation, none of the ncRNAs identified in 2008 was further characterized. More recently, another sRNA, named GsrN, was characterized in C. crescentus (Tien et al., 2017). GsrN is involved in the response to multiple stresses and is directly controlled by the general stress sigma factor σT. Finally, recent approaches using RNA-seq and other post-genomic techniques have expanded the number of ncRNA candidates in this species to over 100 (Zhou et al., 2015).
This observation prompted us to start a systematic investigation of this new ncRNA world, aiming to identify global regulators and ncRNAs possibly involved in the C. crescentus cell cycle. We applied several bioinformatic tools to understand which functions may be controlled by ncRNAs, focusing on uncharacterized ncRNAs whose expression level changes during the cell cycle. We predicted their targets in the C. crescentus genome and integrated this information with motif scanning and ChIP-seq data.

Prediction of Targets of ncRNAs
Sequences of ncRNAs were retrieved from previously published annotations (Schrader et al., 2014; Zhou et al., 2015). RNApredator (Eggenhofer et al., 2011) was run with default settings on the genome NC_011916, as deposited on the RNApredator website. TargetRNA2 (Kery et al., 2014) was run with the following settings: nucleotides before the start codon, 80; nucleotides after the start codon, 20; seed length, 7; sRNA conservation and accessibility, true; sRNA window size, 13; mRNA structural accessibility, true; interaction region, 20; filter size, 1,000; P-value threshold, 0.5 (predictions were considered significant only when P < 0.05). Finally, CopraRNA (Wright et al., 2014) was used with default settings.
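The TargetRNA2 settings restrict each candidate mRNA to 80 nt upstream and 20 nt downstream of its start codon. As a minimal sketch of what that window means (not TargetRNA2's own code), the hypothetical helper `start_codon_window` below extracts it from a genome string while taking the strand into account; the coordinate convention (0-based, first start-codon base as read on the gene's strand) is an assumption.

```python
# Minimal sketch (not TargetRNA2 code): extract the mRNA region around a
# start codon, using the window sizes given in the text (-80 / +20 nt).
# The coordinate convention and function names are assumptions.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def start_codon_window(genome: str, start: int, strand: str,
                       upstream: int = 80, downstream: int = 20) -> str:
    """Return the sense-strand window from -upstream to +downstream.

    `start` is the 0-based genome coordinate of the first start-codon
    base as read on the gene's strand (for '-' genes, the highest
    coordinate of the codon). The window is clipped at the sequence ends;
    wrap-around of the circular chromosome is ignored.
    """
    if strand == "+":
        lo = max(0, start - upstream)
        hi = min(len(genome), start + downstream)
        return genome[lo:hi]
    if strand == "-":
        # On the minus strand, upstream sequence lies at higher coordinates.
        lo = max(0, start - downstream + 1)
        hi = min(len(genome), start + upstream + 1)
        return reverse_complement(genome[lo:hi])
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


if __name__ == "__main__":
    toy = "A" * 10 + "ATG" + "C" * 10
    print(start_codon_window(toy, 10, "+", upstream=5, downstream=3))
```

For minus-strand genes the slice is taken on the forward strand and then reverse-complemented, so the returned window always reads 5′ to 3′ on the mRNA.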

Prediction of DnaA, CcrM, and CtrA Consensus Sequences
Predictions of potential control by DnaA, CcrM, and CtrA were performed as previously described (Brilli et al., 2010). Position Weight Matrices (PWMs) modeling the CtrA and DnaA binding sites were used to scan the entire genome by calculating, for each genome position, a similarity score derived from Schneider et al. (1986):
$$S = \frac{1}{L}\sum_{n=1}^{L} \log_2 f_n(x_n)$$
Here, for each position n of a genomic window of length L, where L is also the length of the transcription factor binding motif, x_n is the nucleotide found at that position and f_n(x_n) is its frequency at position n of the motif; the base-2 logarithms of these frequencies are summed and then averaged over the L positions. As this score is continuous, a threshold must be established, and its choice can be quite arbitrary. We normalized all scores with respect to the maximum score attainable by the PWM under analysis and retained only those reaching at least 60% of the maximum.
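A minimal Python sketch of the PWM scan, assuming the motif is supplied as per-position base counts and scored with the averaged log2-frequency score described in this section. The pseudocount and the min-max rescaling used to express each score as a fraction of the best attainable match are assumptions, since the text only specifies normalization to the maximum; function names are hypothetical, and only the forward strand is scanned (the reverse complement would be scanned the same way).

```python
import math

BASES = "ACGT"


def log_pwm(counts, pseudocount=0.5):
    """Turn per-position base counts into log2 frequencies f_n(x).

    `counts` is a list with one {base: count} dict per motif position.
    The pseudocount (an assumption) avoids log2(0) for unobserved bases.
    """
    table = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        table.append({b: math.log2((column.get(b, 0) + pseudocount) / total)
                      for b in BASES})
    return table


def window_score(table, window):
    """Average of log2 f_n(x_n) over the L positions of the window."""
    return sum(table[n][x] for n, x in enumerate(window)) / len(table)


def scan(genome, table, min_fraction=0.6):
    """Yield (position, relative_score) for windows passing the threshold.

    Scores are rescaled between the worst and best attainable windows
    (min-max normalization, an assumption), so 1.0 is a perfect match.
    Windows containing symbols other than A, C, G, T are skipped.
    """
    genome = genome.upper()
    length = len(table)
    best = sum(max(col.values()) for col in table) / length
    worst = sum(min(col.values()) for col in table) / length
    if best == worst:
        raise ValueError("uninformative PWM: all windows score the same")
    for i in range(len(genome) - length + 1):
        window = genome[i:i + length]
        if any(x not in BASES for x in window):
            continue
        relative = (window_score(table, window) - worst) / (best - worst)
        if relative >= min_fraction:
            yield i, relative
```

Because log2 frequencies are negative, rescaling between the worst and best attainable scores keeps a 60% cut-off interpretable as a fraction of a perfect match.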
ChIP-seq data for GcrA were analyzed as previously described (Fioravanti et al., 2013), using the new annotation of ncRNAs (Zhou et al., 2015).
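One simple way to connect GcrA ChIP-seq peaks to the ncRNA annotation is to ask which ncRNAs have a peak summit close to their TSS. The sketch below does this with a sorted list and binary search; the window (200 nt upstream, 50 nt downstream), the input formats and the function name are assumptions rather than the procedure of Fioravanti et al. (2013), and wrap-around of the circular chromosome is ignored.

```python
from bisect import bisect_left, bisect_right


def ncrnas_with_peaks(summits, tss_table, upstream=200, downstream=50):
    """Map ncRNA ids to the peak summits found in their promoter window.

    summits:   iterable of peak-summit coordinates (0-based).
    tss_table: dict {ncrna_id: (tss, strand)} with strand '+' or '-'.
    The window spans `upstream` nt before and `downstream` nt after the
    TSS, oriented by strand; both sizes are illustrative assumptions.
    """
    ordered = sorted(summits)
    hits = {}
    for name, (tss, strand) in tss_table.items():
        if strand == "+":
            lo, hi = tss - upstream, tss + downstream
        elif strand == "-":
            lo, hi = tss - downstream, tss + upstream
        else:
            raise ValueError(f"bad strand for {name}: {strand!r}")
        # Binary search for summits with lo <= position <= hi.
        left, right = bisect_left(ordered, lo), bisect_right(ordered, hi)
        if right > left:
            hits[name] = ordered[left:right]
    return hits
```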

Selection of Cell Cycle Regulated ncRNAs
Previous works based on total RNA sequencing and 5′-RACE identified a list of predicted ncRNAs expressed under laboratory culture conditions (Schrader et al., 2014; Zhou et al., 2015). Based on these experimental data, we retrieved all previously identified sequences and first sorted the ncRNA candidates by their previous annotation and by their dynamic regulation during the cell cycle. Of the 199 ncRNAs identified as expressed in rich (PYE) or poor (M2G) medium, 88 were further characterized for their Transcriptional Start Site (TSS), a few of them having multiple TSSs (Zhou et al., 2015). Among those 88 ncRNAs, 43 were related to the translational machinery (tRNAs or ribosome-related); we excluded them from the following analyses, together with the tmRNA (Keiler and Shapiro, 2003; Cheng and Keiler, 2009; Russell and Keiler, 2009). This bibliographic data mining therefore allowed us to select 44 ncRNAs for further analysis. Among them, as mentioned in the Introduction, two ncRNAs had already been characterized, CrfA (Landt et al., 2008, 2010) and GsrN (Tien et al., 2017). In conclusion, 42 ncRNAs were the object of this study (Figure 1 and Table 1). By examining the expression patterns of the selected ncRNAs in synchronized cells, we classified these 42 ncRNAs according to their expression levels during the cell cycle. Specifically, we identified 23 ncRNAs whose expression changes during the cell cycle: 6 are expressed in G1, 2 at the onset of S phase (noted as G1-S), CCNA_R0025 and CCNA_R0164 in G1 and G2, 10 in S phase (noted as S) and 2 toward the end of the cell cycle (G2). The expression of the latter genes may reflect their accumulation in a specific cell type, as we will discuss later.
Frontiers in Genetics | www.frontiersin.org
The observation that some ncRNA genes are cell cycle-regulated suggests that they may control functions required at a specific phase of the cell cycle (Laub et al., 2000).
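The classification of ncRNAs by expression phase can be illustrated with a deliberately simple rule: call a profile cell cycle-regulated when its maximum exceeds its minimum by a chosen fold change, and label it with the phase of the time point at which it peaks. The fold-change cut-off, the pseudocount and the per-time-point phase labels are hypothetical choices for illustration, not the criteria behind Table 1; a single-peak rule would not, for instance, capture profiles such as those of CCNA_R0025 and CCNA_R0164, which are expressed in both G1 and G2.

```python
def classify_profile(levels, phases, min_fold=2.0, pseudocount=1.0):
    """Return the phase at which a profile peaks, or None if it is flat.

    levels: non-negative expression values at successive time points
            after synchrony.
    phases: phase label ("G1", "S", "G2", ...) for each time point.
    A profile counts as cell cycle-regulated when
    (max + pseudocount) / (min + pseudocount) >= min_fold; the cut-off
    and pseudocount are hypothetical choices for illustration.
    """
    if not levels or len(levels) != len(phases):
        raise ValueError("levels and phases must be non-empty and equally long")
    low, high = min(levels), max(levels)
    if (high + pseudocount) / (low + pseudocount) < min_fold:
        return None
    # Label the profile with the phase of its first maximum.
    return phases[levels.index(high)]
```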

Prediction of Target Genes Regulated by ncRNAs
Although experimental validation is absolutely required to identify the targets of ncRNAs, predictive tools can suggest candidate target genes and thus give an overview of the functions regulated by a set of sRNAs. This is particularly useful in a systematic analysis, as the experimental identification and validation of targets is not easily scalable. To reconstruct the whole ncRNA network connected to the cell cycle, we used three available tools: mainly RNApredator (Eggenhofer et al., 2011), with CopraRNA (Backofen et al., 2014; Wright et al., 2014) and TargetRNA2 (Kery et al., 2014) as confirmation. Each tool relies on different features and therefore tends to predict different classes of targets. The first 100 targets predicted by RNApredator were retained for the analysis (Supplementary Table 1).
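The retention and cross-checking step can be sketched as two small functions: rank the primary predictions, keep the first 100, and flag those also recovered by at least one other tool. The sketch assumes each prediction is a (gene, energy) pair in which a lower interaction energy means a stronger predicted interaction; the data layout and function names are hypothetical and do not reproduce the output formats of RNApredator, CopraRNA or TargetRNA2.

```python
def top_targets(predictions, n=100):
    """Keep the n best (gene, energy) predictions, lowest energy first.

    Assumes a lower (more negative) interaction energy means a stronger
    predicted interaction.
    """
    ranked = sorted(predictions, key=lambda item: item[1])
    return [gene for gene, _ in ranked[:n]]


def confirmed(primary, *other_tools):
    """Return genes from `primary` also predicted by at least one other tool.

    The order of `primary` is preserved; each element of `other_tools`
    is an iterable of gene identifiers from a secondary predictor.
    """
    support = set()
    for genes in other_tools:
        support.update(genes)
    return [gene for gene in primary if gene in support]
```

A typical call would be `confirmed(top_targets(primary_hits), copra_genes, targetrna2_genes)`, where all three inputs are hypothetical variables parsed from the respective tool outputs.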
Genes encoding factors involved in cell cycle regulation were predicted as probable targets of some of the ncRNAs (Table 2); in particular, we considered genes annotated as cell cycle or cell division regulators. To evaluate these predictions, we performed an additional analysis based on the following consideration: to ensure accessibility, the regions complementary to the target RNA should be located within open regions (loops) of the ncRNA molecule rather than in paired regions (stems). We therefore examined in silico how these putative small RNAs fold and whether different targets interact with the same loop regions. We used RNAfold (Gruber et al., 2008) and Mfold (Zuker, 2003) to identify loop regions that would be more accessible for interaction with target genes.
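Given a secondary structure in dot-bracket notation, such as the one RNAfold prints, the loop check reduces to locating runs of unpaired positions and measuring how much of a predicted interaction region falls in them. A minimal sketch with hypothetical function names, assuming 1-based inclusive coordinates for the interaction regions on the ncRNA:

```python
import re


def loop_segments(structure):
    """Return 1-based inclusive (start, end) runs of unpaired '.' positions."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"\.+", structure)]


def unpaired_fraction(structure, start, end):
    """Fraction of positions start..end (1-based, inclusive) left unpaired."""
    region = structure[start - 1:end]
    if start < 1 or not region:
        raise ValueError("region outside the structure")
    return region.count(".") / len(region)


def shared_loops(structure, regions):
    """Return loops overlapped by more than one interaction region."""
    shared = []
    for lo, hi in loop_segments(structure):
        overlapping = [r for r in regions if r[0] <= hi and r[1] >= lo]
        if len(overlapping) > 1:
            shared.append((lo, hi))
    return shared
```

RNAfold writes the free energy in parentheses after the bracket string; only the bracket string itself should be passed to these functions.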
We asked whether the targets of cell cycle-regulated ncRNAs were more likely to be cell cycle regulators than the targets of ncRNAs that are not cell cycle-regulated. Among the 22 ncRNAs that showed dynamic expression during the cell cycle, we found 13
This observation suggests that ncRNAs may indeed be cell cycle-regulated because they act on functions that need to be activated only at specific phases of cell cycle progression. But can we determine whether ncRNAs are themselves regulated by the master regulators of the cell cycle?
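Whether cell cycle-regulated ncRNAs preferentially target cell cycle regulators can be framed as a 2×2 contingency table (cell cycle-regulated or not, versus having at least one predicted cell cycle regulator target or not) and tested with a one-sided Fisher exact test. The text does not state which test, if any, was applied; the standard-library sketch below illustrates one possible choice, and the function name is hypothetical.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact test for enrichment in cell `a`.

    Contingency table (counts of ncRNAs):
                                 cell cycle target   no such target
        cell cycle-regulated            a                  b
        not cell cycle-regulated        c                  d
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, col1 = a + b, a + c
    total = a + b + c + d
    # Sum hypergeometric probabilities for tables at least as extreme as a.
    tail = sum(comb(row1, x) * comb(total - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / comb(total, col1)
```

For larger analyses, `scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater")` computes the same one-sided p-value.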

Presence of DnaA and CtrA Boxes or Methylation Sites Upstream of Cell Cycle-Regulated ncRNAs
To understand how ncRNAs are regulated during the cell cycle, we explored the presence of known binding sites in their promoter regions. Specifically, promoters were scanned for putative DnaA, CcrM (GAnTC) or CtrA binding sites (both "half" and "full" binding sites) (Ouimet and Marczynski, 2000; Brilli et al., 2010). We also analyzed the presence of GcrA binding sites using previously published chromatin immunoprecipitation-deep sequencing (ChIP-seq) data (Fioravanti et al., 2013; Murray et al., 2013). These four regulators represent the core transcriptional machinery responsible for cell cycle progression and for the coordinated expression of all cell cycle-dependent functions. The analysis revealed that many of these ncRNAs are potentially regulated by master regulators of the cell cycle (Table 3). In particular, ncRNAs expressed in S phase appear to possess binding sites for master regulators, which could account for their cell cycle-regulated expression. Although the expression pattern may result from combinatorial regulation, this enrichment of binding motifs for cell cycle regulators is intriguing and calls for experimental validation. To obtain a full picture of the interconnections between ncRNAs and master regulators, we integrated the data in Table 2 (cell cycle targets of ncRNAs) with those in Table 3 (master regulators regulating ncRNAs) and represented the regulations as a network (Figure 2). This network shows ncRNAs that are potentially regulated by master regulators of the cell cycle and that may in turn be connected to cell cycle regulator genes.
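For motifs that can be written as fixed patterns, promoter scanning reduces to overlapping regular-expression matches. The sketch below assumes the CcrM methylation site GANTC mentioned in this section and the CtrA full box often written TTAA-N7-TTAA (the exact consensus is an assumption here); both patterns are their own reverse complement, so scanning one strand is sufficient. DnaA boxes and CtrA half sites are not palindromic and would require scanning both strands, for example with a PWM.

```python
import re

# Consensus patterns assumed for illustration: GANTC is the CcrM
# methylation site; TTAA-N7-TTAA is a commonly cited CtrA full box.
MOTIFS = {
    "CcrM GANTC": "GA[ACGT]TC",
    "CtrA TTAA-N7-TTAA": "TTAA[ACGT]{7}TTAA",
}


def find_sites(promoter):
    """Return {motif name: [0-based start positions]} for one promoter.

    A zero-width lookahead lets overlapping occurrences be reported.
    Both patterns are their own reverse complement, so scanning the
    given strand also detects sites on the opposite strand.
    """
    seq = promoter.upper()
    return {name: [m.start() for m in re.finditer(f"(?={pattern})", seq)]
            for name, pattern in MOTIFS.items()}
```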

CONCLUSIONS AND PERSPECTIVES
Our comprehensive analysis of C. crescentus ncRNAs suggests that many of them may play important roles during the cell cycle. In particular, our predictions indicate that many of them target the UTRs of dnaA or hdaA. Initiation of DNA replication is a fundamental cell cycle event, which likely explains why it is subject to multiple levels of regulation.
DnaA has been shown to be subject to multiple regulatory layers (Felletti et al., 2018), including the ncRNA SsrA, or tmRNA, which interacts with ribosomes to tag proteins for degradation (Keiler and Shapiro, 2003). Here we identify, for the first time, several ncRNAs that could potentially regulate dnaA.
However, the potential extent of ncRNA regulation of the cell cycle is even broader (Figure 2). CtrA and CcrM seem to play a major role in coordinating the expression of many ncRNA genes, while dnaA seems to be a major target of at least four ncRNAs. Several ncRNAs appear to be controlled by two master regulators, and R0116 is potentially controlled by CcrM, GcrA, and CtrA, suggesting an important role for this ncRNA.
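The network integration of Tables 2 and 3 can be represented with two edge lists: master regulator to ncRNA, and ncRNA to target gene. The sketch below (hypothetical function names; the edge lists would be filled from the tables) lists ncRNAs controlled by several master regulators, counts how many ncRNAs target each gene, and enumerates regulator to ncRNA to target paths of the kind drawn in Figure 2.

```python
from collections import defaultdict


def build_indexes(regulations, targetings):
    """Index the two edge lists of the ncRNA network.

    regulations: (master_regulator, ncrna) pairs, e.g. from Table 3.
    targetings:  (ncrna, target_gene) pairs, e.g. from Table 2.
    """
    regulators_of = defaultdict(set)
    for regulator, ncrna in regulations:
        regulators_of[ncrna].add(regulator)
    targets_of = defaultdict(set)
    for ncrna, gene in targetings:
        targets_of[ncrna].add(gene)
    return regulators_of, targets_of


def multi_regulated(regulators_of, k=2):
    """ncRNAs with at least k distinct master regulators, sorted by name."""
    return sorted(n for n, regs in regulators_of.items() if len(regs) >= k)


def genes_by_ncrna_count(targets_of):
    """Map each target gene to the number of ncRNAs predicted to target it."""
    counts = defaultdict(int)
    for genes in targets_of.values():
        for gene in genes:
            counts[gene] += 1
    return dict(counts)


def paths(regulators_of, targets_of):
    """All (regulator, ncRNA, target) paths through the network, sorted."""
    return sorted((reg, ncrna, gene)
                  for ncrna, regs in regulators_of.items()
                  for reg in regs
                  for gene in targets_of.get(ncrna, ()))
```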
Although this analysis is based on predictions and every connection must be validated experimentally, the number of connections revealed by these bioinformatic predictions suggests that ncRNAs play a major role in the cell cycle regulation of C. crescentus. Future work will reveal whether this preliminary analysis has captured this important role.

AUTHOR CONTRIBUTIONS
All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.