Identifying RNA N6-Methyladenosine Sites in Escherichia coli Genome

Zhang, Jidong; Feng, Pengmian; Lin, Hao; Chen, Wei

doi:10.3389/fmicb.2018.00955

ORIGINAL RESEARCH article

Front. Microbiol., 14 May 2018

Sec. Systems Microbiology

Volume 9 - 2018 | https://doi.org/10.3389/fmicb.2018.00955

1. Department of Immunology, Zunyi Medical College, Zunyi, China
2. Hebei Province Key Laboratory of Occupational Health and Safety for Coal Industry, School of Public Health, North China University of Science and Technology, Tangshan, China
3. Key Laboratory for Neuro-Information of Ministry of Education, Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
4. Department of Physics, Center for Genomics and Computational Biology, School of Sciences, North China University of Science and Technology, Tangshan, China

Abstract

N⁶-methyladenosine (m⁶A) plays important roles in a variety of biological and physiological processes. Accurate identification of m⁶A sites is especially helpful for understanding their biological functions. Since wet-lab techniques are still expensive and time-consuming, there is an urgent need for computational methods that identify m⁶A sites from primary RNA sequences. Although some computational methods for identifying m⁶A sites exist, none is available for detecting m⁶A sites in microbial genomes. In this study, we developed a computational method for identifying m⁶A sites in the Escherichia coli genome. The accuracies obtained by the proposed method are >90% in both the 10-fold cross-validation test and the independent dataset test, indicating that the proposed method has high potential to become a useful tool for the identification of m⁶A sites in microbial genomes.

Introduction

At present, ~150 kinds of RNA modifications have been found in different RNA species (Boccaletto et al., 2018). These modifications not only enrich the genetic information but also play critical roles in a variety of biological processes, as discussed in a recent review (Roundtree et al., 2017). Among them, N⁶-methyladenosine (m⁶A) is the most abundant post-transcriptional modification and has been found in all three domains of life. m⁶A participates in various biological activities, such as mRNA splicing (Nilsen, 2014), mRNA translation (Wang et al., 2015), mRNA maturation (Hoernes et al., 2016), and stem cell proliferation (Bertero et al., 2018), and has even been implicated in a series of diseases (Zhang et al., 2016; Cui et al., 2017; Li et al., 2017).

To reveal its biological functions, different kinds of high-throughput sequencing techniques have been proposed to map the locations of m⁶A genome-wide (Dominissini et al., 2013; Linder et al., 2015; Wan et al., 2015; Hong et al., 2018). Although these techniques have advanced both the identification of RNA modifications and the understanding of their biological functions, they remain labor-intensive and costly. In addition, the resolution at which most techniques detect m⁶A sites is still not satisfactory. Therefore, it is necessary to develop novel methods to detect m⁶A sites.

Thanks to the experimental data yielded by these high-throughput sequencing techniques, as summarized in a recent work (Chen X. et al., 2017), several machine learning-based computational methods have been proposed to identify m⁶A sites (Chen et al., 2015a,b, 2016a, 2017b,c; Zhou et al., 2016). Although these methods are good complements to experimental methods for detecting m⁶A sites, to the best of our knowledge there is so far no computational tool available for detecting m⁶A sites in microbial genomes.

Stimulated by the successful applications of machine learning methods in computational genomics and proteomics (Chen et al., 2012; Feng et al., 2013; Cao et al., 2016, 2017a,b; Hu et al., 2018), in the present work we present a support vector machine (SVM)-based method for identifying m⁶A sites in the Escherichia coli (E. coli) genome. By encoding the RNA sequences using nucleotide chemical properties and accumulated nucleotide frequency, the proposed method obtained promising performance in the 10-fold cross-validation test. Moreover, we also validated the method on an independent dataset and obtained satisfactory results.

Materials and methods

Benchmark dataset

The m⁶A site-containing sequences of the E. coli genome were obtained from the RMBase database (Xuan et al., 2018). All the sequences are 41 bp long with the m⁶A site at the center. To overcome redundancy and reduce homology bias, sequences with more than 80% sequence similarity were removed using the CD-HIT program (Fu et al., 2012). After this screening procedure, 2,055 m⁶A site-containing sequences were retained and regarded as positive samples.

The negative samples (non-m⁶A site-containing sequences) were obtained by choosing 41-bp long sequences whose central adenosine has not been experimentally confirmed to be methylated at its N⁶ position. By doing so, we could obtain a large number of negative samples. After removing sequences with identity >80%, the number of negative samples is still dramatically larger than that of positive samples. To balance the numbers of positive and negative samples in model training, we randomly picked as many negative samples as there are positive samples and repeated this process 10 times. Therefore, 10 negative subsets were obtained, each of which includes 2,055 non-m⁶A site-containing sequences. The positive and negative samples thus obtained are provided in Supplementary Material.
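
The balancing step can be sketched in a few lines of Python. This is a minimal illustration rather than the script used in the study: the function name `draw_negative_subsets`, the fixed seed, and the toy identifiers are assumptions. Only the subset size (equal to the number of positive samples) and the 10 repetitions come from the text. The text does not say whether the 10 subsets may share sequences; here each subset is drawn independently, so overlap is possible.

```python
import random

def draw_negative_subsets(negatives, n_positive, n_subsets=10, seed=0):
    """Draw n_subsets random subsets, each holding n_positive negatives.

    Each subset is sampled without replacement from the full pool;
    different subsets are drawn independently and may overlap.
    """
    if n_positive > len(negatives):
        raise ValueError("not enough negative samples to match the positives")
    rng = random.Random(seed)  # fixed seed only for reproducibility (assumption)
    return [rng.sample(negatives, n_positive) for _ in range(n_subsets)]

# Toy pool of hypothetical negative identifiers
pool = [f"neg_{i}" for i in range(100)]
subsets = draw_negative_subsets(pool, n_positive=20)
print(len(subsets), len(subsets[0]))
```

With the real data, `n_positive` would be 2,055 and `negatives` the redundancy-filtered pool of candidate sequences.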

Sequence encoding scheme

Inspired by recent studies (Chen et al., 2016b,c,d, 2017a,d; Feng et al., 2017), in order to convert the RNA sequences into discrete vectors that can be recognized and handled by machine learning methods, we encoded RNA sequences using nucleotide chemical properties and accumulated nucleotide frequency. They are briefly described as follows.

The four nucleotides, namely, adenine (A), guanine (G), cytosine (C), and uracil (U) can be classified into three different groups according to their physicochemical properties, i.e., ring structures, secondary structures, and chemical functionality (Chen et al., 2016b,c,d, 2017a,d; Feng et al., 2017). Therefore, based on the different physicochemical properties, the four coordinates (1, 1, 1), (0, 0, 1), (1, 0, 0), and (0, 1, 0) were used to represent the four bases (A, C, G, and U) of RNA, respectively.

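In order to include nucleotide composition surrounding the modification site as well, the accumulated nucleotide frequency of any nucleotide n_j at position i was also used to represent RNA sequences and was defined as

$$d_i = \frac{1}{|N_i|} \sum_{l=1}^{|N_i|} f(n_l), \qquad f(n_l) = \begin{cases} 1, & \text{if } n_l = q \\ 0, & \text{otherwise} \end{cases}$$

where |N_i| is the length of the sliding substring concerned, l denotes each of the site locations counted in the substring, and q ∈ {A, C, G, U} is the nucleotide type being counted (the one found at position i).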
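
A short Python sketch may make the accumulated frequency concrete. It assumes, as in the encoding papers cited above, that the sliding substring for position i is the prefix of the sequence ending at position i, so that the value at i is the fraction of positions 1..i carrying the same nucleotide as position i. The function name and the toy sequence are illustrative choices, not taken from the article.

```python
def accumulated_frequency(seq):
    """Return the accumulated nucleotide frequency for every position.

    For position i (1-based), the value is the number of times the
    nucleotide at i occurs in seq[0:i], divided by i.
    """
    counts = {}
    densities = []
    for i, base in enumerate(seq, start=1):
        counts[base] = counts.get(base, 0) + 1
        densities.append(counts[base] / i)
    return densities

# Toy 10-nt sequence (hypothetical)
d = accumulated_frequency("UCGUUCAUGG")
print([round(x, 2) for x in d])
```

For this toy sequence, U occurs at positions 1, 4, 5, and 8, giving values 1/1, 2/4, 3/5, and 4/8 at those positions.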

By integrating both nucleotide physicochemical properties and accumulated nucleotide frequency, an L nt long RNA sequence can be represented as a 4L-dimensional vector (Chen et al., 2016b,c,d, 2017a,d; Feng et al., 2017): each position contributes three chemical-property coordinates and one accumulated frequency value.
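
The combined encoding can be sketched as follows. The coordinate table is the one given in the text. The ordering of the four values within each position (three chemical-property coordinates followed by the frequency), the use of the sequence prefix as the sliding substring, and the names `NCP` and `encode` are assumptions made for illustration.

```python
# Chemical-property coordinates as listed in the text
NCP = {"A": (1, 1, 1), "C": (0, 0, 1), "G": (1, 0, 0), "U": (0, 1, 0)}

def encode(seq):
    """Encode an RNA sequence of length L as a flat list of 4*L numbers."""
    counts = {}
    vector = []
    for i, base in enumerate(seq.upper(), start=1):
        if base not in NCP:
            raise ValueError(f"unexpected nucleotide {base!r} at position {i}")
        counts[base] = counts.get(base, 0) + 1
        vector.extend(NCP[base])         # three chemical-property coordinates
        vector.append(counts[base] / i)  # accumulated frequency over the prefix
    return vector

toy = "ACGU" * 10 + "A"  # hypothetical 41-nt sequence
features = encode(toy)
print(len(features))     # 164
```

A 41-nt window therefore yields 4 × 41 = 164 features per sample.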

Support vector machine

As an efficient supervised machine learning algorithm, SVM has been widely used in the realm of bioinformatics (Cao et al., 2014; Li et al., 2017; Wang et al., 2017b; Zhang et al., 2017). Its basic idea is to transform the input data into a high dimensional feature space and then determine the optimal separating hyperplane.

In the current study, the implementation of SVM was performed by using the LibSVM package 3.18, available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/. The radial basis kernel function (RBF) was used to obtain the classification hyperplane. The grid search method was applied to optimize its regularization parameter C and kernel parameter γ.
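
The kernel and the parameter search can be illustrated without any SVM library. The sketch below is an assumption-laden outline, not the study's code: `rbf_kernel` is the standard RBF form exp(-γ‖x − y‖²), `grid_search` scans a log2 grid, and the exponent ranges are those suggested in the LibSVM practical guide, since the article does not report the ranges actually used. `toy_score` is a stand-in for the cross-validated accuracy that a real run would obtain by training the SVM for each (C, γ) pair.

```python
import math
from itertools import product

def rbf_kernel(x, y, gamma):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def grid_search(cv_score, c_exponents, gamma_exponents):
    """Return (best_score, best_C, best_gamma) over a log2 grid.

    cv_score(C, gamma) must return a score where higher is better,
    for example the 10-fold cross-validation accuracy of an RBF SVM.
    """
    best = None
    for c_exp, g_exp in product(c_exponents, gamma_exponents):
        C, gamma = 2.0 ** c_exp, 2.0 ** g_exp
        score = cv_score(C, gamma)
        if best is None or score > best[0]:
            best = (score, C, gamma)
    return best

# Stand-in scoring function so the sketch runs on its own (hypothetical);
# it peaks at C = 8 and gamma = 0.125.
def toy_score(C, gamma):
    return -abs(C - 8.0) - abs(gamma - 0.125)

best_score, best_C, best_gamma = grid_search(
    toy_score, c_exponents=range(-5, 16, 2), gamma_exponents=range(-15, 4, 2)
)
print(best_C, best_gamma)
```

An exhaustive grid is simple and easy to parallelize, at the cost of one full cross-validation run per grid point.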

Evaluation metrics

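The performance was evaluated by using the following four metrics, namely sensitivity (Sn), specificity (Sp), accuracy (Acc), and the Matthews correlation coefficient (MCC), which can be expressed as

$$\mathrm{Sn} = \frac{TP}{TP + FN}, \qquad \mathrm{Sp} = \frac{TN}{TN + FP}$$

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where TP, TN, FP, and FN represent true positive, true negative, false positive, and false negative, respectively.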
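
The four metrics follow directly from the confusion-matrix counts, using their standard definitions. The sketch below uses made-up counts purely for illustration; they are not results from this study, and the function name `evaluate` is an assumption.

```python
import math

def evaluate(tp, tn, fp, fn):
    """Return (Sn, Sp, Acc, MCC) from confusion-matrix counts.

    Assumes at least one positive and one negative sample, so that
    Sn and Sp are defined.
    """
    sn = tp / (tp + fn)                    # sensitivity
    sp = tn / (tn + fp)                    # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)  # accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc

# Hypothetical counts for illustration only
sn, sp, acc, mcc = evaluate(tp=90, tn=80, fp=20, fn=10)
print(f"Sn={sn:.2%} Sp={sp:.2%} Acc={acc:.2%} MCC={mcc:.2f}")
```

Setting MCC to 0 when its denominator vanishes is a common convention and is an assumption here.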

To evaluate the performance of the current method more objectively, and inspired by recent work (Wang et al., 2017a), the ROC (receiver operating characteristic) curve was also plotted. Its vertical coordinate indicates the true positive rate (sensitivity) and the horizontal coordinate indicates the false positive rate (1-specificity). The area under the ROC curve (auROC) is an indicator of the performance quality of a binary classifier: an auROC of 0.5 is equivalent to random prediction, while an auROC of 1 represents a perfect classifier.
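
One way to compute the auROC without plotting the curve is the pairwise formulation: the area equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that formulation; it is an illustration with hypothetical scores, not the procedure used to draw Figure 1.

```python
def au_roc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample scores higher; ties contribute one half.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

print(au_roc([0.9, 0.8, 0.7], [0.3, 0.2, 0.1]))  # 1.0
print(au_roc([0.5, 0.5], [0.5, 0.5]))            # 0.5
```

The double loop is quadratic in the number of samples; for large datasets a rank-based computation gives the same value more efficiently.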

Results and discussions

Performance for m⁶A site identification

In statistical prediction, the independent dataset test, K-fold cross-validation test, and jackknife test are often used to derive the metric values for a predictor (Chou, 2011). To save computational time, the 10-fold cross-validation test was used to examine the performance of the proposed method. In the 10-fold cross-validation test, the samples in the dataset are randomly partitioned into 10 equal-sized sub-datasets. Of these, a single sub-dataset is retained as the validation data for testing the model, and the remaining 9 sub-datasets are used as training data. The process is then repeated 10 times, with each of the 10 sub-datasets used exactly once as the validation data.
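
The partitioning described above can be sketched as follows. The function name `k_fold_indices` and the fixed seed are assumptions; the split is a plain random partition as described in the text, with no stratification by class.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, valid_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # random partition
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        valid = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, valid

# 2,055 positive + 2,055 negative samples, as in the benchmark dataset
splits = list(k_fold_indices(4110))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Every sample index appears in exactly one validation fold, so each sample is predicted once by a model that never saw it during training.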

By encoding RNA sequences using nucleotide chemical property and accumulated nucleotide frequency, each sample in the dataset was represented by a (4 × 41) = 164-dimensional vector and used as the input of the SVM. The 10-fold cross-validation test results for identifying m⁶A sites in E. coli are listed in Table 1. In addition, to examine whether the accuracy is sensitive to the selection of negative data, the method was also tested on each of the other nine negative datasets. Their 10-fold cross-validation results are also provided in Table 1.

Table 1

Dataset	Sn (%)	Sp (%)	Acc (%)	MCC
Negative set 1	100.00	98.59	99.29	0.98
Negative set 2	100.00	98.78	99.39	0.98
Negative set 3	100.00	98.44	99.22	0.98
Negative set 4	100.00	98.88	99.44	0.98
Negative set 5	100.00	98.44	99.22	0.98
Negative set 6	100.00	98.49	99.25	0.98
Negative set 7	100.00	98.54	99.27	0.98
Negative set 8	100.00	98.69	99.34	0.98
Negative set 9	100.00	98.49	99.25	0.98
Negative set 10	100.00	98.25	99.12	0.97
Average	100.00	98.56	99.28	0.98

The 10-fold cross validation predictive results by using different negative datasets for identifying m⁶A sites in E. coli.

As indicated in Table 1, we found that the predictive accuracy is not affected by the selection of negative data. In addition, the 10 ROC curves obtained based on the 10 different negative datasets were also plotted in Figure 1. It was found that their auROCs are all higher than 0.98. These results demonstrate the reliability and robustness of the model developed in this study.

Figure 1

Comparison with other methods

In order to demonstrate the effectiveness of nucleotide chemical property and accumulated nucleotide frequency for identifying m⁶A sites in E. coli, we compared the performance of the proposed method with that of methods based on other commonly used RNA sequence features. Chen et al. proposed the pseudo nucleotide composition (PseKNC) to represent RNA sequences (Chen et al., 2014a,b), in which both the local and global sequence order information are included. Since it was proposed in 2014, PseKNC has been used in many branches of computational genomics (Guo et al., 2014; Lin et al., 2014, 2017). Therefore, we employed the SVM to compare the model based on nucleotide chemical property and accumulated nucleotide frequency features with that based on the PseKNC features (Chen et al., 2015a). The 10-fold cross-validation test results are listed in Table 2.

Table 2

Parameters	Sn (%)	Sp (%)	Acc (%)	MCC
PseKNC	65.74	60.29	63.02	0.26
Secondary structure	67.06	60.73	63.89	0.28
Our method	100.00	98.56	99.28	0.98

Comparison of different parameters for identifying m⁶A sites in E. coli.

As indicated in a previous study (Schwartz et al., 2013), the m⁶A modification is also affected by RNA secondary structures. Therefore, we also predicted m⁶A sites by using RNA secondary structure. To this end, all the sequences in the benchmark dataset were encoded by using their secondary structures. The details of the encoding scheme based on secondary structures can be found in an earlier work (Xue et al., 2005). By doing so, each RNA sequence is converted to a 32-dimensional vector (Xue et al., 2005) and used as the input feature of the SVM. Its 10-fold cross-validation test results are also listed in Table 2.

As shown in Table 2, the predictive performance of the method based on nucleotide chemical property and accumulated nucleotide frequency is dramatically higher than that based on PseKNC and RNA secondary structure.

Validation on independent dataset

The proposed method, trained on the benchmark dataset from the E. coli genome, was further used to identify m⁶A sites in the Pseudomonas aeruginosa (P. aeruginosa) genome. For this purpose, we first collected 5,814 experimentally confirmed m⁶A sites from RMBase to form an independent dataset, which is given in Supporting Information S2. Of the 5,814 m⁶A sites in P. aeruginosa, 5,809 (99.91%) were correctly identified, indicating that the proposed method is promising for identifying m⁶A sites in microbial genomes.

Conclusion

In this study, we present a computational method to identify m⁶A sites in the E. coli genome by encoding the RNA sequences using nucleotide chemical property and accumulated nucleotide frequency. The results obtained based on the benchmark dataset and independent dataset demonstrate that the proposed method is powerful and promising in discovering m⁶A sites. We hope that the proposed method will be helpful for the future research on m⁶A sites in microbial genomes.

Since user-friendly and publicly accessible web-servers (Feng et al., 2018) and databases (Liang et al., 2017) represent the direction of developing new prediction methods, we will make efforts in our future work to provide a web-server for the method presented in this paper.

Statements

Author contributions

HL and WC: conceived and designed the experiments; JZ and PF: performed the experiments; HL and WC: wrote the paper.

Acknowledgments

This work was supported by the National Nature Science Foundation of China (Nos. 31771471, 61772119), Program for the Top Young Innovative Talents of Higher Learning Institutions of Hebei Province (No. BJ2014028), the Outstanding Youth Foundation of North China University of Science and Technology (No. JP201502), and the Fundamental Research Funds for the Central Universities of China (Nos. ZYGX2015Z006, ZYGX2016J125, ZYGX2016J118), Natural Science Foundation of Guizhou Province (QKH-2016-1167); The Scientific and Technological Innovation Project for Oversea Students of Guizhou province (QR-2016-20); High School Science and Technology Talent Support Project of Guizhou Province (QJH-KY-2016-079).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2018.00955/full#supplementary-material

References

Bertero, A., Brown, S., Madrigal, P., Osnato, A., Ortmann, D., Yiangou, L., et al. (2018). The SMAD2/3 interactome reveals that TGFβ controls m(6)A mRNA methylation in pluripotency. Nature 555, 256–259. doi: 10.1038/nature25784
Boccaletto, P., Machnicka, M. A., Purta, E., Piatkowski, P., Baginski, B., Wirecki, T. K., et al. (2018). MODOMICS: a database of RNA modification pathways. 2017 update. Nucleic Acids Res. 46, D303–D307. doi: 10.1093/nar/gkx1030
Cao, R., Adhikari, B., Bhattacharya, D., Sun, M., Hou, J., Cheng, J. (2017a). QAcon: single model quality assessment using protein structural and contact information with machine learning techniques. Bioinformatics 33, 586–588. doi: 10.1093/bioinformatics/btw694
Cao, R., Bhattacharya, D., Hou, J., Cheng, J. (2016). DeepQA: improving the estimation of single protein model quality with deep belief networks. BMC Bioinformatics 17:495. doi: 10.1186/s12859-016-1405-y
Cao, R., Freitas, C., Chan, L., Sun, M., Jiang, H., Chen, Z. (2017b). ProLanGO: protein function prediction using neural machine translation based on a recurrent neural network. Molecules 22:E1732. doi: 10.3390/molecules22101732
Cao, R., Wang, Z., Wang, Y., Cheng, J. (2014). SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines. BMC Bioinformatics 15:120. doi: 10.1186/1471-2105-15-120
Chen, W., Feng, P., Ding, H., Lin, H. (2016a). Identifying N(6)-methyladenosine sites in the Arabidopsis thaliana transcriptome. Mol. Genet. Genomics 291, 2225–2229. doi: 10.1007/s00438-016-1243-7
Chen, W., Feng, P., Ding, H., Lin, H., Chou, K.-C. (2015a). iRNA-methyl: identifying N-6-methyladenosine sites using pseudo nucleotide composition. Anal. Biochem. 490, 26–33. doi: 10.1016/j.ab.2015.08.021
Chen, W., Feng, P., Tang, H., Ding, H., Lin, H. (2016b). Identifying 2′-O-methylationation sites by integrating nucleotide chemical properties and nucleotide compositions. Genomics 107, 255–258. doi: 10.1016/j.ygeno.2016.05.003
Chen, W., Feng, P., Tang, H., Ding, H., Lin, H. (2016c). RAMPred: identifying the N-1-methyladenosine sites in eukaryotic transcriptomes. Sci. Rep. 6:31080. doi: 10.1038/srep31080
Chen, W., Feng, P., Yang, H., Ding, H., Lin, H., Chou, K.-C. (2017a). iRNA-AI: identifying the adenosine to inosine editing sites in RNA sequences. Oncotarget 8, 4208–4217. doi: 10.18632/oncotarget.13758
Chen, W., Lei, T.-Y., Jin, D.-C., Lin, H., Chou, K.-C. (2014a). PseKNC: A flexible web server for generating pseudo K-tuple nucleotide composition. Anal. Biochem. 456, 53–60. doi: 10.1016/j.ab.2014.04.001
Chen, W., Lin, H., Feng, P. M., Ding, C., Zuo, Y. C., Chou, K. C. (2012). iNuc-PhysChem: a sequence-based predictor for identifying nucleosomes via physicochemical properties. PLoS ONE 7:e47843. doi: 10.1371/journal.pone.0047843
Chen, W., Tang, H., Lin, H. (2017b). MethyRNA: a web server for identification of N-6-methyladenosine sites. J. Biomol. Struct. Dyn. 35, 683–687. doi: 10.1080/07391102.2016.1157761
Chen, W., Tang, H., Ye, J., Lin, H., Chou, K.-C. (2016d). iRNA-PseU: Identifying RNA pseudouridine sites. Mol. Ther. Nucleic Acids 5, 155–163. doi: 10.1038/mtna.2016.37
Chen, W., Tran, H., Liang, Z., Lin, H., Zhang, L. (2015b). Identification and analysis of the N-6-methyladenosine in the Saccharomyces cerevisiae transcriptome. Sci. Rep. 5:13859. doi: 10.1038/srep13859
Chen, W., Xing, P., Zou, Q. (2017c). Detecting N6-methyladenosine sites from RNA transcriptomes using ensemble support vector machines. Sci. Rep. 7:40242. doi: 10.1038/srep40242
Chen, W., Yang, H., Feng, P., Ding, H., Lin, H. (2017d). iDNA4mC: identifying DNA N4-methylcytosine sites based on nucleotide chemical properties. Bioinformatics 33, 3518–3523. doi: 10.1093/bioinformatics/btx479
Chen, W., Zhang, X., Brooker, J., Lin, H., Zhang, L., Chou, K.-C. (2014b). PseKNC-General: a cross-platform package for generating various modes of pseudo nucleotide compositions. Bioinformatics 31, 119–120. doi: 10.1093/bioinformatics/btu602
Chen, X., Sun, Y. Z., Liu, H., Zhang, L., Li, J. Q., Meng, J. (2017). RNA methylation and diseases: experimental results, databases, web servers and computational models. Brief Bioinform. [Epub ahead of print]. doi: 10.1093/bib/bbx142
Chou, K. C. (2011). Some remarks on protein attribute prediction and pseudo amino acid composition. J. Theor. Biol. 273, 236–247. doi: 10.1016/j.jtbi.2010.12.024
Cui, Q., Shi, H., Ye, P., Li, L., Qu, Q., Sun, G., et al. (2017). m(6)A RNA methylation regulates the self-renewal and tumorigenesis of glioblastoma stem cells. Cell Rep. 18, 2622–2634. doi: 10.1016/j.celrep.2017.02.059
Dominissini, D., Moshitch-Moshkovitz, S., Salmon-Divon, M., Amariglio, N., Rechavi, G. (2013). Transcriptome-wide mapping of N(6)-methyladenosine by m(6)A-seq based on immunocapturing and massively parallel sequencing. Nat. Protoc. 8, 176–189. doi: 10.1038/nprot.2012.148
Feng, P., Ding, H., Yang, H., Chen, W., Lin, H., Chou, K.-C. (2017). iRNA-PseColl: identifying the occurrence sites of different RNA modifications by incorporating collective effects of nucleotides into PseKNC. Mol. Ther. Nucleic Acids 7, 155–163. doi: 10.1016/j.omtn.2017.03.006
Feng, P. M., Chen, W., Lin, H., Chou, K. C. (2013). iHSP-PseRAAAC: identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. Anal. Biochem. 442, 118–125. doi: 10.1016/j.ab.2013.05.024
Feng, P., Yang, H., Ding, H., Lin, H., Chen, W., Chou, K. C. (2018). iDNA6mA-PseKNC: Identifying DNA N(6)-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC. Genomics [Epub ahead of print]. doi: 10.1016/j.ygeno.2018.01.005
Fu, L., Niu, B., Zhu, Z., Wu, S., Li, W. (2012). CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152. doi: 10.1093/bioinformatics/bts565
Guo, S.-H., Deng, E.-Z., Xu, L.-Q., Ding, H., Lin, H., Chen, W., et al. (2014). iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition. Bioinformatics 30, 1522–1529. doi: 10.1093/bioinformatics/btu083
Hoernes, T. P., Huttenhofer, A., Erlacher, M. D. (2016). mRNA modifications: dynamic regulators of gene expression? RNA Biol. 13, 760–765. doi: 10.1080/15476286.2016.1203504
Hong, T., Yuan, Y., Chen, Z., Xi, K., Wang, T., Xie, Y., et al. (2018). Precise antibody-independent m6A identification via 4SedTTP-involved and FTO-assisted strategy at single-nucleotide resolution. J. Am. Chem. Soc. [Epub ahead of print]. doi: 10.1021/jacs.7b13633
Hu, H., Zhang, L., Ai, H., Zhang, H., Fan, Y., Zhao, Q., et al. (2018). HLPI-Ensemble: Prediction of human lncRNA-protein interactions based on ensemble strategy. RNA Biol. [Epub ahead of print]. doi: 10.1080/15476286.2018.1457935
Li, Z., Weng, H., Su, R., Weng, X., Zuo, Z., Li, C., et al. (2017). FTO plays an oncogenic role in acute myeloid leukemia as a N(6)-methyladenosine RNA demethylase. Cancer Cell 31, 127–141. doi: 10.1016/j.ccell.2016.11.017
Liang, Z. Y., Lai, H. Y., Yang, H., Zhang, C. J., Yang, H., Wei, H. H., et al. (2017). Pro54DB: a database for experimentally verified sigma-54 promoters. Bioinformatics 33, 467–469. doi: 10.1093/bioinformatics/btw630
Lin, H., Deng, E.-Z., Ding, H., Chen, W., Chou, K.-C. (2014). iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition. Nucleic Acids Res. 42, 12961–12972. doi: 10.1093/nar/gku1019
Lin, H., Liang, Z.-Y., Tang, H., Chen, W. (2017). Identifying sigma70 promoters with novel pseudo nucleotide composition. IEEE/ACM Trans. Comput. Biol. Bioinform. [Epub ahead of print]. doi: 10.1109/TCBB.2017.2666141
Linder, B., Grozhik, A. V., Olarerin-George, A. O., Meydan, C., Mason, C. E., Jaffrey, S. R. (2015). Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome. Nat. Methods 12, 767–772. doi: 10.1038/nmeth.3453
Nilsen, T. W. (2014). Molecular biology. Internal mRNA methylation finally finds functions. Science 343, 1207–1208. doi: 10.1126/science.1249340
Roundtree, I. A., Evans, M. E., Pan, T., He, C. (2017). Dynamic RNA modifications in gene expression regulation. Cell 169, 1187–1200. doi: 10.1016/j.cell.2017.05.045
Schwartz, S., Agarwala, S. D., Mumbach, M. R., Jovanovic, M., Mertins, P., Shishkin, A., et al. (2013). High-resolution mapping reveals a conserved, widespread, dynamic mRNA methylation program in yeast meiosis. Cell 155, 1409–1421. doi: 10.1016/j.cell.2013.10.047
Wan, Y., Tang, K., Zhang, D., Xie, S., Zhu, X., Wang, Z., et al. (2015). Transcriptome-wide high-throughput deep m(6)A-seq reveals unique differential m(6)A methylation patterns between three organs in Arabidopsis thaliana. Genome Biol. 16:272. doi: 10.1186/s13059-015-0839-2
Wang, F., Huang, Z. A., Chen, X., Zhu, Z., Wen, Z., Zhao, J., et al. (2017a). LRLSHMDA: Laplacian Regularized Least Squares for Human Microbe-Disease Association prediction. Sci. Rep. 7:7601. doi: 10.1038/s41598-017-08127-2
Wang, X., Zhao, B. S., Roundtree, I. A., Lu, Z., Han, D., Ma, H., et al. (2015). N(6)-methyladenosine modulates messenger RNA translation efficiency. Cell 161, 1388–1399. doi: 10.1016/j.cell.2015.05.014
Wang, Y., You, Z., Li, X., Chen, X., Jiang, T., Zhang, J. (2017b). PCVMZM: using the probabilistic classification vector machines model combined with a zernike moments descriptor to predict protein-protein interactions from protein sequences. Int. J. Mol. Sci. 18:1029. doi: 10.3390/ijms18051029
Xuan, J. J., Sun, W. J., Lin, P. H., Zhou, K. R., Liu, S., Zheng, L. L., et al. (2018). RMBase v2.0: deciphering the map of RNA modifications from epitranscriptome sequencing data. Nucleic Acids Res. 46, D327–D334. doi: 10.1093/nar/gkx934
Xue, C., Li, F., He, T., Liu, G. P., Li, Y., Zhang, X. (2005). Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine. BMC Bioinformatics 6:310. doi: 10.1186/1471-2105-6-310
Zhang, C., Samanta, D., Lu, H., Bullen, J. W., Zhang, H., Chen, I., et al. (2016). Hypoxia induces the breast cancer stem cell phenotype by HIF-dependent and ALKBH5-mediated m(6)A-demethylation of NANOG mRNA. Proc. Natl. Acad. Sci. U.S.A. 113, E2047–E2056. doi: 10.1073/pnas.1602883113
Zhang, L., Ai, H., Chen, W., Yin, Z., Hu, H., Zhu, J., et al. (2017). CarcinoPred-EL: novel models for predicting the carcinogenicity of chemicals using molecular fingerprints and ensemble learning methods. Sci. Rep. 7:2118. doi: 10.1038/s41598-017-02365-0
Zhou, Y., Zeng, P., Li, Y. H., Zhang, Z., Cui, Q. (2016). SRAMP: prediction of mammalian N6-methyladenosine (m6A) sites based on sequence-derived features. Nucleic Acids Res. 44:e91. doi: 10.1093/nar/gkw104

Summary

Keywords

N⁶-methyladenosine, machine learning method, nucleotide physicochemical properties, microbial genome, pseudo nucleotide composition

Citation

Zhang J, Feng P, Lin H and Chen W (2018) Identifying RNA N⁶-Methyladenosine Sites in Escherichia coli Genome. Front. Microbiol. 9:955. doi: 10.3389/fmicb.2018.00955

Received

13 March 2018

Accepted

24 April 2018

Published

14 May 2018

Volume

9 - 2018

Edited by

Hongsheng Liu, Liaoning University, China

Reviewed by

Yongqiang Xing, Inner Mongolia University of Science and Technology, China; Renzhi Cao, Pacific Lutheran University, United States

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Hao Lin, hlin@uestc.edu.cn; Wei Chen, chenweiimu@gmail.com

This article was submitted to Systems Microbiology, a section of the journal Frontiers in Microbiology
