RNAStat: An Integrated Tool for Statistical Analysis of RNA 3D Structures

Guo, Zhi-Hao; Yuan, Li; Tan, Ya-Lan; Zhang, Ben-Gong; Shi, Ya-Zhou

doi:10.3389/fbinf.2021.809082

ORIGINAL RESEARCH article

Front. Bioinform., 11 January 2022

Sec. Drug Discovery in Bioinformatics

Volume 1 - 2021 | https://doi.org/10.3389/fbinf.2021.809082


Zhi-Hao Guo ^1,2
Li Yuan ^1,2
Ya-Lan Tan ^1
Ben-Gong Zhang ^1
Ya-Zhou Shi ^1,*

1. Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, China
2. School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China

Abstract

The 3D architectures of RNAs are essential for understanding their cellular functions. While an accurate scoring function based on the statistics of known RNA structures is a key component of successful RNA structure prediction or evaluation, there are few tools or web servers that can be directly used to perform comprehensive statistical analysis of RNA 3D structures. In this work, we developed RNAStat, an integrated tool for the statistical analysis of RNA 3D structures. For given RNA structures, RNAStat automatically calculates structural properties such as size and shape, and shows their distributions. Based on the RNA structure annotation from DSSR, RNAStat provides statistics on RNA secondary structure motifs, including canonical/non-canonical base pairs, stems, and various loops. In particular, RNAStat can calculate the geometry of base-pairing/stacking by constructing a local coordinate system for each base. In addition, RNAStat provides the distribution of distances between any two atoms, which can help users build distance-based RNA statistical potentials. To test the usability of the tool, we established a non-redundant RNA 3D structure dataset and, based on it, performed a comprehensive statistical analysis of RNA structures, which may provide guidance for RNA structure modeling. The Python code of RNAStat, the dataset used in this work, and the corresponding statistical data files are freely available at GitHub (https://github.com/RNA-folding-lab/RNAStat).

1 Introduction

RNA molecules play important roles in various biological processes, ranging from carrying genetic information, participating in protein synthesis, catalyzing biochemical reactions, and regulating gene expression, to acting as structural molecules in cellular organelles (Doherty and Doudna, 2001; Dethoff et al., 2012; Cech and Steitz, 2014). Generally, to perform their functions, RNAs need to fold into specific tertiary structures, which can be determined by experimental methods such as cryo-electron microscopy, X-ray crystallography, and nuclear magnetic resonance (NMR) spectroscopy (Fernandez-Leiro and Scheres, 2016; Rose et al., 2017; Westhof and Leontis, 2021). However, the number of RNA structures deposited in the Protein Data Bank (PDB) is still limited, since experimentally determining high-resolution RNA 3D structures is expensive and time-consuming (Rose et al., 2017; Westhof and Leontis, 2021). This situation has created a strong demand in structural biology for determining RNA structures with prediction methods (Hajdin et al., 2010; Shi Y.-Z. et al., 2014; Miao et al., 2017; Schlick and Pyle, 2017).

In the last decade, a number of computational models have been developed for predicting RNA 3D structures, among which knowledge-based fragment assembly methods (Gan et al., 2004; Das and Baker, 2007; Parisien and Major, 2008; Das et al., 2010; Flores et al., 2010; Cao and Chen, 2011; Rother et al., 2011; Popenda et al., 2012; Zhao et al., 2012; Jian et al., 2019; Zhang et al., 2021) and physics-based coarse-grained (CG) models (Jonikas et al., 2009; Flores and Altman, 2010; Pasquali and Derreumaux, 2010; Flores et al., 2012; Denesyuk and Thirumalai, 2013; Xia et al., 2013; Shi YZ. et al., 2014; Šulc et al., 2014; Krokhotin et al., 2015; Boniecki et al., 2016) have gained the most attention. For example, FARNA/FARFAR assembles trinucleotide fragments into 3D structures corresponding to an RNA sequence using a Monte Carlo algorithm and a knowledge-based energy function whose parameters were determined from statistical analysis of known RNA 3D structures (Das and Baker, 2007; Das et al., 2010). SimRNA uses a CG representation with a statistical potential derived from PDB structures and can fold RNAs using only sequence information (Boniecki et al., 2016). Recently, we have also developed a new CG model to predict the 3D structures and stability of RNAs in ion solutions from sequence alone (Shi Y.-Z. et al., 2014, 2015, 2018; Jin et al., 2019). Although the potential energy of our model is mainly physics-based, its potentials, especially the bonded potentials, were also parameterized by statistical analysis of the available RNA 3D structures in the PDB (Shi YZ. et al., 2014; Jin et al., 2019).

Furthermore, existing knowledge-based methods usually produce an ensemble of candidate structures, which must be further evaluated to identify the candidates closest to the native structure (Huang and Zou, 2011; Miao and Westhof, 2017; Yan et al., 2018; Tan et al., 2019; Magnus et al., 2020). To address this issue, several statistical potentials have been developed to evaluate RNA 3D structures (Bernauer et al., 2011; Capriotti et al., 2011; Wang et al., 2015; Li et al., 2016; Li et al., 2018; Masso, 2018; Yu et al., 2019; Zhang et al., 2020), such as RASP (Capriotti et al., 2011), RNA KB potentials (Bernauer et al., 2011), 3dRNAscore (Wang et al., 2015), and DFIRE (Zhang et al., 2020). Generally, these potentials are derived from the observed frequencies of atom pairs, angles, or dihedral angles in PDB structures using Boltzmann or Bayesian formulations (Huang and Zou, 2011; Yan et al., 2018; Tan et al., 2019). For example, Capriotti et al. built RASP by calculating the density distribution of distances between any two atoms in all known RNA structures (Capriotti et al., 2011). The 3dRNAscore introduced by Wang et al. uses seven typical RNA dihedral angles together with distance-dependent geometric descriptions of atom pairs to construct its statistical potentials (Wang et al., 2015). Beyond structure evaluation, Xiong et al. very recently proposed a fully knowledge-based function (BRiQ), based on statistics of the orientation distribution of one base around another in PDB structures, to improve RNA model refinement (Xiong et al., 2021).
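As a rough illustration of the Boltzmann formulation mentioned above, the sketch below turns observed and reference counts over a set of bins (for example, distance bins for one atom-type pair) into an energy per bin, E = -kT ln(P_obs/P_ref). The helper name `inverse_boltzmann`, the pseudocount, and the use of a user-supplied reference histogram are assumptions for illustration only; each cited potential (RASP, 3dRNAscore, DFIRE, etc.) defines its own reference state.

```python
import math
from typing import List, Sequence


def inverse_boltzmann(obs_counts: Sequence[float],
                      ref_counts: Sequence[float],
                      kT: float = 1.0,
                      pseudocount: float = 1e-6) -> List[float]:
    """Return E_k = -kT * ln(P_obs(k) / P_ref(k)) for every bin k.

    Hypothetical helper: both histograms are normalized to probabilities,
    and a small pseudocount avoids log(0) for empty bins.
    """
    if len(obs_counts) != len(ref_counts):
        raise ValueError("observed and reference histograms need the same bins")
    n_obs = sum(obs_counts)
    n_ref = sum(ref_counts)
    energies = []
    for o, r in zip(obs_counts, ref_counts):
        p_obs = o / n_obs + pseudocount
        p_ref = r / n_ref + pseudocount
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


# Bins that are over-populated relative to the reference get negative energies.
print(inverse_boltzmann([30, 10, 10], [10, 10, 10]))
```

The choice of reference state is the main design decision in such potentials; a uniform reference, as in the toy call above, is the simplest but not necessarily the one used by any specific published potential.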

These advances indicate that gathering various statistics of RNA 3D structures is generally essential for predicting RNA tertiary structures. However, there are few tools or web servers that can be used to perform comprehensive statistical analysis of RNA 3D structures (Andronescu et al., 2008; Cock et al., 2009; Baulin et al., 2016; Danaee et al., 2018; Magnus et al., 2020). Recently, Baulin et al. proposed the database URSDB (the Universe of RNA Structures DataBase) to store information (e.g., annotations of main structural elements) obtained from all RNA-containing PDB entries (Baulin et al., 2016). Although URSDB allows users to obtain statistics on structural motifs (base pairs, stems, and loops) based on the information provided by DSSR (Dissecting the Spatial Structure of RNA) (Lu et al., 2015; Lu, 2020), these statistics on RNA secondary structure motifs may be far from sufficient for RNA 3D structure modeling (Miao and Westhof, 2017; Tan et al., 2019). Fortunately, several works have provided statistics of RNA structures from different aspects. For example, both the RNA 3D Motif Atlas and bpRNA provide statistical summaries of hairpin and internal loop motifs (Parlea et al., 2016; Danaee et al., 2018). RNA STRAND also provides information on structural features such as the types and sizes of stems and loops (Andronescu et al., 2008). To build scoring functions for RNA structure prediction, Bottaro et al. as well as Das and Baker developed methods to calculate the geometric properties of RNA base-pairing and base-stacking (Bottaro et al., 2014; Das and Baker, 2007). Despite this progress, given the rapidly increasing number of RNA structures deposited in the PDB (Supplementary Figure S1 in the Supplementary Material) (Rose et al., 2017; Westhof and Leontis, 2021), a tool providing convenient access to comprehensive statistical information on RNA 3D structures is still needed.

Here, we present RNAStat, a tool designed specifically for the statistical analysis of RNA 3D structures. It can calculate structural information of RNA 3D structure(s) at different levels: the global 3D structural level, the secondary structure level, and the atomic level. We first introduce the functions and principles of RNAStat. Then, based on a non-redundant RNA structure dataset that we established, we use RNAStat to perform statistical analysis of RNA 3D structures and provide various statistical data on RNA structural properties (e.g., size/shape, geometry of base-pairing/stacking, secondary structure motifs, and atom-atom distances). Throughout the article, we also discuss the potential value of these statistics for RNA 3D structure prediction and evaluation.

2 Materials and Methods

RNAStat can calculate statistics for given RNA structure(s) in the following aspects: 1) the radius of gyration (i.e., size) and shape; 2) the secondary structure motifs; 3) the geometry of base-pairing and base-stacking; and 4) the distances between atoms; see Figure 1.

FIGURE 1

2.1 Radius of Gyration

The mean radius of gyration $R_g$ is often used as a geometric measure of the size of RNA, as well as of DNA and proteins (Hyeon et al., 2006; Rawat and Biswas, 2009), since it can be readily determined by experimental methods such as small-angle neutron or X-ray scattering. For RNAs, equal masses can be assumed for all non-hydrogen atoms, so that the $R_g$ of a given RNA 3D structure (a PDB entry, e.g., in .cif format) can be calculated by (Hyeon et al., 2006)

$$R_g = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left|\mathbf{r}_i - \mathbf{r}_c\right|^2} \qquad (1)$$

where $N$ is the number of heavy atoms (C, P, N, and O) in the RNA molecule and $\mathbf{r}_i$ is the position of the ith atom. The $\mathbf{r}_c$ in Eq. 1 represents the coordinates of the geometric center of the RNA, calculated as $\mathbf{r}_c = \frac{1}{N}\sum_{i=1}^{N}\mathbf{r}_i$.
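Eq. 1 with equal atomic masses reduces to a root-mean-square distance from the geometric center, as in the minimal sketch below. It assumes the heavy-atom coordinates have already been extracted from the .cif file (for example with Biopython's `MMCIFParser`, not shown); the function name is hypothetical and not RNAStat's actual code.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Coord]) -> float:
    """Eq. 1 with equal masses: RMS distance of heavy atoms from their geometric center."""
    n = len(coords)
    if n == 0:
        raise ValueError("no atoms given")
    # Geometric center r_c = (1/N) * sum_i r_i
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    # Mean squared distance from r_c, then square root
    sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords)
    return math.sqrt(sq / n)


print(radius_of_gyration([(1, 0, 0), (-1, 0, 0), (0, 2, 0), (0, -2, 0)]))
```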

2.2 Shape

Since the shape of RNAs is important in determining their overall motion and their interactions with other biomolecules, two rotationally invariant quantities, the asphericity parameter $\Delta$ and the shape parameter $S$, are used to characterize the deviation of an RNA conformation from a spherical shape (Figure 2A) (Hyeon et al., 2006). Following Refs. (Hyeon et al., 2006; Rawat and Biswas, 2009), $\Delta$ and $S$ can be determined from the inertia tensor

$$T_{\alpha\beta} = \frac{1}{2N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(r_{i\alpha}-r_{j\alpha}\right)\left(r_{i\beta}-r_{j\beta}\right) \qquad (2)$$

where $\alpha, \beta = x, y, z$ are the coordinate components and $r_{i\alpha}$ is the $\alpha$-th component of the position of the ith atom. Since $\mathrm{tr}\,T = R_g^2$, the eigenvalues $\lambda_1, \lambda_2, \lambda_3$ of $T$ are the squares of the three principal radii of gyration. Thus, $\Delta$ and $S$ can be calculated directly by

$$\Delta = \frac{3}{2}\,\frac{\sum_{i=1}^{3}\left(\lambda_i-\bar{\lambda}\right)^2}{\left(\mathrm{tr}\,T\right)^2} \qquad (3)$$

$$S = 27\,\frac{\prod_{i=1}^{3}\left(\lambda_i-\bar{\lambda}\right)}{\left(\mathrm{tr}\,T\right)^3} \qquad (4)$$

where $\bar{\lambda} = \mathrm{tr}\,T/3$. As Eqs 2–4 show, the shape parameter $S$ measures the prolateness or oblateness of a conformation, whereas the asphericity parameter $\Delta$ characterizes its average deviation from spherical symmetry. $S$ satisfies the bound $-1/4 \le S \le 2$: $S > 0$ represents a prolate ellipsoid, $S < 0$ corresponds to an oblate ellipsoid, and $S = 0$ holds for a symmetric sphere. $\Delta$ lies in the range [0, 1], where $\Delta = 0$ means that the RNA molecule is a perfect sphere; otherwise, the value of $\Delta$ indicates the extent of anisotropy.
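A minimal sketch of Eqs 2–4 in plain Python follows. Instead of an explicit eigendecomposition, it uses the traceless part of $T$, whose eigenvalues are exactly $\lambda_i - \bar{\lambda}$, so that the sum of squares becomes the sum of squared matrix entries and the product becomes a determinant. This invariant-based route is our choice for a dependency-free example and is mathematically equivalent to Eqs 3 and 4; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def gyration_tensor(coords: Sequence[Coord]) -> List[List[float]]:
    """T_ab = (1/N) sum_i (r_ia - c_a)(r_ib - c_b); equal to the pairwise form of Eq. 2."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[sum((p[a] - c[a]) * (p[b] - c[b]) for p in coords) / n
             for b in range(3)] for a in range(3)]


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def asphericity_and_shape(coords: Sequence[Coord]) -> Tuple[float, float]:
    """Return (Delta, S) following Eqs 3 and 4."""
    t = gyration_tensor(coords)
    tr = t[0][0] + t[1][1] + t[2][2]   # tr T = Rg^2
    mean = tr / 3.0                     # lambda_bar
    # Traceless part of T: its eigenvalues are lambda_i - lambda_bar.
    d = [[t[a][b] - (mean if a == b else 0.0) for b in range(3)] for a in range(3)]
    # For a symmetric matrix, sum_i (lambda_i - lambda_bar)^2 = sum of squared entries.
    sum_sq = sum(d[a][b] ** 2 for a in range(3) for b in range(3))
    delta = 1.5 * sum_sq / tr ** 2
    # prod_i (lambda_i - lambda_bar) = det of the traceless part.
    s = 27.0 * det3(d) / tr ** 3
    return delta, s


# A two-atom "rod" reaches the upper bounds Delta = 1, S = 2.
print(asphericity_and_shape([(-1, 0, 0), (1, 0, 0)]))
```

As a sanity check on the bounds quoted above, a rod gives (1, 2), six points on the ± axes give (0, 0), and a flat square of four points gives (0.25, -0.25), the oblate limit.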

FIGURE 2

2.3 Secondary Structure Motifs

To obtain the secondary structure motifs of an RNA PDB structure, RNAStat directly calls DSSR from Python (e.g., via a command of the form x3dna-dssr.exe --json -i=<input file> -o=<output file>). DSSR is an integrated, automated command-line tool for the analysis and annotation of RNA tertiary structures; it can characterize nucleotides, base pairs, pseudoknots, loops, stems, and coaxially stacked helices (Lu et al., 2015; Lu, 2020); see an example in Figure 3. Based on the information extracted from DSSR, RNAStat can further provide, for an RNA structure set, statistics of secondary structural elements, including base pairs, stems, and various loops. In this work, we considered all C-G, A-U, and G-U pairs to be canonical base pairs and all other base pairs to be non-canonical. The definitions of the secondary structural motifs are standard (Leontis and Westhof, 2001), and simple illustrations of them are shown in Figure 3.
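The DSSR call and the tallying of canonical versus non-canonical pairs can be sketched as follows. The wrapper `run_dssr`, its default executable name, the command-line options, and the assumption that DSSR's JSON output contains a top-level "pairs" list whose records carry a "bp" field such as "G-C" are ours; they should be checked against the DSSR manual for the installed version. G-U pairs are counted as canonical, following the convention stated above.

```python
import json
import subprocess
from collections import Counter
from typing import Dict, Iterable

# C-G, A-U and G-U (wobble) pairs, in both orientations, are treated as canonical.
CANONICAL = {"C-G", "G-C", "A-U", "U-A", "G-U", "U-G"}


def run_dssr(cif_path: str, json_path: str, exe: str = "x3dna-dssr") -> dict:
    """Hypothetical wrapper: run DSSR with JSON output and load the result."""
    subprocess.run([exe, f"-i={cif_path}", "--json", f"-o={json_path}"], check=True)
    with open(json_path) as fh:
        return json.load(fh)


def count_pair_types(pairs: Iterable[Dict]) -> Counter:
    """Tally canonical vs non-canonical pairs from DSSR-style pair records."""
    counts = Counter()
    for p in pairs:
        bp = p.get("bp", "")  # assumed format, e.g. "G-C"
        counts["canonical" if bp in CANONICAL else "non-canonical"] += 1
    return counts


print(count_pair_types([{"bp": "G-C"}, {"bp": "G-U"}, {"bp": "A-G"}]))
```

For one structure, `count_pair_types(run_dssr("model.cif", "model.json").get("pairs", []))` would produce the tally; the file names are placeholders, and summing the counters over all files gives dataset-level statistics.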

FIGURE 3

2.4 Geometry of Base-Pairing and Base-Stacking

Since base-pairing and base-stacking are critical interactions that stabilize RNA 3D structures (Butcher and Pyle, 2011; Bottaro et al., 2014; Wang et al., 2016; Wang et al., 2020), RNAStat can calculate the geometry between two bases in base-pairing/stacking. First, each nucleobase (i.e., A, U, G, or C) is treated as a single rigid group, and a coordinate system is set up on each base, with the origin O at the geometric center of all its heavy atoms. Similar to the local referential of a nucleotide introduced by Gendron et al. (2001), for pyrimidines (or purines), two unit vectors are built, $\mathbf{a}$ between atoms N1 and C4 (C8 in purines) and $\mathbf{b}$ between atoms N1 and N3, and the unit vector $\hat{\mathbf{z}}$ is oriented along the cross product $\mathbf{a} \times \mathbf{b}$. The unit vector $\hat{\mathbf{x}}$ points from the origin O to atom N1, and the unit vector $\hat{\mathbf{y}}$ is given by $\hat{\mathbf{z}} \times \hat{\mathbf{x}}$; see Figure 4A. Following this definition, the position of base j in the coordinate system constructed on base i is described by the vector $\mathbf{r}_{ij}$, which can be conveniently expressed in cylindrical coordinates (ρ, θ, z) (Gendron et al., 2001; Das and Baker, 2007; Flores et al., 2011; Bottaro et al., 2014). The geometry of paired and stacked bases can then be described by the distance ρ and the angle θ. Based on the base-pairing information from DSSR, the distributions of ρ and θ can be used to characterize the geometry of different base pairs, including canonical and non-canonical Watson-Crick base pairs as well as pairs interacting through the Hoogsteen or sugar edge; see Figures 4B,C. The definitions of the different base-pairing types can be found in Ref. (Leontis and Westhof, 2001) and in Supplementary Figure S2 in the Supplementary Material. Similarly, the stacking geometry of two neighboring bases can be characterized in the ρ-θ plane (Figure 4D).
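The frame construction and the cylindrical coordinates of base j in the frame of base i can be sketched in plain Python as below. The input is a dictionary of heavy-atom coordinates for one base; the in-plane atom paired with N1 is left as the parameter `ref_atom`, and all helper names are hypothetical. The extra projection that makes $\hat{\mathbf{x}}$ exactly perpendicular to $\hat{\mathbf{z}}$ is our addition for numerical robustness, since the heavy atoms of a real base are only approximately coplanar.

```python
import math
from typing import Dict, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def base_frame(atoms: Dict[str, Vec], ref_atom: str):
    """Return (origin, x, y, z) for one base from its heavy-atom coordinates."""
    n = len(atoms)
    origin = tuple(sum(p[k] for p in atoms.values()) / n for k in range(3))
    a = _sub(atoms[ref_atom], atoms["N1"])  # in-plane vector a
    b = _sub(atoms["N3"], atoms["N1"])      # in-plane vector b
    z = _unit(_cross(a, b))                 # normal to the base plane
    x = _sub(atoms["N1"], origin)           # from O toward N1
    h = _dot(x, z)                          # remove any out-of-plane component
    x = _unit((x[0] - h * z[0], x[1] - h * z[1], x[2] - h * z[2]))
    y = _cross(z, x)
    return origin, x, y, z


def cylindrical(point: Vec, frame) -> Tuple[float, float, float]:
    """(rho, theta in degrees, z) of a point, e.g. the origin of base j, in base i's frame."""
    origin, x, y, z = frame
    r = _sub(point, origin)
    lx, ly, lz = _dot(r, x), _dot(r, y), _dot(r, z)
    return math.hypot(lx, ly), math.degrees(math.atan2(ly, lx)), lz
```

In use, one would build `frame_i = base_frame(atoms_i, ref_atom=...)` and `frame_j = base_frame(atoms_j, ref_atom=...)` and then call `cylindrical(frame_j[0], frame_i)`; the returned z is useful for distinguishing stacking (large |z|) from pairing (z near 0).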

FIGURE 4

2.5 Distance Between Any Two Atoms

As described in the Introduction, most existing statistical potentials for RNA structure evaluation are based on the distances between atoms of various types (Miao and Westhof, 2017; Tan et al., 2019). Based on the coordinates of all heavy atoms in an RNA structure (i.e., a .cif file), the distance between any two atoms i and j, of types a and b, respectively, can be calculated in Cartesian coordinates by

$$d_{ij}^{ab} = \left|\mathbf{r}_i^{a} - \mathbf{r}_j^{b}\right| = \sqrt{\sum_{\alpha=x,y,z}\left(r_{i\alpha}^{a} - r_{j\alpha}^{b}\right)^2} \qquad (5)$$

where $\mathbf{r}_i^{a}$ is the coordinate vector of the ith atom, of type a (e.g., P or C4′). RNAStat offers two modes: 1) calculating distances between atoms specified by the user; and 2) calculating all distances between any two atom types. In addition, RNAStat can automatically output the distribution of distances between two atom types, which can be used directly to construct distance-based statistical potentials (Capriotti et al., 2011; Wang et al., 2015; Tan et al., 2019).
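A minimal sketch of the second mode (all distances between two atom types, binned into a histogram) is given below. The atom-record layout `(residue_id, atom_type, (x, y, z))`, the default bin width of 0.5 Å, and the 20 Å cutoff are hypothetical choices; each unordered atom pair is counted once, and atoms in the same nucleotide are skipped, in line with the "different nucleotides" criterion used in Section 3.2.4.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical atom record: (residue_id, atom_type, (x, y, z))
Atom = Tuple[str, str, Tuple[float, float, float]]


def distance_histogram(atoms: Sequence[Atom], type_a: str, type_b: str,
                       bin_width: float = 0.5, max_dist: float = 20.0) -> List[int]:
    """Histogram of d_ij^{ab} (Eq. 5) for atoms of types a and b in different nucleotides."""
    nbins = int(math.ceil(max_dist / bin_width))
    hist = [0] * nbins
    for i, (res_i, t_i, r_i) in enumerate(atoms):
        for j in range(i + 1, len(atoms)):          # each unordered pair once
            res_j, t_j, r_j = atoms[j]
            if res_i == res_j:
                continue                            # skip intra-nucleotide pairs
            if not ((t_i == type_a and t_j == type_b) or (t_i == type_b and t_j == type_a)):
                continue
            d = math.dist(r_i, r_j)                 # Euclidean distance, Python 3.8+
            k = int(d // bin_width)
            if k < nbins:                           # ignore pairs beyond the cutoff
                hist[k] += 1
    return hist
```

The double loop is quadratic in the number of atoms, which is acceptable for a sketch; for large ribosomal structures a spatial grid or k-d tree would be the usual optimization.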

2.6 Dataset Used in This Work

To test RNAStat, we established a non-redundant dataset based on the RNA 3D Hub set (Release nrlist_3.157_4.0 Å), in which the sequence identity between any two chains is less than 95% (Leontis and Zirbel, 2012). First, we collected 1,245 representative RNAs from all the different clusters with a resolution <4.0 Å from the RNA 3D Hub list, which can be downloaded from http://rna.bgsu.edu/rna3dhub/nrlist. Then, we removed the non-RNA strands from the structures in the dataset. Afterwards, we removed RNA structures with sequence identity >80% using the BLASTN program (Camacho et al., 2009). After these steps, 748 RNA structures were retained, and their 3D structure files were downloaded from the PDB. The final RNA structure dataset used in this work, including the PDB IDs and PDB CIF files, can be found in the Supplementary Material as well as at GitHub (https://github.com/RNA-folding-lab/RNAStat).

3 Results and Discussion

3.1 Overview of the RNAStat

In this work, we present RNAStat, an integrated tool for comprehensive statistical analysis of RNA 3D structures. As shown in Figure 1, RNAStat can analyze RNA 3D structures at different levels: the global 3D structure level, the secondary structure level, and the atomic level. The Python code of RNAStat is available at GitHub (https://github.com/RNA-folding-lab/RNAStat). Below, we briefly describe how to use the tool.

The input to RNAStat is the coordinate file(s) of RNA 3D structure(s) in CIF format. Depending on the user's needs, the input can be a single PDB file of one RNA structure or the PDB files of a given RNA structure set. For each PDB file, RNAStat calculates the size and shape of the RNA through Eqs 1–4 (see Materials and Methods) and calls DSSR to obtain its secondary structure motifs, e.g., base pairs, stems, and various loops; see Figure 3. RNAStat can also calculate the distance between any heavy-atom pair by Eq. 5; the atom-pair types can be specified by the user or default to all atom types, where 85 heavy atom types in the four nucleotides (A, U, G, and C) are considered (Wang et al., 2015; Tan et al., 2019); see Supplementary Table S2 in the Supplementary Material. In addition, based on the base-pairing information and the coordinates of the atoms in two paired bases, the geometric properties of base-pairing and base-stacking can also be calculated.

More importantly, for a set of RNA structures, RNAStat can provide statistics for all the above structural properties as well as the frequency distribution of various base pairs, which could be used directly to build statistical potentials for RNA structure evaluation or refinement (Miao et al., 2017; Tan et al., 2019; Xiong et al., 2021). Details of the calculations and statistical analysis are given in Materials and Methods.

3.2 Test on the RNA Structure Set

To demonstrate the applicability of RNAStat, we established a non-redundant RNA 3D structure dataset (see Materials and Methods) and used it as an example for RNA 3D structure analysis. Based on this structure set, we also provide various statistical results on RNA structures, which could help in building RNA statistical potentials or energy functions for RNA CG models.

3.2.1 Size and Shape of RNA Structures

We calculated the radius of gyration $R_g$ for the 748 RNA structures in the dataset using Eq. 1 and found that $R_g$ generally increases with RNA length L (Figure 2B). Further regression analysis showed that the $R_g$ of RNA structures follows a power-law dependence on L, indicating that the $R_g$ of folded RNA structures obeys the Flory scaling law (Tanner, 2016; Hyeon et al., 2006). Although this agrees with the result of Hyeon et al. (Hyeon et al., 2006), the fitted parameters are slightly different. The reasons may be that the RNA structures in our non-redundant dataset are more diverse, and that each $R_g$ is calculated for the entire RNA structure regardless of how many chains it contains, rather than for each RNA chain separately. As shown in Supplementary Figure S3 in the Supplementary Material, the length of most RNAs in the dataset is in the range (10, 100). The corresponding regression for these short RNAs (Figure 2B) suggests that the length dependence of structure size is relatively weak for long RNAs owing to their more compact conformations. In addition, since RNA is a polyelectrolyte, its size also depends on the ion concentration (Woodson, 2005; Tan and Chen, 2006; Tan et al., 2015), which is one reason why RNAs of the same length can differ significantly in $R_g$.
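The text does not state how the regression was performed; one common way to fit a Flory-type power law $R_g = R_0 L^{\nu}$ is ordinary least squares on $\ln R_g = \ln R_0 + \nu \ln L$, sketched below as an assumption. The function name and the synthetic data in the check are hypothetical and do not reproduce the paper's fitted values.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(lengths: Sequence[float], rg: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ln Rg = ln R0 + nu * ln L; returns (R0, nu)."""
    xs = [math.log(v) for v in lengths]
    ys = [math.log(v) for v in rg]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    nu = sxy / sxx                  # slope in log-log space = scaling exponent
    r0 = math.exp(my - nu * mx)     # intercept back-transformed to a prefactor
    return r0, nu
```

Fitting in log-log space weights relative rather than absolute errors, which suits data spanning orders of magnitude in L; a nonlinear fit on the raw scale would weight the largest RNAs more heavily and could give somewhat different parameters.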

Figure 2C depicts the distribution of the asphericity parameter ∆ for RNA structures in the dataset: ∆ spans a broad range from 0 to 0.8, and ∼60% of the structures have ∆ < 0.2, suggesting that RNAs are mostly spherical in nature (Hyeon et al., 2006; Tan et al., 2015). The distribution of the shape parameter S is displayed in Figure 2D. Almost all RNAs have S > 0, and the distribution has a pronounced peak around S = 0, implying that RNAs do not deviate much from spherical symmetry. Our statistics on ∆ and S are very close to the results for RNA complexes reported in Ref. (Hyeon et al., 2006), but differ from those for single-chain RNAs.

3.2.2 Statistics on RNA Secondary Motifs

Since RNA structure formation is generally hierarchical (Brion and Westhof, 1997), secondary structure information could be key to evaluating or predicting RNA tertiary structures. RNAStat can call DSSR to analyze all RNA tertiary structures in the dataset (see Figure 3), and various statistics on RNA secondary motifs can be obtained from the DSSR results.

As shown in Figure 5 and Supplementary Tables S3–S5 in the Supplementary Material, guanine (G) nucleotides and G-C/C-G base pairs are the most common in the RNA dataset; e.g., the frequency of G (∼34%) is noticeably higher than that of the other bases. Using the dataset, we found that the number of base pairs grows linearly with sequence length L with a slope of ∼0.48, and that the number of non-canonical base pairs also increases significantly with L (Figure 5B), suggesting that non-canonical base-pairing interactions are important in 3D structure modeling of RNAs, especially large RNAs (Das et al., 2010; Tan et al., 2015).

FIGURE 5

Figure 5C shows the probability of occurrence of base pairs, including canonical and non-canonical ones (see also Supplementary Table S4 in the Supplementary Material). Given the proportional relation between base-pairing strength and relative probability, these base-pair statistics can be used directly to parameterize base-pairing energy functions for RNA models. For example, based on the relative probabilities of G-C/C-G (∼40%) and A-U/U-A (∼20%) pairs, we set the energy of a G-C pair to twice that of an A-U pair in our CG model (Shi Y.-Z. et al., 2014; Jin et al., 2019), and the common non-canonical base pairs (e.g., A-G, A-A, and G-G) will be taken into account in further work. In addition, base-pair stacking makes a significant contribution to the stability of an RNA structure (Schlick and Pyle, 2017; Miao and Westhof, 2017; Brion and Westhof, 1997; Laing and Schlick, 2009), and stacking interaction parameters can also be obtained from the statistical frequencies of base-pair stacks (Supplementary Table S5 in the Supplementary Material), which could improve predictions of RNA secondary (or 3D) structures and their thermodynamic stability (Dima et al., 2005; Gardner et al., 2011; Sloma and Mathews, 2017).
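To make the arithmetic behind the G-C versus A-U example concrete, the sketch below merges the two orientations of each pair (e.g., G-C and C-G) and converts counts into fractions of all base pairs. Pooling orientations this way is our assumption about how the combined percentages are formed, the function name is hypothetical, and the counts in the check are made up for illustration.

```python
from collections import Counter
from typing import Dict


def pair_family_fractions(counts: Dict[str, int]) -> Dict[str, float]:
    """Merge orientation variants (e.g. G-C and C-G) and return fractions of all pairs."""
    merged = Counter()
    for bp, n in counts.items():
        a, b = bp.split("-")
        merged["-".join(sorted((a, b)))] += n  # "G-C" and "C-G" both become "C-G"
    total = sum(merged.values())
    return {k: v / total for k, v in merged.items()}


fr = pair_family_fractions({"G-C": 30, "C-G": 10, "A-U": 12, "U-A": 8, "A-G": 40})
print(fr["C-G"] / fr["A-U"])  # ratio used for the relative pairing strength
```

This mirrors the linear probability-to-strength mapping described in the text; a Boltzmann-type mapping would instead work with the logarithms of these fractions.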

Furthermore, the length distributions of RNA secondary structure motifs (e.g., stems and loops) could help evaluate structures predicted by ab initio models (Brion and Westhof, 1997; Danaee et al., 2018). Figure 6A displays the distribution of stem length, defined as the number of consecutive canonical base pairs (Lu et al., 2015). Although the stem-length distribution for RNAs in the dataset is very broad, there is a prominent peak at ∼2 bp, and stems longer than 10 bp occur much less frequently (Figure 6A), suggesting that stems are frequently interrupted by loops (Figure 6B) (Danaee et al., 2018). For hairpin loops (Figure 6C), we found that the most likely length is 4 nt, i.e., tetraloops, which thermodynamic experiments have shown to be extremely stable (Butcher and Pyle, 2011); heptaloops (hairpin loops of 7 nt) are the second most frequent, in line with results from bpRNA and the RNA 3D Motif Atlas (Danaee et al., 2018; Parlea et al., 2016). In contrast, the bulge-loop length distribution has only one pronounced peak, at 1 nt, and almost all bulge loops are shorter than 5 nt (Figure 6D). A possible reason is that a stem interrupted by a short bulge loop (e.g., 1 nt) is generally as stable as a continuous helix with the same sequence, owing to the coaxial stacking between the two stems (Shi et al., 2015; Butcher and Pyle, 2011), whereas RNA stability decreases as bulge length increases (Zhang et al., 2019). As shown in Figures 6E,F, the distributions of internal/junction loop lengths are more complex, with more than one broad peak. For example, about four visible peaks are observed for internal loops, at 2, 4, 6, and 9 nt.
Since the bases on the two sides (5′ and 3′) of an internal loop often pair with each other non-canonically, internal loops tend to be symmetric so as to maintain a more stable structure (Laing and Schlick, 2009; Butcher and Pyle, 2011; Gardner et al., 2011). However, for simplicity, the present version of RNAStat calculates only the total loop length, without distinguishing the 5′ and 3′ loop sequences. More detailed statistics of internal/multi-way loops should be considered in the future to help improve the calculation of their energy parameters.
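The stem-length statistic above (number of consecutive canonical pairs) can be illustrated with the short sketch below, which groups canonical pairs (i, j) into runs of stacked pairs (i, j), (i+1, j-1), and so on. It assumes a single continuous nucleotide numbering and is a simplified definition of our own; DSSR's stem detection, on which RNAStat relies, may differ in details (for example, across chain breaks). The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def stem_lengths(pairs: Iterable[Tuple[int, int]]) -> List[int]:
    """Group canonical pairs (i, j) into stems of consecutively stacked pairs."""
    pair_set = {(min(i, j), max(i, j)) for i, j in pairs}  # normalize so i < j
    lengths = []
    for i, j in sorted(pair_set):
        if (i - 1, j + 1) in pair_set:
            continue  # not the outermost pair of its stem
        n = 1
        while (i + n, j - n) in pair_set:  # extend inward while pairs stack
            n += 1
        lengths.append(n)
    return lengths


# Two stems of 3 bp and 2 bp separated by an internal loop.
print(stem_lengths([(1, 20), (2, 19), (3, 18), (6, 15), (7, 14)]))
```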

FIGURE 6

3.2.3 Statistics on Geometry of Base-Pairing and Base-Stacking

Given the importance of base-pairing/stacking geometry in RNA 3D modeling (Das and Baker, 2007; Bottaro et al., 2014), RNAStat provides calculations and statistics of the base-pairing/stacking geometry of RNA structures; see Materials and Methods. For the RNA structure dataset used in this work, the statistical results for base pairs, including canonical and non-canonical ones, are shown in Figure 4 and Supplementary Figure S4 in the Supplementary Material. For example, Figure 4B shows the distribution of the position of base U around its paired base A in A-U base pairs. Base U appears most frequently around base A at the position corresponding to canonical Watson-Crick pairing, while two other high-probability positions correspond to pairs in which the two bases interact through the Hoogsteen or sugar edge; see Supplementary Figure S3 in the Supplementary Material. As expected, base U is almost never observed in the region occupied by the sugar. In contrast, G-A base pairs preferentially interact through their sugar edges; see Supplementary Figure S2 in the Supplementary Material. As shown in Figure 4D and Supplementary Figure S4 in the Supplementary Material, for two stacked bases, e.g., adjacent C and G each paired with their complementary bases, base G occurs mainly above or below base C (Butcher and Pyle, 2011; Bottaro et al., 2014). In addition, the 3D probability distribution for each base pair can also be presented (Supplementary Figure S7 in the Supplementary Material); based on it, 3D Gaussians can be fitted for each possible Leontis-Westhof (LW) base-pair type and each applicable combination of two residue types to obtain the corresponding mean and standard deviation; see Supplementary Table S6 and Supplementary Figure S8 in the Supplementary Material.
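The Gaussian fitting step can be sketched as grouping the 3D positions of base j (in the frame of base i) by LW type and residue pair, then computing a mean and standard deviation along each axis. Treating each group as an axis-aligned (diagonal-covariance) Gaussian is our simplifying assumption; the text does not specify whether the fits behind Supplementary Table S6 use a full covariance matrix. The sample layout and function name are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical sample: (LW type, residue pair, (x, y, z) of base j in base i's frame)
Sample = Tuple[str, str, Tuple[float, float, float]]
Params = Tuple[List[float], List[float]]


def fit_axis_gaussians(samples: Iterable[Sample]) -> Dict[Tuple[str, str], Params]:
    """Per (LW type, residue pair): mean and population std. dev. along x, y and z."""
    groups = defaultdict(list)
    for lw, res_pair, xyz in samples:
        groups[(lw, res_pair)].append(xyz)
    params = {}
    for key, pts in groups.items():
        cols = list(zip(*pts))                       # x values, y values, z values
        mu = [statistics.fmean(c) for c in cols]     # Python 3.8+
        sd = [statistics.pstdev(c) for c in cols]
        params[key] = (mu, sd)
    return params
```

A full 3x3 covariance per group would capture correlated spreads (for example, along a tilted Hoogsteen edge) at the cost of more parameters per LW type.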

Supplementary Figures S4–S8 in the Supplementary Material show the distributions for all base-pairing and stacking types, and the corresponding data files as well as the fitting parameters (means and standard deviations for all base pairs of different LW types) can be found at GitHub (https://github.com/RNA-folding-lab/RNAStat); users can employ them directly to establish base-pairing/stacking potentials for RNA 3D structure prediction or evaluation.

3.2.4 Distributions of the Distance Between Atoms

Since most knowledge-based statistical potentials for RNA structure evaluation are based on interatomic distances (Bernauer et al., 2011; Capriotti et al., 2011; Huang and Zou, 2011; Tan et al., 2019), RNAStat can also calculate the distance between any two non-bonded heavy atoms located in different nucleotides. For example, the distribution of distances between pairs of P atoms is shown in Figure 7A. In addition to a very broad peak at ∼70 Å, there are three noteworthy peaks, at ∼5.7 Å, ∼11.2 Å, and ∼18.4 Å. The first two peaks correspond to the P-P distances between nearest-neighbor and next-nearest-neighbor nucleotides, respectively, and the third peak represents the P-P distance in paired nucleotides; see Figures 7B,C. More distance distributions for atoms of various types can be found in Supplementary Figure S9 in the Supplementary Material as well as in the data files at GitHub. Moreover, RNAStat allows users to specify the atoms or atom types whose distances are to be analyzed; see Materials and Methods.

FIGURE 7

4 Conclusion

In summary, RNAStat is an integrated computational tool for comprehensive statistical analysis of user-supplied RNA 3D structures. The tool can not only automatically calculate global RNA structural properties such as size and shape, but also analyze atom-atom distance distributions at the atomic level. Furthermore, it can provide statistics of RNA secondary structure elements (e.g., canonical/non-canonical base pairs, stems, and various loops) and the geometric properties of base-pairing and base-stacking. In this work, we established and used a non-redundant RNA 3D structure dataset to test the usability of the tool, and the resulting statistical data could be used directly to build statistical potentials or energy functions for RNA 3D structure evaluation and prediction.

Nevertheless, further improvements are needed to enable more detailed statistical analysis and to make the tool easier to use. For example, most available RNA statistical potentials adopt a distance-dependent scheme, whereas for proteins, orientation-dependent statistical potentials, which account for many-body interactions by statistically describing both the distance and the relative orientation between interacting atom groups, have been shown to perform better than traditional distance-dependent potentials (Masso, 2018; Yu et al., 2019; Zhang et al., 2020). Thus, in further development of RNAStat, the distribution of orientations (e.g., angles and torsion angles) between atoms, as well as the joint probability of observing two atoms at a given relative distance and orientation, should be taken into account. In addition, although RNAStat requires no installation and is convenient to use from the command line, it still requires Python or a corresponding environment configuration. Thus, a user-friendly web server could be built after further improvement of the tool. Very recent studies have shown that RNA scoring functions derived from deep learning on RNA 3D structures perform well in identifying accurate structural models (Kurgan and Zhou, 2011; Li et al., 2018; Wang et al., 2018; Huang et al., 2020; Townshend et al., 2021), suggesting that more latent structural features of RNAs could be mined with the aid of deep neural networks.

Statements

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author contributions

Z-HG, Y-ZS, and B-GZ designed the research; Z-HG and LY performed the experiments. Z-HG and Y-ZS analyzed the data. Y-ZS, Z-HG, and Y-LT wrote the manuscript. All authors discussed the results and reviewed the manuscript.

Funding

This work was supported by grants from the National Natural Science Foundation of China (11971367 and 11605125).

Acknowledgments

We are grateful to Professors Zhi-Jie Tan (Wuhan University) and Jie Liu (Wuhan Textile University) for valuable discussions.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fbinf.2021.809082/full#supplementary-material

References

Andronescu, M., Bereg, V., Hoos, H. H., and Condon, A. (2008). RNA STRAND: the RNA Secondary Structure and Statistical Analysis Database. BMC Bioinformatics 9, 340. doi:10.1186/1471-2105-9-340

Baulin, E., Yacovlev, V., Khachko, D., Spirin, S., and Roytberg, M. (2016). URS DataBase: Universe of RNA Structures and Their Motifs. Database (Oxford) 2016, baw085. doi:10.1093/database/baw085

Bernauer, J., Huang, X., Sim, A. Y., and Levitt, M. (2011). Fully Differentiable Coarse-Grained and All-Atom Knowledge-Based Potentials for RNA Structure Evaluation. RNA 17 (6), 1066–1075. doi:10.1261/rna.2543711

Boniecki, M. J., Lach, G., Dawson, W. K., Tomala, K., Lukasz, P., Soltysinski, T., et al. (2016). SimRNA: a Coarse-Grained Method for RNA Folding Simulations and 3D Structure Prediction. Nucleic Acids Res. 44 (7), e63. doi:10.1093/nar/gkv1479

Bottaro, S., Di Palma, F., and Bussi, G. (2014). The Role of Nucleobase Interactions in RNA Structure and Dynamics. Nucleic Acids Res. 42 (21), 13306–13314. doi:10.1093/nar/gku972

Brion, P., and Westhof, E. (1997). Hierarchy and Dynamics of RNA Folding. Annu. Rev. Biophys. Biomol. Struct. 26, 113–137. doi:10.1146/annurev.biophys.26.1.113

Butcher, S. E., and Pyle, A. M. (2011). The Molecular Interactions that Stabilize RNA Tertiary Structure: RNA Motifs, Patterns, and Networks. Acc. Chem. Res. 44 (12), 1302–1311. doi:10.1021/ar200098t

Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., et al. (2009). BLAST+: Architecture and Applications. BMC Bioinformatics 10, 421. doi:10.1186/1471-2105-10-421

Cao, S., and Chen, S. J. (2011). Physics-based De Novo Prediction of RNA 3D Structures. J. Phys. Chem. B 115, 4216–4226. doi:10.1021/jp112059y

Capriotti, E., Norambuena, T., Marti-Renom, M. A., and Melo, F. (2011). All-atom Knowledge-Based Potential for RNA Structure Prediction and Assessment. Bioinformatics 27 (8), 1086–1093. doi:10.1093/bioinformatics/btr093

Cech, T. R., and Steitz, J. A. (2014). The Noncoding RNA Revolution-Trashing Old Rules to Forge New Ones. Cell 157, 77–94. doi:10.1016/j.cell.2014.03.008

Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., et al. (2009). Biopython: Freely Available Python Tools for Computational Molecular Biology and Bioinformatics. Bioinformatics 25 (11), 1422–1423. doi:10.1093/bioinformatics/btp163

Danaee, P., Rouches, M., Wiley, M., Deng, D., Huang, L., and Hendrix, D. (2018). bpRNA: Large-Scale Automated Annotation and Analysis of RNA Secondary Structure. Nucleic Acids Res. 46, 5381–5394. doi:10.1093/nar/gky285

Das, R., and Baker, D. (2007). Automated De Novo Prediction of Native-like RNA Tertiary Structures. Proc. Natl. Acad. Sci. U S A. 104, 14664–14669. doi:10.1073/pnas.0703836104

Das, R., Karanicolas, J., and Baker, D. (2010). Atomic Accuracy in Predicting and Designing Noncanonical RNA Structure. Nat. Methods 7 (4), 291–294. doi:10.1038/nmeth.1433

Denesyuk, N. A., and Thirumalai, D. (2013). Coarse-grained Model for Predicting RNA Folding Thermodynamics. J. Phys. Chem. B 117, 4901–4911. doi:10.1021/jp401087x

Dethoff, E. A., Chugh, J., Mustoe, A. M., and Al-Hashimi, H. M. (2012). Functional Complexity and Regulation through RNA Dynamics. Nature 482, 322–330. doi:10.1038/nature10885

Dima, R. I., Hyeon, C., and Thirumalai, D. (2005). Extracting Stacking Interaction Parameters for RNA from the Data Set of Native Structures. J. Mol. Biol. 347, 53–69. doi:10.1016/j.jmb.2004.12.012

Doherty, E. A., and Doudna, J. A. (2001). Ribozyme Structures and Mechanisms. Annu. Rev. Biophys. Biomol. Struct. 30, 457–475. doi:10.1146/annurev.biophys.30.1.457

Fernandez-Leiro, R., and Scheres, S. H. (2016). Unravelling Biological Macromolecules with Cryo-Electron Microscopy. Nature 537, 339–346. doi:10.1038/nature19948

Flores, S. C., and Altman, R. B. (2010). Turning Limited Experimental Information into 3D Models of RNA. RNA 16 (9), 1769–1778. doi:10.1261/rna.2112110

Flores, S. C., Bernauer, J., Shin, S., Zhou, R., and Huang, X. (2012). Multiscale Modeling of Macromolecular Biosystems. Brief. Bioinform. 13 (4), 395–405. doi:10.1093/bib/bbr077

Flores, S. C., Sherman, M. A., Bruns, C. M., Eastman, P., and Altman, R. B. (2011). Fast Flexible Modeling of RNA Structure Using Internal Coordinates. IEEE/ACM Trans. Comput. Biol. Bioinform. 8 (5), 1247–1257. doi:10.1109/TCBB.2010.104

Flores, S. C., Wan, Y., Russell, R., and Altman, R. B. (2010). Predicting RNA Structure by Multiple Template Homology Modeling. Pac. Symp. Biocomput. 2010, 216–227. doi:10.1142/9789814295291_0024

Gan, H. H., Fera, D., Zorn, J., Shiffeldrim, N., Tang, M., Laserson, U., et al. (2004). RAG: RNA-As-Graphs Database-Concepts, Analysis, and Features. Bioinformatics 20 (8), 1285–1291. doi:10.1093/bioinformatics/bth084

Gardner, D. P., Ren, P., Ozer, S., and Gutell, R. R. (2011). Statistical Potentials for Hairpin and Internal Loops Improve the Accuracy of the Predicted RNA Structure. J. Mol. Biol. 413, 473–483. doi:10.1016/j.jmb.2011.08.033

Gendron, P., Lemieux, S., and Major, F. (2001). Quantitative Analysis of Nucleic Acid Three-Dimensional Structures. J. Mol. Biol. 308 (5), 919–936. doi:10.1006/jmbi.2001.4626

Hajdin, C. E., Ding, F., Dokholyan, N. V., and Weeks, K. M. (2010). On the Significance of an RNA Tertiary Structure Prediction. RNA 16, 1340–1349. doi:10.1261/rna.1837410

Huang, B., Du, Y., Zhang, S., Li, W., Wang, J., and Zhang, J. (2020). Computational Prediction of RNA Tertiary Structures Using Machine Learning Methods. Chin. Phys. B 29, 108704. doi:10.1088/1674-1056/abb303

Huang, S. Y., and Zou, X. (2011). Statistical Mechanics-Based Method to Extract Atomic Distance-dependent Potentials from Protein Structures. Proteins 79 (9), 2648–2661. doi:10.1002/prot.23086

Hyeon, C., Dima, R. I., and Thirumalai, D. (2006). Size, Shape, and Flexibility of RNA Structures. J. Chem. Phys. 125 (19), 194905. doi:10.1063/1.2364190

Jian, Y., Wang, X., Qiu, J., Wang, H., Liu, Z., Zhao, Y., et al. (2019). DIRECT: RNA Contact Predictions by Integrating Structural Patterns. BMC Bioinformatics 20 (1), 497. doi:10.1186/s12859-019-3099-4

Jin, L., Tan, Y. L., Wu, Y., Wang, X., Shi, Y. Z., and Tan, Z. J. (2019). Structure Folding of RNA Kissing Complexes in Salt Solutions: Predicting 3D Structure, Stability, and Folding Pathway. RNA 25, 1532–1548. doi:10.1261/rna.071662.119

Jonikas, M. A., Radmer, R. J., Laederach, A., Das, R., Pearlman, S., Herschlag, D., et al. (2009). Coarse-grained Modeling of Large RNA Molecules with Knowledge-Based Potentials and Structural Filters. RNA 15, 189–199. doi:10.1261/rna.1270809

Krokhotin, A., Houlihan, K., and Dokholyan, N. V. (2015). iFoldRNA V2: Folding RNA with Constraints. Bioinformatics 31 (17), 2891–2893. doi:10.1093/bioinformatics/btv221

Kurgan, L., and Zhou, Y. (2011). Machine Learning Models in Protein Bioinformatics. Curr. Protein Pept. Sci. 12 (6), 455. doi:10.2174/138920311796957621

Laing, C., and Schlick, T. (2009). Analysis of Four-Way Junctions in RNA Structures. J. Mol. Biol. 390 (3), 547–559. doi:10.1016/j.jmb.2009.04.084

Leontis, N. B., and Westhof, E. (2001). Geometric Nomenclature and Classification of RNA Base Pairs. RNA 7 (4), 499–512. doi:10.1017/s1355838201002515

Leontis, N. B., and Zirbel, C. L. (2012). “Nonredundant 3D Structure Datasets for RNA Knowledge Extraction and Benchmarking,” in RNA 3D Structure Analysis and Prediction. Editors N. Leontis and E. Westhof (Berlin, Heidelberg: Springer), 27, 281–298. doi:10.1007/978-3-642-25740-7_13

Li, J., Zhang, J., Wang, J., Li, W., and Wang, W. (2016). Structure Prediction of RNA Loops with a Probabilistic Approach. Plos Comput. Biol. 12 (8), e1005032. doi:10.1371/journal.pcbi.1005032

Li, J., Zhu, W., Wang, J., Li, W., Gong, S., Zhang, J., et al. (2018). RNA3DCNN: Local and Global Quality Assessments of RNA 3D Structures Using 3D Deep Convolutional Neural Networks. Plos Comput. Biol. 14 (11), e1006514. doi:10.1371/journal.pcbi.1006514

Lu, X. J., Bussemaker, H. J., and Olson, W. K. (2015). DSSR: an Integrated Software Tool for Dissecting the Spatial Structure of RNA. Nucleic Acids Res. 43 (21), e142. doi:10.1093/nar/gkv716

Lu, X. J. (2020). DSSR-enabled Innovative Schematics of 3D Nucleic Acid Structures with PyMOL. Nucleic Acids Res. 48 (13), e74. doi:10.1093/nar/gkaa426

Magnus, M., Antczak, M., Zok, T., Wiedemann, J., Lukasiak, P., Cao, Y., et al. (2020). RNA-puzzles Toolkit: a Computational Resource of RNA 3D Structure Benchmark Datasets, Structure Manipulation, and Evaluation Tools. Nucleic Acids Res. 48 (2), 576–588. doi:10.1093/nar/gkz1108

Masso, M. (2018). All-atom Four-Body Knowledge-Based Statistical Potential to Distinguish Native Tertiary RNA Structures from Nonnative Folds. J. Theor. Biol. 453, 58–67. doi:10.1016/j.jtbi.2018.05.022

Miao, Z., Adamiak, R. W., Antczak, M., Batey, R. T., Becka, A. J., Biesiada, M., et al. (2017). RNA-puzzles Round III: 3D RNA Structure Prediction of Five Riboswitches and One Ribozyme. RNA 23, 655–672. doi:10.1261/rna.060368.116

Miao, Z., and Westhof, E. (2017). RNA Structure: Advances and Assessment of 3D Structure Prediction. Annu. Rev. Biophys. 46, 483–503. doi:10.1146/annurev-biophys-070816-034125

Parisien, M., and Major, F. (2008). The MC-fold and MC-Sym Pipeline Infers RNA Structure from Sequence Data. Nature 452, 51–55. doi:10.1038/nature06684

Parlea, L. G., Sweeney, B. A., Hosseini-Asanjan, M., Zirbel, C. L., and Leontis, N. B. (2016). The RNA 3D Motif Atlas: Computational Methods for Extraction, Organization and Evaluation of RNA Motifs. Methods 103, 99–119. doi:10.1016/j.ymeth.2016.04.025

Pasquali, S., and Derreumaux, P. (2010). HiRE-RNA: a High Resolution Coarse-Grained Energy Model for RNA. J. Phys. Chem. B 114 (37), 11957–11966. doi:10.1021/jp102497y

Popenda, M., Szachniuk, M., Antczak, M., Purzycka, K. J., Lukasiak, P., Bartol, N., et al. (2012). Automated 3D Structure Composition for Large RNAs. Nucleic Acids Res. 40, e112. doi:10.1093/nar/gks339

Rawat, N., and Biswas, P. (2009). Size, Shape, and Flexibility of Proteins and DNA. J. Chem. Phys. 131 (16), 165104. doi:10.1063/1.3251769

Rose, P. W., Prlić, A., Altunkaya, A., Bi, C., Bradley, A. R., Christie, C. H., et al. (2017). The RCSB Protein Data Bank: Integrative View of Protein, Gene and 3D Structural Information. Nucleic Acids Res. 45, D271–D281. doi:10.1093/nar/gkw1000

Rother, M., Rother, K., Puton, T., and Bujnicki, J. M. (2011). ModeRNA: a Tool for Comparative Modeling of RNA 3D Structure. Nucleic Acids Res. 39 (10), 4007–4022. doi:10.1093/nar/gkq1320

Schlick, T., and Pyle, A. M. (2017). Opportunities and Challenges in RNA Structural Modeling and Design. Biophys. J. 113, 225–234. doi:10.1016/j.bpj.2016.12.037

Shi, Y.-Z., Wu, Y.-Y., Wang, F.-H., and Tan, Z.-J. (2014b). RNA Structure Prediction: Progress and Perspective. Chin. Phys. B 23, 078701. doi:10.1088/1674-1056/23/7/078701

Shi, Y. Z., Jin, L., Feng, C. J., Tan, Y. L., and Tan, Z. J. (2018). Predicting 3D Structure and Stability of RNA Pseudoknots in Monovalent and Divalent Ion Solutions. Plos Comput. Biol. 14 (6), e1006222. doi:10.1371/journal.pcbi.1006222

Shi, Y. Z., Jin, L., Wang, F. H., Zhu, X. L., and Tan, Z. J. (2015). Predicting 3D Structure, Flexibility, and Stability of RNA Hairpins in Monovalent and Divalent Ion Solutions. Biophys. J. 109, 2654–2665. doi:10.1016/j.bpj.2015.11.006

Shi, Y. Z., Wang, F. H., Wu, Y. Y., and Tan, Z. J. (2014a). A Coarse-Grained Model with Implicit Salt for RNAs: Predicting 3D Structure, Stability and Salt Effect. J. Chem. Phys. 141, 105102. doi:10.1063/1.4894752

Sloma, M. F., and Mathews, D. H. (2017). Base Pair Probability Estimates Improve the Prediction Accuracy of RNA Non-canonical Base Pairs. Plos Comput. Biol. 13 (11), e1005827. doi:10.1371/journal.pcbi.1005827

Šulc, P., Romano, F., Ouldridge, T. E., Doye, J. P., and Louis, A. A. (2014). A Nucleotide-Level Coarse-Grained Model of RNA. J. Chem. Phys. 140 (23), 235102. doi:10.1063/1.4881424

Tan, Y. L., Feng, C. J., Jin, L., Shi, Y. Z., Zhang, W., and Tan, Z. J. (2019). What Is the Best Reference State for Building Statistical Potentials in RNA 3D Structure Evaluation? RNA 25 (7), 793–812. doi:10.1261/rna.069872.118

Tan, Z., Zhang, W., Shi, Y., and Wang, F. (2015). RNA Folding: Structure Prediction, Folding Kinetics and Ion Electrostatics. Adv. Exp. Med. Biol. 827, 143–183. doi:10.1007/978-94-017-9245-5_11

Tan, Z. J., and Chen, S. J. (2006). Nucleic Acid Helix Stability: Effects of Salt Concentration, Cation Valence and Size, and Chain Length. Biophys. J. 90, 1175–1190. doi:10.1529/biophysj.105.070904

Tanner, J. J. (2016). Empirical Power Laws for the Radii of Gyration of Protein Oligomers. Acta Crystallogr. D Struct. Biol. 72, 1119–1129. doi:10.1107/S2059798316013218

Townshend, R. J. L., Eismann, S., Watkins, A. M., Rangan, R., Karelina, M., Das, R., et al. (2021). Geometric Deep Learning of RNA Structure. Science 373 (6558), 1047–1051. doi:10.1126/science.abe5650

Wang, J., Zhao, Y., Zhu, C., and Xiao, Y. (2015). 3dRNAscore: a Distance and Torsion Angle Dependent Evaluation Function of 3D RNA Structures. Nucleic Acids Res. 43 (10), e63. doi:10.1093/nar/gkv141

Wang, K., Jian, Y., Wang, H., Zeng, C., and Zhao, Y. (2018). RBind: Computational Network Method to Predict RNA Binding Sites. Bioinformatics 34 (18), 3131–3136. doi:10.1093/bioinformatics/bty345

Wang, Y., Gong, S., Wang, Z., and Zhang, W. (2016). The Thermodynamics and Kinetics of a Nucleotide Base Pair. J. Chem. Phys. 144 (11), 115101. doi:10.1063/1.4944067

Wang, Y., Liu, T., Yu, T., Tan, Z. J., and Zhang, W. (2020). Salt Effect on Thermodynamics and Kinetics of a Single RNA Base Pair. RNA 26 (4), 470–480. doi:10.1261/rna.073882.119

Westhof, E., and Leontis, N. B. (2021). An RNA-Centric Historical Narrative Around the Protein Data Bank. J. Biol. Chem. 296, 100555. doi:10.1016/j.jbc.2021.100555

Woodson, S. A. (2005). Metal Ions and RNA Folding: a Highly Charged Topic with a Dynamic Future. Curr. Opin. Chem. Biol. 9, 104–109. doi:10.1016/j.cbpa.2005.02.004

Xia, Z., Bell, D. R., Shi, Y., and Ren, P. (2013). RNA 3D Structure Prediction by Using a Coarse-Grained Model and Experimental Data. J. Phys. Chem. B 117 (11), 3135–3144. doi:10.1021/jp400751w

Xiong, P., Wu, R., Zhan, J., and Zhou, Y. (2021). Pairing a High-Resolution Statistical Potential with a Nucleobase-Centric Sampling Algorithm for Improving RNA Model Refinement. Nat. Commun. 12 (1), 2777. doi:10.1038/s41467-021-23100-4

Yan, Y., Wen, Z., Zhang, D., and Huang, S. Y. (2018). Determination of an Effective Scoring Function for RNA-RNA Interactions with a Physics-Based Double-Iterative Method. Nucleic Acids Res. 46 (9), e56. doi:10.1093/nar/gky113

Yu, Z., Yao, Y., Deng, H., and Yi, M. (2019). ANDIS: an Atomic Angle- and Distance-dependent Statistical Potential for Protein Structure Quality Assessment. BMC Bioinformatics 20 (1), 299. doi:10.1186/s12859-019-2898-y

Zhang, B. G., Qiu, H. H., Jiang, J., Liu, J., and Shi, Y. Z. (2019). 3D Structure Stability of the HIV-1 TAR RNA in Ion Solutions: A Coarse-Grained Model Study. J. Chem. Phys. 151 (16), 165101. doi:10.1063/1.5126128

Zhang, D., Li, J., and Chen, S. J. (2021). IsRNA1: De Novo Prediction and Blind Screening of RNA 3D Structures. J. Chem. Theor. Comput. 17, 1842–1857. doi:10.1021/acs.jctc.0c01148

Zhang, T., Hu, G., Yang, Y., Wang, J., and Zhou, Y. (2020). All-atom Knowledge-Based Potential for RNA Structure Discrimination Based on the Distance-Scaled Finite Ideal-Gas Reference State. J. Comput. Biol. 27, 856–867. doi:10.1089/cmb.2019.0251

Zhao, Y., Huang, Y., Gong, Z., Wang, Y., Man, J., and Xiao, Y. (2012). Automated and Fast Building of Three-Dimensional RNA Structures. Sci. Rep. 2, 734. doi:10.1038/srep00734

Keywords

RNA 3D structure, statistical analysis, secondary structure motifs, non-canonical base pair, structure evaluation

Citation

Guo Z-H, Yuan L, Tan Y-L, Zhang B-G and Shi Y-Z (2022) RNAStat: An Integrated Tool for Statistical Analysis of RNA 3D Structures. Front. Bioinform. 1:809082. doi: 10.3389/fbinf.2021.809082

Received

04 November 2021

Accepted

17 December 2021

Published

11 January 2022

Volume

1 - 2021

Edited by

Samuel Coulbourn Flores, Stockholm University, Sweden

Reviewed by

Xiaolei Zhu, Anhui Agricultural University, China

Sergio Martinez Cuesta, University of Cambridge, United Kingdom

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Ya-Zhou Shi, yzshi@wtu.edu.cn

This article was submitted to Drug Discovery in Bioinformatics, a section of the journal Frontiers in Bioinformatics
