Abundances of microRNAs in human cells can be estimated as a function of the abundances of YRHB and RHHK tetranucleotides in these microRNAs as an ill-posed inverse problem solution

Mature microRNAs (miRNAs) are small endogenous non-coding RNAs 18–25 nt in length. They program the RNA Induced Silencing Complex (RISC) to make it inhibit either messenger RNAs or promoter DNAs. We have found that the mean abundance of miRNAs in Arabidopsis is correlated with the abundance of DRYD tetranucleotides near the 3′-end and the abundance of WRHB tetranucleotides in the center of the miRNA sequence. Based on this correlation, we have estimated miRNA abundances in seven organs of this plant, namely: inflorescences, stems, siliques, seedlings, roots, cauline, and rosette leaves. We have also found that the mean affinity of miRNAs for two proteins in the Argonaute family (Ago2 and Ago3) in man is correlated with the abundance of YRHB tetranucleotides near the 3′-end and that the preference of miRNAs for Ago2 is correlated with the abundance of RHHK tetranucleotides in the center of the miRNA sequence. This allowed us to obtain statistically significant estimates of miRNA abundances in human embryonic kidney cells, HEK293T. These findings in relation to two taxonomically distant entities (man and Arabidopsis) fit one another like pieces of a jigsaw puzzle, which allowed us to heuristically generalize them and state that the miRNA abundance in the human brain may be determined by the abundance of YRHB and RHHK tetranucleotides in these miRNAs.


INTRODUCTION
MicroRNAs (miRNAs) are small endogenous non-coding RNAs (Kozomara and Griffiths-Jones, 2010). Within the canonical biogenesis of miRNAs, their genes are transcribed by RNA polymerase II into the primary transcripts (pri-miRNAs). Special Microprocessor proteins cut away the first precursor of miRNA (pre-miRNA) and then the mature miRNA 18-25 nt in length (Kozomara and Griffiths-Jones, 2010). The miRNAs that maturate from other sources, including spliced-out introns (they are the source of mirtrons) and transfer RNAs (tRNAs), are called "non-canonical." They are less abundant in cells and their maturation time is deviant (Havens et al., 2012).
Mature miRNAs program the RISC (RNA-Induced Silencing Complex) to make it inhibit either messenger RNAs (mRNAs) or promoter DNAs (pDNAs) through the formation of mRNA(pDNA):miRNA-RISC complexes (Song et al., 2004). The function of the RISC depends upon which of the proteins in the Argonaute family is incorporated in it (Gagnon and Corey, 2012). In the 3D structure of the mRNA:miRNA-Ago-RISC in Archaea (Song et al., 2004), the Ago protein interacts with the 3′-end of the miRNA.
Changes in mature miRNA abundance and sequence affecting interactions between the miRNAs and their targets were associated with various abnormalities, including neurodegeneration (Barbato et al., 2009) and cancer (Winter and Diederichs, 2011a). Winter and Diederichs (2011b) showed experimentally that the miRNA abundance in Ago2-deficient cells treated by the transcription inhibitor actinomycin D increases when the Ago2 protein is introduced to them ectopically, because the affinity of a particular miRNA for Ago2 protein influences the half-life of this miRNA in cells. Furthermore, Martinez and Gregory (2013) showed that Ago2 expression in mouse embryonic stem cells, originally low in Ago2 and then transfected by a vector containing Ago2, is dependent on miRNA abundance post-transcriptionally. Therefore, that miRNAs and human Ago2 stabilize each other is an experimentally established fact. How can we benefit from this fact?
Although nobody has measured in vivo the affinity of mature miRNAs for the different Argonaute proteins (Azuma-Mukai et al., 2008) or the abundance of these miRNAs in cells (Axtell and Bartel, 2005) under identical experimental conditions simultaneously, we have earlier demonstrated (Ponomarenko et al., 2001) that the patterns and features found in silico in one experiment readily apply to the next, at least within the limits of applicability of the theory that underlies these experiments. Consequently, we have had to work with disembodied experimental data on two taxonomically distant entities (man and Arabidopsis) using original ACTIVITY tools (Ponomarenko et al., 1997). As a result, we have successfully found correlations which fit each other like pieces of a puzzle created in two experiments, one by Winter and Diederichs (2011b) and another by Martinez and Gregory (2013), the mutual complementarity of which was, in fact, the starting point of our work. These correlations allowed us to generalize them into a heuristic hypothesis stating that miRNA abundance in the human brain depends on the abundance of YRHB and RHHK tetranucleotides in these miRNAs. This hypothesis was further confirmed using independent experimental data taken from the Sestan Brain Atlases (Kang et al., 2011).
The results obtained are discussed in terms of the "limiting stage" approximation, the linear-additive approximation, and an ill-posed inverse problem. This allowed us to conclude that in silico estimates like these can reach an accuracy acceptable for practical consideration by cancer and neurodegeneration researchers once two things have become known: the preference of these miRNAs for the proteins in the Argonaute family, and the as yet unknown values of the affinity of any miRNA for two of the four proteins (50%), Ago1 and Ago4, which are absolutely required for a more accurate approximation.

NUCLEOTIDE SEQUENCES
The nucleotide sequences of the mature canonical Arabidopsis miRNAs, {ξ_i}, were taken from a work by Axtell and Bartel (2005), ξ ∈ {a, u, g, c}. Seventeen out of 27 miRNAs were used as the training dataset (Table 1). Because miRNA lengths varied from 20 to 22 nt, our in silico processing was confined to miRNAs of a given length (in Tables 1, 2, these sequences are typed in CAPITALS).
The other 10 miRNAs (Figures 6, 7) were used as an independent experimental dataset (sequences not shown). Twenty-two Arabidopsis miRNAs taken from a work by Lu et al. (2005) were used as independent experimental control datasets (Figure 4C; sequences not shown).
The nucleotide sequences of human mature miRNAs were taken from a work by Azuma-Mukai et al. (2008). Twelve out of 28 mature canonical miRNAs were used as the training dataset (Table 2).

BIOLOGICAL ACTIVITY
The relative values ranging from −0.5 to 7.8 ln for the miRNA abundance in Arabidopsis taken from a work by Axtell and Bartel (2005) are partly presented in Table 1 and fully in Figures 3, 4 (the y-axis).
The relative values ranging from 0 to 4 ln for the miRNA abundance in Arabidopsis obtained using Massively Parallel Signature Sequencing (MPSS) were taken from a work by Lu et al. (2005) and used as an independent experimental control dataset ( Figure 4, the y-axis).
The values ranging from 4.85 to 9.43 ln for the in vivo measured affinity of canonical miRNAs for the human Ago2 and Ago3 proteins were taken from a work by Azuma-Mukai et al. (2008); they are partly presented in Table 2 and fully in Figures 5, 6 (the y-axis). The affinity values for 48 miRNAs that Azuma-Mukai et al. (2008) named the "individual variants" because of their 5′- and/or 3′-terminal differences from canonical mature miRNAs, which these authors associated with (i) alternative maturation or (ii) post-maturation processing, are shown in Figure 7.
The relative values ranging from −9.0 to 0.0 ln for the abundance of 96 human miRNAs in the human embryonic kidney cells HEK293T, some preincubated for 8 h with the transcription inhibitor actinomycin D and others not preincubated, were taken from a work by Bail et al. (2010) and used as an independent experimental control dataset (Figure 8, the y-axis).
The relative values ranging from 0 to 16 rel. un. for the miRNA abundance measured within 95 human brain regions or neocortical areas were taken from the Sestan Brain Atlases (Kang et al., 2011) and used as an independent experimental control dataset (Figure 9, the y-axis).

CORRELATIONS BETWEEN BIOLOGICAL ACTIVITY AND miRNA NUCLEOTIDE SEQUENCES
We have used our original development called ACTIVITY (Ponomarenko et al., 1997), which is a tool intended for the processing of input data on a pre-set biological activity, X({ξ i }) in known miRNA sequences, {ξ i } and searching for correlations in them.
Although ACTIVITY has been described in detail elsewhere (Ponomarenko et al., 1997), we will additionally provide a brief descriptions of its features that were critical to our current study.
First of all, ACTIVITY (Ponomarenko et al., 1997) searches for correlations between the biological activity of a miRNA, X({ξ_i}) (expressed as its level; herein, as the experimentally measured abundance and miRNA/Ago affinity), and the weighted abundance of the tetranucleotides, [z1z2z3z4]_F, in the sequence {ξ_i} of this miRNA, where each z_k ∈ {a, u, g, c, w, r, m, k, y, s, b, v, h, d, n} (IUPAC-IUB, 1971) and 0 ≤ F(i) ≤ 1 is the weight of the tetranucleotide z1z2z3z4 at the i-th position, with which we heuristically assessed its linear additive contribution to the X({ξ_i}) value using the rule "the higher F(i), the greater the contribution" (Figure 1).
FIGURE 1 | Sample weights (the y-axis), F(i), of tetranucleotide z1z2z3z4 at the i-th position (the x-axis) of the sequence {ξ_i} composed of L nt, with which we assessed the linear additive contribution to X({ξ_i}) using the rule "the higher F(i), the greater the contribution."

ACTIVITY (Ponomarenko et al., 1997) has built-in F(i) profiles: 180 U-shaped and 180 S-shaped curves for F(i) values, in which low and high weights have different locations and interval lengths. ACTIVITY works uniformly on each of all the possible variants of weighted tetranucleotide abundance ([z1z2z3z4]_F), their number being 360 × 15^4 = 18,225,000 ≈ 10^7.
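As a rough illustration of the weighted tetranucleotide abundance described above, the following Python sketch sums a position weight for every match of an IUPAC-coded tetranucleotide in an RNA sequence. The summation form, the function names, and the step-shaped `three_prime_weight` profile are assumptions made for illustration only; they are not taken from ACTIVITY, whose 360 built-in profiles are not reproduced here.

```python
# Standard IUPAC nucleotide codes expanded to the RNA bases they allow.
IUPAC = {
    "A": "A", "U": "U", "G": "G", "C": "C",
    "W": "AU", "S": "GC", "R": "AG", "Y": "CU", "M": "AC", "K": "GU",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def matches(window, pattern):
    """Return True if every base of `window` is allowed by the IUPAC `pattern`."""
    return len(window) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(window, pattern)
    )


def weighted_abundance(sequence, pattern, weight):
    """Sum weight(i, n_positions) over all positions i where `pattern` matches.

    This additive form is an assumption consistent with the "linear additive
    contribution" wording; it is not ACTIVITY's published formula.
    """
    seq = sequence.upper().replace("T", "U")
    n_positions = len(seq) - len(pattern) + 1
    return sum(
        weight(i, n_positions)
        for i in range(n_positions)
        if matches(seq[i:i + len(pattern)], pattern)
    )


def uniform_weight(i, n_positions):
    """Hypothetical flat profile: every position counts equally."""
    return 1.0


def three_prime_weight(i, n_positions):
    """Hypothetical step profile: only the 3'-half of the positions counts."""
    return 1.0 if i >= n_positions // 2 else 0.0


# Number of variants quoted in the text: 360 profiles x 15^4 IUPAC tetranucleotides.
N_VARIANTS = 360 * 15 ** 4
```

For example, `weighted_abundance("AACAUG", "YRHB", three_prime_weight)` counts the single YRHB match (CAUG) because it lies in the 3′-half of the sequence.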
Furthermore, each [z1z2z3z4]_F({ξ_i}) value was compared with X({ξ_i}) using bootstrapping (Hayes et al., 1989) on seven subsets of the input training dataset; in this way, we seek to minimize the dependence of search results on the input dataset.
In each of these seven subsets, ACTIVITY (Ponomarenko et al., 1997) checks five types of correlation between X({ξ i }) and [z 1 z 2 z 3 z 4 ] F ({ξ i }): i) linear correlation; ii) Spearman's rank correlation; iii) Kendall's rank correlation; iv) dichotomous correlations tested by χ 2 ; and v) dichotomous correlations tested by the Fisher-Irwin test. Because it is possible to obtain quantitative estimates using linear correlations, such correlations could be useful if it were not for their sensitivity to data heterogeneity. By contrast, dichotomous correlations do not depend on data heterogeneity; however, they provide the least informative estimates above/below any pre-set threshold. Based on the usefulness-to-robustness ratio, rank correlations are between linear and dichotomous correlations. We search the input training dataset for different types of correlation and identify the best trade-offs.
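To make the contrast between the linear and the rank correlations concrete, here is a minimal pure-Python sketch of three of the five coefficients named above (Pearson's linear r, Spearman's rank R, and Kendall's τ). It is not the ACTIVITY implementation; the function names are hypothetical, significance testing is omitted, and Kendall's coefficient is computed in its simple tau-a form, which is an assumption.

```python
import math


def pearson(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)
```

Because the rank coefficients use only the ordering of the values, a single extreme data point shifts them far less than it shifts the linear coefficient.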
The threshold α set at 0.05 was the reference point for υ = 0: when the tests were statistically significant, the utility values were positive, and when the tests were not statistically significant, the utility values were negative. The utility estimate for each [z1z2z3z4]_F in the input training dataset, X({ξ_i}) and {ξ_i}, was equal to the mean of these utility values [Equation (2)]. Finally, in the given input training dataset {{ξ_i}; X({ξ_i})}, ACTIVITY (Ponomarenko et al., 1997) finds the only [z1z2z3z4]_F with the highest mean utility value, provided that this value is greater than zero, or infers that the correlations found with the input training dataset are useless.

VERIFICATION OF THE CORRELATIONS FOUND
Because ACTIVITY finds the only best correlation from among 10^7 variants in any given input training dataset (Ponomarenko et al., 1997), verification is absolutely required. First of all, the Bonferroni test yields p < 10^−20 for a utility value greater than zero (Omelyianchuk et al., 2011). This implies that it is quite unlikely that half of the 77 tests run can be satisfied simultaneously by chance at α < 0.05.
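The order of magnitude of this statement can be checked with a simple binomial tail, sketched below. This is not the Bonferroni calculation cited above: it assumes, for illustration only, that the 77 tests are independent, that each passes by chance with probability 0.05, and that "half" means at least 39 of the 77.

```python
from math import comb


def binomial_tail(n, k_min, p):
    """Probability of at least k_min successes in n independent trials."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Chance that at least 39 of 77 independent tests pass at alpha = 0.05.
p_half = binomial_tail(77, 39, 0.05)
```

Under these assumptions `p_half` falls well below 10^−20, which agrees with the bound quoted in the text.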
In turn, the statistical significance of the correlation found by ACTIVITY (Ponomarenko et al., 1997) with the training dataset is tested for on the control dataset with unprocessed experimental data (Figures 3, 5).
Finally, the statistical significance of the correlations found by ACTIVITY (Ponomarenko et al., 1997) with the given training dataset is tested using independent experimental data (Figures 4, 7-9).

CLUSTER ANALYSIS OF THE miRNAs
In this work, we used standard statistical tools available in the STATISTICA system (Afifi et al., 2003), which has the "Joining (tree clustering)" mode in the "Cluster" section under the "Multivariate/Exploratory" option in the "Statistics" part. Under this mode, we clustered all the RNAs being studied using all 42 = 7 × 6 possible combinations of seven Linkage rules: "Single linkage," "Complete linkage," "Unweighted pair-group average," "Weighted pair-group average," "Unweighted pair-group centroid," "Weighted pair-group centroid," and "Ward's method," and each from among six "Distance measures": "Squared Euclidian distance," "Euclidian distance," "City-block (Manhattan) distance," "Chebychev distance metric," "Power," "Percent disagreement," and "1-Pearson r." The color-coded results obtained from the most widely used (predefined) combination of the "Single linkage" rule and the "Euclidian distance" metric are shown in Figure 9. The results obtained from each of the other 41 combinations are not shown, because they differ only by minor deviations (less than 1% of RNAs) in the vicinity of the intercluster boundary caused by heterogeneity in experimental data.
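For readers without access to STATISTICA, the "Single linkage" rule with the Euclidean distance can be sketched in a few lines of Python. This is a naive illustrative implementation with hypothetical names, not the STATISTICA algorithm; it merges the two closest clusters until a requested number of clusters remains.

```python
import math


def single_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    closest members are nearest to each other (Euclidean distance)."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = cluster_distance(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(cluster) for cluster in clusters]
```

Each point would here be a pair (in silico estimate, in vivo abundance); the returned lists hold the indices of the points assigned to each cluster.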

miRNA ABUNDANCE IN ARABIDOPSIS
We ran ACTIVITY (Ponomarenko et al., 1997) on the experimental data (Axtell and Bartel, 2005) on the mean abundance of mature canonical ubiquitous miRNAs in Arabidopsis (Table 1, ln[miRNA]). We composed a training dataset (Table 1) for ACTIVITY (Ponomarenko et al., 1997) consisting of all the variants that had the lowest and highest values for miRNA abundance in seven organs of this plant (inflorescence, stem, silique, seedling, root, cauline and rosette leaves) and the occurrence of nucleotides A, U, G, C, W, R, and K in miRNAs (IUPAC-IUB, 1971). The resulting 17 out of 27 miRNAs in the training dataset (Table 1) represent the ranges of values of miRNA properties rather than data heterogeneity (Azuma-Mukai et al., 2008). The other 10 miRNAs were used as an independent experimental control dataset (Figure 3).

FIGURE 3 | Control test of the patterns (the x-axis) found by ACTIVITY (Ponomarenko et al., 1997) in the training dataset (Table 1) using independent experimental data (the y-axis) taken from the same data source (Axtell and Bartel, 2005). Two linear correlations in Arabidopsis: one (A) between miRNA abundance in the plant and [WRHB]_F3 abundance in these miRNAs and one (B) between miRNA abundance and [DRYD]_F4 abundance. Both are statistically significant in the control dataset of 11 miRNAs (Axtell and Bartel, 2005). Dashed curves depict 95% confidence intervals for linear regression (solid lines) built using STATISTICA (Afifi et al., 2003).
The highest estimated utility value, 0.48, was assigned [Equation (2)] to the correlation between the abundance of miRNAs (ln[miRNA]) in Arabidopsis and the abundance of WRHW tetranucleotides ([WRHW]_F1) with its highest weight, F1(i), in the center of the miRNA (Figure 1, short-dashed line). In the control dataset, the correlation between [WRHW]_F1 and ln[miRNA] was statistically significant (Figure 3A). Two more utility values were equal to 0.46 at the same WRHW tetranucleotide with narrower peaks (Ponomarenko et al., 2008). For the S-shaped weights, the highest estimated utility value, 0.47, was assigned [Equation (2)] to the abundance of the DRYD tetranucleotide ([DRYD]_F2) with its highest weight, F2(i), at the 3′-end of the miRNA (Figure 1, dotted line). In the control dataset, the correlation between [DRYD]_F2 and ln[miRNA] was statistically significant (Figure 3B: r = 0.66, α < 0.05). No other utility values greater than zero were found in the training set (Ponomarenko et al., 2008). The next nine z1z2z3z4 tetranucleotides that had …

Because [WRHW]_F1 and [DRYD]_F2 were independent (r = 0.39, α > 0.25), we skipped the optimization procedure and combined them into Equation (3). Estimates made with Equation (3) were statistically significantly (Figure 4, Table 3: r = 0.59, α < 0.0025) correlated with data reported by Axtell and Bartel (2005). As can be seen from Figure 4, the in vivo measured abundances of most miRNAs with calculated abundances ranging from 3.0 to 5.0 ln range from −0.5 to 5.5 ln (almost the full range of the graph). That is why the high r-value of the regression is probably due to the contribution of several outliers such as the most abundant miRNAs. To see whether this is so, we additionally estimated Spearman's and Kendall's rank correlation coefficients (Table 3), which consider the ranks of [miRNA] values rather than their true values, for in silico [miRNA] values (the x-axis) and in vivo [miRNA] values (the y-axis).

FIGURE 4 | Estimates by Equation (3) (the x-axis), based on the training dataset (Table 1), vs. independent experimental data (the y-axis) taken from various data sources. Twenty-seven correlations in Arabidopsis (Table 3) between the miRNA abundance estimated by Equation (3), the x-axis, and those measured experimentally, the y-axis, namely: the mean abundances of 28 canonical miRNAs used above, the organ-specific abundance of these miRNAs in seedlings, siliques, inflorescences, stems, cauline leaves, rosette leaves, and roots as independently measured in vivo (Axtell and Bartel, 2005), and, finally, the abundances of 22 miRNAs obtained using Massively Parallel Signature Sequencing (MPSS) by Lu et al. (2005) as independent experimental control datasets. Dashed curves and solid lines as in the legend to Figure 3.

Frontiers in Genetics | Non-Coding RNA | July 2013 | Volume 4 | Article 122

Also, Figure 4 shows seven statistically significant linear correlations between the estimates obtained using Equation (3) and the abundances of miRNA in seven organs of Arabidopsis (Axtell and Bartel, 2005), namely: inflorescences, stems, siliques, seedlings, roots, cauline leaves and rosette leaves. The dashed curves in this figure depict the boundaries of the 95% confidence interval for the mean miRNA abundance in Arabidopsis estimated by Equation (3). As can be seen, despite the statistical significance of the correlations between the value defined by Equation (3) and miRNA abundance, a large part of the data points in Figure 4 lie outside the dashed lines. This implies that Equation (3) is an adequate source of rough estimates of miRNA abundances in Arabidopsis organs; however, there is a high variability of their organ-specific values (the coefficient of variation, C_V = σ/M_0 × 100%, that is, the ratio between the standard deviation and the mean expressed as a percentage, ranging from 7 to 72%, the mean being 31 ± 19%), which was ignored by Equation (3) due to lack of data. Finally, the three above-mentioned correlations between in silico and in vivo [miRNA] values were statistically significant in the independent experimental dataset (Lu et al., 2005) [Table 3: r = 0.56 (α < 0.01), R = 0.46 (α < 0.05), τ = 0.30 (α < 0.05)]. Therefore, the statistical significance of 27 independent tests (Figure 4 and Table 3) is an argument for rather than against a dependence of miRNA abundance on tetranucleotide abundance in these miRNAs.
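The coefficient of variation used above, C_V = σ/M_0 × 100%, can be computed as in the following sketch. The function name is hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, because the text does not state which estimator was applied.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (population form)."""
    return pstdev(values) / mean(values) * 100.0
```

Applied to the abundances of one miRNA across the seven organs, this gives the organ-to-organ variability of that miRNA as a percentage of its mean abundance.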
However, the molecular mechanism that Equation (3) is consistent with remains unclear. Admittedly, Hwang et al. (2007) established experimentally that the sequence of a mature miRNA is a factor for the efficiency of its export from the nucleus to the cytoplasm, and Gantier et al. (2011) explored the effects of Dicer1 on the miRNA half-life in a context-dependent manner. If we were to consider Equation (3) together with the results of the experiments performed by Winter and Diederichs (2011b) and by Martinez and Gregory (2013), which suggest that miRNAs and Ago2 are likely to stabilize each other, it could be suggested that Equation (3) reflects miRNA/Ago2 affinity.

miRNA/Ago AFFINITY IN MAN
We ran ACTIVITY (Ponomarenko et al., 1997) simultaneously on two libraries (miRNA affinity for Ago2 and for Ago3; Azuma-Mukai et al., 2008). We composed a training dataset (Table 2) for ACTIVITY (Ponomarenko et al., 1997) consisting of all the variants that had the lowest and highest values for the mean affinity of miRNAs for Ago2 and Ago3, the preference of miRNAs for Ago2, and the abundance of nucleotides A, U, G, C, W, R, and K in miRNAs (IUPAC-IUB, 1971). The resulting 12 miRNAs in the training dataset (Table 2) represent the ranges of values of miRNA properties rather than data heterogeneity (Azuma-Mukai et al., 2008).
The highest estimated utility value, 0.36, was assigned [Equation (2)] by ACTIVITY (Ponomarenko et al., 1997) to the correlation between the preference of miRNAs for Ago2 and the abundance, [RHHK]_F3, of the RHHK tetranucleotide (IUPAC-IUB, 1971) with its highest weight, F3(i), in the center of the miRNA (Figure 1, broken line). This corresponds to the difference that Ago2 and Ago3 have in cleaving the mRNA in the center of its complementarity with the miRNA-Ago2(3)-RISC complex (Song et al., 2004). The [RHHK]_F3 values for all the 12 miRNAs of the training dataset are presented in Table 2. The correlation between [RHHK]_F3 and this preference was statistically significant (r = 0.75, α < 0.005), and so was that in the control dataset (Figure 5A: r = 0.51, α < 0.05). Another utility value, 0.34, was found at the same tetranucleotide, RHHK, with a narrower peak, F(i), in the center of the miRNA. No other higher-than-zero values were found in the training dataset (Omelyianchuk et al., 2011). The next eight tetranucleotides that had the highest utility values were WRHH (MAX = −0.…) …

For the mean affinity of miRNAs for Ago2 and Ago3, the highest estimated utility value, 0.36, was assigned [Equation (2)] to the abundance, [YRHB]_F4, of the YRHB tetranucleotide (IUPAC-IUB, 1971) with its highest weight, F4(i), at the 3′-end of the miRNA (Figure 1, solid line). This corresponds to the contact of the miRNA and the Ago protein in the 3D structure of the mRNA:miRNA-Ago-RISC complex (Song et al., 2004). The correlation between [YRHB]_F4 and the mean affinity was statistically significant in the control dataset (Figure 5B).

Figure 6 shows independent estimates made on the basis of these two correlations for the affinity of miRNAs for Ago2 ([miRNA/Ago2] = mean affinity + preference) and Ago3 ([miRNA/Ago3] = mean affinity − preference) derived without optimization [Equation (5)]. They are statistically significantly [Figure 6: (A) r = 0.66 and (B) r = 0.66, α < 0.00025] correlated with all the experimental data (Azuma-Mukai et al., 2008).

FIGURE 6 | Estimates by Equation (5) (the x-axis), based on the training dataset (Table 2), vs. independent experimental data (the y-axis) on the canonical miRNAs taken from the same data source (Azuma-Mukai et al., 2008). miRNA/Ago affinity as measured in vivo (Azuma-Mukai et al., 2008) and as estimated in silico, both expressed in logarithms, are statistically significantly correlated for Ago2 (A) and Ago3 (B). Dashed curves and solid lines as in the legend to Figure 3.

Figure 7 shows independent in silico estimates obtained using Equation (5) for 48 miRNAs named the "individual variants" by Azuma-Mukai et al. (2008) because of their 5′- and/or 3′-terminal differences from canonical mature miRNAs, which these authors associated with (i) alternative maturation or (ii) post-maturation processing. The estimated value was statistically significant (r = 0.49, α < 0.001) for the difference between the affinity of the miRNAs for Ago2 and that for Ago3:

$$[\text{miRNA/Ago2}] - [\text{miRNA/Ago3}] = 1.04\,[\text{RHHK}]_{F3}(\{\xi_j\}) - 1.14. \qquad (6)$$

This is consistent with the commonly accepted view that an individual miRNA variant forms complexes with Ago2 and Ago3 depending on its affinity for each of them, because the specific interactions that normally arise from evolutionary selection for affinity for these proteins are absent.

FIGURE 7 | Control test for the difference between the affinity of miRNAs for Ago2 and that for Ago3 [Equation (6), the x-axis] estimated using final Equation (5) and independent experimental data (the y-axis) on the miRNA individual variants taken from the same data source (Azuma-Mukai et al., 2008). The differential affinities of the Ago2 and Ago3 proteins for the 48 miRNA "individual variants", as independently measured in vivo (Azuma-Mukai et al., 2008) and as estimated in silico [Equation (6)], both expressed in natural logarithms, are statistically significantly correlated (Table 4).
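Equation (6) is a one-variable linear rule, and the relation between the two affinities and their mean and preference is a simple sum and difference. The sketch below encodes both; the function names are hypothetical, the coefficients 1.04 and 1.14 are those of Equation (6), and the input value would come from a weighted RHHK abundance computed with the central profile F3, which is not reproduced here.

```python
def ago2_minus_ago3(rhhk_f3):
    """Equation (6): estimated ln-affinity difference between Ago2 and Ago3."""
    return 1.04 * rhhk_f3 - 1.14


def split_affinities(mean_affinity, preference):
    """[miRNA/Ago2] = mean + preference; [miRNA/Ago3] = mean - preference."""
    return mean_affinity + preference, mean_affinity - preference
```

With these definitions the Ago2 minus Ago3 difference equals twice the preference, and Equation (6) predicts a preference for Ago2 only when the weighted RHHK abundance exceeds 1.14/1.04, that is, roughly 1.1.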

AN ILL-POSED INVERSE PROBLEM SOLUTION
The values of the abundances of 96 mature miRNAs in an extract from human embryonic kidney cells, HEK293T, under normal conditions (A) and following preincubation for 8 h with the transcription inhibitor actinomycin D (Bail et al., 2010) (B) are on the y-axis in Figure 8. Let us see whether these values can be predicted using Equation (5) with miRNA nucleotide sequences known from the miRBase database (Kozomara and Griffiths-Jones, 2010).
On the one hand, under the normal experimental conditions in (Bail et al., 2010), the total amount of a certain miRNA was measured, so that the experimental value [miRNA] should be described by the linear-additive approximation as follows:

$$[\text{miRNA}]^{(\#)}(\{\xi_j\}) = \sum_{k=1}^{4} [\text{Ago}k]\,[\text{miRNA/Ago}k]^{(\#)}(\{\xi_j\}) + \varepsilon, \qquad (7)$$

where [Agok], k = 1, …, 4, represent occupancies of the corresponding Ago1, Ago2, Ago3, and Ago4 proteins given an equilibrium of the miRNA and Ago molecule turnover in normal (#) HEK293T cells; ε is the prediction error, which inevitably creeps in due to insufficient experimental data.

FIGURE 8 | Verification of ill-posed inverse problem solutions [Equations (10) and (11), the x-axis] using the final Equation (5) and independent experimental data (the y-axis) taken from another data source (Bail et al., 2010). The abundance of 96 mature miRNAs in an extract from the human embryonic kidney cell line HEK293T: (A) norm; (B) preincubation for 8 h with the transcription inhibitor actinomycin D (Bail et al., 2010). Independent experimental control data (y-axes) and in silico estimates expressed on the same measurement scale [Equation (10) and Equation (11), respectively; x-axes] are statistically significantly correlated (Table 4). Dashed curves and solid lines as in the legend to Figure 3.
On the other hand, only two variables, [miRNA/Ago2]^(#) and [miRNA/Ago3]^(#), out of 8 can be estimated by Equation (5). In addition, Ago1 and Ago2, but not Ago3, are the major Ago proteins in humans, and the expression of Ago3 and Ago4 is low (Valdmanis et al., 2012). Therefore, it seems quite difficult, or probably impossible, to estimate the total miRNA amount from [RHHK]_F3({ξ_j}) and [YRHB]_F4({ξ_j}), since too many ambiguities exist and too small a contribution of [miRNA/Ago3] to the total amount of the miRNA is logically expected for Equation (7). In this sense, Equation (7) is an "ill-posed inverse problem." We have recently proposed a solution to an ill-posed inverse problem (Mironova et al., 2013) using STATISTICA (Afifi et al., 2003) and considering the existing additional information given in the frames. In our case, this additional information is represented by two results of the experiments performed by Winter and Diederichs (2011b) and by Martinez and Gregory (2013), suggesting that miRNAs and Ago2 are likely to stabilize each other, and by the fact that Ago2 is one of the two major Ago proteins in humans (Valdmanis et al., 2012). This representation substantiates the use of STATISTICA (Afifi et al., 2003) as a means of assessing the statistical significance of the linear-additive contribution of [miRNA/Ago2]^(#) estimates obtained using Equation (5) in the linear-additive approximation by Equation (7) for experimental values [miRNA]^(#) [see Equation (9)], where: 1/3 is the heuristic coefficient that takes into account the normalization of experimental measurements (Azuma-Mukai et al., 2008) for Ago2 only in miRNA/Ago2 complexes within RISC without reference to Ago2 involvement in the regulation of transcription initiation or miRNA biogenesis; −[miRNA/Ago3]^(&)({ξ_j}) is a heuristic correction, which takes into account a negative effect of the competition between Ago2 and Ago3 for miRNA binding and reduces [miRNA/Ago2]^(&)→(#)({ξ_j}) in the measurements taken without Ago3 (&).
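Why the linear-additive model of Equation (7) is ill-posed can be seen from a toy calculation: when the Ago1 and Ago4 terms are unknown, different values of them reproduce the same measured total. In the sketch below every number is invented purely for illustration, and the function name is hypothetical.

```python
AGO_PROTEINS = ("Ago1", "Ago2", "Ago3", "Ago4")


def total_mirna(occupancy, affinity, error=0.0):
    """Linear-additive total: sum of occupancy x affinity over the four Ago proteins."""
    return sum(occupancy[name] * affinity[name] for name in AGO_PROTEINS) + error


# Invented occupancies and affinities; only the Ago2 and Ago3 affinities
# would be available from Equation (5).
occupancy = {"Ago1": 0.4, "Ago2": 0.4, "Ago3": 0.1, "Ago4": 0.1}
affinity_a = {"Ago1": 5.0, "Ago2": 8.0, "Ago3": 6.0, "Ago4": 7.0}
affinity_b = {"Ago1": 6.0, "Ago2": 8.0, "Ago3": 6.0, "Ago4": 3.0}

total_a = total_mirna(occupancy, affinity_a)
total_b = total_mirna(occupancy, affinity_b)
```

Both parameter sets share the same Ago2 and Ago3 affinities and give the same total, so the total alone cannot identify the unknown Ago1 and Ago4 terms; additional information is needed to constrain the solution.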
Nevertheless, there is an absolutely required additional stage in addressing an ill-posed inverse problem (Mironova et al., 2013), namely, verification using independent experimental data. To include this stage, we additionally reproduced all the calculations for experimental data under conditions that included ($) preincubation of HEK293T cells for 8 h with the transcription inhibitor actinomycin D (Bail et al., 2010). Because actinomycin D inhibits transcription elongation, the main difference between these conditions ($) and the normal conditions (#) is that no primary pri-miRNA transcripts are present and, consequently, Ago2-mediated miRNA biogenesis does not proceed. That is why we used 1/2 instead of 1/3 in Equation (9). After all intermediate calculations, the final Equations (10) and (11), derived from Equation (5), were obtained for the normal conditions (#) and for the preincubation conditions ($), respectively. The estimates obtained using Equation (11) were statistically significantly correlated with the measured [miRNA] values in the HEK293T cells preincubated for 8 h with the transcription inhibitor actinomycin D (Bail et al., 2010), as shown in Figure 8B and Table 4 (r = 0.46, α < 0.000005; R = 0.43, α < 0.000025; and τ = 0.30, α < 0.000025). Nevertheless, despite the statistical significance of the correlations between the values defined by Equations (10) and (11) and miRNA abundance, a large part of the data points in Figure 8 lie outside the dashed lines. This implies that Equations (10) and (11) are adequate sources of rough estimates of miRNA abundances in human embryonic kidney cells, HEK293T, under proper experimental conditions consistent with Ago2 protein affinity for miRNAs; however, the contribution of the Ago1 protein, which is another major Ago protein in man (Valdmanis et al., 2012), was ignored by Equations (10) and (11) due to lack of experimental data on [miRNA/Ago1].
Collectively, all these results imply that Equation (5), found in silico in one experiment (Azuma-Mukai et al., 2008), readily applies to the next (Bail et al., 2010), at least within the limits of applicability of the theory that underlies these experiments. We had previously demonstrated this possibility (Ponomarenko et al., 2001), and its value is that it allows previously found patterns to be used for planning conditions of future experiments (for example, see Savinkova et al., 2013).

OUR HYPOTHESIS ON miRNA ABUNDANCE IN THE HUMAN BRAIN
Thus, all the different types of correlation shown in Figures 3-8 fit each other like pieces of a puzzle, which allowed us to heuristically generalize all of them and state that the miRNA abundance in the human brain regions or neocortical areas may be roughly described by a function of YRHB and RHHK abundances in these miRNAs, which may be of practical interest to cancer and neurodegeneration researchers.
Let us check this hypothesis. Figure 9 presents the results of a cluster analysis performed using STATISTICA (Afifi et al., 2003) on the data on miRNA abundance in the human brain taken from the Sestan Brain Atlases (Kang et al., 2011), the y-axis, vs. in silico estimates obtained using Equation (5) within the framework of the roughest approximation possible, the so-called "limiting stage" approximation (the x-axis) [Equation (12)].

FIGURE 9 | In silico estimates [Equation (12), the x-axis] obtained using the final Equation (5) vs. independent experimental data (the y-axis) taken from the Sestan Brain Atlases (Kang et al., 2011). Results of a cluster analysis performed using STATISTICA (Afifi et al., 2003) on the miRNA abundance in the human brain taken from the Sestan Brain Atlases (Kang et al., 2011) vs. in silico estimates [Equation (12)] within the framework of the "limiting stage" approximation. The major cluster (•) includes 294 miRNAs (92%), which have the lowest mean miRNA abundance in vivo, [miRNA] = 1.2 ± 1.0, a high mean [miRNA/Ago] affinity estimated in silico [Equation (12)], 9.1 ± 12.7, and the lowest statistically significant correlation coefficient, r = 0.14 (α < 0.025). The minor cluster (+) includes 24 miRNAs (8%) with the highest mean miRNA abundance in vivo, 2.7 ± 0.7, the lowest mean estimate of [miRNA/Ago] affinity in silico [Equation (12)], 3.5 ± 2.8, and the highest correlation coefficient, r = 0.66 (α < 0.001). Dashed curves and solid lines as in the legend to Figure 3.

AN INDEPENDENT CONTROL TEST
The "limiting stage" approximation was applied due to lack of data on the preference of any of these miRNAs for any of the Ago1, Ago2, Ago3, or Ago4 proteins in the Argonaute family (Gagnon and Corey, 2012) and also due to lack of data on the affinity of any miRNA for two of these four proteins (50%), Ago1 and Ago4. Moreover, the mean abundance of miRNAs in the human brain can be influenced not only by cells specific for the central nervous system but also by many more tissue-specific cells, such as neurons, glia (microglia, oligodendrocytes, astrocytes, etc.), meninges (connective tissues covering the brain and containing a large number of blood vessels), cells of the choroid plexus (capillaries, simple cuboidal epithelium, ependymal cells), and others. This diversity should increase the variance of [miRNA] values even more than in the case of the organ-specific abundance of miRNAs in Arabidopsis (Figure 4). Indeed, while the CV values for miRNA abundance in Arabidopsis inflorescences, stems, siliques, seedlings, roots, cauline and rosette leaves ranged from 7 to 72% (the mean being 31 ± 19%), the CV values for miRNA abundance in 95 human brain regions or neocortical areas ranged from 9 to 281% (the mean being 73 ± 39%), possibly due to very high levels of expression of unique miRNAs in a limited number of these regions or areas.
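The CV values compared above are coefficients of variation, that is, the standard deviation expressed as a percentage of the mean. A minimal sketch of this calculation for one miRNA across several regions is given below. The abundance values are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation: sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical abundances of one miRNA in three brain regions (illustration only).
abundances = [1.0, 2.0, 3.0]
print(f"CV = {cv_percent(abundances):.0f}%")
```

A miRNA expressed very highly in only a few regions has a standard deviation larger than its mean, which is how CV values above 100% arise.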
First of all, the major cluster (•) includes 294 miRNAs (92%), which have a low mean miRNA abundance in vivo, [miRNA] = 1.2 ± 1.0, and a high mean [miRNA/Ago] affinity estimated in silico [Equation (12)], 9.1 ± 12.7. This cluster comprises miRNAs that have no preference for binding to any of the four proteins in the Argonaute family. This result is consistent with the statement used in the derivation of Equation (5) and made by Azuma-Mukai et al. (2008): most human miRNAs have no preference for binding to any particular Ago protein. We were surprised to see that even these roughest estimates showed a statistically significant linear correlation (r = 0.14, α < 0.025) with in vivo measurements.
Finally, the minor cluster (+) includes 24 miRNAs (8%) with a high mean miRNA abundance in vivo, 2.7 ± 0.7, and a low mean estimate of [miRNA/Ago] affinity in silico [Equation (12)], 3.5 ± 2.8. This cluster contains miRNAs, each of which has a preference for binding to one particular Ago protein. This result is consistent with the conclusions made by Azuma-Mukai et al. (2008): a few miRNAs in man have a preference for binding to a particular Ago protein. As can be seen, these roughest estimates are, again, statistically significantly (r = 0.66, α < 0.001) correlated with in vivo values. Importantly, a higher r-value for specific than for non-specific miRNA/Ago affinity (0.66 > 0.14) is in agreement with the most common view of the interactions between molecules. Nevertheless, despite the statistical significance of the correlations between the value defined by Equation (12) and miRNA abundance in the human brain, a large proportion of the data points in Figure 9 lie outside the dashed lines. Therefore, Equation (12) is an adequate source of only the roughest estimates of miRNA abundances in the human brain; there is a wealth of relevant information on the Ago1 and Ago4 proteins (Figure 8) and on the tissue-specific patterns of miRNA and Ago gene expression (Figure 4) that Equation (12) ignores due to lack of experimental data.
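As a rough plausibility check on the two significance levels reported for the clusters, the two-sided p-value of a Pearson correlation coefficient can be approximated from r and the sample size alone using the Fisher z-transformation. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the Fisher approximation stands in for whatever test was actually run in STATISTICA, which the text does not specify.

```python
import math

def fisher_z_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0 via the Fisher z-transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)      # approximately standard normal under H0
    return math.erfc(abs(z) / math.sqrt(2))   # two-sided tail probability of N(0, 1)

p_major = fisher_z_p_value(0.14, 294)  # major cluster: r = 0.14, 294 miRNAs
p_minor = fisher_z_p_value(0.66, 24)   # minor cluster: r = 0.66, 24 miRNAs
print(f"major cluster: p = {p_major:.4f}; minor cluster: p = {p_minor:.5f}")
```

Under this approximation both p-values fall below the thresholds quoted in the text (0.025 and 0.001, respectively). The comparison also shows why a weak correlation can still be significant: the major cluster has a small r but a large sample.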

CONCLUDING REMARKS
We have now established that miRNA abundances depend on taxon-specific tetranucleotides in miRNAs.
First of all, specific tetranucleotides in a given miRNA seem to be responsible for the selectivity of miRNA binding to the proper Ago protein, which determines the biological function of the RISC containing this miRNA/Ago complex: (i) the RISC interacts with promoter DNAs or messenger RNAs (mRNAs) as it searches them for a complementary target of these miRNAs; and (ii) the RISC binds to or cleaves this target within the mRNAs (Gagnon and Corey, 2012).
Based on these facts, we have for the first time obtained quantitative in silico estimates of miRNA abundances both in the human embryonic kidney cells HEK293T, by roughly solving an ill-posed inverse problem, and in the human brain regions or neocortical areas. These estimates are statistically significantly correlated with in vivo measurements of these values from independent experiments, taken from the work by Bail et al. (2010) and from the Sestan Brain Atlases (Kang et al., 2011), respectively. These two correlations are consistent with the results of two experiments, one performed by Winter and Diederichs (2011b) and the other by Martinez and Gregory (2013), which demonstrated that the affinity of miRNAs for Ago proteins influences the abundance of both miRNAs and Ago proteins through their mutual co-stabilization in cells.
In summary, we have found evidence that in silico estimates like these can reach a level of accuracy acceptable for practical consideration by cancer and neurodegeneration researchers once two kinds of data become available: the preference of these miRNAs for the proteins in the Argonaute family and the as yet unknown affinities of miRNAs for two of the four proteins (50%), Ago1 and Ago4. These data are absolutely required for a more accurate approximation. In any case, because the abundance estimates [Equations (5)-(12)] for most miRNAs were more statistically significant in a particular human cell line (Figure 8) than in the human brain as a whole (Figure 9), the more specifically a target for estimation is defined (the entire human organism, an organ, a part, a tissue, a cell type, or a cell line), the more suitable these estimates are for practical use.