<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Appl. Math. Stat.</journal-id>
<journal-title>Frontiers in Applied Mathematics and Statistics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Appl. Math. Stat.</abbrev-journal-title>
<issn pub-type="epub">2297-4687</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fams.2018.00002</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Applied Mathematics and Statistics</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Finite Sample Corrections for Parameters Estimation and Significance Testing</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Teh</surname> <given-names>Boon Kin</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/475594/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Tay</surname> <given-names>Darrell JiaJie</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/515621/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Li</surname> <given-names>Sai Ping</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/162375/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Cheong</surname> <given-names>Siew Ann</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/518114/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Division of Physics and Applied Physics, School of Physical and Mathematical Sciences, Nanyang Technological University</institution>, <addr-line>Singapore</addr-line>, <country>Singapore</country></aff>
<aff id="aff2"><sup>2</sup><institution>Complexity Institute, Nanyang Technological University</institution>, <addr-line>Singapore</addr-line>, <country>Singapore</country></aff>
<aff id="aff3"><sup>3</sup><institution>Institute of Physics, Academia Sinica</institution>, <addr-line>Taipei</addr-line>, <country>Taiwan</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Dabao Zhang, Purdue University, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Yanzhu Lin, National Institutes of Health (NIH), United States; Jie Yang, University of Illinois at Chicago, United States; Qin Shao, University of Toledo, United States</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Boon Kin Teh <email>boonkinteh&#x00040;gmail.com</email></p></fn>
<fn fn-type="corresp" id="fn002"><p>Siew Ann Cheong <email>cheongsa&#x00040;ntu.edu.sg</email></p></fn>
<fn fn-type="other" id="fn003"><p>This article was submitted to Mathematics of Computation and Data Science, a section of the journal Frontiers in Applied Mathematics and Statistics</p></fn></author-notes>
<pub-date pub-type="epub">
<day>30</day>
<month>01</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="collection">
<year>2018</year>
</pub-date>
<volume>4</volume>
<elocation-id>2</elocation-id>
<history>
<date date-type="received">
<day>09</day>
<month>09</month>
<year>2017</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>01</month>
<year>2018</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2018 Teh, Tay, Li and Cheong.</copyright-statement>
<copyright-year>2018</copyright-year>
<copyright-holder>Teh, Tay, Li and Cheong</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>An increasingly important problem in the era of Big Data is fitting data to distributions. However, many practitioners stop at visually inspecting the fits or use the coefficient of determination as a measure of the goodness of fit. In general, goodness-of-fit measures do not allow us to tell which of several distributions fit the data best. Also, the likelihood of drawing the data from a distribution can be low even when the fit is good. To overcome these limitations, Clauset et al. advocated a three-step procedure for fitting any distribution: (i) estimate the parameter(s) accurately, (ii) choose and calculate an appropriate goodness of fit, and (iii) test its significance to determine how likely this goodness of fit is to appear in samples of the distribution. When we perform this significance testing on exponential distributions, we often obtain low significance values despite the fits being visually good. This led to our realization that most fitting methods do not account for effects due to the finite number of elements and the finite largest element. The former produces sample size dependence in the goodness of fits and the latter introduces a bias in the estimated parameter and the goodness of fit. We propose modifications to account for both and show that these corrections improve the significance of the fits of both real and simulated data. In addition, we used simulations and analytical approximations to verify that the convergence rate of the estimated parameters toward their true values depends on how fast the largest element converges to infinity, and provide fast inversion formulas to obtain <italic>p</italic>-values directly from the adjusted test statistics, in place of doing more Monte Carlo simulations.</p></abstract>
<kwd-group>
<kwd>significance testing</kwd>
<kwd>finite sample effects</kwd>
<kwd>curve fitting</kwd>
<kwd>maximum likelihood</kwd>
<kwd><italic>p</italic>-test</kwd>
</kwd-group>
<counts>
<fig-count count="7"/>
<table-count count="0"/>
<equation-count count="24"/>
<ref-count count="42"/>
<page-count count="10"/>
<word-count count="6065"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>The current era of Big Data has ushered in a new way to look at Science&#x02014;and that is letting the data speak for itself. Because of this, we are now much more concerned about empirical distributions than we were in the past, and about checking, in statistically rigorous ways, what these empirical distributions could be. In the past, many tests on empirical data were performed against the univariate normal distribution [<xref ref-type="bibr" rid="B1">1</xref>]. Some of these tests focus on the goodness of fit of higher-order moments [<xref ref-type="bibr" rid="B2">2</xref>&#x02013;<xref ref-type="bibr" rid="B4">4</xref>], while others compare the test statistics against an Empirical Distribution Function (EDF) [<xref ref-type="bibr" rid="B5">5</xref>&#x02013;<xref ref-type="bibr" rid="B8">8</xref>]. In 2011, Nornadiah and Yap performed a systematic comparison of the Anderson-Darling (AD), Lilliefors, Kolmogorov-Smirnov (KS), and Shapiro-Wilk (SW) tests using numerical simulations, and concluded that, for a given significance, the SW test is the best, followed closely by the AD test [<xref ref-type="bibr" rid="B9">9</xref>].</p>
<p>Among these tests, the KS and Lilliefors tests can also be applied to non-normal distributions. In fact, many real-world data do not follow normal distributions. For instance, many social systems are known to have power-law distributions [<xref ref-type="bibr" rid="B10">10</xref>]. These include the distributions of financial returns [<xref ref-type="bibr" rid="B11">11</xref>&#x02013;<xref ref-type="bibr" rid="B14">14</xref>], word counts [<xref ref-type="bibr" rid="B15">15</xref>, <xref ref-type="bibr" rid="B16">16</xref>], city sizes [<xref ref-type="bibr" rid="B17">17</xref>, <xref ref-type="bibr" rid="B18">18</xref>], home prices [<xref ref-type="bibr" rid="B19">19</xref>&#x02013;<xref ref-type="bibr" rid="B21">21</xref>], and wealth and income [<xref ref-type="bibr" rid="B22">22</xref>, <xref ref-type="bibr" rid="B23">23</xref>]. One simple but naive way to detect a power law is to plot the data on a log-log scale, fit a straight line, and determine the goodness of fit. However, this simple method has three major flaws: (i) many distributions (e.g., exponential, gamma, log-normal) can also look straight in a log-log plot, especially if the range of data is small; (ii) the goodness of fit only quantifies how good the fit looks, but does not tell us how plausible the fit is; and (iii) if our data look straight in both log-log and semi-log plots, the goodness-of-fit values obtained from the two cannot be directly compared, since they were obtained from plots of different scales. Clauset, Shalizi, and Newman (CSN) address precisely these three points in their 2009 paper [<xref ref-type="bibr" rid="B24">24</xref>], and the test they proposed is now considered by many to be the gold standard in curve fitting. We shall describe the main idea of the CSN technique in greater mathematical detail in section 2.</p>
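Flaw (i) above is easy to demonstrate numerically. The following sketch (ours, not from the CSN paper) fits a least-squares straight line to the log-log empirical CCDF of a log-normal sample; the coefficient of determination typically comes out high even though the sample is not power-law distributed:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.lognormal(mean=0.0, sigma=1.0, size=5000))

# Empirical CCDF P(X > x_i) = 1 - i/N; drop the last point, whose CCDF is 0
ccdf = 1.0 - np.arange(1, x.size + 1) / x.size
lx, ly = np.log(x[:-1]), np.log(ccdf[:-1])

# Least-squares straight line in log-log space, and its R^2
slope, intercept = np.polyfit(lx, ly, 1)
resid = ly - (slope * lx + intercept)
r2 = 1.0 - np.sum(resid**2) / np.sum((ly - ly.mean()) ** 2)
# r2 is high even though the sample is log-normal, not power law
```

A high <italic>R</italic><sup>2</sup> here says only that a line tracks the curve well over this range; it says nothing about whether a power law is a plausible generating distribution.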
<p>Since the CSN test can be applied across distributions, we also use it to fit data that appear exponentially distributed. On many occasions, we discovered that the exponential fits look good visually, but have significance values (<italic>p</italic>-values) much lower than those of fits of other data to power laws, even though the latter look visibly poorer. In fact, in the CSN paper, where empirical data are tested against a power law (PL), log-normal, exponential (EXP), stretched exponential, and power law with cut-off, the exponential distribution consistently performs poorer than the other distributions. This was also the case when Brzezinski tested the upper-tail wealth data for China, Russia, the US, and the world using the CSN method [<xref ref-type="bibr" rid="B25">25</xref>]. In these papers, the data might truly be non-exponentially distributed, so it is not surprising that the exponential fits fail. However, the low <italic>p</italic>-values for the visually convincing exponential fits to our data suggest that something fundamental was missed.</p>
<p>We realized that there are two issues associated with fitting data to distributions defined over (0, &#x0221E;). First, there is the <italic>finite largest element effect</italic> (FLE), due to the largest element in the data being finite. Second, we also encounter the <italic>finite number of elements effect</italic> (FNE), due to the sample size dependence of the goodness-of-fit measures. These two <italic>finite sample effects</italic> are well studied for Generalized Moment Methods (GMMs) [<xref ref-type="bibr" rid="B26">26</xref>, <xref ref-type="bibr" rid="B27">27</xref>] but often neglected in tests of statistical significance. After describing the CSN test, we illustrate in section 2 the FLE and FNE effects by applying the test to three real data sets. With the insights gained, we design both the estimators and the test statistic to account for the FLE and FNE effects in section 3.1. Since real data are frequently polluted by noise, we also discuss the impact of noise on the <italic>p</italic>-value, and propose a test statistic that accounts for noise in section 4. Finally, in section 5, we apply the adjusted test statistics to our real data sets and compare the <italic>p</italic>-values obtained against those from the CSN test.</p>
</sec>
<sec id="s2">
<title>2. Reexamining significance testing for empirical distributions</title>
<p>Sometimes we have reason to believe that our large data sets may be described by well-known distributions, such as the normal distribution, power law distribution, exponential distribution, and so on, but with best-fit parameter values that we need to determine. Commonly used methods to perform <italic>parameter estimation</italic> include Maximum Likelihood Estimation (MLE) [<xref ref-type="bibr" rid="B28">28</xref>], the Maximum Entropy Method (MEM) [<xref ref-type="bibr" rid="B29">29</xref>&#x02013;<xref ref-type="bibr" rid="B31">31</xref>], least-squares regression [<xref ref-type="bibr" rid="B32">32</xref>], and direct or indirect computation of moments [<xref ref-type="bibr" rid="B33">33</xref>]. Since it is possible to fit any distribution to any data set, we need to compute the <italic>goodness of fit</italic>, which can be the KS distance [<xref ref-type="bibr" rid="B7">7</xref>], the coefficient of determination (<italic>R</italic><sup>2</sup>), or other distance measures [<xref ref-type="bibr" rid="B34">34</xref>, <xref ref-type="bibr" rid="B35">35</xref>].</p>
<p>In a recent statement, the American Statistical Association warned the scientific community that the <italic>p</italic>-value &#x0201C;was never intended to be a substitute for scientific reasoning&#x0201D; [<xref ref-type="bibr" rid="B36">36</xref>, para. 2], and outlined six principles that can prevent its misuse [<xref ref-type="bibr" rid="B37">37</xref>]. A <italic>Nature</italic> commentary on this statement also added that &#x0201C;[r]esearchers should describe not only the data analyses that produced statistically significant results, &#x02026;, but all statistical tests and choices made in calculations&#x0201D; [<xref ref-type="bibr" rid="B38">38</xref>, para. 3]. We heed this warning in the present paper, but argue that, when properly computed and interpreted, the <italic>p</italic>-value is useful in that it provides a quantitative and objective alternative to visual inspection of the fits, which is frequently subjective and biased. This utility becomes important when we compare fits of two or more data sets to two or more distributions, and face the ambiguity of choosing from two or more definitions of goodness of fit. This is why we need to go beyond goodness of fit, to establish how plausible different distributions are for different data sets.</p>
<p>In 2009, Clauset, Shalizi, and Newman (CSN) did precisely this by coming up with a <italic>p</italic>-test model that uses the well-known PL distribution as an illustration. They started by writing down the probability density function of the PL distribution</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>P</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003B1;</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>for <italic>x</italic> &#x02208; [<italic>x</italic><sub><italic>min</italic></sub>, &#x0221E;), with exponent &#x003B1;. The CSN <italic>p</italic>-test involves four major steps:</p>
<p><bold>CSN(i) MLE Estimation of</bold> &#x003B1;: Given an empirical data set with <italic>S</italic> observations and the order statistics <bold>Y</bold> &#x0003D; {<italic>y</italic><sub>1</sub>, <italic>y</italic><sub>2</sub>, &#x02026;, <italic>y</italic><sub><italic>S</italic></sub>}, sorted such that <italic>y</italic><sub><italic>i</italic></sub> &#x02264; <italic>y</italic><sub><italic>i</italic>&#x0002B;1</sub>, the CSN algorithm (<bold>CSN(ia)</bold>) first constructs the <italic>S</italic> subsets <inline-formula><mml:math id="M2"><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mo>-</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mo
stretchy="false">(</mml:mo><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>. (<bold>CSN(ib)</bold>) For each <bold>X</bold><sup>(<italic>j</italic>)</sup>, we estimate &#x003B1;<sup>(<italic>j</italic>)</sup> using the MLE method that maximizes the log-likelihood function,</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M3"><mml:mrow><mml:mi>ln</mml:mi><mml:msub><mml:mi mathvariant='double-struck'>L</mml:mi><mml:mrow><mml:mi>P</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>ln</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x0220F;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>P</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x0007C;</mml:mo><mml:mover accent='true'><mml:mi>&#x003B1;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>N</mml:mi><mml:mi>ln</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mover accent='true'><mml:mi>&#x003B1;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mover accent='true'><mml:mi>&#x003B1;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mstyle 
displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mrow><mml:mi>ln</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mstyle><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>Applying the maximizing condition <inline-formula><mml:math id="M4"><mml:mfrac><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo class="qopname">ln</mml:mo><mml:mi mathvariant='double-struck'>L</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:math></inline-formula> yields</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M5"><mml:mrow><mml:mover accent='true'><mml:mi>&#x003B1;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>ln</mml:mi><mml:mfrac><mml:mi>x</mml:mi><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where the hat indicates an estimated parameter and <inline-formula><mml:math id="M6"><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> indicates the sample average of the random variable <italic>x</italic>.</p>
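To make step CSN(i) concrete, the estimator of Equation (3) can be sketched in a few lines of Python (the function name `alpha_mle` and the inverse-CDF sanity check are our own illustrative choices, not part of the original algorithm):

```python
import numpy as np

def alpha_mle(x, x_min):
    """MLE of the power-law exponent (Equation 3):
    alpha_hat = 1 + 1 / <ln(x / x_min)>, with <.> the sample average."""
    x = np.asarray(x, dtype=float)
    x = x[x >= x_min]                      # the PL is defined on [x_min, inf)
    return 1.0 + 1.0 / np.mean(np.log(x / x_min))

# Sanity check: sample a power law with alpha = 2.5 and x_min = 1 by
# inverse-CDF sampling, x = x_min * (1 - u)^(1 / (1 - alpha)), u ~ U(0, 1).
rng = np.random.default_rng(42)
u = rng.random(200_000)
x = (1.0 - u) ** (1.0 / (1.0 - 2.5))
alpha_hat = alpha_mle(x, 1.0)              # close to the true alpha = 2.5
```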
<p><bold>CSN(ii) KS Distance</bold>: If <bold>X</bold> follows probability distribution function <italic>f</italic><sub><italic>X</italic></sub> with cumulative distribution function <italic>F</italic><sub><italic>X</italic></sub>, then its probability integral transform <italic>u</italic> &#x0003D; <italic>F</italic><sub><italic>X</italic></sub>(<italic>x</italic>) is distributed as the standard uniform distribution <italic>U</italic>(0, 1). For any PL distributed sample <bold>X</bold> &#x0003D; {<italic>x</italic><sub>1</sub> &#x0003D; <italic>x</italic><sub><italic>min</italic></sub>, <italic>x</italic><sub>2</sub>, &#x02026;, <italic>x</italic><sub><italic>N</italic></sub>} with estimated <inline-formula><mml:math id="M7"><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula>, we (<bold>CSN(iia)</bold>) first transform the sample to <inline-formula><mml:math id="M8"><mml:msup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo
stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>. (<bold>CSN(iib)</bold>) Then we calculate the KS distance</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M9"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munder><mml:mo movablelimits="false">sup</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02264;</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x02264;</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mi>i</mml:mi><mml:mi>N</mml:mi></mml:mfrac></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>between <italic>U</italic><sup>(<italic>s</italic>)</sup> and <italic>U</italic>(0, 1). Here we make use of the fact that the CDF of <italic>U</italic>(0, 1) is a linear function, <italic>F</italic><sub><italic>U</italic></sub>(<italic>u</italic>) &#x0003D; <italic>u</italic>.</p>
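Steps CSN(iia)-(iib) amount to a one-line transform followed by a maximum deviation. A minimal sketch, assuming the continuous power-law CDF implied by Equation (1), <italic>F</italic><sub><italic>PL</italic></sub>(<italic>x</italic>) = 1 &#x02212; (<italic>x</italic>/<italic>x</italic><sub><italic>min</italic></sub>)<sup>1&#x02212;&#x003B1;</sup> (the function name is ours):

```python
import numpy as np

def ks_distance(x, alpha_hat, x_min):
    """KS distance of Equation (4): map the sample through the fitted CDF
    F_PL(x) = 1 - (x / x_min)^(1 - alpha), sort, and take the largest
    deviation from the U(0, 1) diagonal u = i / N."""
    u = np.sort(1.0 - (np.asarray(x, dtype=float) / x_min) ** (1.0 - alpha_hat))
    i = np.arange(1, u.size + 1)
    return np.max(np.abs(u - i / u.size))
```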
<p><bold>CSN(iii) Determining <italic>x</italic><sub><italic>min</italic></sub></bold>: To determine <italic>x</italic><sub><italic>min</italic></sub>, (<bold>CSN(iiia)</bold>) we calculate the KS distance for each <bold>X</bold><sup>(<italic>j</italic>)</sup> with its corresponding <inline-formula><mml:math id="M10"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula>. (<bold>CSN(iiib)</bold>) The set <bold>X</bold><sup>(<italic>j</italic>)</sup> that yields the lowest KS distance (<inline-formula><mml:math id="M11"><mml:msubsup><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>) gives us <inline-formula><mml:math id="M12"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="M13"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula>. The superscript &#x0201C;(em)&#x0201D; indicates a parameter obtained from empirical data.</p>
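Putting CSN(i)-(iii) together, the <italic>x</italic><sub><italic>min</italic></sub> scan can be sketched as a brute-force loop over all candidate tails. This is an O(<italic>S</italic><sup>2</sup>) illustration with a hypothetical function name; production code would typically restrict or vectorize the candidate set:

```python
import numpy as np

def fit_power_law(y):
    """CSN(iii): for every candidate x_min = y_j, fit alpha on the tail
    {y_j, ..., y_S} by MLE and keep the (x_min, alpha) pair whose
    transformed tail is closest to U(0, 1) in KS distance."""
    y = np.sort(np.asarray(y, dtype=float))
    best_d, best_xmin, best_alpha = np.inf, None, None
    for j in range(y.size - 1):            # leave at least two tail points
        tail, x_min = y[j:], y[j]
        alpha = 1.0 + 1.0 / np.mean(np.log(tail / x_min))   # CSN(i)
        u = np.sort(1.0 - (tail / x_min) ** (1.0 - alpha))  # CSN(ii)
        i = np.arange(1, u.size + 1)
        d = np.max(np.abs(u - i / u.size))
        if d < best_d:
            best_d, best_xmin, best_alpha = d, x_min, alpha
    return best_d, best_xmin, best_alpha
```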
<p><bold>CSN(iv) Significance Testing</bold>: After <inline-formula><mml:math id="M14"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="M15"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> have been estimated from <bold>Y</bold> &#x0003D; {<italic>y</italic><sub>1</sub>, <italic>y</italic><sub>2</sub>, &#x02026;, <italic>y</italic><sub><italic>S</italic></sub>}, we test how plausible it is for <inline-formula><mml:math id="M16"><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>&#x02282;</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>Y</mml:mtext></mml:mstyle></mml:math></inline-formula> to be a sample taken from a PL distribution. This is done by (<bold>CSN(iva)</bold>) sampling the PL <italic>M</italic> times using <inline-formula><mml:math id="M17"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="M18"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>. (<bold>CSN(ivb)</bold>) For the <italic>m</italic>th simulated sample we go through <bold>CSN(i)</bold> to <bold>CSN(iii)</bold> to obtain <inline-formula><mml:math id="M19"><mml:msubsup><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>. (<bold>CSN(ivc)</bold>) The significance measure</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M20"><mml:mrow><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>M</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi mathvariant='double-struck'>I</mml:mi><mml:mrow><mml:mo>&#x0007B;</mml:mo><mml:msubsup><mml:mi>d</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>e</mml:mi><mml:mi>m</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x0003C;</mml:mo><mml:msubsup><mml:mi>d</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>m</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>&#x0007D;</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:mstyle><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;</mml:mtext><mml:msub><mml:mi mathvariant='double-struck'>I</mml:mi><mml:mrow><mml:mo>&#x0007B;</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x0007D;</mml:mo></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mn>1</mml:mn><mml:mtext>&#x000A0;if&#x000A0;</mml:mtext><mml:mi>x</mml:mi><mml:mo>=</mml:mo><mml:mtext>True</mml:mtext><mml:mo>;</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mn>0</mml:mn><mml:mtext>&#x000A0;if&#x000A0;</mml:mtext><mml:mi>x</mml:mi><mml:mo>=</mml:mo><mml:mtext>False</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>is the fraction of simulated samples whose fits are poorer than that of the data.</p>
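<p>Equation (5) amounts to counting how often a simulated sample fits more poorly than the data. A minimal sketch (illustrative Python, not the authors' MATLAB implementation; the function name is ours):</p>

```python
import numpy as np

def ks_p_value(d_emp, d_sim):
    """Eq. (5): fraction of simulated samples whose KS distance exceeds
    the empirical one, i.e., whose fits are poorer than the data's."""
    d_sim = np.asarray(d_sim, dtype=float)
    return float(np.mean(d_emp < d_sim))
```

<p>With <italic>M</italic> simulated samples, a large <italic>p</italic> means the hypothesized distribution cannot be rejected.</p>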
<p>Extending the CSN method to other distributions, we performed <italic>p</italic>-testing on the Taiwan home price per square foot (fitted to EXP), Taiwan income (fitted to EXP), and the Straits Times Index normalized return (fitted to PL) (see Supplementary Information section <xref ref-type="supplementary-material" rid="SM1">3</xref> for more details on the data sets). The fits and <italic>p</italic>-values are shown in Figure <xref ref-type="fig" rid="F1">1</xref>. All fits are visually good, yet only the <italic>p</italic>-value for the Taiwan housing data is appreciable. The reason for this is simple: while the EXP and PL distributions are defined over (0, &#x0221E;), when we collect data from the real world we can only obtain a finite number of elements. Moreover, the largest element in the data is finite. However, existing tests of statistical significance generally do not account for the effects produced by having a finite number of elements (FNE) and a finite largest element (FLE). In the next section we explain how the parameters and test statistics can be adjusted for FNE and FLE.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><italic>p</italic>-testing on <bold>(A)</bold> 2012&#x02013;2014 Taiwan home price per square foot (fitted to EXP), <bold>(B)</bold> 2012 Taiwan lower-tail income (fitted to EXP), and <bold>(C)</bold> 2009&#x02013;2016 Straits Times Index normalized return (fitted to PL). For each plot, <italic>N</italic> represents the number of data points (larger than <italic>x</italic><sub><italic>min</italic></sub>) fitted. The black dots represent empirical data, while the blue dashed line represents the fit. All fits are visually good, yet only the <italic>p</italic>-value (<italic>P</italic><sub><italic>KS</italic></sub>, in percentage) for the Taiwan home price is appreciable.</p></caption>
<graphic xlink:href="fams-04-00002-g0001.tif"/>
</fig>
<p>At this stage, we might wonder whether the Taiwan income data would have been better fitted to a truncated EXP (TEXP) distribution</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M21"><mml:mrow><mml:msubsup><mml:mi>f</mml:mi><mml:mrow><mml:mi>E</mml:mi><mml:mi>X</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003B2;</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mi>exp</mml:mi><mml:mo stretchy='false'>[</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>]</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>exp</mml:mi><mml:mo stretchy='false'>[</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>]</mml:mo></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>since the lower-tail income data is obtained by removing the power-law tail. The Taiwan home price per square foot data was also truncated, but for a different reason: the few largest elements are clearly outliers that would not fit the EXP distribution. Ideally, we should use untruncated data, like the Straits Times Index data, to illustrate the method that we will describe in the following sections. In the rest of the paper, we will nonetheless use all three data sets as if they were untruncated, to illustrate how well our method works on different data types. To do so, we will compare the adjusted parameter and test statistic against their unadjusted counterparts meant for the untruncated EXP distribution.</p>
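<p>For reference, the TEXP density of Equation (6) can be sketched numerically as follows (an illustrative Python helper; the name <italic>texp_pdf</italic> is ours, not from the paper):</p>

```python
import numpy as np

def texp_pdf(x, beta, x_min, x_max):
    """Truncated exponential density (Eq. 6): the EXP density renormalized
    so that it integrates to one over [x_min, x_max]."""
    norm = 1.0 - np.exp(-beta * (x_max - x_min))
    return beta * np.exp(-beta * (np.asarray(x, dtype=float) - x_min)) / norm
```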
</sec>
<sec id="s3">
<title>3. Finite-sample adjustments</title>
<sec>
<title>3.1. Parameter adjustment for finite largest element</title>
<p>Here, we will illustrate the effects of FLE using an asymptotic EXP distribution. The same discussion can be generalized to other distributions (see Supplementary Information section <xref ref-type="supplementary-material" rid="SM1">1</xref>).</p>
<p>The EXP distribution is defined as</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M22"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>E</mml:mi><mml:mi>X</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mi>exp</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>with &#x003B2; as the sole parameter for <italic>x</italic> &#x02208; [<italic>x</italic><sub><italic>min</italic></sub>, &#x0221E;). Maximizing the likelihood function <inline-formula><mml:math id="M23"><mml:mi>&#x1D543;</mml:mi><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo>&#x0220F;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, we find the estimated parameter</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M24"><mml:mrow><mml:mover accent='true'><mml:mi>&#x003B2;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x0232A;</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>If we use the mean obtained from the data &#x02329;<italic>x</italic>&#x0232A;<sub><italic>data</italic></sub> as &#x02329;<italic>x</italic>&#x0232A; in Equation (8), we obtain the unadjusted estimator &#x003B2;<sub><italic>unadj</italic></sub>. However, due to the FLE, we can only average up to <italic>x</italic><sub><italic>max</italic></sub>. As such, &#x02329;<italic>x</italic>&#x0232A;<sub><italic>data</italic></sub> will be biased downward, and Equation (8) over-estimates <inline-formula><mml:math id="M25"><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula>.</p>
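<p>The unadjusted estimate of Equation (8), computed directly from the sample mean, can be sketched as follows (illustrative Python; the function name is ours):</p>

```python
import numpy as np

def beta_unadj(x, x_min):
    """Unadjusted MLE of the exponential rate, Eq. (8): the reciprocal of
    the sample mean measured from x_min."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (x.mean() - x_min)
```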
<p>To adjust for the FLE, we add the truncated part back into &#x02329;<italic>x</italic>&#x0232A;<sub><italic>data</italic></sub>, to define the adjusted &#x02329;<italic>x</italic>&#x0232A;<sub><italic>adj</italic></sub> as</p>
<disp-formula id="E9"><label>(9)</label><mml:math id="M26"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mstyle displaystyle='true'><mml:mrow><mml:msubsup><mml:mo>&#x0222B;</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>E</mml:mi><mml:mi>X</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>d</mml:mi><mml:mi>x</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mstyle 
displaystyle='true'><mml:mrow><mml:msubsup><mml:mo>&#x0222B;</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mi>&#x0221E;</mml:mi></mml:msubsup><mml:mi>x</mml:mi></mml:mrow></mml:mstyle><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>E</mml:mi><mml:mi>X</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>d</mml:mi><mml:mi>x</mml:mi><mml:mo>.</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>exp</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mfrac><mml:mrow><mml:mi>exp</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml
:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mfrac><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>&#x003B2;</mml:mi><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Inserting &#x02329;<italic>x</italic>&#x0232A;<sub><italic>adj</italic></sub> into Equation (8), we obtain a nonlinear equation</p>
<disp-formula id="E10"><label>(10)</label><mml:math id="M27"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>&#x003B2;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mi>exp</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>&#x003B2;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mover 
accent='true'><mml:mi>&#x003B2;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>that we solve using MATLAB&#x00027;s built-in nonlinear solver function <italic>nlinfit()</italic> to obtain <inline-formula><mml:math id="M28"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>.</p>
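<p>Equation (10) has no closed-form solution, but any one-dimensional root finder will do. The sketch below (in Python; the function name <italic>beta_adj</italic> and the bisection bracketing are our illustrative choices, not the authors' <italic>nlinfit()</italic>-based implementation) solves it by bisection:</p>

```python
import numpy as np

def beta_adj(x, x_min, x_max, tol=1e-12):
    """FLE-adjusted rate estimate: the nontrivial root of Eq. (10).
    Illustrative bisection sketch (the paper uses MATLAB's nlinfit)."""
    m = float(np.mean(x))
    span = x_max - x_min

    def g(b):
        # Left-hand side of Eq. (10).
        return ((b * (x_max - m) + 1.0) * np.exp(-b * span)
                + b * (m - x_min) - 1.0)

    # b = 0 solves Eq. (10) trivially, so the bracket must start strictly
    # above zero (assumes the true rate is not tiny relative to 1/span);
    # g grows like b*(m - x_min) - 1 > 0 for large b.
    lo = 1e-4 / span
    hi = 2.0 / (m - x_min)
    while g(hi) <= 0.0:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

<p>Given the sample, <italic>x</italic><sub><italic>min</italic></sub>, and the observed <italic>x</italic><sub><italic>max</italic></sub>, the returned root is the adjusted estimate.</p>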
<p>To test the performance of this adjustment formula, we simulated 1,000 sets of EXP-distributed data for <inline-formula><mml:math id="M29"><mml:mn>1</mml:mn><mml:msup><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:msup><mml:mo>&#x02264;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02264;</mml:mo><mml:mn>1</mml:mn><mml:msup><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>, using the inverse cumulative distribution function of the EXP distribution</p>
<disp-formula id="E11"><label>(11)</label><mml:math id="M30"><mml:mrow><mml:msubsup><mml:mi>F</mml:mi><mml:mrow><mml:mi>E</mml:mi><mml:mi>X</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo stretchy='false'>(</mml:mo><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x003B2;</mml:mi><mml:mi>T</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msub><mml:mi>&#x003B2;</mml:mi><mml:mi>T</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:mi>ln</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>u</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>This transforms <italic>U</italic>(0, 1) distributed elements {<italic>u</italic><sub><italic>i</italic></sub>} to EXP distributed elements {<italic>x</italic><sub><italic>i</italic></sub>}. Using this transformation <inline-formula><mml:math id="M31"><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>E</mml:mi><mml:mi>X</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula>, 0 and 1 map to <italic>x</italic><sub><italic>min</italic></sub> and &#x0221E; respectively. It is also useful to note that Equation (11) is the inverse of the CDF of the EXP distribution,</p>
<disp-formula id="E12"><label>(12)</label><mml:math id="M32"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>E</mml:mi><mml:mi>X</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x003B2;</mml:mi><mml:mi>T</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>exp</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>&#x003B2;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
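<p>This inverse-transform sampling can be sketched as follows (a Python illustration rather than the authors' MATLAB; the function name and the seeded generator are our choices). Drawing <italic>u</italic> from <italic>U</italic>(0, <italic>u</italic><sub><italic>max</italic></sub>) and mapping it through Equation (11), <italic>u</italic><sub><italic>max</italic></sub> = 1 recovers the untruncated EXP, while <italic>u</italic><sub><italic>max</italic></sub> &#x0003C; 1 caps the largest element at <italic>F</italic><sup>&#x02212;1</sup>(<italic>u</italic><sub><italic>max</italic></sub>):</p>

```python
import numpy as np

def sample_exp(n, beta_t, x_min=0.0, u_max=1.0, rng=None):
    """Map U(0, u_max) draws through the inverse CDF of Eq. (11).
    u_max < 1 (e.g., 0.9) mimics a finite largest element (FLE)."""
    rng = np.random.default_rng(0) if rng is None else rng
    u = rng.uniform(0.0, u_max, size=n)
    return x_min - np.log(1.0 - u) / beta_t
```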
<p>To simulate the effect of a FLE with <inline-formula><mml:math id="M33"><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>E</mml:mi><mml:mi>X</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>9</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, we sampled 1,000 sets of EXP distributed data using <italic>U</italic>(0, 0.9) instead of <italic>U</italic>(0, 1) with <italic>x</italic><sub><italic>min</italic></sub> &#x0003D; 0. Thereafter, we estimated <inline-formula><mml:math id="M34"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="M35"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> using Equations (8) and (10). Figure <xref ref-type="fig" rid="F2">2</xref> shows the relative estimation errors</p>
<disp-formula id="E13"><label>(13)</label><mml:math id="M36"><mml:mrow><mml:mo>&#x00394;</mml:mo><mml:mover accent='true'><mml:mi>&#x003B2;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msqrt><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mover accent='true'><mml:mi>&#x003B2;</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003B2;</mml:mi><mml:mi>T</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:msqrt></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x003B2;</mml:mi><mml:mi>T</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<p>of <inline-formula><mml:math id="M37"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="M38"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> with respect to the true value &#x003B2;<sub><italic>T</italic></sub>. As we can see from Figure <xref ref-type="fig" rid="F2">2</xref>, <inline-formula><mml:math id="M39"><mml:mo>&#x00394;</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is about 38% for small samples <italic>N</italic> &#x0007E; 10<sup>2</sup> and decreases to 34% for large samples <italic>N</italic> &#x0007E; 10<sup>4</sup>. On the other hand, <inline-formula><mml:math id="M40"><mml:mo>&#x00394;</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> starts at 20% but decreases to 2% as the number of data points increases. 
Although it can be shown that the bias of <inline-formula><mml:math id="M41"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> vanishes with increasing sample size [<xref ref-type="bibr" rid="B24">24</xref>, <xref ref-type="bibr" rid="B39">39</xref>], we find that it converges very slowly with increasing sample size in the unfortunate situation of a small <italic>x</italic><sub><italic>max</italic></sub>. In contrast, <inline-formula><mml:math id="M42"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> converges very quickly even for small <italic>x</italic><sub><italic>max</italic></sub>, because we have accounted for the FLE.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Relative estimation errors of <bold>(A)</bold> <inline-formula><mml:math id="M43"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <bold>(B)</bold> <inline-formula><mml:math id="M44"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> measured from 1,000 simulated samples using different &#x003B2;<sub><italic>T</italic></sub> and <italic>N</italic> with <italic>x</italic><sub><italic>min</italic></sub> &#x0003D; 0 and <inline-formula><mml:math id="M45"><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>E</mml:mi><mml:mi>X</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>9</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. Due to the FLE, &#x00394;&#x003B2;<sub><italic>unadj</italic></sub> remains high (close to the theoretical relative error of &#x003F5;(&#x003B4; &#x0003D; 0.1, <italic>x</italic><sub><italic>min</italic></sub> &#x0003D; 0) &#x0003D; 0.1[1 &#x02212; ln(0.1)] &#x02248; 33%) even for large <italic>N</italic>. In contrast, &#x00394;&#x003B2;<sub><italic>adj</italic></sub> decreases rapidly with increasing <italic>N</italic>.</p></caption>
<graphic xlink:href="fams-04-00002-g0002.tif"/>
</fig>
<p>In the Supplementary Information section <xref ref-type="supplementary-material" rid="SM1">1</xref>, we give the details of our derivation of the theoretical estimate</p>
<disp-formula id="E14"><label>(14)</label><mml:math id="M46"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msub><mml:mi>&#x003B2;</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02248;</mml:mo><mml:msub><mml:mi>&#x003B2;</mml:mi><mml:mi>T</mml:mi></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mi>&#x003B2;</mml:mi><mml:mi>T</mml:mi></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003B2;</mml:mi><mml:mi>T</mml:mi></mml:msub><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003B2;</mml:mi><mml:mi>T</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi 
mathvariant='tex-caligraphic'>O</mml:mi><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>&#x003B2;</mml:mi><mml:mi>T</mml:mi></mml:msub><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x0200B;</mml:mtext><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x0200B;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>&#x0200B;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mtext>&#x0200B;</mml:mtext><mml:mn>2</mml:mn><mml:msub><mml:mi>&#x003B2;</mml:mi><mml:mi>T</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>By defining <inline-formula><mml:math id="M47"><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo class="qopname">ln</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> and substituting for <italic>x</italic><sub><italic>max</italic></sub> in Equation (14), the theoretical relative estimation error can be expressed in terms of &#x003B4; as</p>
<disp-formula id="E15"><label>(15)</label><mml:math id="M48"><mml:mrow><mml:mo>&#x00394;</mml:mo><mml:msub><mml:mi>&#x003B2;</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>ln</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B4;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mi>&#x003B2;</mml:mi><mml:mi>T</mml:mi></mml:msub><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;</mml:mtext><mml:mi>&#x003B4;</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mo stretchy='false'>[</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>]</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>Equation (15) shows that the estimation error has no explicit dependence on sample size. This tells us that <inline-formula><mml:math id="M49"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is always larger than &#x003B2;<sub><italic>T</italic></sub> because of the FLE effect. The convergence rate then depends on how rapidly <italic>x</italic><sub><italic>max</italic></sub> approaches infinity (&#x003B4; approaches zero) with increasing sample size.</p>
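<p>As a quick numeric check of Equation (15) (an illustrative helper, not the authors' code): with &#x003B4; = 0.1 and <italic>x</italic><sub><italic>min</italic></sub> = 0, it gives 0.1[1 &#x02212; ln(0.1)] &#x02248; 0.33, the roughly 33% plateau seen in Figure 2.</p>

```python
import math

def rel_error_unadj(delta, beta_t=1.0, x_min=0.0):
    """Theoretical relative error of the unadjusted estimator, Eq. (15)."""
    return delta * (1.0 - math.log(delta) + beta_t * x_min)
```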
</sec>
<sec>
<title>3.2. Test statistic adjustment for FLE</title>
<p>For a finite sample, <italic>F</italic><sub><italic>EXP</italic></sub>(<italic>x</italic>) &#x0003C; 1 for all <italic>x</italic> &#x0003C; &#x0221E;. Mathematically, this means that <italic>F</italic><sub><italic>EXP</italic></sub>(<italic>x</italic>) &#x0007E; <italic>U</italic>(0, 1 &#x02212; &#x003B4;), where <inline-formula><mml:math id="M50"><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>E</mml:mi><mml:mi>X</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>&#x003B4;</mml:mi></mml:math></inline-formula>. This observation is important, because <italic>d</italic><sub><italic>KS</italic></sub> is obtained by comparing <inline-formula><mml:math id="M51"><mml:msup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>E</mml:mi><mml:mi>X</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mover 
accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> against <italic>U</italic>(0, 1) (see Equation 4). This tells us that, for a fair comparison, we need to rescale all elements in <italic>U</italic><sup>(<italic>s</italic>)</sup> by a factor of 1/(1&#x02212;&#x003B4;). Figure <xref ref-type="fig" rid="F3">3</xref> shows the <italic>d</italic><sub><italic>KS</italic></sub> measured for the 1,000 sets of EXP distributed data with finite largest element <inline-formula><mml:math id="M52"><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>9</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> for various &#x003B2;<sub><italic>T</italic></sub> and sample sizes <italic>N</italic>. For each sample, we use Equation (10) to estimate <inline-formula><mml:math id="M53"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and transform the data to <italic>U</italic><sup>(<italic>s</italic>)</sup> using Equation (12). 
After that, we measure <italic>d</italic><sub><italic>KS</italic></sub> with Equation (4) to obtain the unadjusted KS distance <italic>KS</italic><sub><italic>unadj</italic></sub> and the adjusted KS distance <italic>KS</italic><sub><italic>adj</italic></sub> from the non-rescaled and rescaled <italic>U</italic><sup>(<italic>s</italic>)</sup>, respectively. <italic>KS</italic><sub><italic>unadj</italic></sub> decreases only from 0.14 for small samples <italic>N</italic> &#x0007E; 10<sup>2</sup> to 0.10 for large samples <italic>N</italic> &#x0007E; 10<sup>5</sup>. In contrast, <italic>KS</italic><sub><italic>adj</italic></sub> decreases from 0.06 for small samples to 0.006 for large samples.</p>
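The effect of the rescaling step can be illustrated with a short simulation. The KS statistic below is the standard one-sample form, which we assume corresponds to Equation (4); for simplicity, the probability integral transform uses the true &#x003B2;<sub>T</sub> rather than the adjusted estimate from Equation (10).

```python
import numpy as np

def ks_distance(u):
    """One-sample KS statistic of u against U(0,1) (assumed form of Eq. 4)."""
    u = np.sort(u)
    n = len(u)
    grid = np.arange(1, n + 1) / n
    return np.max(np.maximum(grid - u, u - (grid - 1.0 / n)))

rng = np.random.default_rng(1)
beta_T, x_min, delta, n = 1.0, 0.0, 0.1, 10_000

# Exponential sample with a finite largest element x_max = F^-1(1 - delta).
v = rng.uniform(0.0, 1.0 - delta, size=n)
x = x_min - np.log(1.0 - v) / beta_T

# Probability integral transform (true beta_T used for simplicity),
# giving u ~ U(0, 1 - delta) rather than U(0, 1).
u = 1.0 - np.exp(-beta_T * (x - x_min))

ks_unadj = ks_distance(u)                # compared directly against U(0,1)
ks_adj = ks_distance(u / (1.0 - delta))  # rescaled by 1/(1 - delta)
print(ks_unadj, ks_adj)  # ks_unadj stays near delta; ks_adj is much smaller
```

As in Figure 3, the unadjusted distance is pinned near &#x003B4; = 0.1 no matter how large the sample, while the rescaled distance shrinks toward zero.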
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>The median KS distances for <bold>(A)</bold> <italic>KS</italic><sub><italic>unadj</italic></sub> and <bold>(B)</bold> <italic>KS</italic><sub><italic>adj</italic></sub> measured from 1,000 simulated samples using different &#x003B2;<sub><italic>T</italic></sub> and <italic>N</italic>. The <italic>x</italic><sub><italic>min</italic></sub> is set to 0 and <inline-formula><mml:math id="M54"><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>E</mml:mi><mml:mi>X</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>9</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. Because of the FLE, <italic>KS</italic><sub><italic>unadj</italic></sub> remains above &#x003B4; &#x0003D; 0.10 while <italic>KS</italic><sub><italic>adj</italic></sub> converges to zero for large <italic>N</italic>.</p></caption>
<graphic xlink:href="fams-04-00002-g0003.tif"/>
</fig>
</sec>
<sec>
<title>3.3. Adjustment for finite number of elements</title>
<p>Until now, we have only discussed adjustments to the estimated parameter and the KS distance to eliminate the bias caused by the FLE. Besides the FLE effect, we also need to consider the bias caused by having a finite number of elements in the sample. As we can see from Figure <xref ref-type="fig" rid="F3">3</xref>, the KS distance decreases as the sample size increases. Therefore, in order to have a fair comparison of the goodness of fit for various sample sizes, we need to determine how <italic>d</italic><sub><italic>KS</italic></sub> changes as a function of <italic>N</italic>. To do this, we simulated 10<sup>6</sup> samples of various sizes <italic>N</italic> from <italic>U</italic>(0, 1). For each sample we determined <italic>d</italic><sub><italic>KS</italic></sub> using Equation (4), so that for each <italic>N</italic> we ended up with 10<sup>6</sup> KS distances. In Figure <xref ref-type="fig" rid="F4">4</xref> we show the KS distances at different deciles, which exhibit the asymptotic behavior</p>
<disp-formula id="E16"><label>(16)</label><mml:math id="M55"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>N</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mn>100</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>0.176</mml:mn></mml:mrow></mml:msup><mml:mi>exp</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mn>0.274</mml:mn><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mn>0.492</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>N</mml:mi><mml:mo>&#x0003E;</mml:mo><mml:mn>50</mml:mn></mml:mrow></mml:math></disp-formula>
<p>that we settled for, after experimenting with several functional forms (see Supplementary Information section <xref ref-type="supplementary-material" rid="SM1">2</xref>). This result agrees with our expectation that <italic>d</italic><sub><italic>KS</italic></sub> &#x02192; 0 as <italic>N</italic> &#x02192; &#x0221E;. It also suggests that if we have two samples with sizes <italic>N</italic><sub>1</sub> and <italic>N</italic><sub>2</sub> from the same distribution, we should compare <inline-formula><mml:math id="M56"><mml:msubsup><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>492</mml:mn></mml:mrow></mml:msubsup><mml:msubsup><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> against <inline-formula><mml:math id="M57"><mml:msubsup><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>492</mml:mn></mml:mrow></mml:msubsup><mml:msubsup><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>. 
Otherwise, if <italic>N</italic><sub>2</sub> &#x0003E; <italic>N</italic><sub>1</sub> then naturally <inline-formula><mml:math id="M58"><mml:msubsup><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x0003C;</mml:mo><mml:msubsup><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> and we will be led to the wrong conclusion that the <italic>N</italic><sub>2</sub> sample fits the distribution better.</p>
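A minimal sketch of this comparison, using the exponent 0.492 from Equation (16), medians over repeated simulations, and the same assumed one-sample KS statistic as before:

```python
import numpy as np

def ks_distance(u):
    """One-sample KS statistic of u against U(0,1) (assumed form of Eq. 4)."""
    u = np.sort(u)
    n = len(u)
    grid = np.arange(1, n + 1) / n
    return np.max(np.maximum(grid - u, u - (grid - 1.0 / n)))

rng = np.random.default_rng(2)
n1, n2 = 100, 10_000

# Median KS distance over repeated uniform samples of each size.
d1 = np.median([ks_distance(rng.uniform(size=n1)) for _ in range(500)])
d2 = np.median([ks_distance(rng.uniform(size=n2)) for _ in range(500)])

s1, s2 = n1**0.492 * d1, n2**0.492 * d2
print(d1, d2)  # raw distance is smaller for the larger sample
print(s1, s2)  # the N**0.492-scaled values are comparable
```

Even though both samples come from the same distribution, the raw median distances differ by roughly an order of magnitude, while the scaled values nearly coincide.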
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Log-log plot of <italic>d</italic><sub><italic>KS</italic></sub> against <italic>N</italic> for different deciles going from the 10th percentile (blue) to the 90th (red), obtained from 10<sup>6</sup> simulations.</p></caption>
<graphic xlink:href="fams-04-00002-g0004.tif"/>
</fig>
<p>In this section, we explicitly presented the procedures to obtain the adjusted parameter, as well as the steps to perform significance testing on this estimated parameter. Although we demonstrated this using the EXP distribution as an example, one should note that this method can also be applied to other distributions. The inclusion of <italic>x</italic><sub><italic>max</italic></sub> when fitting empirical data has previously been considered by [<xref ref-type="bibr" rid="B40">40</xref>&#x02013;<xref ref-type="bibr" rid="B42">42</xref>] for the truncated PL distribution. Like these approaches, the method presented in this paper can easily be extended to fit different distributions; unlike them, however, it also allows us to conduct significance testing across distributions. This is because by extending <italic>x</italic><sub><italic>max</italic></sub> to infinity, we can compute the probability integral transform to map arbitrary distributions to the standard uniform distribution, and ensure that during statistical significance testing our goodness-of-fit measure is distribution independent [see <bold>CSN(ii)</bold>].</p>
<p>More importantly, fitting data to untruncated distributions defined over [<italic>x</italic><sub><italic>min</italic></sub>, &#x0221E;) is commonly encountered in practice, where no <italic>x</italic><sub><italic>max</italic></sub> is expected from theoretical considerations, but the largest element in our data is finite. If we fit to the truncated versions of the distributions, we might get better estimates of the distribution parameters, but we will not be able to justify inserting these estimates into the untruncated distributions, in the absence of a limiting procedure involving larger and larger <italic>x</italic><sub><italic>max</italic></sub>. Moreover, when researchers expect to be dealing with the untruncated distribution, they will not use the truncated distribution for estimation. In contrast, our self-consistent adjustment procedure would be ontologically easier to justify.</p>
</sec>
</sec>
<sec id="s4">
<title>4. The effects of random noise</title>
<p>Besides having to work with finite samples and finite largest elements, in practice we will also encounter imperfections while collecting samples, arising for various reasons such as undetected samples, contamination by background noise, and recording errors. We call such noise occurring at the element level <italic>elementary noise</italic>. When we convert these samples to a distribution, noise will also be present at the distribution level, which we refer to as <italic>distribution noise</italic>. In principle, the information at the distribution level is more robust than that at the elementary level, as we expect random and thus uncorrelated noise to cancel out. This means that the distribution is less sensitive to elementary noise, but we must still ask whether the distribution noise plays an important role in our test of statistical significance. To account for the effects of distribution noise, we first need to quantify it, and thereafter understand how it affects significance testing.</p>
<p>Suppose we now randomly generate a set of EXP data. After adjusting for FLE, we obtain the distribution parameter and use it to transform this set to <inline-formula><mml:math id="M59"><mml:msup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>E</mml:mi><mml:mi>X</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, following the procedure outlined in section 3.1. Then, as illustrated in Figures <xref ref-type="fig" rid="F5">5A&#x02013;C</xref>, a natural way to measure the distribution noise is to plot the histogram, count the frequency in each bin, and compare it to the expected frequency from <italic>U</italic>(0, 1). Since this can be done more accurately for smaller bin sizes, we use the intervals between sorted elements as a collection of non-uniform bins, as shown in Figures <xref ref-type="fig" rid="F5">5D&#x02013;F</xref>. 
For a data set consisting of <italic>N</italic> elements, each bin carries a weight of 1/<italic>N</italic>, evenly distributed within the interval (<italic>u</italic><sub><italic>i</italic>&#x02212;1</sub>, <italic>u</italic><sub><italic>i</italic></sub>], such that the probability density is</p>
<disp-formula id="E17"><label>(17)</label><mml:math id="M60"><mml:mrow><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mrow><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>As the theoretical probability density for <italic>U</italic>(0, 1) is 1, we define the distribution noise <italic>d</italic><sub><italic>DN</italic></sub> mathematically to be</p>
<disp-formula id="E18"><label>(18)</label><mml:math id="M61"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mfrac><mml:mrow><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:mstyle 
displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mstyle></mml:mrow></mml:mfrac></mml:mrow></mml:msqrt></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msqrt><mml:mrow><mml:mfrac><mml:mrow><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mi>N</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:mstyle 
displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mstyle></mml:mrow></mml:mfrac></mml:mrow></mml:msqrt><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>u</italic><sub>0</sub> &#x0003D; 0 and <italic>u</italic><sub><italic>N</italic></sub> &#x0003D; 1. We need to weight the deviation of each bin by <inline-formula><mml:math id="M62"><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> because the bins are non-uniform, and also to keep <italic>d</italic><sub><italic>DN</italic></sub> finite.</p>
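Equations (17) and (18) can be implemented in a few lines. The sketch below assumes, as in Figure 5, that the sample has already been mapped to [0, 1] with u<sub>0</sub> = 0 and the largest element rescaled to u<sub>N</sub> = 1.

```python
import numpy as np

def distribution_noise(u):
    """Distribution noise d_DN of Eq. (18): interval-weighted RMS deviation of
    the per-interval density (Eq. 17) from the U(0,1) density of 1.
    Assumes u_0 = 0 and the largest element of u equals 1."""
    u = np.sort(np.asarray(u))
    n = len(u)
    widths = np.diff(np.concatenate(([0.0], u)))   # u_i - u_{i-1}
    density = (1.0 / n) / widths                   # Eq. (17)
    return np.sqrt(np.sum(widths**2 * (density - 1.0)**2) / np.sum(widths**2))

rng = np.random.default_rng(3)
u = rng.uniform(size=10_000)
u /= u.max()          # rescale so the largest element becomes 1 (as in Fig. 5)
dn = distribution_noise(u)
print(dn)             # close to 1/sqrt(2) for large N, as in Eq. (20)
```

For a large uniform sample the statistic sits near 1/&#x0221A;2, the asymptotic value given by Equation (20) below.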
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Illustration of the distribution noise we would measure if we sample 10 elements from <italic>U</italic>(0, 1), rescaled such that the largest element becomes 1. In <bold>(A&#x02013;C)</bold> we use 5 uniform bins, whereas in <bold>(D&#x02013;F)</bold> we use the intervals between sorted elements as the bins. Counts are shown in <bold>(A,D)</bold>, frequencies in <bold>(B,E)</bold>, and the probability densities calculated using Equation (17) in <bold>(C,F)</bold>.</p></caption>
<graphic xlink:href="fams-04-00002-g0005.tif"/>
</fig>
<sec>
<title>4.1. Relation between distribution noise and sample size</title>
<p>As in section 3.3, we simulated 10<sup>6</sup> samples from <italic>U</italic>(0, 1) with different <italic>N</italic>. For each sample, we calculated the distribution noise <italic>d</italic><sub><italic>DN</italic></sub> using Equation (18) and plotted its deciles against <italic>N</italic>, as shown in Figure <xref ref-type="fig" rid="F6">6</xref>. After experimenting with several functional forms, we write down the relationship between <italic>d</italic><sub><italic>DN</italic></sub> and <italic>N</italic> at percentile &#x02118;<sub><italic>DN</italic></sub> as</p>
<disp-formula id="E19"><label>(19)</label><mml:math id="M63"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>N</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mo>&#x02329;</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0232A;</mml:mo><mml:mo>&#x0002B;</mml:mo><mml:mo>&#x003A6;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mn>50</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mfrac><mml:mrow><mml:mi>exp</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mn>50</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mn>50</mml:mn><mml:mo>&#x0007C;</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>0.430</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mn>50</mml:mn><mml:msup><mml:mo>&#x0007C;</mml:mo><mml:mrow><mml:mn>0.302</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mn>0.495</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where &#x003A6;(<italic>x</italic>) represents the sign of <italic>x</italic>, and</p>
<disp-formula id="E20"><label>(20)</label><mml:math id="M64"><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0232A;</mml:mo><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:mo>&#x0002B;</mml:mo><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:msup><mml:mi>N</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mrow></mml:msqrt><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mi>N</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>is the analytically derived distribution noise, that converges to <inline-formula><mml:math id="M65"><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:msqrt><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msqrt></mml:math></inline-formula> as <italic>N</italic> &#x02192; &#x0221E; (refer to Supplementary Information section <xref ref-type="supplementary-material" rid="SM1">2</xref> for more details). This result suggests that if we have two samples with sizes <italic>N</italic><sub>1</sub> and <italic>N</italic><sub>2</sub> with <italic>N</italic><sub>2</sub> &#x0003E; <italic>N</italic><sub>1</sub> from the same distribution, we should compare <inline-formula><mml:math id="M66"><mml:msubsup><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>495</mml:mn></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:msqrt><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msqrt></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> against <inline-formula><mml:math id="M67"><mml:msubsup><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>495</mml:mn></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:msqrt><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msqrt></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. Otherwise, we risk drawing the wrong conclusion that the <italic>N</italic><sub>2</sub> sample fits the distribution better if <inline-formula><mml:math id="M68"><mml:msubsup><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x0003E;</mml:mo><mml:msubsup><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>.</p>
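A quick simulation illustrates why the <italic>N</italic><sup>0.495</sup> scaling of Equation (19) makes <italic>d</italic><sub><italic>DN</italic></sub> values comparable across sample sizes: the raw fluctuations of <italic>d</italic><sub><italic>DN</italic></sub> around 1/&#x0221A;2 shrink with <italic>N</italic>, but their scaled spread is roughly constant. The distribution_noise function is our assumed implementation of Equation (18), with u<sub>0</sub> = 0 and the largest element rescaled to 1.

```python
import numpy as np

def distribution_noise(u):
    """d_DN of Eq. (18); assumes u_0 = 0 and rescales the largest element to 1."""
    u = np.sort(np.asarray(u))
    n = len(u)
    widths = np.diff(np.concatenate(([0.0], u / u.max())))
    density = (1.0 / n) / widths           # Eq. (17)
    return np.sqrt(np.sum(widths**2 * (density - 1.0)**2) / np.sum(widths**2))

rng = np.random.default_rng(4)
n1, n2 = 200, 20_000
d1 = np.array([distribution_noise(rng.uniform(size=n1)) for _ in range(300)])
d2 = np.array([distribution_noise(rng.uniform(size=n2)) for _ in range(300)])

# Raw fluctuations shrink with N; the N**0.495 scaling of Eq. (19)
# puts both sample sizes on a common footing.
s1, s2 = n1**0.495 * d1.std(), n2**0.495 * d2.std()
print(d1.std(), d2.std())
print(s1, s2)  # comparable scaled spreads
```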
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Relationship between distribution noise <italic>d</italic><sub><italic>DN</italic></sub> and sample size <italic>N</italic> at deciles going from the 10th percentile (blue) to the 90th (red), obtained from 10<sup>6</sup> simulations. The <italic>d</italic><sub><italic>DN</italic></sub> value converges to <inline-formula><mml:math id="M69"><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:msqrt><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msqrt></mml:math></inline-formula> as <italic>N</italic> increases.</p></caption>
<graphic xlink:href="fams-04-00002-g0006.tif"/>
</fig>
</sec>
<sec>
<title>4.2. Relationship between distribution noise and KS distance</title>
<p>As measures of statistical deviation, <italic>d</italic><sub><italic>DN</italic></sub> and <italic>d</italic><sub><italic>KS</italic></sub> differ in that <italic>d</italic><sub><italic>DN</italic></sub> measures deviation at the probability density level, whereas <italic>d</italic><sub><italic>KS</italic></sub> measures it at the cumulative density level. As a result, <italic>d</italic><sub><italic>KS</italic></sub> assigns more weight to the tail of the distribution, while <italic>d</italic><sub><italic>DN</italic></sub> is more sensitive to deviations in the body of the distribution. Therefore, if we wish to combine these two measures to estimate the significance level, we first need to investigate the relationship between <italic>d</italic><sub><italic>KS</italic></sub> and <italic>d</italic><sub><italic>DN</italic></sub>. We do this by simulating 10<sup>6</sup> samples from <italic>U</italic>(0, 1) for various sample sizes; for each sample, we calculate <italic>d</italic><sub><italic>KS</italic></sub> and <italic>d</italic><sub><italic>DN</italic></sub> using Equations (4) and (18), respectively, to obtain 10<sup>6</sup> pairs of <italic>d</italic><sub><italic>KS</italic></sub> and <italic>d</italic><sub><italic>DN</italic></sub>. We then compute the Pearson correlation between <italic>d</italic><sub><italic>KS</italic></sub> and <italic>d</italic><sub><italic>DN</italic></sub> and find that (see Supplementary Information section <xref ref-type="supplementary-material" rid="SM1">2</xref> for the comparison of fits)</p>
<disp-formula id="E21"><label>(21)</label><mml:math id="M70"><mml:mrow><mml:msub><mml:mi>&#x003C1;</mml:mi><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>N</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mi>e</mml:mi><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mn>0.481</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>As expected, <italic>d</italic><sub><italic>KS</italic></sub> is positively correlated with <italic>d</italic><sub><italic>DN</italic></sub>. Since <italic>d</italic><sub><italic>KS</italic></sub> is a measure at the cumulative level, the random distribution noise cancels out, and thus the correlation between <italic>d</italic><sub><italic>KS</italic></sub> and <italic>d</italic><sub><italic>DN</italic></sub> vanishes as <italic>N</italic> &#x02192; &#x0221E;.</p>
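The sign and decay of this correlation can be checked with a small simulation. The two statistics below are our assumed implementations of Equations (4) and (18), and the number of repetitions is kept modest for speed.

```python
import numpy as np

def ks_distance(u):
    """One-sample KS statistic against U(0,1) (assumed form of Eq. 4)."""
    u = np.sort(u)
    n = len(u)
    grid = np.arange(1, n + 1) / n
    return np.max(np.maximum(grid - u, u - (grid - 1.0 / n)))

def distribution_noise(u):
    """d_DN of Eq. (18); assumes u_0 = 0 and rescales the largest element to 1."""
    u = np.sort(np.asarray(u))
    n = len(u)
    widths = np.diff(np.concatenate(([0.0], u / u.max())))
    density = (1.0 / n) / widths
    return np.sqrt(np.sum(widths**2 * (density - 1.0)**2) / np.sum(widths**2))

rng = np.random.default_rng(5)

def pearson_rho(n, reps=2000):
    """Pearson correlation of (d_KS, d_DN) over repeated U(0,1) samples."""
    samples = [rng.uniform(size=n) for _ in range(reps)]
    ks = [ks_distance(s) for s in samples]
    dn = [distribution_noise(s) for s in samples]
    return np.corrcoef(ks, dn)[0, 1]

r_small, r_large = pearson_rho(50), pearson_rho(1000)
print(r_small, r_large)  # positive, and decaying with N as in Eq. (21)
```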
</sec>
</sec>
<sec id="s5">
<title>5. Application to significance testing</title>
<sec>
<title>5.1. Significance level for a given distribution</title>
<p>To perform significance testing given <italic>d</italic><sub><italic>KS</italic></sub> and <italic>d</italic><sub><italic>DN</italic></sub>, we need the percentile values &#x02118;<sub><italic>KS</italic></sub> and &#x02118;<sub><italic>DN</italic></sub>. &#x02118;<sub><italic>KS</italic></sub> can be obtained by inverting Equation (16), as</p>
<disp-formula id="E22"><label>(22)</label><mml:math id="M71"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:msubsup><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mn>0.430</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>50</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>0.302</mml:mn></mml:mrow></mml:msup><mml:mi>ln</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x003B7;</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mn>0.495</mml:mn></mml:mrow></mml:msup><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003B7;</mml:mi><mml:mo>&#x0003C;</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mn>50</mml:mn><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003B7;</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>100</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>0.430</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mn>50</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>0.302</mml:mn></mml:mrow></mml:msup><mml:mi>ln</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x003B7;</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mn>0.495</mml:mn></mml:mrow></mml:msup><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003B7;</mml:mi><mml:mo>&#x0003E;</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Similarly, we invert Equation (19), and solve</p>
<disp-formula id="E23"><label>(23)</label><mml:math id="M72"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:msubsup><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mn>0.430</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mrow><mml:mn>50</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mn>0.302</mml:mn></mml:mrow></mml:msup><mml:mi>ln</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x003B7;</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mn>0.495</mml:mn></mml:mrow></mml:msup><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003B7;</mml:mi><mml:mo>&#x0003C;</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mn>50</mml:mn><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003B7;</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mrow><mml:mn>100</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mn>0.430</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mrow><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mn>50</mml:mn></mml:mrow><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mn>0.302</mml:mn></mml:mrow></mml:msup><mml:mi>ln</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:mi>&#x003B7;</mml:mi><mml:mo>&#x0007C;</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mn>0.495</mml:mn></mml:mrow></mml:msup><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003B7;</mml:mi><mml:mo>&#x0003E;</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>to get &#x02118;<sub><italic>DN</italic></sub>. In both inversions, &#x003B7; denotes the deviation of the distance measure from its mean: &#x003B7; &#x0003D; <italic>d<sub>KS</sub></italic> &#x02212; &#x02329;<italic>d<sub>KS</sub></italic>&#x0232A; in Equation (22), and &#x003B7; &#x0003D; <italic>d<sub>DN</sub></italic> &#x02212; &#x02329;<italic>d<sub>DN</sub></italic>&#x0232A; in Equation (23).</p>
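<p>The piecewise conditions above can be solved for the percentile with simple bisection. The sketch below (function name is ours) uses the constants of Equation (23) and assumes that ln(|&#x003B7;|<italic>N</italic><sup>0.495</sup>) is negative, so that a sign change exists in the relevant half-interval; otherwise it clamps to the boundary percentile.</p>

```python
import math

def percentile_from_eta(eta, N):
    """Solve the piecewise inversion (Equation 23) for the percentile P
    (0..100), given eta = d_DN - <d_DN> and sample size N.
    A bisection sketch, not the authors' reference implementation."""
    if eta == 0.0:
        return 50.0
    L = math.log(abs(eta) * N**0.495)
    if eta < 0.0:                      # root lies in (0, 50)
        if L >= 0.0:                   # no sign change: clamp to the extreme
            return 0.0
        f = lambda P: P**0.430 + (50.0 - P)**0.302 * L
        lo, hi = 1e-9, 50.0 - 1e-9
    else:                              # root lies in (50, 100)
        if L >= 0.0:
            return 100.0
        f = lambda P: (100.0 - P)**0.430 + (P - 50.0)**0.302 * L
        lo, hi = 50.0 + 1e-9, 100.0 - 1e-9
    flo = f(lo)
    for _ in range(100):               # bisection; interval shrinks by 2^100
        mid = 0.5 * (lo + hi)
        if f(mid) * flo > 0.0:
            lo, flo = mid, f(mid)
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A dedicated root finder (e.g., Brent's method) would converge faster; bisection is used here only because the bracketing intervals (0, 50) and (50, 100) are known.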
<p>Substituting the empirical KS distance <inline-formula><mml:math id="M73"><mml:mrow><mml:msubsup><mml:mi>d</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>e</mml:mi><mml:mi>m</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> and empirical distribution noise <inline-formula><mml:math id="M74"><mml:mrow><mml:msubsup><mml:mi>d</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>e</mml:mi><mml:mi>m</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> into Equations (22) and (23), we obtain <inline-formula><mml:math id="M75"><mml:mrow><mml:msubsup><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>e</mml:mi><mml:mi>m</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M76"><mml:mrow><mml:msubsup><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>e</mml:mi><mml:mi>m</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>. This provides an alternative way of obtaining the <italic>p</italic>-value without performing the Monte Carlo (re)sampling required by the CSN method, since we have already done so in sections 3 and 4. 
The percentage of simulated <italic>U</italic>(0, 1) samples with <inline-formula><mml:math id="M77"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi><mml:mo>/</mml:mo><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0003E;</mml:mo><mml:msubsup><mml:mi>d</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi><mml:mo>/</mml:mo><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>e</mml:mi><mml:mi>m</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> is <inline-formula><mml:math id="M78"><mml:mrow><mml:mn>100</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi><mml:mo>/</mml:mo><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>e</mml:mi><mml:mi>m</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>. Since <italic>d<sub>KS</sub></italic> and <italic>d<sub>DN</sub></italic> are not independent (Equation 21), we discount the correlation between <italic>d<sub>KS</sub></italic> and <italic>d<sub>DN</sub></italic>, and define the significance level (<italic>p</italic>-value) as</p>
<disp-formula id="E24"><label>(24)</label><mml:math id="M79"><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>N</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mn>100</mml:mn></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>&#x02118;</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mn>100</mml:mn></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mi>e</mml:mi><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mn>0.481</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msqrt><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>to avoid overestimating the significance level.</p>
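<p>Equation (24) is straightforward to evaluate once the two percentiles are known. The helper below (function name is ours) discounts the correlation term from Equation (21):</p>

```python
import math

def combined_p_value(P_ks, P_dn, N):
    """Equation (24): geometric combination of the KS and DN percentiles
    (each in 0..100), discounted by the d_KS-d_DN correlation e / N**0.481
    from Equation (21).  Valid for N large enough that the correlation < 1."""
    rho = math.e / N**0.481
    return math.sqrt((1.0 - P_ks / 100.0) * (1.0 - P_dn / 100.0) * (1.0 - rho))
```

For large <italic>N</italic> the correlation term vanishes and the <italic>p</italic>-value reduces to the geometric mean of the two one-sided tail fractions.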
</sec>
<sec>
<title>5.2. Fitting to empirical data</title>
<p>We follow the steps outlined in the CSN algorithm (section 2) to fit the empirical data, but with two important modifications: (Ii) the parameters (<bold>CSN(ib)</bold>) and goodness of fit (<bold>CSN(iib)</bold>) are adjusted for the finite largest element; and (Iii) the <italic>p</italic>-value (<bold>CSN(ivc)</bold>) is adjusted for the finite number of elements effect. Meanwhile, optional modifications are (Oi) to incorporate distribution noise as another dimension for goodness of fit, so that the <italic>p</italic>-value can be determined via <inline-formula><mml:math id="M80"><mml:mrow><mml:msubsup><mml:mi>d</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>e</mml:mi><mml:mi>m</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M81"><mml:mrow><mml:msubsup><mml:mi>d</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>e</mml:mi><mml:mi>m</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>, or both; (Oii) instead of using bootstrapping to determine the <italic>p</italic>-value in the CSN method, which is very slow for large samples, one can use the fast inversion formulae Equations (22), (23), or (24).</p>
<p>Figure <xref ref-type="fig" rid="F7">7</xref> shows the fits and <italic>p</italic>-testing results for Taiwan housing prices, Taiwan wealth, and Straits Times Index normalized returns. It is reassuring that the <italic>p</italic>-values of all distributions increased after modification. In particular, the two distributions (Figures <xref ref-type="fig" rid="F7">7B,C</xref>) that did not meet the <italic>p</italic> &#x0003E; 0.1 criterion (as suggested by Clauset et al. [<xref ref-type="bibr" rid="B24">24</xref>]) before modification now have <italic>p</italic> &#x0003E; 0.5. This agrees with our visual assessment of the three fits. We also understand now that a large &#x003B4; (small <italic>x</italic><sub><italic>max</italic></sub>) is the main reason the Taiwan wealth distribution fails <italic>p</italic>-testing before adjustment (although the fit is visually good). In general, our correction formulas perform best when &#x003B4; is large due to small sample sizes or truncations. Readers can refer to Supplementary Information section <xref ref-type="supplementary-material" rid="SM1">4</xref> for more plots and for instances where small &#x003B4; values affect the significance testing.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p><italic>p</italic>-testing results for <bold>(A)</bold> 2012&#x02013;2014 Taiwan home price per square foot (fitted to EXP), <bold>(B)</bold> 2012 Taiwan lower-tail income (fitted to EXP), and <bold>(C)</bold> 2009&#x02013;2016 Straits Times Index normalized return (fitted to PL) before and after finite-sample adjustments. In this figure, <italic>N</italic> represents the number of fitted data points; the empirical CDF adjusted for FLE is shown as black dots, while the unadjusted and adjusted fits are shown as blue and red dashed lines, respectively. <italic>P</italic><sub><italic>KS</italic>/<italic>DN</italic></sub>-values (in percentage) are for unadjusted (blue) and adjusted (red) fits. We separate the <italic>p</italic>-values obtained using the CSN method (left) from those using Equations (22) or (23) (right) by a &#x0201C;/&#x0201D;.</p></caption>
<graphic xlink:href="fams-04-00002-g0007.tif"/>
</fig>
<p>There are several limitations one should note when obtaining <italic>P</italic><sub><italic>KS</italic>/<italic>DN</italic></sub> using Equations (22) or (23). First, they are only applicable to large samples (see Figures <xref ref-type="fig" rid="F4">4</xref>, <xref ref-type="fig" rid="F6">6</xref>). Second, these equations were obtained after experimenting with several functional forms and are only approximate. Lastly, <italic>p<sub>KS</sub></italic> values measured using the CSN method are consistently smaller than those based on Equation (22). This is because the CSN algorithm has an extra step that selects the <italic>x<sub>min</sub></italic> minimizing <italic>d<sub>KS</sub></italic> for each simulated sample, making it stricter than our fast inversion formulae. However, the inversion formulae Equations (22) and (23) are convenient and provide an upper bound for <italic>P</italic><sub><italic>KS</italic>/<italic>DN</italic></sub>. We make the codes for the procedures used in parameter estimation and significance testing, for both methods, available at <ext-link ext-link-type="uri" xlink:href="https://github.com/BoonKinTeh/StatisticalSignificanceTesting">https://github.com/BoonKinTeh/StatisticalSignificanceTesting</ext-link>, and leave it to the reader to decide which method to use.</p>
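<p>For comparison, the bootstrap <italic>p</italic>-value of the CSN method can be sketched as follows for an exponential fit. This is a simplified illustration (function names are ours): it omits the <italic>x<sub>min</sub></italic> scan that makes the full CSN algorithm both stricter and slower.</p>

```python
import numpy as np

def ks_exp(sample, rate):
    """KS distance between a sample and an Exponential(rate) CDF."""
    x = np.sort(sample)
    n = len(x)
    cdf = 1.0 - np.exp(-rate * x)
    k = np.arange(1, n + 1) / n
    return float(np.max(np.maximum(k - cdf, cdf - (k - 1.0 / n))))

def bootstrap_p_exp(data, n_boot=300, seed=0):
    """CSN-style bootstrap p-value for an exponential fit: the fraction of
    synthetic samples (drawn from the fitted model, then refitted) whose KS
    distance exceeds the empirical one.  Simplified: no x_min scan."""
    rng = np.random.default_rng(seed)
    n = len(data)
    lam = 1.0 / np.mean(data)              # MLE for the exponential rate
    d_emp = ks_exp(data, lam)
    exceed = 0
    for _ in range(n_boot):
        synth = rng.exponential(1.0 / lam, n)
        if ks_exp(synth, 1.0 / np.mean(synth)) > d_emp:
            exceed += 1
    return exceed / n_boot
```

Refitting the rate on every synthetic sample is essential (Lilliefors-style); reusing the empirical rate would inflate the <italic>p</italic>-value. Rejecting at <italic>p</italic> &#x02264; 0.1 follows Clauset et al. [24].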
<p>All in all, when we test for statistical significance, we need to be aware of finite sample effects, namely the finite largest element effect and the finite number of elements effect. Beyond the KS distance measured at the cumulative distribution level, we also introduce an alternative measure of the goodness of fit based on the distribution noise at the probability density level.</p>
</sec>
</sec>
<sec id="s6">
<title>Author contributions</title>
<p>BT, DT, and SC: designed research. BT: performed research. BT, DT, and SL: collected data. BT and DT: analyzed data. All authors wrote and reviewed the paper.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ack>
<p>The authors would like to thank Chou Chung-I for directing us to the Taiwanese data sets.</p>
</ack>
<sec sec-type="supplementary-material" id="s8">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fams.2018.00002/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fams.2018.00002/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Presentation1.PDF" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kac</surname> <given-names>M</given-names></name> <name><surname>Kiefer</surname> <given-names>J</given-names></name> <name><surname>Wolfowitz</surname> <given-names>J</given-names></name></person-group>. <article-title>On tests of normality and other tests of goodness of fit based on distance methods</article-title>. <source>Ann Math Stat.</source> (<year>1955</year>) <volume>26</volume>:<fpage>189</fpage>&#x02013;<lpage>211</lpage>. <pub-id pub-id-type="doi">10.1214/aoms/1177728538</pub-id></citation>
</ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>D&#x00027;Agostino</surname> <given-names>RB</given-names></name></person-group>. <article-title>Transformation to normality of the null distribution of g1</article-title>. <source>Biometrika</source> (<year>1970</year>) <volume>57</volume>:<fpage>679</fpage>&#x02013;<lpage>81</lpage>.</citation>
</ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jarque</surname> <given-names>CM</given-names></name> <name><surname>Bera</surname> <given-names>AK</given-names></name></person-group>. <article-title>A test for normality of observations and regression residuals</article-title>. <source>Int Stat Rev</source>. (<year>1987</year>) <volume>55</volume>:<fpage>163</fpage>&#x02013;<lpage>72</lpage>.</citation>
</ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shapiro</surname> <given-names>SS</given-names></name> <name><surname>Wilk</surname> <given-names>MB</given-names></name></person-group>. <article-title>An analysis of variance test for normality (complete samples)</article-title>. <source>Biometrika</source> (<year>1965</year>) <volume>52</volume>:<fpage>591</fpage>&#x02013;<lpage>611</lpage>.</citation>
</ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anderson</surname> <given-names>TW</given-names></name> <name><surname>Darling</surname> <given-names>DA</given-names></name></person-group>. <article-title>Asymptotic theory of certain &#x0201C;goodness of fit&#x0201D; criteria based on stochastic processes</article-title>. <source>Ann Math Stat.</source> (<year>1952</year>) <volume>23</volume>:<fpage>193</fpage>&#x02013;<lpage>212</lpage>. <pub-id pub-id-type="doi">10.1214/aoms/1177729437</pub-id></citation>
</ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anderson</surname> <given-names>TW</given-names></name> <name><surname>Darling</surname> <given-names>DA</given-names></name></person-group>. <article-title>A test of goodness of fit</article-title>. <source>J Am Stat Assoc</source>. (<year>1954</year>) <volume>49</volume>:<fpage>765</fpage>&#x02013;<lpage>9</lpage>.</citation>
</ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Massey</surname> <given-names>FJ</given-names> <suffix>Jr</suffix></name></person-group>. <article-title>The Kolmogorov-Smirnov test for goodness of fit</article-title>. <source>J Am Stat Assoc.</source> (<year>1951</year>) <volume>46</volume>:<fpage>68</fpage>&#x02013;<lpage>78</lpage>.</citation>
</ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lilliefors</surname> <given-names>HW</given-names></name></person-group>. <article-title>On the Kolmogorov-Smirnov test for normality with mean and variance unknown</article-title>. <source>J Am Stat Assoc.</source> (<year>1967</year>) <volume>62</volume>:<fpage>399</fpage>&#x02013;<lpage>402</lpage>.</citation>
</ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Razali</surname> <given-names>NM</given-names></name> <name><surname>Wah</surname> <given-names>YB</given-names></name></person-group>. <article-title>Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests</article-title>. <source>J Stat Model Anal</source>. (<year>2011</year>) <volume>2</volume>:<fpage>21</fpage>&#x02013;<lpage>33</lpage>.</citation>
</ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Newman</surname> <given-names>ME</given-names></name></person-group>. <article-title>Power laws, Pareto distributions and Zipf&#x00027;s law</article-title>. <source>Contemp Phys.</source> (<year>2005</year>) <volume>46</volume>:<fpage>323</fpage>&#x02013;<lpage>51</lpage>. <pub-id pub-id-type="doi">10.1080/00107510500052444</pub-id></citation>
</ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mantegna</surname> <given-names>RN</given-names></name> <name><surname>Stanley</surname> <given-names>HE</given-names></name></person-group>. <article-title>Scaling behaviour in the dynamics of an economic index</article-title>. <source>Nature</source> (<year>1995</year>) <volume>376</volume>:<fpage>46</fpage>&#x02013;<lpage>9</lpage>.</citation>
</ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Plerou</surname> <given-names>V</given-names></name> <name><surname>Gopikrishnan</surname> <given-names>P</given-names></name> <name><surname>Amaral</surname> <given-names>LAN</given-names></name> <name><surname>Meyer</surname> <given-names>M</given-names></name> <name><surname>Stanley</surname> <given-names>HE</given-names></name></person-group>. <article-title>Scaling of the distribution of price fluctuations of individual companies</article-title>. <source>Phys Rev E</source> (<year>1999</year>) <volume>60</volume>:<fpage>6519</fpage>. <pub-id pub-id-type="pmid">11970569</pub-id></citation>
</ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gopikrishnan</surname> <given-names>P</given-names></name> <name><surname>Plerou</surname> <given-names>V</given-names></name> <name><surname>Amaral</surname> <given-names>LAN</given-names></name> <name><surname>Meyer</surname> <given-names>M</given-names></name> <name><surname>Stanley</surname> <given-names>HE</given-names></name></person-group>. <article-title>Scaling of the distribution of fluctuations of financial market indices</article-title>. <source>Phys Rev E</source> (<year>1999</year>) <volume>60</volume>:<fpage>5305</fpage>. <pub-id pub-id-type="pmid">11970400</pub-id></citation>
</ref>
<ref id="B14">
<label>14.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Teh</surname> <given-names>BK</given-names></name> <name><surname>Cheong</surname> <given-names>SA</given-names></name></person-group>. <article-title>The Asian correction can be quantitatively forecasted using a statistical model of fusion-fission processes</article-title>. <source>PloS ONE</source> (<year>2016</year>) <volume>11</volume>:<fpage>e0163842</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0163842</pub-id><pub-id pub-id-type="pmid">27706198</pub-id></citation>
</ref>
<ref id="B15">
<label>15.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zipf</surname> <given-names>GK</given-names></name></person-group>. <source>Human Behavior and the Principle of Least Effort</source>. <publisher-loc>Reading, MA</publisher-loc>: <publisher-name>Addison-Weslay</publisher-name> (<year>1949</year>).</citation>
</ref>
<ref id="B16">
<label>16.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cancho</surname> <given-names>RFi</given-names></name> <name><surname>Sol&#x000E9;</surname> <given-names>RV</given-names></name></person-group>. <article-title>The small world of human language</article-title>. <source>Proc R Soc Lond B Biol Sci</source>. (<year>2001</year>) <volume>268</volume>:<fpage>2261</fpage>&#x02013;<lpage>5</lpage>. <pub-id pub-id-type="doi">10.1098/rspb.2001.1800</pub-id></citation>
</ref>
<ref id="B17">
<label>17.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Auerbach</surname> <given-names>F</given-names></name></person-group>. <article-title>Das gesetz der bev&#x000F6;lkerungskonzentration</article-title>. <source>Petermanns Geogr Mitt</source>. (<year>1913</year>) <volume>59</volume>:<fpage>74</fpage>&#x02013;<lpage>6</lpage>.</citation>
</ref>
<ref id="B18">
<label>18.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gabaix</surname> <given-names>X</given-names></name> <name><surname>Ioannides</surname> <given-names>YM</given-names></name></person-group>. <article-title>The evolution of city size distributions</article-title>. <source>Handb Region Urban Econ</source>. (<year>2004</year>) <volume>4</volume>:<fpage>2341</fpage>&#x02013;<lpage>78</lpage>. <pub-id pub-id-type="doi">10.1016/S1574-0080(04)80010-5</pub-id></citation>
</ref>
<ref id="B19">
<label>19.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>MacKay</surname> <given-names>N</given-names></name></person-group>. <article-title>London house prices are power-law distributed</article-title>. <source>arXiv preprint arXiv:1012.3039</source> (<year>2010</year>).</citation>
</ref>
<ref id="B20">
<label>20.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ohnishi</surname> <given-names>T</given-names></name> <name><surname>Mizuno</surname> <given-names>T</given-names></name> <name><surname>Shimizu</surname> <given-names>C</given-names></name> <name><surname>Watanabe</surname> <given-names>T</given-names></name></person-group>. <article-title>Power laws in real estate prices during bubble periods</article-title>. <source>Int J Mod Phys Conf Ser.</source> (<year>2012</year>) <volume>16</volume>:<fpage>61</fpage>&#x02013;<lpage>81</lpage>. <pub-id pub-id-type="doi">10.1142/S2010194512007787</pub-id></citation>
</ref>
<ref id="B21">
<label>21.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tay</surname> <given-names>DJ</given-names></name> <name><surname>Chou</surname> <given-names>CI</given-names></name> <name><surname>Li</surname> <given-names>SP</given-names></name> <name><surname>Tee</surname> <given-names>SY</given-names></name> <name><surname>Cheong</surname> <given-names>SA</given-names></name></person-group>. <article-title>Bubbles are departures from equilibrium housing markets: evidence from Singapore and Taiwan</article-title>. <source>PLoS ONE</source> (<year>2016</year>) <volume>11</volume>:<fpage>e0166004</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0166004</pub-id><pub-id pub-id-type="pmid">27812187</pub-id></citation>
</ref>
<ref id="B22">
<label>22.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mandelbrot</surname> <given-names>B</given-names></name></person-group>. <article-title>The Pareto-Levy law and the distribution of income</article-title>. <source>Int Econ Rev</source>. (<year>1960</year>) <volume>1</volume>:<fpage>79</fpage>&#x02013;<lpage>106</lpage>.</citation>
</ref>
<ref id="B23">
<label>23.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yakovenko</surname> <given-names>VM</given-names></name> <name><surname>Rosser</surname> <given-names>JB</given-names> <suffix>Jr</suffix></name></person-group>. <article-title>Colloquium: statistical mechanics of money, wealth, and income</article-title>. <source>Rev Mod Phys</source>. (<year>2009</year>) <volume>81</volume>:<fpage>1703</fpage>. <pub-id pub-id-type="doi">10.1103/RevModPhys.81.1703</pub-id></citation>
</ref>
<ref id="B24">
<label>24.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clauset</surname> <given-names>A</given-names></name> <name><surname>Shalizi</surname> <given-names>CR</given-names></name> <name><surname>Newman</surname> <given-names>ME</given-names></name></person-group>. <article-title>Power-law distributions in empirical data</article-title>. <source>SIAM Rev</source>. (<year>2009</year>) <volume>51</volume>:<fpage>661</fpage>&#x02013;<lpage>703</lpage>. <pub-id pub-id-type="doi">10.1137/070710111</pub-id></citation>
</ref>
<ref id="B25">
<label>25.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brzezinski</surname> <given-names>M</given-names></name></person-group>. <article-title>Do wealth distributions follow power laws? Evidence from &#x0201C;rich lists&#x0201D;</article-title>. <source>Phys A</source> (<year>2014</year>) <volume>406</volume>:<fpage>155</fpage>&#x02013;<lpage>62</lpage>. <pub-id pub-id-type="doi">10.1016/j.physa.2014.03.052</pub-id></citation>
</ref>
<ref id="B26">
<label>26.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hansen</surname> <given-names>LP</given-names></name> <name><surname>Heaton</surname> <given-names>J</given-names></name> <name><surname>Yaron</surname> <given-names>A</given-names></name></person-group>. <article-title>Finite-sample properties of some alternative GMM estimators</article-title>. <source>J Bus Econ Stat</source>. (<year>1996</year>) <volume>14</volume>:<fpage>262</fpage>&#x02013;<lpage>80</lpage>.</citation>
</ref>
<ref id="B27">
<label>27.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Windmeijer</surname> <given-names>F</given-names></name></person-group>. <article-title>A finite sample correction for the variance of linear efficient two-step GMM estimators</article-title>. <source>J Econom</source>. (<year>2005</year>) <volume>126</volume>:<fpage>25</fpage>&#x02013;<lpage>51</lpage>. <pub-id pub-id-type="doi">10.1016/j.jeconom.2004.02.005</pub-id></citation>
</ref>
<ref id="B28">
<label>28.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fisher</surname> <given-names>RA</given-names></name></person-group>. <article-title>On an absolute criterion for fitting frequency curves</article-title>. <source>Messenger Math</source>. (<year>1912</year>) <volume>41</volume>:<fpage>155</fpage>&#x02013;<lpage>60</lpage>.</citation>
</ref>
<ref id="B29">
<label>29.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kumphon</surname> <given-names>B</given-names></name></person-group>. <article-title>Maximum entropy and maximum likelihood estimation for the three-parameter Kappa distribution</article-title>. <source>Open J Stat</source>. (<year>2012</year>) <volume>2</volume>:<fpage>415</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.4236/ojs.2012.24050</pub-id></citation>
</ref>
<ref id="B30">
<label>30.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hradil</surname> <given-names>Z</given-names></name> <name><surname>&#x00158;eh&#x000E1;&#x0010D;ek</surname> <given-names>J</given-names></name></person-group>. <article-title>Likelihood and entropy for statistical inversion</article-title>. <source>J Phys Conf Ser</source>. (<year>2006</year>) <volume>36</volume>:<fpage>55</fpage>. <pub-id pub-id-type="doi">10.1088/1742-6596/36/1/011</pub-id></citation>
</ref>
<ref id="B31">
<label>31.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Akaike</surname> <given-names>H</given-names></name></person-group>. <article-title>Information theory and an extension of the maximum likelihood principle</article-title>. In: Parzen E, Tanabe K, Kitagawa G, editors. <source>Selected Papers of Hirotugu Akaike.</source> <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer New York</publisher-name> (<year>1998</year>). <fpage>p. 199</fpage>&#x02013;<lpage>213</lpage>.</citation>
</ref>
<ref id="B32">
<label>32.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bates</surname> <given-names>DM</given-names></name> <name><surname>Watts</surname> <given-names>DG</given-names></name></person-group>. <source>Nonlinear Regression Analysis and Its Applications</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Wiley</publisher-name> (<year>1988</year>).</citation>
</ref>
<ref id="B33">
<label>33.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wooldridge</surname> <given-names>JM</given-names></name></person-group>. <article-title>Applications of generalized method of moments estimation</article-title>. <source>J Econ Perspect</source>. (<year>2001</year>) <volume>15</volume>:<fpage>87</fpage>&#x02013;<lpage>100</lpage>. <pub-id pub-id-type="doi">10.1257/jep.15.4.87</pub-id></citation>
</ref>
<ref id="B34">
<label>34.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cameron</surname> <given-names>AC</given-names></name> <name><surname>Windmeijer</surname> <given-names>FAG</given-names></name></person-group>. <article-title>An R-squared measure of goodness of fit for some common nonlinear regression models</article-title>. <source>J Econom</source>. (<year>1997</year>) <volume>77</volume>:<fpage>329</fpage>&#x02013;<lpage>42</lpage>.</citation>
</ref>
<ref id="B35">
<label>35.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Janczura</surname> <given-names>J</given-names></name> <name><surname>Weron</surname> <given-names>R</given-names></name></person-group>. <article-title>Black swans or dragon-kings? A simple test for deviations from the power law</article-title>. <source>Eur Phys J Spec Top</source>. (<year>2012</year>) <volume>205</volume>:<fpage>79</fpage>&#x02013;<lpage>93</lpage>. <pub-id pub-id-type="doi">10.1140/epjst/e2012-01563-9</pub-id></citation>
</ref>
<ref id="B36">
<label>36.</label>
<citation citation-type="web"><person-group person-group-type="author"><collab>American Statistical Association</collab></person-group>. <source>ASA P-Value Statement Viewed &#x0003E; 150,000 Times.</source> American Statistical Association News (<year>2016</year>). (Accessed March 07, 2017). Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.amstat.org/ASA/News/ASA-P-Value-Statement-Viewed-150000-Times.aspx">https://www.amstat.org/ASA/News/ASA-P-Value-Statement-Viewed-150000-Times.aspx</ext-link></citation>
</ref>
<ref id="B37">
<label>37.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wasserstein</surname> <given-names>RL</given-names></name> <name><surname>Lazar</surname> <given-names>NA</given-names></name></person-group>. <article-title>The ASA&#x00027;s statement on p-values: context, process, and purpose</article-title>. <source>Am Stat</source>. (<year>2016</year>) <volume>70</volume>:<fpage>129</fpage>&#x02013;<lpage>33</lpage>. <pub-id pub-id-type="doi">10.1080/00031305.2016.1154108</pub-id></citation>
</ref>
<ref id="B38">
<label>38.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baker</surname> <given-names>M</given-names></name></person-group>. <article-title>Statisticians issue warning over misuse of P values</article-title>. <source>Nature</source>. (<year>2016</year>) <volume>531</volume>:<fpage>151</fpage>. <pub-id pub-id-type="doi">10.1038/nature.2016.19503</pub-id><pub-id pub-id-type="pmid">26961635</pub-id></citation>
</ref>
<ref id="B39">
<label>39.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Pitman</surname> <given-names>EJG</given-names></name></person-group>. <source>Some Basic Theory for Statistical Inference</source>, <volume>Vol. 7</volume>. <publisher-loc>London</publisher-loc>: <publisher-name>Chapman and Hall</publisher-name> (<year>1979</year>).</citation>
</ref>
<ref id="B40">
<label>40.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alstott</surname> <given-names>J</given-names></name> <name><surname>Bullmore</surname> <given-names>E</given-names></name> <name><surname>Plenz</surname> <given-names>D</given-names></name></person-group>. <article-title>powerlaw: a Python package for analysis of heavy-tailed distributions</article-title>. <source>PLoS ONE</source>. (<year>2014</year>) <volume>9</volume>:<fpage>e85777</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0085777</pub-id><pub-id pub-id-type="pmid">24489671</pub-id></citation>
</ref>
<ref id="B41">
<label>41.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>S</given-names></name> <name><surname>Klaus</surname> <given-names>A</given-names></name> <name><surname>Yang</surname> <given-names>H</given-names></name> <name><surname>Plenz</surname> <given-names>D</given-names></name></person-group>. <article-title>Scale-invariant neuronal avalanche dynamics and the cut-off in size distributions</article-title>. <source>PLoS ONE</source>. (<year>2014</year>) <volume>9</volume>:<fpage>e99761</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0099761</pub-id><pub-id pub-id-type="pmid">24927158</pub-id></citation>
</ref>
<ref id="B42">
<label>42.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marshall</surname> <given-names>N</given-names></name> <name><surname>Timme</surname> <given-names>NM</given-names></name> <name><surname>Bennett</surname> <given-names>N</given-names></name> <name><surname>Ripp</surname> <given-names>M</given-names></name> <name><surname>Lautzenhiser</surname> <given-names>E</given-names></name> <name><surname>Beggs</surname> <given-names>JM</given-names></name></person-group>. <article-title>Analysis of power laws, shape collapses, and neural complexity: new techniques and MATLAB support via the NCC toolbox</article-title>. <source>Front Physiol</source>. (<year>2016</year>) <volume>7</volume>:<fpage>250</fpage>. <pub-id pub-id-type="doi">10.3389/fphys.2016.00250</pub-id><pub-id pub-id-type="pmid">27445842</pub-id></citation>
</ref>
</ref-list>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> This research is supported by the Singapore Ministry of Education Academic Research Fund Tier 2 under Grant Number MOE2015-T2-2-012.</p></fn>
</fn-group>
</back>
</article>
