<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Appl. Math. Stat.</journal-id>
<journal-title>Frontiers in Applied Mathematics and Statistics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Appl. Math. Stat.</abbrev-journal-title>
<issn pub-type="epub">2297-4687</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fams.2017.00014</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Applied Mathematics and Statistics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A Deep Learning Approach to Diabetic Blood Glucose Prediction</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Mhaskar</surname> <given-names>Hrushikesh N.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/281015/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Pereverzyev</surname> <given-names>Sergei V.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/288454/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>van der Walt</surname> <given-names>Maria D.</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/306889/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Institute of Mathematical Sciences, Claremont Graduate University</institution> <country>Claremont, CA, United States</country></aff>
<aff id="aff2"><sup>2</sup><institution>Johann Radon Institute</institution> <country>Linz, Austria</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Mathematics, Vanderbilt University</institution> <country>Nashville, TN, United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Dabao Zhang, Purdue University, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Polina Mamoshina, Insilico Medicine, Inc., United States; Ping Ma, University of Georgia, United States</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Maria D. van der Walt <email>maryke.thom&#x00040;gmail.com</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to Mathematics of Computation and Data Science, a section of the journal Frontiers in Applied Mathematics and Statistics</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>14</day>
<month>07</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="collection">
<year>2017</year>
</pub-date>
<volume>3</volume>
<elocation-id>14</elocation-id>
<history>
<date date-type="received">
<day>12</day>
<month>04</month>
<year>2017</year>
</date>
<date date-type="accepted">
<day>28</day>
<month>06</month>
<year>2017</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2017 Mhaskar, Pereverzyev and van der Walt.</copyright-statement>
<copyright-year>2017</copyright-year>
<copyright-holder>Mhaskar, Pereverzyev and van der Walt</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>We consider the question of 30-min prediction of blood glucose levels measured by continuous glucose monitoring devices, using clinical data. While most studies of this nature deal with one patient at a time, we take a certain percentage of patients in the data set as training data, and test on the remainder of the patients; i.e., the machine need not re-calibrate on the new patients in the data set. We demonstrate how deep learning can outperform shallow networks in this example. One novelty is to demonstrate how a parsimonious deep representation can be constructed using domain knowledge.</p>
</abstract>
<kwd-group>
<kwd>deep learning</kwd>
<kwd>deep neural network</kwd>
<kwd>diffusion geometry</kwd>
<kwd>continuous glucose monitoring</kwd>
<kwd>blood glucose prediction</kwd>
</kwd-group>
<contract-num rid="cn001">W911NF-15-1-0385</contract-num>
<contract-num rid="cn002">I 1669-N26</contract-num>
<contract-sponsor id="cn001">Army Research Office<named-content content-type="fundref-id">10.13039/100000183</named-content></contract-sponsor>
<contract-sponsor id="cn002">Austrian Science Fund<named-content content-type="fundref-id">10.13039/501100002428</named-content></contract-sponsor>
<counts>
<fig-count count="5"/>
<table-count count="3"/>
<equation-count count="27"/>
<ref-count count="28"/>
<page-count count="11"/>
<word-count count="6389"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Deep Neural Networks, especially of the convolutional type (DCNNs), have started a revolution in the field of artificial intelligence and machine learning, triggering a large number of commercial ventures and practical applications. There is therefore a great deal of theoretical investigation into when and why deep (hierarchical) networks perform so well compared to shallow ones. For example, Montufar et al. [<xref ref-type="bibr" rid="B1">1</xref>] showed that the number of linear regions that can be synthesized by a deep network with ReLU nonlinearities is much larger than by a shallow network. Examples of specific functions that cannot be represented efficiently by shallow networks have been given very recently by Telgarsky [<xref ref-type="bibr" rid="B2">2</xref>] and Safran and Shamir [<xref ref-type="bibr" rid="B3">3</xref>].</p>
<p>It is argued in Mhaskar and Poggio [<xref ref-type="bibr" rid="B4">4</xref>] that from a function approximation point of view, deep networks are able to overcome the so-called curse of dimensionality if the target function is hierarchical in nature; e.g., a target function of the form</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mn>3</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mn>21</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mn>11</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mn>12</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>3</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>4</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mn>22</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mn>13</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>5</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>6</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mn>14</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>7</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>8</mml:mn></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where each function has a bounded gradient, can be approximated by a deep network comprising <italic>n</italic> units, organized as a binary tree, up to an accuracy <inline-formula><mml:math id="M2"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">O</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. In contrast, a shallow network that cannot take into account this hierarchical structure can yield an accuracy of at most <inline-formula><mml:math id="M3"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">O</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>8</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. In theory, if samples of the constituent functions are known, one can construct the networks explicitly, without using any traditional learning algorithms such as back propagation.</p>
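<p>As a concrete toy illustration of the compositional structure in Equation (1), the following sketch composes bivariate functions in a binary tree over eight inputs. The constituent functions below are arbitrary averaging functions chosen for illustration, not taken from the paper.</p>

```python
# Toy sketch of the compositional (binary-tree) structure in Equation (1).
# All constituent functions are arbitrary illustrative choices.

def h11(x1, x2): return 0.5 * (x1 + x2)
def h12(x3, x4): return 0.5 * (x3 + x4)
def h13(x5, x6): return 0.5 * (x5 + x6)
def h14(x7, x8): return 0.5 * (x7 + x8)
def h21(a, b): return 0.5 * (a + b)
def h22(a, b): return 0.5 * (a + b)
def h3(a, b): return 0.5 * (a + b)
def hl(a): return a  # outermost univariate function; identity here

def deep_target(x1, x2, x3, x4, x5, x6, x7, x8):
    """Evaluate h_l(h_3(h_21(h_11, h_12), h_22(h_13, h_14))) on 8 inputs."""
    return hl(h3(h21(h11(x1, x2), h12(x3, x4)),
                 h22(h13(x5, x6), h14(x7, x8))))

print(deep_target(1, 2, 3, 4, 5, 6, 7, 8))  # averaging tree: 4.5
```

<p>A deep network mirroring this tree needs only bivariate approximations at each node, which is the source of the <inline-formula><mml:math><mml:mrow><mml:mi mathvariant="-tex-caligraphic">O</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>n</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> rate quoted above.</p>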
<p>One way to think of the function in Equation (1) is to think of the inner functions as the features of the data, obtained in a hierarchical manner. While classical machine learning with shallow neural networks requires that the relevant features of the raw data should be selected by using domain knowledge before the learning can start, deep learning algorithms appear to select the right features automatically. However, it is typically not clear how to interpret these features. Indeed, from a mathematical point of view, it is easy to show that a structure such as Equation (1) is not unique, so that the hierarchical features cannot be defined uniquely, except perhaps in some very special examples.</p>
<p>In this paper, we examine how a deep network can be constructed in a parsimonious manner if we do allow domain knowledge to suggest the compositional structure of the target function as well as the values of the constituent functions. We study the problem of predicting, based on the past few readings of a continuous glucose monitoring (CGM) device, both the blood glucose (BG) level and the rate at which it would be changing 30 min after the last reading. From the point of view of diabetes management, a reliable solution to this problem is of great importance. If a patient has some warning that his/her BG will rise or drop in the next half hour, the patient can take certain precautionary measures to prevent this (e.g., administer an insulin injection or take an extra snack) [<xref ref-type="bibr" rid="B5">5</xref>, <xref ref-type="bibr" rid="B6">6</xref>].</p>
<p>Our approach is to first construct three networks based on whether a 5-min prediction, using ideas in Mhaskar et al. [<xref ref-type="bibr" rid="B7">7</xref>], indicates the trend to be in the hypoglycemic (0&#x02013;70 mg/dL), euglycemic (70&#x02013;180 mg/dL), or hyperglycemic (180&#x02013;450 mg/dL) range. We then train a &#x0201C;judge&#x0201D; network to get a final prediction based on the outputs of these three networks. Unlike the tacit assumption in the theory in Mhaskar and Poggio [<xref ref-type="bibr" rid="B4">4</xref>], the readings and the outputs of the three constituent networks are not dense in a Euclidean cube. Therefore, we use diffusion geometry ideas in Ehler et al. [<xref ref-type="bibr" rid="B8">8</xref>] to train the networks in a manner analogous to manifold learning.</p>
<p>From the point of view of BG prediction, a novelty of our paper is the following. Most of the literature on this subject that we are familiar with makes predictions patient by patient; for example, by taking 30% of the data for each patient to make predictions for that patient. In contrast, we consider the entire data for 30% of the patients as training data, and predict the BG level for the remaining 70% of patients in the data set. Thus, our algorithm transfers the knowledge learned from one set of patients to another, although it does require both data sets to work with. From this perspective, the algorithm is similar to the meta-learning approach of Naumova et al. [<xref ref-type="bibr" rid="B5">5</xref>], but in contrast to the latter, it does not require meta-feature selection.</p>
<p>We will explain the problem and the evaluation criterion in Section 2. Some prior work on this problem is reviewed briefly in Section 3. The methodology and algorithm used in this paper are described in Section 4. The results are discussed in Section 5. The mathematical background behind the methods described in Section 4 is summarized in Appendices A.1 and A.2.</p>
</sec>
<sec id="s2">
<title>2. Problem statement and evaluation</title>
<p>We use a clinical data set provided by the DirectNet Central Laboratory [<xref ref-type="bibr" rid="B9">9</xref>], which lists BG levels of different patients taken at 5-min intervals with the CGM device; i.e., for each patient <italic>p</italic> in the patient set <italic>P</italic>, we are given a time series {<italic>s</italic><sub><italic>p</italic></sub>(<italic>t</italic><sub><italic>j</italic></sub>)}, where <italic>s</italic><sub><italic>p</italic></sub>(<italic>t</italic><sub><italic>j</italic></sub>) denotes the BG level at time <italic>t</italic><sub><italic>j</italic></sub>. Our goal is to predict for each <italic>j</italic>, the level <italic>s</italic><sub><italic>p</italic></sub>(<italic>t</italic><sub><italic>j</italic>&#x0002B;<italic>m</italic></sub>), given readings <italic>s</italic><sub><italic>p</italic></sub>(<italic>t</italic><sub><italic>j</italic></sub>), &#x02026;, <italic>s</italic><sub><italic>p</italic></sub>(<italic>t</italic><sub><italic>j</italic>&#x02212;<italic>d</italic>&#x0002B;1</sub>) for appropriate values of <italic>m</italic> and <italic>d</italic>. For a 30-min prediction, <italic>m</italic> &#x0003D; 6, and we took <italic>d</italic> &#x0003D; 7 (a sampling horizon <italic>t</italic><sub><italic>j</italic></sub> &#x02212; <italic>t</italic><sub><italic>j</italic>&#x02212;<italic>d</italic>&#x0002B;1</sub> of 30 min has been suggested as the optimal one for BG prediction in Mhaskar et al. [<xref ref-type="bibr" rid="B7">7</xref>] and Hayes et al. [<xref ref-type="bibr" rid="B10">10</xref>]).</p>
<p>In this problem, numerical accuracy is not the central objective. To quantify the clinical accuracy of the considered predictors, we use the Prediction Error-Grid Analysis (PRED-EGA) [<xref ref-type="bibr" rid="B11">11</xref>], which has been designed especially for glucose predictors and which, together with its predecessors and variations, is by now a standard metric in the blood glucose prediction problem (see for example, [<xref ref-type="bibr" rid="B5">5</xref>, <xref ref-type="bibr" rid="B12">12</xref>&#x02013;<xref ref-type="bibr" rid="B14">14</xref>]). This assessment methodology records reference BG estimates paired with the BG estimates predicted for the same moments, as well as reference BG directions and rates of change paired with the corresponding estimates predicted for the same moments. As a result, the PRED-EGA reports the numbers (in percent) of Accurate (Acc.), Benign (Benign) and Erroneous (Error) predictions in the hypoglycemic, euglycemic and hyperglycemic ranges separately. This stratification is of great importance because the consequences of a prediction error in the hypoglycemic range are very different from those in the euglycemic or the hyperglycemic range.</p>
</sec>
<sec id="s3">
<title>3. Prior work</title>
<p>Given the importance of the problem, many researchers have worked on it in several directions. Most relevant to our work are methods based on a linear predictor and methods based on supervised learning.</p>
<p>The linear predictor method estimates <inline-formula><mml:math id="M4"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> based on the previous <italic>d</italic> readings, and then predicts</p>
<disp-formula id="E2"><mml:math id="M5"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02248;</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:msubsup><mml:mi>s</mml:mi><mml:mi>p</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msubsup><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>Perhaps the most classical of these is the work [<xref ref-type="bibr" rid="B15">15</xref>] by Savitzky and Golay, that proposes an approximation of <inline-formula><mml:math id="M6"><mml:msubsup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> by the derivative of a polynomial of least square fit to the data (<italic>t</italic><sub><italic>k</italic></sub>, <italic>s</italic><sub><italic>p</italic></sub>(<italic>t</italic><sub><italic>k</italic></sub>)), <italic>k</italic> &#x0003D; <italic>j</italic>, &#x02026;, <italic>j</italic> &#x02212; <italic>d</italic> &#x0002B; 1. The degree <italic>n</italic> of the polynomial acts as a regularization parameter. However, in addition to the intrinsic ill&#x02013;conditioning of numerical differentiation, the solution of the least square problem as posed above involves a system of linear equations with the Hilbert matrix of order <italic>n</italic>, which is notoriously ill&#x02013;conditioned. Therefore, it is proposed in Lu et al. [<xref ref-type="bibr" rid="B16">16</xref>] to use Legendre polynomials rather than the monomials as the basis for the space of polynomials of degree <italic>n</italic>. A procedure to choose <italic>n</italic> is given in Lu et al. [<xref ref-type="bibr" rid="B16">16</xref>], together with error bounds in terms of <italic>n</italic> and the estimates on the noise level in the data, which are optimal up to a constant factor for the method in the sense of the oracle inequality. A substantial improvement on this method was proposed in Mhaskar et al. [<xref ref-type="bibr" rid="B7">7</xref>], which we summarize in Appendix A.1. 
As far as we are aware, this is the state of the art in short-term blood glucose prediction using linear prediction technology.</p>
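<p>A minimal sketch of this linear prediction pipeline (a least-squares polynomial fit in a Legendre basis, numerical differentiation at the latest reading, then linear extrapolation) might look as follows. The polynomial degree plays the role of the regularization parameter; the degree and the toy readings below are illustrative choices only.</p>

```python
import numpy as np

def linear_predict(times, values, t_future, degree=2):
    """Fit a least-squares polynomial to the last d readings in a Legendre
    basis, differentiate it at the latest time t_j, and extrapolate
    linearly:  s(t_future) ~ s(t_j) + (t_future - t_j) * s'(t_j).
    degree=2 is an arbitrary illustrative choice of the regularization
    parameter n."""
    t = np.asarray(times, dtype=float)
    s = np.asarray(values, dtype=float)
    # Legendre.fit maps the times into the Legendre domain internally,
    # avoiding the ill-conditioned Hilbert-matrix system of the monomial basis.
    fit = np.polynomial.Legendre.fit(t, s, deg=degree)
    deriv = fit.deriv()
    t_j = t[-1]
    return s[-1] + (t_future - t_j) * deriv(t_j)

# Seven 5-min-apart readings (d = 7), predicting 30 min ahead (m = 6).
times = [0, 5, 10, 15, 20, 25, 30]
values = [100, 102, 104, 106, 108, 110, 112]   # steady rise: 0.4 mg/dL per min
print(linear_predict(times, values, t_future=60))  # 112 + 30 * 0.4 = 124.0
```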
<p>There exist several BG prediction algorithms in the literature that use a supervised learning approach. These can be divided into three main groups.</p>
<p>The first group of methods employs kernel-based regularization techniques to achieve prediction (for example, [<xref ref-type="bibr" rid="B5">5</xref>, <xref ref-type="bibr" rid="B17">17</xref>] and references therein), where Tikhonov regularization is used to find the best least square fit to the data (<italic>t</italic><sub><italic>k</italic></sub>, <italic>s</italic><sub><italic>p</italic></sub>(<italic>t</italic><sub><italic>k</italic></sub>)), <italic>k</italic> &#x0003D; <italic>j</italic>, &#x02026;, <italic>j</italic> &#x02212; <italic>d</italic> &#x0002B; 1, assuming the minimizer belongs to a reproducing kernel Hilbert space (RKHS). Of course, these methods are quite sensitive to the choice of kernel and regularization parameters. Therefore, the authors in Naumova et al. [<xref ref-type="bibr" rid="B5">5</xref>, <xref ref-type="bibr" rid="B17">17</xref>] develop methods to choose both the kernel and regularization parameter adaptively, or through meta-learning (&#x0201C;learning to learn&#x0201D;) approaches.</p>
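<p>A minimal sketch of kernel-based Tikhonov regularization with a Gaussian kernel is given below. The kernel choice and the parameters <monospace>gamma</monospace> and <monospace>lam</monospace> are placeholder assumptions standing in for the adaptive and meta-learning choices developed in Naumova et al.</p>

```python
import numpy as np

def kernel_ridge_fit(X, y, gamma=0.1, lam=1e-3):
    """Tikhonov-regularized least squares in a Gaussian-kernel RKHS:
    solve (K + lam*I) alpha = y, then predict with k(x_new, X) @ alpha.
    gamma and lam are arbitrary illustrative values."""
    X = np.asarray(X, dtype=float)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)                        # Gram matrix
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), np.asarray(y, float))
    def predict(Xnew):
        Xnew = np.asarray(Xnew, dtype=float)
        sq = ((Xnew[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq) @ alpha
    return predict

# Past-readings windows as inputs, 30-min-ahead levels as targets (toy values).
X = [[100, 105, 110], [110, 115, 120], [120, 125, 130]]
y = [130, 140, 150]
predict = kernel_ridge_fit(X, y)
print(predict([[110, 115, 120]]))  # close to 140 for small lam
```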
<p>The second group consists of artificial neural network models (such as [<xref ref-type="bibr" rid="B13">13</xref>, <xref ref-type="bibr" rid="B18">18</xref>]). In Pappada et al. [<xref ref-type="bibr" rid="B13">13</xref>], for example, a feed-forward neural network is designed with eleven neurons in the input layer (corresponding to variables such as CGM data, the rate of change of glucose levels, meal intake and insulin dosage), and nine neurons with hyperbolic tangent transfer function in the hidden layer. The network was trained with the use of data from 17 patients and tested on data from 10 other patients for a 75-min prediction, and evaluated using the classical Clarke Error-Grid Analysis (EGA) [<xref ref-type="bibr" rid="B19">19</xref>], which is a predecessor of the PRED-EGA assessment metric. Classical EGA differs from PRED-EGA in the sense that it only compares absolute BG concentrations with reference BG values (and not rates of change in BG concentrations as well). Although good results are achieved in the EGA grid in that paper, a limitation of the method is the large amount of additional input information necessary to design the model, as described above.</p>
<p>The third group consists of methods that utilize time-series techniques such as autoregressive (AR) models (for example, [<xref ref-type="bibr" rid="B14">14</xref>, <xref ref-type="bibr" rid="B20">20</xref>]). In Reifman et al. [<xref ref-type="bibr" rid="B14">14</xref>], a tenth-order AR model is developed, where the AR coefficients are determined through a regularized least square method. The model is trained patient-by-patient, typically using the first 30% of the patient&#x00027;s BG measurements, for a 30 or 60-min prediction. The method is tested on a time series containing glucose values measured every minute, and evaluation is again done through the classical EGA grid. The authors in Sparacino et al. [<xref ref-type="bibr" rid="B20">20</xref>] develop a first-order AR model, patient-by-patient, with time-varying AR coefficients determined through weighted least squares. Their method is tested on a time series containing glucose values measured every 3 min, and quantified using statistical metrics such as measuring the mean square of the errors. As noted in Naumova et al. [<xref ref-type="bibr" rid="B5">5</xref>], these methods seem to be sensitive to gaps in the input data.</p>
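<p>A regularized least-squares fit of a tenth-order AR model, in the spirit of Reifman et al., can be sketched as follows. The ridge penalty <monospace>lam</monospace> and the synthetic sinusoidal series are illustrative assumptions, not the setup of the cited work.</p>

```python
import numpy as np

def fit_ar(series, order=10, lam=1e-2):
    """Fit AR coefficients by regularized (ridge) least squares:
    minimize ||A c - b||^2 + lam ||c||^2, where each row of A holds
    `order` consecutive past values and b the value that follows them."""
    s = np.asarray(series, dtype=float)
    A = np.array([s[i:i + order] for i in range(len(s) - order)])
    b = s[order:]
    return np.linalg.solve(A.T @ A + lam * np.eye(order), A.T @ b)

def ar_forecast(series, coeffs, steps):
    """Iterate the AR recursion to forecast `steps` values ahead."""
    window = list(series[-len(coeffs):])
    for _ in range(steps):
        window.append(float(np.dot(coeffs, window[-len(coeffs):])))
    return window[len(coeffs):]

# Synthetic glucose-like series sampled every 5 min (toy data).
rng = np.random.default_rng(0)
series = 120 + 20 * np.sin(np.arange(200) / 10) + rng.normal(0, 0.5, 200)
coeffs = fit_ar(series)
print(ar_forecast(series, coeffs, steps=6))  # 30-min-ahead path at 5-min steps
```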
</sec>
<sec id="s4">
<title>4. Methodology in the current paper</title>
<p>Our proposed method is a semi-supervised learning approach that is entirely different from those described above. It is not a classical statistics/optimization based approach; instead, it is based on function approximation on data-defined manifolds, using diffusion polynomials. In this section, we describe our deep learning method, which consists of two layers, in detail.</p>
<p>Given the time series {<italic>s</italic><sub><italic>p</italic></sub>(<italic>t</italic><sub><italic>j</italic></sub>)} of BG levels at time <italic>t</italic><sub><italic>j</italic></sub> for each patient <italic>p</italic> in the patient set <italic>P</italic>, where <italic>t</italic><sub><italic>j</italic></sub> &#x02212; <italic>t</italic><sub><italic>j</italic>&#x02212;1</sub> &#x0003D; 5 min, we start by formatting the data into the form {(<bold>x</bold><sub><italic>j</italic></sub>, <italic>y</italic><sub><italic>j</italic></sub>)}, where</p>
<disp-formula id="E3"><mml:math id="M7"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mi>j</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>d</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mo>&#x022EF;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mi>&#x0211D;</mml:mi><mml:mi>d</mml:mi></mml:msup><mml:mtext>&#x02009;&#x02009;&#x02009;and</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mi>y</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02208;</mml:mo><mml:mi>&#x0211D;</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x02009;for&#x02009;all&#x02009;patients&#x02009;</mml:mtext><mml:mi>p</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>P</mml:mi><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
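<p>The pairing defined above can be sketched in code as follows; the toy series stands in for a patient's CGM readings taken 5 min apart.</p>

```python
import numpy as np

def make_pairs(series, d=7, m=6):
    """Format a patient's CGM time series into (x_j, y_j) pairs:
    x_j holds the last d readings s(t_{j-d+1}), ..., s(t_j), and
    y_j = s(t_{j+m}) is the reading m steps (30 min at a 5-min
    sampling rate) ahead."""
    X, y = [], []
    for j in range(d - 1, len(series) - m):
        X.append(series[j - d + 1:j + 1])
        y.append(series[j + m])
    return np.array(X), np.array(y)

series = list(range(100, 200, 5))   # 20 toy readings, 5 min apart
X, y = make_pairs(series)
print(X.shape, y.shape)  # (8, 7) (8,)
```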
<p>We will use the notation</p>
<disp-formula id="E4"><mml:math id="M8"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">P</mml:mi><mml:mtext>&#x02009;</mml:mtext><mml:mo>:</mml:mo><mml:mtext>&#x0200B;</mml:mtext><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mi>j</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>d</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mo>&#x022EF;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>:</mml:mo><mml:mi>p</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>P</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>We also construct the diffusion matrix from <inline-formula><mml:math id="M9"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:mrow></mml:math></inline-formula>. This is done by normalizing the rows of the weight matrix <inline-formula><mml:math id="M10"><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">W</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B5;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> in Equation (A6), following the approach in Lafon [<xref ref-type="bibr" rid="B21">21</xref>, pp. 33&#x02013;34].</p>
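<p>The row-normalization step can be sketched as follows. Since the weight matrix of Equation (A6) is defined in the appendix, the Gaussian weights and the scale <monospace>eps</monospace> used here are assumptions standing in for it; only the normalization mirrors the construction described in Lafon's thesis.</p>

```python
import numpy as np

def diffusion_matrix(X, eps=1.0):
    """Form a (assumed Gaussian) weight matrix on the data points and
    normalize each row to sum to 1, yielding a Markov (diffusion) matrix."""
    X = np.asarray(X, dtype=float)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    W = np.exp(-sq / eps)                                # weight matrix
    return W / W.sum(axis=1, keepdims=True)              # row-stochastic

X = np.random.default_rng(1).normal(size=(5, 7))   # five 7-dim reading windows
A = diffusion_matrix(X)
print(A.sum(axis=1))   # each row sums to 1
```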
<p>Having defined the input data <inline-formula><mml:math id="M11"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:mrow></mml:math></inline-formula> and the corresponding diffusion matrix, our method proceeds as follows.</p>
<sec>
<title>4.1. First layer: three networks in different clusters</title>
<p>To start our first layer training, we form the training patient set <italic>TP</italic> by randomly selecting (according to a uniform probability distribution) <italic>M</italic>% of the patients in <italic>P</italic>. The training data are now defined to be all the data (<bold>x</bold><sub><italic>j</italic></sub>, <italic>y</italic><sub><italic>j</italic></sub>) corresponding to the patients in <italic>TP</italic>. We will use the notations</p>
<disp-formula id="E5"><mml:math id="M12"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">C</mml:mi><mml:mtext>&#x02009;</mml:mtext><mml:mo>:</mml:mo><mml:mtext>&#x0200B;&#x0200B;</mml:mtext><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mi>j</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>d</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mo>&#x022EF;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>:</mml:mo><mml:mi>p</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>and</p>
<disp-formula id="E6"><mml:math id="M13"><mml:mrow><mml:msup><mml:mi mathvariant="-tex-caligraphic">C</mml:mi><mml:mo>&#x022C6;</mml:mo></mml:msup><mml:mtext>&#x02009;</mml:mtext><mml:mo>:</mml:mo><mml:mtext>&#x0200B;&#x0200B;</mml:mtext><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mi>j</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>d</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mo>&#x022EF;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>:</mml:mo><mml:mi>p</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>Next, we make a short-term prediction <italic>L</italic><sub><bold>x</bold><sub><italic>j</italic></sub></sub>(<italic>t</italic><sub><italic>j</italic>&#x0002B;1</sub>) of the BG level <italic>s</italic><sub><italic>p</italic></sub>(<italic>t</italic><sub><italic>j</italic>&#x0002B;1</sub>) after 5 min, for all the given measurements <inline-formula><mml:math id="M14"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">C</mml:mi></mml:mrow></mml:math></inline-formula>, by applying the linear predictor method (summarized in Section 3 and Appendix A.1). Based on these 5-min predictions, we divide the measurements in <inline-formula><mml:math id="M15"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">C</mml:mi></mml:mrow></mml:math></inline-formula> into three clusters <inline-formula><mml:math id="M16"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">C</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>o</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">C</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="M17"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">C</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>,</p>
<disp-formula id="E7"><mml:math id="M18"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msub><mml:mi mathvariant="-tex-caligraphic">C</mml:mi><mml:mi>o</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mi>j</mml:mi></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi mathvariant="-tex-caligraphic">C</mml:mi><mml:mo>:</mml:mo><mml:mn>0</mml:mn><mml:mo>&#x02264;</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:msub><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02264;</mml:mo><mml:mn>70</mml:mn></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mtext>hypoglycemia</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mi mathvariant="-tex-caligraphic">C</mml:mi><mml:mi>e</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mi>j</mml:mi></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi mathvariant="-tex-caligraphic">C</mml:mi><mml:mo>:</mml:mo><mml:mn>70</mml:mn><mml:mo>&#x0003C;</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:msub><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo 
stretchy='false'>)</mml:mo><mml:mo>&#x02264;</mml:mo><mml:mn>180</mml:mn></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mtext>euglycemia</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mi mathvariant="-tex-caligraphic">C</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mi>j</mml:mi></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi mathvariant="-tex-caligraphic">C</mml:mi><mml:mo>:</mml:mo><mml:mn>180</mml:mn><mml:mo>&#x0003C;</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:msub><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02264;</mml:mo><mml:mn>450</mml:mn></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mtext>&#x02009;&#x02009;&#x02009;</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mtext>hyperglycemia</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>with</p>
<disp-formula id="E8"><mml:math id="M19"><mml:mrow><mml:msubsup><mml:mi mathvariant="-tex-caligraphic">C</mml:mi><mml:mi>&#x02113;</mml:mi><mml:mo>&#x022C6;</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mi>j</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>:</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mi>j</mml:mi></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mi mathvariant="-tex-caligraphic">C</mml:mi><mml:mi>&#x02113;</mml:mi></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x02003;</mml:mtext><mml:mi>&#x02113;</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mi>o</mml:mi><mml:mo>,</mml:mo><mml:mi>e</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>The motivation for this step is to gather more range-specific information from the training set, ensuring more accurate predictions in each BG range; as noted previously, the consequences of prediction error in the separate BG ranges are very different.</p>
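<p>To illustrate the clustering rule above, the following Python sketch assigns training vectors to the clusters according to a 5-min forecast. The first-order extrapolation standing in for the linear predictor of Section 3, and all function names, are hypothetical simplifications for illustration, not the paper's implementation.</p>

```python
import numpy as np

def five_min_forecast(x):
    """Naive stand-in for the linear predictor: extrapolate the last step."""
    return x[-1] + (x[-1] - x[-2])

def cluster_label(forecast):
    """Map a forecast (mg/dL) to the BG range labels o, e, r of Section 4.1."""
    if 0 <= forecast <= 70:
        return "o"   # hypoglycemia
    elif forecast <= 180:
        return "e"   # euglycemia
    elif forecast <= 450:
        return "r"   # hyperglycemia
    raise ValueError("forecast outside the 0-450 mg/dL range")

# Toy input vectors of d = 7 readings taken 5 min apart.
x_falling = np.array([95, 90, 85, 80, 76, 72, 68.0])
x_steady = np.array([100, 102, 105, 107, 110, 112, 115.0])
print(cluster_label(five_min_forecast(x_falling)))  # prints "o"
print(cluster_label(five_min_forecast(x_steady)))   # prints "e"
```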
<p>In the sequel, let <italic>S</italic>(&#x00393;, <bold>x</bold><sub><italic>j</italic></sub>) denote the result of the method of Appendix A.2 [that is, <italic>S</italic>(&#x00393;, <bold>x</bold><sub><italic>j</italic></sub>) is defined by Equation (A5)], used with training data &#x00393; and evaluated at a point <inline-formula><mml:math id="M20"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:mrow></mml:math></inline-formula>. After obtaining the three clusters <inline-formula><mml:math id="M21"><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">C</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022C6;</mml:mo></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">C</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022C6;</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M22"><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">C</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022C6;</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula>, we compute the three predictors</p>
<disp-formula id="E9"><mml:math id="M23"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msub><mml:mi>f</mml:mi><mml:mi>o</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo stretchy='false'>)</mml:mo><mml:mo>:</mml:mo><mml:mtext>&#x0200B;&#x0200B;</mml:mtext><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mi mathvariant="-tex-caligraphic">C</mml:mi><mml:mi>o</mml:mi><mml:mo>&#x022C6;</mml:mo></mml:msubsup><mml:mo>,</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x02003;</mml:mtext><mml:msub><mml:mi>f</mml:mi><mml:mi>e</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:mo>:</mml:mo><mml:mtext>&#x0200B;&#x0200B;</mml:mtext><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mi mathvariant="-tex-caligraphic">C</mml:mi><mml:mi>e</mml:mi><mml:mo>&#x022C6;</mml:mo></mml:msubsup><mml:mo>,</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;and</mml:mtext></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mi>f</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo stretchy='false'>)</mml:mo><mml:mo>:</mml:mo><mml:mtext>&#x0200B;&#x0200B;</mml:mtext><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mi mathvariant="-tex-caligraphic">C</mml:mi><mml:mi>r</mml:mi><mml:mo>&#x022C6;</mml:mo></mml:msubsup><mml:mo>,</mml:mo><mml:mstyle mathvariant='bold' 
mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;for&#x02009;all&#x02009;</mml:mtext><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo>&#x02208;</mml:mo><mml:mi mathvariant="-tex-caligraphic">P</mml:mi><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>as well as the &#x0201C;judge&#x0201D; predictor, based on the entire training set <inline-formula><mml:math id="M24"><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">C</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mo>&#x022C6;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>,</p>
<disp-formula id="E10"><mml:math id="M25"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>J</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:mo>:</mml:mo><mml:mtext>&#x0200B;</mml:mtext><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi mathvariant="-tex-caligraphic">C</mml:mi><mml:mo>&#x022C6;</mml:mo></mml:msup><mml:mo>,</mml:mo><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x02003;</mml:mtext><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo>&#x02208;</mml:mo><mml:mi mathvariant="-tex-caligraphic">P</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>using the summability method of Appendix A.2 [specifically, Equation (A4)]. We remark that, as discussed in Mhaskar [<xref ref-type="bibr" rid="B22">22</xref>], our approximations in Equation (A5) can be computed as classical radial basis function networks, with exactly as many neurons as the number of eigenfunctions used in Equation (A3). (As mentioned in Appendix A.2, this number is determined to ensure that the system in Equation (A4) remains well conditioned.)</p>
<p>The motivation of this step is to decide which one of the three predictions (<italic>f</italic><sub><italic>o</italic></sub>, <italic>f</italic><sub><italic>e</italic></sub> or <italic>f</italic><sub><italic>r</italic></sub>) is the best prediction for each datum <inline-formula><mml:math id="M26"><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:mrow></mml:math></inline-formula>. Since we do not know in advance in which blood glucose range a particular datum will result, we need to use all of the training data for the judge predictor <italic>f</italic><sub><italic>J</italic></sub> to choose the best prediction.</p>
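<p>Since each predictor <italic>S</italic>(&#x00393;, <bold>x</bold>) can be computed as a radial basis function network, a minimal Gaussian RBF regressor gives the flavor of the construction. This is a rough, hypothetical stand-in for the summability operator of Appendix A.2; the kernel width <italic>sigma</italic> and ridge term <italic>lam</italic> are illustrative choices, not the paper's.</p>

```python
import numpy as np

def rbf_fit(X, y, sigma=1.0, lam=1e-6):
    """Solve for RBF coefficients on training inputs X with targets y."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma**2))          # Gaussian kernel matrix
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def rbf_predict(X_train, coef, x, sigma=1.0):
    """Evaluate the fitted RBF network at a single point x."""
    k = np.exp(-((X_train - x) ** 2).sum(-1) / (2 * sigma**2))
    return k @ coef

# Toy fit: three feature vectors and their targets.  In the paper, one such
# network would be trained on each of C*_o, C*_e, C*_r (giving f_o, f_e, f_r)
# and one on all of C* (giving the judge f_J).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
coef = rbf_fit(X, y)
print(float(rbf_predict(X, coef, np.array([0.0, 1.0]))))  # ≈ 3.0
```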
</sec>
<sec>
<title>4.2. Second layer (judge): final output</title>
<p>In the last training layer, a final output is produced based on which <italic>f</italic><sub>&#x02113;</sub>, &#x02113; &#x02208; {<italic>o, e, r</italic>} gives the best placement in the PRED-EGA grid, using <italic>f</italic><sub><italic>J</italic></sub> as the reference value. The PRED-EGA grid is constructed by comparing <italic>f</italic><sub><italic>o</italic></sub> (resp., <italic>f</italic><sub><italic>e</italic></sub> and <italic>f</italic><sub><italic>r</italic></sub>) with the reference value <italic>f</italic><sub><italic>J</italic></sub>; specifically, it involves comparing</p>
<disp-formula id="E11"><mml:math id="M27"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>o</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mtext>resp</mml:mtext><mml:mo>.</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mi>e</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x02009;and&#x02009;</mml:mtext><mml:msub><mml:mi>f</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;with&#x02003;</mml:mtext><mml:msub><mml:mi>f</mml:mi><mml:mi>J</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>as well as the rates of change</p>
<disp-formula id="E12"><label>(2)</label><mml:math id="M28"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mfrac><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>o</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mi>o</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac><mml:mtext>&#x02009;</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mtext>resp</mml:mtext><mml:mo>.</mml:mo><mml:mo>,</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>e</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mi>e</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' 
mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac><mml:mtext>&#x02009;and</mml:mtext></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo 
stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;with&#x02003;</mml:mtext><mml:mfrac><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>J</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mi>J</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mstyle mathvariant='bold' mathsize='normal'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>for all <inline-formula><mml:math id="M29"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:mrow></mml:math></inline-formula>. Based on these comparisons, PRED-EGA classifies <italic>f</italic><sub><italic>o</italic></sub>(<bold>x</bold><sub><italic>j</italic></sub>) (resp., <italic>f</italic><sub><italic>e</italic></sub>(<bold>x</bold><sub><italic>j</italic></sub>) and <italic>f</italic><sub><italic>r</italic></sub>(<bold>x</bold><sub><italic>j</italic></sub>)) as being Accurate, Benign or Erroneous. As our final output <italic>f</italic>(<bold>x</bold><sub><italic>j</italic></sub>) of the target function at <inline-formula><mml:math id="M30"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:mrow></mml:math></inline-formula>, we choose the one among <italic>f</italic><sub>&#x02113;</sub>(<bold>x</bold><sub><italic>j</italic></sub>), &#x02113; &#x02208; {<italic>o, e, r</italic>} with the best classification, and if there is more than one achieving the best grid placement, we output the one among <italic>f</italic><sub>&#x02113;</sub>(<bold>x</bold><sub><italic>j</italic></sub>), &#x02113; &#x02208; {<italic>o, e, r</italic>} that has value closest to <italic>f</italic><sub><italic>J</italic></sub>(<bold>x</bold><sub><italic>j</italic></sub>). 
For the first and last <bold>x</bold><sub><italic>j</italic></sub> for each patient <italic>p</italic> &#x02208; <italic>P</italic> (for which the rate of change Equation (2) cannot be computed), we use the one among <italic>f</italic><sub>&#x02113;</sub>(<bold>x</bold><sub><italic>j</italic></sub>), &#x02113; &#x02208; {<italic>o, e, r</italic>} that has value closest to <italic>f</italic><sub><italic>J</italic></sub>(<bold>x</bold><sub><italic>j</italic></sub>) as the final output.</p>
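<p>A minimal sketch of this selection rule follows, assuming a drastically simplified stand-in for the PRED-EGA placement: here a candidate is ranked against the judge value only by relative error, whereas the real grid also uses the rate-of-change comparisons of Equation (2). Both helper functions are hypothetical.</p>

```python
def placement(candidate, reference):
    """Crude stand-in for a PRED-EGA placement:
    0 = Accurate, 1 = Benign, 2 = Erroneous (by relative error only)."""
    rel = abs(candidate - reference) / max(abs(reference), 1e-9)
    return 0 if rel <= 0.2 else (1 if rel <= 0.4 else 2)

def final_output(f_vals, f_judge):
    """Pick among [f_o(x), f_e(x), f_r(x)] the value with the best placement,
    breaking ties by closeness to the judge prediction f_J(x)."""
    return min(f_vals, key=lambda v: (placement(v, f_judge), abs(v - f_judge)))

# Judge predicts 118 mg/dL; the euglycemic-range network agrees best.
print(final_output([65.0, 120.0, 200.0], 118.0))  # prints 120.0
```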
</sec>
<sec>
<title>4.3. Evaluation</title>
<p>Lastly, to evaluate the performance of the final output <inline-formula><mml:math id="M31"><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:mrow><mml:mo>,</mml:mo></mml:math></inline-formula> we use the actual reference values <italic>y</italic><sub><italic>j</italic></sub> to place <italic>f</italic>(<bold>x</bold><sub><italic>j</italic></sub>) in the PRED-EGA grid.</p>
<p>We repeat the process described in Sections 4.1&#x02013;4.3 for a fixed number of trials, after which we report the average of the PRED-EGA grid placements, over all <inline-formula><mml:math id="M32"><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:mrow></mml:math></inline-formula> and over all trials, as the final evaluation.</p>
<p>A summary of the method is given in Algorithm 1. A flowchart diagram of the algorithm is shown in Figure <xref ref-type="fig" rid="F1">1</xref>.</p>
<table-wrap position="float" id="T3">
<label>Algorithm 1</label>
<caption><p>Deep Network for BG prediction</p></caption>
<graphic xlink:href="fams-03-00014-i0001.tif"/>
</table-wrap>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Flowchart diagram: Deep Network for BG prediction.</p></caption>
<graphic xlink:href="fams-03-00014-g0001.tif"/>
</fig>
</sec>
<sec>
<title>4.4. Remarks</title>
<list list-type="order">
<list-item><p>An optional smoothing step may be applied before the first training layer step in Section 4.1 to remove any large spikes in the given time series {<italic>s</italic><sub><italic>p</italic></sub>(<italic>t</italic><sub><italic>j</italic></sub>)}, <italic>p</italic> &#x02208; <italic>P</italic>, that may be caused by patient movement, for example. Following ideas in Sparacino et al. [<xref ref-type="bibr" rid="B20">20</xref>], we may apply flat low-pass filtering (for example, a first-order Butterworth filter). In this case, the evaluation of the final output in Section 4.3 is done by using the original, unsmoothed measurements as reference values.</p></list-item>
<list-item><p>We remark that, as explained in Section 1, the training set may also be implemented in a different way: instead of drawing <italic>M</italic>% of the patients in <italic>P</italic> and using their entire data sets for training, we could construct the training set <inline-formula><mml:math id="M33"><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">C</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mo>&#x022C6;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula> for each patient <italic>p</italic> &#x02208; <italic>P</italic> separately by drawing <italic>M</italic>% of a given patient&#x00027;s data for training, and then construct the networks <italic>f</italic><sub>&#x02113;</sub>, &#x02113; &#x02208; {<italic>o, e, r</italic>} and <italic>f</italic><sub><italic>J</italic></sub> for each patient separately. This is a different problem, studied often in the literature (see for example, [<xref ref-type="bibr" rid="B14">14</xref>, <xref ref-type="bibr" rid="B20">20</xref>, <xref ref-type="bibr" rid="B23">23</xref>]). We do not pursue this approach in the current paper.</p></list-item>
</list>
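<p>As a hedged illustration of the optional smoothing step in remark 1, the sketch below applies a simple first-order low-pass (exponential) filter to damp a spike; the smoothing constant <italic>alpha</italic> is a hypothetical choice, and a first-order Butterworth filter, as in Sparacino et al., would be the variant described above.</p>

```python
import numpy as np

def lowpass(series, alpha=0.3):
    """First-order low-pass filter: each output blends the new reading
    with the previous smoothed value."""
    out = np.empty(len(series), dtype=float)
    out[0] = series[0]
    for n in range(1, len(series)):
        out[n] = alpha * series[n] + (1 - alpha) * out[n - 1]
    return out

bg = np.array([110, 112, 300, 114, 115.0])  # spike from a sensor artifact
print(lowpass(bg).round(1))                  # the 300 mg/dL spike is damped
```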
</sec>
</sec>
<sec id="s5">
<title>5. Results and discussion</title>
<p>As mentioned in Section 2, we apply our method to data provided by the DirecNet Central Laboratory. Time series for 25 patients are considered. This specific data set was designed to study the performance of CGM devices in children with type 1 diabetes; as such, all of the patients are younger than 18 years. Each time series {<italic>s</italic><sub><italic>p</italic></sub>(<italic>t</italic><sub><italic>j</italic></sub>)} contains more than 160 BG measurements, taken at 5-min intervals. Our method is a general-purpose algorithm; these details do not play a significant role beyond affecting the outcome of the experiments.</p>
<p>We provide results obtained by implementing our method in Matlab, as described in Algorithm 1. For our implementation, we employ a sampling horizon <italic>t</italic><sub><italic>j</italic></sub> &#x02212; <italic>t</italic><sub><italic>j</italic>&#x02212;<italic>d</italic>&#x0002B;1</sub> of 30 min (<italic>d</italic> &#x0003D; 7), a prediction horizon <italic>t</italic><sub><italic>j</italic>&#x0002B;<italic>m</italic></sub> &#x02212; <italic>t</italic><sub><italic>j</italic></sub> of 30 min (<italic>m</italic> &#x0003D; 6), and a total of 100 trials (<italic>T</italic> &#x0003D; 100). We provide results for both 30% training data (<italic>M</italic> &#x0003D; 30) and 50% training data (<italic>M</italic> &#x0003D; 50), comparable to the approaches followed in, for example, [<xref ref-type="bibr" rid="B13">13</xref>, <xref ref-type="bibr" rid="B14">14</xref>]. After testing our method on all 25 patients, the average PRED-EGA scores (in percent) are displayed in Table <xref ref-type="table" rid="T1">1</xref>. For 30% training data, the percentage of accurate predictions and predictions with benign consequences is 84.32% in the hypoglycemic range, 97.63% in the euglycemic range, and 82.89% in the hyperglycemic range; for 50% training data, the corresponding figures are 93.21%, 97.68%, and 86.78%.</p>
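<p>For concreteness, the sketch below shows how the sampling horizon (<italic>d</italic> &#x0003D; 7) and prediction horizon (<italic>m</italic> &#x0003D; 6) translate one patient's series into training pairs, assuming 5-min spacing as in the data set; the helper name is ours, not the paper's.</p>

```python
import numpy as np

def make_pairs(s, d=7, m=6):
    """Form (input, target) pairs: each input is d consecutive readings
    (a 30-min sampling horizon) and the target is the reading m steps
    (30 min) ahead."""
    X = np.array([s[j - d + 1 : j + 1] for j in range(d - 1, len(s) - m)])
    y = np.array([s[j + m] for j in range(d - 1, len(s) - m)])
    return X, y

s = np.arange(20, dtype=float)  # toy series of 20 readings
X, y = make_pairs(s)
print(X.shape, y.shape)  # (8, 7) (8,)
```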
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Average PRED-EGA scores (in percent): <italic>M</italic>% of patients used for training.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>Hypoglycemia:</bold><break/> <bold>BG &#x02264; 70 (mg/dL)</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>Euglycemia:</bold><break/> <bold>BG 70 &#x02212; 180 (mg/dL)</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>Hyperglycemia:</bold><break/> <bold>BG &#x0003E; 180 (mg/dL)</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>Acc</bold>.</th>
<th valign="top" align="center"><bold>Benign</bold></th>
<th valign="top" align="center"><bold>Error</bold></th>
<th valign="top" align="center"><bold>Acc</bold>.</th>
<th valign="top" align="center"><bold>Benign</bold></th>
<th valign="top" align="center"><bold>Error</bold></th>
<th valign="top" align="center"><bold>Acc</bold>.</th>
<th valign="top" align="center"><bold>Benign</bold></th>
<th valign="top" align="center"><bold>Error</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="10"><bold>30% training data (<italic>M</italic> &#x0003D; 30):</bold></td>
</tr>
<tr>
<td valign="top" align="left">Deep network</td>
<td valign="top" align="center">79.97</td>
<td valign="top" align="center">4.35</td>
<td valign="top" align="center">15.68</td>
<td valign="top" align="center">81.88</td>
<td valign="top" align="center">15.75</td>
<td valign="top" align="center">2.37</td>
<td valign="top" align="center">62.72</td>
<td valign="top" align="center">20.17</td>
<td valign="top" align="center">17.11</td>
</tr>
<tr>
<td valign="top" align="left">Shallow network</td>
<td valign="top" align="center">52.79</td>
<td valign="top" align="center">2.64</td>
<td valign="top" align="center">44.57</td>
<td valign="top" align="center">80.55</td>
<td valign="top" align="center">14.04</td>
<td valign="top" align="center">5.41</td>
<td valign="top" align="center">59.37</td>
<td valign="top" align="center">22.09</td>
<td valign="top" align="center">18.54</td>
</tr>
<tr>
<td valign="top" align="left">Tikhonov reg.</td>
<td valign="top" align="center">52.34</td>
<td valign="top" align="center">2.10</td>
<td valign="top" align="center">45.56</td>
<td valign="top" align="center">81.25</td>
<td valign="top" align="center">13.68</td>
<td valign="top" align="center">5.07</td>
<td valign="top" align="center">61.33</td>
<td valign="top" align="center">19.69</td>
<td valign="top" align="center">18.98</td>
</tr>
<tr style="border-top: thin solid #000000;">
<td valign="top" align="left" colspan="10"><bold>50% training data (<italic>M</italic> &#x0003D; 50):</bold></td>
</tr>
<tr>
<td valign="top" align="left">Deep network</td>
<td valign="top" align="center">88.72</td>
<td valign="top" align="center">4.49</td>
<td valign="top" align="center">6.79</td>
<td valign="top" align="center">80.32</td>
<td valign="top" align="center">17.36</td>
<td valign="top" align="center">2.32</td>
<td valign="top" align="center">64.88</td>
<td valign="top" align="center">21.90</td>
<td valign="top" align="center">13.22</td>
</tr>
<tr>
<td valign="top" align="left">Shallow network</td>
<td valign="top" align="center">51.84</td>
<td valign="top" align="center">2.47</td>
<td valign="top" align="center">45.69</td>
<td valign="top" align="center">80.94</td>
<td valign="top" align="center">13.77</td>
<td valign="top" align="center">5.29</td>
<td valign="top" align="center">60.41</td>
<td valign="top" align="center">21.58</td>
<td valign="top" align="center">18.01</td>
</tr>
<tr>
<td valign="top" align="left">Tikhonov reg.</td>
<td valign="top" align="center">52.92</td>
<td valign="top" align="center">1.70</td>
<td valign="top" align="center">45.38</td>
<td valign="top" align="center">81.28</td>
<td valign="top" align="center">13.66</td>
<td valign="top" align="center">5.06</td>
<td valign="top" align="center">62.22</td>
<td valign="top" align="center">20.26</td>
<td valign="top" align="center">17.52</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>For comparison, Table <xref ref-type="table" rid="T1">1</xref> also displays the PRED-EGA scores obtained when implementing a shallow (i.e., one-layer) feed-forward network with 20 neurons in Matlab&#x00027;s Neural Network Toolbox, with the same parameters <italic>d, m, M</italic>, and <italic>T</italic> as in our implementation and Matlab&#x00027;s default hyperbolic tangent sigmoid activation function. The motivation for using 20 neurons is the following. As mentioned in Section 4.1, each layer in our deep network can be viewed as a classical neural network with exactly as many neurons as the number of eigenfunctions used in Equations (A9) and (A11). This number is chosen to ensure that the system in Equation (A10) is well conditioned; in our experiments, it turned out to be at most 5. Therefore, to compare our two-layer deep network (whose first layer consists of three separate networks) with a classical shallow neural network, we use 20 neurons. Our deep network clearly performs substantially better than the shallow network: in all three BG ranges, our method produces a lower percentage of erroneous predictions.</p>
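<p>A minimal stand-in for such a shallow baseline can be sketched as follows (in Python rather than Matlab&#x00027;s toolbox; the toy data, initialization, and plain gradient descent training are illustrative assumptions, not the paper&#x00027;s setup):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for the CGM feature vectors
# (illustrative only; the paper's inputs are past BG readings).
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(X.sum(axis=1))

# Shallow network: one hidden layer of 20 tanh neurons and a linear
# output, trained by plain batch gradient descent on the mean squared
# error (Matlab's toolbox uses a different optimizer by default).
n_hidden = 20
W1 = rng.normal(scale=0.5, size=(3, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.5, size=n_hidden)
b2 = 0.0

lr = 0.05
for _ in range(3000):
    H = np.tanh(X @ W1 + b1)                # hidden-layer activations
    err = H @ w2 + b2 - y                   # prediction error per sample
    dH = np.outer(err, w2) * (1.0 - H**2)   # backprop through tanh
    w2 -= lr * (H.T @ err) / len(y)
    b2 -= lr * err.mean()
    W1 -= lr * (X.T @ dH) / len(y)
    b1 -= lr * dH.mean(axis=0)

mse = float(np.mean((np.tanh(X @ W1 + b1) @ w2 + b2 - y) ** 2))
```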
<p>For a further comparison, we also display in Table <xref ref-type="table" rid="T1">1</xref> the PRED-EGA scores obtained when training through supervised learning using standard Tikhonov regularization to find the best least-squares fit to the data; specifically, we implemented the method described in Poggio and Smale [<xref ref-type="bibr" rid="B24">24</xref>, pp. 3&#x02013;4] using a Gaussian kernel with &#x003C3; &#x0003D; 100 and regularization constant &#x003B3; &#x0003D; 0.0001. Again, our method produces superior results, especially in the hypoglycemic range.</p>
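<p>The regularized least-squares scheme of Poggio and Smale can be sketched as follows; the toy data and kernel width here are illustrative assumptions (&#x003C3; is rescaled to the toy inputs, whereas the paper uses &#x003C3; &#x0003D; 100 and &#x003B3; &#x0003D; 0.0001 on raw BG values):</p>

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # K[i, j] = exp(-||A_i - B_j||^2 / (2 * sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def tikhonov_fit(X, y, sigma, gamma):
    # Regularized least squares in an RKHS (Poggio & Smale): solve
    # (K + gamma * m * I) c = y, then predict f(x) = sum_i c_i K(x, X_i).
    m = len(X)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + gamma * m * np.eye(m), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c

# Toy data; sigma is scaled to these inputs, not to raw BG values.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(100, 3))
y = X.sum(axis=1) + 0.01 * rng.normal(size=100)

f = tikhonov_fit(X, y, sigma=0.5, gamma=1e-4)
resid = float(np.mean((f(X) - y) ** 2))
```

<p>The regularization constant &#x003B3; trades data fidelity against smoothness of the fitted function; the linear solve is well posed because the kernel matrix is positive semi-definite.</p>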
<p>Figures <xref ref-type="fig" rid="F2">2</xref>, <xref ref-type="fig" rid="F3">3</xref> display boxplots of the percentage of accurate predictions in each BG range for each method over the 100 trials.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Boxplot for the 100 experiments conducted with no smoothing and 30% training data for each prediction method (D, deep network; S, shallow network; T, Tikhonov regularization). The three graphs show the percentage accurate predictions in the hypoglycemic range (left), euglycemic range (middle), and hyperglycemic range (right).</p></caption>
<graphic xlink:href="fams-03-00014-g0002.tif"/>
</fig>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Boxplot for the 100 experiments conducted with no smoothing and 50% training data for each prediction method (D, deep network; S, shallow network; T, Tikhonov regularization). The three graphs show the percentage accurate predictions in the hypoglycemic range (left), euglycemic range (middle) and hyperglycemic range (right).</p></caption>
<graphic xlink:href="fams-03-00014-g0003.tif"/>
</fig>
<p>We also provide results obtained when applying smoothing through flat low-pass filtering to the given time series, as explained in Section 4.4. For our implementations, we used a first-order Butterworth filter with cutoff frequency 0.8, with the same input parameters as before. The results are given in Table <xref ref-type="table" rid="T2">2</xref>. For 30% training data, the percentage of accurate predictions and predictions with benign consequences is 88.92% in the hypoglycemic range, 97.64% in the euglycemic range, and 77.58% in the hyperglycemic range, while for 50% training data, we have 96.43% in the hypoglycemic range, 97.96% in the euglycemic range, and 85.29% in the hyperglycemic range. Figures <xref ref-type="fig" rid="F4">4</xref>, <xref ref-type="fig" rid="F5">5</xref> display boxplots of the percentage of accurate predictions in each BG range for each method over the 100 trials.</p>
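<p>A first-order Butterworth low-pass step of this kind can be sketched as follows (a hand-rolled bilinear-transform implementation on a synthetic series; the cutoff is taken as a fraction of the Nyquist frequency, and the actual preprocessing pipeline may differ):</p>

```python
import numpy as np

def butter1_lowpass(x, cutoff):
    # First-order Butterworth low-pass via the bilinear transform.
    # `cutoff` is normalized to the Nyquist frequency (0 < cutoff < 1),
    # matching the 0.8 used in the text.
    g = np.tan(np.pi * cutoff / 2.0)   # prewarped analog cutoff
    b0 = g / (1.0 + g)                 # feed-forward coefficients (b1 = b0)
    a1 = (g - 1.0) / (1.0 + g)         # feedback coefficient
    y = np.empty_like(x, dtype=float)
    prev_x, prev_y = x[0], x[0]        # warm start at the first sample
    for i, xi in enumerate(x):
        y[i] = b0 * xi + b0 * prev_x - a1 * prev_y
        prev_x, prev_y = xi, y[i]
    return y

# Smooth a noisy oscillation standing in for a CGM time series
rng = np.random.default_rng(2)
t = np.arange(200)
bg = 120 + 40 * np.sin(2 * np.pi * t / 60) + 5 * rng.normal(size=200)
smoothed = butter1_lowpass(bg, cutoff=0.8)
```

<p>The coefficients satisfy unit gain at DC (2<italic>b</italic><sub>0</sub>/(1 &#x0002B; <italic>a</italic><sub>1</sub>) &#x0003D; 1), so a constant BG level passes through unchanged while high-frequency sensor noise is attenuated.</p>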
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Average PRED-EGA scores (in percent): <italic>M</italic>% of patients used for training with flat low-pass filtering.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>Hypoglycemia:</bold><break/> <bold>BG &#x02264; 70 (mg/dL)</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>Euglycemia:</bold><break/> <bold>BG 70 &#x02212; 180 (mg/dL)</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>Hyperglycemia:</bold><break/> <bold>BG &#x0003E; 180 (mg/dL)</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>Acc</bold>.</th>
<th valign="top" align="center"><bold>Benign</bold></th>
<th valign="top" align="center"><bold>Error</bold></th>
<th valign="top" align="center"><bold>Acc</bold>.</th>
<th valign="top" align="center"><bold>Benign</bold></th>
<th valign="top" align="center"><bold>Error</bold></th>
<th valign="top" align="center"><bold>Acc</bold>.</th>
<th valign="top" align="center"><bold>Benign</bold></th>
<th valign="top" align="center"><bold>Error</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="10"><bold>30% training data (<italic>M</italic> &#x0003D; 30):</bold></td>
</tr>
<tr>
<td valign="top" align="left">Deep network</td>
<td valign="top" align="center">86.41</td>
<td valign="top" align="center">2.51</td>
<td valign="top" align="center">11.08</td>
<td valign="top" align="center">85.05</td>
<td valign="top" align="center">12.59</td>
<td valign="top" align="center">2.36</td>
<td valign="top" align="center">62.24</td>
<td valign="top" align="center">15.34</td>
<td valign="top" align="center">22.42</td>
</tr>
<tr>
<td valign="top" align="left">Shallow network</td>
<td valign="top" align="center">61.10</td>
<td valign="top" align="center">5.21</td>
<td valign="top" align="center">33.69</td>
<td valign="top" align="center">81.96</td>
<td valign="top" align="center">12.77</td>
<td valign="top" align="center">5.27</td>
<td valign="top" align="center">60.01</td>
<td valign="top" align="center">19.62</td>
<td valign="top" align="center">20.37</td>
</tr>
<tr>
<td valign="top" align="left">Tikhonov reg.</td>
<td valign="top" align="center">57.47</td>
<td valign="top" align="center">2.01</td>
<td valign="top" align="center">40.52</td>
<td valign="top" align="center">83.49</td>
<td valign="top" align="center">12.00</td>
<td valign="top" align="center">4.51</td>
<td valign="top" align="center">62.13</td>
<td valign="top" align="center">19.13</td>
<td valign="top" align="center">18.74</td>
</tr>
<tr style="border-top: thin solid #000000;">
<td valign="top" align="left" colspan="10"><bold>50% training data (<italic>M</italic> &#x0003D; 50):</bold></td>
</tr>
<tr>
<td valign="top" align="left">Deep network</td>
<td valign="top" align="center">94.39</td>
<td valign="top" align="center">2.04</td>
<td valign="top" align="center">3.57</td>
<td valign="top" align="center">83.44</td>
<td valign="top" align="center">14.52</td>
<td valign="top" align="center">2.04</td>
<td valign="top" align="center">67.21</td>
<td valign="top" align="center">18.08</td>
<td valign="top" align="center">14.71</td>
</tr>
<tr>
<td valign="top" align="left">Shallow network</td>
<td valign="top" align="center">61.49</td>
<td valign="top" align="center">5.50</td>
<td valign="top" align="center">33.01</td>
<td valign="top" align="center">82.16</td>
<td valign="top" align="center">12.59</td>
<td valign="top" align="center">5.25</td>
<td valign="top" align="center">60.50</td>
<td valign="top" align="center">19.21</td>
<td valign="top" align="center">20.29</td>
</tr>
<tr>
<td valign="top" align="left">Tikhonov reg.</td>
<td valign="top" align="center">59.02</td>
<td valign="top" align="center">1.94</td>
<td valign="top" align="center">39.04</td>
<td valign="top" align="center">83.56</td>
<td valign="top" align="center">11.95</td>
<td valign="top" align="center">4.49</td>
<td valign="top" align="center">62.34</td>
<td valign="top" align="center">19.55</td>
<td valign="top" align="center">18.11</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Boxplot for the 100 experiments conducted with flat low-pass filtering and 30% training data for each prediction method (D, deep network; S, shallow network; T, Tikhonov regularization). The three graphs show the percentage accurate predictions in the hypoglycemic range (left), euglycemic range (middle) and hyperglycemic range (right).</p></caption>
<graphic xlink:href="fams-03-00014-g0004.tif"/>
</fig>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Boxplot for the 100 experiments conducted with flat low-pass filtering and 50% training data for each prediction method (D, deep network; S, shallow network; T, Tikhonov regularization). The three graphs show the percentage accurate predictions in the hypoglycemic range (left), euglycemic range (middle) and hyperglycemic range (right).</p></caption>
<graphic xlink:href="fams-03-00014-g0005.tif"/>
</fig>
<p>In all our experiments, the percentage of erroneous predictions is substantially smaller with deep networks than with the other two methods we tried, the only exception being the hyperglycemic range with 30% training data and flat low-pass filtering. Our method&#x00027;s performance also improves when the amount of training data increases from 30% to 50%: the percentage of erroneous predictions decreases in every BG range in all experiments.</p>
<p>Moreover, from this viewpoint, our deep learning method outperforms the considered competitors, except in the hyperglycemic BG range in Table <xref ref-type="table" rid="T2">2</xref>. A possible explanation is that the DirecNet study was conducted on children (under 18 years of age) with type 1 diabetes over a period of roughly 26 h. Children usually have prolonged hypoglycemic periods as well as profound postprandial hyperglycemia (high blood sugar &#x0201C;spikes&#x0201D; after meals): according to Boland et al. [<xref ref-type="bibr" rid="B25">25</xref>], more than 70% of children display prolonged hypoglycemia, while more than 90% display significant postprandial hyperglycemia. In particular, many patients in the data set exhibit only very limited hyperglycemic BG spikes; for the patient labeled 43, for example, there are 194 BG measurements in total, of which only 5 exceed 180 mg/dL. This imbalance might have affected the performance of our algorithm in the hyperglycemic range. Even so, it performs remarkably better than the other techniques we tried, including fully supervised training, while ours is only semi-supervised.</p>
</sec>
<sec sec-type="conclusions" id="s6">
<title>6. Conclusion</title>
<p>The prediction of blood glucose levels 30 min ahead based on continuous glucose monitoring system data is a very important problem with many consequences for the health care industry. In this paper, we propose a deep learning paradigm, based on a solid mathematical theory as well as domain knowledge, to solve this problem accurately as assessed by the PRED-EGA grid, which was developed specifically for this purpose. It is demonstrated in Mhaskar and Poggio [<xref ref-type="bibr" rid="B4">4</xref>] that deep networks perform substantially better than shallow networks in terms of expressiveness for function approximation when the target function has a compositional structure. Thus, the blessing of compositionality cures the curse of dimensionality. However, the compositional structure is not unique, and it is an open problem to decide whether a given target function has a certain compositional structure. In this paper, we have demonstrated an example where domain knowledge can be used to build an appropriate compositional structure, leading to a parsimonious deep learning design.</p>
</sec>
<sec id="s7">
<title>Ethics statement</title>
<p>The clinical data set used in this publication is publicly available online [<xref ref-type="bibr" rid="B9">9</xref>]. The data were collected during the study that was carried out in accordance with the recommendations of DirecNet, Jaeb Center for Health Research, with written informed consent from all 878 subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by DirecNet, Jaeb Center for Health Research.</p>
</sec>
<sec id="s8">
<title>Author contributions</title>
<p>All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ack><p>The research of HM is supported in part by ARO Grant W911NF-15-1-0385. The research of SP is partially supported by Austrian Science Fund (FWF) Grant I 1669-N26.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Montufar</surname> <given-names>GF</given-names></name> <name><surname>Pascanu</surname> <given-names>R</given-names></name> <name><surname>Cho</surname> <given-names>K</given-names></name> <name><surname>Bengio</surname> <given-names>Y</given-names></name></person-group>. <article-title>On the number of linear regions of deep neural networks</article-title>. In: <source>Advances in Neural Information Processing Systems</source>. <publisher-loc>Red Hook, NY</publisher-loc>: <publisher-name>Curran Associates, Inc</publisher-name>. (<year>2014</year>) p. <fpage>2924</fpage>&#x02013;<lpage>32</lpage>.</citation></ref>
<ref id="B2">
<label>2</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Telgarsky</surname> <given-names>M</given-names></name></person-group>. <article-title>Representation benefits of deep feedforward networks</article-title>. <source>arXiv preprint</source> (<year>2015</year>) arXiv:150908101</citation></ref>
<ref id="B3">
<label>3</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Safran</surname> <given-names>I</given-names></name> <name><surname>Shamir</surname> <given-names>O</given-names></name></person-group>. <article-title>Depth separation in relu networks for approximating smooth non-linear functions</article-title>. <source>arXiv preprint</source> (<year>2016</year>) arXiv:161009887</citation></ref>
<ref id="B4">
<label>4</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mhaskar</surname> <given-names>HN</given-names></name> <name><surname>Poggio</surname> <given-names>T</given-names></name></person-group>. <article-title>Deep vs. shallow networks: an approximation theory perspective</article-title>. <source>Anal Appl.</source> (<year>2016</year>) <volume>14</volume>:<fpage>829</fpage>&#x02013;<lpage>48</lpage>. <pub-id pub-id-type="doi">10.1142/S0219530516400042</pub-id></citation></ref>
<ref id="B5">
<label>5</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Naumova</surname> <given-names>V</given-names></name> <name><surname>Pereverzyev</surname> <given-names>SV</given-names></name> <name><surname>Sivananthan</surname> <given-names>S</given-names></name></person-group>. <article-title>A meta-learning approach to the regularized learning - Case study: blood glucose prediction</article-title>. <source>Neural Netw.</source> (<year>2012</year>) <volume>33</volume>:<fpage>181</fpage>&#x02013;<lpage>93</lpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2012.05.004</pub-id><pub-id pub-id-type="pmid">22706092</pub-id></citation></ref>
<ref id="B6">
<label>6</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Snetselaar</surname> <given-names>L</given-names></name></person-group>. <source>Nutrition Counseling Skills for the Nutrition Care Process</source>. <publisher-loc>Sudbury, MA</publisher-loc>: <publisher-name>Jones &#x00026; Bartlett Learning</publisher-name> (<year>2009</year>).</citation></ref>
<ref id="B7">
<label>7</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mhaskar</surname> <given-names>HN</given-names></name> <name><surname>Naumova</surname> <given-names>V</given-names></name> <name><surname>Pereverzyev</surname> <given-names>SV</given-names></name></person-group>. <article-title>Filtered Legendre expansion method for numerical differentiation at the boundary point with application to blood glucose predictions</article-title>. <source>Appl Math Comput.</source> (<year>2013</year>) <volume>224</volume>:<fpage>835</fpage>&#x02013;<lpage>47</lpage>. <pub-id pub-id-type="doi">10.1016/j.amc.2013.09.015</pub-id></citation></ref>
<ref id="B8">
<label>8</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ehler</surname> <given-names>M</given-names></name> <name><surname>Filbir</surname> <given-names>F</given-names></name> <name><surname>Mhaskar</surname> <given-names>HN</given-names></name></person-group>. <article-title>Locally learning biomedical data using diffusion frames</article-title>. <source>J Comput Biol</source>. (<year>2012</year>) <volume>19</volume>:<fpage>1251</fpage>&#x02013;<lpage>64</lpage>. <pub-id pub-id-type="doi">10.1089/cmb.2012.0187</pub-id><pub-id pub-id-type="pmid">23101786</pub-id></citation></ref>
<ref id="B9">
<label>9</label>
<citation citation-type="web"><person-group person-group-type="author"><collab>DirecNet Central Laboratory</collab></person-group>. (<year>2005</year>). Available online at: <ext-link ext-link-type="uri" xlink:href="http://direcnet.jaeb.org/Studies.aspx">http://direcnet.jaeb.org/Studies.aspx</ext-link></citation></ref>
<ref id="B10">
<label>10</label>
<citation citation-type="patent"><person-group person-group-type="author"><name><surname>Hayes</surname> <given-names>AC</given-names></name> <name><surname>Mastrototaro</surname> <given-names>JJ</given-names></name> <name><surname>Moberg</surname> <given-names>SB</given-names></name> <name><surname>Mueller</surname> <given-names>JC</given-names> <suffix>Jr</suffix></name> <name><surname>Clark</surname> <given-names>HB</given-names></name> <name><surname>Tolle</surname> <given-names>MCV</given-names></name> <etal/></person-group>. <article-title>Algorithm sensor augmented bolus estimator for semi-closed loop infusion system</article-title>. Google Patents (<year>2009</year>). <patent>US Patent 7,547,281</patent>.</citation></ref>
<ref id="B11">
<label>11</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sivananthan</surname> <given-names>S</given-names></name> <name><surname>Naumova</surname> <given-names>V</given-names></name> <name><surname>Man</surname> <given-names>CD</given-names></name> <name><surname>Facchinetti</surname> <given-names>A</given-names></name> <name><surname>Renard</surname> <given-names>E</given-names></name> <name><surname>Cobelli</surname> <given-names>C</given-names></name> <etal/></person-group>. <article-title>Assessment of blood glucose predictors: the prediction-error grid analysis</article-title>. <source>Diabet Technol Ther.</source> (<year>2011</year>) <volume>13</volume>:<fpage>787</fpage>&#x02013;<lpage>96</lpage>. <pub-id pub-id-type="doi">10.1089/dia.2011.0033</pub-id><pub-id pub-id-type="pmid">21612393</pub-id></citation></ref>
<ref id="B12">
<label>12</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Naumova</surname> <given-names>V</given-names></name> <name><surname>Pereverzyev</surname> <given-names>SV</given-names></name> <name><surname>Sivananthan</surname> <given-names>S</given-names></name></person-group>. <article-title>Adaptive parameter choice for one-sided finite difference schemes and its application in diabetes technology</article-title>. <source>J Complex</source>. (<year>2012</year>) <volume>28</volume>:<fpage>524</fpage>&#x02013;<lpage>38</lpage>. <pub-id pub-id-type="doi">10.1016/j.jco.2012.06.001</pub-id></citation></ref>
<ref id="B13">
<label>13</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pappada</surname> <given-names>SM</given-names></name> <name><surname>Cameron</surname> <given-names>BD</given-names></name> <name><surname>Rosman</surname> <given-names>PM</given-names></name> <name><surname>Bourey</surname> <given-names>RE</given-names></name> <name><surname>Papadimos</surname> <given-names>TJ</given-names></name> <name><surname>Olorunto</surname> <given-names>W</given-names></name> <etal/></person-group>. <article-title>Neural network-based real-time prediction of glucose in patients with insulin-dependent diabetes</article-title>. <source>Diabet Technol Ther.</source> (<year>2011</year>) <volume>13</volume>:<fpage>135</fpage>&#x02013;<lpage>41</lpage>. <pub-id pub-id-type="doi">10.1089/dia.2010.0104</pub-id><pub-id pub-id-type="pmid">21284480</pub-id></citation></ref>
<ref id="B14">
<label>14</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Reifman</surname> <given-names>J</given-names></name> <name><surname>Rajaraman</surname> <given-names>S</given-names></name> <name><surname>Gribok</surname> <given-names>A</given-names></name> <name><surname>Ward</surname> <given-names>WK</given-names></name></person-group>. <article-title>Predictive monitoring for improved management of glucose levels</article-title>. <source>J Diabet Sci Technol</source>. (<year>2007</year>) <volume>1</volume>:<fpage>478</fpage>&#x02013;<lpage>86</lpage>. <pub-id pub-id-type="doi">10.1177/193229680700100405</pub-id><pub-id pub-id-type="pmid">19885110</pub-id></citation></ref>
<ref id="B15">
<label>15</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Savitzky</surname> <given-names>A</given-names></name> <name><surname>Golay</surname> <given-names>MJE</given-names></name></person-group>. <article-title>Smoothing and differentiation of data by simplified least squares procedures</article-title>. <source>Anal Chem</source>. (<year>1964</year>) <volume>36</volume>:<fpage>1627</fpage>&#x02013;<lpage>39</lpage>. <pub-id pub-id-type="doi">10.1021/ac60214a047</pub-id></citation></ref>
<ref id="B16">
<label>16</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lu</surname> <given-names>S</given-names></name> <name><surname>Naumova</surname> <given-names>V</given-names></name> <name><surname>Pereverzev</surname> <given-names>SV</given-names></name></person-group>. <article-title>Legendre polynomials as a recommended basis for numerical differentiation in the presence of stochastic white noise</article-title>. <source>J Inverse ILL Posed Prob</source>. (<year>2013</year>) <volume>21</volume>:<fpage>193</fpage>&#x02013;<lpage>216</lpage>. <pub-id pub-id-type="doi">10.1515/jip-2012-0050</pub-id></citation></ref>
<ref id="B17">
<label>17</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Naumova</surname> <given-names>V</given-names></name> <name><surname>Pereverzyev</surname> <given-names>SV</given-names></name> <name><surname>Sivananthan</surname> <given-names>S</given-names></name></person-group>. <article-title>Extrapolation in variable RKHSs with application to the blood glucose reading</article-title>. <source>Inverse Prob</source>. (<year>2011</year>) <volume>27</volume>:075010. <pub-id pub-id-type="doi">10.1088/0266-5611/27/7/075010</pub-id></citation></ref>
<ref id="B18">
<label>18</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pappada</surname> <given-names>SM</given-names></name> <name><surname>Cameron</surname> <given-names>BD</given-names></name> <name><surname>Rosman</surname> <given-names>PM</given-names></name></person-group>. <article-title>Development of a neural network for prediction of glucose concentration in type 1 diabetes patients</article-title>. <source>J Diabet Sci Technol</source>. (<year>2008</year>) <volume>2</volume>:<fpage>792</fpage>&#x02013;<lpage>801</lpage>. <pub-id pub-id-type="doi">10.1177/193229680800200507</pub-id><pub-id pub-id-type="pmid">19885262</pub-id></citation></ref>
<ref id="B19">
<label>19</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clarke</surname> <given-names>WL</given-names></name> <name><surname>Cox</surname> <given-names>D</given-names></name> <name><surname>Gonder-Frederick</surname> <given-names>LA</given-names></name> <name><surname>Carter</surname> <given-names>W</given-names></name> <name><surname>Pohl</surname> <given-names>SL</given-names></name></person-group>. <article-title>Evaluating clinical accuracy of systems for self-monitoring of blood glucose</article-title>. <source>Diabetes Care</source> (<year>1987</year>) <volume>10</volume>:<fpage>622</fpage>&#x02013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.2337/diacare.10.5.622</pub-id><pub-id pub-id-type="pmid">3677983</pub-id></citation></ref>
<ref id="B20">
<label>20</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sparacino</surname> <given-names>G</given-names></name> <name><surname>Zanderigo</surname> <given-names>F</given-names></name> <name><surname>Corazza</surname> <given-names>S</given-names></name> <name><surname>Maran</surname> <given-names>A</given-names></name> <name><surname>Facchinetti</surname> <given-names>A</given-names></name> <name><surname>Cobelli</surname> <given-names>C</given-names></name></person-group>. <article-title>Glucose concentration can be predicted ahead in time from continuous glucose monitoring sensor time-series</article-title>. <source>IEEE Trans Biomed Eng</source>. (<year>2007</year>) <volume>54</volume>:<fpage>931</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1109/TBME.2006.889774</pub-id><pub-id pub-id-type="pmid">17518291</pub-id></citation></ref>
<ref id="B21">
<label>21</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lafon</surname> <given-names>S</given-names></name></person-group>. <source>Diffusion Maps and Geometric Harmonics</source>. <publisher-loc>New Haven, CT</publisher-loc>: <publisher-name>Yale University</publisher-name> (<year>2004</year>).</citation></ref>
<ref id="B22">
<label>22</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mhaskar</surname> <given-names>HN</given-names></name></person-group>. <article-title>Eignets for function approximation on manifolds</article-title>. <source>Appl Comput Harmon Anal</source>. (<year>2010</year>) <volume>29</volume>:<fpage>63</fpage>&#x02013;<lpage>87</lpage>. <pub-id pub-id-type="doi">10.1016/j.acha.2009.08.006</pub-id></citation></ref>
<ref id="B23">
<label>23</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eren-Oruklu</surname> <given-names>M</given-names></name> <name><surname>Cinar</surname> <given-names>A</given-names></name> <name><surname>Quinn</surname> <given-names>L</given-names></name> <name><surname>Smith</surname> <given-names>D</given-names></name></person-group>. <article-title>Estimation of future glucose concentrations with subject-specific recursive linear models</article-title>. <source>Diabet Technol Ther</source>. (<year>2009</year>) <volume>11</volume>:<fpage>243</fpage>&#x02013;<lpage>53</lpage>. <pub-id pub-id-type="doi">10.1089/dia.2008.0065</pub-id><pub-id pub-id-type="pmid">19344199</pub-id></citation></ref>
<ref id="B24">
<label>24</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Poggio</surname> <given-names>T</given-names></name> <name><surname>Smale</surname> <given-names>S</given-names></name></person-group>. <article-title>The mathematics of learning: dealing with data</article-title>. <source>Notices AMS</source> (<year>2003</year>) <volume>50</volume>:<fpage>537</fpage>&#x02013;<lpage>44</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.ams.org/notices/200305/fea-smale.pdf">http://www.ams.org/notices/200305/fea-smale.pdf</ext-link></citation></ref>
<ref id="B25">
<label>25</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Boland</surname> <given-names>E</given-names></name> <name><surname>Monsod</surname> <given-names>T</given-names></name> <name><surname>Delucia</surname> <given-names>M</given-names></name> <name><surname>Brandt</surname> <given-names>CA</given-names></name> <name><surname>Fernando</surname> <given-names>S</given-names></name> <name><surname>Tamborlane</surname> <given-names>WV</given-names></name></person-group>. <article-title>Limitations of conventional methods of self-monitoring of blood glucose lessons learned from 3 days of continuous glucose sensing in pediatric patients with type 1 diabetes</article-title>. <source>Diabetes Care</source> (<year>2001</year>) <volume>24</volume>:<fpage>1858</fpage>&#x02013;<lpage>62</lpage>. <pub-id pub-id-type="doi">10.2337/diacare.24.11.1858</pub-id><pub-id pub-id-type="pmid">11679447</pub-id></citation></ref>
<ref id="B26">
<label>26</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maggioni</surname> <given-names>M</given-names></name> <name><surname>Mhaskar</surname> <given-names>HN</given-names></name></person-group>. <article-title>Diffusion polynomial frames on metric measure spaces</article-title>. <source>Appl Comput Harmon Anal</source>. (<year>2008</year>) <volume>24</volume>:<fpage>329</fpage>&#x02013;<lpage>53</lpage>. <pub-id pub-id-type="doi">10.1016/j.acha.2007.07.001</pub-id></citation></ref>
<ref id="B27">
<label>27</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Belkin</surname> <given-names>M</given-names></name> <name><surname>Niyogi</surname> <given-names>P</given-names></name></person-group>. <article-title>Towards a theoretical foundation for Laplacian-based manifold methods</article-title>. In: <source>International Conference on Computational Learning Theory</source>. <publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>2005</year>). p. <fpage>486</fpage>&#x02013;<lpage>500</lpage>. <pub-id pub-id-type="doi">10.1007/11503415_33</pub-id></citation></ref>
<ref id="B28">
<label>28</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Singer</surname> <given-names>A</given-names></name></person-group>. <article-title>From graph to manifold Laplacian: the convergence rate</article-title>. <source>Appl Comput Harmon Anal.</source> (<year>2006</year>) <volume>21</volume>:<fpage>128</fpage>&#x02013;<lpage>34</lpage>. <pub-id pub-id-type="doi">10.1016/j.acha.2006.03.004</pub-id></citation></ref>
</ref-list>
<app-group>
<app id="A1">
<title>A. Appendix</title>
<sec>
<title>A.1. Filtered Legendre expansion method</title>
<p>In this appendix, we review the mathematical background for the method developed in Mhaskar et al. [<xref ref-type="bibr" rid="B7">7</xref>] for short-term blood glucose prediction. As explained in the text, the main mathematical problem can be summarized as that of estimating the derivative of a function at the endpoint of an interval, based on measurements of the function in the past. Mathematically, if <italic>f</italic> : [&#x02212;1, 1] &#x02192; &#x0211D; is continuously differentiable, we wish to estimate <italic>f</italic>&#x02032;(1) given the noisy values {<italic>y</italic><sub><italic>j</italic></sub> &#x0003D; <italic>f</italic>(<italic>t</italic><sub><italic>j</italic></sub>) &#x0002B; &#x003F5;<sub><italic>j</italic></sub>} at points <inline-formula><mml:math id="M34"><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x02282;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>. We summarize only the method here and refer the reader to Mhaskar et al. [<xref ref-type="bibr" rid="B7">7</xref>] for the detailed proof of the mathematical facts.</p>
<p>For this purpose, we define the Legendre polynomials recursively by</p>
<disp-formula id="E13"><mml:math id="M35"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>k</mml:mi></mml:mfrac><mml:mi>x</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>k</mml:mi></mml:mfrac><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;&#x02009;</mml:mtext><mml:msub><mml:mi>P</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:msub><mml:mi>P</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mi>x</mml:mi><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
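<p>For illustration, the recurrence can be evaluated directly in code; the following is a hypothetical sketch, not part of the implementation used in the paper.</p>

```python
def legendre(k, x):
    """Evaluate the Legendre polynomial P_k(x) via the three-term recurrence
    P_k(x) = ((2k - 1)/k) x P_{k-1}(x) - ((k - 1)/k) P_{k-2}(x)."""
    if k == 0:
        return 1.0
    p_prev, p = 1.0, x  # P_0(x), P_1(x)
    for j in range(2, k + 1):
        p_prev, p = p, (2 * j - 1) / j * x * p - (j - 1) / j * p_prev
    return p
```

<p>For example, <monospace>legendre(2, 0.5)</monospace> returns <italic>P</italic><sub>2</sub>(0.5) = (3 &#x000B7; 0.25 &#x02212; 1)/2 = &#x02212;0.125.</p>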
<p>Let <italic>h</italic>:&#x0211D; &#x02192; &#x0211D; be an even, infinitely differentiable function, such that <italic>h</italic>(<italic>t</italic>) &#x0003D; 1 if <italic>t</italic> &#x02208; [0, 1/2] and <italic>h</italic>(<italic>t</italic>) &#x0003D; 0 if <italic>t</italic> &#x02265; 1. We define</p>
<disp-formula id="E14"><mml:math id="M36"><mml:mrow><mml:msub><mml:mi>K</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>h</mml:mi><mml:mo>;</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mtext>&#x0200A;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x0200A;</mml:mtext><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mtext>&#x0200A;</mml:mtext><mml:mo>&#x02212;</mml:mo><mml:mtext>&#x0200A;</mml:mtext><mml:mn>1</mml:mn></mml:mrow></mml:munderover><mml:mi>h</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mi>k</mml:mi><mml:mi>n</mml:mi></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>k</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
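<p>A concrete choice of the filter <italic>h</italic> and a direct evaluation of <italic>K<sub>n</sub></italic> might look as follows. The smooth transition on (1/2, 1) below is one standard C<sup>&#x0221E;</sup> construction, chosen purely for illustration; the method does not prescribe a particular <italic>h</italic>.</p>

```python
import math

def h(t):
    """Even, infinitely differentiable filter: h(t) = 1 on [0, 1/2], h(t) = 0 for t >= 1."""
    t = abs(t)
    if t <= 0.5:
        return 1.0
    if t >= 1.0:
        return 0.0
    u = 2.0 * (t - 0.5)  # map the transition interval (1/2, 1) onto (0, 1)
    a = math.exp(-1.0 / u)
    b = math.exp(-1.0 / (1.0 - u))
    return b / (a + b)   # classical smooth step, decreasing from 1 to 0

def legendre(k, x):
    """P_k(x) by the standard three-term recurrence."""
    if k == 0:
        return 1.0
    p_prev, p = 1.0, x
    for j in range(2, k + 1):
        p_prev, p = p, (2 * j - 1) / j * x * p - (j - 1) / j * p_prev
    return p

def kernel(n, x):
    """K_n(h; x) = (1/2) * sum_{k=0}^{n-1} h(k/n) k (k + 1/2) (k + 1) P_k(x)."""
    return 0.5 * sum(h(k / n) * k * (k + 0.5) * (k + 1) * legendre(k, x)
                     for k in range(n))
```

<p>For <italic>n</italic> = 2 only the <italic>k</italic> = 1 term survives, so <monospace>kernel(2, x)</monospace> reduces to 1.5<italic>x</italic>.</p>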
<p>Next, given the points <inline-formula><mml:math id="M37"><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x02282;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, we use least squares to find <italic>n</italic> &#x0003D; <italic>n</italic><sub><italic>d</italic></sub> such that</p>
<disp-formula id="E15"><label>(A1)</label><mml:math id="M38"><mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mtext>&#x0200A;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x0200A;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mi>d</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mstyle><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:mrow><mml:msubsup><mml:mo>&#x0222B;</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mn>1</mml:mn></mml:msubsup><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>d</mml:mi><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>if&#x02009;</mml:mtext><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>if&#x02009;</mml:mtext><mml:mn>1</mml:mn><mml:mo>&#x02264;</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x0003C;</mml:mo><mml:mn>2</mml:mn><mml:mi>n</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>and the following estimates are valid for all polynomials of degree &#x0003C; 2<italic>n</italic>:</p>
<disp-formula id="E16"><mml:math id="M39"><mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>d</mml:mi></mml:munderover><mml:mo stretchy='false'>&#x0007C;</mml:mo></mml:mstyle><mml:msub><mml:mi>w</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>&#x0007C;</mml:mo><mml:mo>&#x02264;</mml:mo><mml:mi>A</mml:mi><mml:mstyle displaystyle='true'><mml:mrow><mml:msubsup><mml:mo>&#x0222B;</mml:mo><mml:mrow><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mn>1</mml:mn></mml:msubsup><mml:mo stretchy='false'>&#x0007C;</mml:mo></mml:mrow></mml:mstyle><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>&#x0007C;</mml:mo><mml:mi>d</mml:mi><mml:mi>t</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>for some positive constant <italic>A</italic>. We do not need to determine <italic>A</italic>, but it is guaranteed to be proportional to the condition number of the system in Equation (A1).</p>
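<p>The weights in Equation (A1) can be obtained with a single least-squares solve. A minimal NumPy sketch (hypothetical code; <monospace>legvander</monospace> evaluates <italic>P</italic><sub>0</sub>, &#x02026;, <italic>P</italic><sub>2<italic>n</italic>&#x02212;1</sub> at the nodes):</p>

```python
import numpy as np

def quadrature_weights(t, n):
    """Least-squares solution of Eq. (A1): for each k < 2n,
    sum_j w_j P_k(t_j) equals 2 when k = 0 and 0 otherwise."""
    V = np.polynomial.legendre.legvander(t, 2 * n - 1).T  # row k holds P_k(t_j)
    rhs = np.zeros(2 * n)
    rhs[0] = 2.0
    w, *_ = np.linalg.lstsq(V, rhs, rcond=None)
    return w
```

<p>For example, with the nodes {&#x02212;0.5, 0.5} and <italic>n</italic> = 1 this returns the weights (1, 1), which integrate constant and linear polynomials exactly over [&#x02212;1, 1].</p>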
<p>Finally, our estimate of <italic>f</italic>&#x02032;(1) based on the values <italic>y</italic><sub><italic>j</italic></sub> &#x0003D; <italic>f</italic>(<italic>t</italic><sub><italic>j</italic></sub>)&#x0002B;&#x003F5;<sub><italic>j</italic></sub>, <italic>j</italic> &#x0003D; 1, &#x022EF;&#x02009;, <italic>d</italic>, is given by</p>
<disp-formula id="E17"><mml:math id="M40"><mml:mrow><mml:msub><mml:mi mathvariant="-tex-caligraphic">S</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>h</mml:mi><mml:mo>;</mml:mo><mml:mo stretchy='false'>&#x0007B;</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>&#x0007D;</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mtext>&#x0200A;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x0200A;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mi>d</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mstyle><mml:msub><mml:mi>y</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:msub><mml:mi>K</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>h</mml:mi><mml:mo>;</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
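<p>End to end, the estimator can be sketched as below. This is hypothetical code, not the authors' implementation; the filter <italic>h</italic> is an illustrative smooth cutoff. For noise-free samples of <italic>f</italic>(<italic>x</italic>) = <italic>x</italic><sup>2</sup> and <italic>n</italic> = 4, the quadrature and the filter are exact on the relevant polynomials, so the sketch recovers <italic>f</italic>&#x02032;(1) = 2 up to rounding.</p>

```python
import numpy as np

def h(t):
    """Illustrative smooth cutoff: 1 on [0, 1/2], 0 on [1, inf)."""
    t = abs(t)
    if t <= 0.5:
        return 1.0
    if t >= 1.0:
        return 0.0
    u = 2.0 * (t - 0.5)
    a, b = np.exp(-1.0 / u), np.exp(-1.0 / (1.0 - u))
    return b / (a + b)

def estimate_derivative_at_1(t, y, n):
    """S_n(h; {y_j}) = sum_j w_j y_j K_n(h; t_j), an estimate of f'(1)."""
    # Quadrature weights from Eq. (A1), by least squares.
    V = np.polynomial.legendre.legvander(t, 2 * n - 1).T  # row k: P_k at the nodes
    rhs = np.zeros(2 * n)
    rhs[0] = 2.0
    w, *_ = np.linalg.lstsq(V, rhs, rcond=None)
    # Kernel values K_n(h; t_j) = (1/2) sum_{k<n} h(k/n) k (k+1/2) (k+1) P_k(t_j).
    P = np.polynomial.legendre.legvander(t, n - 1)        # columns P_0 .. P_{n-1}
    coef = np.array([h(k / n) * k * (k + 0.5) * (k + 1) for k in range(n)])
    K = 0.5 * P @ coef
    return float(np.sum(w * y * K))
```

<p>Usage: <monospace>estimate_derivative_at_1(np.linspace(-1, 1, 12), np.linspace(-1, 1, 12)**2, 4)</monospace> &#x02248; 2.</p>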
<p>It is proved in Mhaskar et al. [<xref ref-type="bibr" rid="B7">7</xref>, Theorem 3.2] that if <italic>f</italic> is twice continuously differentiable,</p>
<disp-formula id="E18"><mml:math id="M41"><mml:mrow><mml:msub><mml:mi>&#x00394;</mml:mi><mml:mrow><mml:mo stretchy='false'>[</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>]</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mn>2</mml:mn><mml:mi>x</mml:mi><mml:msup><mml:mi>f</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo stretchy='false'>)</mml:mo><mml:msup><mml:mi>f</mml:mi><mml:mo>&#x02033;</mml:mo></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>and |&#x003F5;<sub><italic>j</italic></sub>| &#x02264; &#x003B4;, then</p>
<disp-formula id="E19"><mml:math id="M42"><mml:mrow><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi mathvariant="-tex-caligraphic">S</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>h</mml:mi><mml:mo>;</mml:mo><mml:mo>&#x0007B;</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>&#x0007D;</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msup><mml:mi>f</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:mo>&#x02264;</mml:mo><mml:mi>c</mml:mi><mml:mi>A</mml:mi><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>/</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mo stretchy='false'>[</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>]</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>&#x00394;</mml:mi><mml:mrow><mml:mo stretchy='false'>[</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>]</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:msup><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mi>&#x003B4;</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where</p>
<disp-formula id="E20"><label>(A2)</label><mml:math id="M43"><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>/</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mo stretchy='false'>[</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>]</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>&#x00394;</mml:mi><mml:mrow><mml:mo stretchy='false'>[</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:mn>1</mml:mn><mml:mo stretchy='false'>]</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mi>min</mml:mi><mml:munder><mml:mrow><mml:mi>max</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mo stretchy='false'>[</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy='false'>]</mml:mo></mml:mrow></mml:munder><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>&#x00394;</mml:mi><mml:mrow><mml:mo stretchy='false'>[</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:mn>1</mml:mn><mml:mo stretchy='false'>]</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>the minimum being over all polynomials <italic>P</italic> of degree &#x0003C; <italic>n</italic>/2.</p>
</sec>
<sec>
<title>A.2. Function approximation on data-defined spaces</title>
<p>While classical approximation theory literature deals with function approximation based on data that is dense on a known domain, such as a cube, torus, sphere, etc., this condition is generally not satisfied in the context of machine learning. For example, the set of vectors (<italic>s</italic><sub><italic>p</italic></sub>(<italic>t</italic><sub><italic>j</italic></sub>), &#x02026;, <italic>s</italic><sub><italic>p</italic></sub>(<italic>t</italic><sub><italic>j</italic></sub> &#x02212; <italic>d</italic> &#x0002B; 1)) is unlikely to be dense in a cube in &#x0211D;<sup><italic>d</italic></sup>. A relatively recent idea is to think of the data as being sampled from a distribution on an unknown manifold, or more generally, a locally compact, quasi-metric measure space. In this appendix, we review the theoretical background that underlies our experiments reported in this paper. The discussion here is based mainly on Maggioni and Mhaskar [<xref ref-type="bibr" rid="B26">26</xref>], Mhaskar [<xref ref-type="bibr" rid="B22">22</xref>], and Ehler et al. [<xref ref-type="bibr" rid="B8">8</xref>].</p>
<p>Let &#x1D54F; be a locally compact quasi-metric measure space, with &#x003C1; being the quasi-metric and &#x003BC;<sup>&#x0002A;</sup> being the measure. In the context of machine learning, &#x003BC;<sup>&#x0002A;</sup> is a probability measure and &#x1D54F; is its support. The starting point of this theory is a non-decreasing sequence of numbers <inline-formula><mml:math id="M44"><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x0221E;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, such that &#x003BB;<sub>0</sub> &#x0003D; 0 and &#x003BB;<sub><italic>k</italic></sub> &#x02192; &#x0221E; as <italic>k</italic> &#x02192; &#x0221E;, and a corresponding sequence of bounded functions <inline-formula><mml:math id="M45"><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x0221E;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> that forms an orthonormal sequence in <italic>L</italic><sup>2</sup>(&#x003BC;<sup>&#x0002A;</sup>). Let &#x003A0;<sub>&#x003BB;</sub> &#x0003D; <italic>span</italic>{&#x003D5;<sub><italic>k</italic></sub> : &#x003BB;<sub><italic>k</italic></sub> &#x0003C; &#x003BB;}, and analogously to Equation (A2), we define for a uniformly continuous, bounded function <italic>f</italic> : &#x1D54F; &#x02192; &#x0211D;,</p>
<disp-formula id="E21"><mml:math id="M46"><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi>&#x003BB;</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mi>min</mml:mi></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mi>&#x003A0;</mml:mi><mml:mi>&#x003BB;</mml:mi></mml:msub></mml:mrow></mml:munder><mml:munder><mml:mrow><mml:mi>sup</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>&#x1D54F;</mml:mi></mml:mrow></mml:munder><mml:mo>&#x0007C;</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>With the function <italic>h</italic> as defined in Appendix A.1, we define</p>
<disp-formula id="E22"><label>(A3)</label><mml:math id="M47"><mml:mrow><mml:msub><mml:mi>&#x003A6;</mml:mi><mml:mi>&#x003BB;</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>h</mml:mi><mml:mo>;</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>:</mml:mo><mml:msub><mml:mi>&#x003BB;</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x003BB;</mml:mi></mml:mrow></mml:munder><mml:mi>h</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mi>&#x003BB;</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mi>&#x003BB;</mml:mi></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi>&#x003D5;</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:msub><mml:mi>&#x003D5;</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x02003;</mml:mtext><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>&#x1D54F;</mml:mi><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>Next, let <inline-formula><mml:math id="M48"><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x02282;</mml:mo><mml:mi>&#x1D54F;</mml:mi></mml:math></inline-formula> be the &#x0201C;training data&#x0201D;. Our goal is to learn a function <italic>f</italic> : &#x1D54F; &#x02192; &#x0211D; based on the values <inline-formula><mml:math id="M49"><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>. Toward this goal, we solve an under-determined system of equations</p>
<disp-formula id="E23"><label>(A4)</label><mml:math id="M50"><mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mtext>&#x0200A;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x0200A;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mstyle><mml:msub><mml:mi>&#x003D5;</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>if&#x02009;</mml:mtext><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>if&#x02009;</mml:mtext><mml:mi>k</mml:mi><mml:mo>&#x0003E;</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow><mml:mtext>&#x02003;&#x02003;</mml:mtext><mml:mi>k</mml:mi><mml:mo>:</mml:mo><mml:msub><mml:mi>&#x003BB;</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x0003C;</mml:mo><mml:mi>&#x003BB;</mml:mi></mml:mrow></mml:math></disp-formula>
<p>for the largest possible &#x003BB; for which this system remains well-conditioned. We then define an approximation, analogous to classical radial basis function networks, by</p>
<disp-formula id="E24"><label>(A5)</label><mml:math id="M51"><mml:mrow><mml:msub><mml:mi>&#x003C3;</mml:mi><mml:mi>&#x003BB;</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>h</mml:mi><mml:mo>;</mml:mo><mml:mi>f</mml:mi><mml:mo>,</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mtext>&#x0200A;</mml:mtext><mml:mo>=</mml:mo><mml:mtext>&#x0200A;</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:munderover><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mstyle><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:msub><mml:mi>&#x003A6;</mml:mi><mml:mi>&#x003BB;</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x02003;</mml:mtext><mml:mi>x</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>&#x1D54F;</mml:mi><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
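<p>With the values &#x003D5;<sub><italic>k</italic></sub>(<italic>x</italic><sub><italic>j</italic></sub>) supplied as a matrix, Equations (A4) and (A5) amount to one least-squares solve followed by a weighted kernel sum. The sketch below is hypothetical; it uses the normalized Legendre polynomials &#x003D5;<sub><italic>k</italic></sub> = (2<italic>k</italic> &#x0002B; 1)<sup>1/2</sup><italic>P</italic><sub><italic>k</italic></sub> (orthonormal for d&#x003BC; = d<italic>t</italic>/2 on [&#x02212;1, 1]) with &#x003BB;<sub><italic>k</italic></sub> = <italic>k</italic> as a stand-in eigensystem.</p>

```python
import numpy as np

def h(t):
    """Illustrative smooth cutoff: 1 on [0, 1/2], 0 on [1, inf)."""
    t = abs(t)
    if t <= 0.5:
        return 1.0
    if t >= 1.0:
        return 0.0
    u = 2.0 * (t - 0.5)
    return np.exp(-1.0 / (1.0 - u)) / (np.exp(-1.0 / u) + np.exp(-1.0 / (1.0 - u)))

def sigma_lambda(phi_train, phi_eval, lam_k, lam, f_vals):
    """Eq. (A4): solve sum_j W_j phi_k(x_j) = delta_{k,0} over k with lam_k < lam,
    then Eq. (A5): sigma(x) = sum_j W_j f(x_j) Phi_lambda(h; x, x_j)."""
    keep = lam_k < lam
    A = phi_train[keep]                    # row k: phi_k at the training points
    e0 = np.zeros(A.shape[0])
    e0[0] = 1.0                            # assumes lam_k sorted with lam_0 = 0 first
    W, *_ = np.linalg.lstsq(A, e0, rcond=None)
    hk = np.array([h(l / lam) for l in lam_k[keep]])
    coeffs = hk * (A @ (W * f_vals))       # h(lam_k/lam) * sum_j W_j f(x_j) phi_k(x_j)
    return phi_eval[keep].T @ coeffs

# Stand-in eigensystem on [-1, 1]: phi_k = sqrt(2k + 1) P_k, lam_k = k.
x = np.linspace(-1.0, 1.0, 20)
phi = (np.polynomial.legendre.legvander(x, 7) * np.sqrt(2 * np.arange(8) + 1)).T
s = sigma_lambda(phi, phi, np.arange(8.0), 8.0, np.ones(20))  # f identically 1
```

<p>Because the moment conditions (A4) are satisfied exactly here, the constant function is reproduced: <monospace>s</monospace> equals 1 at every evaluation point.</p>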
<p>It is proved in Maggioni and Mhaskar [<xref ref-type="bibr" rid="B26">26</xref>], Mhaskar [<xref ref-type="bibr" rid="B22">22</xref>], and Ehler et al. [<xref ref-type="bibr" rid="B8">8</xref>] that under certain technical conditions, for a uniformly continuous, bounded function <italic>f</italic> : &#x1D54F; &#x02192; &#x0211D;, we have</p>
<disp-formula id="E25"><mml:math id="M52"><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi>&#x003BB;</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02264;</mml:mo><mml:munder><mml:mrow><mml:mi>sup</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>&#x1D54F;</mml:mi></mml:mrow></mml:munder><mml:mo>&#x0007C;</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>&#x003C3;</mml:mi><mml:mi>&#x003BB;</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>h</mml:mi><mml:mo>;</mml:mo><mml:mi>f</mml:mi><mml:mo>,</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:mo>&#x02264;</mml:mo><mml:mi>c</mml:mi><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mi>&#x003BB;</mml:mi><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where <italic>c</italic> &#x0003E; 0 is a generic positive constant. Thus, &#x003C3;<sub>&#x003BB;</sub>(<italic>h</italic>; <italic>f</italic>) is guaranteed to be a near-best approximation to <italic>f</italic>: its uniform error is within a constant factor of the best achievable from &#x003A0;<sub>&#x003BB;/2</sub>, given the training data.</p>
<p>In practice, one has a point cloud <inline-formula><mml:math id="M53"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x02283;</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> rather than the space &#x1D54F;. The set <inline-formula><mml:math id="M54"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:mrow></mml:math></inline-formula> is a subset of some Euclidean space &#x0211D;<sup><italic>D</italic></sup> for a possibly high value of <italic>D</italic>, but is assumed to lie on a low-dimensional compact sub-manifold &#x1D54F; of &#x0211D;<sup><italic>D</italic></sup>. We build the graph Laplacian from <inline-formula><mml:math id="M55"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:mrow></mml:math></inline-formula> as follows: a parameter &#x003B5; &#x0003E; 0 induces the weight matrix <inline-formula><mml:math id="M56"><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">W</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B5;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, defined by</p>
<disp-formula id="E26"><label>(A6)</label><mml:math id="M57"><mml:mrow><mml:msubsup><mml:mi mathvariant="-tex-caligraphic">W</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mo>;</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>&#x003B5;</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:mi>exp</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mo>&#x02212;</mml:mo><mml:mo>&#x0007C;</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:msup><mml:mo>&#x0007C;</mml:mo><mml:mn>2</mml:mn></mml:msup><mml:mo>/</mml:mo><mml:mi>&#x003B5;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
<p>We build the diagonal matrix <inline-formula><mml:math id="M58"><mml:msubsup><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>;</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B5;</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">W</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>;</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B5;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and define the (unnormalized) graph Laplacian as</p>
<disp-formula id="E27"><mml:math id="M59"><mml:mrow><mml:msubsup><mml:mi>L</mml:mi><mml:mi>N</mml:mi><mml:mi>&#x003B5;</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi mathvariant="-tex-caligraphic">W</mml:mi><mml:mi>N</mml:mi><mml:mi>&#x003B5;</mml:mi></mml:msubsup><mml:mo>&#x02212;</mml:mo><mml:msubsup><mml:mi>D</mml:mi><mml:mi>N</mml:mi><mml:mi>&#x003B5;</mml:mi></mml:msubsup><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
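<p>The construction of Equation (A6) and the Laplacian above takes only a few lines of NumPy; this is an illustrative sketch. With the sign convention <italic>L</italic> = <italic>W</italic> &#x02212; <italic>D</italic> used here, every row of <italic>L</italic> sums to zero.</p>

```python
import numpy as np

def graph_laplacian(Y, eps):
    """Return L = W - D for a point cloud Y of shape (N, D), per Eq. (A6)."""
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)  # |y_i - y_j|^2
    W = np.exp(-sq / eps)                                       # weight matrix
    D = np.diag(W.sum(axis=1))                                  # degree matrix
    return W - D
```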
<p>Various other versions of this Laplacian are also used in practice. As <italic>N</italic> &#x02192; &#x0221E; and &#x003B5; &#x02192; 0, the eigenvalues and interpolations of the eigenvectors of <inline-formula><mml:math id="M60"><mml:msubsup><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B5;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> converge toward the &#x0201C;interesting&#x0201D; eigenvalues &#x003BB;<sub><italic>k</italic></sub> and eigenfunctions &#x003D5;<sub><italic>k</italic></sub> of a differential operator &#x00394;<sub>&#x1D54F;</sub>. This behavior has been studied in detail by many authors, e.g., Belkin and Niyogi [<xref ref-type="bibr" rid="B27">27</xref>], Lafon [<xref ref-type="bibr" rid="B21">21</xref>], and Singer [<xref ref-type="bibr" rid="B28">28</xref>]. When <inline-formula><mml:math id="M61"><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> are uniformly distributed on &#x1D54F;, this operator is the Laplace-Beltrami operator on &#x1D54F;. If <inline-formula><mml:math id="M62"><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> are distributed according to the density <italic>p</italic>, then the graph Laplacian approximates the elliptic Schr&#x000F6;dinger-type operator <inline-formula><mml:math id="M63"><mml:mi>&#x00394;</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x00394;</mml:mi><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:mfrac></mml:math></inline-formula>, whose eigenfunctions &#x003D5;<sub><italic>k</italic></sub> also form an orthonormal basis for <italic>L</italic><sup>2</sup>(&#x1D54F;, &#x003BC;).</p>
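<p>Continuing the sketch, the discrete surrogates for &#x003BB;<sub><italic>k</italic></sub> and &#x003D5;<sub><italic>k</italic></sub> are obtained from an eigen-decomposition of the positive semidefinite matrix <italic>D</italic> &#x02212; <italic>W</italic>. The example below is hypothetical: for points sampled uniformly on the unit circle, the smallest eigenvalue is 0 and its eigenvector is constant, mirroring the constant eigenfunction of the Laplace-Beltrami operator.</p>

```python
import numpy as np

# N points on the unit circle, a compact one-dimensional manifold in R^2.
N = 40
theta = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
Y = np.column_stack([np.cos(theta), np.sin(theta)])

# Weight and degree matrices from Eq. (A6), with a fixed epsilon.
sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq / 0.1)
D = np.diag(W.sum(axis=1))

# Ascending eigenvalues lam and eigenvector columns phi of D - W play the
# roles of lam_k and phi_k in Eq. (A3).
lam, phi = np.linalg.eigh(D - W)
```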
</sec>
</app>
</app-group>
</back>
</article>