<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Appl. Math. Stat.</journal-id>
<journal-title>Frontiers in Applied Mathematics and Statistics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Appl. Math. Stat.</abbrev-journal-title>
<issn pub-type="epub">2297-4687</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fams.2021.784855</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Applied Mathematics and Statistics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>An Empirical Study of Graph-Based Approaches for Semi-supervised Time Series Classification</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>B&#x000FC;nger</surname> <given-names>Dominik</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/631776/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Gondos</surname> <given-names>Miriam</given-names></name>
</contrib>
<contrib contrib-type="author">
<name><surname>Peroche</surname> <given-names>Lucile</given-names></name>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Stoll</surname> <given-names>Martin</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/611597/overview"/>
</contrib>
</contrib-group>
<aff><institution>Department of Mathematics, Chair of Scientific Computing, TU Chemnitz</institution>, <addr-line>Chemnitz</addr-line>, <country>Germany</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Stefan Kunis, Osnabr&#x000FC;ck University, Germany</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Tarek Emmrich, Osnabr&#x000FC;ck University, Germany; Alex Jung, Aalto University, Finland</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Martin Stoll <email>martin.stoll&#x00040;mathematik.tu-chemnitz.de</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Mathematics of Computation and Data Science, a section of the journal Frontiers in Applied Mathematics and Statistics</p></fn></author-notes>
<pub-date pub-type="epub">
<day>20</day>
<month>01</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>7</volume>
<elocation-id>784855</elocation-id>
<history>
<date date-type="received">
<day>28</day>
<month>09</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>28</day>
<month>12</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 B&#x000FC;nger, Gondos, Peroche and Stoll.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>B&#x000FC;nger, Gondos, Peroche and Stoll</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license></permissions>
<abstract>
<p>Time series data play an important role in many applications, and their analysis reveals crucial information for understanding the underlying processes. Among the many important time series learning tasks, we here focus on semi-supervised learning based on a graph representation of the data. Two main aspects are studied in this paper: suitable distance measures for evaluating the similarity between time series, and the choice of learning method for making predictions based on a given number of pre-labeled data points. The relationship between these two aspects has never been studied systematically in the context of graph-based learning. We describe four different distance measures, including (Soft) DTW and MPDist, a distance measure based on the Matrix Profile, as well as four successful semi-supervised learning methods, including the recently introduced graph Allen&#x02013;Cahn method and the Graph Convolutional Network (GCN) method. We provide results for the novel combination of these distance measures with both the Allen&#x02013;Cahn method and the GCN algorithm on binary semi-supervised learning tasks for various time series data sets. We compare the chosen graph-based methods across all distance measures and observe that the resulting accuracy varies strongly between combinations; no single combination emerges as best in all cases. Our study provides a reproducible framework for future work on semi-supervised learning for time series with a focus on graph representations.</p></abstract>
<kwd-group>
<kwd>semi-supervised learning</kwd>
<kwd>time series</kwd>
<kwd>graph Laplacian</kwd>
<kwd>Allen-Cahn equation</kwd>
<kwd>graph convolutional networks</kwd>
</kwd-group>
<contract-sponsor id="cn001">Technische Universit&#x000E4;t Chemnitz<named-content content-type="fundref-id">10.13039/100009117</named-content></contract-sponsor>
<counts>
<fig-count count="12"/>
<table-count count="4"/>
<equation-count count="32"/>
<ref-count count="64"/>
<page-count count="19"/>
<word-count count="8057"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Many processes for which data are collected are time-dependent and as a result the study of time series data is a subject of great importance [<xref ref-type="bibr" rid="B1">1</xref>&#x02013;<xref ref-type="bibr" rid="B3">3</xref>]. The case of time series is interesting for tasks such as anomaly detection [<xref ref-type="bibr" rid="B4">4</xref>], motif computation [<xref ref-type="bibr" rid="B5">5</xref>] or time series forecasting [<xref ref-type="bibr" rid="B6">6</xref>]. We refer to [<xref ref-type="bibr" rid="B7">7</xref>&#x02013;<xref ref-type="bibr" rid="B10">10</xref>] for more general introductions.</p>
<p>We here focus on the task of classification of time series [<xref ref-type="bibr" rid="B11">11</xref>&#x02013;<xref ref-type="bibr" rid="B16">16</xref>] in the context of semi-supervised learning [<xref ref-type="bibr" rid="B17">17</xref>, <xref ref-type="bibr" rid="B18">18</xref>] where we want to label all data points<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> based on the fact that only a small portion of the data is already pre-labeled.</p>
<p>An example is given in <xref ref-type="fig" rid="F1">Figure 1</xref>, where we see time series of ECG (electrocardiogram) data classified into normal heartbeats on the one hand and myocardial infarction on the other. In our applications, we assume that the corresponding class is known a priori only for some of the time series. Our main contribution is a novel combination: we represent the data as a <italic>graph</italic> and then feed this representation into several recently introduced methods for <italic>semi-supervised learning</italic>. For this, each time series becomes a node in a weighted undirected graph whose edge weights are proportional to the similarity between the time series. Graph-based approaches have become a standard tool in many learning tasks (cf. [<xref ref-type="bibr" rid="B19">19</xref>&#x02013;<xref ref-type="bibr" rid="B24">24</xref>] and the references mentioned therein). The matrix representation of the graph via its Laplacian [<xref ref-type="bibr" rid="B25">25</xref>] allows the network to be studied through matrix properties. The Laplacian is <italic>the</italic> representation of the network that is utilized in fields ranging from machine learning to mathematical imaging. Recently, it has also been used in network-Lasso-based learning approaches focusing on data with an inherent network structure, see e.g., [<xref ref-type="bibr" rid="B26">26</xref>, <xref ref-type="bibr" rid="B27">27</xref>]. A very important ingredient in the construction of the Laplacian is the choice of an appropriate weight function. In many applications, computing the distance between time series or sub-sequences is a crucial task, and this is reflected in our choice of weight function. We consider several distance measures such as dynamic time warping (DTW) [<xref ref-type="bibr" rid="B28">28</xref>], soft DTW [<xref ref-type="bibr" rid="B29">29</xref>], and the matrix profile [<xref ref-type="bibr" rid="B30">30</xref>].</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>A typical example for time series classification. Given the dataset ECG200, the goal is to automatically separate all time series into the classes <italic>normal heartbeats</italic> and <italic>myocardial infarction</italic>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fams-07-784855-g0001.tif"/>
</fig>
<p>We will embed these measures via the graph Laplacian into two different recently proposed semi-supervised learning frameworks: a diffuse interface approach based on the graph Allen-Cahn equation, which originates from materials science [<xref ref-type="bibr" rid="B31">31</xref>], and a method based on graph convolutional networks [<xref ref-type="bibr" rid="B21">21</xref>]. Since these methods were originally introduced outside the field of time series learning, their relationship with time series distance measures has never been studied. Furthermore, we compare these approaches with the well-known 1NN approach [<xref ref-type="bibr" rid="B11">11</xref>] and a simple optimization formulation solved via a linear system of equations. Our motivation follows that of [<xref ref-type="bibr" rid="B32">32</xref>, <xref ref-type="bibr" rid="B33">33</xref>], where many supervised learning methods for time series were compared: we aim to provide a wide-ranging overview of recent methods based on a graph representation of the data, combined with several distance measures.</p>
<p>We structure the paper as follows. In section 2, we introduce some basic notation and illustrate the basic notion of graph-based learning, motivated by a clustering approach. In section 3, we discuss several distance measures with a focus on the well-known DTW measure as well as two recently emerged alternatives, i.e., Soft DTW and the MP distance. We use section 4 to introduce the two semi-supervised learning methods in more detail, followed by a shorter description of their well-known competitors. Section 5 will allow us to compare the methods and study the hyperparameter selection.</p></sec>
<sec id="s2">
<title>2. Basics</title>
<p>We consider discrete time series <bold>x</bold><sub><italic>i</italic></sub> given as a vector of real numbers of length <italic>m</italic><sub><italic>i</italic></sub>. In general, we allow the time series to be of different lengths; later we often consider all <italic>m</italic><sub><italic>i</italic></sub> &#x0003D; <italic>m</italic>. We assume that we are given <italic>n</italic> time series <inline-formula><mml:math id="M1"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula>. The goal of a classification task is to group the <italic>n</italic> time series into a number <italic>k</italic> of different <italic>clusters</italic> <italic>C</italic><sub><italic>j</italic></sub> with <italic>j</italic> &#x0003D; 1, &#x02026;, <italic>k</italic>. In this paper we focus on the task of semi-supervised learning [<xref ref-type="bibr" rid="B17">17</xref>], where only some of the data are already labeled but we want to classify all available data simultaneously. Nevertheless, we first review some techniques for unsupervised learning, as they provide useful terminology. The <italic>k-means</italic> algorithm is a prototype-based<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref> clustering algorithm that divides the given data into a predefined number <italic>k</italic> of clusters [<xref ref-type="bibr" rid="B34">34</xref>]. The idea behind <italic>k</italic>-means is rather simple: the cluster centroids are repeatedly updated and the data points are assigned to the nearest centroid until the assignments no longer change. Often the termination condition is not handled that strictly. 
For example, the method can be terminated when fewer than 1% of the points change clusters. The initial classes are often chosen at random, but they can also be assigned more systematically by first calculating the centers and then assigning each point to the nearest center. While <italic>k</italic>-means remains very popular, it also has certain weaknesses stemming from its minimization of the sum-of-squared-errors loss function [<xref ref-type="bibr" rid="B35">35</xref>]. We discuss this method in some detail here to point out its main mechanism: points are assigned to clusters, and hence the cluster centroids are computed, based on the Euclidean distance; the same holds when <italic>k</italic>-means is applied to time series. As a result, the clusters might not capture the shape of the data manifold, as illustrated by the simple two-dimensional example shown in <xref ref-type="fig" rid="F2">Figure 2</xref>. In comparison, the alternative method shown, i.e., a spectral clustering technique, performs much better. We briefly discuss this method next, as it forms the basis of the main techniques introduced in this paper.</p>
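<p>As a minimal illustration of the mechanism described above, the following Python sketch implements <italic>k</italic>-means with the relaxed termination rule (stop once fewer than 1% of the points change clusters); the function and parameter names are our own and this is not the implementation used in the experiments.</p>

```python
import numpy as np

def kmeans(X, k, rel_tol=0.01, max_iter=100, seed=0):
    """Prototype-based clustering: repeatedly assign each point to its
    nearest centroid (Euclidean distance) and recompute the centroids,
    stopping once fewer than rel_tol of the points change clusters."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = X[rng.choice(n, size=k, replace=False)]  # random initial centers
    labels = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        # distance of every point to every center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        changed = np.mean(new_labels != labels)
        labels = new_labels
        # recompute centroids; keep the old center if a cluster runs empty
        centers = np.stack([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
        if changed < rel_tol:
            break
    return labels, centers
```

<p>Applied to two well-separated groups of points, the sketch recovers the grouping; applied directly to raw time series, the same Euclidean assignment step exhibits the weaknesses discussed above.</p>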
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Clustering based on original data via k-means <bold>(left)</bold> vs. transformed data via spectral clustering <bold>(right)</bold>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fams-07-784855-g0002.tif"/>
</fig>
<sec>
<title>2.1. Graph Laplacian and Spectral Clustering</title>
<p>As illustrated in <xref ref-type="fig" rid="F2">Figure 2</xref>, separating the data into two classes is rather difficult for <italic>k</italic>-means, as the centroids are based on a 2-norm minimization. One alternative to <italic>k</italic>-means is based on interpreting the data points as nodes in a graph. For this, we assume that we are given data points <bold>x</bold><sub>1</sub>, &#x02026;, <bold>x</bold><sub><italic>n</italic></sub> and some measure of similarity [<xref ref-type="bibr" rid="B23">23</xref>]. We define the weighted undirected similarity graph <italic>G</italic> &#x0003D; (<italic>V, E</italic>) with the <italic>vertex</italic> or <italic>node</italic> set <italic>V</italic> and the edge set <italic>E</italic>. We view the data points <bold>x</bold><sub><italic>i</italic></sub> as vertices, <italic>V</italic> &#x0003D; {<bold>x</bold><sub>1</sub>, &#x02026;, <bold>x</bold><sub><italic>n</italic></sub>}, and if two nodes <bold>x</bold><sub><italic>i</italic></sub> and <bold>x</bold><sub><italic>j</italic></sub> have a positive similarity value, they are connected by an edge with weight <italic>w</italic><sub><italic>ij</italic></sub> equal to that similarity. With this reformulation of the data we turn the clustering problem into a graph partitioning problem in which we want to cut the graph into two or possibly more classes. This is usually done in such a way that the total weight of the edges across the partition is minimal.</p>
<p>We collect all edge weights in the <italic>adjacency matrix</italic> <italic>W</italic> &#x0003D; (<italic>w</italic><sub><italic>ij</italic></sub>)<sub><italic>i, j</italic> &#x0003D; 1, &#x02026;, <italic>n</italic></sub>. The degree of a vertex <bold>x</bold><sub><italic>i</italic></sub> is defined as <inline-formula><mml:math id="M2"><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and the degree matrix <italic>D</italic> is the diagonal matrix holding all <italic>n</italic> node degrees. In our case we use a fully connected graph with the <italic>Gaussian similarity function</italic></p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>w</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo class="qopname">exp</mml:mo><mml:mstyle displaystyle="true"><mml:mo>(</mml:mo><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mtext class="textrm" mathvariant="normal">dist</mml:mtext><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac><mml:mo>)</mml:mo></mml:mstyle><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003C3; is a scaling parameter and dist(<bold>x</bold><sub><italic>i</italic></sub>, <bold>x</bold><sub><italic>j</italic></sub>) is a particular distance function such as the Euclidean distance <inline-formula><mml:math id="M4"><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">dist</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:mo>&#x02225;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mo>&#x02225;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>. Note that the <italic>distance</italic> function is small for similar nodes and large for dissimilar ones, whereas the <italic>similarity</italic> function is large for similar nodes and small for dissimilar ones.</p>
<p>We now use both the degree and weight matrix to define the <italic>graph Laplacian</italic> as <italic>L</italic> &#x0003D; <italic>D</italic> &#x02212; <italic>W</italic>. Often the <italic>symmetrically normalized Laplacian</italic> defined via</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">sym</mml:mtext></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msup><mml:mi>L</mml:mi><mml:msup><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>I</mml:mi><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msup><mml:mi>W</mml:mi><mml:msup><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>provides better clustering information [<xref ref-type="bibr" rid="B23">23</xref>]. It has some very useful properties that we will exploit here. For example, given a non-zero vector <italic>u</italic> &#x02208; &#x0211D;<sup><italic>n</italic></sup> we obtain the energy term</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtable class="aligned"><mml:mtr><mml:mtd columnalign="left"><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mstyle mathvariant="sans-serif"><mml:mi>T</mml:mi></mml:mstyle></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">sym</mml:mtext></mml:mrow></mml:msub><mml:mi>u</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Using this it is easy to see that <italic>L</italic><sub>sym</sub> is positive semi-definite with non-negative eigenvalues 0 &#x0003D; &#x003BB;<sub>1</sub> &#x02264; &#x003BB;<sub>2</sub> &#x02264;&#x02026; &#x02264; &#x003BB;<sub><italic>n</italic></sub>. The main advantage of the graph Laplacian is that based on its spectral information one can usually rely on transforming the data into a space where they are easier to separate [<xref ref-type="bibr" rid="B23">23</xref>, <xref ref-type="bibr" rid="B25">25</xref>, <xref ref-type="bibr" rid="B36">36</xref>]. As a result one typically requires the spectral information corresponding to the smallest eigenvalues of <italic>L</italic><sub>sym</sub>. The most famed eigenvector is the <italic>Fiedler vector</italic>, i.e., the eigenvector corresponding to the first non-zero eigenvalue, which is bound to have a sign change and as a result can be used for binary classification. The weight function (1) is also found in kernel methods [<xref ref-type="bibr" rid="B37">37</xref>, <xref ref-type="bibr" rid="B38">38</xref>] when the radial basis kernel is applied.</p></sec>
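<p>The constructions in Equations (1)&#x02013;(3) and the use of the Fiedler vector can be sketched in a few lines of Python. This is an illustrative sketch with a fixed global &#x003C3; and the Euclidean distance, not the exact setup used in the experiments; function names are our own.</p>

```python
import numpy as np

def gaussian_weights(X, sigma=1.0):
    """Fully connected similarity graph, w_ij = exp(-dist(x_i, x_j)^2 / sigma^2),
    with the Euclidean distance between the rows of X (cf. Equation 1)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / sigma ** 2)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def sym_laplacian(W):
    """Symmetrically normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def fiedler_vector(W):
    """Eigenvector of L_sym for the second-smallest eigenvalue; its sign
    pattern yields a binary partition of the graph."""
    _, U = np.linalg.eigh(sym_laplacian(W))  # eigh returns ascending eigenvalues
    return U[:, 1]

# two well-separated groups: the signs of the Fiedler vector recover them
X = np.vstack([np.zeros((5, 2)), 2.0 * np.ones((5, 2))])
v = fiedler_vector(gaussian_weights(X))
```

<p>For the two groups above, all entries of the Fiedler vector belonging to the same group share one sign, which is exactly the sign-change property used for binary classification.</p>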
<sec>
<title>2.2. Self-Tuning</title>
<p>In order to improve the performance of methods based on the graph Laplacian, tuning the parameter &#x003C3; is crucial. While hyperparameter tuning based on a grid search or cross validation is certainly possible, we also consider a &#x003C3; that adapts to the given data. For spectral clustering, such a procedure was introduced in [<xref ref-type="bibr" rid="B39">39</xref>]. Here we apply this technique to learning with time series data. For each time series <bold>x</bold><sub><italic>i</italic></sub> we assume a local scaling parameter &#x003C3;<sub><italic>i</italic></sub>. As a result, we obtain the generalized squared distance</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M7"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtable class="aligned"><mml:mtr><mml:mtd columnalign="left"><mml:mfrac><mml:mrow><mml:mtext class="textrm" mathvariant="normal">dist</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mfrac><mml:mrow><mml:mtext class="textrm" mathvariant="normal">dist</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mtext class="textrm" mathvariant="normal">dist</mml:mtext><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle 
mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>and this gives the entries of the adjacency matrix <italic>W</italic> via</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M8"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtable class="aligned"><mml:mtr><mml:mtd columnalign="left"><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo class="qopname">exp</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mtext class="textrm" mathvariant="normal">dist</mml:mtext><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The authors in [<xref ref-type="bibr" rid="B39">39</xref>] choose &#x003C3;<sub><italic>i</italic></sub> as the distance to the <italic>K</italic>-th nearest neighbor of <bold>x</bold><sub><italic>i</italic></sub> where <italic>K</italic> is a fixed parameter, e.g., <italic>K</italic> &#x0003D; 9 is used in [<xref ref-type="bibr" rid="B31">31</xref>].</p>
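<p>A possible implementation of the self-tuned weights in Equation (5), given a precomputed symmetric distance matrix, reads as follows (the function name and interface are our own):</p>

```python
import numpy as np

def self_tuned_weights(D, K=9):
    """Self-tuning weights: sigma_i is the distance from x_i to its K-th
    nearest neighbor (K = 9 as in [31]), and w_ij = exp(-D_ij^2 / (sigma_i sigma_j)).
    D is a symmetric matrix of pairwise distances with zero diagonal."""
    # after sorting each row, index 0 is the point itself (distance 0),
    # so the K-th nearest neighbor sits at index K
    sigma = np.sort(D, axis=1)[:, K]
    W = np.exp(-D ** 2 / np.outer(sigma, sigma))
    np.fill_diagonal(W, 0.0)
    return W
```

<p>The same routine applies to any of the distance measures of section 3, since it only consumes the precomputed distance matrix.</p>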
<p>In section 5, we will explore several different values for <italic>K</italic> and their influence on the classification behavior.</p></sec></sec>
<sec id="s3">
<title>3. Distance Measures</title>
<p>We have seen from the definition of the weight matrix that the Laplacian depends on the choice of the distance measure dist(<bold>x</bold><sub><italic>i</italic></sub>, <bold>x</bold><sub><italic>j</italic></sub>). If all time series are of the same length, the simplest choice is the Euclidean distance, which, especially for large <italic>n</italic>, is fast to compute. This makes the Euclidean distance extremely popular, but it is sensitive to small shifts in the time series. We therefore discuss several popular and efficient alternative distance measures. Our focus is to illustrate in an empirical study how the choice of distance measure impacts the performance of graph-based learning and to provide further insights for future research (cf. [<xref ref-type="bibr" rid="B40">40</xref>]).</p>
<sec>
<title>3.1. Dynamic Time Warping</title>
<p>We first discuss the distance measure of Dynamic Time Warping (DTW, [<xref ref-type="bibr" rid="B28">28</xref>]), an algorithm that finds an optimal alignment between two time series.</p>
<p>In the following, we adapt the notation of [<xref ref-type="bibr" rid="B28">28</xref>] to our case. Consider two time series <bold>x</bold> and <inline-formula><mml:math id="M201"><mml:mstyle mathvariant="bold"><mml:mover accent='true'><mml:mi>x</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover></mml:mstyle></mml:math></inline-formula> of lengths <italic>m</italic> and <inline-formula><mml:math id="M9"><mml:mover accent="true"><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:math></inline-formula>, respectively, with entries <inline-formula><mml:math id="M10"><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>&#x0211D;</mml:mi></mml:math></inline-formula> for <italic>i</italic> &#x0003D; 1, &#x02026;, <italic>m</italic> and <inline-formula><mml:math id="M11"><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:math></inline-formula>. 
We obtain the local cost matrix <inline-formula><mml:math id="M12"><mml:mi>C</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow></mml:msup></mml:math></inline-formula> by assembling the local differences for each pair of elements, i.e., <inline-formula><mml:math id="M13"><mml:msub><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:math></inline-formula>.</p>
<p>The DTW distance is defined via <inline-formula><mml:math id="M14"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula><italic>-warping paths</italic>, which are sequences of index tuples <italic>p</italic> &#x0003D; ((<italic>i</italic><sub>1</sub>, <italic>j</italic><sub>1</sub>), &#x02026;, (<italic>i</italic><sub><italic>L</italic></sub>, <italic>j</italic><sub><italic>L</italic></sub>)) with boundary, monotonicity, and step size conditions</p>
<disp-formula id="E6"><mml:math id="M15"><mml:mtable columnalign="left"><mml:mtr><mml:mtd columnalign="center"><mml:mn>1</mml:mn><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02264;</mml:mo><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02264;</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>&#x02264;</mml:mo><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mn>1</mml:mn><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02264;</mml:mo><mml:msub><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02264;</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>&#x02264;</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="center"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02113;</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02113;</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mtext>&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x02113;</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:mi>L</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The total cost of such a path with respect to <bold>x</bold>, <inline-formula><mml:math id="M203"><mml:mstyle mathvariant="bold"><mml:mover accent='true'><mml:mi>x</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover></mml:mstyle></mml:math></inline-formula> is defined as</p>
<disp-formula id="E7"><mml:math id="M16"><mml:mrow><mml:msub><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant='bold'><mml:mover><mml:mtext>x</mml:mtext><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x02113;</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
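As a sketch, the path conditions and the total cost <italic>c</italic><sub><italic>p</italic></sub> can be written out directly in a few lines of Python (the function names are ours):

```python
def is_warping_path(p, m, m_tilde):
    """Check the boundary, monotonicity, and step-size conditions (1-based indices)."""
    if p[0] != (1, 1) or p[-1] != (m, m_tilde):
        return False                  # boundary condition
    steps = {(1, 0), (0, 1), (1, 1)}  # allowed step sizes; these imply monotonicity
    return all((i2 - i1, j2 - j1) in steps
               for (i1, j1), (i2, j2) in zip(p, p[1:]))

def path_cost(p, x, x_tilde):
    """Total cost c_p(x, x_tilde) of a warping path."""
    return sum(abs(x[i - 1] - x_tilde[j - 1]) for i, j in p)

x, x_tilde = [1.0, 2.0, 3.0], [1.0, 3.0]
p = [(1, 1), (2, 1), (3, 2)]
print(is_warping_path(p, len(x), len(x_tilde)))  # True
print(path_cost(p, x, x_tilde))                  # 1.0
```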
<p>The DTW distance is then defined as the minimum cost of any warping path:</p>
<disp-formula id="E8"><label>(6)</label><mml:math id="M17"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">DTW</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>y</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:mo class="qopname">min</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>y</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02223;</mml:mo><mml:mi>p</mml:mi><mml:mtext class="textrm" mathvariant="normal">is&#x000A0;a&#x000A0;(m,</mml:mtext><mml:mover accent="true"><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mo class="qopname">&#x0007E;</mml:mo></mml:mover><mml:mtext class="textrm" mathvariant="normal">)-warping&#x000A0;path</mml:mtext></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Both the warping and the warping path are illustrated in <xref ref-type="fig" rid="F3">Figure 3</xref>.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>DTW warping <bold>(left)</bold> and warping paths <bold>(right)</bold>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fams-07-784855-g0003.tif"/>
</fig>
<p>Computing the optimal warping path by enumerating all paths quickly becomes infeasible. Instead, we can use dynamic programming to evaluate the accumulated cost matrix <italic>D</italic> recursively via</p>
<disp-formula id="E9"><label>(7)</label><mml:math id="M18"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>D</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mo>&#x0002B;</mml:mo><mml:mo class="qopname">min</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mi>D</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The actual DTW distance is finally obtained as</p>
<disp-formula id="E10"><label>(8)</label><mml:math id="M19"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">DTW</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>y</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
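A minimal pure-Python version of recursion (7), with the boundary cases handled by an infinity-padded row and column, reads as follows (the function name is ours; in practice one would use an optimized library such as dtaidistance):

```python
import math

def dtw_distance(x, x_tilde):
    """DTW distance via the accumulated cost matrix D of Eqs. (7) and (8)."""
    m, m_t = len(x), len(x_tilde)
    # Pad with infinity so the recursion also covers the first row and column.
    D = [[math.inf] * (m_t + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, m_t + 1):
            cost = abs(x[i - 1] - x_tilde[j - 1])
            D[i][j] = cost + min(D[i][j - 1], D[i - 1][j], D[i - 1][j - 1])
    return D[m][m_t]  # DTW(x, x_tilde) = D(m, m_tilde)

print(dtw_distance([1, 2, 3], [1, 3]))        # 1
print(dtw_distance([0, 0, 1, 0], [0, 1, 0]))  # 0: the shifted peak is warped away
```

The double loop makes the cost of one evaluation proportional to the product of the two series lengths, which is the quadratic complexity referred to below.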
<p>The DTW method is a widely used distance measure for capturing the sometimes subtle similarities between time series. In the literature it is typically stated that the computational cost of DTW is prohibitively large. As a result, one is interested in accelerating the DTW algorithm itself. One possibility arises from imposing additional constraints (cf. [<xref ref-type="bibr" rid="B28">28</xref>, <xref ref-type="bibr" rid="B41">41</xref>]) such as the Sakoe-Chiba band and the Itakura parallelogram, as these simplify the identification of the optimal warping path. While these are appealing concepts, the authors of [<xref ref-type="bibr" rid="B42">42</xref>] observe that the well-known FastDTW algorithm [<xref ref-type="bibr" rid="B41">41</xref>] is in fact often slower than exact DTW. For our purposes we will hence rely on DTW, and in particular on the implementation provided via <ext-link ext-link-type="uri" xlink:href="https://github.com/wannesm/dtaidistance">https://github.com/wannesm/dtaidistance</ext-link>. We observe that this implementation indeed frequently outperforms FastDTW.</p></sec>
<sec>
<title>3.2. Soft Dynamic Time Warping</title>
<p>Based on a slight reformulation of the above DTW scheme, we now consider another time series distance measure, <italic>Soft Dynamic Time Warping</italic> (Soft DTW). It is an extension of DTW designed to provide a differentiable loss function and was introduced in [<xref ref-type="bibr" rid="B29">29</xref>, <xref ref-type="bibr" rid="B43">43</xref>]. We again start from the cost matrix <italic>C</italic> with <inline-formula><mml:math id="M20"><mml:mi>C</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:math></inline-formula> for time series <bold>x</bold> and <inline-formula><mml:math id="M204"><mml:mstyle mathvariant="bold"><mml:mover accent='true'><mml:mi>x</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover></mml:mstyle></mml:math></inline-formula>. 
Each warping path can equivalently be described by a matrix <inline-formula><mml:math id="M21"><mml:mi>A</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow></mml:msup></mml:math></inline-formula> with the following condition: the ones in <italic>A</italic> form a path starting in (1, 1) and ending in <inline-formula><mml:math id="M22"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, using only steps downward, to the right, or diagonally downward. <italic>A</italic> is called a monotonic alignment matrix, and we denote the set of all such alignment matrices by <inline-formula><mml:math id="M23"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">A</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. The Frobenius inner product &#x02329;<italic>A, C</italic>&#x0232A; is then the sum of costs along the alignment <italic>A</italic>. This leads to a reformulation of the dynamic time warping introduced above as the minimization problem</p>
<disp-formula id="E11"><label>(9)</label><mml:math id="M24"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">DTW</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:mi>A</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">A</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>C</mml:mi></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
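The equivalence between the warping-path cost and the Frobenius inner product &#x02329;<italic>A, C</italic>&#x0232A; is easy to verify numerically; the arrays below are our own toy example:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
x_tilde = np.array([1.0, 3.0])
C = np.abs(x[:, None] - x_tilde[None, :])  # local cost matrix C_ij = |x_i - x~_j|

# A warping path (1-based) and its corresponding 0/1 alignment matrix A.
p = [(1, 1), (2, 1), (3, 2)]
A = np.zeros_like(C)
for i, j in p:
    A[i - 1, j - 1] = 1.0

frobenius = float(np.sum(A * C))                 # <A, C>
along_path = sum(C[i - 1, j - 1] for i, j in p)  # c_p(x, x_tilde)
print(frobenius, along_path)  # both equal 1.0
```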
<p>With Soft DTW we involve all possible alignments in <inline-formula><mml:math id="M25"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">A</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> by replacing the minimization with a <italic>soft minimum</italic>:</p>
<disp-formula id="E12"><label>(10)</label><mml:math id="M26"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x02248;</mml:mo><mml:msub><mml:mrow><mml:msub><mml:mrow><mml:mtext class="textrm" mathvariant="normal">min</mml:mtext></mml:mrow><mml:mrow><mml:mi>&#x003B3;</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo class="qopname">log</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mo class="qopname">exp</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mo>-</mml:mo><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x003B3;</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>S</italic> is a discrete subset of the real numbers. This function approximates the minimum of <italic>f</italic>(<italic>x</italic>) and is differentiable. The parameter &#x003B3; controls the trade-off between smoothness and the accuracy of the approximation of the minimum. Inserting the DTW formulation (9) into (10) yields the expression for Soft Dynamic Time Warping</p>
<disp-formula id="E13"><label>(11)</label><mml:math id="M27"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext class="textrm" mathvariant="normal">DTW</mml:mtext></mml:mrow><mml:mrow><mml:mi>&#x003B3;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo>,</mml:mo><mml:mover><mml:mstyle mathvariant='bold'><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:msub><mml:mrow><mml:mtext class="textrm" mathvariant="normal">min</mml:mtext></mml:mrow><mml:mrow><mml:mi>&#x003B3;</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>A</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">A</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>C</mml:mi></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo class="qopname">log</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>A</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">A</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:munder></mml:mstyle><mml:mo class="qopname">exp</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:mo>&#x02329;</mml:mo><mml:mrow><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>C</mml:mi></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x003B3;</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>This yields a differentiable alternative to DTW that takes all alignments of the cost matrix into account.</p>
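A compact sketch of Eqs. (10) and (11): the soft minimum simply replaces the hard minimum inside the same dynamic program used for plain DTW, which, as shown in [29], evaluates DTW<sub>&#x003B3;</sub> over all alignments. The final function anticipates the debiased divergence discussed below; all function names are ours:

```python
import math

def soft_min(values, gamma):
    """Soft minimum of Eq. (10), stabilized against overflow."""
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))

def soft_dtw(x, y, gamma=1.0):
    """DTW_gamma of Eq. (11) via the soft-min dynamic program."""
    R = [[math.inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    R[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            R[i][j] = abs(x[i - 1] - y[j - 1]) + soft_min(
                (R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]), gamma)
    return R[len(x)][len(y)]

def soft_dtw_divergence(x, y, gamma=1.0):
    """Debiased Soft DTW divergence of Eq. (12); it vanishes for x == y."""
    return soft_dtw(x, y, gamma) - 0.5 * (soft_dtw(x, x, gamma) + soft_dtw(y, y, gamma))

print(soft_min((3.0, 1.0, 2.0), 0.01))  # close to min = 1.0
print(soft_dtw([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], gamma=0.1))
print(soft_dtw_divergence([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], gamma=0.1))  # 0.0
```

Note that the soft minimum is a smooth lower bound of the hard minimum, which is the source of the entropic bias addressed next.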
<p>Due to entropic bias<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref>, Soft DTW can generate negative values, which would cause issues for our use in time series classification. We apply the following remedy to overcome this drawback:</p>
<disp-formula id="E15"><label>(12)</label><mml:math id="M29"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">Div</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>y</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext class="textrm" mathvariant="normal">DTW</mml:mtext></mml:mrow><mml:mrow><mml:mi>&#x003B3;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>y</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:mo>&#x000B7;</mml:mo><mml:mstyle displaystyle="true"><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext class="textrm" mathvariant="normal">DTW</mml:mtext></mml:mrow><mml:mrow><mml:mi>&#x003B3;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mtext class="textrm" mathvariant="normal">DTW</mml:mtext></mml:mrow><mml:mrow><mml:mi>&#x003B3;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>y</mml:mtext></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle mathvariant="bold"><mml:mtext>y</mml:mtext></mml:mstyle></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mstyle><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>This measure is called the Soft DTW divergence [<xref ref-type="bibr" rid="B43">43</xref>] and will be employed in our experiments.</p></sec>
<sec>
<title>3.3. Matrix Profile Distance</title>
<p>Another time series distance measure that has recently been introduced is the <italic>Matrix Profile Distance</italic> (MP distance, [<xref ref-type="bibr" rid="B30">30</xref>]). This measure is designed for fast computation and for finding similarities between time series.</p>
<p>We now introduce the concept of the matrix profile of two time series <bold>x</bold> and <inline-formula><mml:math id="M205"><mml:mstyle mathvariant="bold"><mml:mover accent='true'><mml:mi>x</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover></mml:mstyle></mml:math></inline-formula>. The matrix profile is based on the subsequences of these two time series. For a fixed window length <italic>L</italic>, the subsequence <bold>x</bold><sub><italic>i,L</italic></sub> of a time series <bold>x</bold> is defined as a contiguous <italic>L</italic>-element subset of <bold>x</bold> via <bold>x</bold><sub><italic>i,L</italic></sub> &#x0003D; (<italic>x</italic><sub><italic>i</italic></sub>, <italic>x</italic><sub><italic>i</italic> &#x0002B; 1</sub>, &#x02026;, <italic>x</italic><sub><italic>i</italic>&#x0002B;<italic>L</italic>&#x02212;1</sub>). The <italic>all-subsequences set</italic> <italic>A</italic> of <bold>x</bold> contains all possible subsequences of <bold>x</bold> with length <italic>L</italic>, <italic>A</italic> &#x0003D; {<bold>x</bold><sub>1,<italic>L</italic></sub>, <bold>x</bold><sub>2,<italic>L</italic></sub>, &#x02026;, <bold>x</bold><sub><italic>m</italic>&#x02212;<italic>L</italic> &#x0002B; 1, <italic>L</italic></sub>}, where <italic>m</italic> is again the length of <bold>x</bold>.</p>
<p>For the matrix profile, we need the all-subsequences sets <italic>A</italic> and <italic>B</italic> of both time series <bold>x</bold> and <inline-formula><mml:math id="M200"><mml:mstyle mathvariant="bold"><mml:mover accent='true'><mml:mi>x</mml:mi><mml:mo>&#x002DC;</mml:mo></mml:mover></mml:mstyle></mml:math></inline-formula>. The matrix profile <bold>P</bold><sub>ABBA</sub> is the set consisting of the closest Euclidean distances from each subsequence in <italic>A</italic> to any subsequence in <italic>B</italic> and vice versa:</p>
<disp-formula id="E16"><mml:math id="M30"><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>P</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">ABBA</mml:mtext></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mstyle mathvariant="bold"><mml:mover><mml:mtext>x</mml:mtext><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mstyle><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mo stretchy="false">|</mml:mo><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mover><mml:mtext>x</mml:mtext><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo><mml:mo stretchy="false">|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>A</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>&#x0222A;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo 
class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>A</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mo stretchy="false">|</mml:mo><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mover><mml:mtext>x</mml:mtext><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo><mml:mo stretchy="false">|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mover><mml:mtext>x</mml:mtext><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:mi>B</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<p>With the matrix profile, we can finally define the MP distance, based on the idea that two time series are similar if they share many similar subsequences. We consider neither the smallest nor the largest value of <bold>P</bold><sub><italic>ABBA</italic></sub>, since either extreme could make the MP distance too coarse or too sensitive. For example, if we had two rather similar time series, but one of them contained a noisy spike or some missing values, then the largest value of the matrix profile could give a wrong impression of the similarity of these two time series. Instead, the distance is defined as</p>
<disp-formula id="E17"><mml:math id="M31"><mml:mrow><mml:mtext class="textrm" mathvariant="normal">MPdist</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle class="math"><mml:mi>k</mml:mi><mml:mtext class="textrm" mathvariant="normal">-th&#x000A0;smallest&#x000A0;value &#x000A0;in&#x000A0;sorted&#x000A0;</mml:mtext></mml:mstyle><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>P</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>A</mml:mi><mml:mi>B</mml:mi><mml:mi>B</mml:mi><mml:mi>A</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where the parameter <italic>k</italic> is typically set to 5% of 2<italic>N</italic> [<xref ref-type="bibr" rid="B30">30</xref>].</p>
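A brute-force sketch of <bold>P</bold><sub>ABBA</sub> and MPdist follows directly from the definitions above (the reference implementation of [30] uses z-normalized subsequences and FFT-based tricks for speed; here we follow the plain Euclidean formulation of the text, and the function name and example series are ours):

```python
import math

def mp_dist(x, y, L, quantile=0.05):
    """Brute-force MPdist: k-th smallest entry of P_ABBA, k = quantile * (len(x) + len(y))."""
    def subsequences(s):
        return [s[i:i + L] for i in range(len(s) - L + 1)]

    def euclid(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    A, B = subsequences(x), subsequences(y)
    p_ab = [min(euclid(a, b) for b in B) for a in A]  # nearest subsequence in B for each a
    p_ba = [min(euclid(a, b) for a in A) for b in B]  # nearest subsequence in A for each b
    p_abba = sorted(p_ab + p_ba)
    k = min(int(quantile * (len(x) + len(y))), len(p_abba) - 1)
    return p_abba[k]

# Two series sharing the same oscillation (one shifted) and one featureless trend.
x1 = [math.sin(2 * math.pi * i / 20) for i in range(60)]
x2 = [math.sin(2 * math.pi * (i + 3) / 20) for i in range(60)]
x3 = [0.05 * i for i in range(60)]

d12 = mp_dist(x1, x2, L=20)
d13 = mp_dist(x1, x3, L=20)
print(d12, d13)  # d12 is tiny, d13 is large
```

The shifted oscillation barely registers because its subsequences all have near-exact matches, while the trend series has none; this mirrors the behavior of the three-series example discussed next.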
<p>We now illustrate the MP distance with an example of three time series of length <italic>N</italic> &#x0003D; 100. Our goal is to compare these time series using the MP distance. We observe that <italic>X</italic><sub>1</sub> and <italic>X</italic><sub>2</sub> have quite similar oscillations, whereas the third time series <italic>X</italic><sub>3</sub> does not share any obvious features with the first two.</p>
<p>The MP distance compares the subsequences of the time series, depending on the window length <italic>L</italic>. Choosing the window length to be <italic>L</italic> &#x0003D; 40, we get the following distances:</p>
<disp-formula id="E18"><mml:math id="M32"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">MPdist</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>433</mml:mn><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">MPdist</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mn>5</mml:mn><mml:mo>.</mml:mo><mml:mn>425</mml:mn><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">MPdist</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mn>5</mml:mn><mml:mo>.</mml:mo><mml:mn>404</mml:mn><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>As we can see, the MP distance identifies the similarity between <italic>X</italic><sub>1</sub> and <italic>X</italic><sub>2</sub> and shows that <italic>X</italic><sub>1</sub>, <italic>X</italic><sub>2</sub> differ from <italic>X</italic><sub>3</sub>. We also want to show that the MP distance depends on the window length <italic>L</italic>. Consider the MP distance between the lower oscillation time series <italic>X</italic><sub>2</sub> and <italic>X</italic><sub>3</sub>, which varies considerably for different values of <italic>L</italic>, as indicated in <xref ref-type="table" rid="T1">Table 1</xref>. For <italic>L</italic> &#x0003D; 10, only a small portion of each time series enters any single comparison, and as a result we observe a small value of the MP distance, which does not properly describe the dissimilarity of <italic>X</italic><sub>2</sub> and <italic>X</italic><sub>3</sub>. For <italic>L</italic> &#x0003D; 40, a larger part of the time series structure enters each comparison. If there is a particular recurring pattern in the time series, the length <italic>L</italic> should be large enough to cover one recurrence. We illustrate the comparison based on different window lengths in <xref ref-type="fig" rid="F4">Figure 4</xref>.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>MP distance depending on the window length.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>L</bold></th>
<th valign="top" align="center"><bold>10</bold></th>
<th valign="top" align="center"><bold>20</bold></th>
<th valign="top" align="center"><bold>30</bold></th>
<th valign="top" align="center"><bold>40</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">MPdist(<italic>X</italic><sub>2</sub>, <italic>X</italic><sub>3</sub>)</td>
<td valign="top" align="center">0.270</td>
<td valign="top" align="center">2.034</td>
<td valign="top" align="center">3.955</td>
<td valign="top" align="center">5.404</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Illustration of Matrix Profile distance <bold>(left)</bold>, subsequences indicated in red with window length <italic>L</italic> &#x0003D; 10 <bold>(middle)</bold> and <italic>L</italic> &#x0003D; 30 <bold>(right)</bold>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fams-07-784855-g0004.tif"/>
</fig>
<p>In the tests, each data set consists of time series of a fixed length that varies between data sets. We therefore have to decide how the window length <italic>L</italic> is chosen automatically within the classifier. An empirical study showed that choosing <italic>L</italic> &#x02248; <italic>N</italic>/2, where <italic>N</italic> is the length of the time series, gives good classification results.</p>
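<p>As an illustration, the MP distance can be sketched in a few lines of Python: for each window length <italic>L</italic>, compute the z-normalized distance from every length-<italic>L</italic> subsequence of one series to its nearest neighbor in the other series, concatenate both profiles, and report the <italic>k</italic>-th smallest value (with <italic>k</italic> about 5&#x00025; of the combined length). This is a naive, quadratic-cost reimplementation written only for exposition; the three series below are hypothetical stand-ins and do not reproduce the values in Table 1.</p>

```python
import numpy as np

def znorm(s):
    sd = s.std()
    return (s - s.mean()) / sd if sd > 1e-12 else s - s.mean()

def cross_profile(A, B, m):
    # distance of each length-m subsequence of A to its nearest neighbor in B
    subs_B = [znorm(B[j:j + m]) for j in range(len(B) - m + 1)]
    return [min(np.linalg.norm(znorm(A[i:i + m]) - b) for b in subs_B)
            for i in range(len(A) - m + 1)]

def mpdist(A, B, m, percentage=0.05):
    # MPdist: k-th smallest entry of the concatenated AB and BA profiles
    p = np.sort(cross_profile(A, B, m) + cross_profile(B, A, m))
    k = min(int(np.ceil(percentage * (len(A) + len(B)))), len(p) - 1)
    return p[k]

t = np.arange(200)
X1 = np.sin(2 * np.pi * t / 50)            # hypothetical smooth series
X2 = np.sin(2 * np.pi * t / 50 + 0.5)      # same pattern, phase-shifted
X3 = np.cumsum(np.random.default_rng(0).standard_normal(200))  # unrelated series
d_sim, d_diff = mpdist(X1, X2, 40), mpdist(X1, X3, 40)
for L in (10, 20, 30, 40):
    print(L, mpdist(X2, X3, L))            # MPdist of dissimilar series per window length
```

<p>Series sharing a recurring pattern yield a much smaller MPdist than unrelated series, mirroring the behavior discussed above.</p>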
<p>We briefly illustrate the computing times of the different distance measures when applied to time series of increasing length in <xref ref-type="fig" rid="F5">Figure 5</xref>. Interestingly, DTW is faster than fastDTW in our experiments. As expected, the Euclidean distance shows the best scalability. We also observe that the computation of SDTW scales worse than the competing approaches when applied to longer time series.</p>
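<p>The scaling behavior can be probed with a toy benchmark like the following. Here <code>euclidean</code> and <code>dtw</code> are naive reference implementations written for this sketch (the quadratic dynamic-programming recursion for DTW without a warping window), not the optimized routines used in our experiments, so absolute times are meaningless and only the growth with the series length is of interest.</p>

```python
import time
import numpy as np

def euclidean(x, y):
    # O(n) lockstep distance
    return float(np.linalg.norm(x - y))

def dtw(x, y):
    # naive O(n^2) dynamic-programming DTW without a warping window
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

rng = np.random.default_rng(0)
for n in (100, 200, 400):                   # doubling the series length
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    t0 = time.perf_counter(); euclidean(x, y); t_eu = time.perf_counter() - t0
    t0 = time.perf_counter(); dtw(x, y); t_dtw = time.perf_counter() - t0
    print(f"n={n}: euclidean {t_eu:.2e}s, dtw {t_dtw:.2e}s")
```

<p>Doubling the length roughly quadruples the DTW cost while the Euclidean cost grows only linearly, in line with the scaling shown in Figure 5.</p>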
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Runtimes of distance computation between a single pair of time series with increasing length.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fams-07-784855-g0005.tif"/>
</fig></sec></sec>
<sec id="s4">
<title>4. Semi-Supervised Learning Based on Graph Laplacians</title>
<p>In this section, we focus mainly on two methods that have recently gained wide attention. The first method is inspired by a partial differential equation model originating from material science, and the second approach is based on neural networks that incorporate the graph structure of the labeled and unlabeled data.</p>
<sec>
<title>4.1. Semi-supervised Learning With Phase Field Methods: Allen&#x02013;Cahn Model</title>
<p>Within the material science community, phase field methods have been developed to model the phase separation of a multicomponent alloy system (cf. [<xref ref-type="bibr" rid="B45">45</xref>, <xref ref-type="bibr" rid="B46">46</xref>]). The evolution of the phases over time is described by a partial differential equation (PDE) model, such as the Allen&#x02013;Cahn [<xref ref-type="bibr" rid="B46">46</xref>] or Cahn&#x02013;Hilliard equation [<xref ref-type="bibr" rid="B47">47</xref>], both non-linear reaction-diffusion equations of second and fourth order, respectively. These equations can be obtained as gradient flows of the Ginzburg&#x02013;Landau energy functional</p>
<disp-formula id="E19"><mml:math id="M33"><mml:mstyle displaystyle="true"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">E</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x0222B;</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003B5;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:mo stretchy="false">|</mml:mo><mml:mo>&#x02207;</mml:mo><mml:mi>u</mml:mi><mml:msup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B5;</mml:mi></mml:mrow></mml:mfrac><mml:mi>&#x003D5;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mstyle></mml:math></disp-formula>
<p>where <italic>u</italic> is the order parameter and &#x003B5; a parameter reflecting the width of the interface between the pure phases. The polynomial &#x003D5; is chosen to have minima at the pure phases, namely <italic>u</italic> &#x0003D; &#x02212;1 and <italic>u</italic> &#x0003D; 1, to enforce that a minimization of the Ginzburg&#x02013;Landau energy will lead to phase separation. A common choice is the well-known double-well potential <inline-formula><mml:math id="M34"><mml:mi>&#x003D5;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:mfrac><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>.</mml:mo></mml:math></inline-formula> The Dirichlet energy term |&#x02207;<italic>u</italic>|<sup>2</sup> corresponds to minimization of the interfacial length. The minimization is then performed using a gradient flow, which leads to the Allen-Cahn equation</p>
<disp-formula id="E20"><label>(13)</label><mml:math id="M35"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>&#x00394;</mml:mo><mml:mi>u</mml:mi><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B5;</mml:mi></mml:mrow></mml:mfrac><mml:msup><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>equipped with appropriate boundary and initial conditions. A modified Allen&#x02013;Cahn equation was used for image inpainting, i.e., restoring damaged parts of an image, where a misfit term &#x003C9;(<italic>f</italic>&#x02212;<italic>u</italic>) is added to Equation (13) (cf. [<xref ref-type="bibr" rid="B48">48</xref>, <xref ref-type="bibr" rid="B49">49</xref>]). Here, &#x003C9; is a penalty parameter and <italic>f</italic> is a function equal to the undamaged image parts or, later, the training data. In [<xref ref-type="bibr" rid="B31">31</xref>], Bertozzi and Flenner extended this idea to the case of semi-supervised learning, where the training data correspond to the undamaged image parts, i.e., the function <italic>f</italic>. Their idea is to consider the modified energy of the following form</p>
<disp-formula id="E21"><label>(14)</label><mml:math id="M36"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>E</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003B5;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">sym</mml:mtext></mml:mrow></mml:msub><mml:mi>u</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>4</mml:mn><mml:mi>&#x003B5;</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>f</italic><sub><italic>i</italic></sub> holds the already assigned labels. Here, the first term in (14) reflects the RatioCut based on the graph Laplacian, the second term enforces the pure phases, and the third term corresponds to incorporating the training data. Numerically, this system is solved using a convexity splitting approach [<xref ref-type="bibr" rid="B31">31</xref>] where we write</p>
<disp-formula id="E22"><mml:math id="M40"><mml:mi>E</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>with</p>
<disp-formula id="E23"><mml:math id="M41"><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003B5;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">sym</mml:mtext></mml:mrow></mml:msub><mml:mi>u</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mfrac><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mi>u</mml:mi></mml:math></disp-formula>
<p>and</p>
<disp-formula id="E24"><mml:math id="M42"><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>:</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mi>u</mml:mi><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>4</mml:mn><mml:mi>&#x003B5;</mml:mi></mml:mrow></mml:mfrac><mml:mo>&#x0002B;</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where the positive parameter <italic>c</italic> &#x02208; &#x0211D; ensures convexity of both energies. In order to compute the minimizer of the above energy we use a gradient scheme where</p>
<disp-formula id="E25"><mml:math id="M43"><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mi>&#x003C4;</mml:mi></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mo>&#x02207;</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mo>&#x02207;</mml:mo><mml:msub><mml:mrow><mml:mi>E</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where the indices <italic>l, l</italic> &#x0002B; 1 indicate the current and next time step, respectively. The variable &#x003C4; is a hyperparameter but can be interpreted as a pseudo time-step. In more detail, following the notation of [<xref ref-type="bibr" rid="B20">20</xref>], this leads to</p>
<disp-formula id="E26"><mml:math id="M44"><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mi>&#x003C4;</mml:mi></mml:mrow></mml:mfrac><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B5;</mml:mi><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">sym</mml:mtext></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:mi>c</mml:mi><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>c</mml:mi><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B5;</mml:mi></mml:mrow></mml:mfrac><mml:mo>&#x02207;</mml:mo><mml:mi>&#x003C8;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mo>&#x02207;</mml:mo><mml:mi>&#x003D5;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>with</p>
<disp-formula id="E27"><mml:math id="M45"><mml:mi>&#x003C8;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mi>&#x003D5;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msubsup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:math></disp-formula>
<p>Expanding the order parameter in the eigenvectors &#x003D5;<sub><italic>i</italic></sub> corresponding to the <italic>m</italic><sub><italic>e</italic></sub> smallest eigenvalues of <italic>L</italic><sub>sym</sub> via <inline-formula><mml:math id="M46"><mml:mi>u</mml:mi><mml:mo>=</mml:mo><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munderover><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mi>a</mml:mi></mml:math></inline-formula>, where <italic>a</italic> is a coefficient vector and &#x003A6;<sub><italic>m</italic><sub><italic>e</italic></sub></sub> &#x0003D; [&#x003D5;<sub>1</sub>, &#x02026;, &#x003D5;<sub><italic>m</italic><sub><italic>e</italic></sub></sub>], we arrive at</p>
<disp-formula id="E28"><mml:math id="M47"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B5;</mml:mi><mml:mi>&#x003C4;</mml:mi><mml:msub><mml:mrow><mml:mo>&#x003BB;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mi>c</mml:mi><mml:mi>&#x003C4;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msubsup><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003C4;</mml:mi><mml:mi>c</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msubsup><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003C4;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B5;</mml:mi></mml:mrow></mml:mfrac><mml:msubsup><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003C4;</mml:mi><mml:msubsup><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:mo>&#x02200;</mml:mo><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula>
<p>using</p>
<disp-formula id="E29"><mml:math id="M48"><mml:msup><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x02207;</mml:mo><mml:mi>&#x003C8;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:msup><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x02207;</mml:mo><mml:mi>&#x003D5;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mo>&#x003A6;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:math></disp-formula>
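<p>The resulting spectral scheme can be sketched in a few lines of numpy. The code below is our own minimal illustrative implementation on a toy graph of two weakly connected cliques; the graph, all parameter values, and the concrete scaling of the double-well and fidelity gradients are assumptions for the sketch, not the authors' settings.</p>

```python
import numpy as np

def allen_cahn_ssl(W, f, omega, eps=1.0, c=3.0, tau=0.1, m_e=5, iters=500):
    # symmetric graph Laplacian L_sym = I - D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    Dis = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(d)) - Dis @ W @ Dis
    lam, Phi = np.linalg.eigh(L_sym)          # eigenvalues in ascending order
    lam, Phi = lam[:m_e], Phi[:, :m_e]        # keep the m_e smallest eigenpairs
    a = Phi.T @ f                             # spectral coefficients of u^0 = f
    for _ in range(iters):
        u = Phi @ a
        b = Phi.T @ (u * (u ** 2 - 1.0))      # projected double-well gradient
        dvec = Phi.T @ (omega * (f - u))      # projected fidelity gradient
        a = ((1.0 + tau * c) * a - (tau / eps) * b + tau * dvec) \
            / (1.0 + eps * tau * lam + c * tau)
    return Phi @ a

# toy graph: two cliques of five nodes joined by one weak edge
n = 10
W = np.zeros((n, n))
W[:5, :5] = W[5:, 5:] = 1.0
np.fill_diagonal(W, 0.0)
W[4, 5] = W[5, 4] = 0.01
f = np.zeros(n); f[0], f[5] = 1.0, -1.0         # one labeled node per class
omega = np.zeros(n); omega[0] = omega[5] = 5.0  # fidelity only on labeled nodes
u = allen_cahn_ssl(W, f, omega)
print(np.sign(u))                               # order parameter separates the cliques
```

<p>Each iteration only requires work in the <italic>m</italic><sub><italic>e</italic></sub>-dimensional eigenspace, which is the main computational appeal of the approach: the diffusion term is treated implicitly (and diagonally, thanks to the eigenbasis) while the non-linear terms are evaluated explicitly.</p>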
<p>In [<xref ref-type="bibr" rid="B50">50</xref>], the authors extend this to the case of multiple classes, where again the spectral information of the graph Laplacian is crucial, as the energy term includes <inline-formula><mml:math id="M49"><mml:mfrac><mml:mrow><mml:mi>&#x003B5;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">tr</mml:mtext></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">sym</mml:mtext></mml:mstyle></mml:mrow></mml:msub><mml:mi>U</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> with <italic>U</italic> &#x02208; &#x0211D;<sup><italic>n, s</italic></sup>, <italic>s</italic> being the number of classes for segmentation, and tr being the trace of the matrix. Details of the definition of the potential and the fidelity term incorporating the training data are found in [<xref ref-type="bibr" rid="B50">50</xref>]. Further extensions of this approach have been suggested in [<xref ref-type="bibr" rid="B20">20</xref>, <xref ref-type="bibr" rid="B22">22</xref>, <xref ref-type="bibr" rid="B51">51</xref>&#x02013;<xref ref-type="bibr" rid="B55">55</xref>].</p></sec>
<sec>
<title>4.2. Semi-supervised Learning Based on Graph Convolutional Networks</title>
<p>Artificial neural networks and in particular deep neural networks have shown outstanding performance in many learning tasks [<xref ref-type="bibr" rid="B56">56</xref>, <xref ref-type="bibr" rid="B57">57</xref>]. The incorporation of additional structural information via a graph structure has received wide attention [<xref ref-type="bibr" rid="B24">24</xref>] with particular success within the semi-supervised learning formulation [<xref ref-type="bibr" rid="B21">21</xref>].</p>
<p>Let <inline-formula><mml:math id="M50"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> denote the hidden feature vector of the <italic>i</italic>-th node in the <italic>l</italic>-th layer. The feature mapping of a simple multilayer perceptron (MLP) computes the new features by multiplying with a weight matrix &#x00398;<sup>(<italic>l</italic>)<italic>T</italic></sup> and adding a bias vector <bold>b</bold><sup>(<italic>l</italic>)</sup>, then applying a (potentially layer-dependent) ReLU activation function &#x003C3;<sub><italic>l</italic></sub> in all layers except the last. This layer operation can be written as <inline-formula><mml:math id="M51"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mstyle displaystyle="true"><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>l</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mstyle 
mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mstyle></mml:math></inline-formula>.</p>
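<p>In code, such a perceptron layer is essentially a one-liner; the following numpy sketch (names and example values are our own) makes the affine map and the conditional activation explicit.</p>

```python
import numpy as np

def mlp_layer(h, Theta, b, last=False):
    # affine map Theta^T h + b, followed by ReLU except in the last layer
    z = Theta.T @ h + b
    return z if last else np.maximum(z, 0.0)

h = np.array([1.0, 2.0])                     # hidden features of one node
Theta = np.array([[1.0, 0.0], [0.0, -1.0]])  # weight matrix of the layer
out = mlp_layer(h, Theta, np.zeros(2))       # ReLU clips the negative component
```
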
<p>In Graph Neural Networks, the features are additionally propagated along the edges of the graph. This is achieved by forming weighted sums over the local neighborhood of each node, leading to</p>
<disp-formula id="E30"><label>(15)</label><mml:math id="M52"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0222A;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:munder></mml:mstyle><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x00175;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac><mml:msup><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:mi>l</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>)</mml:mo><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Here, <inline-formula><mml:math id="M53"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the set of neighbors of node <italic>i</italic>, &#x00398;<sup>(<italic>l</italic>)</sup> and <bold>b</bold><sup>(<italic>l</italic>)</sup> the trainable parameters of layer <italic>l</italic>, the &#x00175;<sub><italic>ij</italic></sub> denote the entries of the adjacency matrix <italic>W</italic> with added self loops, &#x00174; &#x0003D; <italic>W</italic> &#x0002B; <italic>I</italic>, and the <inline-formula><mml:math id="M54"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denote the row sums of that matrix. By adding the self loops, it is ensured that the original features of that node are maintained in the weighted sum.</p>
<p>To obtain a matrix formulation, we can accumulate state matrices <italic>X</italic><sup>(<italic>l</italic>)</sup> whose <italic>n</italic> rows are the feature vectors <inline-formula><mml:math id="M55"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> for <italic>i</italic> &#x0003D; 1, &#x02026;, <italic>n</italic>. The propagation scheme of a simple two-layer graph convolutional network can then be written as</p>
<disp-formula id="E31"><label>(16)</label><mml:math id="M56"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none" equalcolumns="false" class="array"><mml:mtr><mml:mtd columnalign="left"><mml:msup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>&#x003C3;</mml:mi><mml:mstyle displaystyle="true"><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mi>&#x00174;</mml:mi><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mstyle></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="left"><mml:msup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mi>&#x00174;</mml:mi><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mo>&#x00398;</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>b</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M57"><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> is the diagonal matrix holding the <inline-formula><mml:math id="M58"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>.</p>
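<p>The two-layer propagation scheme above can be sketched in a few lines of dense NumPy code. This is a minimal illustration rather than the implementation used in our experiments; all function and variable names are ours, and the trainable parameters are treated as given.</p>

```python
import numpy as np

def gcn_forward(W, X0, params, sigma=lambda Z: np.maximum(Z, 0.0)):
    """Two-layer GCN propagation (dense NumPy sketch).

    W      : (n, n) symmetric adjacency matrix without self loops
    X0     : (n, f0) input feature matrix
    params : [(Theta1, b1), (Theta2, b2)] per-layer weights and biases
    sigma  : elementwise nonlinearity of the first layer (here ReLU)
    """
    n = W.shape[0]
    W_hat = W + np.eye(n)                  # add self loops: W^ = W + I
    d_hat = W_hat.sum(axis=1)              # row sums d^_i
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d_hat))
    S = D_inv_sqrt @ W_hat @ D_inv_sqrt    # normalized propagation matrix

    (Theta1, b1), (Theta2, b2) = params
    X1 = sigma(S @ X0 @ Theta1 + b1)       # first layer with nonlinearity
    X2 = S @ X1 @ Theta2 + b2              # second layer (output logits)
    return X2
```

<p>In practice the parameters &#x00398;<sup>(<italic>l</italic>)</sup> and <bold>b</bold><sup>(<italic>l</italic>)</sup> are trained by backpropagation on the labeled nodes; here the function only evaluates the forward pass.</p>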
<p>Multiplication with <inline-formula><mml:math id="M59"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mi>&#x00174;</mml:mi><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> can also be understood in a spectral sense as performing <italic>graph convolution</italic> with the spectral filter function &#x003C6;(&#x003BB;) &#x0003D; 1 &#x02212; &#x003BB;. This filter originates from truncating a Chebyshev polynomial to first order as discussed in [<xref ref-type="bibr" rid="B58">58</xref>]. As a result of this filter the eigenvalues &#x003BB; of the graph Laplacian operator <inline-formula><mml:math id="M60"><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow></mml:math></inline-formula> (formed in this case <italic>after</italic> adding the self loops) are transformed via &#x003C6; to obtain damping coefficients for the corresponding eigenvectors. This filter has been shown to lead to convolutional layers equivalent to aggregating node representations from their direct neighborhood (cf. [<xref ref-type="bibr" rid="B58">58</xref>] for more information).</p>
<p>It has been noted, e.g., in [<xref ref-type="bibr" rid="B59">59</xref>] that traditional graph neural networks, including GCN, are mostly targeted at <italic>sparse</italic> graphs, where each node is connected to only a small number of neighbors. The fully connected graphs that we utilize in this work pose challenges for GCN because of their spectral properties. Most notably, these <italic>dense</italic> graphs typically have large eigengaps, i.e., the gap between the smallest eigenvalue &#x003BB;<sub>1</sub> &#x0003D; 0 and the second eigenvalue &#x003BB;<sub>2</sub> &#x0003E; 0 may be close to 1. Hence the GCN filter acts almost like a projection onto the undesirable eigenvector &#x003D5;<sub>1</sub>. However, the same work observed that in some applications, GCNs applied to <italic>sparsified</italic> graphs yield results comparable to dedicated dense methods. For this reason, our experiments apply the standard GCN only to a <italic>k</italic>-nearest neighbor subgraph.</p></sec>
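<p>The <italic>k</italic>-nearest neighbor sparsification of a dense similarity matrix can be sketched as follows. This is an illustrative implementation under our own conventions (keep the <italic>k</italic> largest weights per row, then symmetrize); the variant used in the experiments may differ in details.</p>

```python
import numpy as np

def knn_sparsify(W, k=10):
    """Keep only the k largest weights in each row of a dense
    similarity matrix W, then symmetrize so an edge survives if
    either endpoint retains it."""
    n = W.shape[0]
    W = W.copy()
    np.fill_diagonal(W, 0.0)              # no self loops at this stage
    A = np.zeros_like(W)
    for i in range(n):
        nbrs = np.argsort(W[i])[-k:]      # indices of the k strongest edges
        A[i, nbrs] = W[i, nbrs]
    return np.maximum(A, A.T)             # symmetrize the kept edges
```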
<sec>
<title>4.3. Other Semi-supervised Learning Methods</title>
<p>In the context of graph-based semi-supervised learning a rather straightforward approach follows from minimizing the following objective</p>
<disp-formula id="E32"><label>(17)</label><mml:math id="M61"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:mi>u</mml:mi><mml:mo>-</mml:mo><mml:mi>f</mml:mi><mml:msubsup><mml:mrow><mml:mo>|</mml:mo><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:msup><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">sym</mml:mtext></mml:mrow></mml:msub><mml:mi>u</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>f</italic> holds the values 1, &#x02212;1, and 0 according to the labeled and unlabeled data. Setting the derivative of this objective to zero shows that, in order to obtain <italic>u</italic>, we need to solve the following <italic>linear system</italic> of equations</p>
<disp-formula id="E33"><mml:math id="M62"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>I</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mtext class="textrm" mathvariant="normal">sym</mml:mtext></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>u</mml:mi><mml:mo>=</mml:mo><mml:mi>f</mml:mi></mml:math></disp-formula>
<p>where <italic>I</italic> is the identity matrix of the appropriate dimensionality.</p>
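<p>A minimal dense sketch of this approach, assuming a symmetric weight matrix <italic>W</italic> with positive degrees; thresholding <italic>u</italic> by its sign as the final classification step is our illustration and not spelled out above.</p>

```python
import numpy as np

def laplacian_ssl(W, f, beta=1.0):
    """Solve (I + beta * L_sym) u = f and classify by sign(u).

    W : (n, n) symmetric weight matrix
    f : (n,) vector with entries +1/-1 on labeled points, 0 elsewhere
    """
    d = W.sum(axis=1)                       # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt  # normalized Laplacian
    u = np.linalg.solve(np.eye(len(d)) + beta * L_sym, f)
    return np.sign(u)                       # predicted binary labels
```

<p>For large <italic>n</italic> one would of course use a sparse or low-rank representation of <italic>L</italic><sub>sym</sub> and an iterative solver instead of a dense factorization.</p>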
<p>Furthermore, we compare our previously introduced approaches to the well-known one-nearest neighbor (1NN) method. In the context of time series classification this method was proposed in [<xref ref-type="bibr" rid="B11">11</xref>]. In each iteration, we identify the indices <italic>i, j</italic> with the shortest distance between the labeled sample <bold>x</bold><sub><italic>i</italic></sub> and the unlabeled sample <bold>x</bold><sub><italic>j</italic></sub>. The label of <bold>x</bold><sub><italic>i</italic></sub> is then copied to <bold>x</bold><sub><italic>j</italic></sub>. This process is repeated until no unlabeled data remain.</p>
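<p>The iterative 1NN procedure can be sketched as follows, assuming a precomputed dense distance matrix; encoding unlabeled points by NaN is our convention for the illustration.</p>

```python
import numpy as np

def one_nn_propagation(D, labels):
    """Iterative 1NN labeling: repeatedly copy the label across the
    shortest labeled-to-unlabeled distance until all points are labeled.

    D      : (n, n) symmetric distance matrix
    labels : (n,) array with NaN marking unlabeled points
    """
    labels = np.asarray(labels, dtype=float).copy()
    while np.isnan(labels).any():
        lab = np.where(~np.isnan(labels))[0]
        unl = np.where(np.isnan(labels))[0]
        sub = D[np.ix_(lab, unl)]               # labeled-to-unlabeled distances
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        labels[unl[j]] = labels[lab[i]]         # copy the nearest label
    return labels
```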
<p>In [<xref ref-type="bibr" rid="B60">60</xref>], the authors construct several graph Laplacians and then perform the semi-supervised learning based on a weighted sum of the Laplacian matrices.</p></sec></sec>
<sec id="s5">
<title>5. Numerical Experiments</title>
<p>In this section, we illustrate how the algorithms discussed in this paper perform when applied to multiple time series data sets. We focus here on binary classification and use time series taken from the UCR time series classification archive<xref ref-type="fn" rid="fn0004"><sup>4</sup></xref> [<xref ref-type="bibr" rid="B61">61</xref>]. All our code can be found at <ext-link ext-link-type="uri" xlink:href="https://github.com/dominikalfke/TimeSeriesSSL">https://github.com/dominikalfke/TimeSeriesSSL</ext-link>. The distance measures we use are the previously introduced DTW, Soft DTW divergence, MP, and Euclidean distances. For completeness, we list the default parameters for all methods in <xref ref-type="table" rid="T2">Table 2</xref>.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Default parameters used in the experiments.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="left"><bold>Parameters and default values</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Allen&#x02013;Cahn</td>
<td valign="top" align="left"><italic>m</italic><sub><italic>e</italic></sub> &#x0003D; 20, <inline-formula><mml:math id="M37"><mml:mi>&#x003B5;</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac></mml:math></inline-formula>, <inline-formula><mml:math id="M38"><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003B5;</mml:mi></mml:mrow></mml:mfrac><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003C9;</mml:mi><mml:mo>,</mml:mo></mml:math></inline-formula> &#x003C9; &#x0003D; 1<italic>e</italic>10, &#x003C4; &#x0003D; 0.01, <italic>tol</italic> &#x0003D; 1<italic>e</italic> &#x02212; 8</td>
</tr>
<tr>
<td valign="top" align="left">GCN</td>
<td valign="top" align="left">10-NN sparsification, <italic>h</italic> &#x0003D; 32, dropout <italic>p</italic> &#x0003D; 0.5, <sc>Adam</sc> optimization [<xref ref-type="bibr" rid="B62">62</xref>], learning rate 0.01, weight decay 0.0005, 500 epochs</td>
</tr>
<tr>
<td valign="top" align="left">Linear System</td>
<td valign="top" align="left">&#x003B2; &#x0003D; 1, <italic>tol</italic> &#x0003D; 1<italic>e</italic> &#x02212; 5</td>
</tr>
<tr>
<td valign="top" align="left">1NN</td>
<td valign="top" align="left">&#x02014;</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We present the numerical results in the following order. We begin by exploring the dependence of our schemes on some of the hyperparameters inherent in their derivation: first the self-tuning parameter, i.e., which neighbor is chosen to compute the local scaling, and then the performance of the Allen&#x02013;Cahn model depending on the number of eigenpairs used in the approximation of the graph Laplacian. For our main study, we pair every distance measure with every learning method and report the results on all datasets. Finally, we investigate how the methods&#x00027; performance depends on the amount of available training data using random training splits.</p>
<sec>
<title>5.1. Self-Tuning Values</title>
<p>In section 2, we proposed the use of the self-tuning approach for the Gaussian function within the weight matrix. The crucial hyperparameter we want to explore now is the choice of neighbor <italic>k</italic> for the construction of &#x003C3;<sub><italic>i</italic></sub> &#x0003D; dist(<bold>x</bold><sub><italic>i</italic></sub>, <bold>x</bold><sub><italic>k,i</italic></sub>) with <bold>x</bold><sub><italic>k,i</italic></sub> the <italic>k</italic>-th nearest neighbor of the data point <bold>x</bold><sub><italic>i</italic></sub>. We can see from <xref ref-type="table" rid="T3">Table 3</xref> that the small values <italic>k</italic> &#x0003D; 7, 20 perform quite well in comparison to the larger self-tuning parameters. As a result we will use these smaller values in all further computations.</p>
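<p>As an illustration, the self-tuning weight matrix can be built from a distance matrix as below. The specific Gaussian form <italic>w</italic><sub><italic>ij</italic></sub> &#x0003D; exp(&#x02212;dist(<bold>x</bold><sub><italic>i</italic></sub>, <bold>x</bold><sub><italic>j</italic></sub>)<sup>2</sup>/(&#x003C3;<sub><italic>i</italic></sub>&#x003C3;<sub><italic>j</italic></sub>)) is the common self-tuning construction and is assumed here; the exact formula of section 2 may differ in constants.</p>

```python
import numpy as np

def self_tuning_weights(D, k=7):
    """Self-tuning Gaussian weights from a dense distance matrix D.

    sigma_i is the distance from x_i to its k-th nearest neighbor
    (column 0 of the sorted distances is the zero self-distance).
    """
    sigma = np.sort(D, axis=1)[:, k]                  # local scaling sigma_i
    W = np.exp(-D**2 / np.outer(sigma, sigma))        # w_ij = exp(-d_ij^2 / (s_i s_j))
    np.fill_diagonal(W, 0.0)                          # no self loops
    return W
```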
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Study of self-tuning parameters.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th/>
<th/>
<th valign="top" align="center"><bold><italic>k</italic> &#x0003D; 7 (%)</bold></th>
<th valign="top" align="center"><bold><italic>k</italic> &#x0003D; 20 (%)</bold></th>
<th valign="top" align="center"><bold><inline-formula><mml:math id="M39"><mml:mstyle mathvariant="bold"><mml:mtext>k</mml:mtext></mml:mstyle><mml:mstyle mathvariant="bold"><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msqrt></mml:mstyle></mml:math></inline-formula> (%)</bold></th>
<th valign="top" align="center"><bold><italic>k</italic> &#x0003D; 0.1<italic>n</italic> (%)</bold></th>
<th valign="top" align="center"><bold><italic>k</italic> &#x0003D; 0.05<italic>n</italic> (%)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="7"><bold>ECG200</bold> <bold>(<italic>n</italic> &#x0003D; 200)</bold></td>
</tr>
<tr>
<td valign="top" align="left">MPDist</td>
<td valign="top" align="left">GCN</td>
<td valign="top" align="center"><bold>83.58</bold></td>
<td valign="top" align="center">81.74</td>
<td valign="top" align="center">81.90</td>
<td valign="top" align="center">81.74</td>
<td valign="top" align="center">82.54</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Allen-Cahn</td>
<td valign="top" align="center"><bold>81.00</bold></td>
<td valign="top" align="center">79.00</td>
<td valign="top" align="center">80.00</td>
<td valign="top" align="center">79.00</td>
<td valign="top" align="center">80.00</td>
</tr>
<tr>
<td valign="top" align="left">SDTW</td>
<td valign="top" align="left">GCN</td>
<td valign="top" align="center"><bold>91.95</bold></td>
<td valign="top" align="center">91.34</td>
<td valign="top" align="center">90.70</td>
<td valign="top" align="center">91.43</td>
<td valign="top" align="center">90.55</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Allen-Cahn</td>
<td valign="top" align="center"><bold>92.00</bold></td>
<td valign="top" align="center">90.00</td>
<td valign="top" align="center">91.00</td>
<td valign="top" align="center">90.00</td>
<td valign="top" align="center">91.00</td>
</tr>
<tr>
<td valign="top" align="left">DTW</td>
<td valign="top" align="left">GCN</td>
<td valign="top" align="center">88.92</td>
<td valign="top" align="center">86.76</td>
<td valign="top" align="center">87.43</td>
<td valign="top" align="center">86.76</td>
<td valign="top" align="center"><bold>88.97</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Allen-Cahn</td>
<td valign="top" align="center">82.00</td>
<td valign="top" align="center">82.00</td>
<td valign="top" align="center"><bold>83.00</bold></td>
<td valign="top" align="center">82.00</td>
<td valign="top" align="center">82.00</td>
</tr>
<tr>
<td valign="top" align="left" colspan="7"><bold>SonyAIBORobotSurface1</bold> <bold>(<italic>n</italic> &#x0003D; 621)</bold></td>
</tr>
<tr>
<td valign="top" align="left">MPDist</td>
<td valign="top" align="left">GCN</td>
<td valign="top" align="center"><bold>95.45</bold></td>
<td valign="top" align="center">88.74</td>
<td valign="top" align="center">93.08</td>
<td valign="top" align="center">78.10</td>
<td valign="top" align="center">89.62</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Allen-Cahn</td>
<td valign="top" align="center"><bold>75.54</bold></td>
<td valign="top" align="center">72.88</td>
<td valign="top" align="center">73.04</td>
<td valign="top" align="center">75.37</td>
<td valign="top" align="center">73.71</td>
</tr>
<tr>
<td valign="top" align="left">SDTW</td>
<td valign="top" align="left">GCN</td>
<td valign="top" align="center">90.32</td>
<td valign="top" align="center">91.46</td>
<td valign="top" align="center">92.48</td>
<td valign="top" align="center">87.34</td>
<td valign="top" align="center"><bold>92.85</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Allen-Cahn</td>
<td valign="top" align="center"><bold>93.68</bold></td>
<td valign="top" align="center">85.19</td>
<td valign="top" align="center">82.36</td>
<td valign="top" align="center">81.36</td>
<td valign="top" align="center">82.36</td>
</tr>
<tr>
<td valign="top" align="left">DTW</td>
<td valign="top" align="left">GCN</td>
<td valign="top" align="center"><bold>97.59</bold></td>
<td valign="top" align="center">97.58</td>
<td valign="top" align="center">97.48</td>
<td valign="top" align="center">96.49</td>
<td valign="top" align="center">97.35</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Allen-Cahn</td>
<td valign="top" align="center">84.03</td>
<td valign="top" align="center">86.85</td>
<td valign="top" align="center">87.69</td>
<td valign="top" align="center">87.19</td>
<td valign="top" align="center"><bold>88.19</bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="7"><bold>ECGFiveDays</bold> <bold>(<italic>n</italic> &#x0003D; 884)</bold></td>
</tr>
<tr>
<td valign="top" align="left">MPDist</td>
<td valign="top" align="left">GCN</td>
<td valign="top" align="center">99.70</td>
<td valign="top" align="center"><bold>99.77</bold></td>
<td valign="top" align="center">99.51</td>
<td valign="top" align="center">99.66</td>
<td valign="top" align="center">99.15</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Allen-Cahn</td>
<td valign="top" align="center">89.89</td>
<td valign="top" align="center">90.71</td>
<td valign="top" align="center">95.35</td>
<td valign="top" align="center">95.82</td>
<td valign="top" align="center"><bold>96.40</bold></td>
</tr>
<tr>
<td valign="top" align="left">SDTW</td>
<td valign="top" align="left">GCN</td>
<td valign="top" align="center">97.30</td>
<td valign="top" align="center">97.11</td>
<td valign="top" align="center"><bold>97.31</bold></td>
<td valign="top" align="center">96.49</td>
<td valign="top" align="center">97.06</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Allen-Cahn</td>
<td valign="top" align="center">82.00</td>
<td valign="top" align="center">86.99</td>
<td valign="top" align="center">85.48</td>
<td valign="top" align="center">86.76</td>
<td valign="top" align="center"><bold>87.57</bold></td>
</tr>
<tr>
<td valign="top" align="left">DTW</td>
<td valign="top" align="left">GCN</td>
<td valign="top" align="center">97.22</td>
<td valign="top" align="center">97.19</td>
<td valign="top" align="center"><bold>97.39</bold></td>
<td valign="top" align="center">97.20</td>
<td valign="top" align="center">97.35</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Allen-Cahn</td>
<td valign="top" align="center"><bold>77.35</bold></td>
<td valign="top" align="center">76.31</td>
<td valign="top" align="center">75.72</td>
<td valign="top" align="center">73.17</td>
<td valign="top" align="center">74.68</td>
</tr>
<tr>
<td valign="top" align="left" colspan="7"><bold>TwoLeadECG</bold> <italic><bold>(n = 1,162)</bold></italic></td>
</tr>
<tr>
<td valign="top" align="left">MPDist</td>
<td valign="top" align="left">GCN</td>
<td valign="top" align="center"><bold>99.81</bold></td>
<td valign="top" align="center">99.78</td>
<td valign="top" align="center"><bold>99.81</bold></td>
<td valign="top" align="center">99.62</td>
<td valign="top" align="center">99.74</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Allen-Cahn</td>
<td valign="top" align="center"><bold>99.12</bold></td>
<td valign="top" align="center">97.10</td>
<td valign="top" align="center">96.49</td>
<td valign="top" align="center">97.72</td>
<td valign="top" align="center">96.57</td>
</tr>
<tr>
<td valign="top" align="left">SDTW</td>
<td valign="top" align="left">GCN</td>
<td valign="top" align="center"><bold>92.10</bold></td>
<td valign="top" align="center">90.74</td>
<td valign="top" align="center">90.53</td>
<td valign="top" align="center">89.98</td>
<td valign="top" align="center">90.72</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Allen-Cahn</td>
<td valign="top" align="center"><bold>97.19</bold></td>
<td valign="top" align="center">93.24</td>
<td valign="top" align="center">91.04</td>
<td valign="top" align="center">87.27</td>
<td valign="top" align="center">87.71</td>
</tr>
<tr>
<td valign="top" align="left">DTW</td>
<td valign="top" align="left">GCN</td>
<td valign="top" align="center">92.94</td>
<td valign="top" align="center">94.04</td>
<td valign="top" align="center">94.98</td>
<td valign="top" align="center">93.97</td>
<td valign="top" align="center"><bold>96.49</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Allen-Cahn</td>
<td valign="top" align="center">93.85</td>
<td valign="top" align="center">92.36</td>
<td valign="top" align="center">92.10</td>
<td valign="top" align="center"><bold>94.12</bold></td>
<td valign="top" align="center">93.50</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Bold values indicate most accurate classification</italic>.</p>
</table-wrap-foot>
</table-wrap></sec>
<sec>
<title>5.2. Spectral Approximation</title>
<p>As described in section 4, the Allen&#x02013;Cahn equation is projected onto a lower-dimensional space spanned by the eigenvectors corresponding to the smallest eigenvalues of the graph Laplacian. We now investigate how the number of eigenvectors used impacts the accuracy. In the following, we vary the number of eigenvalues from 10 up to values close to the number of data points and compare the performance of the Allen&#x02013;Cahn method on three different datasets. The results are shown in <xref ref-type="table" rid="T4">Table 4</xref>: using a large number of eigenvectors does not improve the classification accuracy and in fact degrades it substantially. As a result, a small number of eigenpair computations suffices, which also reduces the work within the Allen&#x02013;Cahn scheme itself. The comparison was done for the self-tuning parameter <italic>k</italic> &#x0003D; 7.</p>
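<p>The required eigenpairs can be obtained with a standard sparse eigensolver; the sketch below uses SciPy and is illustrative, since the eigensolver used in our implementation is not fixed by the discussion above.</p>

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def smallest_eigenpairs(L_sym, m=20):
    """Return the m smallest eigenpairs of the symmetric graph
    Laplacian, sorted ascending. These eigenvectors span the subspace
    onto which the Allen-Cahn equation is projected."""
    vals, vecs = eigsh(L_sym, k=m, which="SM")  # smallest-magnitude eigenvalues
    order = np.argsort(vals)                    # enforce ascending order
    return vals[order], vecs[:, order]
```

<p>Since only <italic>m</italic> &#x0226A; <italic>n</italic> eigenpairs are needed, this is far cheaper than a full eigendecomposition of the Laplacian.</p>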
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Varying the number of eigenpairs for the reduced Allen&#x02013;Cahn equation.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Number of eigenvalues</bold></th>
<th valign="top" align="center"><bold>10 (%)</bold></th>
<th valign="top" align="center"><bold>20 (%)</bold></th>
<th valign="top" align="center"><bold>30 (%)</bold></th>
<th valign="top" align="center"><bold>150 (%)</bold></th>
<th valign="top" align="center"><bold>190 (%)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="6"><bold>Dataset ECG200</bold></td>
</tr>
<tr>
<td valign="top" align="left">MPDist</td>
<td valign="top" align="center">82.00</td>
<td valign="top" align="center">81.00</td>
<td valign="top" align="center"><bold>86.00</bold></td>
<td valign="top" align="center">62.00</td>
<td valign="top" align="center">56.00</td>
</tr>
<tr>
<td valign="top" align="left">SDTW</td>
<td valign="top" align="center">78.00</td>
<td valign="top" align="center"><bold>92.00</bold></td>
<td valign="top" align="center"><bold>92.00</bold></td>
<td valign="top" align="center">68.00</td>
<td valign="top" align="center">66.00</td>
</tr>
<tr>
<td valign="top" align="left">DTW</td>
<td valign="top" align="center">78.00</td>
<td valign="top" align="center">82.00</td>
<td valign="top" align="center"><bold>87.00</bold></td>
<td valign="top" align="center">69.00</td>
<td valign="top" align="center">54.00</td>
</tr>
<tr style="border-top: thin solid #000000;">
<td valign="top" align="left"><bold>Number of eigenvalues</bold></td>
<td valign="top" align="center"><bold>10 (%)</bold></td>
<td valign="top" align="center"><bold>20 (%)</bold></td>
<td valign="top" align="center"><bold>30 (%)</bold></td>
<td valign="top" align="center"><bold>500 (%)</bold></td>
<td valign="top" align="center"><bold>600 (%)</bold></td>
</tr>
<tr style="border-top: thin solid #000000;">
<td valign="top" align="left" colspan="6"><bold>SonyAIBORobotSurface1</bold></td>
</tr>
<tr>
<td valign="top" align="left">MPDist</td>
<td valign="top" align="center"><bold>85.36</bold></td>
<td valign="top" align="center">75.54</td>
<td valign="top" align="center">73.04</td>
<td valign="top" align="center">51.58</td>
<td valign="top" align="center">51.08</td>
</tr>
<tr>
<td valign="top" align="left">SDTW</td>
<td valign="top" align="center"><bold>96.17</bold></td>
<td valign="top" align="center">93.68</td>
<td valign="top" align="center">83.19</td>
<td valign="top" align="center">52.08</td>
<td valign="top" align="center">49.92</td>
</tr>
<tr>
<td valign="top" align="left">DTW</td>
<td valign="top" align="center"><bold>90.01</bold></td>
<td valign="top" align="center">84.03</td>
<td valign="top" align="center">72.71</td>
<td valign="top" align="center">52.41</td>
<td valign="top" align="center">48.58</td>
</tr>
<tr style="border-top: thin solid #000000;">
<td valign="top" align="left"><bold>Number of eigenvalues</bold></td>
<td valign="top" align="center"><bold>10 (%)</bold></td>
<td valign="top" align="center"><bold>20 (%)</bold></td>
<td valign="top" align="center"><bold>30 (%)</bold></td>
<td valign="top" align="center"><bold>700 (%)</bold></td>
<td valign="top" align="center"><bold>800 (%)</bold></td>
</tr>
<tr style="border-top: thin solid #000000;">
<td valign="top" align="left" colspan="6"><bold>ECGFiveDays</bold></td>
</tr>
<tr>
<td valign="top" align="left">MPDist</td>
<td valign="top" align="center">87.19</td>
<td valign="top" align="center"><bold>89.89</bold></td>
<td valign="top" align="center">85.95</td>
<td valign="top" align="center">50.29</td>
<td valign="top" align="center">51.22</td>
</tr>
<tr>
<td valign="top" align="left">SDTW</td>
<td valign="top" align="center"><bold>91.52</bold></td>
<td valign="top" align="center">82.00</td>
<td valign="top" align="center">84.20</td>
<td valign="top" align="center">54.00</td>
<td valign="top" align="center">52.38</td>
</tr>
<tr>
<td valign="top" align="left">DTW</td>
<td valign="top" align="center">68.87</td>
<td valign="top" align="center"><bold>77.35</bold></td>
<td valign="top" align="center">77.00</td>
<td valign="top" align="center">49.82</td>
<td valign="top" align="center">50.29</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Bold values indicate most accurate classification</italic>.</p>
</table-wrap-foot>
</table-wrap></sec>
<sec>
<title>5.3. Full Method Comparison</title>
<p>We now compare the Allen-Cahn approach, the GCN scheme, the linear-system-based method, and the 1NN algorithm, each paired with each of the distance measures introduced in section 3. Full results are listed in <xref ref-type="fig" rid="F6">Figures 6</xref>, <xref ref-type="fig" rid="F7">7</xref>. We show the comparison for all 42 datasets.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Comparison of the proposed methods using various distance measures for a variety of time series data. The size of the training set is specified in <monospace>TwoClassProblems.csv</monospace> within <ext-link ext-link-type="uri" xlink:href="http://www.timeseriesclassification.com/Downloads/Archives/Univariate2018_arff.zip">http://www.timeseriesclassification.com/Downloads/Archives/Univariate2018_arff.zip</ext-link>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fams-07-784855-g0006.tif"/>
</fig>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Comparison of the proposed methods using various distance measures for a variety of time series data.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fams-07-784855-g0007.tif"/>
</fig>
<p>As can be seen, there are several datasets where the performance of all methods is fairly similar even when the distance measure is varied; examples are <monospace>Chinatown</monospace>, <monospace>Earthquakes</monospace>, <monospace>GunPoint</monospace>, <monospace>ItalyPowerDemand</monospace>, <monospace>MoteStrain</monospace>, and <monospace>Wafer</monospace>. On several datasets, such as <monospace>DodgerLoopGame</monospace> and <monospace>DodgerLoopWeekend</monospace>, none of the methods perform well, with GCN and 1NN behaving similarly and outperforming the Linear System and Allen&#x02013;Cahn approaches. The GCN method clearly underperforms on the GunPoint datasets, where the other methods do well. It is surprising that the Euclidean distance, given its computational speed and simplicity, does not fall behind in accuracy across the different methods. There are very few datasets where one distance clearly outperforms the others; we name <monospace>ShapeletSim</monospace> and <monospace>ToeSegmentation1</monospace> here. One might conjecture that the varying sizes of the training sets explain the differences in model performance. To investigate this further, we next vary the training splits for all datasets and methods.</p></sec>
<sec>
<title>5.4. Varying Training Splits</title>
<p>In <xref ref-type="fig" rid="F8">Figures 8</xref>&#x02013;<xref ref-type="fig" rid="F12">12</xref>, we vary the size of the training set from 1 to 20% of the available data. All reported numbers are averages over 100 random splits. The numbers we observe mirror the performance obtained with the full training set. We see that the methods show reduced performance when only 1% of the data are used for training but often reach an accuracy plateau once 5 to 10% are used. We also observe that the size of the training set alone does not explain the different performance across the various datasets and methods applied here.</p>
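The evaluation protocol just described can be sketched as follows; this is a minimal illustration, not the authors' code, and <monospace>predict</monospace> is a hypothetical stand-in for any of the compared classifiers (it receives the labeled indices and returns predicted labels for all samples).

```python
import numpy as np

def random_split_accuracies(labels, predict, fractions=(0.01, 0.05, 0.10, 0.20),
                            n_splits=100, seed=0):
    """For each training fraction, draw `n_splits` random labeled subsets,
    run `predict(train_idx)` and average the accuracy on the remaining,
    unlabeled samples."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    results = {}
    for frac in fractions:
        k = max(1, int(frac * n))          # size of the labeled training set
        accs = []
        for _ in range(n_splits):
            train_idx = rng.choice(n, size=k, replace=False)
            pred = np.asarray(predict(train_idx))
            test_mask = np.ones(n, dtype=bool)
            test_mask[train_idx] = False    # evaluate only on unlabeled samples
            accs.append(np.mean(pred[test_mask] == labels[test_mask]))
        results[frac] = float(np.mean(accs))
    return results
```

Averaging over many random splits, as done here, reduces the variance caused by which particular samples happen to be labeled.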
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>Method accuracy comparison for random training splits of different sizes (part 1/5).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fams-07-784855-g0008.tif"/>
</fig>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p>Method accuracy comparison for random training splits of different sizes (part 2/5).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fams-07-784855-g0009.tif"/>
</fig>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p>Method accuracy comparison for random training splits of different sizes (part 3/5).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fams-07-784855-g0010.tif"/>
</fig>
<fig id="F11" position="float">
<label>Figure 11</label>
<caption><p>Method accuracy comparison for random training splits of different sizes (part 4/5).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fams-07-784855-g0011.tif"/>
</fig>
<fig id="F12" position="float">
<label>Figure 12</label>
<caption><p>Method accuracy comparison for random training splits of different sizes (part 5/5).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fams-07-784855-g0012.tif"/>
</fig></sec></sec>
<sec sec-type="conclusions" id="s6">
<title>6. Conclusion</title>
<p>In this paper we addressed the task of classifying time series data in a semi-supervised learning setting. For this we proposed to represent the data as a fully connected graph whose edge weights are created from a Gaussian similarity measure (1). At the heart of this function is the distance measure between the time series, for which we used the (Soft) Dynamic Time Warping and Matrix Profile based distance measures as well as the Euclidean distance. We then investigated several learning algorithms, namely the Allen&#x02013;Cahn-based method, the Graph Convolutional Network scheme, and a linear system approach, all reliant on the graph Laplacian, as well as the Nearest Neighbor method, and illustrated the performance of all pairs of distance measure and learning method. In this empirical study we observed that the methods tend to show improved performance as more training data are added. Studying all binary time-series datasets from the <ext-link ext-link-type="uri" xlink:href="https://timeseriesclassification.com">timeseriesclassification.com</ext-link> repository gives results that, in accordance with the no-free-lunch theorem, show no clear winner. On the positive side, the methods often perform quite well, and there are only a few datasets with decreased performance. The comparison of the distance measures indicates that each measure outperforms its competitors in certain cases, but here, too, there is no clear winner with regard to accuracy. We believe that this empirical, reproducible study will encourage further research in this direction. Additionally, it might be interesting to consider model-based representations of time series such as ARMA [<xref ref-type="bibr" rid="B63">63</xref>, <xref ref-type="bibr" rid="B64">64</xref>] within the graph representations used here.</p></sec>
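The graph construction summarized above can be sketched in a few lines; this is an illustrative sketch assuming a precomputed pairwise distance matrix (obtained with any of the distance measures discussed), with the scaling parameter <monospace>sigma</monospace> and the helper names chosen here for exposition rather than taken from the paper's code.

```python
import numpy as np

def gaussian_similarity_graph(D, sigma=1.0):
    # Fully connected weight matrix from a pairwise distance matrix D,
    # using the Gaussian similarity w_ij = exp(-d_ij**2 / sigma**2).
    W = np.exp(-(np.asarray(D, dtype=float) ** 2) / sigma ** 2)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def graph_laplacian(W):
    # Unnormalized graph Laplacian L = Deg - W, where Deg = diag(W @ 1)
    # holds the weighted node degrees.
    return np.diag(W.sum(axis=1)) - W
```

The Laplacian obtained this way is the common ingredient of the Laplacian-based learning methods compared in this study.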
<sec sec-type="data-availability" id="s7">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.</p></sec>
<sec id="s8">
<title>Author Contributions</title>
<p>MG provided the initial implementation of some of the methods. MS supervised the other members, wrote parts of the manuscript, and implemented the Allen&#x02013;Cahn scheme. DB implemented the GCN approach, wrote parts of the manuscript, and oversaw the design of the tests. LP implemented several algorithms and wrote parts of the manuscript. All authors contributed to the article and approved the submitted version.</p></sec>
<sec sec-type="funding-information" id="s9">
<title>Funding</title>
<p>MS and LP acknowledge the funding of the BMBF grant 01IS20053A. DB was partially supported by the KINTUC project [S&#x000E4;chsische Aufbaubank&#x02013;F&#x000F6;rderbank&#x02013;(SAB) 100378180].</p></sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p></sec>
</body>
<back>
<ack><p>All authors would like to acknowledge the hard work and dedication by the team maintaining <ext-link ext-link-type="uri" xlink:href="http://www.timeseriesclassification.com/">www.timeseriesclassification.com/</ext-link>. The publication of this article was funded by Chemnitz University of Technology.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fu</surname> <given-names>TC</given-names></name></person-group>. <article-title>A review on time series data mining</article-title>. <source>Eng Appl Artif Intell</source>. (<year>2011</year>) <volume>24</volume>:<fpage>164</fpage>&#x02013;<lpage>81</lpage>. <pub-id pub-id-type="doi">10.1016/j.engappai.2010.09.007</pub-id></citation></ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bello-Orgaz</surname> <given-names>G</given-names></name> <name><surname>Jung</surname> <given-names>JJ</given-names></name> <name><surname>Camacho</surname> <given-names>D</given-names></name></person-group>. <article-title>Social big data: recent achievements and new challenges</article-title>. <source>Inform Fusion</source>. (<year>2016</year>) <volume>28</volume>:<fpage>45</fpage>&#x02013;<lpage>59</lpage>. <pub-id pub-id-type="doi">10.1016/j.inffus.2015.08.005</pub-id><pub-id pub-id-type="pmid">32288689</pub-id></citation></ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>F</given-names></name> <name><surname>Deng</surname> <given-names>P</given-names></name> <name><surname>Wan</surname> <given-names>J</given-names></name> <name><surname>Zhang</surname> <given-names>D</given-names></name> <name><surname>Vasilakos</surname> <given-names>AV</given-names></name> <name><surname>Rong</surname> <given-names>X</given-names></name></person-group>. <article-title>Data mining for the internet of things: literature review and challenges</article-title>. <source>Int J Distribut Sensor Netw</source>. (<year>2015</year>) <volume>11</volume>:<fpage>431047</fpage>. <pub-id pub-id-type="doi">10.1155/2015/431047</pub-id><pub-id pub-id-type="pmid">33805471</pub-id></citation></ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Laptev</surname> <given-names>N</given-names></name> <name><surname>Amizadeh</surname> <given-names>S</given-names></name> <name><surname>Flint</surname> <given-names>I</given-names></name></person-group>. <article-title>Generic and scalable framework for automated time-series anomaly detection</article-title>. In: <source>Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>. (<year>2015</year>). p. <fpage>1939</fpage>&#x02013;<lpage>47</lpage>. <pub-id pub-id-type="doi">10.1145/2783258.2788611</pub-id></citation></ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chiu</surname> <given-names>B</given-names></name> <name><surname>Keogh</surname> <given-names>E</given-names></name> <name><surname>Lonardi</surname> <given-names>S</given-names></name></person-group>. <article-title>Probabilistic discovery of time series motifs</article-title>. In: <source>Proceedings of the ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>. (<year>2003</year>). p. <fpage>493</fpage>&#x02013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1145/956750.956808</pub-id></citation></ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Gooijer</surname> <given-names>JG</given-names></name> <name><surname>Hyndman</surname> <given-names>RJ</given-names></name></person-group>. <article-title>25 years of time series forecasting</article-title>. <source>Int J Forecast</source> (<year>2006</year>). <volume>22</volume>:<fpage>443</fpage>&#x02013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1016/j.ijforecast.2006.01.001</pub-id></citation></ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wei</surname> <given-names>WW</given-names></name></person-group>. <article-title>Time series analysis</article-title>. In: <person-group person-group-type="editor"><name><surname>Todd little</surname></name></person-group> editor. <source>The Oxford Handbook of Quantitative Methods in Psychology</source>. Vol. 2. (<year>2006</year>).</citation></ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chatfield</surname> <given-names>C</given-names></name> <name><surname>Xing</surname> <given-names>H</given-names></name></person-group>. <source>The Analysis of Time Series: An Introduction with R</source>. <publisher-name>CRC Press</publisher-name>. (<year>2019</year>) <pub-id pub-id-type="doi">10.1201/9781351259446</pub-id></citation></ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fawaz</surname> <given-names>HI</given-names></name> <name><surname>Forestier</surname> <given-names>G</given-names></name> <name><surname>Weber</surname> <given-names>J</given-names></name> <name><surname>Idoumghar</surname> <given-names>L</given-names></name> <name><surname>Muller</surname> <given-names>PA</given-names></name></person-group>. <article-title>Deep learning for time series classification: a review</article-title>. <source>Data Mining Knowledge Discov</source>. (<year>2019</year>) <volume>33</volume>:<fpage>917</fpage>&#x02013;<lpage>63</lpage>. <pub-id pub-id-type="doi">10.1007/s10618-019-00619-1</pub-id></citation></ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Abanda</surname> <given-names>A</given-names></name> <name><surname>Mori</surname> <given-names>U</given-names></name> <name><surname>Lozano</surname> <given-names>JA</given-names></name></person-group>. <article-title>A review on distance based time series classification</article-title>. <source>Data Mining Knowledge Discov</source>. (<year>2019</year>) <volume>33</volume>:<fpage>378</fpage>&#x02013;<lpage>412</lpage>. <pub-id pub-id-type="doi">10.1007/s10618-018-0596-4</pub-id></citation></ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wei</surname> <given-names>L</given-names></name> <name><surname>Keogh</surname> <given-names>E</given-names></name></person-group>. <article-title>Semi-supervised time series classification</article-title>. In: <source>Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>. (<year>2006</year>). p. <fpage>748</fpage>&#x02013;<lpage>53</lpage>. <pub-id pub-id-type="doi">10.1145/1150402.1150498</pub-id></citation></ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liao</surname> <given-names>TW</given-names></name></person-group>. <article-title>Clustering of time series data-a survey</article-title>. <source>Pattern Recogn</source>. (<year>2005</year>) <volume>38</volume>:<fpage>1857</fpage>&#x02013;<lpage>74</lpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2005.01.025</pub-id><pub-id pub-id-type="pmid">21666407</pub-id></citation></ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aghabozorgi</surname> <given-names>S</given-names></name> <name><surname>Shirkhorshidi</surname> <given-names>AS</given-names></name> <name><surname>Wah</surname> <given-names>TY</given-names></name></person-group>. <article-title>Time-series clustering-a decade review</article-title>. <source>Inform Syst</source>. (<year>2015</year>) <volume>53</volume>:<fpage>16</fpage>&#x02013;<lpage>38</lpage>. <pub-id pub-id-type="doi">10.1016/j.is.2015.04.007</pub-id></citation></ref>
<ref id="B14">
<label>14.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shifaz</surname> <given-names>A</given-names></name> <name><surname>Pelletier</surname> <given-names>C</given-names></name> <name><surname>Petitjean</surname> <given-names>F</given-names></name> <name><surname>Webb</surname> <given-names>GI</given-names></name></person-group>. <article-title>TS-CHIEF: a scalable and accurate forest algorithm for time series classification</article-title>. <source>Data Mining Knowledge Discov</source>. (<year>2020</year>) <volume>34</volume>:<fpage>742</fpage>&#x02013;<lpage>75</lpage>. <pub-id pub-id-type="doi">10.1007/s10618-020-00679-8</pub-id></citation></ref>
<ref id="B15">
<label>15.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dempster</surname> <given-names>A</given-names></name> <name><surname>Petitjean</surname> <given-names>F</given-names></name> <name><surname>Webb</surname> <given-names>GI</given-names></name></person-group>. <article-title>ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels</article-title>. <source>Data Mining Knowledge Discov</source>. (<year>2020</year>) <volume>34</volume>:<fpage>1454</fpage>&#x02013;<lpage>95</lpage>. <pub-id pub-id-type="doi">10.1007/s10618-020-00701-z</pub-id></citation></ref>
<ref id="B16">
<label>16.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fawaz</surname> <given-names>HI</given-names></name> <name><surname>Lucas</surname> <given-names>B</given-names></name> <name><surname>Forestier</surname> <given-names>G</given-names></name> <name><surname>Pelletier</surname> <given-names>C</given-names></name> <name><surname>Schmidt</surname> <given-names>DF</given-names></name> <name><surname>Weber</surname> <given-names>J</given-names></name> <etal/></person-group>. <article-title>Inceptiontime: Finding alexnet for time series classification</article-title>. <source>Data Mining Knowledge Discov</source>. (<year>2020</year>) <volume>34</volume>:<fpage>1936</fpage>&#x02013;<lpage>62</lpage>. <pub-id pub-id-type="doi">10.1007/s10618-020-00710-y</pub-id></citation></ref>
<ref id="B17">
<label>17.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>X</given-names></name> <name><surname>Goldberg</surname> <given-names>A</given-names></name></person-group>. <source>Introduction to Semi-supervised Learning</source>. <publisher-name>Morgan &#x00026; Claypool Publishers</publisher-name> (<year>2009</year>). <pub-id pub-id-type="doi">10.2200/S00196ED1V01Y200906AIM006</pub-id></citation></ref>
<ref id="B18">
<label>18.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chapelle</surname> <given-names>O</given-names></name> <name><surname>Sch&#x000F6;lkopf</surname> <given-names>B</given-names></name> <name><surname>Zien</surname> <given-names>A</given-names></name></person-group>. <article-title>Semi-supervised learning</article-title>. <source>IEEE Trans Neural Netw</source>. (<year>2009</year>) <volume>20</volume>:<fpage>542</fpage>. <pub-id pub-id-type="doi">10.1109/TNN.2009.2015974</pub-id><pub-id pub-id-type="pmid">27295638</pub-id></citation></ref>
<ref id="B19">
<label>19.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stoll</surname> <given-names>M</given-names></name></person-group>. <article-title>A literature survey of matrix methods for data science</article-title>. <source>GAMM-Mitt</source>. (<year>2020</year>) <volume>43</volume>:<fpage>e202000013</fpage>:<lpage>4</lpage>. <pub-id pub-id-type="doi">10.1002/gamm.202000013</pub-id><pub-id pub-id-type="pmid">25855820</pub-id></citation></ref>
<ref id="B20">
<label>20.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mercado</surname> <given-names>P</given-names></name> <name><surname>Bosch</surname> <given-names>J</given-names></name> <name><surname>Stoll</surname> <given-names>M</given-names></name></person-group>. <article-title>Node classification for signed social networks using diffuse interface methods</article-title>. In: <source>ECMLPKDD</source>. (<year>2019</year>). <pub-id pub-id-type="doi">10.1007/978-3-030-46150-8_31</pub-id></citation></ref>
<ref id="B21">
<label>21.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kipf</surname> <given-names>TN</given-names></name> <name><surname>Welling</surname> <given-names>M</given-names></name></person-group>. <article-title>Semi-supervised classification with graph convolutional networks</article-title>. <source>arXiv [Preprint]. arXiv:160902907</source> (<year>2016</year>).<pub-id pub-id-type="pmid">29890408</pub-id></citation></ref>
<ref id="B22">
<label>22.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bertozzi</surname> <given-names>AL</given-names></name> <name><surname>Luo</surname> <given-names>X</given-names></name> <name><surname>Stuart</surname> <given-names>AM</given-names></name> <name><surname>Zygalakis</surname> <given-names>KC</given-names></name></person-group>. <article-title>Uncertainty quantification in graph-based classification of high dimensional data</article-title>. <source>SIAM/ASA J Uncertainty Quant</source>. (<year>2018</year>) <volume>6</volume>:<fpage>568</fpage>&#x02013;<lpage>95</lpage>. <pub-id pub-id-type="doi">10.1137/17M1134214</pub-id></citation></ref>
<ref id="B23">
<label>23.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>von Luxburg</surname> <given-names>U</given-names></name></person-group>. <article-title>A tutorial on spectral clustering</article-title>. <source>Stat Comput</source>. (<year>2007</year>) <volume>17</volume>:<fpage>395</fpage>&#x02013;<lpage>416</lpage>. <pub-id pub-id-type="doi">10.1007/s11222-007-9033-z</pub-id><pub-id pub-id-type="pmid">32650053</pub-id></citation></ref>
<ref id="B24">
<label>24.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bruna</surname> <given-names>J</given-names></name> <name><surname>Zaremba</surname> <given-names>W</given-names></name> <name><surname>Szlam</surname> <given-names>A</given-names></name> <name><surname>LeCun</surname> <given-names>Y</given-names></name></person-group>. <article-title>Spectral networks and locally connected networks on graphs</article-title>. <source>arXiv [Preprint]. arXiv:13126203</source> (<year>2013</year>).</citation></ref>
<ref id="B25">
<label>25.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chung</surname> <given-names>FR</given-names></name> <name><surname>Graham</surname> <given-names>FC</given-names></name></person-group>. <source>Spectral graph Theory</source>. American Mathematical Soc. (<year>1997</year>).</citation></ref>
<ref id="B26">
<label>26.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jung</surname> <given-names>A</given-names></name></person-group>. <article-title>Networked exponential families for big data over networks</article-title>. <source>IEEE Access</source>. (<year>2020</year>) <volume>8</volume>:<fpage>202897</fpage>&#x02013;<lpage>909</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2020.3033817</pub-id><pub-id pub-id-type="pmid">27295638</pub-id></citation></ref>
<ref id="B27">
<label>27.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jung</surname> <given-names>A</given-names></name> <name><surname>Tran</surname> <given-names>N</given-names></name></person-group>. <article-title>Localized linear regression in networked data</article-title>. <source>IEEE Signal Process Lett</source>. (<year>2019</year>) <volume>26</volume>:<fpage>1090</fpage>&#x02013;<lpage>4</lpage>. <pub-id pub-id-type="doi">10.1109/LSP.2019.2918933</pub-id><pub-id pub-id-type="pmid">27295638</pub-id></citation></ref>
<ref id="B28">
<label>28.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>M&#x000FC;ller</surname> <given-names>M</given-names></name></person-group>. <source>Information Retrieval for Music and Motion</source>. vol. 2. <publisher-loc>Springer</publisher-loc> (<year>2007</year>). <pub-id pub-id-type="doi">10.1007/978-3-540-74048-3</pub-id></citation></ref>
<ref id="B29">
<label>29.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cuturi</surname> <given-names>M</given-names></name> <name><surname>Blondel</surname> <given-names>M</given-names></name></person-group>. <article-title>Soft-DTW: a differentiable loss function for time-series</article-title>. In: <source>International Conference on Machine Learning</source>. <publisher-loc>PMLR</publisher-loc> (<year>2017</year>). p. <fpage>894</fpage>&#x02013;<lpage>903</lpage>.</citation></ref>
<ref id="B30">
<label>30.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gharghabi</surname> <given-names>S</given-names></name> <name><surname>Imani</surname> <given-names>S</given-names></name> <name><surname>Bagnall</surname> <given-names>A</given-names></name> <name><surname>Darvishzadeh</surname> <given-names>A</given-names></name> <name><surname>Keogh</surname> <given-names>E</given-names></name></person-group>. <article-title>An ultra-fast time series distance measure to allow data mining in more complex real-world deployments</article-title>. <source>Data Mining Knowledge Discov</source>. (<year>2020</year>) <volume>34</volume>:<fpage>1104</fpage>&#x02013;<lpage>35</lpage>. <pub-id pub-id-type="doi">10.1007/s10618-020-00695-8</pub-id></citation></ref>
<ref id="B31">
<label>31.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bertozzi</surname> <given-names>AL</given-names></name> <name><surname>Flenner</surname> <given-names>A</given-names></name></person-group>. <article-title>Diffuse interface models on graphs for classification of high dimensional data</article-title>. <source>Multiscale Model Simul</source>. (<year>2012</year>) <volume>10</volume>:<fpage>1090</fpage>&#x02013;<lpage>118</lpage>. <pub-id pub-id-type="doi">10.1137/11083109X</pub-id></citation></ref>
<ref id="B32">
<label>32.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bagnall</surname> <given-names>A</given-names></name> <name><surname>Lines</surname> <given-names>J</given-names></name> <name><surname>Bostrom</surname> <given-names>A</given-names></name> <name><surname>Large</surname> <given-names>J</given-names></name> <name><surname>Keogh</surname> <given-names>E</given-names></name></person-group>. <article-title>The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances</article-title>. <source>Data mining Knowledge Discov</source>. (<year>2017</year>) <volume>31</volume>:<fpage>606</fpage>&#x02013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1007/s10618-016-0483-9</pub-id><pub-id pub-id-type="pmid">33679210</pub-id></citation></ref>
<ref id="B33">
<label>33.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ruiz</surname> <given-names>AP</given-names></name> <name><surname>Flynn</surname> <given-names>M</given-names></name> <name><surname>Large</surname> <given-names>J</given-names></name> <name><surname>Middlehurst</surname> <given-names>M</given-names></name> <name><surname>Bagnall</surname> <given-names>A</given-names></name></person-group>. <article-title>The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances</article-title>. <source>Data Mining Knowledge Discov</source>. (<year>2021</year>) <volume>35</volume>:<fpage>401</fpage>&#x02013;<lpage>49</lpage>. <pub-id pub-id-type="doi">10.1007/s10618-020-00727-3</pub-id><pub-id pub-id-type="pmid">33679210</pub-id></citation></ref>
<ref id="B34">
<label>34.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>MacQueen</surname> <given-names>J</given-names></name></person-group>. <article-title>Some methods for classification and analysis of multivariate observations</article-title>. In: <source>Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability</source>. <publisher-loc>Oakland, CA</publisher-loc> (<year>1967</year>). p. <fpage>281</fpage>&#x02013;<lpage>97</lpage>.<pub-id pub-id-type="pmid">26336666</pub-id></citation></ref>
<ref id="B35">
<label>35.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>MacKay</surname> <given-names>DJ</given-names></name> <name><surname>Mac Kay</surname> <given-names>DJ</given-names></name></person-group>. <source>Information Theory, Inference and Learning Algorithms</source>. <publisher-name>Cambridge University Press</publisher-name> (<year>2003</year>).</citation></ref>
<ref id="B36">
<label>36.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Belkin</surname> <given-names>M</given-names></name> <name><surname>Niyogi</surname> <given-names>P</given-names></name></person-group>. <article-title>Laplacian eigenmaps and spectral techniques for embedding and clustering</article-title>. In: <source>Advances in Neural Information Processing Systems</source>. (<year>2001</year>). p. <fpage>585</fpage>&#x02013;<lpage>91</lpage>.<pub-id pub-id-type="pmid">15333211</pub-id></citation></ref>
<ref id="B37">
<label>37.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Shawe-Taylor</surname> <given-names>J</given-names></name> <name><surname>Cristianini</surname> <given-names>N</given-names></name></person-group>. <source>Kernel Methods for Pattern Analysis</source>. <publisher-name>Cambridge University Press</publisher-name> (<year>2004</year>). <pub-id pub-id-type="doi">10.1017/CBO9780511809682</pub-id><pub-id pub-id-type="pmid">30886898</pub-id></citation></ref>
<ref id="B38">
<label>38.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hofmann</surname> <given-names>T</given-names></name> <name><surname>Sch&#x000F6;lkopf</surname> <given-names>B</given-names></name> <name><surname>Smola</surname> <given-names>AJ</given-names></name></person-group>. <article-title>Kernel methods in machine learning</article-title>. <source>Ann Stat</source>. (<year>2008</year>) <volume>36</volume>:<fpage>1171</fpage>&#x02013;<lpage>220</lpage>. <pub-id pub-id-type="doi">10.1214/009053607000000677</pub-id><pub-id pub-id-type="pmid">18320210</pub-id></citation></ref>
<ref id="B39">
<label>39.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zelnik-Manor</surname> <given-names>L</given-names></name> <name><surname>Perona</surname> <given-names>P</given-names></name></person-group>. <article-title>Self-tuning spectral clustering</article-title>. In: <source>Advances in Neural Information Processing Systems</source>. (<year>2005</year>). p. <fpage>1601</fpage>&#x02013;<lpage>8</lpage>.</citation></ref>
<ref id="B40">
<label>40.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Keogh</surname> <given-names>E</given-names></name> <name><surname>Kasetty</surname> <given-names>S</given-names></name></person-group>. <article-title>On the need for time series data mining benchmarks: a survey and empirical demonstration</article-title>. <source>Data Mining Knowledge Discov</source>. (<year>2003</year>) <volume>7</volume>:<fpage>349</fpage>&#x02013;<lpage>71</lpage>. <pub-id pub-id-type="doi">10.1023/A:1024988512476</pub-id></citation></ref>
<ref id="B41">
<label>41.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Salvador</surname> <given-names>S</given-names></name> <name><surname>Chan</surname> <given-names>PK</given-names></name></person-group>. <article-title>Toward accurate dynamic time warping in linear time and space</article-title>. <source>Intell Data Anal</source>. (<year>2004</year>) <volume>11</volume>:<fpage>70</fpage>&#x02013;<lpage>80</lpage>. <pub-id pub-id-type="doi">10.3233/IDA-2007-11508</pub-id></citation></ref>
<ref id="B42">
<label>42.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>R</given-names></name> <name><surname>Keogh</surname> <given-names>EJ</given-names></name></person-group>. <article-title>FastDTW is approximate and generally slower than the algorithm it approximates</article-title>. <source>IEEE Trans Knowledge Data Eng</source>. (<year>2020</year>). <pub-id pub-id-type="doi">10.1109/TKDE.2020.3033752</pub-id><pub-id pub-id-type="pmid">27295638</pub-id></citation></ref>
<ref id="B43">
<label>43.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blondel</surname> <given-names>M</given-names></name> <name><surname>Mensch</surname> <given-names>A</given-names></name> <name><surname>Vert</surname> <given-names>JP</given-names></name></person-group>. <article-title>Differentiable divergences between time series</article-title>. <source>arXiv [Preprint]. arXiv:201008354</source> (<year>2020</year>).</citation></ref>
<ref id="B44">
<label>44.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>H</given-names></name> <name><surname>Hong</surname> <given-names>X</given-names></name> <name><surname>Ma</surname> <given-names>Z</given-names></name> <name><surname>Wei</surname> <given-names>X</given-names></name> <name><surname>Qiu</surname> <given-names>Y</given-names></name> <name><surname>Wang</surname> <given-names>Y</given-names></name> <etal/></person-group>. <article-title>Direct measure matching for crowd counting</article-title>. <source>arXiv [Preprint]. arXiv:210701558</source> (<year>2021</year>). <pub-id pub-id-type="doi">10.24963/ijcai.2021/116</pub-id></citation></ref>
<ref id="B45">
<label>45.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taylor</surname> <given-names>JE</given-names></name> <name><surname>Cahn</surname> <given-names>JW</given-names></name></person-group>. <article-title>Linking anisotropic sharp and diffuse surface motion laws via gradient flows</article-title>. <source>J Statist Phys</source>. (<year>1994</year>) <volume>77</volume>:<fpage>183</fpage>&#x02013;<lpage>97</lpage>. <pub-id pub-id-type="doi">10.1007/BF02186838</pub-id></citation></ref>
<ref id="B46">
<label>46.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Allen</surname> <given-names>SM</given-names></name> <name><surname>Cahn</surname> <given-names>JW</given-names></name></person-group>. <article-title>A microscopic theory for antiphase boundary motion and its application to antiphase domain coarsening</article-title>. <source>Acta Metall</source>. (<year>1979</year>) <volume>27</volume>:<fpage>1085</fpage>&#x02013;<lpage>95</lpage>. <pub-id pub-id-type="doi">10.1016/0001-6160(79)90196-2</pub-id></citation></ref>
<ref id="B47">
<label>47.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cahn</surname> <given-names>JW</given-names></name> <name><surname>Hilliard</surname> <given-names>JE</given-names></name></person-group>. <article-title>Free energy of a nonuniform system. I. Interfacial free energy</article-title>. <source>J Chem Phys</source>. (<year>1958</year>) <volume>28</volume>:<fpage>258</fpage>&#x02013;<lpage>67</lpage>. <pub-id pub-id-type="doi">10.1063/1.1744102</pub-id><pub-id pub-id-type="pmid">24313031</pub-id></citation></ref>
<ref id="B48">
<label>48.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bosch</surname> <given-names>J</given-names></name> <name><surname>Kay</surname> <given-names>D</given-names></name> <name><surname>Stoll</surname> <given-names>M</given-names></name> <name><surname>Wathen</surname> <given-names>A</given-names></name></person-group>. <article-title>Fast solvers for Cahn-Hilliard inpainting</article-title>. <source>SIAM J Imaging Sci</source>. (<year>2014</year>) <volume>7</volume>:<fpage>67</fpage>&#x02013;<lpage>97</lpage>. <pub-id pub-id-type="doi">10.1137/130921842</pub-id></citation></ref>
<ref id="B49">
<label>49.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bertozzi</surname> <given-names>AL</given-names></name> <name><surname>Esedoglu</surname> <given-names>S</given-names></name> <name><surname>Gillette</surname> <given-names>A</given-names></name></person-group>. <article-title>Inpainting of binary images using the Cahn-Hilliard equation</article-title>. <source>IEEE Trans Image Process</source>. (<year>2007</year>) <volume>16</volume>:<fpage>285</fpage>&#x02013;<lpage>91</lpage>. <pub-id pub-id-type="doi">10.1109/TIP.2006.887728</pub-id><pub-id pub-id-type="pmid">17283787</pub-id></citation></ref>
<ref id="B50">
<label>50.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Garcia-Cardona</surname> <given-names>C</given-names></name> <name><surname>Merkurjev</surname> <given-names>E</given-names></name> <name><surname>Bertozzi</surname> <given-names>AL</given-names></name> <name><surname>Flenner</surname> <given-names>A</given-names></name> <name><surname>Percus</surname> <given-names>AG</given-names></name></person-group>. <article-title>Multiclass data segmentation using diffuse interface methods on graphs</article-title>. <source>IEEE Trans Pattern Anal Mach Intell</source>. (<year>2014</year>) <volume>36</volume>:<fpage>1600</fpage>&#x02013;<lpage>13</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2014.2300478</pub-id><pub-id pub-id-type="pmid">26353341</pub-id></citation></ref>
<ref id="B51">
<label>51.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bosch</surname> <given-names>J</given-names></name> <name><surname>Klamt</surname> <given-names>S</given-names></name> <name><surname>Stoll</surname> <given-names>M</given-names></name></person-group>. <article-title>Generalizing diffuse interface methods on graphs: nonsmooth potentials and hypergraphs</article-title>. <source>SIAM J Appl Math</source>. (<year>2018</year>) <volume>78</volume>:<fpage>1350</fpage>&#x02013;<lpage>77</lpage>. <pub-id pub-id-type="doi">10.1137/17M1117835</pub-id></citation></ref>
<ref id="B52">
<label>52.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bergermann</surname> <given-names>K</given-names></name> <name><surname>Stoll</surname> <given-names>M</given-names></name> <name><surname>Volkmer</surname> <given-names>T</given-names></name></person-group>. <article-title>Semi-supervised learning for multilayer graphs using diffuse interface methods and fast matrix vector products</article-title>. <source>SIAM J Math Data Sci</source>. (<year>2021</year>). <pub-id pub-id-type="doi">10.1137/20M1352028</pub-id></citation></ref>
<ref id="B53">
<label>53.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Budd</surname> <given-names>J</given-names></name> <name><surname>van Gennip</surname> <given-names>Y</given-names></name></person-group>. <article-title>Graph MBO as a semi-discrete implicit Euler scheme for graph Allen-Cahn</article-title>. <source>arXiv [Preprint]. arXiv:190710774</source> (<year>2019</year>). <pub-id pub-id-type="doi">10.1137/19M1277394</pub-id></citation></ref>
<ref id="B54">
<label>54.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Budd</surname> <given-names>J</given-names></name> <name><surname>van Gennip</surname> <given-names>Y</given-names></name> <name><surname>Latz</surname> <given-names>J</given-names></name></person-group>. <article-title>Classification and image processing with a semi-discrete scheme for fidelity forced Allen-Cahn on graphs</article-title>. <source>arXiv [Preprint]. arXiv:201014556</source> (<year>2020</year>). <pub-id pub-id-type="doi">10.1002/gamm.202100004</pub-id></citation></ref>
<ref id="B55">
<label>55.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Calatroni</surname> <given-names>L</given-names></name> <name><surname>van Gennip</surname> <given-names>Y</given-names></name> <name><surname>Sch&#x000F6;nlieb</surname> <given-names>CB</given-names></name> <name><surname>Rowland</surname> <given-names>HM</given-names></name> <name><surname>Flenner</surname> <given-names>A</given-names></name></person-group>. <article-title>Graph clustering, variational image segmentation methods and Hough transform scale detection for object measurement in images</article-title>. <source>J Math Imaging Vision</source>. (<year>2017</year>) <volume>57</volume>:<fpage>269</fpage>&#x02013;<lpage>91</lpage>. <pub-id pub-id-type="doi">10.1007/s10851-016-0678-0</pub-id></citation></ref>
<ref id="B56">
<label>56.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Goodfellow</surname> <given-names>I</given-names></name> <name><surname>Bengio</surname> <given-names>Y</given-names></name> <name><surname>Courville</surname> <given-names>A</given-names></name> <name><surname>Bengio</surname> <given-names>Y</given-names></name></person-group>. <source>Deep Learning</source>. <volume>vol. 1</volume>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>MIT Press</publisher-name> (<year>2016</year>).</citation></ref>
<ref id="B57">
<label>57.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>LeCun</surname> <given-names>Y</given-names></name> <name><surname>Bengio</surname> <given-names>Y</given-names></name> <name><surname>Hinton</surname> <given-names>G</given-names></name></person-group>. <article-title>Deep learning</article-title>. <source>Nature</source>. (<year>2015</year>) <volume>521</volume>:<fpage>436</fpage>&#x02013;<lpage>44</lpage>. <pub-id pub-id-type="doi">10.1038/nature14539</pub-id><pub-id pub-id-type="pmid">26017442</pub-id></citation></ref>
<ref id="B58">
<label>58.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>S</given-names></name> <name><surname>Tong</surname> <given-names>H</given-names></name> <name><surname>Xu</surname> <given-names>J</given-names></name> <name><surname>Maciejewski</surname> <given-names>R</given-names></name></person-group>. <article-title>Graph convolutional networks: a comprehensive review</article-title>. <source>Comput Soc Netw</source>. (<year>2019</year>) <volume>6</volume>:<fpage>1</fpage>&#x02013;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.1186/s40649-019-0069-y</pub-id></citation></ref>
<ref id="B59">
<label>59.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alfke</surname> <given-names>D</given-names></name> <name><surname>Stoll</surname> <given-names>M</given-names></name></person-group>. <article-title>Pseudoinverse graph convolutional networks: fast filters tailored for large eigengaps of dense graphs and hypergraphs</article-title>. <source>Data Mining Knowledge Discov</source>. (<year>2021</year>). <pub-id pub-id-type="doi">10.1007/s10618-021-00752-w</pub-id></citation></ref>
<ref id="B60">
<label>60.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>Z</given-names></name> <name><surname>Funaya</surname> <given-names>K</given-names></name></person-group>. <article-title>Time series analysis with graph-based semi-supervised learning</article-title>. In: <source>2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA)</source>. <publisher-name>IEEE</publisher-name> (<year>2015</year>). p. <fpage>1</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1109/DSAA.2015.7344902</pub-id></citation></ref>
<ref id="B61">
<label>61.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dau</surname> <given-names>HA</given-names></name> <name><surname>Bagnall</surname> <given-names>A</given-names></name> <name><surname>Kamgar</surname> <given-names>K</given-names></name> <name><surname>Yeh</surname> <given-names>CCM</given-names></name> <name><surname>Zhu</surname> <given-names>Y</given-names></name> <name><surname>Gharghabi</surname> <given-names>S</given-names></name> <etal/></person-group>. <article-title>The UCR time series archive</article-title>. <source>IEEE/CAA J Automat Sin</source>. (<year>2019</year>) <volume>6</volume>:<fpage>1293</fpage>&#x02013;<lpage>305</lpage>. <pub-id pub-id-type="doi">10.1109/JAS.2019.1911747</pub-id></citation></ref>
<ref id="B62">
<label>62.</label>
<citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Kingma</surname> <given-names>D</given-names></name> <name><surname>Ba</surname> <given-names>JL</given-names></name></person-group>. <article-title>Adam: a method for stochastic optimization</article-title>. In: <source>Proc Int Conf Learn Represent. ICLR&#x00027;15</source>. (<year>2015</year>).</citation></ref>
<ref id="B63">
<label>63.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Brockwell</surname> <given-names>PJ</given-names></name> <name><surname>Davis</surname> <given-names>RA</given-names></name></person-group>. <source>Time Series: Theory and Methods</source>. <publisher-name>Springer Science &#x00026; Business Media</publisher-name> (<year>2009</year>).</citation></ref>
<ref id="B64">
<label>64.</label>
<citation citation-type="thesis"><person-group person-group-type="author"><name><surname>Spiegel</surname> <given-names>S</given-names></name></person-group>. <source>Time series distance measures</source>. Ph.D. thesis. Berlin, Germany.</citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>Here we view each time series as one data point; its feature vector simply collects the associated measurements into a single vector.</p></fn>
<fn id="fn0002"><p><sup>2</sup>Here the prototype of each cluster is its centroid.</p></fn>
<fn id="fn0003"><p><sup>3</sup>This term is commonly used when the regression results shrink toward a mass at the barycenter of a target [<xref ref-type="bibr" rid="B44">44</xref>].</p></fn>
<fn id="fn0004"><p><sup>4</sup>We focused on all binary classification series listed in <monospace>TwoClassProblems.csv</monospace> within <ext-link ext-link-type="uri" xlink:href="http://www.timeseriesclassification.com/Downloads/Archives/Univariate2018_arff.zip">http://www.timeseriesclassification.com/Downloads/Archives/Univariate2018_arff.zip</ext-link>.</p></fn>
</fn-group>
</back>
</article>