<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Comput. Sci.</journal-id>
<journal-title>Frontiers in Computer Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Comput. Sci.</abbrev-journal-title>
<issn pub-type="epub">2624-9898</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fcomp.2020.546917</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Computer Science</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Extraction of Hierarchical Behavior Patterns Using a Non-parametric Bayesian Approach</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Briones</surname> <given-names>Jeric</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/854640/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Kubo</surname> <given-names>Takatomi</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/720064/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Ikeda</surname> <given-names>Kazushi</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/125430/overview"/>
</contrib>
</contrib-group>
<aff><institution>Division of Information Science, Nara Institute of Science and Technology</institution>, <addr-line>Ikoma</addr-line>, <country>Japan</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Priit Kruus, Tallinn University of Technology, Estonia</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Dimitrije Markovi&#x00107;, Technische Universit&#x000E4;t Dresden, Germany; Chee-Ming Ting, University of Technology Malaysia, Malaysia</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Takatomi Kubo <email>takatomi-k&#x00040;is.naist.jp</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Digital Public Health, a section of the journal Frontiers in Computer Science</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>19</day>
<month>10</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="collection">
<year>2020</year>
</pub-date>
<volume>2</volume>
<elocation-id>546917</elocation-id>
<history>
<date date-type="received">
<day>30</day>
<month>03</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>14</day>
<month>09</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2020 Briones, Kubo and Ikeda.</copyright-statement>
<copyright-year>2020</copyright-year>
<copyright-holder>Briones, Kubo and Ikeda</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>Extraction of complex temporal patterns, such as human behaviors, from time series data is a challenging yet important problem. The double articulation analyzer has been previously proposed by Taniguchi et al. to discover a hierarchical structure that leads to complex temporal patterns. It segments time series into hierarchical state subsequences, with the higher level and the lower level analogous to words and phonemes, respectively. The double articulation analyzer approximates the sequences in the lower level by linear functions. However, it is not suitable to model real behaviors since such a linear function is too simple to represent their non-linearity even after the segmentation. Thus, we propose a new method that models the lower segments by fitting autoregressive functions that allows for more complex dynamics, and discovers a hierarchical structure based on these dynamics. To achieve this goal, we propose a method that integrates the beta process&#x02014;autoregressive hidden Markov model and the double articulation by nested Pitman-Yor language model. Our results showed that the proposed method extracted temporal patterns in both low and high levels from synthesized datasets and a motion capture dataset with smaller errors than those of the double articulation analyzer.</p></abstract>
<kwd-group>
<kwd>behavioral pattern</kwd>
<kwd>non-parametric Bayesian approach</kwd>
<kwd>segmentation</kwd>
<kwd>hierarchical structure</kwd>
<kwd>dynamics</kwd>
</kwd-group>
<contract-num rid="cn001">KAKENHI 17H05863</contract-num>
<contract-num rid="cn001">KAKENHI 25118009</contract-num>
<contract-num rid="cn001">KAKENHI 15K16395</contract-num>
<contract-num rid="cn001">KAKENHI 17H05979</contract-num>
<contract-num rid="cn001">KAKENHI 18K18108</contract-num>
<contract-sponsor id="cn001">Japan Society for the Promotion of Science<named-content content-type="fundref-id">10.13039/501100001691</named-content></contract-sponsor>
<counts>
<fig-count count="8"/>
<table-count count="0"/>
<equation-count count="13"/>
<ref-count count="28"/>
<page-count count="12"/>
<word-count count="6191"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>In the big data era, we can easily collect information-rich time series thanks to the advancements in sensing technologies. However, such time series data come unsegmented, which makes it difficult to apply recent machine learning techniques to them. To segment such data, temporal patterns must be extracted in an unsupervised manner. This has become an active topic in several research fields, such as health care (Zeger et al., <xref ref-type="bibr" rid="B27">2006</xref>), biology (Saeedi et al., <xref ref-type="bibr" rid="B22">2016</xref>), speech recognition (Taniguchi et al., <xref ref-type="bibr" rid="B24">2016</xref>), natural language processing (Heller et al., <xref ref-type="bibr" rid="B14">2009</xref>), and motion analysis (Barbi&#x0010D; et al., <xref ref-type="bibr" rid="B2">2004</xref>). Although many methods have been proposed to extract temporal patterns (Keogh et al., <xref ref-type="bibr" rid="B17">2004</xref>), the number of existing patterns (and consequently, the number of segments) is generally unknown beforehand. To address this issue, non-parametric Bayesian methods are used to determine the number of patterns (Fox et al., <xref ref-type="bibr" rid="B9">2008b</xref>). Specifically, non-parametric Bayesian methods based on switching AR models, such as the beta process&#x02014;autoregressive hidden Markov model (BP-AR-HMM) (Fox et al., <xref ref-type="bibr" rid="B7">2009</xref>, <xref ref-type="bibr" rid="B10">2014</xref>), can identify the temporal patterns without specifying their number beforehand.</p>
<p>Although conventional methods can discover the temporal patterns to segment a time-series sequence, some sequences have a hierarchical structure that makes the segmentation more complex. Motion data, for example, can be seen as a sequence of semantic actions, where each action can be decomposed into a series of <italic>motion primitives</italic> (Viviani and Cenzato, <xref ref-type="bibr" rid="B25">1985</xref>; Zhou et al., <xref ref-type="bibr" rid="B28">2013</xref>; Grigore and Scassellati, <xref ref-type="bibr" rid="B12">2017</xref>). Similarly, speech data consist of words, where each word consists of phonemes. For sequences with such a hierarchical structure, usual methods involving switching dynamical systems may not be sufficient since they do not assume the existence of the hierarchy. Time series like the examples above should instead be analyzed using hierarchical models. Non-parametric Bayesian methods for hierarchical models include the hierarchical hidden Markov model (HHMM) (Fine et al., <xref ref-type="bibr" rid="B6">1998</xref>), the nested Pitman-Yor language model (NPYLM) for sentences (Mochihashi et al., <xref ref-type="bibr" rid="B18">2009</xref>), and the double articulation analyzer (DAA) (Taniguchi and Nagasaka, <xref ref-type="bibr" rid="B23">2011</xref>). However, they are not suitable for analyzing dynamic patterns. For example, DAA models the time series only by fitting segment-wise linear functions to the lower level of the structure. Complex dynamics in the lower level have not been considered in previous methods, even though motion primitives are usually modeled as non-linear functions (Williams et al., <xref ref-type="bibr" rid="B26">2007</xref>; Bruno et al., <xref ref-type="bibr" rid="B4">2012</xref>).</p>
<p>Against this background, it is necessary to develop a method that considers both dynamics and hierarchical structure when extracting temporal patterns. As a first step toward such a method, we naively applied BP-AR-HMM and NPYLM in sequence to model hierarchically-structured sequences with dynamical systems in our previous study (Briones et al., <xref ref-type="bibr" rid="B3">2018</xref>).</p>
<p>In this work, we propose a unified model that integrates BP-AR-HMM and NPYLM. Our method captures the hierarchical structure of the time series by NPYLM and uses dynamical systems (specifically, the switching AR models in BP-AR-HMM) to represent the dynamic patterns in the lower-level sequences. In addition, BP-AR-HMM allows segments to switch asynchronously across the multiple time series considered, thanks to its beta process prior. Compared to our previous two-step approach, the proposed integrated approach is expected to improve segmentation and estimation accuracy. In this study, we tested our method with a toy dataset and sequences generated from real motion capture (mocap) recordings of two interacting agents. Such motion sequences are suitable for testing the segmentation performance of our method, since the interaction switches from time to time (Ryoo and Aggarwal, <xref ref-type="bibr" rid="B21">2009</xref>; Alazrai et al., <xref ref-type="bibr" rid="B1">2015</xref>).</p>
<p>The rest of this paper is organized as follows: section 2 presents our proposed method, together with a brief introduction of its underlying algorithms. Sections 3 and 4 describe the experiments carried out using two datasets and their corresponding results. Finally, section 5 discusses the results and concludes the paper.</p>
</sec>
<sec id="s2">
<title>2. Proposed Method</title>
<p>We propose to use a hierarchical non-parametric Bayesian approach to extract hierarchical temporal patterns from time series data. Specifically, we use an unsupervised segmentation method, where the extracted segments are used to define the temporal patterns. Our method consists of two non-parametric Bayesian models: BP-AR-HMM (Fox et al., <xref ref-type="bibr" rid="B7">2009</xref>) and NPYLM (Mochihashi et al., <xref ref-type="bibr" rid="B18">2009</xref>) (<xref ref-type="fig" rid="F1">Figure 1</xref>).</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Illustration of processing steps in our proposed method. Each time step is assigned both an EB label (line color) and a UB label (background color). The summarized sequence of EB labels (shown as numbers above the lines) obtained from BP-AR-HMM is then grouped together (indicated by the square brackets) using NPYLM.</p></caption>
<graphic xlink:href="fcomp-02-546917-g0001.tif"/>
</fig>
<p>In the first step, BP-AR-HMM is applied to the time series data to discover low-level temporal patterns or elemental behaviors (EB), which correspond to the motion primitives in motion analysis. Segmentation is indicated by assigning an EB label to each time step. The obtained EB label sequence for each time series is then summarized, before being used as input to the second step. In the second step, NPYLM is applied to the (summarized) sequence of EB labels to detect unit behaviors (UB). Subsequences of EB labels with recurring patterns are grouped together and assigned UB labels. As a result, the method outputs a sequence of UBs, each of which is a sequence of EBs represented by AR models. These two steps are iterated a fixed number of times, with the resulting UB labels from the NPYLM step used as initial EB labels for the BP-AR-HMM step of the next iteration.</p>
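<p>The summarization step can be read as run-length collapsing: consecutive repeats of an EB label are merged into a single symbol before NPYLM sees the sequence. The following Python sketch (our illustration; the function name is not from the paper) shows one way to implement it:</p>

```python
def summarize_labels(eb_labels):
    """Collapse consecutive repeats of EB labels into a run-length summary.

    Returns the summarized label sequence (the input to NPYLM) and the
    length of each run, so that UB boundaries found on the summary can
    be mapped back to time steps.
    """
    summary, run_lengths = [], []
    for label in eb_labels:
        if summary and summary[-1] == label:
            run_lengths[-1] += 1       # extend the current run
        else:
            summary.append(label)      # a new EB segment starts here
            run_lengths.append(1)
    return summary, run_lengths
```

<p>For example, the EB label sequence [1, 1, 1, 2, 2, 1, 3, 3] is summarized to [1, 2, 1, 3], with run lengths [3, 2, 1, 2] kept so that groupings over the summary can be expanded back to the original time axis.</p>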
<p>In the following, we introduce the components of our method: BP-AR-HMM and NPYLM.</p>
<sec>
<title>2.1. BP-AR-HMM</title>
<p>BP-AR-HMM is an extension of the hidden Markov model in which each discrete latent variable <italic>z</italic><sub><italic>t</italic></sub> has an AR model of order <italic>r</italic> with parameter &#x003B8;<sub><italic>z</italic><sub><italic>t</italic></sub></sub> &#x0003D; {<bold>A</bold><sub><italic>z</italic><sub><italic>t</italic></sub></sub>, &#x003A3;<sub><italic>z</italic><sub><italic>t</italic></sub></sub>} (<xref ref-type="fig" rid="F2">Figure 2</xref>), so that the observed variable <bold>y</bold><sub><italic>t</italic></sub> is represented by an AR model with lag order <italic>r</italic>. This model is a non-parametric Bayesian model with a beta process prior, from which an indicator vector over the set of EBs, <bold>f</bold><sub><italic>i</italic></sub>, is drawn. The EB <italic>z</italic><sub><italic>t</italic></sub>, the state transition matrix <inline-formula><mml:math id="M1"><mml:msubsup><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>, and the AR coefficient matrix <bold>A</bold><sub><italic>k</italic></sub> are drawn according to <bold>f</bold><sub><italic>i</italic></sub>, a gamma prior, and a matrix normal prior, respectively (<xref ref-type="fig" rid="F2">Figure 2</xref>).</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Graphical representations of the non-parametric Bayesian models. (Top) BP-AR-HMM. (Bottom) NPYLM.</p></caption>
<graphic xlink:href="fcomp-02-546917-g0002.tif"/>
</fig>
<sec>
<title>2.1.1. Beta Process (BP)</title>
<p>A beta process prior is placed on the EB indicator vector. This makes it unnecessary to specify the number of EBs beforehand, and thus allows us to use an infinite-dimensional EB indicator vector <bold>f</bold>. A beta process is a completely random measure, denoted by</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>B</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>|</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>c</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>&#x0007E;</mml:mo><mml:mi>B</mml:mi><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>c</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E2"><label>(2)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>B</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x0221E;</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;with&#x000A0;</mml:mtext><mml:mi>&#x003B1;</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x00398;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>B</italic><sub>0</sub> is a base measure, <italic>c</italic> the concentration parameter, and &#x003B1; the mass parameter. The set of active EBs (and hence their number) for time series <italic>i</italic> is determined by a realization of the indicator vector <bold>f</bold><sub><italic>i</italic></sub> | <italic>B</italic>&#x0007E;BeP(<italic>B</italic>), given by</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>f</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;with&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0007E;</mml:mo><mml:mtext>Be</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Here, <italic>f</italic><sub><italic>ik</italic></sub> &#x0003D; 1 if the <italic>k</italic>th EB is active for time series <italic>i</italic>, for <italic>i</italic> &#x0003D; 1, &#x02026;, <italic>N</italic>.</p>
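<p>As a concrete illustration, the beta-Bernoulli construction above is often handled through a finite-dimensional approximation: with <italic>K</italic> candidate EBs and <italic>c</italic> &#x0003D; 1, the weights reduce to &#x003C9;<sub><italic>k</italic></sub> &#x0007E; Beta(&#x003B1;/<italic>K</italic>, 1), followed by independent Bernoulli draws for each sequence. The sketch below (the function name and hyperparameter values are our illustration, not from the paper) generates such an EB indicator matrix:</p>

```python
import numpy as np

def sample_feature_matrix(N, K, alpha, rng):
    """Finite-K approximation of the beta-Bernoulli process:
    omega_k ~ Beta(alpha/K, 1) (the c = 1 case), then
    f_ik ~ Bernoulli(omega_k) marks EB k as active for time series i.
    As K grows, the expected number of active EBs per sequence
    stays on the order of alpha, so most columns remain unused.
    """
    omega = rng.beta(alpha / K, 1.0, size=K)       # EB inclusion weights
    F = (rng.random((N, K)) < omega).astype(int)   # indicator vectors f_i
    return omega, F
```

<p>Each row of the resulting binary matrix plays the role of <bold>f</bold><sub><italic>i</italic></sub>, listing which EBs sequence <italic>i</italic> may use.</p>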
</sec>
<sec>
<title>2.1.2. AR-HMM</title>
<p>The <italic>D</italic>-dimensional observation vector <bold>y</bold><sub><italic>t</italic></sub> is described by an autoregressive hidden Markov model (AR-HMM) with order <italic>r</italic>, latent variable (<italic>state sequence</italic>) <italic>z</italic><sub><italic>t</italic></sub>, and transition probability matrix &#x003C0;<sub><italic>k</italic></sub>. That is,</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>|</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x0007E;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E5"><label>(5)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>y</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>|</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0007E;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>A</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>y</mml:mtext></mml:mstyle></mml:mrow><mml:mo>&#x0007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003A3;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E6"><label>(6)</label><mml:math id="M7"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>y</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>A</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>y</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x000A0;&#x000A0;with&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0007E;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003A3;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
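<p>To make the generative model of Equations (4)&#x02013;(6) concrete, the following sketch simulates a switching AR process; the transition matrix, AR coefficient matrices, and noise scales supplied by the caller are toy values for illustration, not parameters estimated in this paper:</p>

```python
import numpy as np

def simulate_ar_hmm(pi, A, Sigma_chol, T, rng, z0=0):
    """Simulate z_t | z_{t-1} ~ pi_{z_{t-1}} (Eq. 4) and
    y_t = sum_l A_{l,z_t} y_{t-l} + e_t(z_t) (Eq. 6).

    pi:         (K, K) transition probability matrix
    A:          (K, r, D, D) AR coefficients; A[k, l] is A_{l+1, k}
    Sigma_chol: (K, D, D) Cholesky factors of the noise covariances
    """
    K, r, D, _ = A.shape
    z = np.empty(T, dtype=int)
    y = np.zeros((T + r, D))            # r zero pre-samples as initial lags
    z_prev = z0
    for t in range(T):
        z[t] = rng.choice(K, p=pi[z_prev])
        # y[t + r] holds y_t, so y[t + r - 1 - l] is lag l + 1
        mean = sum(A[z[t], l] @ y[t + r - 1 - l] for l in range(r))
        y[t + r] = mean + Sigma_chol[z[t]] @ rng.standard_normal(D)
        z_prev = z[t]
    return z, y[r:]
```

<p>Fitting BP-AR-HMM to such simulated data is a standard sanity check: the recovered EB labels should track the simulated state sequence up to label permutation.</p>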
<p>For the <italic>k</italic>th EB, the corresponding AR-HMM parameters are denoted as &#x003B8;<sub><italic>k</italic></sub> &#x0003D; {<bold>A</bold><sub><italic>k</italic></sub>, &#x003A3;<sub><italic>k</italic></sub>}, while the transition probabilities are denoted by &#x003C0;<sub><italic>k</italic></sub>. Since active EBs vary for each sequence, <italic>feature-constrained</italic> transition distributions (Fox et al., <xref ref-type="bibr" rid="B7">2009</xref>) are used. That is, given <bold>f</bold><sub><italic>i</italic></sub>,</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M8"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable style="text-align:axis;" columnalign="left" equalrows="false" columnlines="none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>j</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>|</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msubsup><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mtd></mml:mtr><mml:mtr></mml:mtr></mml:mtable></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext class="textrm" 
mathvariant="normal">&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;with&#x000A0;</mml:mtext><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msubsup><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>A gamma prior is placed on the transition weights, with</p>
<disp-formula id="E9"><label>(8)</label><mml:math id="M10"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>|</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003B3;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003BA;</mml:mi><mml:mo>&#x0007E;</mml:mo><mml:mtext>Gamma</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003B3;</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003BA;</mml:mi><mml:msub><mml:mrow><mml:mi>&#x003B4;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E10"><label>(9)</label><mml:math id="M11"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mi>&#x003B7;</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x02297;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>f</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mstyle displaystyle="true"><mml:msub><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mstyle><mml:msubsup><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003B3; is the transition parameter and &#x003BA; the sticky parameter. Moreover, &#x003B4;<sub><italic>j, k</italic></sub> is the Kronecker delta function, and &#x02297; denotes the Hadamard (element-wise) vector product.</p>
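<p>Equations (7)&#x02013;(9) can be sketched directly: draw gamma weights with a sticky boost on the diagonal, zero out the columns of inactive EBs, and normalize each row. The following fragment (our illustration) does this for a single sequence:</p>

```python
import numpy as np

def feature_constrained_pi(f_i, gamma, kappa, rng):
    """Eq. (8)-(9): draw eta_jk ~ Gamma(gamma + kappa * delta_jk, 1),
    mask by the indicator vector f_i, and normalize over active EBs,
    so that pi_kj^(i) = 0 whenever f_ij = 0 (Eq. 7).
    """
    K = f_i.size
    shape = gamma + kappa * np.eye(K)          # sticky boost on the diagonal
    eta = rng.gamma(shape, 1.0)                # (K, K) gamma draws
    masked = eta * f_i[None, :]                # zero out inactive EBs
    row_sums = masked.sum(axis=1, keepdims=True)
    pi = np.divide(masked, row_sums, out=np.zeros_like(masked),
                   where=row_sums > 0)
    return pi
```

<p>Because the mask is applied before normalization, each sequence ends up with its own transition matrix supported only on its active EBs, which is exactly what distinguishes BP-AR-HMM from a shared-transition HDP-HMM.</p>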
<p>In addition, matrix normal and inverse-Wishart priors are placed on the dynamic parameters. That is,</p>
<disp-formula id="E11"><label>(10)</label><mml:math id="M12"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>|</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>&#x0007E;</mml:mo><mml:mi>I</mml:mi><mml:mi>W</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E12"><label>(11)</label><mml:math id="M13"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>A</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mo>|</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>M</mml:mi><mml:mo>,</mml:mo><mml:mi>L</mml:mi><mml:mo>&#x0007E;</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">MN</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>A</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:mi>M</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>L</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>n</italic><sub>0</sub> is the degrees of freedom, <italic>S</italic><sub>0</sub> is a scale matrix, <italic>M</italic> is the mean dynamic matrix, and <italic>L</italic> and &#x003A3;<sub><italic>k</italic></sub> together define the covariance of <bold>A</bold><sub><italic>k</italic></sub>.</p>
</sec>
<sec>
<title>2.1.3. Posterior Inference</title>
<p>Samples are generated from the posterior distribution using a Markov chain Monte Carlo (MCMC) algorithm. Specifically, the sampled variables are the EB indicator vector <bold>f</bold> given <bold>&#x003B8;</bold>, <bold>&#x003B7;</bold>; the state sequences <bold>z</bold> given <bold>f</bold>, <bold>&#x003B8;</bold>, <bold>&#x003B7;</bold>; and the variables <bold>&#x003B8;</bold>, <bold>&#x003B7;</bold> given <bold>f</bold> and <bold>z</bold>. The hyperparameters &#x003B1;, <italic>c</italic>, &#x003BA;, &#x003B3; are also sampled. In essence, MCMC alternates between sampling <bold>f</bold>|<bold>y</bold>, <bold>&#x003B8;</bold> and <bold>&#x003B8;</bold>|<bold>y</bold>, <bold>f</bold>, with the hyperparameters sampled in between the cycles. To generate unique EB vectors, birth-death reversible jump MCMC sampling (Fox et al., <xref ref-type="bibr" rid="B7">2009</xref>) and split-merge techniques (Hughes et al., <xref ref-type="bibr" rid="B16">2012</xref>) are utilized. These samples are then used to carry out posterior inference.</p>
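<p>As a deliberately simplified illustration of this alternation (not the BP-AR-HMM sampler itself), the following sketch runs a Gibbs sampler for a toy two-component Gaussian mixture, alternating between sampling indicators given parameters and parameters given indicators, with a hyperparameter resampled in between each cycle. All data and prior choices here are invented for illustration.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated 1-D Gaussian clusters.
y = np.concatenate([rng.normal(-3.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])

mu = np.array([-1.0, 1.0])       # parameters (analogous to "theta")
z = rng.integers(0, 2, y.size)   # indicators (analogous to z / f)
tau = 1.0                        # prior precision hyperparameter

for it in range(200):
    # Sample indicators given parameters: z | y, mu.
    logp = -0.5 * (y[:, None] - mu[None, :]) ** 2
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    z = (rng.random(y.size) < p[:, 1]).astype(int)
    # Sample parameters given indicators: mu | y, z (conjugate normal update).
    for k in (0, 1):
        yk = y[z == k]
        var = 1.0 / (yk.size + tau)
        mu[k] = rng.normal(var * yk.sum(), np.sqrt(var))
    # Resample the hyperparameter in between the cycles.
    tau = rng.gamma(2.0, 1.0 / (1.0 + 0.5 * (mu ** 2).sum()))

print(np.sort(mu))
```

<p>After burn-in, the sampled means settle near the true cluster centers, mirroring how an alternating sampler jointly converges on assignments and parameters.</p>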
</sec>
<sec>
<title>2.1.4. Advantages</title>
<p>Using this model provides several advantages over the sticky hierarchical Dirichlet process hidden Markov model (sticky HDP-HMM) (Fox et al., <xref ref-type="bibr" rid="B8">2008a</xref>) used in DAA. First, thanks to the BP prior, we can segment multiple time series and discover both common and unique behaviors across these sequences. This would not be possible with the sticky HDP-HMM, since it requires all the time series sequences to share exactly the same behaviors (and not just a subset of them). The difference between BP and HDP is most evident in the transition probability matrices used for each sequence: HDP in HDP-HMM assigns a state to each time step according to a transition matrix shared by all time series, while BP in BP-AR-HMM assigns a state according to a transition matrix specific to each sequence.</p>
<p>Furthermore, using an AR model allows us to discover the dynamic properties of the data, which is again not possible in DAA. Specifically, BP-AR-HMM fits AR models to the given time series {<bold>y</bold><sub><italic>t</italic></sub>}. Hence, the interactions among the variables are expressed in the AR coefficient matrix <bold>A</bold><sub><italic>k</italic></sub> (Harrison et al., <xref ref-type="bibr" rid="B13">2003</xref>; Gilson et al., <xref ref-type="bibr" rid="B11">2017</xref>), making our method suitable for subsequent interaction analysis.</p>
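<p>As a brief sketch of why the AR coefficient matrix carries interaction structure (the simulation below is illustrative and not taken from the paper), consider a 2-D first-order AR process in which variable 1 drives variable 0 but not vice versa; the directed interaction appears as an off-diagonal entry of the coefficient matrix and can be recovered from the series by least squares.</p>

```python
import numpy as np

rng = np.random.default_rng(1)

# y_t = A y_{t-1} + e_t; A[0, 1] = 0.3 means variable 1 drives variable 0.
A_true = np.array([[0.5, 0.3],
                   [0.0, 0.4]])
T = 5000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + rng.normal(0.0, 0.1, 2)

# Least-squares estimate of the coefficient matrix (solves Y = X A^T).
X, Y = y[:-1], y[1:]
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T

print(np.round(A_hat, 2))
```

<p>The near-zero estimated entry correctly indicates the absence of influence in the other direction, which is the kind of information a subsequent interaction analysis would read off <bold>A</bold><sub><italic>k</italic></sub>.</p>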
</sec>
</sec>
<sec>
<title>2.2. NPYLM</title>
<p>NPYLM was originally proposed as a hierarchical language model in which both letters and words are modeled by hierarchical Pitman-Yor processes (Mochihashi et al., <xref ref-type="bibr" rid="B18">2009</xref>; Neubig et al., <xref ref-type="bibr" rid="B20">2010</xref>). In each layer of the hierarchical model, words and letters are modeled as <italic>n</italic>-grams produced by Pitman-Yor processes. In general, words can be considered as high-level unit segments (UBs in this study), and letters as low-level sub-unit segments (EBs in this study). Just as words are made up of letters, these high-level unit segments are composed of low-level sub-unit segments.</p>
<sec>
<title>2.2.1. Pitman-Yor Process</title>
<p>The Pitman-Yor (PY) process is a stochastic process that generates a probability distribution <italic>G</italic> similar to a base distribution <italic>G</italic><sub>0</sub>. This is denoted by</p>
<disp-formula id="E13"><label>(12)</label><mml:math id="M14"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>G</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>|</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>&#x003B8;</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo>&#x0007E;</mml:mo><mml:mtext>PY</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>&#x003B8;</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>G</italic><sub>0</sub> is a base measure, &#x003B8; the concentration parameter, and <italic>d</italic> the discount parameter.</p>
</sec>
<sec>
<title>2.2.2. Hierarchical Pitman-Yor Language Model</title>
<p>Given a unigram distribution <inline-formula><mml:math id="M15"><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">W</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup></mml:math></inline-formula>, we can generate a bigram distribution <inline-formula><mml:math id="M16"><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">W</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup></mml:math></inline-formula> such that this distribution will be similar to <inline-formula><mml:math id="M17"><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">W</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup></mml:math></inline-formula>, especially for the high-frequency units. That is, <inline-formula><mml:math id="M18"><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">W</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup><mml:mo>&#x0007E;</mml:mo><mml:mtext>PY</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">W</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mi>&#x003B8;</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. 
Similarly, a trigram distribution can also be generated similar to the bigram distribution, such that <inline-formula><mml:math id="M19"><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">W</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup><mml:mo>&#x0007E;</mml:mo><mml:mtext>PY</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">W</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mi>&#x003B8;</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. In general, then, the <italic>n</italic>-gram model is Pitman-Yor distributed with base measure from the (<italic>n</italic>&#x02212;1)-gram model, and the base measure of the unigram model being <inline-formula><mml:math id="M20"><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">W</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup></mml:math></inline-formula>. This hierarchical structure of <italic>n</italic>-gram models is referred to as hierarchical Pitman-Yor language model (HPYLM).</p>
<p>Specifically, for the <italic>unit</italic> <italic>n</italic>-gram model, the probability of a unit <italic>w</italic> &#x0003D; <italic>w</italic><sub><italic>t</italic></sub> given a context <italic>h</italic> &#x0003D; <italic>w</italic><sub><italic>t</italic>&#x02212;<italic>n</italic></sub>&#x02026;<italic>w</italic><sub><italic>t</italic>&#x02212;1</sub> is calculated recursively as</p>
<disp-formula id="E14"><label>(13)</label><mml:math id="M21"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>w</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>|</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>h</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>c</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>w</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>|</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>h</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mi>d</mml:mi><mml:mo>&#x000B7;</mml:mo><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>&#x003B8;</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>c</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mo>&#x0002B;</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003B8;</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>d</mml:mi><mml:mo>&#x000B7;</mml:mo><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>&#x003B8;</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>c</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>w</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>|</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M22"><mml:msup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>n</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02026;</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> is the shorter (<italic>n</italic>&#x02212;1)-gram context, <italic>c</italic>(<italic>w</italic>&#x02223;<italic>h</italic>) is the count of <italic>w</italic> under context <italic>h</italic>, and <inline-formula><mml:math id="M23"><mml:mi>c</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:munder><mml:mi>c</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>w</mml:mi><mml:mo>&#x02223;</mml:mo><mml:mi>h</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. Here, <italic>p</italic>(<italic>w</italic> | <italic>h</italic>&#x02032;) can be considered as a prior probability of <italic>w</italic>. On the other hand, <italic>t</italic><sub><italic>hw</italic></sub> is the number of times the count of <italic>w</italic> is passed down from context <italic>h</italic> to the shorter context <italic>h</italic>&#x02032; (a table count in the Chinese restaurant representation), while <inline-formula><mml:math id="M24"><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:munder><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the corresponding total under the context <italic>h</italic>. Finally, <italic>d</italic>, &#x003B8; are the discount and concentration parameters, respectively.</p>
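<p>Equation (13) can be implemented directly as a recursion over progressively shorter contexts. The sketch below uses invented toy counts and a single shared <italic>d</italic> and &#x003B8; for simplicity (in HPYLM these are typically depth-dependent), with contexts stored as tuples ordered from oldest to newest word.</p>

```python
def pred_prob(w, h, c, t, theta, d, base_prob):
    """Recursive predictive probability of Equation (13).
    c[h][w]: customer counts c(w|h); t[h][w]: table counts t_hw;
    base_prob: base measure probability used below the unigram level."""
    # Back off to the shorter context h' by dropping the oldest word.
    prior = base_prob if len(h) == 0 else pred_prob(w, h[1:], c, t, theta, d, base_prob)
    ch = sum(c.get(h, {}).values())
    if ch == 0:
        return prior  # no observations in this context: fall back on the prior
    chw = c.get(h, {}).get(w, 0)
    th = sum(t.get(h, {}).values())
    thw = t.get(h, {}).get(w, 0)
    return (chw - d * thw) / (theta + ch) + (theta + d * th) / (theta + ch) * prior

# Hypothetical toy counts over a two-word vocabulary {"a", "b"}.
c = {(): {"a": 3, "b": 1}, ("a",): {"a": 2, "b": 1}}
t = {(): {"a": 1, "b": 1}, ("a",): {"a": 1, "b": 1}}
p_a = pred_prob("a", ("a",), c, t, theta=0.5, d=0.25, base_prob=0.5)
p_b = pred_prob("b", ("a",), c, t, theta=0.5, d=0.25, base_prob=0.5)
print(p_a, p_b)  # probabilities over the vocabulary sum to 1
```

<p>Because the discounted mass at each level is redistributed through the prior, the probabilities over the vocabulary sum to one at every context length.</p>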
<p>To define the base measure <inline-formula><mml:math id="M25"><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">W</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup></mml:math></inline-formula> for the unit unigram model (and consequently define <italic>p</italic>(<italic>w</italic> | <italic>h</italic>&#x02032;) for <inline-formula><mml:math id="M26"><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">W</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup></mml:math></inline-formula>), NPYLM uses a <italic>sub-unit</italic> <italic>n</italic>-gram model <inline-formula><mml:math id="M27"><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">C</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup></mml:math></inline-formula> as the aforementioned base measure. This sub-unit <italic>n</italic>-gram model <inline-formula><mml:math id="M28"><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">C</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup></mml:math></inline-formula> also uses hierarchical Pitman-Yor processes, and is structured similarly to the unit <italic>n</italic>-gram model <inline-formula><mml:math id="M29"><mml:msubsup><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">W</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup></mml:math></inline-formula>. 
Moreover, the probability for the sub-unit <italic>n</italic>-gram is also calculated recursively using Equation (13), where <italic>G</italic><sub>0</sub>, <italic>d</italic><sup>&#x0002A;</sup>, &#x003B8;<sup>&#x0002A;</sup> are the base measure, discount parameter, and concentration parameter for the sub-unit unigram model, respectively. As a result, an HPYLM (in this case, the sub-unit <italic>n</italic>-gram) is embedded inside another HPYLM (the unit <italic>n</italic>-gram), resulting in the &#x0201C;nested&#x0201D; part of NPYLM.</p>
</sec>
<sec>
<title>2.2.3. Posterior Inference</title>
<p>Samples are generated from the posterior distribution using Gibbs sampling and forward filtering-backward sampling (Mochihashi et al., <xref ref-type="bibr" rid="B18">2009</xref>; Neubig et al., <xref ref-type="bibr" rid="B20">2010</xref>; Taniguchi and Nagasaka, <xref ref-type="bibr" rid="B23">2011</xref>). To be specific, a unit is removed from the current <italic>unit</italic> <italic>n</italic>-gram model, and a &#x0201C;new&#x0201D; unit is sampled by generating a new segmentation of the sequence of sub-units. The &#x0201C;new&#x0201D; unit is then added to the <italic>unit</italic> <italic>n</italic>-gram model, thereby updating the model. This blocked Gibbs sampling is repeated several times, with forward filtering-backward sampling used to generate each new segmentation.</p>
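<p>The forward filtering-backward sampling step can be sketched for a toy unigram segmentation model: forward filtering accumulates the total probability &#x003B1;[<italic>t</italic>] of all segmentations of the first <italic>t</italic> sub-units, and backward sampling then draws unit boundaries in proportion to each term. The unit probabilities below are hypothetical stand-ins for the nested model's predictive probabilities.</p>

```python
import random

random.seed(0)

def sample_segmentation(s, unit_prob, max_len=4):
    """Sample a segmentation of s with probability proportional to the
    product of its unit probabilities (forward filter, backward sample)."""
    n = len(s)
    alpha = [0.0] * (n + 1)
    alpha[0] = 1.0
    for t in range(1, n + 1):  # forward filtering
        alpha[t] = sum(alpha[t - k] * unit_prob(s[t - k:t])
                       for k in range(1, min(max_len, t) + 1))
    units, t = [], n
    while t > 0:               # backward sampling of unit boundaries
        ks = list(range(1, min(max_len, t) + 1))
        ws = [alpha[t - k] * unit_prob(s[t - k:t]) for k in ks]
        k = random.choices(ks, weights=ws)[0]
        units.append(s[t - k:t])
        t -= k
    return units[::-1]

# Hypothetical unit probabilities strongly favoring "ab" and "cd" as units.
lex = {"ab": 0.4, "cd": 0.4}
prob = lambda u: lex.get(u, 0.001 ** len(u))
print(sample_segmentation("ababcd", prob))
```

<p>With these toy probabilities the sampler almost always returns the segmentation ab | ab | cd; in NPYLM proper, the unit probabilities come from the nested <italic>n</italic>-gram models and are updated after each resampled sequence.</p>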
</sec>
<sec>
<title>2.2.4. Advantages</title>
<p>This model assumes that the input sequence has a hierarchical structure. Thanks to this assumption, NPYLM is suitable for modeling motion data composed of a sequence of UBs, each of which is composed of a sequence of EBs. This second step allows us to obtain high-level, semantically meaningful behaviors, rather than low-level, short and simple behaviors (akin to motion primitives).</p>
<p>Moreover, since NPYLM is an unsupervised language model, using it in the second step enables us to perform segmentation without an existing dictionary. In addition, using a blocked Gibbs sampler significantly reduces the computational time of the sampling procedure (Mochihashi et al., <xref ref-type="bibr" rid="B18">2009</xref>; Taniguchi and Nagasaka, <xref ref-type="bibr" rid="B23">2011</xref>).</p>
</sec>
</sec>
</sec>
<sec id="s3">
<title>3. Synthetic Experiments</title>
<p>We carried out experiments with two datasets to check the performance of our method and compare it with that of DAA. One was a toy dataset synthesized from known AR models to evaluate the estimation accuracy for segments using the ground truth. Using this dataset, we also investigated the effects of complexity (AR order) of the time series.</p>
<sec>
<title>3.1. Toy Data</title>
<p>To evaluate the estimation accuracy, three subdatasets, <bold>Lm</bold> (<italic>m</italic> &#x0003D; 1, 2, 3), were generated from switching <italic>m</italic>-th order AR models with a hierarchical structure. UBs were randomly chosen from a library of four UBs (based on predefined transition probability matrices) to form sequences of concatenated UBs. Each UB consists of several EBs, where each EB has a sparse AR(<italic>m</italic>) coefficient matrix, generated independently for each subdataset. Elements of the AR coefficient matrices were set within the range (&#x02212;1, 1). EBs under the same UB share the same sparsity structure for their respective AR coefficient matrices. Finally, each subdataset <bold>Lm</bold> (<italic>m</italic> &#x0003D; 1, 2, 3) has four time series sequences of four dimensions each.</p>
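<p>A minimal sketch of how EBs can share a sparsity structure within a UB (the parameter choices here are illustrative, not the exact generation procedure): one binary mask is drawn per UB, and every EB under that UB multiplies its coefficient draws by the same mask.</p>

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4  # dimension of each time series, as in the toy data

mask = rng.random((d, d)) < 0.4  # sparsity pattern shared within one UB

def new_eb_coeff():
    """AR(1) coefficient matrix for one EB: entries within (-1, 1),
    zeroed out according to the UB-level mask."""
    vals = rng.uniform(0.1, 1.0, (d, d)) * rng.choice([-1, 1], (d, d))
    return vals * mask

A1, A2 = new_eb_coeff(), new_eb_coeff()
print(mask.astype(int))
```

<p>Different EBs under the same UB (A1 and A2 above) then differ in their coefficient values but not in which entries are zero. Note that this sketch does not enforce stability of the resulting AR models.</p>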
<p>Our method with <italic>r</italic>-th order AR (<italic>r</italic> &#x0003D; 1, 2, 3) was then applied to each subdataset <bold>Lm</bold> (<italic>m</italic> &#x0003D; 1, 2, 3). We carried out our proposed segmentation method for thirty runs, where each run had ten different chains of sampling. In each run, the UB labels obtained from one chain were used as the initial EB labels of the next chain, with the first chain having all time steps (across all sequences) assigned the same initial EB label.</p>
<p>The hyperparameter settings of BP-AR-HMM and NPYLM were based on the values used in Fox et al. (<xref ref-type="bibr" rid="B10">2014</xref>) (for BP-AR-HMM) and Neubig et al. (<xref ref-type="bibr" rid="B20">2010</xref>) (for NPYLM), respectively. The parameters of BP-AR-HMM were set as follows: the concentration parameter <italic>c</italic> &#x0003D; 3, the mass parameter &#x003B1; &#x0003D; 2, both with Gamma(1, 1) prior, for the beta process; the transition parameter &#x003B3; &#x0003D; 1, the transition sticky parameter &#x003BA; &#x0003D; 25, with Gamma(1, 1) and Gamma(100, 1) prior, respectively, for the transition matrix. The first 5,000 samples of the MCMC algorithm were discarded as burn-in, and the following 5,000 samples were used. The state sequences were summarized in each of the thirty runs, where states with associated time shorter than 1% of the total time were discarded. These were then forwarded to NPYLM. Settings for NPYLM were as follows: the discount parameter <italic>d</italic> &#x0003D; 0.5 with Beta(1.5, 1) prior; the concentration parameter &#x003B8; &#x0003D; 0.1 with Gamma(10, 0.1) prior. The first 5,000 samples of the blocked Gibbs sampling were discarded as burn-in, and the following 5,000 samples were used. Posterior inference for BP-AR-HMM was carried out using the code developed by Hughes (<xref ref-type="bibr" rid="B15">2016</xref>), while the code developed by Neubig (<xref ref-type="bibr" rid="B19">2016</xref>) was used to carry out posterior inference for NPYLM.</p>
<p>Finally, our method was also compared with DAA. For DAA, state sequences were also summarized in each of the thirty runs of each subdataset <bold>Lm</bold> (<italic>m</italic> &#x0003D; 1, 2, 3). The parameters of DAA were set so that they were comparable to those of our method. As sticky HDP-HMM is usually for single time series sequences, the time series were concatenated into one long time series before the first step of DAA was applied. The EB labels were then split back and summarized afterward, where states with associated time shorter than 1% of the total time were discarded. DAA was carried out using the codes recommended in <ext-link ext-link-type="uri" xlink:href="http://daa.tanichu.com/code">http://daa.tanichu.com/code</ext-link>.</p>
<sec>
<title>3.1.1. Large-Scale Toy Data</title>
<p>We generated three additional subdatasets, <bold>Ts</bold> (<italic>s</italic> &#x0003D; 10, 20, 100), to explore how our proposed method would fare in a large-scale simulation. The subdatasets were generated from switching AR(1) models, using the same parameter settings described earlier, but with <italic>s</italic> (<italic>s</italic> &#x0003D; 10, 20, 100) time series sequences instead. Our method with AR(1) was applied to each subdataset <bold>Ts</bold> (<italic>s</italic> &#x0003D; 10, 20, 100), using the same settings described above, but with ten, ten, and three runs for <bold>T10</bold>, <bold>T20</bold>, and <bold>T100</bold>, respectively, where each run still had ten different chains of sampling.</p>
</sec>
</sec>
<sec>
<title>3.2. Evaluation With Toy Data</title>
<p>The result of our method with AR(<italic>r</italic>) applied to the toy subdataset <bold>L</bold><italic>m</italic> was denoted by <bold>Lm-ARr</bold>, while the result of DAA was denoted by <bold>Lm-DAA</bold>. Our method segmented the time series with high accuracy (with respect to both EBs and UBs), and was more accurate than DAA. The top panel of <xref ref-type="fig" rid="F3">Figure 3</xref> shows an example visualization of the segmentation results. In the panel, the background color of the top row indicates the true EB at each time step; UBs are represented by sequential patterns consisting of sets of EBs. The second and third rows show the estimated EBs and UBs, respectively. Their boundaries show high consistency with the ground truth.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Plot of the first four dimensions of the input data and an example of segmentation results for both EB and UB steps. (Top) Toy data. (Bottom) CMU motion data.</p></caption>
<graphic xlink:href="fcomp-02-546917-g0003.tif"/>
</fig>
<p>Confusion matrices for the EB labels (left) and UB labels (right) of an example of segmentation results are also illustrated in <xref ref-type="fig" rid="F4">Figure 4</xref>. Here, the correspondence between true labels and estimated labels is represented with the values normalized per column, to allow one-to-many correspondence from a true label to estimated labels. Columns with entries close to 1 indicate that the corresponding estimated labels are assigned with high specificity, while rows with multiple entries close to 1 indicate that multiple estimated labels correspond to one true label.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Confusion matrices for an example of segmentation results for both EB and UB steps. Numbers normalized per column. (Top) Toy data. (Bottom) CMU motion data.</p></caption>
<graphic xlink:href="fcomp-02-546917-g0004.tif"/>
</fig>
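<p>The column normalization described above can be reproduced in a few lines; the label values below are invented for illustration. Each column of the matrix is divided by its sum, so an estimated label (column) concentrated in one row maps to a single true label, while one true label (row) may collect several columns.</p>

```python
import numpy as np

true_labels = np.array([0, 0, 1, 1, 2, 2, 2, 1])
est_labels  = np.array([5, 5, 3, 3, 7, 7, 9, 3])  # labels 7 and 9 both cover true label 2

t_ids, e_ids = np.unique(true_labels), np.unique(est_labels)
cm = np.zeros((t_ids.size, e_ids.size))
for tl, el in zip(true_labels, est_labels):
    cm[np.searchsorted(t_ids, tl), np.searchsorted(e_ids, el)] += 1
cm /= cm.sum(axis=0, keepdims=True)  # normalize per column (estimated label)
print(cm)
```

<p>In this toy case, two estimated labels (7 and 9) each map entirely to true label 2, the one-to-many correspondence the normalization is designed to expose.</p>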
<p>Next, we evaluated the effects of model mismatch on segmentation, using the resulting average normalized Hamming distances between the true EBs and the estimated EBs (EB HDist). Hamming distances were computed as the total number of time steps where the estimated label differed from the ground truth in a given sequence, summed over all time series sequences in each run. Normalization was then done by dividing the Hamming distance by the total number of time steps of the given sequences. The EB HDist were smallest when the true AR order was used. For example, the EB HDist of <bold>L1-AR1</bold> were smaller than those of <bold>L1-AR2</bold> and <bold>L1-AR3</bold>, as seen in <xref ref-type="fig" rid="F5">Figure 5</xref>. Similar results were observed in the cases of Sets <bold>L2</bold> and <bold>L3</bold>. Note that this tendency was also observed in their respective adjusted Rand indices (ARI) (right figures in <xref ref-type="fig" rid="F5">Figure 5</xref>) and joint log probabilities of the data and sampled variables (<xref ref-type="fig" rid="F6">Figure 6</xref>). The joint log probability <italic>P</italic>(<bold>y</bold>, <bold>F</bold>, <bold>z</bold>) is available even without the ground truth. Thus, the joint log probability can be a potential criterion for selecting the model with cross-validation.</p>
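<p>For concreteness, the normalized Hamming distance computation just described can be written as follows (a direct sketch with made-up label sequences; matching of label identities between truth and estimate is assumed to have been done beforehand).</p>

```python
import numpy as np

def normalized_hamming(true_seqs, est_seqs):
    """Total number of mismatched time steps across all sequences,
    divided by the total number of time steps."""
    mismatches = sum(int(np.sum(np.asarray(ts) != np.asarray(es)))
                     for ts, es in zip(true_seqs, est_seqs))
    total = sum(len(ts) for ts in true_seqs)
    return mismatches / total

true_seqs = [[1, 1, 2, 2], [3, 3, 3, 1]]
est_seqs  = [[1, 1, 2, 1], [3, 3, 1, 1]]
print(normalized_hamming(true_seqs, est_seqs))  # -> 0.25 (2 of 8 steps differ)
```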
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Evaluation of model accuracy. (Left) Average normalized Hamming distances. Bars: 1 SE. Lower value is better. (Right) Average adjusted Rand index. Bars: 1 SE. Higher value is better.</p></caption>
<graphic xlink:href="fcomp-02-546917-g0005.tif"/>
</fig>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Boxplots of the joint log probabilities <italic>P</italic>(<bold>y</bold>, <bold>F</bold>, <bold>z</bold>) of the EB step. Red line: median. Edges of blue box: first and third quartiles. Whiskers: most extreme data points except outliers.</p></caption>
<graphic xlink:href="fcomp-02-546917-g0006.tif"/>
</fig>
<p>Finally, when we compared the normalized HDist of our method (selected using the joint log probabilities) with that of DAA, our method generally had better results. <bold>L2-AR2</bold> and <bold>L3-AR3</bold>, which have the highest joint log probabilities among the models used, have smaller EB HDist and UB HDist than <bold>L2-DAA</bold> and <bold>L3-DAA</bold>, respectively. In the case of <bold>L1</bold>, <bold>L1-AR1</bold> shows a smaller EB HDist than <bold>L1-DAA</bold>, but a larger UB HDist. Even in this case, <bold>L1-AR2</bold> shows better performance than <bold>L1-DAA</bold>. In summary, these results indicate the superiority of our method over DAA (<xref ref-type="fig" rid="F5">Figure 5</xref>).</p>
<p>Aside from having better results than DAA, our method has other advantages. First, we note that the second step of our method reduces the error observed in the first step (left figures in <xref ref-type="fig" rid="F5">Figure 5</xref>). Except for the results of <bold>L1-AR1</bold>, the UB HDist are generally smaller than the EB HDist. Even when segmentation at the EB level is wrong, correct segmentation at the UB level is still possible, provided that the wrong pattern extraction of EBs is reproduced for segments of the same UBs. Second, the number of discovered EBs, the EBs themselves, and the resulting segmentation varied from run to run (<xref ref-type="fig" rid="F8">Figure 8</xref>). Despite this, the segmentation results at the UB step were largely similar, as seen in the computed UB HDist. This observation indicates that our method can identify the same UBs despite discovering different EBs. In other words, our method can absorb the differences in estimated EBs across runs.</p>
<sec>
<title>3.2.1. Large-Scale Toy Data</title>
<p>Results from the subdatasets <bold>Ts</bold> (<italic>s</italic> &#x0003D; 10, 20, 100) suggest that using our proposed method on larger datasets yields more discovered EB labels (13.90, 17.70, 21.33) and formed UB labels (21.00, 19.70, 39.33). This causes multiple discovered labels to correspond to the same &#x02018;true&#x02019; label. Adjusting for this when computing the EB and UB HDist, the computed EB HDist are 0.6112, 0.6759, and 0.7086 for <bold>T10</bold>, <bold>T20</bold>, and <bold>T100</bold>, respectively, while the corresponding UB HDist are 0.1096, 0.1633, and 0.1864. These results are consistent with our earlier observation that the second step of our method reduces the errors observed in the first step. That is, correct segmentation at the UB level is possible despite errors in EB-level segmentation, regardless of the number of time series sequences considered.</p>
</sec>
</sec>
</sec>
<sec id="s4">
<title>4. Motion Data Experiments</title>
<p>Aside from the synthetic experiments, motion data were also used in the segmentation experiments, to assess the applicability of the proposed method to actual motion sequences.</p>
<sec>
<title>4.1. Real Motion Data</title>
<p>To determine the effectiveness of our method with real motion data, one dataset was generated using the motion capture sequences of the actions of Subjects 18&#x02013;23 in CMU Graphics Lab&#x02014;Motion Capture Library (CMU, <xref ref-type="bibr" rid="B5">2009</xref>). The dataset has four time series sequences of 16 dimensions that correspond to 8 joint angles of 2 individuals. The time series <inline-formula><mml:math id="M30"><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>y</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> (<italic>i</italic> &#x0003D; 1, 2, 3, 4) were generated by concatenating UBs randomly chosen from six fixed types. The six UBs were the following actions: (1) walk toward each other then shake hands, (2) linked arms while walking, (3) synchronized walking, (4) alternating squats, (5) alternating jumping jacks, and (6) synchronized jumping jacks.</p>
<p>To evaluate the applicability of our method to real motion data, our method with AR order <italic>r</italic> (<italic>r</italic> &#x0003D; 1, 2, 3) was applied to the dataset. The parameters in our method were set in the same way as in section 3.1, but with &#x003BA; &#x0003D; 200. As with the toy data, our method was also compared with DAA on the CMU dataset. State sequences were processed similarly to the toy data, but with no states discarded, as the states switched frequently in the first step of DAA.</p>
</sec>
<sec>
<title>4.2. Real Motion Data Applicability</title>
<p>Similar to the experiments in the previous section, the result of our method with AR(<italic>r</italic>) applied to the CMU dataset was denoted by <bold>CMU-ARr</bold>, while <bold>CMU-DAA</bold> was used to denote the results from DAA. In terms of the average normalized UB HDist, <bold>CMU-AR1</bold> had the smallest error compared with <bold>CMU-AR2</bold> and <bold>CMU-AR3</bold> (<xref ref-type="fig" rid="F5">Figure 5</xref>). However, <bold>CMU-AR2</bold> and <bold>CMU-AR3</bold> had higher joint log probabilities than <bold>CMU-AR1</bold> (<xref ref-type="fig" rid="F5">Figure 5</xref>). The optimal AR order could therefore not be determined from the joint log probabilities in this case, and another criterion is needed to choose it. No such criterion is currently available, because our method is a highly complex singular model.</p>
<p>Compared with the results of DAA, our method again performed better. <bold>CMU-AR1</bold> had a smaller UB HDist (0.1815) than <bold>CMU-DAA</bold> (0.2080) (<xref ref-type="fig" rid="F5">Figure 5</xref>). Similarly, <bold>CMU-AR1</bold> had a higher UB ARI (0.6847) than <bold>CMU-DAA</bold> (0.6384) (<xref ref-type="fig" rid="F5">Figure 5</xref>). Unlike our proposed method, DAA yielded more UBs and more switches, due to oversegmentation (<xref ref-type="fig" rid="F7">Figure 7</xref>).</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Switching characteristics of estimated labels. (Left) Average number of discovered EBs and UBs for each experiment. Bars: 1 SD. (Right) Average number of switches in all time series sequences for each experiment. Bars: 1 SD.</p></caption>
<graphic xlink:href="fcomp-02-546917-g0007.tif"/>
</fig>
<p>Finally, as with the toy dataset, the number of discovered EBs and the EB labels varied from run to run. However, the segmentation results of the UB step were quite similar (see, for example, <xref ref-type="fig" rid="F8">Figure 8</xref>). Here, the UB [1 8 1] refers to the alternating jumping jacks motion. In another run, the same motion was segmented as the UB [D E], whose component EBs refer to completely different behaviors. Despite this difference in component EBs, both [1 8 1] and [D E] correspond to the same true UB. Our method can thus identify the same semantic behaviors even in real motion data.</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>Plot of the first four dimensions of the input data and example segmentation results using our proposed method, obtained from two different runs. (Top) Toy data, with result from <bold>L2-AR1</bold>. (Bottom) CMU motion data, with result from <bold>CMU-AR1</bold>.</p></caption>
<graphic xlink:href="fcomp-02-546917-g0008.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="discussion" id="s5">
<title>5. Discussion</title>
<p>To discover complex temporal patterns in time series data via segmentation, we proposed a hierarchical non-parametric Bayesian approach. We combined the BP-AR-HMM with double articulation by the NPYLM to segment time series sequences under the assumption that they are generated by hierarchically structured dynamical systems. In our experiments, our method discovered temporal patterns more accurately than DAA on both the toy and real motion datasets. This suggests that modeling local temporal patterns in time series data requires explicit dynamics. Moreover, the double articulation structure of our method, like that of DAA, is suitable for extracting semantic unit behaviors from unsegmented human motion sequences. Our proposed method also has another advantage over DAA: it allows asynchronous switching of segments. This should be beneficial when extracting temporal patterns from natural observations made without intervention, since consistent switching of behaviors cannot be expected in such settings. Despite these benefits, our proposed method is limited by its computational complexity. Furthermore, if the assumption that the sequences have a hierarchical structure is not met, our proposed method may not be appropriate.</p>
<p>Future directions are as follows: (1) using the estimated AR coefficients for interaction and causality analysis, (2) a semi-supervised extension of the proposed method, and (3) automatic determination of the AR order. Some causality analysis methods, e.g., Granger causality, are based on AR models in their mathematical formulations. The estimated AR coefficient matrices can therefore connect our method to causality analysis, making it possible to analyze switching causality. Next, it is usually difficult to obtain categorical labels for an entire dataset, but partial labels are easier to obtain. In this case, semi-supervised segmentation could improve the interpretability of the results, since some of the discovered components or states would correspond to the known categories. The labeled instances may also improve the estimation of the distributions of the corresponding categories. A semi-supervised extension of our approach would thus be more effective for discovering behavioral patterns. Finally, although we tried multiple AR-order settings to select a model, automatic determination of the AR order would solve this model selection problem.</p>
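The link between AR models and Granger causality in direction (1) can be sketched with a pairwise variance-ratio test on simulated data; the series, coefficients, and helper names below are illustrative assumptions, not the paper's method or a per-segment analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
T, R = 2000, 1
# Simulate x driving y with a one-step lag; no feedback from y to x.
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * y[t - 1] + 0.5 * x[t - 1] + rng.standard_normal()

def ar_residual_var(target, predictors, r):
    """Residual variance of an order-r least-squares AR fit of `target`
    on r lags of each series in `predictors` (plus an intercept)."""
    n = len(target)
    X = np.column_stack([p[r - k:n - k] for p in predictors
                         for k in range(1, r + 1)])
    X = np.column_stack([np.ones(n - r), X])
    coef, *_ = np.linalg.lstsq(X, target[r:], rcond=None)
    resid = target[r:] - X @ coef
    return np.mean(resid ** 2)

def granger(cause, effect, r=1):
    """Log variance ratio: how much the cause's history improves
    prediction of the effect beyond the effect's own history."""
    v_restricted = ar_residual_var(effect, [effect], r)
    v_full = ar_residual_var(effect, [effect, cause], r)
    return np.log(v_restricted / v_full)

gc_xy = granger(x, y, R)  # x drives y: clearly positive
gc_yx = granger(y, x, R)  # no feedback: near zero
```

In a switching setting, the same comparison would be applied per behavior mode using each mode's estimated AR coefficient matrix, which is what would make "switching causality" analyzable.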
<p>We conclude that our method can extract temporal patterns from multiple time series sequences by segmenting them into low-level and high-level segments. Our method showed performance superior to that of the double articulation analyzer. Moreover, even when it discovered different low-level segments across runs, our method absorbed this variation and identified high-level segments properly and consistently.</p>
</sec>
<sec sec-type="data-availability-statement" id="s6">
<title>Data Availability Statement</title>
<p>The datasets analyzed for this study can be found in the CMU Graphics Lab-Motion Capture Library (CMU, <xref ref-type="bibr" rid="B5">2009</xref>).</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>JB, TK, and KI contributed to the design and implementation of the research, to the analysis of the results, and to the writing of the manuscript. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="s8">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alazrai</surname> <given-names>R.</given-names></name> <name><surname>Mowafi</surname> <given-names>Y.</given-names></name> <name><surname>Lee</surname> <given-names>C. S. G.</given-names></name></person-group> (<year>2015</year>). <article-title>Anatomical-plane-based representation for human-human interactions analysis</article-title>. <source>Pattern Recogn</source>. <volume>48</volume>, <fpage>2346</fpage>&#x02013;<lpage>2363</lpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2015.03.002</pub-id><pub-id pub-id-type="pmid">25571343</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Barbi&#x0010D;</surname> <given-names>J.</given-names></name> <name><surname>Safonova</surname> <given-names>A.</given-names></name> <name><surname>Pan</surname> <given-names>J.-Y.</given-names></name> <name><surname>Faloutsos</surname> <given-names>C.</given-names></name> <name><surname>Hodgins</surname> <given-names>J.</given-names></name> <name><surname>Pollard</surname> <given-names>N.</given-names></name></person-group> (<year>2004</year>). <article-title>&#x0201C;Segmenting motion capture data into distinct behaviors,&#x0201D;</article-title> in <source>Proceedings of Graphics Interface</source> (<publisher-loc>London, ON</publisher-loc>), <fpage>185</fpage>&#x02013;<lpage>194</lpage>.</citation></ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Briones</surname> <given-names>J.</given-names></name> <name><surname>Kubo</surname> <given-names>T.</given-names></name> <name><surname>Ikeda</surname> <given-names>K.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;A segmentation approach for interaction analysis using non-parametric Bayesian methods,&#x0201D;</article-title> in <source>Proceedings of the 62nd Annual Conference of the Institute of Systems, Control and Information Engineers (ISCIE)</source> (<publisher-loc>Kyoto</publisher-loc>).</citation></ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bruno</surname> <given-names>B.</given-names></name> <name><surname>Mastrogiovanni</surname> <given-names>F.</given-names></name> <name><surname>Sgorbissa</surname> <given-names>A.</given-names></name> <name><surname>Vernazza</surname> <given-names>T.</given-names></name> <name><surname>Zaccaria</surname> <given-names>R.</given-names></name></person-group> (<year>2012</year>). <article-title>&#x0201C;Human motion modelling and recognition: a computational approach,&#x0201D;</article-title> in <source>2012 IEEE International Conference on Automation Science and Engineering (CASE)</source> (<publisher-loc>Seoul</publisher-loc>), <fpage>156</fpage>&#x02013;<lpage>161</lpage>. <pub-id pub-id-type="doi">10.1109/CoASE.2012.6386410</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="web"><person-group person-group-type="author"><collab>CMU</collab></person-group> (<year>2009</year>). <source>Carnegie Mellon University Graphics Lab&#x02013;Motion Capture Library</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://mocap.cs.cmu.edu/">http://mocap.cs.cmu.edu/</ext-link></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fine</surname> <given-names>S.</given-names></name> <name><surname>Singer</surname> <given-names>Y.</given-names></name> <name><surname>Tishby</surname> <given-names>N.</given-names></name></person-group> (<year>1998</year>). <article-title>The hierarchical hidden markov model: analysis and applications</article-title>. <source>Mach Learn</source>. <volume>32</volume>, <fpage>41</fpage>&#x02013;<lpage>62</lpage>. <pub-id pub-id-type="doi">10.1023/A:1007469218079</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Fox</surname> <given-names>E.</given-names></name> <name><surname>Sudderth</surname> <given-names>E.</given-names></name> <name><surname>Jordan</surname> <given-names>M.</given-names></name> <name><surname>Willsky</surname> <given-names>A.</given-names></name></person-group> (<year>2009</year>). <article-title>&#x0201C;Sharing features among dynamical systems with beta processes,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems, Vol. 22</source>, eds Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta (<publisher-loc>Vancouver, BC</publisher-loc>: <publisher-name>Curran Associates, Inc</publisher-name>), <fpage>549</fpage>&#x02013;<lpage>557</lpage>.</citation></ref>
<ref id="B8">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Fox</surname> <given-names>E.</given-names></name> <name><surname>Sudderth</surname> <given-names>E.</given-names></name> <name><surname>Jordan</surname> <given-names>M.</given-names></name> <name><surname>Willsky</surname> <given-names>A.</given-names></name></person-group> (<year>2008a</year>). <article-title>&#x0201C;An HDP-HMM for systems with state persistence,&#x0201D;</article-title> in <source>Proceedings of the 25th International Conference on Machine Learning</source> (<publisher-loc>Helsinki</publisher-loc>), <fpage>312</fpage>&#x02013;<lpage>319</lpage>. <pub-id pub-id-type="doi">10.1145/1390156.1390196</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Fox</surname> <given-names>E.</given-names></name> <name><surname>Sudderth</surname> <given-names>E.</given-names></name> <name><surname>Jordan</surname> <given-names>M.</given-names></name> <name><surname>Willsky</surname> <given-names>A.</given-names></name></person-group> (<year>2008b</year>). <article-title>&#x0201C;Nonparametric bayesian learning of switching linear dynamical systems,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems, Vol. 21</source>, eds D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou (<publisher-loc>Vancouver, BC</publisher-loc>: <publisher-name>Curran Associates, Inc</publisher-name>), <fpage>457</fpage>&#x02013;<lpage>464</lpage>.</citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fox</surname> <given-names>E.</given-names></name> <name><surname>Sudderth</surname> <given-names>E.</given-names></name> <name><surname>Jordan</surname> <given-names>M.</given-names></name> <name><surname>Willsky</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Joint modeling of multiple time series via the beta process with application to motion capture segmentation</article-title>. <source>Ann. Appl. Stat</source>. <volume>8</volume>, <fpage>1281</fpage>&#x02013;<lpage>1313</lpage>. <pub-id pub-id-type="doi">10.1214/14-AOAS742</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gilson</surname> <given-names>M.</given-names></name> <name><surname>Tauste Campo</surname> <given-names>A.</given-names></name> <name><surname>Thiele</surname> <given-names>A.</given-names></name> <name><surname>Deco</surname> <given-names>G.</given-names></name></person-group> (<year>2017</year>). <article-title>Nonparametric test for connectivity detection in multivariate autoregressive networks and application to multiunit activity data</article-title>. <source>Netw. Neurosci</source>. <volume>1</volume>, <fpage>357</fpage>&#x02013;<lpage>380</lpage>. <pub-id pub-id-type="doi">10.1162/NETN_a_00019</pub-id><pub-id pub-id-type="pmid">30090871</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Grigore</surname> <given-names>E. C.</given-names></name> <name><surname>Scassellati</surname> <given-names>B.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Discovering action primitive granularity from human motion for human-robot collaboration,&#x0201D;</article-title> in <source>Proceedings of Robotics: Science and Systems</source> (<publisher-loc>Cambridge, MA</publisher-loc>).</citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harrison</surname> <given-names>L.</given-names></name> <name><surname>Penny</surname> <given-names>W.</given-names></name> <name><surname>Friston</surname> <given-names>K.</given-names></name></person-group> (<year>2003</year>). <article-title>Multivariate autoregressive modeling of fMRI time series</article-title>. <source>Neuroimage</source> <volume>19</volume>, <fpage>1477</fpage>&#x02013;<lpage>1491</lpage>. <pub-id pub-id-type="doi">10.1016/S1053-8119(03)00160-5</pub-id><pub-id pub-id-type="pmid">12948704</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Heller</surname> <given-names>K.</given-names></name> <name><surname>Teh</surname> <given-names>Y.W.</given-names></name> <name><surname>Gorur</surname> <given-names>D.</given-names></name></person-group> (<year>2009</year>). <article-title>&#x0201C;Infinite hierarchical hidden Markov models,&#x0201D;</article-title> in <source>Artificial Intelligence and Statistics</source>, eds D. van Dyk and M. Welling (<publisher-loc>Clearwater, FL</publisher-loc>: <publisher-name>PMLR</publisher-name>), <fpage>224</fpage>&#x02013;<lpage>231</lpage>.</citation></ref>
<ref id="B15">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Hughes</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <source>NPBayesHMM: Nonparametric Bayesian HMM Toolbox, for Matlab</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://github.com/michaelchughes/NPBayesHMM">https://github.com/michaelchughes/NPBayesHMM</ext-link></citation></ref>
<ref id="B16">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hughes</surname> <given-names>M.</given-names></name> <name><surname>Fox</surname> <given-names>E.</given-names></name> <name><surname>Sudderth</surname> <given-names>E.</given-names></name></person-group> (<year>2012</year>). <article-title>&#x0201C;Effective split-merge Monte Carlo methods for nonparametric models of sequential data,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems, Vol. 25</source>, eds F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (<publisher-loc>Lake Tahoe, CA</publisher-loc>: <publisher-name>Curran Associates, Inc</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>9</lpage>.</citation></ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Keogh</surname> <given-names>E.</given-names></name> <name><surname>Chu</surname> <given-names>S.</given-names></name> <name><surname>Hart</surname> <given-names>D.</given-names></name> <name><surname>Pazzani</surname> <given-names>M.</given-names></name></person-group> (<year>2004</year>). <article-title>&#x0201C;Segmenting time series: a survey and novel approach,&#x0201D;</article-title> in <source>Data Mining in Time Series Databases</source>, eds M. Last, A. Kandel, and H. Bunke (<publisher-name>World Scientific</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>21</lpage>. <pub-id pub-id-type="doi">10.1142/9789812565402_0001</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mochihashi</surname> <given-names>D.</given-names></name> <name><surname>Yamada</surname> <given-names>T.</given-names></name> <name><surname>Ueda</surname> <given-names>N.</given-names></name></person-group> (<year>2009</year>). <article-title>&#x0201C;Bayesian unsupervised word segmentation with nested Pitman-Yor language modeling,&#x0201D;</article-title> in <source>Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1&#x02013;ACL-IJCNLP &#x00027;09</source> (<publisher-loc>Singapore</publisher-loc>), <volume>Vol. 1</volume>, <fpage>100</fpage>&#x02013;<lpage>108</lpage>. <pub-id pub-id-type="doi">10.3115/1687878.1687894</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Neubig</surname> <given-names>G.</given-names></name></person-group> (<year>2016</year>). <source>latticelm</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://github.com/neubig/latticelm">https://github.com/neubig/latticelm</ext-link></citation></ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Neubig</surname> <given-names>G.</given-names></name> <name><surname>Mimura</surname> <given-names>M.</given-names></name> <name><surname>Mori</surname> <given-names>S.</given-names></name> <name><surname>Kawahara</surname> <given-names>T.</given-names></name></person-group> (<year>2010</year>). <article-title>&#x0201C;Learning a language model from continuous speech,&#x0201D;</article-title> in <source>11th Annual Conference of the International Speech Communication Association (InterSpeech 2010)</source> (<publisher-loc>Makuhari</publisher-loc>), <fpage>1053</fpage>&#x02013;<lpage>1056</lpage>.</citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ryoo</surname> <given-names>M.</given-names></name> <name><surname>Aggarwal</surname> <given-names>J.</given-names></name></person-group> (<year>2009</year>). <article-title>Semantic representation and recognition of continued and recursive human activities</article-title>. <source>Int. J. Comput. Vis</source>. <volume>82</volume>, <fpage>1</fpage>&#x02013;<lpage>24</lpage>. <pub-id pub-id-type="doi">10.1007/s11263-008-0181-1</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Saeedi</surname> <given-names>A.</given-names></name> <name><surname>Hoffman</surname> <given-names>M.</given-names></name> <name><surname>Johnson</surname> <given-names>M.</given-names></name> <name><surname>Adams</surname> <given-names>R.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;The segmented iHMM: a simple, efficient hierarchical infinite HMM,&#x0201D;</article-title> in <source>International Conference on Machine Learning</source> (<publisher-loc>New York, NY</publisher-loc>), <fpage>2682</fpage>&#x02013;<lpage>2691</lpage>.</citation></ref>
<ref id="B23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Taniguchi</surname> <given-names>T.</given-names></name> <name><surname>Nagasaka</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <article-title>&#x0201C;Double articulation analyzer for unsegmented human motion using Pitman-Yor language model and infinite hidden Markov model,&#x0201D;</article-title> in <source>2011 IEEE/SICE International Symposium on System Integration, SII 2011</source> (<publisher-loc>Kyoto</publisher-loc>), <fpage>250</fpage>&#x02013;<lpage>255</lpage>. <pub-id pub-id-type="doi">10.1109/SII.2011.6147455</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taniguchi</surname> <given-names>T.</given-names></name> <name><surname>Nagasaka</surname> <given-names>S.</given-names></name> <name><surname>Nakashima</surname> <given-names>R.</given-names></name></person-group> (<year>2016</year>). <article-title>Nonparametric Bayesian double articulation analyzer for direct language acquisition from continuous speech signals</article-title>. <source>IEEE Trans. Cogn. Dev. Syst</source>. <volume>8</volume>, <fpage>171</fpage>&#x02013;<lpage>185</lpage>. <pub-id pub-id-type="doi">10.1109/TCDS.2016.2550591</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Viviani</surname> <given-names>P.</given-names></name> <name><surname>Cenzato</surname> <given-names>M.</given-names></name></person-group> (<year>1985</year>). <article-title>Segmentation and coupling in complex movements</article-title>. <source>J. Exp. Psychol. Hum. Percept. Perform</source>. <volume>11</volume>:<fpage>828</fpage>. <pub-id pub-id-type="doi">10.1037/0096-1523.11.6.828</pub-id><pub-id pub-id-type="pmid">2934511</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Williams</surname> <given-names>B.</given-names></name> <name><surname>Toussaint</surname> <given-names>M.</given-names></name> <name><surname>Storkey</surname> <given-names>A.</given-names></name></person-group> (<year>2007</year>). <article-title>&#x0201C;Modelling motion primitives and their timing in biologically executed movements,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems, Vol. 20</source>, eds J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis (<publisher-loc>Vancouver, BC</publisher-loc>: <publisher-name>Curran Associates, Inc</publisher-name>), <fpage>1609</fpage>&#x02013;<lpage>1616</lpage>.</citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zeger</surname> <given-names>S.</given-names></name> <name><surname>Irizarry</surname> <given-names>R.</given-names></name> <name><surname>Peng</surname> <given-names>R.</given-names></name></person-group> (<year>2006</year>). <article-title>On time series analysis of public health and biomedical data</article-title>. <source>Annu. Rev. Public Health</source> <volume>27</volume>, <fpage>57</fpage>&#x02013;<lpage>79</lpage>. <pub-id pub-id-type="doi">10.1146/annurev.publhealth.26.021304.144517</pub-id><pub-id pub-id-type="pmid">16533109</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>F.</given-names></name> <name><surname>Torre</surname> <given-names>F. D. L.</given-names></name> <name><surname>Hodgins</surname> <given-names>J. K.</given-names></name></person-group> (<year>2013</year>). <article-title>Hierarchical aligned cluster analysis for temporal clustering of human motion</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell</source>. <volume>35</volume>, <fpage>582</fpage>&#x02013;<lpage>596</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2012.137</pub-id><pub-id pub-id-type="pmid">22732658</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> This work was partially supported by JSPS KAKENHI 17H05863, 25118009, 15K16395, 17H05979, 16H06569, and 18K18108.</p></fn>
</fn-group>
</back>
</article>