<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Artif. Intell.</journal-id>
<journal-title>Frontiers in Artificial Intelligence</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Artif. Intell.</abbrev-journal-title>
<issn pub-type="epub">2624-8212</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/frai.2022.868085</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Artificial Intelligence</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Dichotomic Pattern Mining Integrated With Constraint Reasoning for Digital Behavior Analysis</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Ghosh</surname> <given-names>Sohom</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1661691/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Yadav</surname> <given-names>Shefali</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1877898/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Xin</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1770477/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Chakrabarty</surname> <given-names>Bibhash</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1877956/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Kad&#x00131;o&#x0011F;lu</surname> <given-names>Serdar</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1661652/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>AI Center of Excellence, Fidelity Investments</institution>, <addr-line>Boston, MA</addr-line>, <country>United States</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Computer Science, Brown University</institution>, <addr-line>Providence, RI</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Xiaomo Liu, J.P. Morgan AI Research, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Rasha Kashef, Ryerson University, Canada; Yanci Zhang, University of Pennsylvania, United States</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Serdar Kad&#x00131;o&#x0011F;lu <email>firstname.lastname&#x00040;fmr.com</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Artificial Intelligence in Finance, a section of the journal Frontiers in Artificial Intelligence</p></fn></author-notes>
<pub-date pub-type="epub">
<day>12</day>
<month>07</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>5</volume>
<elocation-id>868085</elocation-id>
<history>
<date date-type="received">
<day>02</day>
<month>02</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>14</day>
<month>06</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Ghosh, Yadav, Wang, Chakrabarty and Kad&#x00131;o&#x0011F;lu.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Ghosh, Yadav, Wang, Chakrabarty and Kad&#x00131;o&#x0011F;lu</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license></permissions>
<abstract>
<p>Sequential pattern mining remains a challenging task due to the large number of redundant candidate patterns and the exponential search space. In addition, further analysis is still required to map extracted patterns to different outcomes. In this paper, we introduce a pattern mining framework that operates on semi-structured datasets and exploits the dichotomy between outcomes. Our approach takes advantage of constraint reasoning to find sequential patterns that occur frequently and exhibit desired properties. This allows the creation of novel pattern embeddings that are useful for knowledge extraction and predictive modeling. Based on dichotomic pattern mining, we present two real-world applications for customer intent prediction and intrusion detection. Overall, our approach plays an integrator role between semi-structured sequential data and machine learning models, improves the performance of the downstream task, and retains interpretability.</p></abstract>
<kwd-group>
<kwd>dichotomic pattern mining</kwd>
<kwd>sequential pattern mining</kwd>
<kwd>semi-structured clickstream datasets</kwd>
<kwd>digital behavior analysis</kwd>
<kwd>intent prediction</kwd>
<kwd>intrusion detection</kwd>
<kwd>constraint-based sequential pattern mining</kwd>
<kwd>knowledge extraction and representation</kwd>
</kwd-group>
<counts>
<fig-count count="5"/>
<table-count count="6"/>
<equation-count count="0"/>
<ref-count count="41"/>
<page-count count="13"/>
<word-count count="9188"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Sequential Pattern Mining (SPM) is highly relevant in various practical applications including the analysis of medical treatment history (Bou Rjeily et al., <xref ref-type="bibr" rid="B12">2019</xref>), customer purchases (Requena et al., <xref ref-type="bibr" rid="B35">2020</xref>), segmentation (Kuruba Manjunath and Kashef, <xref ref-type="bibr" rid="B27">2021</xref>), call patterns, and digital clickstreams (Agrawal and Srikant, <xref ref-type="bibr" rid="B1">1995</xref>; Srikant and Agrawal, <xref ref-type="bibr" rid="B36">1996</xref>). A recent survey that covers SPM fundamentals and applications can be found in Gan et al. (<xref ref-type="bibr" rid="B18">2019</xref>). In SPM, we are given a set of sequences referred to as a <italic>sequence database</italic>. As shown in the example in <xref ref-type="table" rid="T1">Table 1</xref>, each sequence is an ordered set of <italic>items</italic>. Each item might be associated with a set of <italic>attributes</italic> that capture item properties, e.g., price or timestamp. A <italic>pattern</italic> is a subsequence that occurs in at least one sequence in the database while maintaining the original ordering of items. The number of sequences that contain a pattern defines its <italic>frequency</italic>. Given a sequence database, SPM aims to find all patterns whose frequency meets a given threshold.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Example sequence database with three sequences and two attributes, price and timestamp.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Sequence database &#x02329;(item, price, timestamp)&#x0232A;</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">&#x02329;(A, 5, 1), (A, 5, 1), (B, 3, 2), (A, 8, 3), (D, 2, 3)&#x0232A;</td>
</tr>
<tr>
<td valign="top" align="left">&#x02329;(C, 1, 3), (B, 3, 8), (A, 3, 9)&#x0232A;</td>
</tr>
<tr>
<td valign="top" align="left">&#x02329;(C, 4, 2), (A, 5, 5), (C, 2, 5), (D, 1, 7)&#x0232A;</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In practice, finding the entire set of frequent patterns in a sequence database is not the ultimate goal. The number of patterns is typically too large and may not provide significant insights. It is thus important to search for patterns that are not only frequent but also capture specific properties of the application at hand. This has motivated research in Constraint-based SPM (CSPM) (Pei et al., <xref ref-type="bibr" rid="B33">2007</xref>; Chen et al., <xref ref-type="bibr" rid="B13">2008</xref>). The goal of CSPM is to incorporate constraint reasoning into sequential pattern mining to find smaller subsets of interesting patterns.</p>
<p>As an example, let us consider online retail clickstream analysis. We might not be interested in all frequent browsing patterns. For instance, the pattern &#x02329;<italic>login, logout</italic>&#x0232A; is likely to be frequent but offers little value. Instead, we seek recurring clickstream patterns with unique properties, e.g., frequent patterns from sessions where users spend at least a minimum amount of time on a particular set of items with a specific price range. Such constraints help reduce the search space for the mining task and help discover patterns that are more effective in knowledge discovery than arbitrarily frequent clickstreams.</p>
<p>In this paper, our algorithmic focus is on embedding constraint-based sequential patterns in a framework that exploits the dichotomy of positive vs. negative outcomes in user cohorts. We refer to this framework as Dichotomic Pattern Mining (DPM) and present it in detail in Section 4. DPM offers several benefits over SPM and CSPM, as discussed in Section 4.1. Our practical focus is on industry-relevant applications of digital behavior analysis. Specifically, we consider two scenarios of predicting the intent of users. The first application is shopper vs. non-shopper identification in e-commerce. The second application is the detection of hostile vs. non-hostile users in intrusion detection for security applications. The experimental results given in Section 7 demonstrate that our DPM framework yields significant improvements on such prediction tasks when compared to traditional approaches such as SPM and CSPM as well as modern machine learning approaches such as LSTMs (Hochreiter and Schmidhuber, <xref ref-type="bibr" rid="B22">1997</xref>).</p>
<p>More broadly, the analysis of digital behavior is an integral part of designing online experiences that are geared toward user needs. Customer intent prediction and intrusion detection are particularly important knowledge discovery tasks. Successful applications enabled by DPM hold the potential to boost performance and security in various domains including recommendation systems in e-commerce, virtual agents in retail, and conversational AI across the enterprise. Our paper contributes to this line of research with the introduction of the DPM framework and applications on digital behavior analysis. Overall, our contributions in this paper can be summarized as follows:</p>
<list list-type="bullet">
<list-item><p>We introduce the Dichotomic Pattern Mining (DPM) framework that extends our recent work (Wang and Kadioglu, <xref ref-type="bibr" rid="B38">2022</xref>) with further details and experiments.</p></list-item>
<list-item><p>We show how to apply DPM in practice as an integration technology between raw data and machine learning models for downstream prediction tasks.</p></list-item>
<list-item><p>We demonstrate two successful applications of DPM for digital behavior analysis on shopping intent prediction and intrusion detection. We choose these two applications because (i) we seek real-world datasets for industrially relevant applications, and (ii) we seek different behavior characteristics: intrusion detection targets rare-event identification, while the e-commerce dataset is relatively more balanced in terms of shopping behavior.</p></list-item>
</list>
<p>In the remainder of the paper, we first give an overview of related topics and position our work within the literature (Section 2). We then illustrate a pattern mining example (Section 3) to make the ideas of SPM and CSPM more concrete. Next, we present the details of our dichotomic pattern mining framework and highlight its benefits (Section 4). We then share a system architecture (Section 5) that shows DPM as an integration technology. We focus on digital behavior analysis (Section 6) and demonstrate two real-world applications (Section 7) where DPM serves as an integrator between raw data and machine learning models in downstream tasks for customer intent prediction and intrusion detection. Finally, we conclude the paper.</p></sec>
<sec id="s2">
<title>2. Related Work</title>
<p>Pattern mining benefits from a wealth of literature and enjoys several practical applications. Historically, sequential pattern mining was introduced in the context of market basket analysis (Agrawal and Srikant, <xref ref-type="bibr" rid="B1">1995</xref>) with several algorithms such as GSP (Srikant and Agrawal, <xref ref-type="bibr" rid="B36">1996</xref>), PrefixSpan (Pei et al., <xref ref-type="bibr" rid="B32">2001</xref>), SPADE (Zaki, <xref ref-type="bibr" rid="B41">2001</xref>), and SPAM (Ayres et al., <xref ref-type="bibr" rid="B5">2002</xref>). Mining the complete set of patterns imposes high computational costs and yields a large number of redundant patterns. CSPM was thus proposed to alleviate this problem (Bonchi and Lucchese, <xref ref-type="bibr" rid="B10">2005</xref>; Nijssen and Zimmermann, <xref ref-type="bibr" rid="B30">2014</xref>; Aoga et al., <xref ref-type="bibr" rid="B3">2017</xref>). Constraint Programming and graphical representations of the sequence database have been shown to perform well for CSPM (Guns et al., <xref ref-type="bibr" rid="B21">2017</xref>; Kemmar et al., <xref ref-type="bibr" rid="B26">2017</xref>; Borah and Nath, <xref ref-type="bibr" rid="B11">2018</xref>; Hosseininasab et al., <xref ref-type="bibr" rid="B23">2019</xref>).</p>
<p>Our proposal for Dichotomic Pattern Mining (DPM) is related to the supervised descriptive rule discovery (SDRD) framework (Novak et al., <xref ref-type="bibr" rid="B31">2009</xref>). Specific mining tasks in SDRD include Emerging Pattern Mining (EPM), subgroup discovery, and contrast set mining. EPM is a data mining task aimed at the detection of differentiating characteristics between classes (Garc&#x000ED;a-Vico et al., <xref ref-type="bibr" rid="B20">2017</xref>; Pellegrina et al., <xref ref-type="bibr" rid="B34">2019</xref>), where discriminative patterns whose support increases significantly from one class to another are identified. Subgroup discovery identifies subsets of a dataset according to an interesting behavior with respect to certain criteria applied to a property (Atzmueller, <xref ref-type="bibr" rid="B4">2015</xref>). Another closely related approach, contrast set mining, seeks patterns whose support differs substantially across data groups (Bay and Pazzani, <xref ref-type="bibr" rid="B6">2001</xref>). On one hand, DPM can be seen as a special case of these existing approaches, focused on the dichotomy of positive and negative outcomes. On the other hand, DPM offers several unique aspects that contribute to this line of research. First, DPM employs <italic>sequential</italic> pattern mining coupled with constraint reasoning, building on our recent work (Wang and Kadioglu, <xref ref-type="bibr" rid="B38">2022</xref>). This combination has not been considered before. Second, we envision DPM as an integration technology and provide an end-to-end system architecture that combines sequence-to-pattern generation with pattern-to-feature generation. Finally, we focus on the analysis of digital behavior and utilize our DPM framework for two real-world applications.</p>
<p>Regarding pattern mining tools for practical use, the Python ecosystem lacks readily available libraries. Although a few Python libraries exist for SPM (see, e.g., Dagenais, <xref ref-type="bibr" rid="B15">2016</xref>; Gao, <xref ref-type="bibr" rid="B19">2019</xref>), <monospace>Seq2Pat</monospace> is the first CSPM library in Python that supports several anti-monotone and non-monotone constraint types. Unfortunately, other CSPM implementations are either not available in Python, hence missing the opportunity to integrate with ML applications, or limited to a few constraint types, most commonly gap, maximum span, and regular expressions (Yu and Hayato, <xref ref-type="bibr" rid="B40">2006</xref>; Aoga et al., <xref ref-type="bibr" rid="B2">2016</xref>; Fournier-Viger et al., <xref ref-type="bibr" rid="B16">2016</xref>; Bermingham, <xref ref-type="bibr" rid="B9">2018</xref>). Powered by <monospace>Seq2Pat</monospace>, DPM offers a complete end-to-end solution.</p>
</sec>
<sec id="s3">
<title>3. Illustrative Example</title>
<p>Let us present the running example from <xref ref-type="table" rid="T1">Table 1</xref> to make the idea behind pattern mining more concrete. We first examine the patterns found by <italic>sequential</italic> mining and then extend it to <italic>constraint-based</italic> mining.</p>
<sec>
<title>3.1. Sequential Pattern Mining</title>
<p>In this example, the database is represented as a list of sequences, each with a list of items. There are three sequences over the item set {<italic>A, B, C, D</italic>}. Assume we are searching for patterns that occur in at least two sequences. This is typically referred to as the minimum frequency threshold. When we do not enforce any constraints, we discover three patterns {[<italic>A, D</italic>], [<italic>B, A</italic>], [<italic>C, A</italic>]} that meet this threshold. Notice that each pattern occurs in exactly two sequences, satisfying the minimum frequency. More specifically:</p>
<list list-type="bullet">
<list-item><p>The pattern [<italic>A, D</italic>] is a subsequence of the first and the third sequence.</p></list-item>
<list-item><p>The pattern [<italic>B, A</italic>] is a subsequence of the first and the second sequence.</p></list-item>
<list-item><p>The pattern [<italic>C, A</italic>] is a subsequence of the second and the third sequence.</p></list-item>
</list>
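<p>To make the unconstrained mining step concrete, the following is a minimal, brute-force sketch in plain Python (the actual mining in this work is powered by <monospace>Seq2Pat</monospace>; the function names and the length-2 candidate cap here are illustrative assumptions, not part of that library):</p>

```python
from itertools import combinations

# Sequence database from Table 1 (items only, attributes dropped).
database = [
    ["A", "A", "B", "A", "D"],
    ["C", "B", "A"],
    ["C", "A", "C", "D"],
]

def contains(sequence, pattern):
    # Subsequence test: pattern items appear in order, not necessarily adjacent.
    it = iter(sequence)
    return all(item in it for item in pattern)

def frequent_patterns(database, min_frequency, max_len=2):
    # Candidates are ordered item tuples (length 2..max_len) drawn from each sequence.
    candidates = set()
    for seq in database:
        for n in range(2, max_len + 1):
            candidates.update(combinations(seq, n))
    # Keep candidates supported by at least min_frequency sequences.
    return sorted(p for p in candidates
                  if sum(contains(s, p) for s in database) >= min_frequency)

print(frequent_patterns(database, min_frequency=2))
# → [('A', 'D'), ('B', 'A'), ('C', 'A')]
```

<p>Brute-force enumeration is exponential in general and only serves to illustrate the semantics; dedicated miners such as PrefixSpan or the MDD-based approach discussed later scale far better.</p>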
</sec>
<sec>
<title>3.2. Constraint-Based Sequential Pattern Mining</title>
<p>Next, we extend the example with more data to introduce various constraints that enforce desired properties on the resulting patterns. As shown in <xref ref-type="table" rid="T1">Table 1</xref>, we incorporate two attributes: price and timestamp. Conceptually, the idea is to capture frequent patterns in the database from users who have spent at least a minimum amount of time on certain items within specific price ranges. For instance, we can introduce a constraint that restricts the average price of items in a pattern to the range [3, 4]. The task now becomes constraint-based sequential pattern mining. Initially, we found three patterns {[<italic>A, D</italic>], [<italic>B, A</italic>], [<italic>C, A</italic>]} with a frequency threshold of two. When we introduce the average price constraint, [<italic>A, D</italic>] is the only remaining pattern, with the same support as before. The other patterns do not satisfy the condition.</p>
<p>Let us examine how constraint satisfaction works for the first pattern. The important observation is that the first sequence in the database exhibits three different subsequences of [<italic>A, D</italic>], and these subsequences have different price averages. In the first sequence, the first and the second occurrence of [<italic>A, D</italic>] have <italic>price</italic>_<italic>average</italic>([5, 2]) &#x0003D; 3.5, while the third occurrence has <italic>price</italic>_<italic>average</italic>([8, 2]) &#x0003D; 5. The first two subsequences are feasible with respect to the price constraint, while the third subsequence is infeasible. One satisfying subsequence suffices for constraint feasibility; hence, the first sequence supports the pattern [<italic>A, D</italic>]. Similarly, the last sequence in the database satisfies the constraint with <italic>price</italic>_<italic>average</italic>([5, 1]) &#x0003D; 3 for the [<italic>A, D</italic>] pattern. With two sequences supporting the pattern under the price constraint, the minimum frequency condition is met. The other patterns do not satisfy both the constraint and the frequency condition.</p>
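<p>The same feasibility check can be sketched in plain Python: a sequence supports a pattern under the average-price constraint as soon as one of its matching subsequences has an average price inside the range (the helper names below are illustrative, not part of <monospace>Seq2Pat</monospace>):</p>

```python
# Items paired with prices, as in Table 1 (timestamps omitted for brevity).
database = [
    [("A", 5), ("A", 5), ("B", 3), ("A", 8), ("D", 2)],
    [("C", 1), ("B", 3), ("A", 3)],
    [("C", 4), ("A", 5), ("C", 2), ("D", 1)],
]

def occurrences(sequence, pattern):
    # Yield the price list of every subsequence matching the pattern in order.
    def rec(start, idx, prices):
        if idx == len(pattern):
            yield prices
            return
        for i in range(start, len(sequence)):
            if sequence[i][0] == pattern[idx]:
                yield from rec(i + 1, idx + 1, prices + [sequence[i][1]])
    yield from rec(0, 0, [])

def supports(sequence, pattern, lo, hi):
    # One occurrence with average price in [lo, hi] suffices for support.
    return any(lo <= sum(p) / len(p) <= hi for p in occurrences(sequence, pattern))

support = sum(supports(s, ["A", "D"], 3, 4) for s in database)
print(support)  # → 2: the first and third sequences support [A, D]
```

<p>Running the same check for [<italic>B, A</italic>] or [<italic>C, A</italic>] yields a support of one, below the minimum frequency of two, which is why [<italic>A, D</italic>] is the only surviving pattern.</p>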
<p>The technology behind our approach for constraint-based sequential pattern mining is based on Multi-valued Decision Diagrams (MDDs) (Bergman et al., <xref ref-type="bibr" rid="B8">2016</xref>). MDDs are widely used as efficient data structures (Wegener, <xref ref-type="bibr" rid="B39">2000</xref>) and for discrete optimization (Bergman et al., <xref ref-type="bibr" rid="B8">2016</xref>). More recently, MDDs were utilized for CSPM (Hosseininasab et al., <xref ref-type="bibr" rid="B23">2019</xref>) to encode the sequences and associated attributes of the sequence databases. The MDD approach accommodates multiple item attributes, such as price and timestamp in our running example, and various constraint types, such as the average constraint among others. The approach is shown to be competitive with or superior to existing CSPM algorithms in terms of scalability and efficiency (Hosseininasab et al., <xref ref-type="bibr" rid="B23">2019</xref>). We recently released the <monospace>Seq2Pat</monospace><xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> library to make this efficient algorithm accessible to a broad audience with a user-friendly interface (Wang et al., <xref ref-type="bibr" rid="B37">2022</xref>).</p>
<p>The application of <monospace>Seq2Pat</monospace> to address CSPM is an integral part of our work. In the next section, we present the details of how to integrate CSPM, <italic>via</italic> <monospace>Seq2Pat</monospace>, into the dichotomic pattern mining framework.</p>
</sec>
</sec>
<sec id="s4">
<title>4. Dichotomic Pattern Mining</title>
<p>We now describe dichotomic pattern mining (DPM<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref>), which operates over sequence databases augmented with binary labels denoting positive and negative outcomes. Our experiments in Section 7 consider customer intent prediction and user intrusion behavior as the outcome variables.</p>
<p><xref ref-type="table" rid="T6">Algorithm 1</xref> presents our generic approach for dichotomic pattern mining that encapsulates constraint-based sequential pattern mining as a sub-procedure. The algorithm receives a sequence database, <inline-formula><mml:math id="M1"><mml:mi mathvariant="-tex-caligraphic">SD</mml:mi></mml:math></inline-formula>, containing <italic>N</italic> sequences {<italic>S</italic><sub>1</sub>, <italic>S</italic><sub>2</sub>, &#x02026;, <italic>S</italic><sub><italic>N</italic></sub>}. Each sequence represents a customer&#x00027;s behaviors in time order, for example, the digital clicks in one session. Sequences are associated with binary labels, <inline-formula><mml:math id="M2"><mml:mi mathvariant="-tex-caligraphic">Y</mml:mi></mml:math></inline-formula>, indicating the outcome of the <italic>i</italic>-th sequence, <inline-formula><mml:math id="M3"><mml:mrow><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x000A0;&#x02208;&#x000A0;</mml:mo><mml:mi mathvariant="-tex-caligraphic">Y</mml:mi></mml:mrow></mml:math></inline-formula> where <italic>i</italic> &#x02208; {1, 2, &#x022EF;&#x000A0;, <italic>N</italic>}, to be positive (&#x022A4;) or negative (&#x022A5;), e.g., purchase or non-purchase.</p>
<p>As in our example in <xref ref-type="table" rid="T1">Table 1</xref>, the items in each sequence are associated with a set of attributes &#x1D538; &#x0003D; {<inline-formula><mml:math id="M4"><mml:mi mathvariant="-tex-caligraphic">A</mml:mi></mml:math></inline-formula>, &#x02026;, <inline-formula><mml:math id="M5"><mml:mi mathvariant="-tex-caligraphic">A</mml:mi></mml:math></inline-formula><sub>|&#x1D538;|</sub>}. A set of constraint functions <italic>C</italic><sub><italic>type</italic></sub>(&#x000B7;) can be imposed on the attributes, each applying a certain type of operation. For example, <inline-formula><mml:math id="M6"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi>a</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi mathvariant="-tex-caligraphic">A</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></inline-formula>&#x02265;20 requires a pattern to have a minimum average price of 20. In addition, a minimum threshold &#x003B8; serves as the frequency lower bound. Given two sets <italic>A</italic> and <italic>B</italic>, we let <italic>A</italic>\<italic>B</italic> denote the elements of <italic>A</italic> that are not in <italic>B</italic>, and <italic>A</italic>&#x02229;<italic>B</italic> denote the intersection of the two sets.</p>
<table-wrap position="float" id="T6">
<label>Algorithm 1</label>
<caption><p>Dichotomic pattern mining.</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><monospace><bold>Input:</bold> <italic>Sequence database</italic> <inline-formula><mml:math id="M7"><mml:mi mathvariant="-tex-caligraphic">SD</mml:mi></mml:math></inline-formula> &#x0003D; {<italic>S</italic><sub>1</sub>, <italic>S</italic><sub>2</sub>, &#x02026;, <italic>S</italic><sub><italic>N</italic></sub>}</monospace></td></tr>
<tr><td align="left" valign="top"><monospace><bold>Input:</bold> <italic>Binary label for sequences</italic> <inline-formula><mml:math id="M8"><mml:mi mathvariant="-tex-caligraphic">Y</mml:mi></mml:math></inline-formula>, <italic>Y</italic><sub><italic>i</italic></sub>&#x000A0;&#x02208;&#x000A0;<inline-formula><mml:math id="M9"><mml:mi mathvariant="-tex-caligraphic">Y</mml:mi></mml:math></inline-formula>, <italic>Y</italic><sub><italic>i</italic></sub> &#x0003D; {&#x022A4;, &#x022A5;} and <italic>i</italic>&#x02208;{1, 2, &#x022EF;&#x000A0;, <italic>N</italic>}</monospace></td></tr>
<tr><td align="left" valign="top"><monospace><bold>Input:</bold> <italic>Minimum frequency threshold for positive and negative sets</italic>, &#x003B8;<sub>&#x022A4;</sub> and &#x003B8;<sub>&#x022A5;</sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace><bold>Input:</bold> <italic>Pattern constraints for positive and negative sets</italic>, <inline-formula><mml:math id="M10"><mml:msubsup><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>y</mml:mi><mml:mi>p</mml:mi><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022A4;</mml:mo></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x000B7;</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M11"><mml:msubsup><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>y</mml:mi><mml:mi>p</mml:mi><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022A5;</mml:mo></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x000B7;</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></monospace></td></tr>
<tr><td align="left" valign="top"><monospace><bold>Output:</bold> <italic>Frequent pattern sets</italic></monospace> <inline-formula><mml:math id="M12"><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:math></inline-formula></td></tr>
<tr><td align="left" valign="top"><monospace><bold>Step 1</bold>. Dichotomic split over the dataset</monospace></td></tr>
<tr><td align="left" valign="top"><monospace><italic>Pos</italic>&#x02190;{<italic>SD</italic><sub><italic>i</italic></sub>&#x02223;<italic>Y</italic><sub><italic>i</italic></sub> &#x0003D; &#x022A4;}</monospace></td></tr>
<tr><td align="left" valign="top"><monospace><italic>Neg</italic>&#x02190;{<italic>SD</italic><sub><italic>i</italic></sub>&#x02223;<italic>Y</italic><sub><italic>i</italic></sub> &#x0003D; &#x022A5;}</monospace></td></tr>
<tr><td align="left" valign="top"><monospace><bold>Step 2</bold>. Apply constraint-based frequent pattern mining</monospace></td></tr>
<tr><td align="left" valign="top"><monospace><inline-formula><mml:math id="M13"><mml:mi>P</mml:mi><mml:mi>o</mml:mi><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>f</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>q</mml:mi><mml:mi>u</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:mi>C</mml:mi><mml:mi>S</mml:mi><mml:mi>P</mml:mi><mml:mi>M</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>y</mml:mi><mml:mi>p</mml:mi><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022A4;</mml:mo></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x000B7;</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022A4;</mml:mo></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></monospace></td></tr>
<tr><td align="left" valign="top"><monospace><inline-formula><mml:math id="M14"><mml:mi>N</mml:mi><mml:mi>e</mml:mi><mml:msub><mml:mrow><mml:mi>g</mml:mi></mml:mrow><mml:mrow><mml:mi>f</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>q</mml:mi><mml:mi>u</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:mi>C</mml:mi><mml:mi>S</mml:mi><mml:mi>P</mml:mi><mml:mi>M</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>N</mml:mi><mml:mi>e</mml:mi><mml:mi>g</mml:mi><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>y</mml:mi><mml:mi>p</mml:mi><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022A5;</mml:mo></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x000B7;</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022A5;</mml:mo></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></monospace></td></tr>
<tr><td align="left" valign="top"><monospace><bold>Step 3</bold>. Find unique patterns and their interaction</monospace></td></tr>
<tr><td align="left" valign="top"><monospace><italic>Pos</italic><sub><italic>unique</italic></sub>&#x02190;<italic>Pos</italic><sub><italic>frequent</italic></sub>\<italic>Neg</italic><sub><italic>frequent</italic></sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace><italic>Neg</italic><sub><italic>unique</italic></sub>&#x02190;<italic>Neg</italic><sub><italic>frequent</italic></sub>\<italic>Pos</italic><sub><italic>frequent</italic></sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace><italic>PN</italic><sub><italic>common</italic></sub>&#x02190;<italic>Pos</italic><sub><italic>frequent</italic></sub>&#x02229;<italic>Neg</italic><sub><italic>frequent</italic></sub></monospace></td></tr>
<tr><td align="left" valign="top"><monospace><bold>Step 4</bold>. Patterns ready for pattern-to-feature generation</monospace></td></tr>
<tr><td align="left" valign="top"><monospace><inline-formula><mml:math id="M15"><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:math></inline-formula>&#x000A0;&#x02190;&#x000A0;{<italic>Pos</italic><sub><italic>unique</italic></sub>, <italic>Neg</italic><sub><italic>unique</italic></sub>, <italic>PN</italic><sub><italic>common</italic></sub>}</monospace></td></tr>
<tr><td align="left" valign="top"><monospace><bold>Step 5</bold>. Return frequent patterns for downstream tasks</monospace></td></tr>
<tr><td align="left" valign="top"><monospace>return</monospace> <inline-formula><mml:math id="M16"><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:math></inline-formula></td></tr> 
</tbody>
</table>
</table-wrap>
 <p>Conceptually, our algorithm is straightforward and exploits the dichotomy of outcomes. In Step 1, we split the sequences into positive, <italic>Pos</italic>, and negative, <italic>Neg</italic>, sets. In Step 2, we apply CSPM to each group separately, subject to a minimum frequency, &#x003B8;<sub>&#x022A4;</sub> or &#x003B8;<sub>&#x022A5;</sub>, while satisfying the constraints, <inline-formula><mml:math id="M17"><mml:msubsup><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>y</mml:mi><mml:mi>p</mml:mi><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022A4;</mml:mo></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x000B7;</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> or <inline-formula><mml:math id="M18"><mml:msubsup><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>y</mml:mi><mml:mi>p</mml:mi><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x022A5;</mml:mo></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>&#x000B7;</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>. Notice that the frequent patterns found in the two groups might overlap. Therefore, we perform a set difference operation in each direction. This allows us to distinguish between recurring patterns that <italic>uniquely</italic> identify the positive and negative populations. The output <inline-formula><mml:math id="M19"><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:math></inline-formula> of the DPM algorithm comprises three sets of patterns: the frequent patterns that are unique to positive observations, <italic>Pos</italic><sub><italic>unique</italic></sub>; the frequent patterns that are unique to negative observations, <italic>Neg</italic><sub><italic>unique</italic></sub>; and the frequent patterns that are common to both cases, <italic>PN</italic><sub><italic>common</italic></sub>. As outlined in Section 5, downstream ML models can leverage these patterns as input.</p>
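<p>The steps above can be sketched in plain Python. The following is a minimal illustration, not the actual implementation: <monospace>mine</monospace> is a hypothetical stand-in for the constrained miner CSPM(&#x000B7;), and the toy version here simply keeps items that occur in at least half of a group's sequences.</p>

```python
from collections import Counter

def dichotomic_pattern_mining(sequences, labels, mine):
    """Steps 1-5 of the DPM algorithm with an abstract miner `mine`
    standing in for the constrained CSPM call on each group."""
    # Step 1: split sequences by their dichotomic outcome.
    pos = [s for s, y in zip(sequences, labels) if y == 1]
    neg = [s for s, y in zip(sequences, labels) if y == 0]
    # Step 2: mine each group separately; frequency thresholds and
    # constraints are encapsulated in `mine`.
    pos_frequent, neg_frequent = mine(pos), mine(neg)
    # Step 3: set differences in each direction, plus the intersection.
    pos_unique = pos_frequent - neg_frequent
    neg_unique = neg_frequent - pos_frequent
    pn_common = pos_frequent & neg_frequent
    # Steps 4-5: return the three pattern sets for feature generation.
    return pos_unique, neg_unique, pn_common

def mine(group):
    """Toy miner: items appearing in at least half of the group's sequences."""
    counts = Counter(item for seq in group for item in set(seq))
    theta = max(1, len(group) // 2)
    return {item for item, c in counts.items() if c >= theta}
```

<p>The set operations of Steps 3 and 4 are independent of the mining back-end, which is what allows any constrained miner to be plugged in.</p>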
<sec>
<title>4.1. Benefits of Dichotomic Pattern Mining</title>
<p>The traditional approach to identifying recurring patterns consists of applying (C)SPM to the entire sequence database, including both positive and negative sequences. <xref ref-type="fig" rid="F1">Figure 1</xref> compares the traditional CSPM approach with our proposed DPM.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Traditional CSPM that operates on the entire sequence database vs. our DPM approach that exploits the dichotomy of outcomes to distinguish unique positive and negative patterns.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-05-868085-g0001.tif"/>
</fig>
<p>The traditional CSPM serves as a baseline approach, while DPM offers several benefits when compared to its traditional counterpart. First of all, in many practical scenarios, the classes of sequences are highly imbalanced. As such, setting a single static minimum frequency threshold for pattern mining on the entire sequence database is misleading. More concretely, setting a higher frequency threshold leads to the exclusion of useful recurring patterns in the minority class, whereas a lower threshold keeps too many redundant and arbitrary patterns from the majority class. In both cases, the quality of patterns is negatively impacted.</p>
<p>Moreover, DPM is an expressive approach to pattern mining in which the constraint model for each class label can be configured specifically. Thus, it seeks patterns that more distinctly characterize each outcome. Finally, from a scalability perspective, the traditional approach must run on the entire dataset as a single batch. In contrast, decomposing the dataset eases each mining task. In fact, the dichotomic CSPMs can be executed in parallel. The computational complexity of DPM therefore depends on the CSPM algorithm applied to the sequence database, which in turn depends on the size of the database, the number of constraints, and the number of attributes (Pei et al., <xref ref-type="bibr" rid="B33">2007</xref>; Hosseininasab et al., <xref ref-type="bibr" rid="B23">2019</xref>). Overall, we expect DPM to be more efficient in pattern mining (since it is applied to smaller datasets thanks to the dichotomy) and DPM patterns to be more effective in downstream consumption (since they capture the dichotomy uniquely).</p>
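<p>Since the positive and negative mining tasks share no state, they can indeed run concurrently. The sketch below uses only the standard library and a hypothetical <monospace>mine_group</monospace> function standing in for a constrained miner; a real workload would submit the two CSPM solver calls instead.</p>

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def mine_group(group, theta):
    """Hypothetical stand-in for a constrained miner: returns the items
    that appear in at least `theta` of the group's sequences."""
    counts = Counter(item for seq in group for item in set(seq))
    return {item for item, c in counts.items() if c >= theta}

def mine_dichotomy_in_parallel(pos, neg, theta_pos, theta_neg):
    """Run the two independent mining tasks concurrently, each with
    its own minimum frequency threshold."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        f_pos = pool.submit(mine_group, pos, theta_pos)
        f_neg = pool.submit(mine_group, neg, theta_neg)
        return f_pos.result(), f_neg.result()
```

<p>Per-group thresholds (&#x003B8;<sub>&#x022A4;</sub>, &#x003B8;<sub>&#x022A5;</sub>) fall out naturally from the decomposition, which is precisely the flexibility a single static threshold lacks.</p>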
</sec>
</sec>
<sec id="s5">
<title>5. DPM as an Integration Technology</title>
<p>For sequential pattern mining with constraints, our tool <monospace>Seq2Pat</monospace> is readily available for mining applications that deal with data encoded as sequences of symbols. For continuous sequences, such as time series, discretization can be performed (Lin et al., <xref ref-type="bibr" rid="B28">2007</xref>; Fournier-Viger et al., <xref ref-type="bibr" rid="B17">2017</xref>). Previous work presents successful applications using streaming data from MSNBC, an e-commerce website, and online news. These are considerably large benchmarks with 900 K sequences of length more than 29 K, containing up to 40 K items (Hosseininasab et al., <xref ref-type="bibr" rid="B23">2019</xref>).</p>
<p>As illustrated in <xref ref-type="fig" rid="F2">Figure 2</xref>, going beyond pattern mining, we envision DPM as an <italic>integration technology</italic> to enable other AI applications. DPM can be used to capture data characteristics for downstream AI models. DPM generates succinct representations from large volumes of data, e.g., digital clickstream activity. The patterns found then become consumable for subsequent machine learning models and pattern analysis. This generic process alleviates manual feature engineering and automates feature generation. Thus, our algorithm serves as an integration block between pattern mining algorithms and the subsequent learning task. In the next section, we present a demonstration of this integration for digital behavior analysis.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>High-level system architecture for dichotomic pattern mining, embedded with sequence-to-pattern generation, as an integration technology between raw sequential data, e.g., clickstream, pattern analysis, pattern-to-feature generation, and machine learning models for downstream prediction tasks.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-05-868085-g0002.tif"/>
</fig>
<p>Note that the output of DPM as shown in <xref ref-type="table" rid="T6">Algorithm 1</xref> is a set of frequent patterns, <inline-formula><mml:math id="M20"><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:math></inline-formula>, that provides insights into how sequential behavior varies between populations. Using <inline-formula><mml:math id="M21"><mml:mi mathvariant="-tex-caligraphic">P</mml:mi></mml:math></inline-formula>, we learn new representations for sequences using sequence-to-pattern generation. Next, we use pattern-to-feature generation to encode each sequence as a feature vector. A typical featurization approach is one-hot encoding to indicate the existence of patterns. We further discuss these pattern and feature generation components in our computational experiments. Overall, our end-to-end system architecture yields an automated feature extraction process that is generic and independent of the machine learning models applied to pattern embeddings. In the next section, we cover the case of digital behavior analysis, whereby DPM acts as the integrator in real-world scenarios.</p>
</sec>
<sec id="s6">
<title>6. Digital Behavior Analysis</title>
<p>The application focus is on leveraging DPM to analyze digital behavior, and in particular, the analysis of large volumes of digital clickstream data. Clickstream data is ubiquitous, possesses unique properties such as real-time availability and sequential ordering, and is challenging to work with as a streaming data source. This type of dataset is classified as <italic>semi-structured data</italic>. On the one hand, clickstream data provides <italic>unstructured text</italic>, such as web pages. On the other hand, it yields sequential information, where visits can be viewed as <italic>structured event streams</italic> representing customer journeys. Given the clickstream behavior of a set of users, we are interested in two specific aspects, ranging from population-level to individual-level information extraction. At the population level, we are interested in finding the most frequent clickstream patterns across all users, subject to a set of properties of interest. At the individual level, we are interested in downstream tasks such as intent prediction. Finally, an overarching theme over both levels is the interpretability of the results.</p>
<p>Our contribution to digital behavior analysis is to show that (i) constraint-based sequential pattern mining is effective in extracting knowledge from semi-structured data and (ii) DPM serves as an integration technology to enable downstream applications while retaining interpretability. The main idea behind our approach is first to capture the population characteristics and extract succinct representations from large volumes of digital clickstream activity using DPM. Then, the patterns found by DPM become consumable in machine learning models. Overall, our generic framework alleviates manual feature engineering, automates feature generation, and improves the performance of downstream tasks.</p>
<p>To demonstrate our approach, we explore two real-world clickstream datasets with positive and negative intents for product purchases based on shopping activity and intrusion behavior based on web visits. We apply our framework to find the most frequent patterns in digital activity and then leverage them in machine learning models for intent and intrusion prediction. Finally, we show how to extract high-level signals from patterns of interest.</p>
</sec>
<sec id="s7">
<title>7. Computational Experiments</title>
<p>In the following, we outline the goal of our experiments, describe the data (Section 7.1), the CSPM constraint models for sequence-to-pattern generation using <monospace>Seq2Pat</monospace> (Section 7.2), pattern-to-feature generation (Section 7.3), and downstream prediction models.</p>
<p>We present numeric results that compare the prediction accuracy on both the customer intent prediction and intrusion datasets, experimenting with simple to complex machine learning models. We also stress test the runtime behavior of pattern mining as a function of the number of patterns and constraints (Section 7.5.3). Finally, we study feature importance to derive insights and explanations from auto-generated features (Section 7.6).</p>
<p>To demonstrate the effectiveness of our DPM framework, we consider the following research questions:</p>
<list list-type="simple">
<list-item><p>Q1: How does DPM perform when compared to a standalone CSPM that does not exploit the dichotomy in outcomes?</p></list-item>
<list-item><p>Q2: How does DPM compare with state-of-the-art methods, such as LSTMs, that are specifically designed for sequential data?</p></list-item>
<list-item><p>Q3: How effective is DPM as an integration technology when the patterns it generates are used in prediction tasks?</p></list-item>
</list>
<p>To answer the above questions, we design two sets of experiments. In the first set of experiments, we compare the performance of DPM and the traditional CSPM applied to the entire dataset. In this controlled setup, the patterns found by each method are fed into a Logistic Regression model (Cox, <xref ref-type="bibr" rid="B14">1958</xref>) to make downstream predictions. We consider a simple modeling approach, such as Logistic Regression, to make the difference between the two approaches more pronounced. As mentioned in Section 4.1, due to class imbalance, variability in minimum frequency, and the distinction between constraint models, we expect DPM to be superior to CSPM in terms of prediction performance.</p>
<p>In the second set of experiments, we consider an array of machine learning models with different generalization capacities, including the state-of-the-art LSTM architecture from Requena et al. (<xref ref-type="bibr" rid="B35">2020</xref>). This allows us to quantify the best possible prediction performance achieved by DPM and position it with respect to existing work. This also measures the effectiveness of our approach as an integration technology and as an automated feature extractor from raw clickstream datasets. In these experiments, we expect DPM to be competitive, if not superior, to the previous literature.</p>
<sec>
<title>7.1. Clickstream Datasets</title>
<p>We consider two digital clickstream datasets, namely the e-commerce shopper intent prediction dataset (Requena et al., <xref ref-type="bibr" rid="B35">2020</xref>) for predicting the purchase intent at the end of a browsing session and the intruder detection dataset (Kahn et al., <xref ref-type="bibr" rid="B24">2016</xref>) for predicting whether a web session is made by an intruder.</p>
<sec>
<title>7.1.1. Shopper Intent Prediction Dataset</title>
<p>The dataset contains rich clickstream behavior of online users browsing a popular fashion e-commerce website (Requena et al., <xref ref-type="bibr" rid="B35">2020</xref>). It consists of 203,084 shoppers&#x00027; click sequences. There are 8,329 sequences with at least one purchase, while 194,755 sequences lead to no purchase. The sequences are composed of symbolized events, as shown in <xref ref-type="table" rid="T2">Table 2</xref>, with length <italic>L</italic> in the range 5 &#x02264; <italic>L</italic> &#x02264; 155. Sequences leading to a purchase are labeled as positive (&#x0002B;1); otherwise, they are labeled as negative (0), resulting in a binary intent classification problem.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>The symbols used to depict clickstream events in e-commerce shopper intent prediction dataset.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Symbol</bold></th>
<th valign="top" align="left"><bold>Event</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="left">Page view</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="left">Detail (see product page)</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="left">Add (add product to cart)</td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="left">Remove (remove product from cart)</td>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="left">Purchase</td>
</tr>
<tr>
<td valign="top" align="left">6</td>
<td valign="top" align="left">Click (click on result after search)</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>7.1.2. Intruder Detection Dataset</title>
<p>The dataset (Kahn et al., <xref ref-type="bibr" rid="B24">2016</xref>) consists of sequences of web pages visited by users. In addition, it contains labels denoting whether each web session was made by a genuine user or an intruder. It has 253,561 click sequences, of which 250,098 have more than one click per sequence. The 247,804 sequences of genuine users are labeled as positive (&#x0002B;1); the remaining 2,294 sequences are labeled as negative (0), resulting in a binary intrusion classification problem.</p>
</sec>
</sec>
<sec>
<title>7.2. Sequence-To-Pattern Generation</title>
<p>Step 2 in <xref ref-type="table" rid="T6">Algorithm 1</xref> leaves open the choice of the data mining approach used to extract frequent patterns. As mentioned earlier, in this step, we utilize the state-of-the-art <monospace>Seq2Pat</monospace> (Wang et al., <xref ref-type="bibr" rid="B37">2022</xref>) to find sequential patterns that occur frequently. <monospace>Seq2Pat</monospace> takes advantage of the multi-valued decision diagram representation of sequences (Hosseininasab et al., <xref ref-type="bibr" rid="B23">2019</xref>) and offers a declarative modeling language to support constraint reasoning. We developed <monospace>Seq2Pat</monospace> in a unique collaboration between academia and industry for large-scale mining operations to meet enterprise requirements and shared it with the community as an open-source library. The library is written in <monospace>Cython</monospace> to bring together the efficiency of a low-level <monospace>C&#x0002B;&#x0002B;</monospace> backend and the expressiveness of a high-level Python public interface (Behnel et al., <xref ref-type="bibr" rid="B7">2011</xref>). Equipped with <monospace>Seq2Pat</monospace>, we next declare our constraint model, <italic>C</italic><sub><italic>type</italic></sub>(&#x000B7;), to specify patterns of interest.</p>
<p><xref ref-type="fig" rid="F3">Figure 3</xref> presents the CSPM constraint model using the exact <monospace>Seq2Pat</monospace> implementation on the intent prediction dataset. The clickstream data serves as the sequence database (Line 3). In addition, we have two attributes for each event: the order in a sequence, <inline-formula><mml:math id="M22"><mml:mrow><mml:msub><mml:mi mathvariant="-tex-caligraphic">A</mml:mi><mml:mrow><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> (Line 5), and the dwell time on a page, <inline-formula><mml:math id="M23"><mml:mrow><mml:msub><mml:mi mathvariant="-tex-caligraphic">A</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> (Line 6). Next, the <monospace>Seq2Pat</monospace> engine is created over the sequence database (Line 9). We then encapsulate this order and timestamp information in <italic>Attribute</italic> objects (Lines 12&#x02013;13) so that the user can interact with the raw data. The attribute objects allow reasoning about the properties of a pattern. We enforce two constraints to seek interesting patterns. The first condition (Line 16) restricts the span of event order in a pattern to be &#x02264; 9, which ensures that the maximum length of a pattern is 10. This condition is added to the system as a <italic>constraint</italic>, together with the average constraint on time (Line 17). The average constraint seeks page views on which customers spend at least 20 s. More precisely, we set <italic>C</italic><sub><italic>span</italic></sub>(<inline-formula><mml:math id="M24"><mml:mrow><mml:msub><mml:mi mathvariant="-tex-caligraphic">A</mml:mi><mml:mrow><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) &#x02264; 9 and <italic>C</italic><sub><italic>avg</italic></sub>(<inline-formula><mml:math id="M25"><mml:mrow><mml:msub><mml:mi mathvariant="-tex-caligraphic">A</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) &#x02265; 20 (s). Finally, we set the minimum frequency threshold &#x003B8; to 30% of the total number of sequences for shopper intent prediction.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><monospace>Seq2Pat</monospace> model to enforce sequential pattern mining with constraints on the clickstream dataset in fashion e-commerce.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-05-868085-g0003.tif"/>
</fig>
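<p>As a self-contained illustration of what this constraint model expresses, the following brute-force sketch checks pattern occurrences subject to the same two constraints: span of event order &#x02264; 9 and average dwell time &#x02265; 20 s. The function names are illustrative and the enumeration is exponential in the worst case; the actual <monospace>Seq2Pat</monospace> library mines patterns far more efficiently over a multi-valued decision diagram rather than by enumeration.</p>

```python
from itertools import combinations

def occurs(pattern, events, orders, times, max_span=9, min_avg=20):
    """Return True if `pattern` occurs in the sequence of `events` at
    positions whose order attribute spans at most `max_span` and whose
    time attribute averages at least `min_avg` (brute-force check)."""
    for pos in combinations(range(len(events)), len(pattern)):
        if tuple(events[i] for i in pos) != tuple(pattern):
            continue
        span = orders[pos[-1]] - orders[pos[0]]   # C_span(A_order)
        avg = sum(times[i] for i in pos) / len(pos)  # C_avg(A_time)
        if span <= max_span and avg >= min_avg:
            return True
    return False

def frequency(pattern, database):
    """Count the sequences of `database` (triples of events, orders,
    times) in which `pattern` occurs under the constraints."""
    return sum(occurs(pattern, e, o, t) for e, o, t in database)
```

<p>A pattern is frequent when its constrained frequency meets the threshold &#x003B8;; the constraints thus filter occurrences, not merely patterns.</p>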
<p>For the intruder detection dataset, the <monospace>Seq2Pat</monospace> constraint model is almost identical; hence, we omit the full implementation details. In this dataset, we seek patterns of size &#x02264; 10, where the average time spent is at least 0.02 s, and the minimum frequency threshold is set to 0.1%.</p>
<p>Let us note that we find these values through exploratory data analysis of the distribution of the datasets. Notice that these settings are not hyper-parameters to tune but rather characteristics of the datasets and a reflection of modeling preferences when seeking patterns of interest.</p>
<p>With the constraint model in <xref ref-type="fig" rid="F3">Figure 3</xref>, <monospace>Seq2Pat</monospace> finds 457 frequent patterns in the purchase sequences, <italic>Pos</italic><sub><italic>frequent</italic></sub>, and 236 frequent patterns in the non-purchase sequences, <italic>Neg</italic><sub><italic>frequent</italic></sub>, from the e-commerce shopper digital clickstreams. On the intruder detection dataset, <monospace>Seq2Pat</monospace> finds 2,006 frequent patterns in intruder sequences and 1,006 frequent patterns in non-intruder sequences. There exists some overlap between <italic>Pos</italic><sub><italic>frequent</italic></sub> and <italic>Neg</italic><sub><italic>frequent</italic></sub> on both datasets.</p>
</sec>
<sec>
<title>7.3. Pattern-To-Feature Generation</title>
<p>Our next task is to convert frequent patterns into features that can be consumed by prediction models. As shown in the Venn diagram in <xref ref-type="fig" rid="F1">Figure 1</xref>, the DPM framework resorts to finding the unique patterns and the intersection of the positive and negative outcomes.</p>
<p>In shopper intent prediction, when the sets of patterns from purchasers and non-purchasers are compared, we find 244 unique purchaser patterns, <italic>Pos</italic><sub><italic>unique</italic></sub>, and 23 unique non-purchaser patterns, <italic>Neg</italic><sub><italic>unique</italic></sub>. The groups share 213 patterns in common. In combination, we have 480 unique patterns, <italic>PN</italic><sub><italic>union</italic></sub>. To transform the 480 unique patterns into a feature space, we consider a binary representation <italic>via</italic> one-hot encoding. For each sequence, we create a 480-dimensional feature vector with a binary indicator to denote the existence of a pattern.</p>
<p>For intruder detection, we follow a similar feature generation procedure but find 1,894 unique intruder patterns and 894 unique non-intruder patterns, with 112 patterns in common. In combination, we have 2,900 unique patterns to create the binary indicators in the second dataset.</p>
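<p>The one-hot pattern-to-feature generation described above can be sketched as follows. The containment test shown here, a plain (not necessarily contiguous) subsequence check, is an illustrative simplification of the constrained occurrence used in the experiments.</p>

```python
def contains_subsequence(pattern, seq):
    """Illustrative containment test: `pattern` occurs as a (not
    necessarily contiguous) subsequence of `seq`."""
    it = iter(seq)
    return all(item in it for item in pattern)  # `in` advances the iterator

def pattern_to_features(sequences, patterns, contains):
    """One-hot encode each sequence against the pattern union:
    feature j of sequence i is 1 iff pattern j occurs in sequence i."""
    patterns = list(patterns)  # fix an ordering of the pattern union
    return [[1 if contains(p, seq) else 0 for p in patterns]
            for seq in sequences]
```

<p>With the 480-pattern union above, each shopper sequence would thus map to a 480-dimensional binary vector; any occurrence predicate, including a constrained one, can be passed as <monospace>contains</monospace>.</p>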
</sec>
<sec>
<title>7.4. [<bold>Q1</bold>] Comparison of DPM With CSPM</title>
<p>Let us start by quantifying the added value of DPM as opposed to applying CSPM on the entire dataset, which serves as a baseline. Let us note that CSPM is a strong baseline: it already incorporates sequential information and constraint reasoning. As such, it is an advancement over classical pattern mining approaches, which we would like to improve further with DPM.</p>
<p>In this set of experiments, a Logistic Regression (LR) model is used for the shopper intent prediction and the intruder detection tasks. In setting this baseline, we consider a standard (logistic) regression model so that we can attribute the difference in the results to the choice of pattern mining approach. We first generate patterns from sequences, as explained in Section 7.2, and then generate features from patterns, as explained in Section 7.3. Then, the LR model is trained using the features extracted by either DPM or CSPM. The goal of this experiment is to demonstrate that DPM provides the necessary flexibility in the mining process, especially when sequence classes are largely imbalanced. This is inherently the case in most practical scenarios: there are many more visitors than purchasers in fashion e-commerce, and intruders are only a small fraction within regular access patterns. DPM helps to find patterns that are significantly present in both the majority and minority classes while effectively excluding redundant patterns. Thus, DPM is expected to be superior to standalone CSPM in supporting downstream modeling.</p>
<sec>
<title>7.4.1. Prediction Model</title>
<p>We train an LR model on each of the two datasets. We use 80% of the data as the train set and 20% as the test set, and repeat this split 10 times for robustness. We compare the average results for each model based on Precision, Recall, F1 score, and the area under the ROC curve (AUC).</p>
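<p>For reference, the threshold-based metrics used throughout this section can be computed from binary predictions as in the following sketch; AUC additionally requires prediction scores rather than hard labels, so it is omitted here.</p>

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, and F1 from binary labels, as used to
    compare the models in Table 3 (AUC needs scores, not labels)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

<p>Under heavy class imbalance, precision and recall on the minority class are far more informative than accuracy, which motivates the metric choice above.</p>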
<sec>
<title>7.4.1.1. Hyper-Parameter Tuning</title>
<p>On both datasets, we apply three-fold cross-validation for hyper-parameter tuning using the train set. We apply grid search over the regularization parameter <italic>C</italic> &#x02208; {0.001, 0.01, 0.1, 1, 10, 100, 1,000} for the LR model. On the shopper intent prediction dataset, the final parameter <italic>C</italic> is set to 0.01 for the data using traditional CSPM features (referred to as <monospace>LR_CSPM</monospace>) and 0.1 for the data using DPM-based features (referred to as <monospace>LR_DPM</monospace>), since these values provide the best performance in terms of AUC. On the intruder detection dataset, <italic>C</italic> is set to 100 and 0.1 for <monospace>LR_CSPM</monospace> and <monospace>LR_DPM</monospace>, respectively.</p>
</sec>
</sec>
<sec>
<title>7.4.2. Prediction Performance of DPM and CSPM Features With Logistic Regression</title>
<p><xref ref-type="table" rid="T3">Table 3</xref> summarizes the averaged performance and standard deviation of the LR model for <monospace>LR_CSPM</monospace> and <monospace>LR_DPM</monospace>. The best performance is marked in bold. We set the minimum frequency threshold to 30% for shopper intent prediction and 0.1% for intruder detection. Notice that the LR model is not able to utilize the sequential information of the input by default. The features auto-generated by DPM or standalone CSPM enable this simple model to exploit sequential knowledge. On both the shopping and intrusion datasets, <monospace>LR_DPM</monospace> consistently outperforms <monospace>LR_CSPM</monospace> on <italic>all metrics by a significant margin</italic>. This highlights the effectiveness of DPM in exploiting the unique patterns of different classes.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Comparison of averaged intent classification performance by using Logistic Regression model on CSPM and DPM features over 10 random Train-Test splits.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Model</bold></th>
<th valign="top" align="center"><bold>Precision(%)</bold></th>
<th valign="top" align="center"><bold>Recall(%)</bold></th>
<th valign="top" align="center"><bold>F1(%)</bold></th>
<th valign="top" align="center"><bold>AUC(%)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="5"><bold>Shopper intent prediction dataset</bold></td>
</tr>
<tr>
<td valign="top" align="left"><monospace>LR_CSPM</monospace></td>
<td valign="top" align="center">14.04 (&#x000B1;0.56)</td>
<td valign="top" align="center">36.25 (&#x000B1;4.59)</td>
<td valign="top" align="center">20.15 (&#x000B1;0.74)</td>
<td valign="top" align="center">78.67 (&#x000B1;0.38)</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>LR_DPM</monospace></td>
<td valign="top" align="center"><bold>42.15</bold> (&#x000B1;1.5)</td>
<td valign="top" align="center"><bold>63.22</bold> (&#x000B1;3.93)</td>
<td valign="top" align="center"><bold>50.47</bold> (&#x000B1;0.9)</td>
<td valign="top" align="center"><bold>94.28</bold> (&#x000B1;0.17)</td>
</tr>
<tr>
<td valign="top" align="left" colspan="5"><bold>Intruder detection dataset</bold></td>
</tr>
<tr>
<td valign="top" align="left"><monospace>LR_CSPM</monospace></td>
<td valign="top" align="center">6.10 (&#x000B1;1.18)</td>
<td valign="top" align="center">13.90 (&#x000B1;5.04)</td>
<td valign="top" align="center">8.07 (&#x000B1;1.12)</td>
<td valign="top" align="center">76.98 (&#x000B1;0.85)</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>LR_DPM</monospace></td>
<td valign="top" align="center"><bold>30.87</bold> (&#x000B1;6.21)</td>
<td valign="top" align="center"><bold>22.81</bold> (&#x000B1;5.02)</td>
<td valign="top" align="center"><bold>25.25</bold> (&#x000B1;1.32)</td>
<td valign="top" align="center"><bold>87.11</bold> (&#x000B1;0.57)</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>For shopper intent prediction, DPM identifies 480 patterns, while standalone CSPM finds only 236 patterns. Standalone CSPM is applied to the entire dataset, and the sequence classes are largely imbalanced, with only 8,329 out of 203,084 sequences being positive. As a result, a single frequency threshold becomes too high to include the patterns that are frequent in positive sequences. Instead, most mined patterns are restricted to those from negative sequences. Overall, patterns mined from the entire dataset do not transform into predictive features for intent classification. DPM overcomes such performance deterioration by exploiting the structure of the data.</p>
<p>Alternatively, we have the option to lower the minimum frequency to include more patterns, for example by setting the threshold in standalone CSPM to 30% of positive sequences, i.e., 0.01% of the entire set of sequences. Then we obtain 4,615 patterns, which include redundant and arbitrary patterns from negative sequences, since the threshold becomes too low. The performance is again in favor of DPM. For this dataset, we conclude that setting a single static frequency threshold, whether high or low, misguides the feature generation and adversely impacts the performance of the downstream prediction task.</p>
<p>For the intruder detection dataset, we obtain similar results, as shown in <xref ref-type="table" rid="T3">Table 3</xref>. The number of patterns decreases from 2,900 with DPM to 988 with standalone CSPM at a minimum frequency threshold of 0.1%. DPM again dominates CSPM across all metrics by a large margin.</p>
<p>To conclude Q1, based on these experiments, we find that DPM is superior to standalone CSPM. Before proceeding with further experiments that combine DPM with more sophisticated models, let us examine the efficiency in terms of runtime in the next section.</p>
</sec>
<sec>
<title>7.4.3. Runtime Performance</title>
<p>We report the runtime performance of pattern mining on a machine with Linux RHEL7 OS, a 16-core 2.2 GHz CPU, and 64 GB of RAM. We apply <monospace>Seq2Pat</monospace> to implement both DPM and standalone CSPM. We impose the constraints described in Section 7.2. For shopper intent prediction, the runtime of DPM is 117.49 s on positive sequences and 2,088.34 s on negative sequences. Standalone CSPM runs for 2,136.09 s. The runtime results are comparable, since runtime is mostly dominated by the size of the majority class. For intruder detection, DPM runs for 0.2 s on positive sequences and 15.21 s on negative sequences, while CSPM runs for 13.82 s. Overall, both DPM and CSPM achieve similar runtimes, and both scale to sequence databases with 200,000&#x0002B; and 250,000&#x0002B; sequences. This is thanks to the underlying <monospace>Seq2Pat</monospace> library's efficient implementation of constraint-based sequential pattern mining.</p>
</sec>
</sec>
<sec>
<title>7.5. <bold>[Q2 and Q3]</bold> DPM as an Integration Technology and Comparison With the State-Of-The-Art in Prediction Performance</title>
<p>The goal of our next set of experiments is to quantify the effectiveness of DPM as an integration technology and an automated feature extractor. For that purpose, we consider an array of machine learning models that are more sophisticated than Logistic Regression and exhibit varying generalization capacity.</p>
<sec>
<title>7.5.1. Prediction Models</title>
<p>As shown in <xref ref-type="table" rid="T4">Table 4</xref>, we consider four different models with different strengths using <monospace>CSPM</monospace> or <monospace>DPM</monospace> features either in standalone or in combination with <monospace>LSTM</monospace>.</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>The set of machine learning models considered and their feature space.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Model</bold></th>
<th valign="top" align="left"><bold>Features space</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><monospace>LightGBM_{CSPM, DPM}</monospace></td>
<td valign="top" align="left">CSPM or DPM features</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>Shallow_NN_{CSPM, DPM}</monospace></td>
<td valign="top" align="left">CSPM or DPM features</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>LSTM</monospace></td>
<td valign="top" align="left">Clickstream Data</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>LSTM&#x0002B;{CSPM, DPM}</monospace></td>
<td valign="top" align="left">Clickstream data &#x0002B; CSPM or DPM patterns</td>
</tr>
</tbody>
</table>
</table-wrap>
<list list-type="order">
<list-item><p><monospace>LightGBM</monospace> light gradient boosting machines (Ke et al., <xref ref-type="bibr" rid="B25">2017</xref>).</p></list-item>
<list-item><p><monospace>Shallow_NN</monospace> shallow neural network using one hidden layer.</p></list-item>
<list-item><p><monospace>LSTM</monospace> The state-of-the-art long short-term memory network (Hochreiter and Schmidhuber, <xref ref-type="bibr" rid="B22">1997</xref>) from Requena et al. (<xref ref-type="bibr" rid="B35">2020</xref>) that uses input sequences as-is. <monospace>LSTM</monospace> applies one hidden layer to the output of the last <monospace>LSTM</monospace> step, followed by a fully connected layer to make the intent prediction.</p></list-item>
<list-item><p><monospace>LSTM&#x0002B;{CSPM,DPM}</monospace> The <monospace>LSTM</monospace> model boosted with pattern embeddings from <monospace>CSPM</monospace> or <monospace>DPM</monospace>. The model uses the same architecture as <monospace>LSTM</monospace>; the only difference is that the pattern-based features are concatenated to the output of the <monospace>LSTM</monospace> and used together as input to the hidden layer.</p></list-item>
</list>
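<p>The fusion step in the boosted model can be sketched as follows. This is a minimal plain-Python illustration, not the authors' implementation: the dimensions, weights, and the <monospace>dense</monospace> helper are invented for the example, and concatenation is the only change relative to the plain <monospace>LSTM</monospace> classifier.</p>

```python
def dense(x, weights, bias):
    """One fully connected layer with ReLU, in plain Python."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

# Illustrative shapes: 3-dim LSTM output, 4 pattern features, 2 hidden nodes.
lstm_out = [0.4, -0.1, 0.7]       # output of the last LSTM step
dpm_feats = [1.0, 0.0, 1.0, 1.0]  # DPM pattern-presence features
x = lstm_out + dpm_feats          # concatenate: the only architectural change
W = [[0.1] * 7, [-0.1] * 7]       # toy hidden-layer weights
b = [0.0, 0.05]
h = dense(x, W, b)                # feeds the final prediction layer
```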
<p><xref ref-type="table" rid="T4">Table 4</xref> also shows the feature spaces used by the compared models. Note that models such as <monospace>LightGBM</monospace> and <monospace>Shallow_NN</monospace> cannot operate on semi-structured clickstream data since they cannot accommodate recurrent sequential relationships. In contrast, more sophisticated architectures such as <monospace>LSTM</monospace> can work directly with the input. For the former, our approach allows these relatively simple models to work with sequence data; for the latter, it augments advanced models by incorporating pattern embeddings into the feature space. We use the same train-test split as in the first set of experiments and repeat it 10 times for robustness.</p>
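<p>The encoding that lets non-sequential models consume sequences can be sketched as one binary feature per mined pattern: does the pattern occur in the session as a subsequence? The helper names and the example patterns below are illustrative, not taken from the paper's code.</p>

```python
def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)  # `in` consumes the iterator

def encode(sequences, patterns):
    """One binary feature per mined pattern. The resulting fixed-length
    vectors can be fed to models such as LightGBM or a shallow network."""
    return [[int(is_subsequence(p, s)) for p in patterns] for s in sequences]

patterns = [(3, 1, 1), (2, 1, 1, 1)]      # e.g., patterns produced by DPM
clicks = [[2, 3, 1, 1, 5], [2, 1, 1, 1]]  # raw clickstream sequences
X = encode(clicks, patterns)
```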
<sec>
<title>7.5.1.1. Hyper-Parameter Tuning</title>
<p>As in the previous experiment, we apply three-fold cross-validation on the train set for hyper-parameter tuning. We grid search over the number of iterations [400, 600, 800, 1,000] for <monospace>LightGBM</monospace>, the number of nodes in the hidden layer [32, 64, 128, 256, 512] for the <monospace>Shallow_NN</monospace> and <monospace>LSTM</monospace> models, and the number of <monospace>LSTM</monospace> units [16, 32, 64, 128]. We use 10% of the train set as a validation set to determine whether training meets the early stopping condition: when the loss on the validation set stops decreasing steadily, training is terminated. The validation set is also used to determine the decision boundary on the predictions that yields the highest F1 score. The final parameters for the shopper intent prediction models are 400 iterations for <monospace>LightGBM</monospace>, 64 nodes for <monospace>Shallow_NN</monospace>, and 32 <monospace>LSTM</monospace> units in the <monospace>LSTM</monospace> models with 64 and 128 nodes in their hidden layers. Intruder detection has slightly different parameters: <monospace>LightGBM</monospace> still gets 400 iterations, but tuning selects 32 nodes for <monospace>Shallow_NN</monospace>, and 16 and 32 <monospace>LSTM</monospace> units in the <monospace>LSTM</monospace> and <monospace>LSTM&#x0002B;{CSPM, DPM}</monospace> models, respectively, each with 32 nodes in its hidden layer.</p>
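<p>The exhaustive search over these grids can be sketched as below. The scorer is a dummy that stands in for training and validating the real model on each fold; everything here is a hedged illustration of the procedure, not the authors' tuning code.</p>

```python
from itertools import product
from statistics import mean

def grid_search(param_grid, cv_score, folds=3):
    """Evaluate every parameter combination by its mean score over the
    folds and keep the best. cv_score(params, fold) stands in for
    training/validating the actual model on one fold."""
    best_params, best_score = None, float("-inf")
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = mean(cv_score(params, fold) for fold in range(folds))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Grids from the paper's search space; the dummy scorer simply prefers
# 400 iterations and 64 hidden nodes for the sake of the example.
grid = {"n_iterations": [400, 600, 800, 1000],
        "hidden_nodes": [32, 64, 128, 256, 512]}
dummy = lambda p, fold: -abs(p["n_iterations"] - 400) - abs(p["hidden_nodes"] - 64)
best, _ = grid_search(grid, dummy)
```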
</sec>
</sec>
<sec>
<title>7.5.2. Prediction Performance With Sophisticated Models</title>
<p><xref ref-type="table" rid="T5">Table 5</xref> presents the average results comparing the performance of the models on the two datasets, with the best performance marked in bold. For the feature space, we use either the patterns found by <monospace>CSPM</monospace> or <monospace>DPM</monospace>, the original clickstream events in the raw data, or their combination. For shopper intent prediction, using auto-generated <monospace>DPM</monospace> features, the <monospace>LightGBM</monospace> and <monospace>Shallow_NN</monospace> models achieve performance that closely matches the results of the reference work (Requena et al., <xref ref-type="bibr" rid="B35">2020</xref>). The difference is that the models in Requena et al. (<xref ref-type="bibr" rid="B35">2020</xref>) use hand-crafted features, whereas we automate the feature generation process. Notice that, without using clickstream events, both models significantly outperform their <monospace>CSPM</monospace>-based counterparts when using <monospace>DPM</monospace> features. When a more sophisticated model such as <monospace>LSTM</monospace> is used, it outperforms <monospace>LightGBM</monospace> and <monospace>Shallow_NN</monospace>. When the <monospace>LSTM</monospace> model is combined with <monospace>CSPM</monospace> or <monospace>DPM</monospace>, <monospace>LSTM&#x0002B;CSPM</monospace> achieves the best Precision among the compared models while performing similarly to <monospace>LSTM</monospace> on the other metrics. Compared to <monospace>LSTM</monospace> and <monospace>LSTM&#x0002B;CSPM</monospace>, <monospace>LSTM&#x0002B;DPM</monospace> yields a substantial increase in Recall and, consequently, the highest F1 score. <monospace>LSTM&#x0002B;DPM</monospace> is also superior to the others in terms of AUC.</p>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>Comparison of averaged prediction performance by different methods over 10 random Train-Test splits.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Model</bold></th>
<th valign="top" align="center"><bold>Precision(%)</bold></th>
<th valign="top" align="center"><bold>Recall(%)</bold></th>
<th valign="top" align="center"><bold>F1(%)</bold></th>
<th valign="top" align="center"><bold>AUC(%)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="5"><bold>Shopper intent prediction dataset</bold></td>
</tr>
<tr>
<td valign="top" align="left"><monospace>LightGBM_CSPM</monospace></td>
<td valign="top" align="center">25.11 (&#x000B1;2.25)</td>
<td valign="top" align="center">33.34 (&#x000B1;3.52)</td>
<td valign="top" align="center">28.42 (&#x000B1;0.59)</td>
<td valign="top" align="center">83.81 (&#x000B1;0.43)</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>LightGBM_DPM</monospace></td>
<td valign="top" align="center">44.70 (&#x000B1;1.92)</td>
<td valign="top" align="center">63.15 (&#x000B1;4.65)</td>
<td valign="top" align="center">52.20 (&#x000B1;0.65)</td>
<td valign="top" align="center">94.98 (&#x000B1;0.15)</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>Shallow_NN_CSPM</monospace></td>
<td valign="top" align="center">22.62 (&#x000B1;2.06)</td>
<td valign="top" align="center">37.78 (&#x000B1;3.95)</td>
<td valign="top" align="center">27.18 (&#x000B1;0.59)</td>
<td valign="top" align="center">83.17 (&#x000B1;0.45)</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>Shallow_NN_DPM</monospace></td>
<td valign="top" align="center">44.40 (&#x000B1;2.18)</td>
<td valign="top" align="center">64.11 (&#x000B1;4.57)</td>
<td valign="top" align="center">52.31 (&#x000B1;0.54)</td>
<td valign="top" align="center">95.00 (&#x000B1;0.17)</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>LSTM</monospace></td>
<td valign="top" align="center">54.96 (&#x000B1; 1.77)</td>
<td valign="top" align="center">69.53 (&#x000B1;4.31)</td>
<td valign="top" align="center">61.28 (&#x000B1;0.95)</td>
<td valign="top" align="center">96.41 (&#x000B1;0.15)</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>LSTM&#x0002B;CSPM</monospace></td>
<td valign="top" align="center"><bold>55.29</bold> (&#x000B1;1.78)</td>
<td valign="top" align="center">69.10 (&#x000B1;3.00)</td>
<td valign="top" align="center">61.36 (&#x000B1;0.83)</td>
<td valign="top" align="center">96.42 (&#x000B1;0.15)</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>LSTM&#x0002B;DPM</monospace></td>
<td valign="top" align="center">54.35 (&#x000B1;2.40)</td>
<td valign="top" align="center"><bold>73.64</bold> (&#x000B1;4.70)</td>
<td valign="top" align="center"><bold>62.39</bold> (&#x000B1;0.81)</td>
<td valign="top" align="center"><bold>96.76</bold> (&#x000B1;0.12)</td>
</tr>
<tr>
<td valign="top" align="left" colspan="5"><bold>Intruder detection dataset</bold></td>
</tr>
<tr>
<td valign="top" align="left"><monospace>LightGBM_CSPM</monospace></td>
<td valign="top" align="center">7.27 (&#x000B1;1.27)</td>
<td valign="top" align="center">12.66 (&#x000B1;4.35)</td>
<td valign="top" align="center">8.94 (&#x000B1;1.39)</td>
<td valign="top" align="center">77.25 (&#x000B1;0.90)</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>LightGBM_DPM</monospace></td>
<td valign="top" align="center">32.50 (&#x000B1;3.52)</td>
<td valign="top" align="center">23.31 (&#x000B1;3.84)</td>
<td valign="top" align="center">26.81 (&#x000B1;2.47)</td>
<td valign="top" align="center">86.35 (&#x000B1;0.72)</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>Shallow_NN_CSPM</monospace></td>
<td valign="top" align="center">6.51 (&#x000B1;1.16)</td>
<td valign="top" align="center">13.34 (&#x000B1;4.14)</td>
<td valign="top" align="center">8.50 (&#x000B1;1.21)</td>
<td valign="top" align="center">77.30 (&#x000B1;0.87)</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>Shallow_NN_DPM</monospace></td>
<td valign="top" align="center">39.55 (&#x000B1;6.15)</td>
<td valign="top" align="center">22.28 (&#x000B1;4.14)</td>
<td valign="top" align="center">27.94 (&#x000B1;2.46)</td>
<td valign="top" align="center">85.28 (&#x000B1;0.81)</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>LSTM</monospace></td>
<td valign="top" align="center">42.62 (&#x000B1;6.19)</td>
<td valign="top" align="center">39.24 (&#x000B1;6.20)</td>
<td valign="top" align="center">40.31(&#x000B1;0.98)</td>
<td valign="top" align="center">95.20 (&#x000B1;0.36)</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>LSTM&#x0002B;CSPM</monospace></td>
<td valign="top" align="center">42.20 (&#x000B1;5.49)</td>
<td valign="top" align="center"><bold>41.52</bold> (&#x000B1;3.64)</td>
<td valign="top" align="center"><bold>41.46</bold> (&#x000B1;1.64)</td>
<td valign="top" align="center"><bold>95.42</bold> (&#x000B1;0.47)</td>
</tr>
<tr>
<td valign="top" align="left"><monospace>LSTM&#x0002B;DPM</monospace></td>
<td valign="top" align="center"><bold>50.81</bold> (&#x000B1;8.24)</td>
<td valign="top" align="center">35.89 (&#x000B1;3.63)</td>
<td valign="top" align="center"><bold>41.46</bold> (&#x000B1;1.59)</td>
<td valign="top" align="center">95.14 (&#x000B1;0.26)</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>On the intruder detection dataset, automatic feature extraction again gives <monospace>LightGBM</monospace> and <monospace>Shallow_NN</monospace> the opportunity to tackle sequential input. As before, the <monospace>LSTM</monospace> model using clickstream sequences significantly improves prediction performance compared to <monospace>LightGBM</monospace> and <monospace>Shallow_NN</monospace>. Further, <monospace>LSTM&#x0002B;CSPM</monospace> improves Recall and slightly improves F1 and AUC. Overall, <monospace>LSTM&#x0002B;DPM</monospace> achieves the best Precision and ties with <monospace>LSTM&#x0002B;CSPM</monospace> for the highest F1 score.</p>
<p>To conclude Q2 and Q3: based on the results observed on two different datasets across four modeling choices, the features extracted automatically <italic>via</italic> <monospace>DPM</monospace> boost ML models in the downstream intent and intrusion prediction tasks, especially for models such as <monospace>LightGBM</monospace> and <monospace>Shallow_NN</monospace> that cannot operate on sequential clickstream data. For these two models, <monospace>DPM</monospace> achieves significantly better performance than <monospace>CSPM</monospace>. For models that can handle sequence data directly, such as <monospace>LSTM</monospace>, the improvement from <monospace>DPM</monospace> over <monospace>CSPM</monospace> or raw clickstream sequences levels off due to the sophistication of the model itself, as observed in the intruder detection task, although we still see a clear boost from <monospace>DPM</monospace> features in the intent prediction task.</p>
</sec>
<sec>
<title>7.5.3. Runtime Performance</title>
<p>Let us complement the model performance results with observations on runtime as a function of different constraint models. We present results for the fashion e-commerce dataset only, since the findings are similar. We apply <monospace>Seq2Pat</monospace> on the positive set of 8,329 clickstream sequences. We impose the same types of constraints as described in Section 7.2 while varying the constraint on the minimum average time spent on pages. To stress-test the runtime, we set the minimum frequency &#x003B8; &#x0003D; 2, which returns almost all feasible patterns.</p>
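<p>The minimum-average constraint can be illustrated as follows for a single candidate occurrence of a pattern. A real constraint-based miner such as <monospace>Seq2Pat</monospace> reasons over all occurrences during search; this simplified check, with invented helper and data names, only verifies one attribute assignment.</p>

```python
def satisfies_min_average(attribute_values, indices, threshold):
    """Check the average-attribute constraint for one occurrence of a
    pattern: `indices` pick the attribute values (e.g., seconds spent on
    each page) at the positions where the pattern items are matched."""
    picked = [attribute_values[i] for i in indices]
    return sum(picked) / len(picked) >= threshold

page_time = [5, 120, 30, 60]  # time spent on each page of a session (illustrative)
# Pattern matched at positions 1 and 3: average time (120 + 60) / 2 = 90.
ok = satisfies_min_average(page_time, [1, 3], 70)
# Pattern matched at positions 0 and 2: average time 17.5, below threshold.
tight = satisfies_min_average(page_time, [0, 2], 70)
```

<p>Raising the threshold makes the constraint harder to satisfy, which prunes more occurrences and, as the next paragraph shows, reduces both the number of patterns found and the runtime.</p>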
<p><xref ref-type="fig" rid="F4">Figure 4</xref> shows the runtime in seconds (y-axis-left) and the number of patterns found (y-axis-right) as the average constraint increases (x-axis). As the constraint becomes harder to satisfy, the number of patterns goes down, as expected. The runtime for the hardest case is &#x0007E;250 s, and we observe speed-ups as constraint reasoning becomes more effective.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>The runtime (y-axis-left) and the number of patterns found (y-axis-right) with varying constraints (x-axis) on the fashion e-commerce dataset. In this setting, the minimum frequency threshold, &#x003B8;, is set to 2 to stress-test runtime performance.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-05-868085-g0004.tif"/>
</fig>
</sec>
</sec>
<sec>
<title>7.6. Feature Importance</title>
<p>Finally, we study feature importance to derive high-level insights and explanations from the auto-generated <monospace>DPM</monospace> features. We examine the Shapley values (Lundberg et al., <xref ref-type="bibr" rid="B29">2020</xref>) of features from the <monospace>LightGBM</monospace> model on the fashion e-commerce dataset.</p>
<p><xref ref-type="fig" rid="F5">Figure 5</xref> shows the top-20 features with the highest impact. Our observations match previous findings in Requena et al. (<xref ref-type="bibr" rid="B35">2020</xref>). The pattern &#x02329;3, 1, 1&#x0232A; provides the most predictive information, given that the symbol (3) stands for adding a product; recall that <xref ref-type="table" rid="T2">Table 2</xref> describes the meaning of the symbols. Repeated page views, as in &#x02329;1, 1, 1, 1, 1, 1, 1&#x0232A;, or specific product views, &#x02329;2, 1, 1, 1&#x0232A;, are indicative of purchase intent, whereas web exploration visiting many products, &#x02329;1, 1, 2, 1, 2&#x0232A;, is more negatively correlated with a purchase. Interestingly, searching actions &#x02329;6&#x0232A; have minimal impact on buying, raising questions about the quality of the search and ranking systems. Our frequent patterns also yield new insights not covered in the existing hand-crafted analysis. Most notably, we discover that removing a product but then remaining in the session for more views, &#x02329;4, 1, 1&#x0232A;, is an important feature, positively correlated with a purchase. This scenario, where customers have specific product needs, hints at a missed business opportunity to create incentives such as prompting virtual chat or personalized promotions.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>SHAP values of auto-generated <monospace>Seq2Pat</monospace> features. Top-20 features ranked in descending importance. Color indicates high (in red) or low (in blue) feature value. Horizontal location indicates the correlation of the feature value to a high or low model prediction.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-05-868085-g0005.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="conclusions" id="s8">
<title>8. Conclusion</title>
<p>Pattern mining is an essential part of data analytics and knowledge discovery from sequential databases. It is a powerful tool, especially when combined with constraint reasoning to specify desired properties. In this paper, we presented a simple procedure for Dichotomic Pattern Mining that operates over semi-structured clickstream datasets. The approach learns new representations of pattern embeddings. This representation enables simple models, which cannot handle sequential data by default, to predict from sequences. Moreover, it boosts the performance of more complex models with feature augmentation. Experiments on digital behavior analysis demonstrate that our approach is an effective integrator between automated feature generation and downstream tasks. Finally, as shown in our feature importance analysis, the representations we learn from pattern embeddings remain interpretable.</p>
<p>We have only considered the dichotomy between binary classes. Extending Dichotomic Pattern Mining to effectively deal with multi-class classification problems is a natural next direction. Additionally, sequence databases reach large scales quickly. It remains open whether it is possible to design a <italic>distributed</italic> version of the <monospace>CSPM</monospace> and <monospace>DPM</monospace> algorithms so that we can take advantage of modern architectures, including GPUs, to scale beyond the datasets considered in our experiments.</p>
</sec>
<sec sec-type="data-availability" id="s9">
<title>Data Availability Statement</title>
<p>The datasets used in our experiments are public benchmarks that are available from the references cited and are commonly used in this line of research. Shopper Intent Prediction Dataset: <ext-link ext-link-type="uri" xlink:href="https://github.com/coveooss/shopper-intent-prediction-nature-2020">https://github.com/coveooss/shopper-intent-prediction-nature-2020</ext-link>. Intrusion Detection Dataset: <ext-link ext-link-type="uri" xlink:href="https://www.kaggle.com/danielkurniadi/catch-me-if-you-can">https://www.kaggle.com/danielkurniadi/catch-me-if-you-can</ext-link>.</p>
</sec>
<sec id="s10">
<title>Author Contributions</title>
<p>All authors contributed equally to this study and the write-up of the manuscript. All authors read and approved the final manuscript.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>SG, SY, XW, BC, and SK were employed by the company Fidelity Investments, United States.</p></sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Agrawal</surname> <given-names>R.</given-names></name> <name><surname>Srikant</surname> <given-names>R.</given-names></name></person-group> (<year>1995</year>). <article-title>Mining sequential patterns,</article-title> in <source>Proceedings of the Eleventh International Conference on Data Engineering</source>, <fpage>3</fpage>&#x02013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1109/ICDE.1995.380415</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aoga</surname> <given-names>J. O.</given-names></name> <name><surname>Guns</surname> <given-names>T.</given-names></name> <name><surname>Schaus</surname> <given-names>P.</given-names></name></person-group> (<year>2016</year>). <article-title>An efficient algorithm for mining frequent sequence with constraint programming,</article-title> in <source>Joint European Conference on Machine Learning and Knowledge Discovery in Databases</source>, <fpage>315</fpage>&#x02013;<lpage>330</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-46227-1_20</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aoga</surname> <given-names>J. O.</given-names></name> <name><surname>Guns</surname> <given-names>T.</given-names></name> <name><surname>Schaus</surname> <given-names>P.</given-names></name></person-group> (<year>2017</year>). <article-title>Mining time-constrained sequential patterns with constraint programming</article-title>. <source>Constraints</source> <volume>22</volume>, <fpage>548</fpage>&#x02013;<lpage>570</lpage>. <pub-id pub-id-type="doi">10.1007/s10601-017-9272-3</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Atzmueller</surname> <given-names>M.</given-names></name></person-group> (<year>2015</year>). <article-title>Subgroup discovery</article-title>. <source>Wiley Int. Rev. Data Min. Knowl. Disc</source>. <volume>5</volume>, <fpage>35</fpage>&#x02013;<lpage>49</lpage>. <pub-id pub-id-type="doi">10.1002/widm.1144</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ayres</surname> <given-names>J.</given-names></name> <name><surname>Flannick</surname> <given-names>J.</given-names></name> <name><surname>Gehrke</surname> <given-names>J.</given-names></name> <name><surname>Yiu</surname> <given-names>T.</given-names></name></person-group> (<year>2002</year>). <article-title>Sequential pattern mining using a bitmap representation,</article-title> in <source>Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>, <fpage>429</fpage>&#x02013;<lpage>435</lpage>. <pub-id pub-id-type="doi">10.1145/775047.775109</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bay</surname> <given-names>S. D.</given-names></name> <name><surname>Pazzani</surname> <given-names>M. J.</given-names></name></person-group> (<year>2001</year>). <article-title>Detecting group differences: mining contrast sets</article-title>. <source>Data Min. Knowl. Discov</source>. <volume>5</volume>, <fpage>213</fpage>&#x02013;<lpage>246</lpage>. <pub-id pub-id-type="doi">10.1023/A:1011429418057</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Behnel</surname> <given-names>S.</given-names></name> <name><surname>Bradshaw</surname> <given-names>R.</given-names></name> <name><surname>Citro</surname> <given-names>C.</given-names></name> <name><surname>Dalcin</surname> <given-names>L.</given-names></name> <name><surname>Seljebotn</surname> <given-names>D. S.</given-names></name> <name><surname>Smith</surname> <given-names>K.</given-names></name></person-group> (<year>2011</year>). <article-title>Cython: the best of both worlds</article-title>. <source>Comput. Sci. Eng</source>. <volume>13</volume>, <fpage>31</fpage>&#x02013;<lpage>39</lpage>. <pub-id pub-id-type="doi">10.1109/MCSE.2010.118</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bergman</surname> <given-names>D.</given-names></name> <name><surname>Cir&#x000E9;</surname> <given-names>A. A.</given-names></name> <name><surname>van Hoeve</surname> <given-names>W.</given-names></name> <name><surname>Hooker</surname> <given-names>J. N.</given-names></name></person-group> (<year>2016</year>). <article-title>Decision Diagrams for Optimization</article-title>. <source>Artificial Intelligence: Foundations, Theory, and Algorithms</source>. <publisher-name>Springer</publisher-name>. <pub-id pub-id-type="doi">10.1007/978-3-319-42849-9</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Bermingham</surname> <given-names>L.</given-names></name></person-group> (<year>2018</year>). <source>Sequential Pattern Mining Algorithm With DC-Span, CC-Span</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://github.com/lukehb/137-SPM">https://github.com/lukehb/137-SPM</ext-link> (accessed September 16, 2021).</citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bonchi</surname> <given-names>F.</given-names></name> <name><surname>Lucchese</surname> <given-names>C.</given-names></name></person-group> (<year>2005</year>). <article-title>Pushing tougher constraints in frequent pattern mining,</article-title> in <source>Proceedings of the 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining</source>, <fpage>114</fpage>&#x02013;<lpage>124</lpage>. <pub-id pub-id-type="doi">10.1007/11430919_15</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Borah</surname> <given-names>A.</given-names></name> <name><surname>Nath</surname> <given-names>B.</given-names></name></person-group> (<year>2018</year>). <article-title>FP-tree and its variants: towards solving the pattern mining challenges,</article-title> in <source>Proceedings of First International Conference on Smart System, Innovations and Computing</source>, eds <person-group person-group-type="editor"><name><surname>Somani</surname> <given-names>A. K.</given-names></name> <name><surname>Srivastava</surname> <given-names>S.</given-names></name> <name><surname>Mundra</surname> <given-names>A.</given-names></name> <name><surname>Rawat</surname> <given-names>S. R.</given-names></name></person-group> <fpage>535</fpage>&#x02013;<lpage>543</lpage>. <pub-id pub-id-type="doi">10.1007/978-981-10-5828-8_51</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bou Rjeily</surname> <given-names>C.</given-names></name> <name><surname>Badr</surname> <given-names>G.</given-names></name> <name><surname>Hajjarm El Hassani</surname> <given-names>A.</given-names></name> <name><surname>Andres</surname> <given-names>E.</given-names></name></person-group> (<year>2019</year>). <source>Medical Data Mining for Heart Diseases and the Future of Sequential Mining in Medical Field</source>. <publisher-name>Springer</publisher-name>, <fpage>71</fpage>&#x02013;<lpage>99</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-94030-4_4</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>E.</given-names></name> <name><surname>Cao</surname> <given-names>H.</given-names></name> <name><surname>Li</surname> <given-names>Q.</given-names></name> <name><surname>Qian</surname> <given-names>T.</given-names></name></person-group> (<year>2008</year>). <article-title>Efficient strategies for tough aggregate constraint-based sequential pattern mining</article-title>. <source>Information Sci</source>. <volume>178</volume>, <fpage>1498</fpage>&#x02013;<lpage>1518</lpage>. <pub-id pub-id-type="doi">10.1016/j.ins.2007.10.014</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cox</surname> <given-names>D. R.</given-names></name></person-group> (<year>1958</year>). <article-title>The regression analysis of binary sequences</article-title>. <source>J. R. Stat. Soci. Ser. B</source> <volume>20</volume>, <fpage>215</fpage>&#x02013;<lpage>232</lpage>. <pub-id pub-id-type="doi">10.1111/j.2517-6161.1958.tb00292.x</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Dagenais</surname> <given-names>B.</given-names></name></person-group> (<year>2016</year>). <source>Simple Algorithms for Frequent Item Set Mining</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://github.com/bartdag/pymining">https://github.com/bartdag/pymining</ext-link> (accessed September 16, 2021).<pub-id pub-id-type="pmid">30486809</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fournier-Viger</surname> <given-names>P.</given-names></name> <name><surname>Lin</surname> <given-names>C.</given-names></name> <name><surname>Gomariz</surname> <given-names>A.</given-names></name> <name><surname>Gueniche</surname> <given-names>T.</given-names></name> <name><surname>Soltani</surname> <given-names>A.</given-names></name> <name><surname>Deng</surname> <given-names>Z.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>The SPMF open-source data mining library version 2,</article-title> in <source>Proceedings of the 19th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2016) Part III</source>, <fpage>36</fpage>&#x02013;<lpage>40</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-46131-1_8</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Fournier-Viger</surname> <given-names>P.</given-names></name> <name><surname>Lin</surname> <given-names>J. C.-W.</given-names></name> <name><surname>Kiran</surname> <given-names>R.-U.</given-names></name> <name><surname>Koh</surname> <given-names>Y.-S.</given-names></name> <name><surname>Thomas</surname> <given-names>R.</given-names></name></person-group> (<year>2017</year>). <article-title>A survey of sequential pattern mining</article-title>. <source>Data Sci. Pattern Recogn</source>. <volume>1</volume>, <fpage>54</fpage>&#x02013;<lpage>77</lpage>. Retrieved from: <ext-link ext-link-type="uri" xlink:href="http://dspr.ikelab.net/category/vol1-2017/vol1-1-2017/">http://dspr.ikelab.net/category/vol1-2017/vol1-1-2017/</ext-link></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gan</surname> <given-names>W.</given-names></name> <name><surname>Lin</surname> <given-names>J. C.-W.</given-names></name> <name><surname>Fournier-Viger</surname> <given-names>P.</given-names></name> <name><surname>Chao</surname> <given-names>H.-C.</given-names></name> <name><surname>Yu</surname> <given-names>P. S.</given-names></name></person-group> (<year>2019</year>). <article-title>A survey of parallel sequential pattern mining</article-title>. <source>ACM Trans. Knowl. Discov. Data</source> <volume>13</volume>, <fpage>1</fpage>&#x02013;<lpage>34</lpage>. <pub-id pub-id-type="doi">10.1145/3314107</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Gao</surname> <given-names>C.</given-names></name></person-group> (<year>2019</year>). <source>Sequential Pattern Mining Algorithm With Prefixspan, Bide, and Feat</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://github.com/chuanconggao/PrefixSpan-py">https://github.com/chuanconggao/PrefixSpan-py</ext-link> (accessed September 16, 2021).</citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Garc&#x000ED;a-Vico</surname> <given-names>A.</given-names></name> <name><surname>Carmona</surname> <given-names>C.</given-names></name> <name><surname>Mart&#x000ED;n</surname> <given-names>D.</given-names></name> <name><surname>Garc&#x000ED;a-Borroto</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>An overview of emerging pattern mining in supervised descriptive rule discovery: taxonomy, empirical study, trends, and prospects</article-title>. <source>Wiley Interdiscip. Rev</source>. <volume>8</volume>:<fpage>e1231</fpage>. <pub-id pub-id-type="doi">10.1002/widm.1231</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guns</surname> <given-names>T.</given-names></name> <name><surname>Dries</surname> <given-names>A.</given-names></name> <name><surname>Nijssen</surname> <given-names>S.</given-names></name> <name><surname>Tack</surname> <given-names>G.</given-names></name> <name><surname>De Raedt</surname> <given-names>L.</given-names></name></person-group> (<year>2017</year>). <article-title>Miningzinc: a declarative framework for constraint-based mining</article-title>. <source>Artif. Intell</source>. <volume>244</volume>, <fpage>6</fpage>&#x02013;<lpage>29</lpage>. <pub-id pub-id-type="doi">10.1016/j.artint.2015.09.007</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hochreiter</surname> <given-names>S.</given-names></name> <name><surname>Schmidhuber</surname> <given-names>J.</given-names></name></person-group> (<year>1997</year>). <article-title>Long short-term memory</article-title>. <source>Neural Comput</source>. <volume>9</volume>, <fpage>1735</fpage>&#x02013;<lpage>1780</lpage>. <pub-id pub-id-type="doi">10.1162/neco.1997.9.8.1735</pub-id><pub-id pub-id-type="pmid">9377276</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hosseininasab</surname> <given-names>A.</given-names></name> <name><surname>van Hoeve</surname> <given-names>W.</given-names></name> <name><surname>Cir&#x000E9;</surname> <given-names>A. A.</given-names></name></person-group> (<year>2019</year>). <article-title>Constraint-based sequential pattern mining with decision diagrams,</article-title> in <source>The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019</source>, <fpage>1495</fpage>&#x02013;<lpage>1502</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v33i01.33011495</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kahn</surname> <given-names>G.</given-names></name> <name><surname>Loiseau</surname> <given-names>Y.</given-names></name> <name><surname>Raynaud</surname> <given-names>O.</given-names></name></person-group> (<year>2016</year>). <article-title>A tool for classification of sequential data,</article-title> in <source>FCA4AI&#x00040;ECAI</source>.</citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ke</surname> <given-names>G.</given-names></name> <name><surname>Meng</surname> <given-names>Q.</given-names></name> <name><surname>Finley</surname> <given-names>T.</given-names></name> <name><surname>Wang</surname> <given-names>T.</given-names></name> <name><surname>Chen</surname> <given-names>W.</given-names></name> <name><surname>Ma</surname> <given-names>W.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>LightGBM: a highly efficient gradient boosting decision tree,</article-title> in <source>Proceedings of the 31st International Conference on Neural Information Processing Systems</source>, <fpage>3149</fpage>&#x02013;<lpage>3157</lpage>.</citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kemmar</surname> <given-names>A.</given-names></name> <name><surname>Lebbah</surname> <given-names>Y.</given-names></name> <name><surname>Loudni</surname> <given-names>S.</given-names></name> <name><surname>Boizumault</surname> <given-names>P.</given-names></name> <name><surname>Charnois</surname> <given-names>T.</given-names></name></person-group> (<year>2017</year>). <article-title>Prefix-projection global constraint and top-k approach for sequential pattern mining</article-title>. <source>Constraints</source> <volume>22</volume>, <fpage>265</fpage>&#x02013;<lpage>306</lpage>. <pub-id pub-id-type="doi">10.1007/s10601-016-9252-z</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kuruba Manjunath</surname> <given-names>Y. S.</given-names></name> <name><surname>Kashef</surname> <given-names>R. F.</given-names></name></person-group> (<year>2021</year>). <article-title>Distributed clustering using multi-tier hierarchical overlay super-peer peer-to-peer network architecture for efficient customer segmentation</article-title>. <source>Electron. Commerce Res. Appl</source>. <volume>47</volume>:<fpage>101040</fpage>. <pub-id pub-id-type="doi">10.1016/j.elerap.2021.101040</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>J.</given-names></name> <name><surname>Keogh</surname> <given-names>E.</given-names></name> <name><surname>Wei</surname> <given-names>L.</given-names></name> <name><surname>Lonardi</surname> <given-names>S.</given-names></name></person-group> (<year>2007</year>). <article-title>Experiencing sax: a novel symbolic representation of time series</article-title>. <source>Data Mining Knowl. Discov</source>. <volume>15</volume>, <fpage>107</fpage>&#x02013;<lpage>144</lpage>. <pub-id pub-id-type="doi">10.1007/s10618-007-0064-z</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lundberg</surname> <given-names>S. M.</given-names></name> <name><surname>Erion</surname> <given-names>G.</given-names></name> <name><surname>Chen</surname> <given-names>H.</given-names></name> <name><surname>DeGrave</surname> <given-names>A.</given-names></name> <name><surname>Prutkin</surname> <given-names>J. M.</given-names></name> <name><surname>Nair</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>From local explanations to global understanding with explainable AI for trees</article-title>. <source>Nat. Mach. Intell</source>. <volume>2</volume>, <fpage>56</fpage>&#x02013;<lpage>67</lpage>. <pub-id pub-id-type="doi">10.1038/s42256-019-0138-9</pub-id><pub-id pub-id-type="pmid">32607472</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Nijssen</surname> <given-names>S.</given-names></name> <name><surname>Zimmermann</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <source>Constraint-Based Pattern Mining</source>. <publisher-name>Springer</publisher-name>, <fpage>147</fpage>&#x02013;<lpage>163</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-07821-2_7</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Novak</surname> <given-names>P. K.</given-names></name> <name><surname>Lavra&#x0010D;</surname> <given-names>N.</given-names></name> <name><surname>Webb</surname> <given-names>G. I.</given-names></name></person-group> (<year>2009</year>). <article-title>Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining</article-title>. <source>J. Mach. Learn. Res</source>. <volume>10</volume>, <fpage>377</fpage>&#x02013;<lpage>403</lpage>. <pub-id pub-id-type="doi">10.5555/1577069.1577083</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pei</surname> <given-names>J.</given-names></name> <name><surname>Han</surname> <given-names>J.</given-names></name> <name><surname>Mortazavi-Asl</surname> <given-names>B.</given-names></name> <name><surname>Pinto</surname> <given-names>H.</given-names></name> <name><surname>Chen</surname> <given-names>Q.</given-names></name> <name><surname>Dayal</surname> <given-names>U.</given-names></name> <etal/></person-group>. (<year>2001</year>). <article-title>PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,</article-title> in <source>Proceedings 17th International Conference on Data Engineering</source>, <fpage>215</fpage>&#x02013;<lpage>224</lpage>.</citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pei</surname> <given-names>J.</given-names></name> <name><surname>Han</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>W.</given-names></name></person-group> (<year>2007</year>). <article-title>Constraint-based sequential pattern mining: the pattern-growth methods</article-title>. <source>J. Intell. Inform. Syst</source>. <volume>28</volume>, <fpage>133</fpage>&#x02013;<lpage>160</lpage>. <pub-id pub-id-type="doi">10.1007/s10844-006-0006-z</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pellegrina</surname> <given-names>L.</given-names></name> <name><surname>Riondato</surname> <given-names>M.</given-names></name> <name><surname>Vandin</surname> <given-names>F.</given-names></name></person-group> (<year>2019</year>). <article-title>Hypothesis testing and statistically-sound pattern mining,</article-title> in <source>Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &#x00026; Data Mining, KDD &#x00027;19</source>, <fpage>3215</fpage>&#x02013;<lpage>3216</lpage>. <pub-id pub-id-type="doi">10.1145/3292500.3332286</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Requena</surname> <given-names>B.</given-names></name> <name><surname>Cassani</surname> <given-names>G.</given-names></name> <name><surname>Tagliabue</surname> <given-names>J.</given-names></name> <name><surname>Greco</surname> <given-names>C.</given-names></name> <name><surname>Lacasa</surname> <given-names>L.</given-names></name></person-group> (<year>2020</year>). <article-title>Shopper intent prediction from clickstream e-commerce data with minimal browsing information</article-title>. <source>Sci. Rep</source>. <volume>10</volume>:<fpage>16983</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-020-73622-y</pub-id><pub-id pub-id-type="pmid">33046722</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Srikant</surname> <given-names>R.</given-names></name> <name><surname>Agrawal</surname> <given-names>R.</given-names></name></person-group> (<year>1996</year>). <article-title>Mining sequential patterns: generalizations and performance improvements,</article-title> in <source>Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology</source>, <fpage>3</fpage>&#x02013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1007/BFb0014140</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Hosseininasab</surname> <given-names>A.</given-names></name> <name><surname>Colorado</surname> <given-names>P.</given-names></name> <name><surname>Kadioglu</surname> <given-names>S.</given-names></name> <name><surname>van Hoeve</surname> <given-names>W.-J.</given-names></name></person-group> (<year>2022</year>). <article-title>Seq2pat: sequence-to-pattern generation for constraint-based sequential pattern mining,</article-title> in <source>AAAI-IAAI</source>.</citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Kadioglu</surname> <given-names>S.</given-names></name></person-group> (<year>2022</year>). <article-title>Dichotomic pattern mining with applications to intent prediction from semi-structured clickstream datasets,</article-title> in <source>The AAAI-22 Workshop on Knowledge Discovery from Unstructured Data in Financial Services</source>.</citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wegener</surname> <given-names>I.</given-names></name></person-group> (<year>2000</year>). <article-title>Branching programs and binary decision diagrams: theory and applications</article-title>. <source>Soc. Indus. Appl. Math</source>. <volume>4</volume>, <fpage>379</fpage>&#x02013;<lpage>408</lpage>. <pub-id pub-id-type="doi">10.1137/1.9780898719789</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hirate</surname> <given-names>Y.</given-names></name> <name><surname>Yamana</surname> <given-names>H.</given-names></name></person-group> (<year>2006</year>). <article-title>Generalized sequential pattern mining with item intervals</article-title>. <source>J. Comput</source>. <volume>1</volume>, <fpage>51</fpage>&#x02013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.4304/jcp.1.3.51-60</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zaki</surname> <given-names>M. J.</given-names></name></person-group> (<year>2001</year>). <article-title>Spade: an efficient algorithm for mining frequent sequences</article-title>. <source>Mach. Learn</source>. <volume>42</volume>, <fpage>31</fpage>&#x02013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1023/A:1007652502315</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup><ext-link ext-link-type="uri" xlink:href="https://github.com/fidelity/seq2pat">https://github.com/fidelity/seq2pat</ext-link></p></fn>
<fn id="fn0002"><p><sup>2</sup><ext-link ext-link-type="uri" xlink:href="https://github.com/fidelity/seq2pat/tree/master/notebooks/dichotomic_pattern_mining.ipynb">https://github.com/fidelity/seq2pat/tree/master/notebooks/dichotomic_pattern_mining.ipynb</ext-link></p></fn>
</fn-group>

</back>
</article> 