<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Artif. Intell.</journal-id>
<journal-title>Frontiers in Artificial Intelligence</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Artif. Intell.</abbrev-journal-title>
<issn pub-type="epub">2624-8212</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/frai.2020.00047</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Artificial Intelligence</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Process Mining of Football Event Data: A Novel Approach for Tactical Insights Into the Game</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Kr&#x000F6;ckel</surname> <given-names>Pavlina</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/867006/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Bodendorf</surname> <given-names>Freimut</given-names></name>
</contrib>
</contrib-group>
<aff><institution>Institute of Information Systems, University of Erlangen-Nuremberg</institution>, <addr-line>Nuremberg</addr-line>, <country>Germany</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Daniel Adomako Asamoah, Wright State University, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Mehmet Ergun, Istanbul Sehir University, Turkey; Hongyou Liu, South China Normal University, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Pavlina Kr&#x000F6;ckel <email>pavlina.kroeckel&#x00040;fau.de</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to AI in Business, a section of the journal Frontiers in Artificial Intelligence</p></fn></author-notes>
<pub-date pub-type="epub">
<day>14</day>
<month>07</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="collection">
<year>2020</year>
</pub-date>
<volume>3</volume>
<elocation-id>47</elocation-id>
<history>
<date date-type="received">
<day>10</day>
<month>12</month>
<year>2019</year>
</date>
<date date-type="accepted">
<day>02</day>
<month>06</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2020 Kr&#x000F6;ckel and Bodendorf.</copyright-statement>
<copyright-year>2020</copyright-year>
<copyright-holder>Kr&#x000F6;ckel and Bodendorf</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>The paper explores process mining and its usefulness for analyzing football event data. We work with professional event data provided by OPTA Sports from the European Championship in 2016. We analyze one game of a favorite team (England) against an underdog team (Iceland). The success of the underdog teams in the Euro 2016 was remarkable, and it is what made the event special. For this reason, it is interesting to compare the performance of a favorite and an underdog team by applying process mining. The goal is to show the options that these types of algorithms and visual analytics offer for the interpretation of event data in football and to discuss how the gained insights can support decision makers not only in pre- and post-match analysis but also during live games. We show process mining techniques which can be used to gain team or individual player insights by considering the types of actions, the sequence of actions, and the order of player involvement in each sequence. Finally, we also demonstrate the detection of typical or unusual behavior by trace and sequence clustering.</p></abstract>
<kwd-group>
<kwd>football</kwd>
<kwd>soccer</kwd>
<kwd>process mining</kwd>
<kwd>sports analytics</kwd>
<kwd>tactics</kwd>
</kwd-group>
<contract-sponsor id="cn001">Friedrich-Alexander-Universit&#x000E4;t Erlangen-N&#x000FC;rnberg<named-content content-type="fundref-id">10.13039/501100001652</named-content></contract-sponsor>
<counts>
<fig-count count="11"/>
<table-count count="5"/>
<equation-count count="0"/>
<ref-count count="27"/>
<page-count count="16"/>
<word-count count="9103"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Analyzing the tactical behavior in team sports is of paramount importance in sports performance analysis. The individual actions performed are of interest when analyzing the team&#x00027;s tactics. For quite some time, action frequencies of teams and players have been the only way to gain insight into this performance aspect. However, this is not enough to get a complete picture of the performance, and especially the tactical behavior. Therefore, action/event sequences have been suggested for deeper insight into the game. One reason mentioned by Carling et al. (<xref ref-type="bibr" rid="B3">2008</xref>) is that &#x0201C;on-the-ball&#x0201D; activity, physical contact, and the sequence in which these actions occur contribute to physiological energy expenditure. This means that in addition to tactics, this kind of action analysis could give insight into player fatigue. Action sequences are chains of sequential single actions during a game (Schrapf et al., <xref ref-type="bibr" rid="B18">2017</xref>). As the OPTA data are based entirely on event or action data, with timestamps and positional coordinates available, they are especially suitable for this type of analysis.</p>
<p>The paper presents a novel technique for sequence analysis of football event data and discusses its advantages and disadvantages for decision making in football. To the best of our knowledge, process mining has not been used for sports performance analysis until now.</p>
</sec>
<sec id="s2">
<title>Process Mining</title>
<p>Process mining aims at discovering, monitoring, and improving real processes by extracting knowledge from event logs (van der Aalst, <xref ref-type="bibr" rid="B21">2011</xref>). As a discipline, process mining sits between, on the one hand, machine learning and data mining, and on the other hand, process modeling and analysis (van der Aalst, <xref ref-type="bibr" rid="B21">2011</xref>). Some of the answers which process mining can deliver are: (a) what really happened, (b) why did it happen, (c) what is likely to happen in the future, and (d) when and why do organizations and people deviate (van der Aalst, <xref ref-type="bibr" rid="B21">2011</xref>).</p>
<p>There are different algorithms used in process mining, depending on the data available and the questions that need to be answered. Irrespective of this, process mining requires structured data, and specifically, event logs of business (or other) processes. The goal is to analyze event data from a process oriented perspective (van der Aalst, <xref ref-type="bibr" rid="B21">2011</xref>). For a process mining algorithm to work, a few attributes must be available. These are &#x0201C;case ID,&#x0201D; &#x0201C;activity,&#x0201D; and &#x0201C;timestamp.&#x0201D; Other attributes in the dataset give additional information on the process and can be used as well in specific types of analysis in process mining, but they are not critical to the analysis.</p>
<p>There are three types of process mining: discovery, conformance checking, and enhancement (van der Aalst et al., <xref ref-type="bibr" rid="B23">2012</xref>). The most frequently used type of process mining is discovery (van der Aalst et al., <xref ref-type="bibr" rid="B23">2012</xref>). This technique converts an event log into a process model, without any a priori information (van der Aalst, <xref ref-type="bibr" rid="B21">2011</xref>). The discovered model can be in the form of a Petri net, BPMN, EPC, or UML activity diagram, but it can also be a social network model, depending on the perspective needed (van der Aalst et al., <xref ref-type="bibr" rid="B23">2012</xref>). Conformance checking uses an event log and a model as inputs. It is used for finding discrepancies between the reality (event log) and the process model (van der Aalst, <xref ref-type="bibr" rid="B21">2011</xref>; van der Aalst et al., <xref ref-type="bibr" rid="B23">2012</xref>). The third type, enhancement, also uses an event log and a model as inputs, but the information from the event log is used to improve the existing process model (van der Aalst et al., <xref ref-type="bibr" rid="B23">2012</xref>). Finally, process mining may refer to different perspectives of the analyzed processes. These are explained below.</p>
<p><bold><italic>Control-flow perspective</italic></bold>&#x02014;ordering of activities. Here, the goal is to find a good characterization of all possible paths by deriving a process model that provides the best summary of the flow followed by most or all of the cases in the event log (ProM, <xref ref-type="bibr" rid="B13">2017a</xref>). It can answer questions such as:</p>
<list list-type="bullet">
<list-item><p>Which tasks precede which other ones?</p></list-item>
<list-item><p>Are there concurrent tasks?</p></list-item>
<list-item><p>Are there loops?</p></list-item>
</list>
<p>There are several options for analyzing the control-flow. Some of the algorithms that can be used are the Alpha algorithm, the Heuristic Miner, the Fuzzy Miner, and the Inductive Visual Miner (IVM) (van der Aalst, <xref ref-type="bibr" rid="B22">2016</xref>). A short comparison of these algorithms is presented in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Comparison of mining algorithms.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Algorithm</bold></th>
<th valign="top" align="left"><bold>Input</bold></th>
<th valign="top" align="left"><bold>Output</bold></th>
<th valign="top" align="left"><bold>When to use</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Alpha miner</td>
<td valign="top" align="left">Event log</td>
<td valign="top" align="left">Petri Net</td>
<td valign="top" align="left">Not recommended for real-life data.</td>
</tr>
<tr>
<td valign="top" align="left">Heuristic miner</td>
<td valign="top" align="left">Event log</td>
<td valign="top" align="left">Heuristic net</td>
<td valign="top" align="left">For real-life data with not too many different events.</td>
</tr>
<tr>
<td valign="top" align="left">Fuzzy miner</td>
<td valign="top" align="left">Event log</td>
<td valign="top" align="left">Fuzzy model</td>
<td valign="top" align="left">For complex and unstructured log data or for simplification of the model.</td>
</tr>
<tr>
<td valign="top" align="left">Inductive visual miner</td>
<td valign="top" align="left">Event log</td>
<td valign="top" align="left">Petri net or process tree</td>
<td valign="top" align="left">For discovering process delays, deviations, and animation of the model.</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Self-compiled based on Rozinat (<xref ref-type="bibr" rid="B16">2010</xref>), Leemans et al. (<xref ref-type="bibr" rid="B11">2014</xref>)</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>Which algorithm to use in a specific case is not a straightforward question. <xref ref-type="table" rid="T1">Table 1</xref> provides a starting guideline, but there are other options, and it is best to test several algorithms and inspect the results. The <italic>Alpha algorithm</italic>, which was the first process mining algorithm developed, is not recommended for the analysis of real-world event log data (Rozinat, <xref ref-type="bibr" rid="B16">2010</xref>). The <italic>Heuristic Miner</italic> was developed after the Alpha Miner to address its deficiencies. It can simplify the process model by abstracting from exceptional behavior and noise, i.e., by leaving out edges (connections between certain events) (Rozinat, <xref ref-type="bibr" rid="B16">2010</xref>). This algorithm is able to detect short loops and skipping of activities. However, it still produces rather complex process models (Buijs, <xref ref-type="bibr" rid="B2">2017</xref>). The <italic>Fuzzy Miner</italic> interactively simplifies the process model by hiding some activities and paths, if desired (Rozinat, <xref ref-type="bibr" rid="B16">2010</xref>). We use the <italic>Inductive Miner</italic> in our analysis because it was developed to overcome the disadvantages of other algorithms, and it shows a sound process model in the most user-friendly manner (Leemans et al., <xref ref-type="bibr" rid="B11">2014</xref>).</p>
<p><bold><italic>Organizational perspective</italic></bold>&#x02014;focusing on information about the resources, which can be people, departments, roles, etc., and how they relate to each other. This relationship can also be represented as a social network based on the activities of the resources and can be used to find interaction patterns or evaluate the role of individuals (RapidProM, <xref ref-type="bibr" rid="B15">2017</xref>). <italic>Social network mining</italic> is the most useful technique in the case of the organizational perspective, since network science is an area that studies interactions and relations between individuals. To derive sociograms from event logs, there are a few categories of metrics that have been developed (see <xref ref-type="table" rid="T2">Table 2</xref>).</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Types of social network metrics used for analyzing relationships from event logs.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Metric category</bold></th>
<th valign="top" align="left"><bold>Definition</bold></th>
<th valign="top" align="left"><bold>Examples of metrics</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Metrics based on (possible) causality</td>
<td valign="top" align="left">Analyze how work moves among performers.</td>
<td valign="top" align="left">Handover of Work (HoW)</td>
</tr>
<tr>
<td/>
<td/>
<td valign="top" align="left">Subcontracting</td>
</tr>
<tr>
<td valign="top" align="left">Metrics based on joint cases</td>
<td valign="top" align="left">Count how frequently two individuals are performing activities for the same case.</td>
<td valign="top" align="left">Working together</td>
</tr>
<tr>
<td valign="top" align="left">Metrics based on joint activities</td>
<td valign="top" align="left">Focus on the activities performed by individuals; people are more similar if they perform the same activities.</td>
<td valign="top" align="left">Similar task metric</td>
</tr>
<tr>
<td valign="top" align="left">Metrics based on special event types</td>
<td valign="top" align="left">Consider the type of event.</td>
<td valign="top" align="left">Reassignment</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Self-compiled based on van der Aalst et al. (<xref ref-type="bibr" rid="B24">2005</xref>)</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>In this paper, we use two metrics for the analysis: the Handover of Work and the Working Together metrics, because the information is displayed in a similar way. Thus, we believe that by showing the results from these two metrics, the reader will understand what type of information can be extracted and how it is visualized.</p>
<p>The metrics based on (possible) causality consider how work moves among performers (van der Aalst et al., <xref ref-type="bibr" rid="B24">2005</xref>). In a football game, they capture the flow of events between the players. For instance, there will be a <italic>Handover of Work</italic> between two players if there are two subsequent activities/events where the first is completed by player A and the second by player B. In addition to a direct succession, it is also possible to analyze &#x0201C;indirect succession using a &#x0201C;causality fall factor&#x0201D; &#x003B2;, i.e., if there are 3 activities in-between an activity completed by <italic>i</italic> and an activity completed by <italic>j</italic>, the causality fall factor is &#x003B2;<sup>3</sup>&#x0201D; (van der Aalst et al., <xref ref-type="bibr" rid="B24">2005</xref>, p. 9). The subcontracting metric counts the number of times player B executed an activity in between two activities done by player A, for instance, Player A -&#x0003E; Player B -&#x0003E; Player A. This could indicate that work was subcontracted from Player A to Player B.</p>
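<p>As an illustration only (this is not the ProM implementation used in this paper), the two causality-based metrics can be sketched in a few lines of Python. The function names, the toy performer labels A, B, and C, and the parameter <monospace>max_gap</monospace> are assumptions made for this example. The counts are left unnormalized, and an indirect handover with <italic>n</italic> events in between is weighted by &#x003B2;<sup><italic>n</italic></sup>, following the quotation above.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def handover_of_work(cases, beta=0.5, max_gap=0):
    """Unnormalized Handover of Work weights between performers.

    cases   -- list of cases; each case lists the performer of every event, in order
    max_gap -- how many in-between events are allowed (0 = direct succession only)
    A handover with n events in between contributes beta ** n.
    Pairs of the same performer (A -> A) are counted as well; filter them if unwanted.
    """
    weights = defaultdict(float)
    for performers in cases:
        for i, giver in enumerate(performers):
            for gap in range(max_gap + 1):
                j = i + gap + 1
                if j >= len(performers):
                    break
                weights[(giver, performers[j])] += beta ** gap
    return dict(weights)


def subcontracting(cases):
    """Count A -> B -> A patterns, keyed by (A, B)."""
    counts = defaultdict(int)
    for performers in cases:
        for a, b, c in zip(performers, performers[1:], performers[2:]):
            if a == c and a != b:
                counts[(a, b)] += 1
    return dict(counts)


# Toy possession sequences with hypothetical performers.
cases = [["A", "B", "A", "C"], ["B", "C"]]
direct = handover_of_work(cases)
indirect = handover_of_work(cases, beta=0.5, max_gap=1)
sub = subcontracting(cases)
```
]]></preformat>
<p>In this toy log, the direct handover from B to C is observed once, and allowing one in-between event adds a second, down-weighted contribution from the first sequence.</p>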
<p>Metrics based on joint cases ignore the causality and simply count how often individuals are performing activities within the same case, i.e., sequence of activities (van der Aalst and Song, <xref ref-type="bibr" rid="B25">2004</xref>). Thus, the metric <italic>Working Together</italic> shows which players most often participate or &#x0201C;work together&#x0201D; in the same ball possession sequence. If two individuals often work together on cases, they are considered to have a stronger relation than individuals rarely working together (van der Aalst and Song, <xref ref-type="bibr" rid="B25">2004</xref>; van der Aalst et al., <xref ref-type="bibr" rid="B24">2005</xref>).</p>
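<p>A minimal sketch of a joint-case metric is given below, again as an illustration rather than the ProM plugin itself. The function names and the toy performers are hypothetical. The relative variant divides the number of joint cases by the number of cases in which the first performer appears, which is one common normalization and is assumed here.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from itertools import combinations


def working_together(cases):
    """Number of cases in which each unordered pair of performers both appear."""
    counts = Counter()
    for performers in cases:
        for pair in combinations(sorted(set(performers)), 2):
            counts[pair] += 1
    return counts


def relative_working_together(cases):
    """Joint cases of (p, q) divided by the number of cases in which p appears."""
    joint = working_together(cases)
    appearances = Counter(p for performers in cases for p in set(performers))
    relative = {}
    for (p, q), n in joint.items():
        relative[(p, q)] = n / appearances[p]
        relative[(q, p)] = n / appearances[q]
    return relative


# Toy possession sequences with hypothetical performers.
cases = [["A", "B", "A", "C"], ["B", "C"], ["A", "B"]]
joint = working_together(cases)
relative = relative_working_together(cases)
```
]]></preformat>
<p>Because causality is ignored, the order of the performers inside a sequence has no influence on the result; only co-occurrence within the same case matters.</p>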
<p><bold><italic>Case perspective</italic></bold>&#x02014;focusing on the properties of the cases. It can answer questions, such as (ProM, <xref ref-type="bibr" rid="B13">2017a</xref>):</p>
<list list-type="bullet">
<list-item><p>What are the most frequent paths in the process?</p></list-item>
<list-item><p>Are there any loop patterns in the process?</p></list-item>
<list-item><p>What is the distribution of all cases over the different paths through the process?</p></list-item>
<list-item><p>Can you select a subset of traces where specific paths were executed?</p></list-item>
<list-item><p>Can you simplify the log by abstracting the most frequent paths?</p></list-item>
</list>
<p>Some options to answer the above questions with process mining are the <italic>Trace Variants, Dotted Chart</italic> visualizations, and the <italic>Trace and Sequence Clustering</italic>.</p>
<p>The basic idea of <italic>Trace Clustering</italic> is to split the event log into homogeneous subsets and to create a process model for each subset (Song et al., <xref ref-type="bibr" rid="B20">2009</xref>). In essence, the technique identifies and clusters similar sequences. The similarity is calculated based on a distance metric, usually the Euclidean or Hamming distance, while the clustering can be performed by using different algorithms, such as k-Means or SOM (Veiga, <xref ref-type="bibr" rid="B26">2009</xref>). A list of the algorithms available for clustering in ProM is presented in <xref ref-type="supplementary-material" rid="SM1">Supplementary Table 1</xref>.</p>
<p>Trace Clustering works by creating a set of profiles, each measuring a number of features for each case from a specific perspective (Song et al., <xref ref-type="bibr" rid="B20">2009</xref>). In a second step, the distance between cases is measured by a distance metric; in this case, the Euclidean distance is used as it is found to be the most reliable. Finally, in a third step, similar cases are grouped by using a clustering algorithm. Clusters can be analyzed independently from one another, which improves the quality of the results for flexible environments (Song et al., <xref ref-type="bibr" rid="B20">2009</xref>). Considering that football consists of 11 players who do not act according to a specific pre-defined process but rather based on quite a few distinct factors from their surrounding environment, one could reasonably assume that football can be considered a flexible environment within the process mining analytics area. Therefore, it would be interesting to see if and how trace/sequence clustering could be helpful for football performance analysis.</p>
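<p>The first two steps can be sketched as follows. This is a simplified illustration under our own assumptions: the profile used here is a plain activity-count vector, the event names and traces are toy values, and the third step (for example k-Means or SOM) is deliberately left out.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def activity_profile(trace, activities):
    """Step 1: describe a case by how often each activity occurs in it."""
    counts = Counter(trace)
    return [counts[a] for a in activities]


def euclidean(u, v):
    """Step 2: Euclidean distance between two profile vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Toy traces; the event names are illustrative only.
traces = [
    ["pass", "pass", "miss"],
    ["pass", "pass", "pass", "miss"],
    ["tackle", "clearance"],
]
activities = sorted({a for trace in traces for a in trace})
profiles = [activity_profile(trace, activities) for trace in traces]

# Pairwise distances; a clustering algorithm would group traces with small distances.
d_similar = euclidean(profiles[0], profiles[1])
d_different = euclidean(profiles[0], profiles[2])
```
]]></preformat>
<p>The two passing sequences lie close together in this feature space, whereas the defensive sequence is farther away, which is the property a clustering algorithm exploits in the third step.</p>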
<p><italic>Sequence Clustering</italic> is based on a similar idea as Trace Clustering. However, this type of clustering is performed directly on the input data, i.e., no features are extracted from the sequences (Veiga, <xref ref-type="bibr" rid="B26">2009</xref>). The plugin in ProM 5.7 has been implemented by Veiga (<xref ref-type="bibr" rid="B26">2009</xref>), whose algorithm is based on first-order Markov chains, in which the current state depends only on the previous state (Ferreira et al., <xref ref-type="bibr" rid="B5">2007</xref>). The probability that an observed sequence is assigned to a given cluster is the probability that the observed sequence was produced by the Markov chain associated with that cluster. Put simply, the assignment of sequences to clusters is based on the probability of each cluster producing the given sequence (Ferreira et al., <xref ref-type="bibr" rid="B5">2007</xref>; Veiga, <xref ref-type="bibr" rid="B26">2009</xref>). Thus, a given sequence will be assigned to the cluster that is able to produce it with the highest probability (Veiga, <xref ref-type="bibr" rid="B26">2009</xref>). Veiga also adds two dummy states to the Markov chain: an input and an output state. This is necessary in order to represent the probability of a given event being the first or the last event in a sequence, which could be useful to distinguish between some types of sequences (Veiga, <xref ref-type="bibr" rid="B26">2009</xref>).</p>
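<p>The assignment step described above can be illustrated with a short sketch. It is not Veiga's plugin: the function names, the labels of the dummy states, and the toy sequences are assumptions, and the iterative re-estimation of the cluster chains is omitted. Each cluster is represented by a first-order Markov chain estimated from example sequences, and a new sequence is assigned to the chain that produces it with the highest probability.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
from collections import defaultdict

START, END = "__in__", "__out__"  # dummy input and output states


def fit_chain(sequences):
    """Estimate first-order transition probabilities, including the dummy states."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        path = [START, *seq, END]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return {
        a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
        for a, nxt in counts.items()
    }


def log_prob(chain, seq):
    """Log-probability that the chain produces seq (-inf if a transition is unseen)."""
    path = [START, *seq, END]
    total = 0.0
    for a, b in zip(path, path[1:]):
        p = chain.get(a, {}).get(b, 0.0)
        if p == 0.0:
            return -math.inf
        total += math.log(p)
    return total


def assign(chains, seq):
    """Index of the cluster chain most likely to produce seq (first index on ties)."""
    return max(range(len(chains)), key=lambda k: log_prob(chains[k], seq))


# Two toy clusters with illustrative event names.
attacking = fit_chain([["pass", "pass", "miss"], ["pass", "miss"]])
defending = fit_chain([["tackle", "clearance"], ["tackle", "tackle", "clearance"]])
chains = [attacking, defending]
cluster_a = assign(chains, ["pass", "pass", "pass", "miss"])
cluster_b = assign(chains, ["tackle", "clearance"])
```
]]></preformat>
<p>The dummy states make the first and the last event of a sequence part of the model, so two sequences that share the same events but start or end differently can receive different probabilities.</p>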
<p>In this paper, we use the SOM clustering algorithm and Markov chain clustering. SOM is used because it is very efficient with respect to computation time and is also quite robust concerning the results, especially in situations where the characteristics of the process underlying an event log are largely unknown (G&#x000FC;nther, <xref ref-type="bibr" rid="B8">2009</xref>). The Markov chain clustering is preferred because it also discovers clusters without the analyst having to predefine the number of clusters. A detailed explanation of the algorithms is beyond the scope of this paper. For an overview of the SOM algorithm, the reader is referred to Si et al. (<xref ref-type="bibr" rid="B19">2003</xref>), and for more details on Markov chains, to Chung (<xref ref-type="bibr" rid="B4">1967</xref>).</p>
<p><bold><italic>Time perspective</italic></bold>&#x02014;analyzing the timing and frequency of events. If timestamps are available, it is possible to detect bottlenecks, monitor the utilization of resources, or predict the remaining processing time of running cases (van der Aalst, <xref ref-type="bibr" rid="B22">2016</xref>). On its own, this perspective will most likely not be too interesting in a football scenario. However, combined with other perspectives, it can give interesting insights.</p>
<p>Each of these perspectives gives a different view of the process analyzed. The control-flow perspective relates to the &#x0201C;How&#x0201D; question, the organizational perspective to the &#x0201C;Who&#x0201D; question, while the case perspective answers the &#x0201C;What&#x0201D; question (ProM, <xref ref-type="bibr" rid="B14">2017b</xref>). For a proper business understanding, users typically have to extract several models that describe different perspectives in the process analyses (Ingvaldsen and Gulla, <xref ref-type="bibr" rid="B9">2008</xref>).</p>
<p>As seen, process mining is not a reporting tool but an analysis tool that is able to model and analyze complex processes (Rozinat and Gunther, <xref ref-type="bibr" rid="B17">2015</xref>). Although it works with historical data, it is not limited to offline analysis, as the results can be applied to running cases (van der Aalst et al., <xref ref-type="bibr" rid="B23">2012</xref>). Not all process mining types and perspectives can be applied to a football game scenario. Of the three types of process mining mentioned, discovery is certainly applicable in this case, whereas conformance checking and enhancement require an existing model to which the model discovered from the OPTA log can be compared. In football, there is no &#x0201C;perfect&#x0201D; or pre-defined model of the game process. Therefore, process mining can help with modeling the real-world process of what actually happened during the game. Finally, it is possible to view the event logs of the matches from all four perspectives discussed above.</p>
</sec>
<sec id="s3">
<title>Analytics Approach and Tools</title>
<p>Three of the process mining perspectives explained previously (control-flow, case, and organizational) are used in the analyses introduced here.</p>
<p>In a first step, the original OPTA data are pre-processed and converted into event log data. This is followed by analysis of the resulting event logs, discovery of the process models, and interpretation. The potential of the applied analytics techniques to gain a tactical understanding in football is presented in the Discussion section.</p>
<p>One exemplary game is analyzed as the goal is to demonstrate the techniques. The game between England and Iceland is chosen because Iceland won, while England&#x00027;s team showed one of its worst performances in a tournament. Therefore, it is interesting to explore what process mining can reveal about the tactical player and team behaviors. A summary of the game is presented in <xref ref-type="supplementary-material" rid="SM1">Supplementary Table 2</xref>.</p>
<p>The tools used for the analyses are:</p>
<list list-type="simple">
<list-item><p><bold>ProM</bold>: an open-source process mining software, which offers a wide range of algorithms and techniques to process and analyze event logs. There are also various plugins available to extend the analytics options further. In this paper, two versions of the software are used: ProM 5.2 and 6.7<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref>. Some useful techniques such as the SOM trace clustering are missing in the later version, and therefore both versions are used in the analysis.</p></list-item>
<list-item><p><bold>Disco</bold>: a proprietary process mining software developed by Fluxicon<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref>. It is more user-friendly than ProM, and it is easier to read in the event logs and get quick results. Although it is easier to learn and its results are easy to interpret, it offers fewer analytical options than ProM. However, some of the techniques are easier to apply, and therefore it is used in combination with ProM.</p></list-item>
</list>
</sec>
<sec sec-type="results" id="s4">
<title>Results</title>
<sec>
<title>Data Pre-processing</title>
<p>The first step of process mining is to pre-process the event log data from OPTA for the analyzed teams. Depending on the amount of additional information needed on each event, i.e., the attributes as described previously, the task can vary in complexity. The main difficulty in converting the log data into the format required by the process mining algorithms is that each event in the OPTA log is described over several rows. The rows carry different types and numbers of qualifiers, which describe the event further. For instance, if a pass is analyzed, it can have qualifiers referring to the length of the pass, the angle, the x and y coordinates, etc. There are 36 qualifiers in total that can be used to describe the pass in more detail. Not all of them are used for each pass. The situation is similar with the rest of the 73 event types. Thus, it is a challenge to extract the relevant information in such a way that all attributes of an event are placed in a single row. A Python script tackles this challenge. Below, a few key steps executed by the code are introduced:</p>
<list list-type="order">
<list-item><p>Eliminate unnecessary event types (e.g., formation change and deleted event, i.e., all events not related to ball possession or its loss)</p></list-item>
<list-item><p>Re-sort the data according to the scheme provided by OPTA, so that an accurate sequence of events can be obtained</p></list-item>
<list-item><p>Pivot the qualifiers (each tuple of qualifier ID and value is transposed to one column per qualifier)</p></list-item>
<list-item><p>Summarize data by event IDs (one row per event including all values for qualifiers)</p></list-item>
<list-item><p>Assign case IDs.</p></list-item>
</list>
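<p>The filtering, sorting, and pivoting steps listed above can be sketched as follows. This is not the script used in this study: the field names (<monospace>event_id</monospace>, <monospace>qualifier_id</monospace>, etc.), the toy rows, and the sort key are assumptions made for illustration, and the sort by timestamp merely stands in for the scheme provided by OPTA.</p>
<preformat preformat-type="code"><![CDATA[
```python
# Event types dropped in step 1 (names taken from the list above).
EXCLUDED_TYPES = {"formation change", "deleted event"}


def pivot_qualifiers(rows):
    """Collapse one-row-per-qualifier input into one row per event.

    rows -- dicts with the hypothetical keys event_id, event_type, timestamp,
            qualifier_id, and qualifier_value (qualifier_id may be None)
    """
    events = {}
    for row in rows:
        if row["event_type"] in EXCLUDED_TYPES:
            continue  # step 1: eliminate unnecessary event types
        event = events.setdefault(
            row["event_id"],
            {
                "event_id": row["event_id"],
                "event_type": row["event_type"],
                "timestamp": row["timestamp"],
            },
        )
        if row.get("qualifier_id") is not None:
            # steps 3 and 4: one column per qualifier, one row per event
            event[f"q_{row['qualifier_id']}"] = row["qualifier_value"]
    # step 2 (assumed ordering): sort by timestamp, then by event ID
    return sorted(events.values(), key=lambda e: (e["timestamp"], e["event_id"]))


# Toy input with made-up values.
rows = [
    {"event_id": 2, "event_type": "pass", "timestamp": 12.0,
     "qualifier_id": "length", "qualifier_value": 18.5},
    {"event_id": 2, "event_type": "pass", "timestamp": 12.0,
     "qualifier_id": "angle", "qualifier_value": 0.7},
    {"event_id": 1, "event_type": "formation change", "timestamp": 0.0,
     "qualifier_id": None, "qualifier_value": None},
    {"event_id": 3, "event_type": "miss", "timestamp": 15.0,
     "qualifier_id": None, "qualifier_value": None},
]
event_rows = pivot_qualifiers(rows)
```
]]></preformat>
<p>Qualifiers that are not used for a given event are simply absent from its row, which mirrors the observation that not all qualifiers are used for each pass.</p>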
<p>The output of the pre-processing step is a sequence of all events referring to the game with the ball. This means that a sequence for team A starts when the team gains ball possession and ends when the team loses the ball. An overview of the final data format is presented in <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure 1</xref>. The minimum requirements for process mining are fulfilled by the columns &#x0201C;Seq. Num.&#x0201D;, &#x0201C;Event type,&#x0201D; and &#x0201C;Timestamp.&#x0201D; Additionally, the Period ID column (1&#x02014;first half of the game), the x and y coordinates of the event in question, and its outcome (1&#x02014;successful, 0&#x02014;not successful) are available as attributes.</p>
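<p>The assignment of case IDs can be illustrated with the following sketch. It rests on a simplifying assumption of ours, namely that a change of the team attribute between two consecutive events marks a change of ball possession; the field names and team labels are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def assign_case_ids(events):
    """Give every event a case ID that increases whenever possession changes.

    events -- time-ordered dicts with at least the hypothetical key "team"
    """
    case_id = 0
    previous_team = None
    labelled = []
    for event in events:
        if event["team"] != previous_team:
            case_id += 1  # a new possession sequence starts here
            previous_team = event["team"]
        labelled.append({**event, "case_id": case_id})
    return labelled


# Toy input with made-up events.
events = [
    {"team": "ENG", "event_type": "pass"},
    {"team": "ENG", "event_type": "pass"},
    {"team": "ISL", "event_type": "tackle"},
    {"team": "ENG", "event_type": "pass"},
]
case_ids = [e["case_id"] for e in assign_case_ids(events)]
```
]]></preformat>
<p>Filtering the labelled events by team then yields the per-team event log, in which each case corresponds to one possession sequence.</p>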
</sec>
<sec>
<title>Data Analytics</title>
<sec>
<title>Control-Flow Perspective</title>
<p>In a first step, the game is analyzed from a broader viewpoint, namely the control-flow perspective. It gives a &#x0201C;helicopter&#x0201D; view of the sequences that happened for both teams and a summary of actions that characterize both the team and its players. As discussed, there are various algorithms that can be used to derive a process model from event log data. The algorithms are initially run with default settings, as this works well in most cases, at least in giving an initial idea of the usefulness of the algorithm in each individual case. In <xref ref-type="fig" rid="F1">Figure 1</xref>, the results from the Inductive Visual Miner (IVM) are presented for the team of England. The IVM model for Iceland is presented in <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure 2</xref>.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Process model of England&#x00027;s team by using the Inductive Visual Miner.</p></caption>
<graphic xlink:href="frai-03-00047-g0001.tif"/>
</fig>
<p>Both models display all activities and paths for each team during the game. The darker blue color indicates that those activities (events) occur more often during the game. Not surprisingly, the event &#x0201C;pass&#x0201D; is usually highlighted in this way. From an initial inspection of the models, we gain a first impression of the <italic>event frequency</italic>, e.g., for England it is immediately visible that the team had 580 passes and was awarded 9 corners. More interestingly, we can visualize the <italic>dependency</italic> between events, i.e., how often an event was followed by another. For instance, for England, one of the 20 &#x0201C;foul&#x0201D; events was followed by a &#x0201C;card&#x0201D; event. Unfortunately, the model does not distinguish whether the foul was committed or suffered by England. Therefore, the process model for Iceland also shows 20 fouls in the match.</p>
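<p>The two quantities read off the model, event frequency and dependency between events, correspond to simple counts over the event log. The sketch below computes them directly; it is not the Inductive Visual Miner, which additionally derives a process model, and the function name and toy sequences are assumptions for illustration.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def frequencies_and_follows(cases):
    """Count how often each event occurs and how often one event directly follows another."""
    frequency = Counter()
    follows = Counter()
    for trace in cases:
        frequency.update(trace)
        follows.update(zip(trace, trace[1:]))
    return frequency, follows


# Toy possession sequences with illustrative event names.
cases = [["pass", "pass", "foul", "card"], ["pass", "foul"]]
frequency, follows = frequencies_and_follows(cases)
```
]]></preformat>
<p>Here, <monospace>follows[("foul", "card")]</monospace> gives the number of fouls that were directly followed by a card, which is the kind of dependency highlighted in the process model.</p>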
<p>The case-flow perspective does not seem to be very useful in a football case scenario as the frequencies of events are not interesting enough and are part of the traditional notational analysis.</p>
</sec>
<sec>
<title>Case Perspective</title>
<p>In a next step, various techniques and visualizations from the case perspective are applied.</p>
<p>One option is to examine sequences that are of interest to the coach or the team. For instance, we can inspect the sequences that end with the event &#x0201C;<italic>miss</italic>&#x0201D; (any shot on goal which goes wide or over the goal). All the sequences, the time at which they occurred, their duration, and the players who started and ended them can be inspected in this way. An overview of the sequences ending in the event &#x0201C;miss&#x0201D; for England is presented in <xref ref-type="fig" rid="F2">Figure 2</xref>.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Instances ending in a &#x0201C;miss&#x0201D; event for England.</p></caption>
<graphic xlink:href="frai-03-00047-g0002.tif"/>
</fig>
<p>From <xref ref-type="fig" rid="F2">Figure 2</xref>, we can see how many different variants these specific sequences have, i.e., how much they differ in terms of the type of events that precede a &#x0201C;miss&#x0201D; (in the case of England, there are 7 variants). Furthermore, we can examine how much time passed between the starting event and the &#x0201C;miss&#x0201D; event (the longest sequence lasts 27 seconds). In 3 out of 11 instances, England missed a scoring chance after a longer passing sequence. Each sequence of interest can be individually inspected in order to investigate the exact order in which events happened as well as the player involved in each event. For instance, <xref ref-type="fig" rid="F3">Figure 3</xref> presents part of the longest variant, which lasts 27 seconds and occurred once.</p>
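<p>The inspection described above (selecting sequences by their final event, counting variants, and measuring durations) can be sketched as follows. The data are hypothetical toy values, not the match data, and the helper names are our own illustrative choices.</p>
<preformat><![CDATA[
```python
from collections import Counter

# Hypothetical toy data: each sequence is a list of (event_type, match_second).
sequences = [
    [("pass", 100), ("pass", 104), ("miss", 110)],
    [("pass", 300), ("pass", 310), ("miss", 325)],
    [("corner", 500), ("miss", 503)],
    [("pass", 700), ("ball lost", 702)],
]

def ending_in(seqs, event_type):
    """Keep only the sequences whose last event has the given type."""
    return [s for s in seqs if s and s[-1][0] == event_type]

def variants(seqs):
    """Count variants: the ordered tuple of event types, ignoring time."""
    return Counter(tuple(event for event, _ in s) for s in seqs)

def duration(seq):
    """Seconds between the first and the last event of a sequence."""
    return seq[-1][1] - seq[0][1]

misses = ending_in(sequences, "miss")
miss_variants = variants(misses)
longest = max(misses, key=duration)
print(len(misses), len(miss_variants), duration(longest))
```
]]></preformat>
<p>A variant is defined here purely by the order of event types; two sequences with the same events but different players or timings fall into the same variant.</p>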
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Closer inspection of a sequence ending in the event &#x0201C;miss&#x0201D;.</p></caption>
<graphic xlink:href="frai-03-00047-g0003.tif"/>
</fig>
<p>From <xref ref-type="fig" rid="F3">Figure 3</xref> we can see that the pass between Walker and Smalling took 5 seconds. Football coaches and experts can inspect in this way all sequences or parts of sequences that took the longest as well as who was involved and when. This will help them to strategize better but also understand what contributed to the success or failure of the team in that particular match.</p>
<p>The sequences also give more insights about the value of a player. When assigning credit to a player, standard statistics do not give enough credit to players who managed to keep the ball in possession by successfully getting out of tight situations (Gregory, <xref ref-type="bibr" rid="B7">2017</xref>). One should not only look at players who shot toward the goal or made the key assist, as sometimes it can be much more difficult to enable that assist in the first place (Gregory, <xref ref-type="bibr" rid="B7">2017</xref>). Process mining can give additional insights into a player&#x00027;s involvement in such situations. One option is to select all sequences that end in one of the following events: &#x0201C;attempt saved&#x0201D; and &#x0201C;goal&#x0201D;.</p>
<p>All sequences of England&#x00027;s team that end in one of the mentioned events are selected. The resulting process model is presented in <xref ref-type="fig" rid="F4">Figure 4</xref>. Six sequences ended in &#x0201C;attempt saved&#x0201D;, while one sequence ended in &#x0201C;goal&#x0201D;. First, we can see which players were involved in these sequences, and which players started and ended the sequence. This is presented in <xref ref-type="table" rid="T3">Table 3</xref>. Based on <xref ref-type="table" rid="T3">Table 3C</xref>, Kane and Vardy are both frequent end-ers of offensive sequences in England&#x00027;s team. This makes these two players very valuable. However, we would like to know which players enabled these last key events, i.e., the shots on goal. <xref ref-type="fig" rid="F4">Figure 4</xref> shows that Rooney, Sterling, Kane, and Vardy are directly connected to the process endpoint (the red circle at the bottom). Thus, these players are process end-ers. Next, we can investigate which players are connected directly to the process end-ers and, thus, discover the players who enabled the final key event (the shot on goal). Following the direction of the arrows in <xref ref-type="fig" rid="F4">Figure 4</xref>, these players are Walker, Alli, and Sturridge (passing to Vardy) and Wilshere, Sturridge, and Vardy (passing to Kane). This leads to the conclusion that Sturridge is perhaps as valuable as the sequence end-ers. The exact sequences can be closely inspected for details, similarly to the information presented in <xref ref-type="fig" rid="F3">Figure 3</xref>.</p>
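<p>The idea of identifying sequence end-ers and the players who directly enable them can be sketched in a few lines. This is an illustration under assumptions: the players A to D and their events are hypothetical, and an &#x0201C;enabler&#x0201D; is taken to be the player acting immediately before the final event.</p>
<preformat><![CDATA[
```python
from collections import Counter

# Hypothetical offensive sequences as ordered (player, event_type) pairs.
sequences = [
    [("A", "pass"), ("B", "pass"), ("C", "attempt saved")],
    [("B", "pass"), ("C", "goal")],
    [("A", "pass"), ("D", "attempt saved")],
]

def enders(seqs):
    """Count how often each player performs the last event of a sequence."""
    return Counter(s[-1][0] for s in seqs if s)

def enablers(seqs):
    """Count (enabler, ender) pairs: the player acting directly before the
    final event, if that player differs from the ender."""
    pairs = Counter()
    for s in seqs:
        if len(s) >= 2 and s[-2][0] != s[-1][0]:
            pairs[(s[-2][0], s[-1][0])] += 1
    return pairs

end_counts = enders(sequences)
enabler_counts = enablers(sequences)
print(end_counts.most_common(), enabler_counts.most_common())
```
]]></preformat>
<p>A player who appears often as an enabler but rarely as an end-er would be overlooked by statistics that only count shots.</p>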
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>England&#x00027;s offensive sequences ending in a &#x0201C;goal&#x0201D; or &#x0201C;attempt saved&#x0201D;.</p></caption>
<graphic xlink:href="frai-03-00047-g0004.tif"/>
</fig>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Overview of players involved in offensive sequences.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>(A) List of all players involved in the offensive sequences</bold></th>
<th valign="top" align="left"><bold>(B) Sequence initiators</bold></th>
<th valign="top" align="left"><bold>(C) Sequence end-ers</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><inline-graphic xlink:href="frai-03-00047-i0001.tif"/></td>
<td valign="top" align="left"><inline-graphic xlink:href="frai-03-00047-i0002.tif"/></td>
<td valign="top" align="left"><inline-graphic xlink:href="frai-03-00047-i0003.tif"/></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The <italic>Dotted Chart</italic> is another visual analytics technique available in process mining. It is simple yet extremely useful for having a quick look at various aspects of the game and the players. It can be tweaked to present different dependencies between time, events, and players. One option is presented in <xref ref-type="fig" rid="F5">Figure 5</xref>, but more are possible.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Outcome of activities of England&#x00027;s defenders in first-half.</p></caption>
<graphic xlink:href="frai-03-00047-g0005.tif"/>
</fig>
<p><xref ref-type="fig" rid="F5">Figure 5</xref> visualizes the outcome of events in which England&#x00027;s defenders were involved in the first half of the game. Red indicates an unsuccessful outcome (for instance, ball lost), while green indicates a successful outcome (for instance, successful pass). We can also see how often and when the defenders are engaged in the game. In the timeframe between the first two goals, England&#x00027;s defenders show little action, and the two visible actions in the highlighted part of the chart have a negative outcome. We can also detect which defenders were involved in an unsuccessful activity, in this case Cahill and Rose.</p>
<p>As a final step in the case perspective analysis, we use trace and sequence clustering by means of SOM and Markov Chain algorithms.</p>
<p>To be able to apply SOM for trace clustering, we first need to build profiles of the traces based on some features. There are several options, and to choose well, one needs to ask what makes two sequences in football similar to each other. The answer is the number and type of events in each sequence, the sequence duration, and the participants in each sequence, i.e., the players. We selected these as the features from which the sequence profiles are built before the SOM clustering algorithm is applied. There are several parameters of the SOM network which can be fine-tuned in the training process. Briefly, these are:</p>
<list list-type="bullet">
<list-item><p>Width and Height: this refers to the number of cells that should be used for the resulting rectangular grid. Each cell corresponds to one neuron.</p></list-item>
<list-item><p>Radius: usually set to 2.</p></list-item>
<list-item><p>Random seed.</p></list-item>
<list-item><p>Training epochs.</p></list-item>
</list>
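<p>Building a trace profile from the features named above can be sketched as follows. This is a simplified illustration, not the profile implementation of any specific tool: the function <italic>build_profile</italic>, the event types, and the players are hypothetical. Because the Euclidean distance is sensitive to scale, features such as duration would in practice probably need to be normalized before clustering.</p>
<preformat><![CDATA[
```python
def build_profile(seq, event_types, players):
    """Turn one sequence into a numeric feature vector.

    seq: list of (event_type, player, match_second) tuples.
    The vector holds one count per event type, the duration in seconds,
    and one 0/1 flag per player indicating participation.
    """
    events = [event for event, _, _ in seq]
    involved = {player for _, player, _ in seq}
    duration = seq[-1][2] - seq[0][2]
    return ([events.count(t) for t in event_types]
            + [duration]
            + [1 if p in involved else 0 for p in players])

# Hypothetical example sequence.
example = [("pass", "A", 10), ("pass", "B", 14), ("miss", "C", 20)]
profile = build_profile(example, ["miss", "pass"], ["A", "B", "C", "D"])
print(profile)
```
]]></preformat>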
<p>In the few publications that use SOM for trace clustering (G&#x000FC;nther, <xref ref-type="bibr" rid="B8">2009</xref>; Song et al., <xref ref-type="bibr" rid="B20">2009</xref>; Buddhika, <xref ref-type="bibr" rid="B1">2016</xref>), parameter tuning is not discussed in detail. Usually, the Euclidean distance is used in combination with SOM, and this combination is applied here as well. As to the width and height, there should not be more cells than there are traces (G&#x000FC;nther, <xref ref-type="bibr" rid="B8">2009</xref>). The grid size is usually chosen intuitively after trial and error. The radius value, which is used in step 5 of the SOM algorithm, and the random seed are usually kept at their default values of 2 and 999, respectively. These defaults are also used in the analysis below. Additionally, the colors in the resulting map indicate the relationship between the neurons, i.e., neurons with a similar weight vector will be painted in a similar color (G&#x000FC;nther, <xref ref-type="bibr" rid="B8">2009</xref>). Clusters with many similarities, exhibiting normal behavior, are located on &#x0201C;high land&#x0201D; colored in green, while the clusters with exceptional cases are located at &#x0201C;sea&#x0201D; colored in blue (Buddhika, <xref ref-type="bibr" rid="B1">2016</xref>). Finally, the cases in the same cell (the separate quadrants in <xref ref-type="fig" rid="F6">Figure 6</xref>) belong to the same cluster (Song et al., <xref ref-type="bibr" rid="B20">2009</xref>). These cells are calculated by using the U-matrix, a commonly used technique to cluster the SOM visually (Vesanto and Sulkava, <xref ref-type="bibr" rid="B27">2002</xref>).
A neuron n and the neurons in its Moore neighborhood N(n) on the output grid of the SOM represent points in the data space, while the sum of distances between n and the neurons in N(n) in the high-dimensional space is shown on a U-matrix as a height value (U-height) at neuron n (L&#x000F6;tsch and Ultsch, <xref ref-type="bibr" rid="B12">2014</xref>, p. 249). The results of the SOM clustering are presented in <xref ref-type="fig" rid="F6">Figure 6</xref>. Each dot represents one sequence (case), and all dots in the same cell belong to the same cluster.</p>
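<p>The U-height definition given above can be made concrete with a short sketch. It assumes an already trained rectangular SOM whose weight vectors are stored in a nested list; the function name <italic>u_matrix</italic> and the toy weights are hypothetical. Neurons at the border of the grid simply have fewer Moore neighbors.</p>
<preformat><![CDATA[
```python
import math

def u_matrix(weights):
    """Compute the U-height of every neuron on a rectangular SOM grid.

    weights[r][c] is the weight vector of the neuron in grid cell (r, c).
    The U-height is the sum of Euclidean distances between that vector and
    the weight vectors of the neurons in its Moore neighborhood.
    """
    rows, cols = len(weights), len(weights[0])
    heights = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if (dr, dc) != (0, 0) and inside:
                        heights[r][c] += math.dist(weights[r][c], weights[nr][nc])
    return heights

# Hypothetical 2 x 2 grid in which one neuron differs from the other three.
grid = [
    [[0.0, 0.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
heights = u_matrix(grid)
print(heights)
```
]]></preformat>
<p>The neuron that differs from its neighbors receives the largest U-height, which is what makes cluster borders visible on the map.</p>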
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>SOM trace clustering &#x02013; England and Iceland. <bold>(A)</bold> England <bold>(B)</bold> Iceland.</p></caption>
<graphic xlink:href="frai-03-00047-g0006.tif"/>
</fig>
<p>The parameters are modified by trial and error; however, the trace clustering for England&#x00027;s team consistently results in a graphic similar to the one depicted in <xref ref-type="fig" rid="F6">Figure 6A</xref>. There is no significant change in the outcome. For Iceland&#x00027;s team, on the other hand, changing the parameter values results in more diverse clusters. In the end, the result depicted in <xref ref-type="fig" rid="F6">Figure 6B</xref> is chosen because it represents the average of the combination of results.</p>
<p>The trace clustering results for these two teams lead to the conclusion that England&#x00027;s players demonstrated more homogeneous behavior (all cases are in the same cell), while Iceland&#x00027;s players seem to be more creative (cases are split into four cells/clusters). This is consistent with popular opinion following the game.</p>
<p>The SOM results are confirmed by the sequence clustering and the Markov chains generated for the recognized clusters. As we must first pre-define the number of clusters to be recognized, trial and error for England&#x00027;s team reveals that when choosing a smaller number of pre-defined clusters (e.g., 2 to 4 clusters), the resulting clusters are of similar size and the Markov chains look relatively similar to each other. This again confirms the results from the SOM clustering that England&#x00027;s team plays in a rather predictable manner and that their behavior is not exceptional or unique. The Markov chains, however, give a more precise summary of the main behavior of the team. In addition, we can see the probabilities that one event is followed by another. Finally, there are a few pre-processing steps that can be used for better clustering results, especially because without such pre-processing, the analysis can take more than 24 h. The options for pre-processing parameters and their values set for a football case scenario are presented in <xref ref-type="table" rid="T4">Table 4</xref>.</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Preprocessing parameters for Markov chain sequence clustering for England&#x00027;s team.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Parameter</bold></th>
<th valign="top" align="center"><bold>Value</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Min event occurrence (%)</td>
<td valign="top" align="center">3</td>
</tr>
<tr>
<td valign="top" align="left">Max event occurrence (%)</td>
<td valign="top" align="center">100</td>
</tr>
<tr>
<td valign="top" align="left">Min number of events in a sequence</td>
<td valign="top" align="center">2</td>
</tr>
<tr>
<td valign="top" align="left">Max number of events in a sequence</td>
<td valign="top" align="center">18</td>
</tr>
<tr>
<td valign="top" align="left">Min sequence occurrence</td>
<td valign="top" align="center">3</td>
</tr>
<tr>
<td valign="top" align="left">Max sequence occurrence</td>
<td valign="top" align="center">329</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We decided that an event type should account for at least 30 percent of all events and that there should be a minimum of 2 events in a sequence, to avoid rare and uninteresting sequences of only one event. A sequence should also occur at least 3 times, while the maximum parameters are left at their defaults. The number of clusters with these pre-processing steps applied is set to 4. The resulting Markov chains can be seen in <xref ref-type="fig" rid="F7">Figures 7A&#x02013;D</xref>.</p>
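<p>The effect of the minimum pre-processing parameters can be sketched as three successive filters. This is an illustration under assumptions: the function <italic>preprocess</italic>, its parameter names, the threshold values, and the exact order and semantics of the filters are our own simplification and may differ from the tool actually used.</p>
<preformat><![CDATA[
```python
from collections import Counter

def preprocess(seqs, min_event_pct=0.0, min_len=1, min_seq_occurrence=1):
    """Filter sequences of event types before clustering (assumed semantics).

    1. Drop event types whose share of all events is below min_event_pct.
    2. Drop sequences with fewer than min_len remaining events.
    3. Drop sequences whose variant occurs fewer than min_seq_occurrence times.
    """
    total = sum(len(s) for s in seqs)
    if total == 0:
        return []
    counts = Counter(event for s in seqs for event in s)
    keep = {e for e, n in counts.items() if 100.0 * n / total >= min_event_pct}
    trimmed = [[e for e in s if e in keep] for s in seqs]
    trimmed = [s for s in trimmed if len(s) >= min_len]
    variant_counts = Counter(tuple(s) for s in trimmed)
    return [s for s in trimmed if variant_counts[tuple(s)] >= min_seq_occurrence]

# Hypothetical toy log with illustrative thresholds.
log = [["pass", "ball lost"]] * 3 + [["pass", "card"], ["pass"]]
filtered = preprocess(log, min_event_pct=20, min_len=2, min_seq_occurrence=3)
print(filtered)
```
]]></preformat>
<p>In the toy log, the rare event &#x0201C;card&#x0201D; is removed first, which shortens one sequence below the minimum length; only the recurring variant survives all three filters.</p>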
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Markov chains for England&#x00027;s team. <bold>(A)</bold> Cluster 0: 30 instances, <bold>(B)</bold> Cluster 1: 3 instances, <bold>(C)</bold> Cluster 2: 33 instances, <bold>(D)</bold> Cluster 3: 35 instances.</p></caption>
<graphic xlink:href="frai-03-00047-g0007.tif"/>
</fig>
<p>This method gives an opportunity to easily drill down and get a quick overview of not only the events the players are mostly involved in and how often, but also the exact sequences that occur most often. We can also choose a higher percentage and check if there are some sequences that occur 50 or even 80 percent of the time. In England&#x00027;s case, when the minimum sequence occurrence is increased to 10 and the minimum event occurrence is increased to 40 percent, two clusters are generated with Markov chains in <xref ref-type="fig" rid="F7">Figures 7A,D</xref>.</p>
<p>From England&#x00027;s Markov chains we can conclude that in roughly 30 percent of their game play, the ball is lost following just one pass after the ball was out of play. This means that they recover the ball and then lose it with just one pass (cluster 0). Furthermore, in 3 instances, England&#x00027;s team makes an unsuccessful dribble attempt past an opponent (cluster 1); there is a probability of 0.825 that they will lose the ball following a pass after a ball recovery (cluster 2), and finally, following an aerial duel, the probability for a clearance is 0.583 (cluster 3). This all speaks against England&#x00027;s team and shows at least some of the reasons behind their loss.</p>
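<p>The transition probabilities read from such chains can be estimated from the sequences of one cluster as relative frequencies. The sketch below shows only this estimation step for a first-order Markov chain, not the clustering itself; the sequences and the function name <italic>transition_probabilities</italic> are hypothetical.</p>
<preformat><![CDATA[
```python
from collections import Counter, defaultdict

def transition_probabilities(seqs):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for s in seqs:
        for current, following in zip(s, s[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }

# Hypothetical sequences assigned to one cluster.
cluster = [
    ["ball recovery", "pass", "ball lost"],
    ["ball recovery", "pass", "pass"],
    ["aerial", "clearance"],
]
probs = transition_probabilities(cluster)
print(probs["pass"])
```
]]></preformat>
<p>Each row of the resulting table sums to one, so a value such as the probability of &#x0201C;ball lost&#x0201D; after &#x0201C;pass&#x0201D; can be compared directly across clusters.</p>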
<p>For Iceland&#x00027;s team it is more difficult to generate Markov chains that summarize the behavior well. One reason is that they are more resourceful than England&#x00027;s team, so it is less likely that their play can be clustered in a meaningful way. As with England, the minimum parameters are modified while the maximum parameters are kept at their defaults. Using the same parameter settings as for England, only 6 instances are left after the pre-processing steps. We get similar results by increasing the parameter &#x0201C;min event occurrence.&#x0201D; Therefore, after trial and error, we decided not to use the pre-processing parameters in Iceland&#x00027;s case and to proceed with the clustering directly. The clustering results for different pre-defined numbers of clusters are presented in <xref ref-type="supplementary-material" rid="SM1">Supplementary Table 3</xref>.</p>
<p>The Markov chains and the instances are inspected for all the clusters in <xref ref-type="supplementary-material" rid="SM1">Supplementary Table 3</xref>. The solution with 5 clusters summarizes the behavior best. For instance, cluster 3 is presented in <xref ref-type="fig" rid="F8">Figure 8</xref>.</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>Markov chain for Iceland&#x00027;s team: cluster 3 with 34 instances.</p></caption>
<graphic xlink:href="frai-03-00047-g0008.tif"/>
</fig>
<p>Looking at <xref ref-type="fig" rid="F8">Figure 8</xref>, Iceland&#x00027;s team is often engaged in passing events (which is not that informative, as passes are the most frequent events for every team). However, Iceland also frequently shows events such as &#x0201C;ball recovery,&#x0201D; &#x0201C;interception,&#x0201D; &#x0201C;clearance,&#x0201D; and &#x0201C;out.&#x0201D; Furthermore, an interception is most likely followed by a clearance, which is then followed by &#x0201C;out&#x0201D; with a probability of 0.75. This suggests that Iceland&#x00027;s team is quite successful in defending their half and intercepting the ball from the opposing team.</p>
<p><xref ref-type="fig" rid="F9">Figure 9</xref> shows that a &#x0201C;tackle&#x0201D; is most likely followed by a &#x0201C;ball touch&#x0201D; which in turn is followed by &#x0201C;out&#x0201D; (with a probability of 0.4) or &#x0201C;challenge&#x0201D; (with a probability of 0.6). This means that following a tackle, for Iceland&#x00027;s players the ball goes out of play for a throw-in or goal kick (out), or a player fails to win the ball as an opponent successfully dribbles past them (challenge). By using further analyses offered by process mining, for instance, the dotted chart (<xref ref-type="fig" rid="F5">Figure 5</xref>), one can also check which players are involved in these unsuccessful events.</p>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p>Markov chain for Iceland&#x00027;s team: cluster 1 with 79 instances.</p></caption>
<graphic xlink:href="frai-03-00047-g0009.tif"/>
</fig>
</sec>
<sec>
<title>Social and Organizational Perspective of a Process</title>
<p>A process can be analyzed by looking at the organizational or social perspective. In the case of a football game, this mainly refers to viewing the process from the resource, i.e., the player perspective. As opposed to a typical social network analysis of a football game, where only passes are considered, this type of process mining takes into account all of the events available in the OPTA log.</p>
<p>The first metric used in this analysis is the Handover-of-Work (HoW) metric. <xref ref-type="fig" rid="F10">Figure 10</xref> shows which players from England&#x00027;s team hand over work to other players in all action sequences. Only direct succession is considered. The HoW can be displayed by using different SNA metrics, like degree centrality, in and out degree centrality, betweenness, and closeness centrality. In this case, the degree centrality is chosen as it expresses the relation between the in and out degree of the connections between the nodes (ProM, <xref ref-type="bibr" rid="B13">2017a</xref>). For Iceland&#x00027;s HoW graph, see <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure 4</xref>. The graphs do not change significantly by using the other metrics.</p>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p>England&#x02013;handover of work.</p></caption>
<graphic xlink:href="frai-03-00047-g0010.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="F10">Figure 10</xref>, the colors add a cluster point of view to give a better visual perspective. The oval shape also has a meaning. The more vertically shaped nodes have a higher proportion of ingoing arcs, while the more horizontally shaped nodes have more outgoing arcs (ProM, <xref ref-type="bibr" rid="B13">2017a</xref>). In this case, the clusters do not change significantly when more edges are removed, which means that players from both teams show, on average, good and balanced participation over the course of the game. There are a few players that distinguish themselves from the others, however. In Iceland&#x00027;s team, Bjarnason, a midfield substitute player, has a distinctly vertical shape, which means he gets more work delegated from the other players. Arnason, a defender, and Bodvarsson, a striker, also get more work delegated than they themselves delegate to other players. In general, strikers would perhaps be expected to have more incoming than outgoing arcs due to the nature of their position and, thus, the tasks that are required of them. Defenders, on the other hand, would ideally have more outgoing than incoming arcs. In England&#x00027;s team, Sterling and Rooney display slightly more vertical shapes, but overall, all players have a more balanced handover compared to Iceland&#x00027;s team.</p>
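<p>The Handover-of-Work counts under direct succession, and the balance of ingoing and outgoing arcs that determines the node shapes, can be sketched as follows. This is a simplified illustration rather than the ProM implementation: the players A to C are hypothetical, and ignoring consecutive events by the same player is our own assumption.</p>
<preformat><![CDATA[
```python
from collections import Counter

def handover_of_work(seqs):
    """Count direct handovers (a, b): player a acts, then player b acts next.

    seqs: lists of player names in the order in which they performed events.
    Consecutive events by the same player are not counted (assumption).
    """
    how = Counter()
    for s in seqs:
        for a, b in zip(s, s[1:]):
            if a != b:
                how[(a, b)] += 1
    return how

def in_out_degree(how):
    """Weighted in-degree and out-degree per player from the handover counts."""
    indeg, outdeg = Counter(), Counter()
    for (a, b), n in how.items():
        outdeg[a] += n
        indeg[b] += n
    return indeg, outdeg

# Hypothetical sequences of acting players.
sequences = [["A", "B", "C"], ["A", "B", "B", "C"], ["C", "A"]]
how = handover_of_work(sequences)
indeg, outdeg = in_out_degree(how)
print(how, indeg, outdeg)
```
]]></preformat>
<p>A player whose in-degree clearly exceeds the out-degree corresponds to a vertically shaped node, i.e., a player who receives more work than he hands over.</p>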
<p>The second metric investigated is the <italic>Working Together</italic> metric. This gives an insight into which two players often participate together in the same attacking sequence&#x02014;for example, they pass the ball to each other in the same sequence.</p>
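<p>A simple way to approximate the Working Together idea is to count, for every pair of players, the number of sequences in which both appear. The sketch below uses raw co-occurrence counts; any normalization applied by the actual tool is not reproduced, and the players and the function name <italic>working_together</italic> are hypothetical.</p>
<preformat><![CDATA[
```python
from collections import Counter
from itertools import combinations

def working_together(seqs):
    """Count in how many sequences each unordered pair of players co-occurs.

    seqs: lists of player names involved in each sequence (repeats allowed).
    """
    pairs = Counter()
    for s in seqs:
        for a, b in combinations(sorted(set(s)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical sequences of acting players.
sequences = [["A", "B", "A"], ["B", "C"], ["A", "B", "C"]]
together = working_together(sequences)
print(together.most_common())
```
]]></preformat>
<p>Unlike Handover of Work, this count ignores the order of events, so two players are linked even if the ball never passes directly between them within the sequence.</p>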
<p>The network graphs in <xref ref-type="fig" rid="F11">Figure 11</xref> are generated by using the ISOM layout and degree centrality. Similarly to HoW, the graphs are not too different if other network metrics are used. This layout algorithm shows that in the team of England there are two more distinctive clusters of players that work together during attack and which consist of most of the players in the team: cluster E-1 consists of five players (S-Sterling, S-Kane, S-Sturridge, D-Cahill, M-Wilshere-Sub); cluster E-2 consists of six players (M-Rooney, M-Alli, D-Walker, D-Smalling, D-Rose, G-Hart). Three players from this team are isolated from the clusters: F-Vardy-Sub, F-Rashford-Sub, and M-Dier. The two substitute players come on in minutes 60 and 86, respectively, so it is not surprising that they are outside of a cluster. Dier, on the other hand, plays as a central midfielder, and therefore has connections to both the E-1 and E-2 clusters. However, he is substituted at half-time by Wilshere, who did not perform well in an earlier match against Slovakia (Glendenning, <xref ref-type="bibr" rid="B6">2016</xref>). According to this SNA metric, Wilshere appears to have a stronger relationship with the players from the E-2 cluster as well. In Iceland&#x00027;s team, players are closely clustered together, with Traustason connected with the other substitute player, Bjarnason, the goalkeeper, and Bodvarsson. The midfielder, Bjarnasson, appears to have the closest connection to Skulasson and Sigurdsson. The defender Saevarsson works together occasionally with the rest of his teammates but does not seem to have a stronger relationship with a particular player. In the case of defenders, this behavior could also mean that the defender, by the nature of his task, more often interrupts a sequence of the opposing team. Skulasson works together with Bodvarsson quite often.</p>
<fig id="F11" position="float">
<label>Figure 11</label>
<caption><p>Working Together comparison between England <bold>(Left)</bold> and Iceland <bold>(Right)</bold>.</p></caption>
<graphic xlink:href="frai-03-00047-g0011.tif"/>
</fig>
</sec>
</sec>
</sec>
<sec sec-type="discussion" id="s5">
<title>Discussion</title>
<p>This paper presents an exploratory study to evaluate the potential and suitability of process mining for tactical football performance analysis. As seen, process mining is a collection of algorithms and analytics techniques which are widely used in other domains for analyzing all kinds of business processes. It has never been applied specifically to sports, however. Not all algorithms and visualization techniques of process mining are demonstrated in this paper and not all types of process mining can be used for performance analysis in football. The discovery type of process mining algorithms makes the most sense, as it can demonstrate the exact behavior of teams and players. The conformance checking type of process mining is, in our opinion, not useful in this scenario because one does not have the perfect process model according to which players need to behave during the game. Enhancement of the process model does not seem to be useful in this case either. However, the discovery algorithms and techniques prove to be very valuable for analyzing a football game from a process perspective. <xref ref-type="table" rid="T5">Table 5</xref> presents a summary of the techniques and algorithms addressed in this paper.</p>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>Process mining techniques and their insights in football.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Perspective</bold></th>
<th valign="top" align="left"><bold>Algorithm/analytics technique</bold></th>
<th valign="top" align="left"><bold>Tactical insights into</bold></th>
<th valign="top" align="left"><bold>Potential for decision support in football</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Case-flow perspective</td>
<td valign="top" align="left">Inductive visual miner</td>
<td valign="top" align="left">Team</td>
<td valign="top" align="left">Low</td>
</tr>
<tr>
<td valign="top" align="left">Case perspective</td>
<td valign="top" align="left">Filtering of specific sequences (e.g., offensive)</td>
<td valign="top" align="left">Team and player</td>
<td valign="top" align="left">Medium</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Instances inspector</td>
<td valign="top" align="left">Team and player</td>
<td valign="top" align="left">High</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Dotted chart</td>
<td valign="top" align="left">Team and player</td>
<td valign="top" align="left">High</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">SOM trace clustering</td>
<td valign="top" align="left">Team</td>
<td valign="top" align="left">Medium</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Markov chain sequence clustering</td>
<td valign="top" align="left">Team</td>
<td valign="top" align="left">Medium</td>
</tr>
<tr>
<td valign="top" align="left">Social/organizational perspective</td>
<td valign="top" align="left">Handover of work</td>
<td valign="top" align="left">Player</td>
<td valign="top" align="left">High</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Working together</td>
<td valign="top" align="left">Player</td>
<td valign="top" align="left">High</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The case-flow perspective with the various types of algorithms for discovering the process model has the lowest potential to improve decision making in football based on event data. We can only detect which event types occur most frequently and how events are connected. In addition, for some event types, this perspective is not useful. For instance, the event &#x0201C;foul&#x0201D; will appear in both teams, and it is not clear from the mined process model how many fouls each team committed or suffered. This is due to the way in which the sequences are extracted from the original dataset. As each event has different qualifiers, considering all of them when creating the process model would produce too many variants. As the idea of the model is to give a quick overview of what happened as well as some dependencies between the activities, this paper does not focus on such a level of detail. Unfortunately, it is not possible to avoid this issue. However, this is the case only for a limited number of events. This type of visualization, as demonstrated by the Inductive Visual Miner, can usually be used to analyze the process from a time perspective, i.e., to check which activities last too long and discover bottlenecks. However, due to the nature of football, analyzing this process model from the time perspective would make less sense. We consider the time perspective as part of the other two perspectives (case and social perspectives), which are more useful for performance analysis in football.</p>
<p>The case perspective offers various useful visual analytic techniques and clustering algorithms which can give valuable insights into both team and player behavior. For instance, once the process model has been generated with the Inductive Visual Miner, it is possible to drill down and select specific sequences that are of interest to the decision maker. We selected the attacking sequences, which answers questions such as:</p>
<list list-type="bullet">
<list-item><p>How many times did a team&#x00027;s actions end in events leading to a shot on goal?</p></list-item>
<list-item><p>Which events are those exactly? (e.g., miss, post, attempt saved, or goal)</p></list-item>
<list-item><p>Which players were involved?</p></list-item>
<list-item><p>When did these events occur and how long did the sequences last?</p></list-item>
</list>
<p>By using this option, it is possible not only to visualize the sequences leading to a shot on goal for England&#x00027;s team but also to find out which players most often started or ended a sequence. These analyses can be very useful in assessing the value of a player in a game.</p>
<p>Furthermore, clustering algorithms like SOM and first-order Markov chains give a quick insight into the behavior of a team. Such analyses can be used, for example, during the half-time break in order to make tactical readjustments for the second half. Finally, social network analysis can be used for player insights. In this case, all event types occurring between the players are considered in the analysis in order to discover cooperation patterns between them. The two metrics that are applied, Handover of Work and Working Together, prove to be valuable in revealing important information about individual players. For instance, the Working Together metric can reveal which two players often cooperate in a sequence of ball possession, which in turn helps to plan tactical adjustments accordingly, especially concerning the defense of one&#x00027;s own team. The Handover of Work metric can show which player is overwhelmed by having more work delegated from the other players. This could indicate fatigue or, for the opponent, that this player should be the focus of their own defense.</p>
<p>We make an initial subjective evaluation of the potential usefulness of each technique presented in this paper for decision making in football. We base our evaluation on our knowledge of other data mining and visualization methods used in football performance analysis. However, this should be studied further by, for instance, conducting a study with coaches and other football experts, to gain an unbiased view of how useful such techniques are for actual decision making in football. Based on our initial results presented in this paper, process mining can be successfully used in addition to the traditional notational analysis for performance evaluation in football. It is an extension of the traditional analytics techniques that mostly consider the frequencies of actions. One major advantage is the possibility to visually infer patterns of interactions between the players and dependencies between different event types (goal, miss, duel, etc.), which has not been possible to achieve with other methods such as T-pattern analysis.</p>
</sec>
<sec sec-type="conclusions" id="s6">
<title>Conclusion</title>
<p>Based on the presented results, process mining offers valuable techniques and algorithms that give insights into players&#x00027; and teams&#x00027; behavior. The results are usually quick to obtain and easy to understand. This type of analysis can be used for examining successful and unsuccessful sequence outcomes, establishing defensive strategies against specific players, and gaining overall tactical insights into team and player behavior. It is more user-friendly than methods like T-pattern analysis, and the sequences of events are clearer. There are also various options for additional analyses of the sequences, as well as for filtering and focusing on specific types of sequences, e.g., offensive or defensive sequences, sequences ending in a specific event, sequences in which a particular player is involved, or the longest-lasting sequences.</p>
<p>Process mining offers even more possibilities for analyzing action sequences. Future research can therefore explore whether the conformance checking type of process mining would be helpful in a football scenario. For instance, it may be possible to use conformance checking techniques to simulate and test the outcomes of sequences by enhancing the event log with other events. Finally, pitch zones derived from positional data could be integrated to check whether they yield even more detailed insights from the sequence data.</p>
</sec>
<sec sec-type="data-availability-statement" id="s7">
<title>Data Availability Statement</title>
<p>The datasets for this article are not publicly available because the data are proprietary and subject to costs. Requests to access the datasets should be directed to the author.</p>
</sec>
<sec id="s8">
<title>Author&#x00027;s Note</title>
<p>The paper is based on Chapter 8 of the first author&#x00027;s dissertation, which is also publicly available online (please refer to Kr&#x000F6;ckel, <xref ref-type="bibr" rid="B10">2019</xref>).</p>
</sec>
<sec id="s9">
<title>Author Contributions</title>
<p>PK wrote the paper and conducted the data analysis. FB revised the paper in terms of structure, content, writing style and grammar, acquired the data used in the analysis, and drafted the data analytics concept. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="s10">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ack><p>The author would like to thank the Institute of Football Management in Ismaning (IFM), Munich, for providing the data which was used in the analysis. Data was provided as part of a joint research project between the University of Erlangen-Nuremberg and the IFM in the area of football performance analysis.</p>
</ack>
<sec sec-type="supplementary-material" id="s11">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/frai.2020.00047/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/frai.2020.00047/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.docx" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Buddhika</surname> <given-names>G.</given-names></name></person-group> (<year>2016</year>). Evaluation of Trace Clustering Techniques in Process Mining to Detect Normal and Exceptional Behavior (No. SC/2012/8565). Retrieved from University of Ruhuna website: <ext-link ext-link-type="uri" xlink:href="https://www.academia.edu/31068869/Evaluation_of_Trace_Clustering_techniques_in_Process_Mining_to_detect_normal_and_exceptional_behavior">https://www.academia.edu/31068869/Evaluation_of_Trace_Clustering_techniques_in_Process_Mining_to_detect_normal_and_exceptional_behavior</ext-link> (accessed September 10, 2019).</citation></ref>
<ref id="B2">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Buijs</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). Heuristics miner in ProM. Introduction to Process Mining with ProM, FutureLearn. Retrieved from <ext-link ext-link-type="uri" xlink:href="https://www.futurelearn.com/courses/process-mining">https://www.futurelearn.com/courses/process-mining</ext-link> (accessed July 21, 2019).</citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Carling</surname> <given-names>C.</given-names></name> <name><surname>Bloomfield</surname> <given-names>J.</given-names></name> <name><surname>Nelsen</surname> <given-names>L.</given-names></name> <name><surname>Reilly</surname> <given-names>T.</given-names></name></person-group> (<year>2008</year>). <article-title>The role of motion analysis in elite soccer: contemporary performance measurement techniques and work rate data</article-title>. <source>Sports Med.</source> <volume>38</volume>, <fpage>839</fpage>&#x02013;<lpage>862</lpage>. <pub-id pub-id-type="doi">10.2165/00007256-200838100-00004</pub-id><pub-id pub-id-type="pmid">18803436</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chung</surname> <given-names>K. L.</given-names></name></person-group> (<year>1967</year>). <source>Markov Chains: With Stationary Transition Probabilities (Second Edition). Grundlehren der mathematischen Wissenschaften, A Series of Comprehensive Studies in Mathematics</source>, Vol. <volume>104</volume>. <publisher-loc>Berlin: Springer</publisher-loc>.</citation></ref>
<ref id="B5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ferreira</surname> <given-names>D.</given-names></name> <name><surname>Zacarias</surname> <given-names>M.</given-names></name> <name><surname>Malheiros</surname> <given-names>M.</given-names></name> <name><surname>Ferreira</surname> <given-names>P.</given-names></name></person-group> (<year>2007</year>). <article-title>Approaching process mining with sequence clustering: experiments and findings</article-title>, in <source>Lecture Notes in Computer Science. Business Process Management</source>, Vol. <volume>4714</volume>, eds <person-group person-group-type="editor"><name><surname>Alonso</surname> <given-names>G.</given-names></name> <name><surname>Dadam</surname> <given-names>P.</given-names></name> <name><surname>Rosemann</surname> <given-names>M.</given-names></name></person-group> (<publisher-loc>Berlin: Springer Berlin Heidelberg</publisher-loc>), <fpage>360</fpage>&#x02013;<lpage>374</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-540-75183-0_26</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Glendenning</surname> <given-names>B.</given-names></name></person-group> (<year>2016</year>). England vs. Iceland. Minute-by-minute Report. The Guardian. Retrieved from <ext-link ext-link-type="uri" xlink:href="https://www.theguardian.com/football/live/2016/jun/27/england-v-iceland-euro-2016-live">https://www.theguardian.com/football/live/2016/jun/27/england-v-iceland-euro-2016-live</ext-link> (accessed July 21, 2019).</citation></ref>
<ref id="B7">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Gregory</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). How We Assign Credit in Football. OPTA Pro Blog. Retrieved from <ext-link ext-link-type="uri" xlink:href="http://www.optasportspro.com/about/optapro-blog/posts/2017/blog-how-we-assign-credit-in-football/">http://www.optasportspro.com/about/optapro-blog/posts/2017/blog-how-we-assign-credit-in-football/</ext-link> (accessed September 09, 2019).</citation></ref>
<ref id="B8">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>G&#x000FC;nther</surname> <given-names>C. W.</given-names></name></person-group> (<year>2009</year>). <source>Process Mining in Flexible Environments. (Dissertation), Eindhoven: Eindhoven University of Technology</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://research.tue.nl/en/publications/process-mining-in-flexible-environments">https://research.tue.nl/en/publications/process-mining-in-flexible-environments</ext-link></citation></ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ingvaldsen</surname> <given-names>J. E.</given-names></name> <name><surname>Gulla</surname> <given-names>J. A.</given-names></name></person-group> (<year>2008</year>). <article-title>Preprocessing support for large scale process mining of SAP transactions</article-title>, in <source>Lecture Notes in Computer Science: Vol. 4928, Business Process Management Workshops. BPM 2007</source>, eds <person-group person-group-type="editor"><name><surname>ter Hofstede</surname> <given-names>A.</given-names></name> <name><surname>Benatallah</surname> <given-names>B.</given-names></name> <name><surname>Paik</surname> <given-names>H.-Y.</given-names></name></person-group> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>30</fpage>&#x02013;<lpage>41</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-540-78238-4_5</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Kr&#x000F6;ckel</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <article-title>Big Data Event Analytics in Football for Tactical Decision Support</article-title>. (Doctoral thesis), Friedrich-Alexander-Universit&#x000E4;t Erlangen-N&#x000FC;rnberg (FAU). Retrieved from <ext-link ext-link-type="uri" xlink:href="https://opus4.kobv.de/opus4-fau/frontdoor/index/index/docId/12365">https://opus4.kobv.de/opus4-fau/frontdoor/index/index/docId/12365</ext-link> (accessed October 03, 2019).</citation></ref>
<ref id="B11">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Leemans</surname> <given-names>S. J. J.</given-names></name> <name><surname>Fahland</surname> <given-names>D.</given-names></name> <name><surname>van der Aalst</surname> <given-names>W. M. P.</given-names></name></person-group> (<year>2014</year>). <article-title>Process and Deviation Exploration with Inductive Visual Miner</article-title>. Retrieved from <ext-link ext-link-type="uri" xlink:href="http://www.processmining.org/_media/blogs/pub2014/bpmdemoleemans.pdf">http://www.processmining.org/_media/blogs/pub2014/bpmdemoleemans.pdf</ext-link> (accessed September 23, 2019).</citation></ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>L&#x000F6;tsch</surname> <given-names>J.</given-names></name> <name><surname>Ultsch</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Exploiting the structures of the U-matrix</article-title>, in <source>Advances in Intelligent Systems and Computing. Advances in Self-Organizing Maps and Learning Vector Quantization</source>, Vol. <volume>295</volume>, eds <person-group person-group-type="editor"><name><surname>Villmann</surname> <given-names>T.</given-names></name> <name><surname>Schleif</surname> <given-names>F.-M.</given-names></name> <name><surname>Kaden</surname> <given-names>M.</given-names></name> <name><surname>Lange</surname> <given-names>M.</given-names></name></person-group> (<publisher-loc>Cham: Springer International Publishing</publisher-loc>), <fpage>249</fpage>&#x02013;<lpage>257</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-07695-9_24</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="web"><person-group person-group-type="author"><collab>ProM</collab></person-group> (<year>2017a</year>). <publisher-loc>Questions Answered Based on an Event Log Only</publisher-loc>. Retrieved from <ext-link ext-link-type="uri" xlink:href="http://www.promtools.org/doku.php?id=tutorial:answers">http://www.promtools.org/doku.php?id=tutorial:answers</ext-link> (accessed October 01, 2019).</citation></ref>
<ref id="B14">
<citation citation-type="web"><person-group person-group-type="author"><collab>ProM</collab></person-group> (<year>2017b</year>). Tutorial on ProM 6. Retrieved from <ext-link ext-link-type="uri" xlink:href="http://www.promtools.org/doku.php?id=tutorial:introduction">http://www.promtools.org/doku.php?id=tutorial:introduction</ext-link> (accessed October 01, 2019).</citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><collab>RapidProM</collab></person-group> (<year>2017</year>). <source>Social Network Miner RapidProM- Description</source>.</citation></ref>
<ref id="B16">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Rozinat</surname> <given-names>A.</given-names></name></person-group> (<year>2010</year>). ProM Tips &#x02014; Which Mining Algorithm Should You Use? Retrieved from <ext-link ext-link-type="uri" xlink:href="https://fluxicon.com/blog/2010/10/prom-tips-mining-algorithm/">https://fluxicon.com/blog/2010/10/prom-tips-mining-algorithm/</ext-link> (accessed September 24, 2019).</citation></ref>
<ref id="B17">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Rozinat</surname> <given-names>A.</given-names></name> <name><surname>G&#x000FC;nther</surname> <given-names>C. W.</given-names></name></person-group> (<year>2015</year>). <article-title>Data Science of Process Mining &#x02013; Understanding Complex Processes</article-title>. Retrieved from <ext-link ext-link-type="uri" xlink:href="https://www.kdnuggets.com/2015/09/data-science-process-mining-understanding-complex-processes.html">https://www.kdnuggets.com/2015/09/data-science-process-mining-understanding-complex-processes.html</ext-link> (accessed July 22, 2019).</citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schrapf</surname> <given-names>N.</given-names></name> <name><surname>Alsaied</surname> <given-names>S.</given-names></name> <name><surname>Tilp</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>Tactical interaction of offensive and defensive teams in team handball analysed by artificial neural networks</article-title>. <source>Math. Comp. Model. Dyn. Syst.</source> <volume>23</volume>, <fpage>363</fpage>&#x02013;<lpage>371</lpage>. <pub-id pub-id-type="doi">10.1080/13873954.2017.1336733</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Si</surname> <given-names>J.</given-names></name> <name><surname>Nelson</surname> <given-names>B. J.</given-names></name> <name><surname>Runger</surname> <given-names>G. C.</given-names></name></person-group> (<year>2003</year>). <article-title>Artificial neural network models for data mining</article-title>, in <source>Human Factors and Ergonomics. The Handbook of Data Mining</source>, ed N. Ye (<publisher-loc>Mahwah: Lawrence Erlbaum Associates</publisher-loc>), <fpage>41</fpage>&#x02013;<lpage>66</lpage>.</citation></ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Song</surname> <given-names>M.</given-names></name> <name><surname>G&#x000FC;nther</surname> <given-names>C. W.</given-names></name> <name><surname>van der Aalst</surname> <given-names>W. M. P.</given-names></name></person-group> (<year>2009</year>). <article-title>Trace clustering in process mining</article-title>, in <source>Lecture Notes in Business Information Processing. Business Process Management Workshops</source>, Vol. <volume>17</volume>, eds <person-group person-group-type="editor"><name><surname>Ardagna</surname> <given-names>D.</given-names></name> <name><surname>Mecella</surname> <given-names>M.</given-names></name> <name><surname>Yang</surname> <given-names>J.</given-names></name></person-group> (<publisher-loc>Berlin: Springer Berlin Heidelberg</publisher-loc>), <fpage>109</fpage>&#x02013;<lpage>120</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-00328-8_11</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>van der Aalst</surname> <given-names>W.</given-names></name></person-group> (<year>2011</year>). <source>Process Mining</source>. <publisher-loc>Berlin: Springer Berlin Heidelberg</publisher-loc>. <pub-id pub-id-type="doi">10.1007/978-3-642-19345-3</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>van der Aalst</surname> <given-names>W.</given-names></name></person-group> (<year>2016</year>). <article-title>Process Mining: Data Science in Action (Second Edition). Springer</article-title>. Retrieved from: <ext-link ext-link-type="uri" xlink:href="https://www.springer.com/gp/book/9783662498507">https://www.springer.com/gp/book/9783662498507</ext-link>.</citation></ref>
<ref id="B23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>van der Aalst</surname> <given-names>W.</given-names></name> <name><surname>Adriansyah</surname> <given-names>A.</given-names></name> <name><surname>Medeiros</surname> <given-names>A. K. A.</given-names></name> <name><surname>de Arcieri</surname> <given-names>F.</given-names></name> <name><surname>Baier</surname> <given-names>T.</given-names></name> <name><surname>Blickle</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>Process mining manifesto</article-title>, in <source>Lecture Notes in Business Information Processing. Business Process Management Workshops</source>, Vol. <volume>99</volume>, eds <person-group person-group-type="editor"><name><surname>Daniel</surname> <given-names>F.</given-names></name> <name><surname>Barkaoui</surname> <given-names>K.</given-names></name> <name><surname>Dustdar</surname> <given-names>S.</given-names></name></person-group> (<publisher-loc>Berlin: Springer Berlin Heidelberg</publisher-loc>), <fpage>169</fpage>&#x02013;<lpage>194</lpage>.</citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van der Aalst</surname> <given-names>W.</given-names></name> <name><surname>Reijers</surname> <given-names>H. A.</given-names></name> <name><surname>Song</surname> <given-names>M.</given-names></name></person-group> (<year>2005</year>). <article-title>Discovering social networks from event logs</article-title>. <source>Comp. Support. Coop. Work</source> <volume>14</volume>, <fpage>549</fpage>&#x02013;<lpage>593</lpage>. <pub-id pub-id-type="doi">10.1007/s10606-005-9005-9</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>van der Aalst</surname> <given-names>W.</given-names></name> <name><surname>Song</surname> <given-names>M.</given-names></name></person-group> (<year>2004</year>). <article-title>Mining social networks: uncovering interaction patterns in business processes</article-title>, in <source>Lecture notes in computer science. Business Process Management</source>, Vol. <volume>3080</volume>, eds <person-group person-group-type="editor"><name><surname>Kanade</surname> <given-names>T.</given-names></name> <name><surname>Kittler</surname> <given-names>J.</given-names></name> <name><surname>Kleinberg</surname> <given-names>J. M.</given-names></name> <name><surname>Mattern</surname> <given-names>F.</given-names></name> <name><surname>Mitchell</surname> <given-names>J. C.</given-names></name> <name><surname>Naor</surname> <given-names>M.</given-names></name> <name><surname>Weske</surname> <given-names>M.</given-names></name></person-group> (<publisher-loc>Berlin: Springer Berlin Heidelberg</publisher-loc>), <fpage>244</fpage>&#x02013;<lpage>260</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-540-25970-1_16</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Veiga</surname> <given-names>G. M.</given-names></name></person-group> (<year>2009</year>). <article-title>Developing Process Mining Tools: An Implementation of Sequence Clustering for ProM</article-title>. IST &#x02013; Technical University of Lisbon, Lisbon. Retrieved from <ext-link ext-link-type="uri" xlink:href="https://fenix.tecnico.ulisboa.pt/downloadFile/395139104449/Dissertacao_54276.pdf">https://fenix.tecnico.ulisboa.pt/downloadFile/395139104449/Dissertacao_54276.pdf</ext-link> (accessed September 15, 2019).</citation></ref>
<ref id="B27">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Vesanto</surname> <given-names>J.</given-names></name> <name><surname>Sulkava</surname> <given-names>M.</given-names></name></person-group> (<year>2002</year>). <article-title>Distance matrix based clustering of the self-organizing map</article-title>, in <source>Artificial Neural Networks &#x02014; ICANN 2002</source>, ed J. R. Dorronsoro (<publisher-loc>Berlin: Springer Berlin Heidelberg</publisher-loc>), <fpage>951</fpage>&#x02013;<lpage>956</lpage>. Retrieved from <ext-link ext-link-type="uri" xlink:href="https://link.springer.com/chapter/10.1007/3-540-46084-5_154">https://link.springer.com/chapter/10.1007/3-540-46084-5_154</ext-link> (accessed September 15, 2019).</citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup><ext-link ext-link-type="uri" xlink:href="http://www.promtools.org/doku.php">http://www.promtools.org/doku.php</ext-link></p></fn>
<fn id="fn0002"><p><sup>2</sup><ext-link ext-link-type="uri" xlink:href="http://fluxicon.com/disco/">http://fluxicon.com/disco/</ext-link></p></fn>
</fn-group>
</back>
</article> 