<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. ICT</journal-id>
<journal-title>Frontiers in ICT</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. ICT</abbrev-journal-title>
<issn pub-type="epub">2297-198X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fict.2017.00010</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>ICT</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Detection and Localization of Anomalous Motion in Video Sequences from Local Histograms of Labeled Affine Flows</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>P&#x000E9;rez-R&#x000FA;a</surname> <given-names>Juan-Manuel</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="cor1">&#x0002A;</xref>
<uri xlink:href="http://frontiersin.org/people/u/399791"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Basset</surname> <given-names>Antoine</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Bouthemy</surname> <given-names>Patrick</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://frontiersin.org/people/u/173816"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Inria, Serpico Team</institution>, <addr-line>Rennes</addr-line>, <country>France</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Dimitris N. Metaxas, Rutgers University, USA</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Carlos Vazquez, &#x000C9;cole de technologie sup&#x000E9;rieure, Canada; Alex Pappachen James, Nazarbayev University School of Medicine, Kazakhstan</p></fn>
<corresp content-type="corresp" id="cor1">&#x0002A;Correspondence: Juan-Manuel P&#x000E9;rez-R&#x000FA;a, <email>juan-manuel.perez-rua&#x00040;inria.fr</email></corresp>
<fn fn-type="other" id="fn001"><p>Specialty section: This article was submitted to Computer Image Analysis, a section of the journal Frontiers in ICT</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>05</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="collection">
<year>2017</year>
</pub-date><volume>4</volume>
<elocation-id>10</elocation-id>
<history>
<date date-type="received">
<day>15</day>
<month>12</month>
<year>2016</year>
</date>
<date date-type="accepted">
<day>13</day>
<month>04</month>
<year>2017</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2017 P&#x000E9;rez-R&#x000FA;a, Basset and Bouthemy.</copyright-statement>
<copyright-year>2017</copyright-year>
<copyright-holder>P&#x000E9;rez-R&#x000FA;a, Basset and Bouthemy</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>We propose an original method for detecting and localizing anomalous motion patterns in videos from a camera view-based motion representation perspective. Anomalous motion should be taken in a broad sense, i.e., unexpected, abnormal, singular, irregular, or unusual motion. Identifying distinctive dynamic information at any time point and at any image location in a sequence of images is a key requirement in many situations and applications. The proposed method relies on so-called labeled affine flows (LAF) involving both affine velocity vectors and affine motion classes. At every pixel, a motion class is inferred from the affine motion model selected in a set of candidate models estimated over a collection of windows. Then, the image is subdivided in blocks where motion class histograms weighted by the affine motion vector magnitudes are computed. They are compared blockwise to histograms of normal behaviors with a dedicated distance. More specifically, we introduce the local outlier factor (LOF) to detect anomalous blocks. LOF is a local flexible measure of the relative density of data points in a feature space, here the space of LAF histograms. By thresholding the LOF value, we can detect an anomalous motion pattern in any block at any time instant of the video sequence. The threshold value is automatically set in each block by means of statistical arguments. We report comparative experiments on several real video datasets, demonstrating that our method is highly competitive for the intricate task of detecting different types of anomalous motion in videos. Specifically, we obtain very competitive results on all the tested datasets: 99.2% AUC for UMN, 82.8% AUC for UCSD, and 95.73% accuracy for PETS 2009, at the frame level.</p>
</abstract>
<kwd-group>
<kwd>video processing</kwd>
<kwd>affine flow</kwd>
<kwd>motion patterns</kwd>
<kwd>anomalous motion detection</kwd>
<kwd>local outlier factor</kwd>
</kwd-group>
<counts>
<fig-count count="12"/>
<table-count count="9"/>
<equation-count count="18"/>
<ref-count count="76"/>
<page-count count="19"/>
<word-count count="15191"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1" sec-type="introduction">
<label>1</label> <title>Introduction</title>
<p>Motion analysis, with all its possible branches, i.e., motion detection (Goyette et al., <xref ref-type="bibr" rid="B29">2014</xref>), motion estimation (Fortun et al., <xref ref-type="bibr" rid="B27">2015</xref>), motion segmentation (Zhang and Lu, <xref ref-type="bibr" rid="B71">2001</xref>), and motion recognition (Cedras and Shah, <xref ref-type="bibr" rid="B15">1995</xref>), is a key processing step for difficult tasks related to video analysis, such as activity recognition (Aggarwal and Ryoo, <xref ref-type="bibr" rid="B2">2011</xref>; Vishwakarma and Agrawal, <xref ref-type="bibr" rid="B63">2013</xref>; Li et al., <xref ref-type="bibr" rid="B45">2015b</xref>). However, there is a gap between low-level description of videos and high-level video understanding tasks. In this paper, we focus on the problem of detecting and localizing anomalous motion in videos. The detected anomalous motion can be further interpreted in accordance with the targeted application.</p>
<p>In a general setting, analysis of activities from videos requires automatic tools to tackle the tremendous amount of routinely acquired data from cameras installed in a wide range of contexts (Zhan et al., <xref ref-type="bibr" rid="B70">2008</xref>; Li et al., <xref ref-type="bibr" rid="B45">2015b</xref>). Motivations can be manifold depending on the applications: traffic monitoring, crowd safety in large social or sporting events, surveillance in public transportation areas, understanding of animal groups, etc. A common goal is to rapidly and reliably detect anomalous motion in the broad sense of irregular, abnormal, singular, unexpected, or unusual motion. Anomalous motion pertains to events of this type. This kind of activity analysis usually requires intense human supervision, all the more so when the objective is to identify anomalies in the scene. A particularly common setup for scenes where anomalies are sought consists of fixed-pose cameras pointing at scenes of interest. In these cases, the goal is to detect anomalies from the point of view of the camera. This task becomes even more difficult in crowded scenes, where the behavioral complexity in different parts of the video can cause confusion and distraction. Thus, the need for automatic systems that are able to assist the video monitoring of scenes has been growing steadily.</p>
<p>There is generally no unique or even intrinsic definition of an anomalous motion. It may depend on the context and the application. As in Chandola et al. (<xref ref-type="bibr" rid="B17">2009</xref>) and Hu et al. (<xref ref-type="bibr" rid="B31">2013</xref>), we consider in this work that anomalous motion means that the motion significantly differs from the mainstream one, observed either in the same video segment or in the whole video. Indeed, anomalous motion is taken here in the broad sense of a behavior that differs from its context. It does not mean that the so-called abnormal motion is necessarily malicious, dangerous, or forbidden. This formulation is general enough to be of broad practical interest. The presence of anomalous motion can be detected by deciding that the given motion cannot be fitted by a model of normal behaviors, learned from a set of training data for a given scenario or computed online. Local motion can also be assessed as anomalous by simply comparing its characteristics with others in its (possibly wide) spatial or spatiotemporal vicinity without any pre-computed model available.</p>
<p>The desired solution, however, has to comply with a number of requirements. First, the devised modeling has to be simple and generic enough so that it can be used in a wide range of applications. Second, the algorithm has to be fast. Computational efficiency is an important criterion with a view toward real-time implementation (Lu et al., <xref ref-type="bibr" rid="B47">2013</xref>), so as to supply timely information on where to focus when analyzing videos. Finally, anomalous event detection at the frame level alone does not provide enough information. As a consequence, the method has to be able to localize motion anomalies in the video both <italic>temporally</italic> and <italic>spatially</italic>.</p>
<p>In this paper, we present an original method for detecting <italic>and</italic> localizing anomalous motion in videos. It relies on novel motion descriptors consisting of histograms of local affine motion classes, weighted by affine flow magnitude and computed over image blocks. This type of histogram conveys more information than the usual histograms of motion vectors. A dedicated histogram distance is accordingly specified. At each pixel, the motion class is derived from the affine motion model selected among a set of candidate models estimated over a collection of overlapping windows of different sizes. Thus, the motion models selected over the image domain yield both an affine flow and a map of pixelwise motion classes, whose concatenation forms what we call the <italic>labeled affine flow</italic> (LAF). The latter conveys the real flow value <italic>and</italic> the affine motion class at every pixel. Since the concept of anomalous motion cannot be intrinsically defined, we need a decision criterion able to specify in a data-driven way the local singularity of the motion descriptor. Consequently, we propose the <italic>local outlier factor</italic> (LOF) to detect anomalous motion. LOF is a local flexible measure of the relative density of data points in a feature space (Breunig et al., <xref ref-type="bibr" rid="B12">2000</xref>). It was initially designed, and has so far been used, in application domains very different from computer vision. Here, the feature space is formed by the local block-based motion class histograms.</p>
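To make the LOF criterion concrete, the following is a minimal pure-NumPy sketch of the original formulation of Breunig et al. (2000), applied to generic feature vectors. It uses a plain Euclidean distance and an arbitrary neighborhood size k for illustration only; our method instead operates in the space of block-wise LAF histograms with a dedicated histogram distance, and function names here are illustrative.

```python
import numpy as np

def lof_scores(X, k=3):
    """Local outlier factor (Breunig et al., 2000) for each row of X.

    X: (n, d) array of feature vectors (here standing in for block histograms).
    Returns an (n,) array; values well above 1 flag points lying in regions
    sparser than those of their neighbors, i.e., local outliers.
    """
    # Pairwise Euclidean distances (illustrative choice of metric).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                  # a point is not its own neighbor
    knn = np.argsort(D, axis=1)[:, :k]           # indices of the k nearest neighbors
    knn_d = np.take_along_axis(D, knn, axis=1)   # distances to those neighbors
    k_dist = knn_d[:, -1]                        # k-distance of each point
    # Reachability distance of p w.r.t. neighbor o: max(k_dist(o), d(p, o)).
    reach = np.maximum(k_dist[knn], knn_d)
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)     # local reachability density
    # LOF: average lrd of the neighbors relative to the point's own lrd.
    return lrd[knn].mean(axis=1) / lrd
```

Thresholding the returned scores then yields the anomaly decision; in our method, the threshold is set per block by statistical arguments, as described in Section 4.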
<p>The overall method is a fully automated and generic method embedded in a block-based framework and able to <italic>jointly</italic> detect and localize anomalous motion. With the very same method, we can handle <italic>local</italic> anomalous motion, that is, local unusual behaviors compared to the other ones in the image, and <italic>global</italic> anomalous motion, that is, unusual behavior compared to previous ones, suddenly shared by all the actors of the scene. Our method does not involve any parametric model of normal behavior, nor of anomalous motion. It only requires that reference LAF histograms accounting for normal behavior are available, either pre-computed or computed online. We have tested our method on several video datasets depicting different types of applications.</p>
<p>The rest of the paper is organized as follows. In Section <xref ref-type="sec" rid="S2">2</xref>, we review the related literature and previous work on anomalous motion detection, specifically in the context of crowd anomaly detection. We explain how we compute the so-called labeled affine flow in Section <xref ref-type="sec" rid="S3">3</xref>. Then, in Section <xref ref-type="sec" rid="S4">4</xref>, we fully describe our anomalous motion detection-and-localization method and give insights about its main properties. In Section <xref ref-type="sec" rid="S5">5</xref>, we report a comparative objective evaluation on several video datasets with an application to crowd anomaly detection and dedicated experimental investigations on the two main stages of our method, that is, the LAF histograms and the LOF criterion. Finally, we offer concluding comments in Section <xref ref-type="sec" rid="S6">6</xref>.</p>
</sec>
<sec id="S2">
<label>2</label> <title>Related Work</title>
<p>While motion irregularities were studied <italic>per se</italic> in Boiman and Irani (<xref ref-type="bibr" rid="B10">2007</xref>), motion anomaly has been mainly investigated in the context of crowd anomaly detection. As a consequence, our description of the related work will be driven by this application, even though appearance features are often simultaneously exploited for that goal as in Mahadevan et al. (<xref ref-type="bibr" rid="B48">2010</xref>), Antic and Ommer (<xref ref-type="bibr" rid="B4">2011</xref>), Bertini et al. (<xref ref-type="bibr" rid="B8">2012</xref>), and Zhang et al. (<xref ref-type="bibr" rid="B72">2016</xref>).</p>
<p>Specialized descriptors have been designed to capture the dynamics of crowd motion from videos and have been used for a number of inference tasks in crowd analysis, such as categorizing crowd behaviors, finding principal paths, or detecting objects in video surveillance (Basharat et al., <xref ref-type="bibr" rid="B5">2008</xref>; Solmaz et al., <xref ref-type="bibr" rid="B58">2012</xref>; Thida et al., <xref ref-type="bibr" rid="B60">2013</xref>; Basset et al., <xref ref-type="bibr" rid="B6">2014</xref>; Li et al., <xref ref-type="bibr" rid="B45">2015b</xref>).</p>
<p>As for anomaly detection in crowd videos, several approaches have been explored. Some methods target specific scenarios, or are specialized for certain types of video data. For instance, escape behaviors can be considered as a specific case of anomaly in surveillance videos (Wu et al., <xref ref-type="bibr" rid="B68">2014</xref>). Particular urban group dynamics can also be viewed as a special case of anomaly detection in crowded videos. With this goal, the authors in Andersson et al. (<xref ref-type="bibr" rid="B3">2013</xref>) proposed an algorithm to detect disturbances caused by individuals merging into groups. Other works detect anomalies locally in videos without an explicit definition of what the abnormality is. Among these, two main classes are found: trajectory-based (Stauffer and Grimson, <xref ref-type="bibr" rid="B59">2000</xref>; Piciarelli et al., <xref ref-type="bibr" rid="B55">2008</xref>; Wu et al., <xref ref-type="bibr" rid="B67">2010</xref>; Jiang et al., <xref ref-type="bibr" rid="B35">2011</xref>; Zen et al., <xref ref-type="bibr" rid="B69">2012</xref>; Li et al., <xref ref-type="bibr" rid="B43">2013</xref>) and feature-based ones (Adam et al., <xref ref-type="bibr" rid="B1">2008</xref>; Kim and Grauman, <xref ref-type="bibr" rid="B37">2009</xref>; Kratz and Nishino, <xref ref-type="bibr" rid="B39">2009</xref>; Antic and Ommer, <xref ref-type="bibr" rid="B4">2011</xref>; Bertini et al., <xref ref-type="bibr" rid="B8">2012</xref>; Cong et al., <xref ref-type="bibr" rid="B21">2013</xref>; Hu et al., <xref ref-type="bibr" rid="B31">2013</xref>; Li et al., <xref ref-type="bibr" rid="B46">2014</xref>; Cheng et al., <xref ref-type="bibr" rid="B19">2015</xref>; Zhang et al., <xref ref-type="bibr" rid="B72">2016</xref>).</p>
<p><italic>Trajectory-based</italic> methods make use of the relevant information embedded in object tracks (Stauffer and Grimson, <xref ref-type="bibr" rid="B59">2000</xref>; Porikli and Haga, <xref ref-type="bibr" rid="B56">2004</xref>; Jiang et al., <xref ref-type="bibr" rid="B35">2011</xref>; Leach et al., <xref ref-type="bibr" rid="B41">2014</xref>). Nevertheless, these methods are usually constrained to scenes where it is possible to perform foreground tracking; otherwise, they are subject to a large number of false positives, as pointed out by Adam et al. (<xref ref-type="bibr" rid="B1">2008</xref>). In Wu et al. (<xref ref-type="bibr" rid="B67">2010</xref>), representative trajectories are first extracted after particle advection, and chaotic features are exploited. The normality is modeled by a Gaussian mixture model. Comparing the maximum likelihood (ML) estimate to a predefined threshold makes it possible to label frames as normal or abnormal. Then, anomalies are located in frames identified as abnormal, with some success on the dataset of the University of Minnesota (Papanikolopoulos, <xref ref-type="bibr" rid="B53">2005</xref>).</p>
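As an illustration of this likelihood-thresholding scheme, the sketch below fits a Gaussian model to features of normal training frames and flags test frames whose log-likelihood falls below a predefined threshold. This is a deliberate simplification of the cited approach: Wu et al. (2010) use a full Gaussian mixture over chaotic features, whereas a single Gaussian is fitted here, and all function names and the threshold value are illustrative.

```python
import numpy as np

def fit_normal_model(train_feats):
    """Fit a Gaussian over per-frame features of normal behavior.

    A single component is used here for simplicity; the cited work uses
    a Gaussian mixture. A small ridge keeps the covariance invertible.
    """
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False) + 1e-6 * np.eye(train_feats.shape[1])
    return mu, cov

def frame_log_likelihood(feats, mu, cov):
    """Gaussian log-likelihood of each row of feats under (mu, cov)."""
    d = feats - mu
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    k = feats.shape[1]
    return -0.5 * (np.einsum('ij,jk,ik->i', d, inv, d)
                   + logdet + k * np.log(2 * np.pi))

def is_abnormal(feats, mu, cov, threshold):
    """Frames whose likelihood falls below the predefined threshold
    are labeled abnormal."""
    return frame_log_likelihood(feats, mu, cov) < threshold
```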
<p>A different approach was investigated in Mehran et al. (<xref ref-type="bibr" rid="B50">2009</xref>), still based on particle trajectories. Interaction forces between particles are introduced, which yield a force flow in every frame. Recognizing normal frames and abnormal ones in the video sequence is achieved using a bag-of-words approach involving a latent Dirichlet allocation (LDA) model. Anomalies are delineated in abnormal frames as regions with high force flow. A similar idea to the interaction forces is presented by Leach et al. (<xref ref-type="bibr" rid="B41">2014</xref>), where hand-crafted features and metrics computed from individual human tracks are used to detect anomalies.</p>
<p>The method described in Cui et al. (<xref ref-type="bibr" rid="B23">2011</xref>) relied on tracked key points to calculate interaction energy potentials and to separate normal and abnormal crowd behaviors with a support vector machine (SVM) classifier. The work in Piciarelli et al. (<xref ref-type="bibr" rid="B55">2008</xref>) follows a similar classification approach, but it starts from trajectory-based clustering to model normal behaviors.</p>
<p>A non-parametric Bayesian framework is designed in Wang et al. (<xref ref-type="bibr" rid="B65">2011</xref>), which can be used to detect anomalous trajectories. Trajectories are described as bags of words, composed of quantized positions and directions. A dual hierarchical Dirichlet process (Dual-HDP; Wang et al., <xref ref-type="bibr" rid="B66">2009</xref>) is defined to cluster both words and trajectories. Unlikely trajectories are considered as anomalous ones.</p>
<p>On the other hand, <italic>feature-based</italic> approaches are less prone to depend on specific scenarios and have been tested on a wide range of datasets. In Kratz and Nishino (<xref ref-type="bibr" rid="B39">2009</xref>), spatiotemporal intensity gradients are used, whose distribution over patches in normal situations is supposed to be Gaussian. The Gaussian parameters are learned on the training set. In Kim and Grauman (<xref ref-type="bibr" rid="B37">2009</xref>), a mixture of probabilistic principal component analysis (MPPCA) aims at modeling normal flow patterns, estimated over patches of the training video set.</p>
<p>The method of Chockalingam et al. (<xref ref-type="bibr" rid="B20">2013</xref>) builds upon probabilistic latent sequential models (PLSM), previously defined by the authors in Varadarajan et al. (<xref ref-type="bibr" rid="B61">2007</xref>), to detect and localize anomalous motion. These enhanced topic models, which automatically find temporal and spatial co-occurrences of words, are learned on long image sequences in which anomalous events happen. The spatiotemporal compositions (STC) method (Roshtkhari and Levine, <xref ref-type="bibr" rid="B57">2013</xref>) requires about a hundred initialization frames to start learning weights of so-called code words representing normal behaviors. Afterward, weights are updated online so that no other training sequences are required.</p>
<p>In Benezeth et al. (<xref ref-type="bibr" rid="B7">2011</xref>), co-occurrence matrices for key pixels are embedded in a Markov random field formulation to describe the probability of abnormalities. Zhong et al. (<xref ref-type="bibr" rid="B75">2004</xref>) also use co-occurrence matrices, but in an unsupervised setting.</p>
<p>Mixtures of dynamic textures (MDT) are introduced in Li et al. (<xref ref-type="bibr" rid="B46">2014</xref>) with conditional random fields (CRF) to represent crowd behaviors. By exploiting both appearance and motion, they reported successful results on several datasets, but at the cost of sophisticated models that require intensive learning and high computation time.</p>
<p>Other authors focused on explicitly including spatial awareness, by subdividing the image into local regions or blocks, in order to obtain good detection performance with lighter learning requirements (Boiman and Irani, <xref ref-type="bibr" rid="B10">2007</xref>; Adam et al., <xref ref-type="bibr" rid="B1">2008</xref>).</p>
<p>Another approach was explored in Antic and Ommer (<xref ref-type="bibr" rid="B4">2011</xref>). Vectors of spatiotemporal derivatives were used as input to an SVM classifier with a linear kernel to support the foreground separation process. The latter feeds a graphical probabilistic model. Very good results were obtained on the UCSD dataset (Li et al., <xref ref-type="bibr" rid="B46">2014</xref>). However, this method depends heavily on how well the foreground elements of a video dataset are separated, which undermines its applicability to very crowded scenes.</p>
<p>Social force models based on the optical flow of particles, as introduced in Mehran et al. (<xref ref-type="bibr" rid="B50">2009</xref>), are another example of descriptors used to detect anomalies. Building on the social force concept, Zhang et al. (<xref ref-type="bibr" rid="B73">2015</xref>) introduced the so-called social attribute awareness to model crowds&#x02019; interaction and to detect anomalies. In a similar fashion, Lee et al. (<xref ref-type="bibr" rid="B42">2015</xref>) used a feature constructed over motion influence maps within a per-block codebook approach to detect anomalies in crowd videos.</p>
<p>Sparse representations have been increasingly adopted for anomaly detection, as the problem can be elegantly modeled with sparse linear combinations of representations in a training dataset (Zhao et al., <xref ref-type="bibr" rid="B74">2011</xref>; Cong et al., <xref ref-type="bibr" rid="B21">2013</xref>; Li et al., <xref ref-type="bibr" rid="B43">2013</xref>; Zhu et al., <xref ref-type="bibr" rid="B76">2014</xref>). Explicit image space subdivision can also benefit anomaly localization performance in sparse representation-based methods (Biswas and Babu, <xref ref-type="bibr" rid="B9">2014</xref>). It is shown in Mo et al. (<xref ref-type="bibr" rid="B51">2014</xref>) that, by introducing non-linearity into the sparse model, better data separation can be achieved. Also, some modifications can be made to the usual construction of the sparsity models by introducing small-scale least-square optimization steps (Lu et al., <xref ref-type="bibr" rid="B47">2013</xref>), sacrificing accuracy for the benefit of a fast implementation. However, although elegant and sound, sparse representation methods have not shown high performance in more demanding datasets for anomaly localization.</p>
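The core of such sparse-representation approaches can be illustrated with a toy sketch: descriptors of normal behavior form the columns of a dictionary, a test descriptor is greedily approximated by a sparse combination of atoms (a bare-bones matching-pursuit loop here, not the specific optimizations used in the cited works), and a large reconstruction residual signals an anomaly. All names and parameter values are illustrative.

```python
import numpy as np

def omp(D, y, n_nonzero=3):
    """Greedy orthogonal matching pursuit: approximate y with a sparse
    combination of dictionary columns (atoms). Assumes n_nonzero >= 1."""
    residual, support = y.copy(), []
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        # Re-fit the coefficients on the selected atoms by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    return support, coef, residual

def anomaly_score(D, y, n_nonzero=3):
    """Relative reconstruction error: a high value means y is poorly
    explained by the normal-behavior dictionary, hence likely anomalous."""
    *_, residual = omp(D, y, n_nonzero)
    return np.linalg.norm(residual) / (np.linalg.norm(y) + 1e-12)
```

Thresholding this score reproduces, in miniature, the detection rule of dictionary-based methods: descriptors well reconstructed by normal atoms score near 0, while descriptors outside the span of normal behavior score near 1.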
<p>The method presented in Hu et al. (<xref ref-type="bibr" rid="B31">2013</xref>) exploits optic flow measurements only and is fully unsupervised. It introduces a semiparametric likelihood test computed on a given window and outside the window to decide if the content of the tested window contains abnormal motion or not. Competitive results are reported, especially on the crowd anomaly UCSD dataset. However, the exhaustive search within a large number of space-time windows of different shapes and sizes is highly time consuming. Thus, a fast scanning variant is proposed which exploits histograms of flow words and fixed space&#x02013;time elementary blocks.</p>
<p>On the other hand, anomalous motion is somehow related to the concept of motion saliency. Spatiotemporal saliency in videos has attracted growing interest in recent years (Mahadevan and Vasconcelos, <xref ref-type="bibr" rid="B49">2010</xref>; Georgiadis et al., <xref ref-type="bibr" rid="B28">2012</xref>; Fang et al., <xref ref-type="bibr" rid="B24">2014</xref>; Huang et al., <xref ref-type="bibr" rid="B32">2014</xref>; Jiang et al., <xref ref-type="bibr" rid="B36">2014</xref>; Kim and Kim, <xref ref-type="bibr" rid="B38">2014</xref>; Li et al., <xref ref-type="bibr" rid="B44">2015a</xref>; Wang et al., <xref ref-type="bibr" rid="B64">2015</xref>). Here again, motion saliency features are often combined with spatial saliency features. However, the respective goals can diverge. Indeed, saliency detection is more concerned with moving objects of interest in a scene, or even with the single primary moving object in the scene, and not necessarily with anomalous motion. The notion of surprising event described in Itti and Baldi (<xref ref-type="bibr" rid="B34">2005</xref>) is perhaps more in line with our general definition of anomalous motion. Salient event detection in videos was addressed in Hospedales et al. (<xref ref-type="bibr" rid="B30">2012</xref>) based on a Markov clustering topic model.</p>
<p>Undoubtedly, the literature related to anomalous motion detection is extensive and comprises a growing number of algorithms and tools. However, accurately detecting and locating motion anomalies while remaining generic about what constitutes an anomaly is still a challenging task. Within this body of work, we present a novel method that can be classified as feature based and data driven. More exactly, we introduce a simple, yet powerful local motion descriptor, which is well suited to handle anomalous motion. Then, we exploit a non-parametric feature-density criterion to detect and localize anomalous motion. We explain our method in depth hereafter.</p>
</sec>
<sec id="S3">
<label>3</label> <title>Labeled Affine Flow</title>
<sec id="S3-1">
<label>3.1</label> <title>Affine Motion Models</title>
<p>We need to extract motion measurements from the video sequence in order to determine the type of local motions in the image and decide on their nature (normal or anomalous). Several alternative options could be adopted: local space&#x02013;time features, optic flow fields, or tracklets as outlined in Section <xref ref-type="sec" rid="S2">2</xref>. We adopt the computation of affine flow. Parametric motion models are easier to estimate; they can account for local and global motions as well and provide readily exploitable information for classification. To overcome the motion segmentation issue entangled with parametric motion estimation (i.e., computing the motion model on the correct support), we use a collection of windows, as we proposed in Basset et al. (<xref ref-type="bibr" rid="B6">2014</xref>). However, the purpose in Basset et al. (<xref ref-type="bibr" rid="B6">2014</xref>) was to extract the main crowd motion patterns in the image in order to globally characterize the movements of the crowd. Here, our goal is different since we are interested in detecting local anomalous motions if any. Accordingly, we made substantial modifications to the algorithm described in Basset et al. (<xref ref-type="bibr" rid="B6">2014</xref>). For instance, in contrast to Basset et al. (<xref ref-type="bibr" rid="B6">2014</xref>), we exploit affine motion magnitude to weight motion class histograms. All the improvements will be pointed out throughout the subsequent description.</p>
<p>The collection of affine motion models estimated over the collection of windows provides us with a set of motion candidates at every point <italic>p</italic>&#x02009;&#x0003D;&#x02009;(<italic>x</italic>, <italic>y</italic>) in the image domain &#x003A9;, that is, the velocity vectors supplied by the affine motion models at <italic>p</italic>. There are as many candidates at <italic>p</italic> as windows containing point <italic>p</italic>. We will have to select the right candidate as explained below. The advantage of these motion measurements is that they are robustly estimated from two consecutive frames only, while anticipating the needs of the subsequent classification.</p>
<p>As aforementioned, taking a collection of predefined windows allows us to circumvent the complex issue of motion-based image segmentation into regions. The collection <inline-formula><mml:math id="M1"><mml:mi mathvariant="script">W</mml:mi></mml:math></inline-formula> consists of overlapping windows of four different sizes, 12.5, 25, 50, and 100% of the image dimensions, to handle motions at different scales. An additional smaller size is considered, compared to Basset et al. (<xref ref-type="bibr" rid="B6">2014</xref>), to better capture local independent motions. For a given size, the window overlap rate is 50% in both the horizontal and vertical directions, so that a given point <italic>p</italic>&#x02009;&#x02208;&#x02009;&#x003A9; belongs to four windows of that size (apart from image border effects). In order to mitigate the rectangular block artifacts induced by the subdivision mechanism, we add a small random modification to the width and height of each window. An illustration is given in Figure <xref ref-type="fig" rid="F1">1</xref> for three window sizes only, for the sake of readability.</p>
<fig position="float" id="F1">
<label>Figure 1</label>
<caption><p><bold>Illustration of three window sizes (respectively plotted in red, green, and blue) with an overlap rate of 50%</bold>. Any point <italic>p</italic> in the image domain belongs to a subset of windows of different size.</p></caption>
<graphic xlink:href="fict-04-00010-g001.tif"/>
</fig>
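The construction of the window collection described above can be sketched as follows. Pixel-coordinate conventions, the jitter amplitude, and the rounding choices below are illustrative assumptions, not the exact implementation.

```python
import numpy as np

def window_collection(h, w, scales=(0.125, 0.25, 0.5, 1.0), jitter=0.05, seed=0):
    """Build a collection of overlapping estimation windows.

    For each scale, windows span scale*100% of the image dimensions and are
    placed with a stride of half the window size (50% overlap), so that any
    interior pixel falls into four windows of that size. A small random
    perturbation of the width/height (+/- jitter * size, an illustrative
    amplitude) mitigates blocky artifacts.
    """
    rng = np.random.default_rng(seed)
    windows = []  # each entry is (x0, y0, x1, y1) in pixel coordinates
    for s in scales:
        wh, ww = max(1, int(round(s * h))), max(1, int(round(s * w)))
        for y0 in range(0, h - wh // 2, max(1, wh // 2)):
            for x0 in range(0, w - ww // 2, max(1, ww // 2)):
                dh = int(rng.integers(-int(jitter * wh), int(jitter * wh) + 1))
                dw = int(rng.integers(-int(jitter * ww), int(jitter * ww) + 1))
                y1 = min(h, y0 + wh + dh)  # clip jittered windows to the image
                x1 = min(w, x0 + ww + dw)
                windows.append((x0, y0, x1, y1))
    return windows
```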
<p>A static camera configuration is assumed, as is the usual situation in the targeted applications, but extension to a mobile camera could be considered, for instance, by compensating beforehand for the dominant image motion due to the camera motion. In order to minimize the computational load, we first extract the binary mask of moving objects in every frame by means of a motion detection algorithm. To this end, we use our motion detection method by background subtraction described in Crivelli et al. (<xref ref-type="bibr" rid="B22">2011</xref>), which also builds upon Veit et al. (<xref ref-type="bibr" rid="B62">2011</xref>). We denote by &#x003D2;(<italic>t</italic>) the set of moving pixels extracted at time instant <italic>t</italic>, with &#x003D2;(<italic>t</italic>)&#x02009;&#x02282;&#x02009;&#x003A9;.</p>
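To give a feel for this step, the snippet below approximates the set &#x003D2;(<italic>t</italic>) of moving pixels by simple background differencing. This is only a crude stand-in: the actual method of Crivelli et al. (2011) is a far more robust statistical background-subtraction scheme, and the threshold value here is an arbitrary illustrative choice.

```python
import numpy as np

def moving_pixel_mask(frame_t, frame_ref, thresh=15):
    """Crude stand-in for the motion-detection step: approximate the set of
    moving pixels at time t by thresholding the absolute intensity difference
    against a reference (background) frame. Not the method of Crivelli et al.
    (2011), which performs robust statistical background subtraction.
    """
    diff = np.abs(frame_t.astype(np.int32) - frame_ref.astype(np.int32))
    return diff > thresh  # boolean mask: True where the pixel is deemed moving
```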
<p>We assume that affine motion models are sufficient to represent image motion with the view of anomalous motion detection. The velocity vector of point <italic>p</italic>&#x02009;&#x02208;&#x02009;&#x003D2;(<italic>t</italic>) at time instant <italic>t</italic> given by the affine motion model of parameters <italic>&#x003B8;</italic>(<italic>t</italic>)&#x02009;&#x0003D;&#x02009;(<italic>b</italic><sub>1</sub>(<italic>t</italic>), <italic>a</italic><sub>1</sub>(<italic>t</italic>), <italic>a</italic><sub>2</sub>(<italic>t</italic>), <italic>b</italic><sub>2</sub>(<italic>t</italic>), <italic>a</italic><sub>3</sub>(<italic>t</italic>), <italic>a</italic><sub>4</sub>(<italic>t</italic>)), reads
<disp-formula id="E1"><label>(1)</label><mml:math id="M2"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>&#x003B8;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>&#x003B8;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>&#x003BD;</mml:mi><mml:mrow><mml:mi>&#x003B8;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>x</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mn>3</mml:mn></mml:msub><mml:mo 
stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>x</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mn>4</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>We further assume that, at the proper scale, image motion can be represented by one of the three following specific affine motion models: translation (<italic>T</italic>), scaling (<italic>S</italic>), and rotation (<italic>R</italic>). As explained above, the three types of 2D motion models are computed in a collection of predefined windows. We use the robust estimation method of Odobez and Bouthemy (<xref ref-type="bibr" rid="B52">1995</xref>), implemented in the publicly available Motion2D software<xref ref-type="fn" rid="fn1"><sup>1</sup></xref>, to compute these parametric motion models.</p>
<p>Let us denote by <inline-formula><mml:math id="M3"><mml:mi mathvariant="script">W</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M4"><mml:mi mathvariant="script">M</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></inline-formula>, and &#x00398;(<italic>p</italic>, <italic>t</italic>), respectively, the set of windows from the collection &#x1D4B2; containing point <italic>p</italic>, the set of motion models computed at time instant <italic>t</italic> within the windows of <inline-formula><mml:math id="M5"><mml:mi mathvariant="script">W</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></inline-formula> and supplying candidate velocity vectors at <italic>p</italic>, and the associated set of parameter values of these motion models. 
We have <inline-formula><mml:math id="M6"><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mi mathvariant="script">M</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mspace width="1em" class="nbsp"/><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mn>3</mml:mn><mml:mtext>&#x02009;</mml:mtext><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mi mathvariant="script">W</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo></mml:math></inline-formula>, where &#x0007C;.&#x0007C; denotes the set cardinality, since three 2D motion models (<italic>T</italic>, <italic>R</italic> and <italic>S</italic>, as defined above) are computed in each window of <inline-formula><mml:math id="M7"><mml:mi mathvariant="script">W</mml:mi><mml:mtext>(</mml:mtext><mml:mi>p</mml:mi><mml:mtext>)</mml:mtext></mml:math></inline-formula>. 
We have <inline-formula><mml:math id="M8"><mml:mn>&#x00398;</mml:mn><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:mfenced><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfenced separators="" open="{" close="}"><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mn>1</mml:mn><mml:mo class="MathClass-op">&#x02026;</mml:mo><mml:mn>&#x0007C;</mml:mn><mml:mi mathvariant="script">M</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mn>&#x0007C;</mml:mn></mml:mrow></mml:mfenced></mml:math></inline-formula>. 
With the aforementioned choices on the number of window sizes and the overlap rate, we have <inline-formula><mml:math id="M9"><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mi mathvariant="script">W</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mn>4</mml:mn><mml:mo class="MathClass-bin">&#x000D7;</mml:mo><mml:mn>4</mml:mn><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mn>16</mml:mn></mml:math></inline-formula> and <inline-formula><mml:math id="M10"><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mi mathvariant="script">M</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mn>48</mml:mn></mml:math></inline-formula>.</p>
<p>In the sequel, for the sake of notational simplicity, we drop the reference to time instant <italic>t</italic>. We aim to find the most relevant motion model at <italic>p</italic>&#x02009;&#x02208;&#x02009;&#x003D2; among the candidates specified by &#x00398;(<italic>p</italic>). We take the displaced frame difference as the fitting variable to test each motion model <italic>k</italic> of <inline-formula><mml:math id="M11"><mml:mi mathvariant="script">M</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></inline-formula> at <italic>p</italic>:
<disp-formula id="E2"><label>(2)</label><mml:math id="M12"><mml:mi>&#x003F5;</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-bin">&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-punc">,</mml:mo></mml:math></disp-formula>
where <italic>I<sub>t</sub></italic>(<italic>p</italic>) denotes the intensity at <italic>p</italic> at time instant <italic>t</italic>, and <inline-formula><mml:math id="M13"><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the velocity vector given by the motion model <italic>k</italic> at <italic>p</italic>. Let us specify expression (1) for the three motion types. For the <italic>T</italic>-motion type,
<disp-formula id="E3"><mml:math id="M14"><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mspace width="0.5em" class="nbsp"/><mml:mtext>with</mml:mtext><mml:mspace width="0.5em" class="nbsp"/><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-punc">;</mml:mo></mml:math></disp-formula>
for the <italic>S</italic>-motion type,
<disp-formula id="E4"><mml:math id="M15"><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-bin">&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mi>x</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-bin">&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mi>y</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo 
class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-punc">;</mml:mo></mml:math></disp-formula>
and for the <italic>R</italic>-motion type,
<disp-formula id="E5"><mml:math id="M16"><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-bin">&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mi>y</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mi>x</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo 
class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-punc">.</mml:mo></mml:math></disp-formula></p>
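The three reduced parameterizations above can be evaluated directly at any point. A minimal sketch follows; the function name and the tuple layout of the parameter vector are assumptions made for illustration.

```python
def velocity(model, theta, x, y):
    """Velocity (u, v) at point (x, y) for the three reduced affine
    motion types: translation T, scaling S, and rotation R."""
    if model == "T":                      # theta_k = (b1, b2)
        b1, b2 = theta
        return (b1, b2)
    if model == "S":                      # theta_k = (b1, b2, a1)
        b1, b2, a1 = theta
        return (b1 + a1 * x, b2 + a1 * y)
    if model == "R":                      # theta_k = (b1, b2, a2)
        b1, b2, a2 = theta
        return (b1 + a2 * y, b2 - a2 * x)
    raise ValueError("unknown motion type: %r" % model)
```

Each branch matches the corresponding displayed formula: a constant vector for <italic>T</italic>, a radial field of coefficient <italic>a</italic><sub>1,<italic>k</italic></sub> for <italic>S</italic>, and a rotational field of coefficient <italic>a</italic><sub>2,<italic>k</italic></sub> for <italic>R</italic>.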
</sec>
<sec id="S3-2">
<label>3.2</label> <title>Selection among the Motion Model Candidates</title>
<p>The optimal motion model at <italic>p</italic> should best fit the real (unknown) local motion at <italic>p</italic> while being of the lowest possible complexity. We consider a local neighborhood <italic>&#x003BD;</italic>(<italic>p</italic>) centered at <italic>p</italic>, and we exploit the fitting variable (2), which is likely to be close to 0 for the correct velocity vector, owing to the intensity constancy constraint exploited in optical flow computation (Fortun et al., <xref ref-type="bibr" rid="B27">2015</xref>). Let us assume that the <inline-formula><mml:math id="M17"><mml:mi>&#x003F5;</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>q</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></inline-formula>&#x02019;s are independent and identically distributed (i.i.d.) variables over points <italic>q</italic>&#x02009;&#x02208;&#x02009;<italic>&#x003BD;</italic>(<italic>p</italic>)&#x02009;&#x02229;&#x02009;&#x003D2; and follow a zero-mean Gaussian law of variance <inline-formula><mml:math id="M18"><mml:msubsup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula>. Then, we can write the joint likelihood in the neighborhood <italic>&#x003BD;</italic>(<italic>p</italic>) for each motion model <italic>k</italic>:
<disp-formula id="E6"><label>(3)</label><mml:math id="M19"><mml:mi>&#x003D5;</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:msqrt><mml:mrow><mml:mn>2</mml:mn><mml:mi>&#x003C0;</mml:mi><mml:msubsup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:msqrt></mml:mrow><mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mi>&#x003BD;</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x02229;</mml:mo><mml:mo class="MathClass-op">&#x003D2;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mfrac><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x0220F;</mml:mo><mml:mrow><mml:mi>q</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>&#x003BD;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x02229;</mml:mo><mml:mi>&#x003D2;</mml:mi></mml:mrow></mml:munder></mml:mstyle></mml:mrow><mml:mtext>&#x02009;</mml:mtext><mml:mi mathvariant="normal">exp</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x003F5;</mml:mi><mml:msup><mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>q</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo 
class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:msubsup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac><mml:mo class="MathClass-punc">.</mml:mo></mml:math></disp-formula></p>
<p>The variance <inline-formula><mml:math id="M20"><mml:msubsup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> is estimated from the inliers of the motion model <italic>k</italic> within the window used for robustly estimating it. To penalize the complexity of the motion model, i.e., the dimension of the model given by the number of its parameters, we resort to the Akaike information criterion (AIC) with a correction for finite sample sizes (Cavanaugh, <xref ref-type="bibr" rid="B14">1997</xref>). The correction is especially useful when the sample size is small, which is precisely the case here for the neighborhood <italic>&#x003BD;</italic>(<italic>p</italic>). The penalized criterion reads as follows:
<disp-formula id="E7"><label>(4)</label><mml:math id="M21"><mml:mi mathvariant="italic">AICc</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>2</mml:mn><mml:mtext>&#x02009;</mml:mtext><mml:mi mathvariant="normal">ln</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>&#x003D5;</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x0002B;</mml:mo><mml:mn>2</mml:mn><mml:msub><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-bin">&#x0002B;</mml:mo><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:msub><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-bin">&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mi>&#x003BD;</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x02229;</mml:mo><mml:mo 
class="MathClass-op">&#x003D2;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mfrac><mml:mo class="MathClass-punc">,</mml:mo></mml:math></disp-formula>
where <italic>&#x003B7;<sub>k</sub></italic> is the dimension of the motion model <italic>k</italic>, that is, <italic>&#x003B7;<sub>k</sub></italic>&#x02009;&#x0003D;&#x02009;2 for the <italic>T</italic>-motion model, and <italic>&#x003B7;<sub>k</sub></italic>&#x02009;&#x0003D;&#x02009;3 for the <italic>S</italic>- and <italic>R</italic>-motion models. Finally, the optimal motion model <inline-formula><mml:math id="M117"><mml:mover accent='true'><mml:mi>k</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover></mml:math></inline-formula> at <italic>p</italic> is
<disp-formula id="E8"><label>(5)</label><mml:math id="M22"><mml:mover accent='true'><mml:mi>k</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mtext>arg</mml:mtext><mml:munder accentunder="true"><mml:mrow><mml:mi mathvariant="normal">min</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x02208;</mml:mo><mml:mn>&#x00398;</mml:mn><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:munder><mml:mtext>&#x02009;</mml:mtext><mml:mi mathvariant="italic">AICc</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-punc">,</mml:mo></mml:math></disp-formula>
which minimizes criterion (4).</p>
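The selection step combines the Gaussian likelihood (3), the corrected criterion (4), and the minimization (5). A compact sketch follows, assuming the DFD residuals over the neighborhood and the variances are supplied (in the text, each variance is estimated from the inliers of the corresponding model); the function names are hypothetical.

```python
import math

def aicc(residuals, sigma2, eta):
    """Corrected Akaike criterion for one candidate motion model:
    -2 ln(phi) + 2*eta + 2*eta*(eta + 1)/(n - eta - 1), where phi is the
    joint zero-mean Gaussian likelihood of the DFD residuals and n the
    neighborhood size |nu(p) inter Upsilon|."""
    n = len(residuals)
    log_phi = sum(-0.5 * math.log(2.0 * math.pi * sigma2)
                  - r * r / (2.0 * sigma2) for r in residuals)
    return -2.0 * log_phi + 2.0 * eta + 2.0 * eta * (eta + 1.0) / (n - eta - 1.0)

def select_model(candidates):
    """Minimization over the candidate set: candidates is a list of tuples
    (name, residuals over nu(p), sigma2, eta); returns the best name."""
    return min(candidates, key=lambda c: aicc(c[1], c[2], c[3]))[0]

# A translation with small residuals should beat a rotation with large ones,
# since the T-model is also cheaper (eta = 2 versus eta = 3).
best = select_model([("T", [0.1] * 20, 1.0, 2),
                     ("R", [5.0] * 20, 1.0, 3)])
```

Note how the correction term grows as the neighborhood shrinks toward <italic>&#x003B7;<sub>k</sub></italic>&#x02009;+&#x02009;1 samples, which is precisely why the finite-sample correction matters for small <italic>&#x003BD;</italic>(<italic>p</italic>).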
<p>From the motion models selected at pixels <italic>p</italic>&#x02009;&#x02208;&#x02009;&#x003D2;, we obtain the affine flow <inline-formula><mml:math id="M23"><mml:mrow><mml:mo class="MathClass-open">&#x0007B;</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mover accent='true'><mml:mi>k</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>p</mml:mi><mml:mo class="MathClass-rel">&#x02208;</mml:mo><mml:mo class="MathClass-op">&#x003D2;</mml:mo></mml:mrow><mml:mo class="MathClass-close">&#x0007D;</mml:mo></mml:mrow></mml:math></inline-formula>.</p>
</sec>
<sec id="S3-3">
<label>3.3</label> <title>Determination of Motion Classes</title>
<p>We now have to assign its motion class to each <italic>p</italic>&#x02009;&#x02208;&#x02009;&#x003D2;. The motion classes will be used to compute the motion descriptors, which are the input of our anomalous motion detection method. As explained in subsection <xref ref-type="sec" rid="S3-2">3.2</xref>, we have selected the right motion model at each point <italic>p</italic>&#x02009;&#x02208;&#x02009;&#x003D2; among the estimated motion model candidates with the penalized likelihood given by the corrected Akaike information criterion for small sample sets defined in equation (<xref ref-type="disp-formula" rid="E7">4</xref>).</p>
<p>As already stated, the different image motions are assumed to be well captured by three affine motion types: translation (<italic>T</italic>), scaling (<italic>S</italic>), and rotation (<italic>R</italic>), in a view-based representation. Motion classes are straightforwardly inferred from the motion types, as summarized in Tables <xref ref-type="table" rid="T1">1</xref> and <xref ref-type="table" rid="T2">2</xref>. More specifically, the translation type is subdivided into motion classes indicating the direction of the translation in the image, since we have adopted a motion representation corresponding to the camera viewpoint. The scaling type is split into two classes, called Convergence and Divergence, according to the sign of the divergence coefficient. Finally, the rotation type is subdivided into Clockwise and Counterclockwise motion classes. Aimed at crowd analysis tasks other than anomaly detection, the crowd motion classification introduced in Basset et al. (<xref ref-type="bibr" rid="B6">2014</xref>) comprised only four translation classes (i.e., North, East, South, and West). Here, we introduce a finer orientation quantization with eight translation directions, since we need a finer characterization of the movement for anomalous motion detection. We come up with a set of twelve motion classes, denoted by &#x00393;&#x02009;&#x0003D;&#x02009;{&#x003B3;<italic><sub>l</sub></italic>, <italic>l</italic>&#x02009;&#x0003D;&#x02009;1,&#x02009;&#x02026;,&#x02009;12}. From now on, the motion classes will be represented in the figures by the color codes given in Table <xref ref-type="table" rid="T1">1</xref>.</p>
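The mapping from selected motion types to the twelve classes can be sketched as follows. The angle quantization via <monospace>atan2</monospace> is equivalent to acos-based sector rules up to sign conventions; the conventions assumed here (positive <italic>b</italic><sub>2</sub> pointing North, positive rotation coefficient meaning Clockwise) are illustrative assumptions.

```python
import math

DIRECTIONS = ["East", "North East", "North", "North West",
              "West", "South West", "South", "South East"]

def motion_class(model, theta):
    """Map a selected motion model to one of the twelve motion classes:
    eight translation directions, Convergence/Divergence for scaling,
    Clockwise/Counterclockwise for rotation."""
    if model == "T":
        b1, b2 = theta
        # Direction of translation, quantized into 8 sectors of pi/4 rad,
        # each centered on a compass direction (sector 0 centered on East)
        angle = math.atan2(b2, b1) % (2 * math.pi)
        sector = int((angle + math.pi / 8) // (math.pi / 4)) % 8
        return DIRECTIONS[sector]
    if model == "S":
        return "Divergence" if theta[2] > 0 else "Convergence"
    if model == "R":
        return "Clockwise" if theta[2] > 0 else "Counterclockwise"
    raise ValueError("unknown motion type: %r" % model)
```

A pure rightward translation thus falls in the East class, a pure upward one in the North class, and the scaling/rotation classes depend only on the sign of the single linear coefficient.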
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p><bold>Definition of motion types, motion classes, and color codes</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Motion types</th>
<th align="center">Motion classes</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Translation</td>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i001.tif"/></td>
</tr>
<tr>
<td align="left" colspan="2"><hr/></td>
</tr>
<tr>
<td align="left">Scaling</td>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i002.tif"/> Convergence</td>
</tr>
<tr>
<td align="left"/>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i003.tif"/> Divergence</td>
</tr>
<tr>
<td align="left" colspan="2"><hr/></td>
</tr>
<tr>
<td align="left">Rotation</td>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i004.tif"/> Clockwise</td>
</tr>
<tr>
<td align="left"/>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i005.tif"/> Counterclockwise</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p><bold>Rules for determining motion classes from motion types</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Motion types</th>
<th align="center" colspan="3">Motion classes<hr/></th>
</tr>
<tr>
<th align="center"/>
<th align="center"/>
<th align="center">Orientations</th>
<th align="center">Rules</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Translation</td>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i006.tif"/></td>
<td align="center">North</td>
<td align="center"><inline-formula><mml:math id="M24"><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext>2</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003E;</mml:mo><mml:mtext>0</mml:mtext></mml:math></inline-formula>, <inline-formula><mml:math id="M25"><mml:mfrac><mml:mrow><mml:mtext>3</mml:mtext><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mtext>8</mml:mtext></mml:mrow></mml:mfrac><mml:mo class="MathClass-rel">&#x0003C;</mml:mo><mml:mtext>acos&#x02009;</mml:mtext><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mrow> <mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext>1</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mfenced><mml:mo class="MathClass-rel">&#x02264;</mml:mo><mml:mfrac><mml:mrow><mml:mtext>5</mml:mtext><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mtext>8</mml:mtext></mml:mrow></mml:mfrac></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left"/>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i007.tif"/></td>
<td align="center">North West</td>
<td align="center"><inline-formula><mml:math id="M26"><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext>2</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003E;</mml:mo><mml:mtext>0</mml:mtext></mml:math></inline-formula>, <inline-formula><mml:math id="M27"><mml:mfrac><mml:mrow><mml:mtext>5</mml:mtext><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mtext>8</mml:mtext></mml:mrow></mml:mfrac><mml:mo class="MathClass-rel">&#x0003C;</mml:mo><mml:mtext>acos&#x02009;</mml:mtext><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mrow> <mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext>1</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mfenced><mml:mo class="MathClass-rel">&#x02264;</mml:mo><mml:mfrac><mml:mrow><mml:mtext>7</mml:mtext><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mtext>8</mml:mtext></mml:mrow></mml:mfrac></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left"/>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i008.tif"/></td>
<td align="center">West</td>
<td align="center"><inline-formula><mml:math id="M28"><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext>1</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003C;</mml:mo><mml:mtext>0</mml:mtext></mml:math></inline-formula>, <inline-formula><mml:math id="M29"><mml:mfrac><mml:mrow><mml:mtext>3</mml:mtext><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mtext>8</mml:mtext></mml:mrow></mml:mfrac><mml:mo class="MathClass-rel">&#x0003C;</mml:mo><mml:mtext>acos&#x02009;</mml:mtext><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mrow> <mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext>2</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mfenced><mml:mo class="MathClass-rel">&#x02264;</mml:mo><mml:mfrac><mml:mrow><mml:mtext>5</mml:mtext><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mtext>8</mml:mtext></mml:mrow></mml:mfrac></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left"/>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i009.tif"/></td>
<td align="center">South West</td>
<td align="center"><inline-formula><mml:math id="M30"><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext>1</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003C;</mml:mo><mml:mtext>0</mml:mtext></mml:math></inline-formula>, <inline-formula><mml:math id="M31"><mml:mfrac><mml:mrow><mml:mtext>5</mml:mtext><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mtext>8</mml:mtext></mml:mrow></mml:mfrac><mml:mo class="MathClass-rel">&#x0003C;</mml:mo><mml:mtext>acos&#x02009;</mml:mtext><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mrow> <mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext>2</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mfenced><mml:mo class="MathClass-rel">&#x02264;</mml:mo><mml:mfrac><mml:mrow><mml:mtext>7</mml:mtext><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mtext>8</mml:mtext></mml:mrow></mml:mfrac></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left"/>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i010.tif"/></td>
<td align="center">South</td>
<td align="center"><inline-formula><mml:math id="M32"><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext>2</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003C;</mml:mo><mml:mtext>0</mml:mtext></mml:math></inline-formula>, <inline-formula><mml:math id="M33"><mml:mfrac><mml:mrow><mml:mtext>3</mml:mtext><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mtext>8</mml:mtext></mml:mrow></mml:mfrac><mml:mo class="MathClass-rel">&#x0003C;</mml:mo><mml:mtext>acos&#x02009;</mml:mtext><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:mfrac><mml:mrow><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext>1</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mfenced><mml:mo class="MathClass-rel">&#x02264;</mml:mo><mml:mfrac><mml:mrow><mml:mtext>5</mml:mtext><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mtext>8</mml:mtext></mml:mrow></mml:mfrac></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left"/>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i011.tif"/></td>
<td align="center">South East</td>
<td align="center"><inline-formula><mml:math id="M34"><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext>2</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003C;</mml:mo><mml:mtext>0</mml:mtext></mml:math></inline-formula>, <inline-formula><mml:math id="M35"><mml:mfrac><mml:mrow><mml:mtext>5</mml:mtext><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mtext>8</mml:mtext></mml:mrow></mml:mfrac><mml:mo class="MathClass-rel">&#x0003C;</mml:mo><mml:mtext>acos&#x02009;</mml:mtext><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:mfrac><mml:mrow><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext>1</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mfenced><mml:mo class="MathClass-rel">&#x02264;</mml:mo><mml:mfrac><mml:mrow><mml:mn>7</mml:mn><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mn>8</mml:mn></mml:mrow></mml:mfrac></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left"/>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i012.tif"/></td>
<td align="center">East</td>
<td align="center"><inline-formula><mml:math id="M36"><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext>1</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003E;</mml:mo><mml:mtext>0</mml:mtext></mml:math></inline-formula>, <inline-formula><mml:math id="M37"><mml:mfrac><mml:mrow><mml:mtext>3</mml:mtext><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mtext>8</mml:mtext></mml:mrow></mml:mfrac><mml:mo class="MathClass-rel">&#x0003C;</mml:mo><mml:mtext>acos&#x02009;</mml:mtext><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:mfrac><mml:mrow><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext>2</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mfenced><mml:mo class="MathClass-rel">&#x02264;</mml:mo><mml:mfrac><mml:mrow><mml:mtext>5</mml:mtext><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mtext>8</mml:mtext></mml:mrow></mml:mfrac></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left"/>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i013.tif"/></td>
<td align="center">North East</td>
<td align="center"><inline-formula><mml:math id="M38"><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext>1</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003E;</mml:mo><mml:mtext>0</mml:mtext></mml:math></inline-formula>, <inline-formula><mml:math id="M39"><mml:mfrac><mml:mrow><mml:mtext>5</mml:mtext><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mtext>8</mml:mtext></mml:mrow></mml:mfrac><mml:mo class="MathClass-rel">&#x0003C;</mml:mo><mml:mtext>acos&#x02009;</mml:mtext><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:mfrac><mml:mrow><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext>2</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mfenced><mml:mo class="MathClass-rel">&#x02264;</mml:mo><mml:mfrac><mml:mrow><mml:mn>7</mml:mn><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mn>8</mml:mn></mml:mrow></mml:mfrac></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left" colspan="4"><hr/></td>
</tr>
<tr>
<td align="left">Scaling</td>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i001.tif"/></td>
<td align="center">Convergence</td>
<td align="center"><inline-formula><mml:math id="M40"><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mtext>1</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003C;</mml:mo><mml:mtext>0</mml:mtext></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left"/>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i002.tif"/></td>
<td align="center">Divergence</td>
<td align="center"><inline-formula><mml:math id="M41"><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mtext>1</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003E;</mml:mo><mml:mtext>0</mml:mtext></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left" colspan="4"><hr/></td>
</tr>
<tr>
<td align="left">Rotation</td>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i003.tif"/></td>
<td align="center">Clockwise</td>
<td align="center"><inline-formula><mml:math id="M42"><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mtext>2</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003C;</mml:mo><mml:mtext>0</mml:mtext></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left"/>
<td align="center"><inline-graphic xlink:href="fict-04-00010-i004.tif"/></td>
<td align="center">Counterclockwise</td>
<td align="center"><inline-formula><mml:math id="M43"><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mtext>2</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003E;</mml:mo><mml:mtext>0</mml:mtext></mml:math></inline-formula></td>
</tr>
</tbody>
</table>
</table-wrap>
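<p>To make the class assignment concrete, the rules of Table <xref ref-type="table" rid="T2">2</xref> can be sketched in code. The following is a minimal, illustrative sketch, not the authors' implementation: the hypothetical variables a1, a2, b1, and b2 stand for the coefficients of the selected affine motion model, and the eight translation sectors of the table are obtained through an equivalent atan2-based quantization (sectors of width &#x003C0;/4 centered on the compass directions), assuming the acos tests reduce to the direction of the translation vector (b1, b2).</p>

```python
import math

# Compass names for the 8 translation sectors, ordered by angle
# (East = 0, North = pi/2, ...). Names follow Table 2.
TRANSLATION_NAMES = ["East", "North East", "North", "North West",
                     "West", "South West", "South", "South East"]

def classify_motion(motion_type, a1, a2, b1, b2):
    """Map a selected affine motion model to one of the 12 motion classes."""
    if motion_type == "Scaling":
        # Sign of the divergence-related coefficient (Table 2)
        return "Convergence" if a1 < 0 else "Divergence"
    if motion_type == "Rotation":
        # Sign of the curl-related coefficient (Table 2)
        return "Clockwise" if a2 < 0 else "Counterclockwise"
    # Translation: quantize the direction of (b1, b2) into 8 sectors of
    # width pi/4 centered on the compass directions.
    angle = math.atan2(b2, b1)
    sector = int(round(angle / (math.pi / 4))) % 8
    return TRANSLATION_NAMES[sector]
```

<p>For instance, a translation with b1&#x02009;&#x0003D;&#x02009;0 and b2&#x02009;&#x0003E;&#x02009;0 falls in the North class, in agreement with the first row of Table <xref ref-type="table" rid="T2">2</xref>.</p>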
<p>The motion classification map <inline-formula><mml:math id="M44"><mml:mi mathvariant="script">L</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></inline-formula> is determined by applying the rules summarized in Table <xref ref-type="table" rid="T2">2</xref>. These rules are based on the signs of the parameters of the selected motion models and on simple functions of these parameters. Each <inline-formula><mml:math id="M45"><mml:mi mathvariant="script">L</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></inline-formula> value is one of the twelve motion classes &#x003B3;<italic><sub>l</sub></italic> of the set &#x00393;. This is illustrated in Figure <xref ref-type="fig" rid="F2">2</xref>. In contrast to Basset et al. (<xref ref-type="bibr" rid="B6">2014</xref>), we do not regularize <inline-formula><mml:math id="M46"><mml:mi mathvariant="script">L</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></inline-formula> by a vote procedure, since we precisely aim to detect local anomalous motion. The local motion information must not be smoothed out. 
If we process an image sequence of &#x1D4AF; successive images, we obtain <inline-formula><mml:math id="M47"><mml:mi mathvariant="script">T</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> successive motion classification maps <inline-formula><mml:math id="M48"><mml:mi mathvariant="script">L</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-punc">,</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:mi>t</mml:mi><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mn>1</mml:mn><mml:mtext>&#x02009;</mml:mtext><mml:mo class="MathClass-op">&#x02026;</mml:mo><mml:mi mathvariant="script">T</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula>.</p>
<fig position="float" id="F2">
<label>Figure 2</label>
<caption><p><bold>Overview of the computation of the labeled affine flow in several real video sequences</bold>. <bold>(A)</bold> Input images. <bold>(B)</bold> Motion detection maps &#x003D2;(<italic>t</italic>). <bold>(C)</bold> Affine flow deduced from the selected motion models, colored according to the standard color code for optical flow maps, which is the continuous version of the quantized color code used for the Translation classes in Table <xref ref-type="table" rid="T1">1</xref>. <bold>(D)</bold> Map &#x02112;(<italic>t</italic>) of motion classes for each pixel, colored according to Table <xref ref-type="table" rid="T1">1</xref>.</p></caption>
<graphic xlink:href="fict-04-00010-g002.tif"/>
</fig>
<p>We coin the term <italic>Labeled Affine Flow</italic> (LAF) to emphasize that, from the selected motion model at <italic>p</italic>&#x02009;&#x02208;&#x02009;&#x003D2;, we have not only computed an affine flow vector at <italic>p</italic> but have jointly determined its motion class. The labeled affine flow is defined at each time instant <italic>t</italic> of the video sequence by <inline-formula><mml:math id="M49"><mml:mrow><mml:mo class="MathClass-open">&#x0007B;</mml:mo><mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B8;</mml:mi><mml:msub><mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mover accent='true'><mml:mi>k</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-punc">,</mml:mo><mml:mi mathvariant="script">L</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>p</mml:mi><mml:mo class="MathClass-rel">&#x02208;</mml:mo><mml:mtext></mml:mtext><mml:mo class="MathClass-op">&#x003D2;</mml:mo><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mo 
class="MathClass-close">&#x0007D;</mml:mo></mml:mrow></mml:math></inline-formula>.</p>
<p>A possible extension to the classification process would be to include more motion classes (e.g., by first adding the motion type TRS&#x02014;Translation plus Rotation plus Scale&#x02014;and then the corresponding motion classes). For our target application, however, handling too many motion classes would have detrimental effects. In particular, it might affect the statistical relevance of the motion class histograms and make the discrimination of local anomalies difficult. It would also require more training data covering all the possible motion combinations.</p>
</sec>
</sec>
<sec id="S4">
<label>4</label> <title>Detection and Localization of Anomalous Motion</title>
<p>Our anomalous motion detection-and-localization method relies on local motion classes derived from the pixelwise selected motion types. We compute block-based motion-weighted histograms of these motion classes as motion descriptors to characterize local motions. We use non-overlapping blocks as illustrated in Figure <xref ref-type="fig" rid="F3">3</xref>. If required, overlapping blocks could be used as well to increase the spatial accuracy of the anomalous motion detection at the expense of computational load, down to a pixelwise detection with a one-pixel stride in the block generation. We then adopt a density-based measure in the histogram space to detect anomalous motion. The pipeline is explained in detail hereafter.</p>
<fig position="float" id="F3">
<label>Figure 3</label>
<caption><p><bold>Spatial blocks are introduced to allow localization of anomalous motion</bold>. The images are respectively taken from the UMN and UCSD datasets.</p></caption>
<graphic xlink:href="fict-04-00010-g003.tif"/>
</fig>
<sec id="S4-4">
<label>4.1</label> <title>Local LAF Histograms</title>
<p>As noted in Section <xref ref-type="sec" rid="S2">2</xref>, feature-based methods for anomaly detection that benefit from the direct use of spatial information achieve better performance. For this purpose, we split the image, and consequently the scene since the camera is static, into spatial blocks <inline-formula><mml:math id="M50"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, with <italic>i</italic>&#x02009;&#x0003D;&#x02009;1&#x02009;&#x02026;&#x02009;<italic>B</italic>. This subdivides the anomalous motion detection task into multiple sub-problems (see Figure <xref ref-type="fig" rid="F3">3</xref>). We introduce an original motion histogram, which we call the LAF histogram. The LAF histogram is computed for every block <inline-formula><mml:math id="M51"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> at time <italic>t</italic>, and is denoted by <inline-formula><mml:math id="M52"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>. It corresponds to a weighted motion class histogram. More specifically, the LAF histogram <inline-formula><mml:math id="M53"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is constructed by summing a function <italic>&#x003C8;</italic>(<italic>p</italic>, <italic>t</italic>, <italic>l</italic>) over block <inline-formula><mml:math id="M54"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> within a short time interval around time instant <italic>t</italic>. 
We define <italic>&#x003C8;</italic>(.) as follows:
<disp-formula id="E9"><label>(6)</label><mml:math id="M55"><mml:mi>&#x003C8;</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>t</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>l</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfenced separators="" open="{" close=""><mml:mrow><mml:mtable class="array"><mml:mtr><mml:mtd class="array" columnalign="left"><mml:mo class="MathClass-rel">&#x0007C;&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mo class="MathClass-rel">&#x0007C;&#x0007C;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mspace width="1em" class="quad"/></mml:mtd><mml:mtd class="array" columnalign="left"><mml:mtext>if</mml:mtext><mml:mspace width="0.5em" class="nbsp"/><mml:mi mathvariant="script">L</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:msub><mml:mrow><mml:mn>&#x003B3;</mml:mn></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-punc">,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array" columnalign="left"><mml:mn>0</mml:mn><mml:mspace width="1em" class="quad"/></mml:mtd><mml:mtd class="array" 
columnalign="left"><mml:mtext>otherwise</mml:mtext><mml:mo class="MathClass-punc">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mfenced></mml:math></disp-formula>
where <italic>k<sub>l</sub></italic> denotes the motion model selected at <italic>p</italic> with equation (<xref ref-type="disp-formula" rid="E5">5</xref>) and associated with the motion class &#x003B3;<italic><sub>l</sub></italic>. As defined by equation (<xref ref-type="disp-formula" rid="E6">6</xref>), the weight added to bin <italic>l</italic> of the histogram is the magnitude of the affine flow vector at <italic>p</italic>. Thus, we simultaneously take into account both the motion magnitude and the motion class to detect anomalous motion. As mentioned above, the affine motion magnitude was not exploited in Basset et al. (<xref ref-type="bibr" rid="B6">2014</xref>). For each bin <italic>l</italic> of the histogram, we set:
<disp-formula id="E10"><label>(7)</label><mml:math id="M56"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mi>&#x0212C;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:munder></mml:mstyle></mml:mrow><mml:mtext>&#x02009;</mml:mtext><mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>&#x003C4;</mml:mi><mml:mo>=</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover></mml:mstyle></mml:mrow><mml:mtext>&#x02009;</mml:mtext><mml:mi>&#x003C8;</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:mn>&#x003C4;</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>l</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-punc">,</mml:mo></mml:math></disp-formula>
involving the previous (<italic>t</italic>&#x02009;&#x02212;&#x02009;1) and following (<italic>t</italic>&#x02009;&#x0002B;&#x02009;1) frames to build the histogram at time <italic>t</italic>. This procedure enables us to capture short-term temporal behaviors in a given block. The motion magnitude weights the importance of a given motion class (or bin of the histogram). In this way, we capture several essential aspects of motion in a single descriptor, allowing us to distinguish anomalies by their speed as well as their movement direction. Admittedly, a small fast-moving object and a large slow one may lead to similar histograms if they undergo exactly the same type of motion and if the ratios in speed and size are strictly equal, but this is very unlikely to occur.</p>
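<p>The construction of equations (6) and (7) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes the per-pixel motion class map and the affine flow magnitude map have already been computed for the frames <italic>t</italic>&#x02009;&#x02212;&#x02009;1, <italic>t</italic>, and <italic>t</italic>&#x02009;&#x0002B;&#x02009;1, with class indices 0 to 11 and a negative index outside the motion support region.</p>

```python
import numpy as np

N_CLASSES = 12  # |Gamma|, the number of motion classes

def laf_histogram(labels, magnitudes, block):
    """Weighted motion class histogram h_i^t for one spatial block B_i.

    labels, magnitudes: arrays of shape (3, H, W), holding the motion class
        map L and the affine flow magnitude ||w|| for frames t-1, t, t+1.
    block: (y0, y1, x0, x1) bounds of block B_i.
    """
    y0, y1, x0, x1 = block
    h = np.zeros(N_CLASSES)
    for lab, mag in zip(labels, magnitudes):  # sum over tau = t-1, t, t+1
        lab = lab[y0:y1, x0:x1].ravel()
        mag = mag[y0:y1, x0:x1].ravel()
        for l in range(N_CLASSES):
            # psi(p, tau, l): flow magnitude where L(p, tau) = gamma_l, else 0
            h[l] += mag[lab == l].sum()
    return h
```

<p>Each bin thus accumulates the flow magnitudes of the pixels of the block assigned to the corresponding motion class over the three frames.</p>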
</sec>
<sec id="S4-5">
<label>4.2</label> <title>Dedicated LAF Histogram Distance</title>
<p>We now specify the appropriate distance to compare two LAF histograms. Let us take two LAF histograms computed in the same block <inline-formula><mml:math id="M57"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> for two different images, referenced as <italic>&#x003B1;</italic> and <italic>&#x003B2;</italic>. They could be, for instance, the test histogram and a training one. In fact, this distance combines two distances, since we first separate each histogram into two sub-histograms. The first involves the eight translation motion classes, and the second the four classes related to scaling and rotation motions. The motivation for this separation is given below. The two sub-histograms are denoted by <italic>&#x003BA;<sub>i</sub></italic> and &#x003B6;<italic><sub>i</sub></italic>, respectively.</p>
<p>For the translation-related sub-histogram, we adopt the modulo distance <italic>D<sub>mod</sub></italic>(&#x022C5;, &#x022C5;) introduced in Cha and Srihari (<xref ref-type="bibr" rid="B16">2002</xref>) for circular histograms. This is precisely the case for the translation sub-histogram, since the eight translation classes are defined by compass orientations (see Table <xref ref-type="table" rid="T1">1</xref>). There is no closed-form way to compute the modulo distance, but an algorithm is available, the pseudocode of which can be found in Cha and Srihari (<xref ref-type="bibr" rid="B16">2002</xref>). The modulo distance plays a key role as it is higher between opposite directions than between adjacent ones. It truly emphasizes discrepancy and closeness between translation motion classes, and it is less sensitive to the orientation quantization. On the other hand, the sub-histograms related to scaling and rotation motion classes are compared with an <italic>L</italic><sub>1</sub> distance. We found that the <italic>L</italic><sub>1</sub> distance was the best choice after experimentally comparing several usual histogram distances. The overall dissimilarity measure between two LAF histograms <inline-formula><mml:math id="M58"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M59"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is then defined as follows:
<disp-formula id="E11"><label>(8)</label><mml:math id="M60"><mml:mi>D</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup><mml:mo class="MathClass-punc">,</mml:mo><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">mod</mml:mi></mml:mrow></mml:msub><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:msubsup><mml:mrow><mml:mn>&#x003BA;</mml:mn></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup><mml:mo class="MathClass-punc">,</mml:mo><mml:msubsup><mml:mrow><mml:mn>&#x003BA;</mml:mn></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:mfenced><mml:mo class="MathClass-bin">&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:msubsup><mml:mrow><mml:mn>&#x003B6;</mml:mn></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup><mml:mo class="MathClass-punc">,</mml:mo><mml:msubsup><mml:mrow><mml:mn>&#x003B6;</mml:mn></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:mfenced><mml:mspace width="0.3em" class="thinspace"/><mml:mo class="MathClass-punc">,</mml:mo></mml:math></disp-formula>
with equally weighted summands, since the ranges of the modulo and <italic>L</italic><sub>1</sub> distances are similar as explained in Cha and Srihari (<xref ref-type="bibr" rid="B16">2002</xref>).</p>
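As an illustrative sketch (not the authors' code), the combined dissimilarity of equation (8) can be computed as follows. For the modulo distance, we use the median-of-prefix-sums formulation of the circular earth mover's distance as a stand-in for the algorithm of Cha and Srihari (2002); the bin layout (the 8 translation bins placed first, followed by the scaling and rotation bins) is an assumption for the example.

```python
from statistics import median

def modulo_distance(p, q):
    # Circular histogram distance: cost of moving mass on a ring of bins.
    # Known equivalent form: min over c of sum_i |S_i - c|, where S holds
    # the prefix sums of (p - q); the optimal c is the median of S.
    s, acc = [], 0.0
    for a, b in zip(p, q):
        acc += a - b
        s.append(acc)
    m = median(s)
    return sum(abs(v - m) for v in s)

def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def laf_histogram_distance(h_a, h_b, n_trans=8):
    # Eq. (8): modulo distance on the 8 translation bins plus L1 distance
    # on the remaining scaling/rotation bins, equally weighted.
    return (modulo_distance(h_a[:n_trans], h_b[:n_trans])
            + l1_distance(h_a[n_trans:], h_b[n_trans:]))
```

On one-hot translation histograms, opposite compass bins yield a distance of 4 while adjacent bins yield 1, which illustrates the property stated above.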
</sec>
<sec id="S4-6">
<label>4.3</label> <title>Local Outlier Factor</title>
<p>We face a specific situation when formulating the anomalous motion detection criterion: we have no prior models of what normal and abnormal behaviors should be. Training a classifier could be a possibility, since it is easy to collect training examples of normal behaviors. However, available examples of anomalous motion are usually few, they can be of different kinds, and, in particular, they may be unexpected. Furthermore, we want our method to be applicable online as well, without any previous learning stage. This advocates for a purely data-driven decision process to detect anomalous motion. Since the large majority of LAF histograms are likely to correspond to normal behaviors, a LAF histogram corresponding to anomalous motion should be an outlier in the feature space where all computed LAF histograms are collected. An outlier could be characterized by its distance to clusters of normal behavior, but this would require performing clustering in a high-dimensional space without knowing the number of clusters or their respective shapes (inner distributions). A more attractive approach is therefore to compare the local density of features (i.e., histograms) around the test histogram with the local feature densities of its nearest neighbors.</p>
<p>To do this, we resort to the local outlier factor (LOF) which is precisely a measure to discriminate anomalous data in a dataset. LOF was proposed to detect anomalies in the e-commerce field (Breunig et al., <xref ref-type="bibr" rid="B12">2000</xref>). It has subsequently been used to detect anomalies for other kinds of problems, such as network intrusions (Lazarevic et al., <xref ref-type="bibr" rid="B40">2003</xref>). However, LOF has not been exploited so far for solving computer vision problems. We will demonstrate its utility for anomalous motion detection in videos.</p>
<p>The rationale behind the LOF measure is that outliers can be detected by identifying points of the feature space that have a low data density around them with respect to their neighbors. This approach allows one to specify abnormality in a local and relative way, and is thus quite flexible. It is illustrated in Figure <xref ref-type="fig" rid="F4">4</xref>, which includes two clusters, <italic>C</italic><sub>1</sub> and <italic>C</italic><sub>2</sub>, and a few other data points, such as <inline-formula><mml:math id="M61"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M62"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup></mml:math></inline-formula>, assumed to be outliers (i.e., anomalous motion for our problem). If we merely thresholded the distance of each data point to the cluster centroids, setting the threshold value would be tricky and prone to errors. Indeed, several points of cluster <italic>C</italic><sub>1</sub> could, for instance, be incorrectly labeled as anomalous, or, conversely, <inline-formula><mml:math id="M63"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup></mml:math></inline-formula> could be interpreted as an inlier.</p>
<fig position="float" id="F4">
<label>Figure 4</label>
<caption><p><bold>(A)</bold> Illustrative case where a dataset is mainly shaped as two clusters (data points in green and blue), with a few clear outliers in red, including <inline-formula><mml:math id="M64"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M65"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup></mml:math></inline-formula>. In our case, the data points in the feature space of a single block are LAF histograms of weighted motion classes. <bold>(B)</bold> Detail for the outlier data point <inline-formula><mml:math id="M66"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup></mml:math></inline-formula>, where the local reachability density of the data points is encoded with circles. The neighborhood is given by the <italic>k</italic>-nearest histograms, with <italic>k</italic>&#x02009;&#x0003D;&#x02009;3 in this illustration. The outlier point is surrounded by a circle (in blue) that is clearly bigger than those of its neighbors (circles in orange).</p></caption>
<graphic xlink:href="fict-04-00010-g004.tif"/>
</fig>
<p>By contrast, the LOF measure assigns an &#x0201C;<italic>outlierity</italic>&#x0201D; score to every point in the dataset and does not require determining any clusters. This score is calculated as the average local density attached to the data points within the neighborhood of the considered point, divided by its own local density. The notion is illustrated in Figure <xref ref-type="fig" rid="F4">4</xref>B by encoding the density measure as a circle that contains the <italic>k</italic>-nearest neighbors of a given point. The larger the circle, the smaller the density measure. It can be observed that inliers are surrounded by smaller circles than the outlier points. The formal definition of this density is supplied later in this section. Thus, if a data point of the feature space is assigned a low density compared to the data points in its neighborhood, its <italic>outlierity</italic> score is high.</p>
<p>The neighborhood of a data point in the feature space is given by its <italic>k</italic>-nearest neighbors for the distance of equation (<xref ref-type="disp-formula" rid="E11">8</xref>). In this sense, the neighborhood geometry is locally adaptive. Although the distance of a tested data point to a &#x0201C;normal&#x0201D; point may be smaller than the distance between two other &#x0201C;normal&#x0201D; points, the tested data point can still be classified as outlier/anomalous if its closest neighbors exhibit a higher density. As an example, in Figure <xref ref-type="fig" rid="F4">4</xref>, although the distance between data point <inline-formula><mml:math id="M67"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup></mml:math></inline-formula> and the points of cluster <italic>C</italic><sub>2</sub> is similar to the distance between pairs of data points in cluster <italic>C</italic><sub>1</sub>, <inline-formula><mml:math id="M68"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup></mml:math></inline-formula> can still be detected as outlier/anomalous, since its density is compared mainly to data points that belong to <italic>C</italic><sub>2</sub>. The same reasoning applies to the data points of cluster <italic>C</italic><sub>1</sub>, allowing them to be labeled as inlier/normal.</p>
<p>More formally, we need first to introduce the <italic>reachability distance</italic> (Breunig et al., <xref ref-type="bibr" rid="B12">2000</xref>) of a histogram <inline-formula><mml:math id="M69"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> from another histogram <inline-formula><mml:math id="M70"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>:
<disp-formula id="E12"><label>(9)</label><mml:math id="M71"><mml:msub><mml:mrow><mml:mi>&#x003C6;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup><mml:mo class="MathClass-punc">,</mml:mo><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mi mathvariant="normal">max</mml:mi><mml:mtext>&#x02009;</mml:mtext><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:mi>D</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup><mml:mo class="MathClass-punc">,</mml:mo><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>D</mml:mi><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup><mml:mo class="MathClass-punc">,</mml:mo><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced><mml:mspace width="0.3em" 
class="thinspace"/><mml:mo class="MathClass-punc">,</mml:mo></mml:math></disp-formula>
where <italic>k</italic>(<italic>&#x003B2;</italic>) denotes the <italic>k</italic>-th nearest neighbor of <italic>&#x003B2;</italic>, and <italic>D</italic>(&#x022C5;, &#x022C5;) the distance introduced in equation (<xref ref-type="disp-formula" rid="E11">8</xref>). Let us stress that <italic>&#x003C6;<sub>k</sub></italic>(&#x022C5;, &#x022C5;) is not a true distance, since it is not symmetric. Hence, <inline-formula><mml:math id="M72"><mml:msub><mml:mrow><mml:mi>&#x003C6;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup><mml:mo class="MathClass-punc">,</mml:mo><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></inline-formula> is not expected to be equal to <inline-formula><mml:math id="M73"><mml:msub><mml:mrow><mml:mi>&#x003C6;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup><mml:mo class="MathClass-punc">,</mml:mo><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></inline-formula>.</p>
<p>We denote the set of <italic>k</italic>-nearest neighbors of a given histogram <inline-formula><mml:math id="M74"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> by <inline-formula><mml:math id="M75"><mml:msub><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></inline-formula>. The <italic>local reachability density</italic> <inline-formula><mml:math id="M76"><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></inline-formula> is defined as the inverse of the average reachability distance of the histogram <inline-formula><mml:math id="M77"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> from its neighbors (Breunig et al., <xref ref-type="bibr" rid="B12">2000</xref>):
<disp-formula id="E13"><label>(10)</label><mml:math id="M78"><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:msup><mml:mrow><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:mfrac><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mtext>&#x02009;</mml:mtext><mml:msubsup><mml:mi>h</mml:mi><mml:mi>i</mml:mi><mml:mi>&#x003B7;</mml:mi></mml:msubsup><mml:mo>&#x02208;</mml:mo><mml:mtext>&#x02009;</mml:mtext><mml:msub><mml:mi>&#x1D4A9;</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mi>h</mml:mi><mml:mi>i</mml:mi><mml:mi>&#x003B1;</mml:mi></mml:msubsup><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:munder></mml:mstyle></mml:mrow><mml:mtext>&#x02009;</mml:mtext><mml:msub><mml:mrow><mml:mi>&#x003C6;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup><mml:mo class="MathClass-punc">,</mml:mo><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:mfenced></mml:mrow><mml:mrow><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo 
class="MathClass-punc">,</mml:mo></mml:math></disp-formula>
since the cardinality of <inline-formula><mml:math id="M79"><mml:msub><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></inline-formula> equals <italic>k</italic>. Let us recall that <inline-formula><mml:math id="M80"><mml:msub><mml:mrow><mml:mi>&#x003C6;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup><mml:mo class="MathClass-punc">,</mml:mo><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></inline-formula> is the reachability distance of <inline-formula><mml:math id="M81"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> from <inline-formula><mml:math id="M82"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, not to be confused with <inline-formula><mml:math id="M83"><mml:msub><mml:mrow><mml:mi>&#x003C6;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow></mml:msubsup><mml:mo class="MathClass-punc">,</mml:mo><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></inline-formula>.</p>
<p>The local outlier factor <italic>LOF<sup>k</sup></italic> (the superscript <italic>k</italic> expresses the use of the <italic>k</italic>-nearest neighbors) is then defined to compare the local reachability density of a given histogram with those of its own neighbors, as follows (Breunig et al., <xref ref-type="bibr" rid="B12">2000</xref>):
<disp-formula id="E14"><label>(11)</label><mml:math id="M84"><mml:msup><mml:mrow><mml:mi mathvariant="italic">LOF</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mo class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow></mml:msubsup><mml:mo class="MathClass-rel">&#x02208;</mml:mo><mml:mtext></mml:mtext><mml:msub><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mtext>&#x02009;</mml:mtext><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo 
class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mspace width="1em" class="nbsp"/><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mo class="MathClass-punc">.</mml:mo></mml:math></disp-formula></p>
<p>In other words, the local outlier factor captures the ratio between the average local reachability density of the neighbors of a given histogram and its own density. It yields values close to one if the average density of the neighbors is similar to the histogram&#x02019;s own density. Conversely, higher values indicate possible outliers.</p>
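For concreteness, equations (9)&#x02013;(11) can be sketched with the following brute-force implementation (not the authors' code; the `dist` argument stands for the dissimilarity <italic>D</italic> of equation (8), points are stored in a flat list, and ties in the neighbor ranking are broken arbitrarily):

```python
def knn(i, pts, dist, k):
    # Indices of the k nearest neighbors of pts[i] (excluding itself).
    others = [j for j in range(len(pts)) if j != i]
    return sorted(others, key=lambda j: dist(pts[i], pts[j]))[:k]

def reach_dist(a, b, pts, dist, k):
    # Eq. (9): reachability distance of pts[a] from pts[b],
    # the max of the k-distance of b and the plain distance D(a, b).
    k_dist_b = dist(pts[b], pts[knn(b, pts, dist, k)[-1]])
    return max(k_dist_b, dist(pts[a], pts[b]))

def lrd(a, pts, dist, k):
    # Eq. (10): local reachability density, the inverse of the average
    # reachability distance of pts[a] from its k nearest neighbors.
    nbrs = knn(a, pts, dist, k)
    return len(nbrs) / sum(reach_dist(a, b, pts, dist, k) for b in nbrs)

def lof(i, pts, dist, k):
    # Eq. (11): average neighbor density divided by the point's own density.
    nbrs = knn(i, pts, dist, k)
    return (sum(lrd(b, pts, dist, k) for b in nbrs)
            / (len(nbrs) * lrd(i, pts, dist, k)))
```

On a toy one-dimensional dataset with an isolated point, the isolated point receives a LOF well above one, while cluster points stay near one, as stated above.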
</sec>
<sec id="S4-7">
<label>4.4</label> <title>Test for Detecting Anomalous Motion</title>
<p>Our goal is to classify every block in every frame of the video sequence into two classes, namely, &#x0201C;normal motion&#x0201D; and &#x0201C;anomalous motion.&#x0201D; As explained in Section <xref ref-type="sec" rid="S1">1</xref>, the large majority of blocks are likely to correspond to &#x0201C;normal motion.&#x0201D; In order to perform this classification, we measure the LOF of the current LAF histogram in a given block and compare it to a threshold.</p>
<p>As explained in the previous section, the LOF measure outputs a value by comparing the local density around the test LAF histogram in the LAF histogram collection with the densities of its <italic>k</italic>-nearest neighbors. The LOF measure is constructed independently for each block <inline-formula><mml:math id="M85"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>.</p>
<p>Specifically, we decide that a LAF histogram <inline-formula><mml:math id="M86"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> computed in block <inline-formula><mml:math id="M87"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> at time instant <italic>t</italic> corresponds to an anomalous motion if
<disp-formula id="E15"><label>(12)</label><mml:math id="M88"><mml:mi mathvariant="italic">LO</mml:mi><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003E;</mml:mo><mml:msub><mml:mrow><mml:mn>&#x003BB;</mml:mn></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-punc">,</mml:mo></mml:math></disp-formula>
where <inline-formula><mml:math id="M89"><mml:mi mathvariant="italic">LO</mml:mi><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is the local outlier factor computed in the subspace of LAF histograms accumulated over time in block <inline-formula><mml:math id="M90"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, taking into account the <italic>k</italic>-nearest neighbors.</p>
<p>Each &#x003BB;<italic><sub>i</sub></italic> is automatically inferred from a <italic>p</italic>-value, denoted by <italic>&#x003BE;</italic>, on specific statistics of every block <inline-formula><mml:math id="M91"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. In fact, we want &#x003BB;<italic><sub>i</sub></italic> to control the number of wrongly classified blocks. In order to do this, we exploit the computed LOF values corresponding to normal motion (for instance, using a training dataset comprising only normal motion cases). Thus, for every block <inline-formula><mml:math id="M92"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, a distribution of LOF values is stored. As shown in P&#x000E9;cot et al. (<xref ref-type="bibr" rid="B54">2015</xref>), we can set:
<disp-formula id="E16"><label>(13)</label><mml:math id="M93"><mml:msub><mml:mrow><mml:mn>&#x003BB;</mml:mn></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-bin">&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-bin">&#x02215;</mml:mo><mml:msqrt><mml:mrow><mml:mi>&#x003BE;</mml:mi></mml:mrow></mml:msqrt><mml:mo class="MathClass-punc">,</mml:mo></mml:math></disp-formula>
where, for each <inline-formula><mml:math id="M94"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <italic>&#x003BC;<sub>i</sub></italic> and <italic>&#x003C3;<sub>i</sub></italic> are the trimmed mean and the winsorized variance (Huber, <xref ref-type="bibr" rid="B33">1981</xref>) computed from the empirical distribution of the stored LOFs, while discarding the 20% most extreme values in order to reduce the effect of spurious LAF histograms. Equation (<xref ref-type="disp-formula" rid="E16">13</xref>) does not imply that the distribution of LOF values is close to a Gaussian distribution, and in practice, it may be far from it. This relationship is merely inferred from equation (<xref ref-type="disp-formula" rid="E15">12</xref>) using the Chebyshev inequality (P&#x000E9;cot et al., <xref ref-type="bibr" rid="B54">2015</xref>).</p>
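As a sketch of equation (13) (assuming &#x003C3;<italic><sub>i</sub></italic> enters the formula as a winsorized standard deviation, so that the expression matches the Chebyshev bound, and that trimming removes 10% of the values at each tail to discard the 20% most extreme):

```python
def block_threshold(lof_values, xi=0.05, trim=0.2):
    # Eq. (13): lambda_i = mu_i + sigma_i / sqrt(xi), where the p-value xi
    # bounds the false-alarm rate via the Chebyshev inequality.
    v = sorted(lof_values)
    n = len(v)
    g = int(trim / 2 * n)                 # values discarded at each tail
    trimmed = v[g:n - g]
    mu = sum(trimmed) / len(trimmed)      # trimmed mean
    # winsorize: clamp the tails to the closest retained values
    w = [min(max(x, v[g]), v[n - g - 1]) for x in v]
    mw = sum(w) / n
    sigma = (sum((x - mw) ** 2 for x in w) / (n - 1)) ** 0.5
    return mu + sigma / xi ** 0.5
```

For a distribution of LOF values concentrated around 1 with a couple of large outliers, the trimming and winsorizing keep the threshold close to the bulk of the distribution instead of being dragged up by the outliers.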
<p>Nevertheless, it is convenient to add further filtering, as the initial classification output can be noisy. This is done with a post-processing step, relying on the fact that anomalous motion is likely to persist for a (short) period of time. For the anomalous motion localization in our pipeline (decision at the block level), a percentile filter is applied to the classification output to accept or reject anomalous motion candidates. We take a temporal neighborhood of the block including the previous and next frames, as depicted in Figure <xref ref-type="fig" rid="F5">5</xref>. The neighborhood shape accounts for the fact that anomalous motion blocks may shift between two frames, following the displacement of the outlier moving object. The classification label of block <inline-formula><mml:math id="M95"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is updated to the value of the 16th element of the binary vector formed by the initial classification labels (1 for anomalous motion, 0 for normal motion) of its space&#x02013;time neighborhood. This vector comprises 19 components, that is, the 18 labels of the space&#x02013;time neighborhood plus the initial label of the block, sorted in ascending order.</p>
<fig position="float" id="F5">
<label>Figure 5</label>
<caption><p><bold>Illustration of the temporal block neighborhood for anomalous motion detection-and-localization filtering</bold>. Each block has 18 neighbors.</p></caption>
<graphic xlink:href="fict-04-00010-g005.tif"/>
</fig>
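The block-level percentile filter described above amounts to a simple rank selection over the 19 space&#x02013;time labels. A minimal sketch (the function name and flat 0/1 label lists are our own conventions, and the 16th element is taken as 1-based):

```python
def percentile_block_filter(own_label, neighbor_labels):
    # Rank filter over the 19 space-time labels (block + 18 neighbors):
    # keep the (1-based) 16th element of the ascending sort, i.e., the
    # block stays anomalous only if at least 4 of the 19 labels are 1.
    labels = sorted(neighbor_labels + [own_label])
    assert len(labels) == 19
    return labels[15]
```

An isolated anomalous label (fewer than 4 positives among the 19) is thus rejected, while a persistent anomaly is kept.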
<p>As for the frame-level decision, a frame is said to contain anomalous motion if at least one of its blocks is detected as such. In addition, a temporal median filter of size 7 is applied to the sequence of frame-classification outputs. This classification, although simple, offers satisfactory results, as the underlying block-based LAF histograms capture the different aspects of the visual data very well.</p>
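The frame-level rule can be sketched as follows (any anomalous block flags the frame, then a size-7 temporal median smooths the per-frame labels; truncating the window at the sequence boundaries is our assumption):

```python
def frame_level_decisions(blocks_per_frame, window=7):
    # A frame is anomalous if at least one of its blocks is; a temporal
    # median filter of the given size then removes isolated decisions
    # (windows are truncated at the sequence boundaries).
    raw = [int(any(blocks)) for blocks in blocks_per_frame]
    half = window // 2
    out = []
    for t in range(len(raw)):
        w = sorted(raw[max(0, t - half):t + half + 1])
        out.append(w[len(w) // 2])
    return out
```

A single-frame false alarm is suppressed by the median filter, whereas an anomaly persisting over several frames survives it.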
</sec>
</sec>
<sec id="S5">
<label>5</label> <title>Experimental Results</title>
<p>We present several experiments to assess the performance of our anomalous motion detection method. At the end of this section, in subsections <xref ref-type="sec" rid="S5-14">5.7</xref> and <xref ref-type="sec" rid="S5-15">5.8</xref>, we will demonstrate how the two main ingredients of our method, that is, the LAF histograms and the LOF criterion, contribute to its overall performance.</p>
<p>We report comparative results on three datasets: the UMN dataset (Papanikolopoulos, <xref ref-type="bibr" rid="B53">2005</xref>), the PETS2009 dataset (Ferryman and Shahrokni, <xref ref-type="bibr" rid="B26">2009</xref>), and the UCSD dataset (Li et al., <xref ref-type="bibr" rid="B46">2014</xref>). We did not run the codes of other methods; we only collected results when available in previously published papers. The first two datasets depict global motion anomalies, that is, all the people in the scene adopt a new dynamic behavior at the same time instant, such as suddenly running. The UCSD dataset is of a different kind. It involves local anomalies, but above all, the anomalies are due to the type of both object and motion. Indeed, the anomalies are formed by cyclists and skateboarders riding, or vehicles driving, among pedestrians walking on a campus path. Thus, this dataset is not truly intended to assess <italic>anomalous motion</italic> detection <italic>on its own</italic>, and accordingly, the best-performing methods are those exploiting both appearance and motion (Antic and Ommer, <xref ref-type="bibr" rid="B4">2011</xref>; Li et al., <xref ref-type="bibr" rid="B46">2014</xref>). Yet, this dataset is a popular one in crowd anomaly detection, so we believed it was worth evaluating our method on the UCSD dataset as well, in order to show the usefulness of our anomalous motion detection method for this task and, by doing so, its versatility. Nevertheless, for a fair assessment, we will compare our method with motion-based anomaly detection methods only. Indeed, our end goal is not to define a method dedicated to crowd anomaly detection, but a generic method for anomalous motion detection. The fourth experiment deals with two video sequences of crowded scenes that exhibit local anomalous motion, for which results will only be assessed visually.</p>
<p>There are different ways to compute normal LAF histograms to populate the LAF histogram space and correctly compute the LOF criterion. If we are dealing with sufficiently crowded scenes where the large majority of people exhibit normal behavior and only a few local anomalous motions may appear, we can apply the online version of our method. It means that the reference LAF histograms <inline-formula><mml:math id="M96"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> can be computed in the very same image at every time <italic>t</italic>, since most blocks <inline-formula><mml:math id="M97"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> include normal behavior. If the anomalous motion corresponds to a sudden global change in dynamic behavior, normal LAF histograms can be computed in the first part of the video sequence. We will specify for each experiment how the LAF histograms corresponding to normal motion are computed.</p>
<p>For all the experiments, we set <italic>k</italic>&#x02009;&#x0003D;&#x02009;7 for the <italic>k</italic>-nearest neighbors in the LOF computation. This value was selected by cross-validation on the UCSD dataset, which provides complete per-pixel annotations. The block size is defined by the grid partition. We found that this parameter does not substantially affect the results, as long as the blocks cover an area similar in size to the expected actors of the normal events. For the UMN and UCSD datasets, we fix the grid size to 12&#x02009;&#x000D7;&#x02009;8 blocks. For the rest of the presented experiments, we use 8&#x02009;&#x000D7;&#x02009;8.</p>
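The LOF criterion with <italic>k</italic>-nearest neighbors can be sketched in a few lines. This is a minimal illustration of the standard LOF definition (k-distance, reachability distance, local reachability density); the Euclidean distance between LAF histograms and the function name are assumptions made here for illustration, not necessarily the exact choices of our implementation.

```python
import numpy as np

def local_outlier_factor(query, references, k=7):
    """Minimal LOF sketch: score of `query` against reference LAF histograms.

    `references` is an (n, d) array of normal-behavior histograms. A score
    close to 1 indicates normal density; a score much larger than 1 flags
    an outlier. Euclidean distance is an illustrative assumption.
    """
    refs = np.asarray(references, dtype=float)

    def knn(point, data):
        # indices and distances of the k nearest neighbors of `point` in `data`
        d = np.linalg.norm(data - point, axis=1)
        idx = np.argsort(d)[:k]
        return idx, d[idx]

    def lrd(point, data):
        # local reachability density: inverse mean reachability distance
        idx, dists = knn(point, data)
        reach = []
        for j, dj in zip(idx, dists):
            others = np.delete(data, j, axis=0)
            _, nd = knn(data[j], others)
            # reach-dist(point, j) = max(k-distance(j), d(point, j))
            reach.append(max(nd[-1], dj))
        return 1.0 / (np.mean(reach) + 1e-12), idx

    lrd_q, nbr_idx = lrd(np.asarray(query, dtype=float), refs)
    lrd_nbrs = [lrd(refs[j], np.delete(refs, j, axis=0))[0] for j in nbr_idx]
    return float(np.mean(lrd_nbrs) / lrd_q)
```

A query histogram lying inside a dense cluster of reference histograms receives a LOF near 1, whereas an isolated one receives a much larger value, which is then thresholded to declare the block anomalous.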
<p>Objective comparisons are based on two performance criteria specified in previous work, namely, frame-level and pixel-level criteria. The set of compared methods may vary depending on the dataset, according to the availability of reported experimental results (performance numbers and ROC curves). The pixel-level criterion establishes that a frame detected as anomalous is considered correctly classified only if at least 40% of the truly anomalous pixels are detected. This procedure should not be confused with a truly pixelwise evaluation, but it ensures a minimal precision&#x02013;recall balance. This pixel-level criterion was introduced in Mahadevan et al. (<xref ref-type="bibr" rid="B48">2010</xref>), and it has been widely adopted in the crowd anomaly detection literature. The frame-level criterion simply acknowledges a correct classification if at least one true anomaly is detected in the frame.</p>
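The two criteria can be stated precisely in a few lines. The sketch below assumes binary per-pixel masks for the prediction and the ground truth; the function names are ours.

```python
import numpy as np

def frame_level_hit(pred_mask, gt_mask):
    """Frame-level criterion: a detection anywhere in a truly anomalous
    frame counts as a correct classification of that frame."""
    return bool(pred_mask.any()) and bool(gt_mask.any())

def pixel_level_hit(pred_mask, gt_mask, min_recall=0.4):
    """Pixel-level criterion (Mahadevan et al., 2010): the anomalous frame
    is correctly classified only if at least 40% of the ground-truth
    anomalous pixels are covered by the detection."""
    n_gt = int(gt_mask.sum())
    if n_gt == 0:
        return False
    covered = int(np.logical_and(pred_mask, gt_mask).sum())
    return covered / n_gt >= min_recall
```

For example, a detection covering 6 of the 10 ground-truth anomalous pixels passes the pixel-level criterion (60% &#x02265; 40%), while one covering a single pixel does not, even though both pass the frame-level criterion.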
<sec id="S5-8">
<label>5.1</label> <title>Experiments on the UMN Dataset</title>
<p>The UMN dataset includes eleven sequences of sudden escape events corresponding to three scenes (indoor and outdoor, see Figure <xref ref-type="fig" rid="F6">6</xref> for samples). The videos depict groups of people freely walking around open spaces and performing ordinary actions inside a building lobby, which represents normal behaviors. The anomalies occur when the people start running until they get out of view. This corresponds to a global anomalous motion case. Nevertheless, we are still able to localize where anomalous motion occurs in every image. Of the 7740 frames of the dataset, 1431 depict escaping behaviors, that is, correspond to anomalous motion. Reference LAF histograms are then computed in each block <inline-formula><mml:math id="M98"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> containing moving pixels, over the first 6000 frames displaying normal behavior.</p>
<fig position="float" id="F6">
<label>Figure 6</label>
<caption><p><bold>Left column: original samples of the UMN dataset</bold>. Right column: blocks where anomalous motion detection is localized by our method are framed in red. From top to bottom: examples respectively from scene 1, scene 2, and scene 3 of the UMN dataset.</p></caption>
<graphic xlink:href="fict-04-00010-g006.tif"/>
</fig>
<p>We report comparative results with the following motion-based anomaly detection methods: the method based on sparse reconstruction error (SRC) (Cong et al., <xref ref-type="bibr" rid="B21">2013</xref>), which exploits multi-scale histograms of optical flow, the method relying on chaotic invariants (CI) (Wu et al., <xref ref-type="bibr" rid="B67">2010</xref>), the method involving the social force model (SF) (Mehran et al., <xref ref-type="bibr" rid="B50">2009</xref>), the method built upon scan statistics (SS) (Hu et al., <xref ref-type="bibr" rid="B31">2013</xref>), and the method introducing motion influence maps (MIM) (Lee et al., <xref ref-type="bibr" rid="B42">2015</xref>). Sample visual results are gathered in Figure <xref ref-type="fig" rid="F6">6</xref>. Available ROC curves are plotted in Figure <xref ref-type="fig" rid="F7">7</xref>. The frame-level criterion is used for the UMN dataset. We report the area under the ROC curve (AUC) and the equal error rate (EER) in Table <xref ref-type="table" rid="T3">3</xref>. The EER corresponds to the operating point where the false positive and false negative rates are equal. Numbers are taken from Mahadevan et al. (<xref ref-type="bibr" rid="B48">2010</xref>) and Zhang et al. (<xref ref-type="bibr" rid="B72">2016</xref>) for the other methods. For the frame-level evaluation of our method, we consider that a frame is anomalous if at least one block in it is labeled as such. From Table <xref ref-type="table" rid="T3">3</xref> and Figure <xref ref-type="fig" rid="F7">7</xref>, we can conclude that our method is very competitive. It provides the second best result regarding the EER, and it is close to the two best ones regarding the AUC. Furthermore, it outperforms the other motion-based methods MIM (Lee et al., <xref ref-type="bibr" rid="B42">2015</xref>), SS (Hu et al., <xref ref-type="bibr" rid="B31">2013</xref>), and SF (Mehran et al., <xref ref-type="bibr" rid="B50">2009</xref>), when examining results scene by scene.</p>
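For reference, the AUC and EER can be obtained from sampled ROC points as sketched below. This assumes monotonically increasing FPR/TPR samples; the linear interpolation of the EER crossing is an illustrative choice, and the function name is ours.

```python
import numpy as np

def auc_and_eer(fpr, tpr):
    """AUC (trapezoidal rule) and EER from ROC samples.

    `fpr` and `tpr` are increasing arrays of false/true positive rates.
    The EER is the point where FPR equals the miss rate (1 - TPR),
    located by linear interpolation between adjacent ROC samples.
    """
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    # trapezoidal area under the ROC curve
    auc = float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))
    # sign change of (miss rate - FPR) brackets the EER point
    diff = (1.0 - tpr) - fpr
    i = int(np.where(np.diff(np.sign(diff)) != 0)[0][0])
    t = diff[i] / (diff[i] - diff[i + 1])
    eer = float(fpr[i] + t * (fpr[i + 1] - fpr[i]))
    return auc, eer
```

A perfect detector (ROC passing through the top-left corner) yields AUC&#x02009;=&#x02009;1 and EER&#x02009;=&#x02009;0, while a chance-level diagonal ROC yields AUC&#x02009;=&#x02009;0.5 and EER&#x02009;=&#x02009;0.5.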
<fig position="float" id="F7">
<label>Figure 7</label>
<caption><p><bold>ROC curves for SRC (Cong et al., <xref ref-type="bibr" rid="B21">2013</xref>), SF (Mehran et al., <xref ref-type="bibr" rid="B50">2009</xref>), and our method on the UMN dataset (TPR stands for True Positive Rate and FPR for False Positive Rate)</bold>.</p></caption>
<graphic xlink:href="fict-04-00010-g007.tif"/>
</fig>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p><bold>Anomalous motion detection performance on the UMN dataset</bold>.</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td align="center"/>
<td align="center"><bold>CI (Wu et al., <xref ref-type="bibr" rid="B67">2010</xref>)</bold></td>
<td align="center"><bold>SF (Mehran et al., <xref ref-type="bibr" rid="B50">2009</xref>)</bold></td>
<td align="center"><bold>SRC (Cong et al., <xref ref-type="bibr" rid="B21">2013</xref>)</bold></td>
<td align="center"><bold>Ours</bold></td>
</tr>
<tr>
<td align="left" colspan="5"><hr/></td>
</tr>
<tr>
<td align="left">AUC</td>
<td align="center"><underline>99.4</underline></td>
<td align="center">94.9</td>
<td align="center"><bold>99.6</bold></td>
<td align="center">99.2</td>
</tr>
<tr>
<td align="left">EER</td>
<td align="center">5.3</td>
<td align="center">12.6</td>
<td align="center"><bold>2.8</bold></td>
<td align="center"><underline>3.1</underline></td>
</tr>
<tr>
<td align="left" colspan="5"><hr/></td>
</tr>
<tr>
<td align="left"/>
<td align="center">SS (Hu et al., <xref ref-type="bibr" rid="B31">2013</xref>)</td>
<td align="center">MIM (Lee et al., <xref ref-type="bibr" rid="B42">2015</xref>)</td>
<td align="center">Ours</td>
<td align="center"/>
</tr>
<tr>
<td align="left" colspan="5"><hr/></td>
</tr>
<tr>
<td align="left">AUC (S1/S2/S3)</td>
<td align="center">99.1/<underline>95.1</underline>/<bold>99.0</bold></td>
<td align="center"><underline>99.4</underline>/90.9/98.1</td>
<td align="center"><bold>99.5/99.2/99.0</bold></td>
<td align="center"/>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Best results are indicated with bold font and second best are underlined. AUC and EER are defined in the text. Individual scores for the three scenes (S1, S2, and S3) are also given</italic>.</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="S5-9">
<label>5.2</label> <title>Experiments on the PETS2009 dataset</title>
<p>Each one of the scenarios of the PETS2009 dataset contains four sequences showing the scenes from different points of view. We used the same number of training frames as reported in Wu et al. (<xref ref-type="bibr" rid="B68">2014</xref>), that is, 30 for the first scenario and 100 for the second one, to compute the LAF histograms of normal behavior. Anomalous motion in the two scenarios consists of people who suddenly start running at some time instant of the videos.</p>
<p>We present results for the frame-level accuracy criterion, more specifically, the number of correctly classified frames over the total number of frames, on two selected scenarios of the PETS2009 dataset (Figure <xref ref-type="fig" rid="F8">8</xref>), as proposed in Wu et al. (<xref ref-type="bibr" rid="B68">2014</xref>). We compare our method with four different methods: chaotic invariants (CI) (Wu et al., <xref ref-type="bibr" rid="B67">2010</xref>), the social force model (SF) (Mehran et al., <xref ref-type="bibr" rid="B50">2009</xref>), the force field method (FF) (Chen and Huang, <xref ref-type="bibr" rid="B18">2011</xref>), and the Bayesian model (BM) described in Wu et al. (<xref ref-type="bibr" rid="B68">2014</xref>). Our method supplies state-of-the-art results on this dataset, as shown in Table <xref ref-type="table" rid="T4">4</xref>. Indeed, it has the best average scores for the two scenarios. This experiment also demonstrates that our method is stable and remains reliable when only a few training samples are available.</p>
<fig position="float" id="F8">
<label>Figure 8</label>
<caption><p><bold>Top row: sample images from the PETS2009 dataset</bold>. Bottom row: blocks where anomalous motion is localized by our method are framed in red.</p></caption>
<graphic xlink:href="fict-04-00010-g008.tif"/>
</fig>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p><bold>Frame-level accuracy (%) for several methods on sequences of PETS2009 dataset</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center"/>
<th align="center" colspan="5">Scenario 1<hr/></th>
<th align="center" colspan="5">Scenario 2<hr/></th>
</tr>
<tr>
<th align="center"/>
<th align="center">BM (Wu et al., <xref ref-type="bibr" rid="B68">2014</xref>)</th>
<th align="center">FF (Chen and Huang, <xref ref-type="bibr" rid="B18">2011</xref>)</th>
<th align="center">CI (Wu et al., <xref ref-type="bibr" rid="B67">2010</xref>)</th>
<th align="center">SF (Mehran et al., <xref ref-type="bibr" rid="B50">2009</xref>)</th>
<th align="center">Our method</th>
<th align="center">BM (Wu et al., <xref ref-type="bibr" rid="B68">2014</xref>)</th>
<th align="center">FF (Chen and Huang, <xref ref-type="bibr" rid="B18">2011</xref>)</th>
<th align="center">CI (Wu et al., <xref ref-type="bibr" rid="B67">2010</xref>)</th>
<th align="center">SF (Mehran et al., <xref ref-type="bibr" rid="B50">2009</xref>)</th>
<th align="center">Our method</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">View 1</td>
<td align="center"><bold>92.45</bold></td>
<td align="center">37.74</td>
<td align="center">56.60</td>
<td align="center">63.21</td>
<td align="center"><underline>91.20</underline></td>
<td align="center"><bold>96.01</bold></td>
<td align="center">94.50</td>
<td align="center"><underline>94.95</underline></td>
<td align="center">91.22</td>
<td align="center">94.50</td>
</tr>
<tr>
<td align="left">View 2</td>
<td align="center"><underline>83.02</underline></td>
<td align="center">37.74</td>
<td align="center">83.02</td>
<td align="center">70.76</td>
<td align="center"><bold>92.11</bold></td>
<td align="center"><bold>94.15</bold></td>
<td align="center">63.83</td>
<td align="center"><underline>92.02</underline></td>
<td align="center">89.36</td>
<td align="center">91.03</td>
</tr>
<tr>
<td align="left">View 3</td>
<td align="center"><underline>89.62</underline></td>
<td align="center">37.74</td>
<td align="center">81.13</td>
<td align="center">52.83</td>
<td align="center"><bold>95.87</bold></td>
<td align="center">95.21</td>
<td align="center"><underline>95.48</underline></td>
<td align="center">94.15</td>
<td align="center">94.68</td>
<td align="center"><bold>99.15</bold></td>
</tr>
<tr>
<td align="left">View 4</td>
<td align="center"><underline>90.57</underline></td>
<td align="center">37.74</td>
<td align="center">52.83</td>
<td align="center">48.11</td>
<td align="center"><bold>92.17</bold></td>
<td align="center">91.49</td>
<td align="center"><underline>96.81</underline></td>
<td align="center">89.36</td>
<td align="center">64.63</td>
<td align="center"><bold>98.26</bold></td>
</tr>
<tr>
<td align="left">Overall</td>
<td align="center"><underline>88.92</underline></td>
<td align="center">37.74</td>
<td align="center">68.40</td>
<td align="center">58.73</td>
<td align="center"><bold>92.83</bold></td>
<td align="center"><underline>94.22</underline></td>
<td align="center">87.66</td>
<td align="center">92.62</td>
<td align="center">84.97</td>
<td align="center"><bold>95.73</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Best results are indicated with bold font, and second best are underlined</italic>.</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="S5-10">
<label>5.3</label> <title>Experiments on the UCSD Dataset</title>
<p>The UCSD dataset was introduced in Mahadevan et al. (<xref ref-type="bibr" rid="B48">2010</xref>) and consists of videos of sparse crowds divided into two scenarios. We used the ped1 subset (Figure <xref ref-type="fig" rid="F9">9</xref>), where the normal behaviors are people walking through the campus scene at a normal speed, toward and away from the camera. As aforementioned, anomalies in this dataset are composed mainly of moving cars, skateboarders, and cyclists, among others. Clearly, the anomalies of this dataset are not only specified by their motion but also by the involved object (car, bike, skateboard, etc.). This explains why methods exploiting appearance features, such as the Video Parsing (VP) method (Antic and Ommer, <xref ref-type="bibr" rid="B4">2011</xref>) and the method based on a mixture of dynamic textures (MDT) (Li et al., <xref ref-type="bibr" rid="B46">2014</xref>), may achieve superior performance.</p>
<fig position="float" id="F9">
<label>Figure 9</label>
<caption><p><bold>Top row: sample images of the UCSD dataset</bold>. Bottom row: blocks containing anomalies detected by our method are framed in red. From left to right: cyclist, vehicle, vehicle, cyclist.</p></caption>
<graphic xlink:href="fict-04-00010-g009.tif"/>
</fig>
<p>The ped1 scenario contains 36 testing and 34 training videos, as well as labeled ground truth at the frame level and pixel level. We use the training sequences, which are composed only of normal events, to initialize the reference LAF histogram space. Results on this dataset are summarized by the area under the ROC curve (AUC). At the frame level, error measures rely on a frame-by-frame binary classification, while at the pixel level, the measures are based on ground truth masks partially provided by the authors of the dataset (Mahadevan et al., <xref ref-type="bibr" rid="B48">2010</xref>) and extended to the full dataset in a more recent work (Antic and Ommer, <xref ref-type="bibr" rid="B4">2011</xref>). We will respectively refer to them as partial and full ground truth from now on.</p>
<p>We first present visual results of our method in Figure <xref ref-type="fig" rid="F9">9</xref>, where we can observe that blocks containing anomalies are accurately localized. AUC values are given in Table <xref ref-type="table" rid="T5">5</xref> for three motion-based methods AD (Adam et al., <xref ref-type="bibr" rid="B1">2008</xref>), SF (Mehran et al., <xref ref-type="bibr" rid="B50">2009</xref>), and SS (Hu et al., <xref ref-type="bibr" rid="B31">2013</xref>), and for our method (including also a multi-grid extension which is explained later on). Since the pixel-level evaluation is not available for all the motion-based methods on the full ground truth, we supply a complementary comparison on the partial ground truth in Table <xref ref-type="table" rid="T6">6</xref>.</p>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p><bold>Anomaly detection performance (AUC) on the UCSD dataset (ped1) at the frame level evaluated on the full ground truth (Antic and Ommer, <xref ref-type="bibr" rid="B4">2011</xref>)</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"><bold>Criterion</bold></th>
<th align="center"><bold>AD (Adam et al., <xref ref-type="bibr" rid="B1">2008</xref>)</bold></th>
<th align="center"><bold>SF (Mehran et al., <xref ref-type="bibr" rid="B50">2009</xref>)</bold></th>
<th align="center"><bold>SS (Hu et al., <xref ref-type="bibr" rid="B31">2013</xref>)</bold></th>
<th align="center"><bold>Ours single-grid</bold></th>
<th align="center"><bold>Ours 3-grids</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Frame</td>
<td align="center">65.0</td>
<td align="center">77.0</td>
<td align="center"><bold>87.0</bold></td>
<td align="center">79.9</td>
<td align="center"><underline>82.8</underline></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Best results are indicated with bold font, and second best are underlined</italic>.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption><p><bold>Comparison with motion-based methods</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"><bold>Criterion</bold></th>
<th align="center"><bold>AD (Adam et al., <xref ref-type="bibr" rid="B1">2008</xref>)</bold></th>
<th align="center"><bold>SF (Mehran et al., <xref ref-type="bibr" rid="B50">2009</xref>)</bold></th>
<th align="center"><bold>MIM (Lee et al., <xref ref-type="bibr" rid="B42">2015</xref>)</bold></th>
<th align="center"><bold>SS (Hu et al., <xref ref-type="bibr" rid="B31">2013</xref>)</bold></th>
<th align="center"><bold>Ours single-grid</bold></th>
<th align="center"><bold>Ours 3-grids</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Pixel</td>
<td align="center">18.0</td>
<td align="center">21.0</td>
<td align="center"><underline>64.9</underline></td>
<td align="center"><bold>66.0</bold></td>
<td align="center">59.48</td>
<td align="center">63.77</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Best results are indicated with bold font and second best are underlined</italic>.</p>
<p><italic>Anomaly detection performance (AUC) on the UCSD dataset (ped1) at the pixel level evaluated with the partial ground truth (Mahadevan et al., <xref ref-type="bibr" rid="B48">2010</xref>)</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>Our anomaly localization output is given as a binary variable for each block <inline-formula><mml:math id="M99"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> at every time <italic>t</italic>. However, in order to compute the ROC curves, we make use of moving object masks &#x003D2;(<italic>t</italic>) computed before the determination of the per-pixel motion classes. We intersect them with blocks labeled as anomalous. Thus, for the pixel-level evaluation of our method, the anomaly detection mask at each time instant <italic>t</italic>, is given by <inline-formula><mml:math id="M100"><mml:msub><mml:mrow><mml:mo class="MathClass-bin">&#x0222A;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mo class="MathClass-op">&#x003D2;</mml:mo><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x02229;</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></inline-formula>, for the <inline-formula><mml:math id="M101"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>&#x02019;s labeled as anomalous.</p>
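The construction of this pixel-level detection mask can be sketched as follows, assuming the grid partition is given as a mapping from block index to row/column slices (the data layout and function name are ours).

```python
import numpy as np

def anomaly_mask(motion_mask, block_slices, anomalous):
    """Pixel-level detection mask: union over the anomalous blocks B_i of
    (moving-object mask at time t) intersected with B_i.

    `motion_mask` is a boolean (H, W) array, `block_slices` maps block
    index -> (row_slice, col_slice), and `anomalous` holds the indices of
    the blocks labeled anomalous by the LOF criterion.
    """
    out = np.zeros_like(motion_mask, dtype=bool)
    for i in anomalous:
        rs, cs = block_slices[i]
        # keep only the moving pixels falling inside the flagged block
        out[rs, cs] = motion_mask[rs, cs]
    return out
```

Moving pixels outside the flagged blocks, and still pixels inside them, are thus both excluded from the final mask.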
<p>Since our per-block anomaly detection output depends on how anomalous events are positioned within the fixed block grid, our method may fail to detect an anomaly (or at least part of its support), when the anomaly lies astride two or more blocks. In order to overcome this problem, we can extend our method by combining additional output of our algorithm computed over two more grids. These grids are horizontally and vertically displaced versions of the original one by half the length of a single block. The combination is simply achieved by intersecting all the blocks detected as anomalous in the three grids, with the motion detection support at each frame.</p>
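The three-grid construction can be sketched as follows, assuming integer block sizes and half-block pixel displacements; the combination function reflects one reading of the rule above (union of the flagged block regions across the three grids, restricted to the motion detection support). All names are ours.

```python
import numpy as np

def grid_blocks(height, width, n_rows, n_cols, dy=0, dx=0):
    """Enumerate (row_slice, col_slice) blocks of a regular grid,
    optionally displaced by (dy, dx) pixels and clipped at the border."""
    bh, bw = height // n_rows, width // n_cols
    blocks = []
    for r in range(n_rows):
        for c in range(n_cols):
            y0, x0 = r * bh + dy, c * bw + dx
            if y0 < height and x0 < width:
                blocks.append((slice(y0, min(y0 + bh, height)),
                               slice(x0, min(x0 + bw, width))))
    return blocks

def three_grids(height, width, n_rows, n_cols):
    """Original grid plus its horizontally and vertically shifted
    versions, each displaced by half the length of a single block."""
    bh, bw = height // n_rows, width // n_cols
    return [grid_blocks(height, width, n_rows, n_cols),
            grid_blocks(height, width, n_rows, n_cols, dx=bw // 2),
            grid_blocks(height, width, n_rows, n_cols, dy=bh // 2)]

def combine_grids(motion_mask, grids, anomalous_per_grid):
    """Union of the anomalous-block regions over the three grids,
    intersected with the motion detection support of the frame."""
    out = np.zeros_like(motion_mask, dtype=bool)
    for blocks, anomalous in zip(grids, anomalous_per_grid):
        for i in anomalous:
            rs, cs = blocks[i]
            out[rs, cs] = True
    return out & motion_mask
```

An anomaly straddling two blocks of the original grid then falls near the center of a block of one of the shifted grids, which restores its detectability.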
<p>On this demanding dataset, our method shows strengths that enable it to detect most anomalies. For instance, Figure <xref ref-type="fig" rid="F9">9</xref> (4th column) shows the correct detection of a cyclist, which is a difficult case because the cyclist is moving at a speed similar to that of the walking people. Let us also stress that, from the perspective of the camera, the cyclist does not look that different from a normal pedestrian. However, the difference in the leg motion of the cyclist (or of the skateboarders in other sequences) with respect to pedestrians is captured by the LAF histograms. In fact, normal walking usually involves Scaling motion classes, which is not the case for the cyclist.</p>
<p>Our method supplies competitive results on this dataset, as reported in Table <xref ref-type="table" rid="T5">5</xref>. It is the second best performing method among motion-based methods for the frame-level criterion on the full ground truth dataset. For the pixel-level evaluation, results are available for the other motion-based methods on the partial ground truth dataset only. They are given in Table <xref ref-type="table" rid="T6">6</xref>. This score partly allows for localization assessment. We can notice that our method exhibits a very significant performance improvement of almost 40 points with respect to the motion-based methods (Adam et al., <xref ref-type="bibr" rid="B1">2008</xref>; Mehran et al., <xref ref-type="bibr" rid="B50">2009</xref>), while being (for the 3-grid version) on par with MIM (Lee et al., <xref ref-type="bibr" rid="B42">2015</xref>), a recently published method developed in parallel to ours, and slightly inferior to SS (Hu et al., <xref ref-type="bibr" rid="B31">2013</xref>). However, in contrast to ours, the latter cannot actually deliver results on the fly, since it is based on a two-round scanning that needs the global distribution of the likelihood test values computed in the first scan of the video.</p>
</sec>
<sec id="S5-11">
<label>5.4</label> <title>Additional Experiments on Videos with Local Anomalous Motion</title>
<sec id="S5-11-1">
<label>5.4.1</label> <title>Wrong Way Video</title>
<p>The Wrong Way video contains 445 frames acquired by a camera pointing toward a passing crowd. A person is walking in the direction opposite to the crowd. We processed the video from frame 160 to frame 223, the interval comprising the anomalous event. We split the training set into two parts to capture normal behaviors: for the lower half of the scene, we use the interval from frame 30 to frame 75; for the upper part of the scene, the interval from frame 235 to frame 280. Sample images with overlaid results are shown in Figure <xref ref-type="fig" rid="F10">10</xref>. The man pushing his way through the crowd makes the other people modify their own motion: they turn to avoid him and, consequently, participate in the anomalous event. The visual results provided in Figure <xref ref-type="fig" rid="F10">10</xref> show that our block-based detection method performs well and is able to accurately detect both the anomalous motion of the man and that of the people he comes into contact with.</p>
<fig position="float" id="F10">
<label>Figure 10</label>
<caption><p><bold>Results obtained by our method (single grid version) on the Wrong Way video at four time points</bold>. Blocks including anomalous motion detected by our method are framed in red.</p></caption>
<graphic xlink:href="fict-04-00010-g010.tif"/>
</fig>
</sec>
<sec id="S5-11-2">
<label>5.4.2</label> <title>Music Festival Video</title>
<p>In this video sequence, people are starting a &#x0201C;circle pit&#x0201D; during a music festival. The processed video clip contains 18 frames. We took the top half of the first 5 frames to populate the reference LAF histogram space. Sample results are reported in Figure <xref ref-type="fig" rid="F11">11</xref>. Our algorithm is able to capture the beginning of the anomalous event (Figure <xref ref-type="fig" rid="F11">11</xref> left) and correctly delineate it when the circle pit is fully formed (Figure <xref ref-type="fig" rid="F11">11</xref> right).</p>
<fig position="float" id="F11">
<label>Figure 11</label>
<caption><p><bold>Result samples of the Music Festival video</bold>. Blocks including anomalous motion detected by our method (single grid version) are framed in red.</p></caption>
<graphic xlink:href="fict-04-00010-g011.tif"/>
</fig>
</sec>
</sec>
<sec id="S5-12">
<label>5.5</label> <title>Drawbacks of Our Method</title>
<p>In previous sections, we described our method and demonstrated its relevance for the complex task of detecting motion-based anomalies in videos. However, our method is not free of drawbacks and failure cases. In particular, since the LOF measure is density-based, a sufficient number of &#x0201C;normal&#x0201D; events should be captured during the training phase of our method. Furthermore, the resolution of detected anomalies is directly related to the chosen block size, together with the capabilities of the chosen motion detector. In other words, anomalies that are too small to be captured by either our block-based method or the motion detector might be missed. Nonetheless, our method provides promising results on real (noisy) datasets, where in fact these issues did not seem to be of major concern. Let us recall that the UMN, UCSD, and PETS datasets were built from real commercial surveillance cameras, at low resolutions and with compression artifacts.</p>
</sec>
<sec id="S5-13">
<label>5.6</label> <title>Computation Time</title>
<p>The current implementation of our whole workflow in C&#x0002B;&#x0002B; enables us to process 1.3 frames per second on a 2.5&#x02009;GHz CPU, as reported in Table <xref ref-type="table" rid="T7">7</xref>, including the computation of the collection of affine motion models. Several steps of the method are parallelizable, which, if implemented, could greatly decrease the computation time. In particular, the computation of the per-block histograms and the evaluation of the LOF, which currently take around 65% of the total execution time, can be effectively processed in parallel.</p>
<table-wrap position="float" id="T7">
<label>Table 7</label>
<caption><p><bold>Frames per second (FPS) processing of various methods</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Method</th>
<th align="center">SRC (Cong et al., <xref ref-type="bibr" rid="B21">2013</xref>)</th>
<th align="center">BM (Wu et al., <xref ref-type="bibr" rid="B68">2014</xref>)</th>
<th align="center">SS (Hu et al., <xref ref-type="bibr" rid="B31">2013</xref>)</th>
<th align="center"><bold>Ours</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">FPS</td>
<td align="center">0.263</td>
<td align="center">1.037</td>
<td align="center"><bold>5</bold></td>
<td align="center"><underline>1.302</underline></td>
</tr>
<tr>
<td align="left">CPU (GHz)</td>
<td align="center">2.6</td>
<td align="center">3.16</td>
<td align="center">3</td>
<td align="center">2.5</td>
</tr>
<tr>
<td align="left">Platform</td>
<td align="center">Matlab</td>
<td align="center">Matlab</td>
<td align="center">n.c.</td>
<td align="center">C&#x0002B;&#x0002B;</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Best rate is shown in bold, and second best are underlined</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>We also provide in Table <xref ref-type="table" rid="T7">7</xref> a comparison of the execution time, measured in processed frames per second (FPS), for several motion-based anomaly detection methods. We acknowledge that the numbers correspond to different implementations, so this comparison is only indicative. Besides, the reported execution time may not encompass the whole workflow, and the size of the processed images may vary, making computation time comparisons tricky. For instance, the execution time for the SS method (Hu et al., <xref ref-type="bibr" rid="B31">2013</xref>) does not include the computation of the optical flow fields and of the cumulative flow word histograms. Nevertheless, our method lends itself to more efficient processing than the other reported methods, owing to its capacity for high parallelization.</p>
</sec>
<sec id="S5-14">
<label>5.7</label> <title>Impact of LAF Histograms</title>
<p>In this section, we aim to demonstrate the contribution of the LAF histograms we have introduced. To this end, we compare our algorithm with a modified version of itself. The modification consists in building histograms of optical flow (which translates into histograms of Translation motion classes only). Specifically, we used optical flows provided by three methods: the pyramidal implementation of the Lucas&#x02013;Kanade method (LK) (Bouguet, <xref ref-type="bibr" rid="B11">2001</xref>), the variational method of Brox et al. (<xref ref-type="bibr" rid="B13">2004</xref>), and the polynomial expansion-based flow estimation method (FB) (Farneb&#x000E4;ck, <xref ref-type="bibr" rid="B25">2003</xref>). We built these flow-based variants of our method by computing optical flow histograms (HOF) still weighted by motion vector magnitudes (which is equivalent to histograms of Translation classes only), but with more finely quantized orientations (12 bins). Results on the UCSD ped1 (full ground truth) dataset are reported in Figure <xref ref-type="fig" rid="F12">12</xref>. They show that our original method greatly outperforms the flow-based versions, even though the latter benefit from a finer quantization of translation orientations. Our method clearly leverages labeled affine flows and LAF histograms to achieve superior performance. We believe the reason is twofold: (i) computing affine flows yields less noisy flow vectors, and (ii) introducing histograms of local motion classes brings more explicit information on the nature of the motion.</p>
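The HOF baseline used in this comparison can be sketched as a magnitude-weighted orientation histogram with 12 bins. The array layout (an (H, W, 2) flow field) and the function name are assumptions made here for illustration.

```python
import numpy as np

def hof_histogram(flow, n_bins=12):
    """Magnitude-weighted histogram of flow orientations (the HOF baseline):
    each flow vector votes for its orientation bin with a weight equal to
    its magnitude. `flow` is an (H, W, 2) array of (u, v) components."""
    u, v = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(u, v)
    ang = np.mod(np.arctan2(v, u), 2 * np.pi)  # orientations in [0, 2*pi)
    bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins, weights=mag, minlength=n_bins)
    s = hist.sum()
    return hist / s if s > 0 else hist
```

Replacing the LAF histograms by such HOF vectors in the per-block descriptor, while keeping the rest of the pipeline identical, yields the flow-based variants evaluated in Figure 12.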
<fig position="float" id="F12">
<label>Figure 12</label>
<caption><p><bold>Left: anomaly detection evaluated with the pixel-level criterion</bold>. Plots of ROC curves for our method with LAF histograms and for flow-based baselines on the UCSD ped1 (full ground truth) dataset. Right: ROC curves for the pixel-level criterion on the UCSD ped1 dataset with full ground truth. In green: results obtained using the LOF measure and 8 Translation classes. In red: results obtained with the LOF measure with only 4 Translation classes. In blue: results obtained with the baseline version without LOF (histogram distance thresholding).</p></caption>
<graphic xlink:href="fict-04-00010-g012.tif"/>
</fig>
</sec>
<sec id="S5-15">
<label>5.8</label> <title>Impact of LOF Criterion and Number of Translation Classes</title>
<p>We now demonstrate the beneficial role of the LOF criterion. To assess it, we have compared the LOF criterion to a baseline version of our algorithm which directly thresholds LAF histogram distances. For this baseline version, the histogram <inline-formula><mml:math id="M102"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> extracted from training data is called the &#x0201C;reference histogram&#x0201D; of a given block <inline-formula><mml:math id="M103"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. Each bin <italic>l</italic> of this histogram is computed as follows:
<disp-formula id="E17"><label>(14)</label><mml:math id="M104"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>l</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>&#x003C2;</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:munderover></mml:mstyle></mml:mrow><mml:mtext>&#x02009;</mml:mtext><mml:mrow><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>&#x003C4;</mml:mi><mml:mo>=</mml:mo><mml:mi>&#x003C2;</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>&#x003C2;</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover></mml:mstyle></mml:mrow><mml:mtext>&#x02009;</mml:mtext><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mi>&#x0212C;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:munder></mml:mstyle></mml:mrow><mml:mtext>&#x02009;</mml:mtext><mml:mi>&#x003C8;</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:mn>&#x003C4;</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>l</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-punc">,</mml:mo></mml:math></disp-formula>
where <italic>T<sub>i</sub></italic> is the number of histograms computed over the training sequence corresponding to normal behaviors for block <inline-formula><mml:math id="M105"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and <italic>&#x003C8;</italic>(<italic>p</italic>, <italic>t</italic>, <italic>l</italic>) is defined in equation (<xref ref-type="disp-formula" rid="E6">6</xref>). In the proposed method described in the paper, these <italic>T<sub>i</sub></italic> histograms form the feature space where the LOF measure of the histogram <inline-formula><mml:math id="M106"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is computed for a given block <inline-formula><mml:math id="M107"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> at time <italic>t</italic>. For the baseline method, we compute the average histogram <inline-formula><mml:math id="M108"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> from these <italic>T<sub>i</sub></italic> histograms as defined in equation (<xref ref-type="disp-formula" rid="E17">14</xref>).</p>
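<p>Equation (14) amounts to averaging, over the <italic>T<sub>i</sub></italic> training instants, per-instant histograms accumulated on a 3-frame temporal window. A minimal sketch, assuming a hypothetical precomputed array <monospace>frame_counts[t, l]</monospace> that holds the sum of <italic>&#x003C8;</italic>(<italic>p</italic>, <italic>t</italic>, <italic>l</italic>) over the pixels of the block (the border handling is our assumption):</p>

```python
import numpy as np

def reference_histogram(frame_counts):
    """Equation (14): average over the T_i training instants of the
    per-instant histograms accumulated on a 3-frame temporal window.
    frame_counts[t, l] = sum over p in B_i of psi(p, t, l)."""
    c = np.asarray(frame_counts, dtype=float)
    T = c.shape[0]
    # pad so the window tau in {s-1, s, s+1} stays in range at the borders
    padded = np.pad(c, ((1, 1), (0, 0)), mode="edge")
    windowed = padded[:-2] + padded[1:-1] + padded[2:]   # one histogram per instant
    return windowed.sum(axis=0) / T
```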
<p>The rule for setting the detection threshold in the baseline version is similar to the one used in our proposed method with LOF but applies directly to the distance between the test histogram <inline-formula><mml:math id="M109"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, computed at time instant <italic>t</italic>, and the reference histogram <inline-formula><mml:math id="M110"><mml:msubsup><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> for the corresponding block <inline-formula><mml:math id="M111"><mml:msub><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. The threshold in the baseline version is computed at every block from a <italic>p</italic>-value on the distribution of distances of the reference histogram to the available training histograms for that block. The same spatiotemporal filtering is applied to the output of the thresholding of the distance between the test histogram and the reference histogram.</p>
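<p>The per-block threshold of the baseline can be sketched as the (1 &#x02212; <italic>p</italic>-value) quantile of the empirical distribution of distances between the reference histogram and the training histograms (the L1 histogram distance and the helper names are assumptions, not the authors' implementation):</p>

```python
import numpy as np

def l1_dist(a, b):
    """L1 distance between histograms; broadcasts over a stack."""
    return np.abs(np.asarray(a, float) - np.asarray(b, float)).sum(-1)

def block_threshold(train_hists, ref_hist, p_value=0.05):
    """Detection threshold for one block: the (1 - p_value) quantile of
    the distances of the reference histogram to the training histograms."""
    return np.quantile(l1_dist(train_hists, ref_hist), 1.0 - p_value)

def is_anomalous(test_hist, ref_hist, thr):
    """Baseline decision: flag the block if the test histogram is
    farther from the reference than the learned threshold."""
    return l1_dist(test_hist, ref_hist) > thr
```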
<p>We compare these two versions through ROC curves obtained on the UCSD ped1 dataset for the pixel-level evaluation criterion in Figure <xref ref-type="fig" rid="F12">12</xref>. We also report results obtained with a smaller number of Translation classes for our full LOF-based method. The areas under the ROC curves are summarized in Table <xref ref-type="table" rid="T8">8</xref>. These experiments clearly demonstrate that the local outlier factor plays a key role in our pipeline. Moreover, using eight Translation classes brings a substantial advantage over using only four.</p>
<table-wrap position="float" id="T8">
<label>Table 8</label>
<caption><p><bold>Areas under the curves (AUC) of three versions of our anomalous motion detection method on the UCSD ped1 dataset with full ground truth</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Variant</th>
<th align="center">Baseline without LOF</th>
<th align="center">8 translation classes (LOF)</th>
<th align="center">4 translation classes (LOF)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Pixel level</td>
<td align="center">63.10</td>
<td align="center"><bold>74.36</bold></td>
<td align="center">71.27</td>
</tr>
<tr>
<td align="left">Frame level</td>
<td align="center">74.64</td>
<td align="center"><bold>79.85</bold></td>
<td align="center">76.68</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Best results are indicated in bold font</italic>.</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="S5-16">
<label>5.9</label> <title>Statistical Significance of the Presented Experiments</title>
<p>In order to determine whether the gain in performance is statistically significant, we apply a binomial test of statistical significance to all the frame-level comparisons reported in the previous subsections. The binomial test is appropriate because frame-level detection is a binary labeling process (normal vs. anomalous). Taking as null hypothesis that the two methods have equal scores (<italic>p</italic><sub>1</sub>&#x02009;&#x0003D;&#x02009;<italic>p</italic><sub>2</sub>), and as alternative hypothesis that they have different scores (<italic>p</italic><sub>1</sub>&#x02009;&#x02260;&#x02009;<italic>p</italic><sub>2</sub>), we compute the test statistic given by
<disp-formula id="E18"><label>(15)</label><mml:math id="M112"><mml:mi>z</mml:mi><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>p</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>p</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-op">&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-op">&#x0005E;</mml:mo></mml:mover></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mn>2</mml:mn><mml:mo class="MathClass-bin">&#x02215;</mml:mo><mml:mi>N</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac><mml:mo class="MathClass-punc">,</mml:mo></mml:math></disp-formula>
where <inline-formula><mml:math id="M113"><mml:mover accent="true"><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo class="MathClass-op">&#x0005E;</mml:mo></mml:mover><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>p</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo class="MathClass-bin">&#x0002B;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>p</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:math></inline-formula>, <inline-formula><mml:math id="M114"><mml:msub><mml:mover accent='true'><mml:mi>p</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, and <inline-formula><mml:math id="M115"><mml:msub><mml:mover accent='true'><mml:mi>p</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> are the computed scores normalized to 0&#x02013;1 range, and <italic>N</italic> is the number of frames of the particular experiment (i.e., total number of frames in the videos of the given dataset).</p>
<p>The test is intended to verify whether our scores differ (for better or worse) from the scores of other methods with a given degree of statistical significance. We set the test at the 95% confidence level. In other words, to decide on the null hypothesis, we compare <italic>z</italic> to the critical value <italic>z<sub><italic>&#x003B1;</italic></sub></italic><sub>/2</sub>, with the cutoff <italic>&#x003B1;</italic>&#x02009;&#x0003D;&#x02009;0.05, i.e., <italic>z<sub><italic>&#x003B1;</italic></sub></italic><sub>/2</sub>&#x02009;&#x0003D;&#x02009;1.96. If &#x0007C;<italic>z</italic>&#x0007C;&#x02009;&#x0003E;&#x02009;<italic>z<sub><italic>&#x003B1;</italic></sub></italic><sub>/2</sub>, the null hypothesis is rejected. We can then infer which method is the best from the sign of <italic>z</italic>, or, equivalently, the sign of <inline-formula><mml:math id="M116"><mml:msub><mml:mover accent='true'><mml:mi>p</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>p</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>.</p>
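<p>Equation (15) and the associated decision rule can be sketched as follows (a minimal illustration; the hard-coded critical value 1.96 corresponds to <italic>&#x003B1;</italic>&#x02009;&#x0003D;&#x02009;0.05 and is our assumption):</p>

```python
import math

def two_proportion_z(p1, p2, n):
    """Equation (15): pooled two-proportion z statistic for two frame-level
    scores p1, p2 (in [0, 1]) measured on the same N frames. Returns
    (z, significant): the null hypothesis p1 == p2 is rejected when |z|
    exceeds the critical value z_{alpha/2} = 1.96 (95% confidence level)."""
    p = (p1 + p2) / 2.0                              # pooled proportion
    z = (p1 - p2) / math.sqrt(p * (1.0 - p) * (2.0 / n))
    z_crit = 1.96                                    # z_{alpha/2}, alpha = 0.05
    return z, abs(z) > z_crit
```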
<p>In particular, for the UMN and UCSD datasets, we take the accuracy scores at the Equal Error Rate (EER) point of the ROC curve to evaluate significance. For PETS 2009, we simply use the accuracy scores given in the &#x0201C;Overall&#x0201D; row of Table <xref ref-type="table" rid="T4">4</xref>. All the significance tests are summarized in Table <xref ref-type="table" rid="T9">9</xref>.</p>
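<p>Extracting the accuracy at the EER point from a sampled ROC curve can be sketched as follows (assuming the curve is given as increasing false positive and true positive rates; the linear interpolation between bracketing samples is our assumption, not the authors' procedure):</p>

```python
import numpy as np

def eer_accuracy(fpr, tpr):
    """Accuracy at the Equal Error Rate point of a ROC curve, i.e. where
    the false positive rate equals the miss rate (1 - tpr)."""
    fpr, tpr = np.asarray(fpr, float), np.asarray(tpr, float)
    diff = fpr - (1.0 - tpr)              # negative before the EER crossing
    k = int(np.argmax(diff >= 0))         # first sample at or past the crossing
    if k == 0 or diff[k] == 0.0:
        eer = fpr[k]
    else:                                 # interpolate between samples k-1 and k
        t = -diff[k - 1] / (diff[k] - diff[k - 1])
        eer = fpr[k - 1] + t * (fpr[k] - fpr[k - 1])
    return 1.0 - eer                      # accuracy = 1 - EER
```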
<table-wrap position="float" id="T9">
<label>Table 9</label>
<caption><p><bold>Significance of the frame-level experiments for all the presented datasets</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" colspan="3">UCSD<hr/></th>
<th align="center" colspan="3">UMN<hr/></th>
<th align="center" colspan="3">PETS2009 S1<hr/></th>
<th align="center" colspan="3">PETS2009 S2<hr/></th>
</tr>
<tr>
<th align="center">Comp.</th>
<th align="center"><italic>z</italic></th>
<th align="center">Sig.</th>
<th align="center">Comp.</th>
<th align="center"><italic>z</italic></th>
<th align="center">Sig.</th>
<th align="center">Comp.</th>
<th align="center"><italic>z</italic></th>
<th align="center">Sig.</th>
<th align="center">Comp.</th>
<th align="center"><italic>z</italic></th>
<th align="center">Sig.</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">SS</td>
<td align="center">7.03</td>
<td align="center">Yes</td>
<td align="center">SRC</td>
<td align="center">1.09</td>
<td align="center"><bold>No</bold></td>
<td align="center">BM</td>
<td align="center">&#x02212;0.99</td>
<td align="center"><bold>No</bold></td>
<td align="center">BM</td>
<td align="center">&#x02212;0.90</td>
<td align="center"><bold>No</bold></td>
</tr>
<tr>
<td align="left">SF</td>
<td align="center">&#x02212;8.68</td>
<td align="center">Yes</td>
<td align="center">SF</td>
<td align="center">&#x02212;21.90</td>
<td align="center">Yes</td>
<td align="center">SF</td>
<td align="center">&#x02212;8.50</td>
<td align="center">Yes</td>
<td align="center">SF</td>
<td align="center">&#x02212;3.83</td>
<td align="center">Yes</td>
</tr>
<tr>
<td align="left">AD</td>
<td align="center">&#x02212;24.31</td>
<td align="center">Yes</td>
<td align="center">CI</td>
<td align="center">&#x02212;6.91</td>
<td align="center">Yes</td>
<td align="center">CI</td>
<td align="center">&#x02212;4.54</td>
<td align="center">Yes</td>
<td align="center">CI</td>
<td align="center">&#x02212;1.74</td>
<td align="center"><bold>No</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The non-significant gains are shown in bold for clarity. The <italic>z</italic> scores are computed from a binomial test (see text)</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>For UMN, it turns out that SRC and our method are not statistically different, whereas our method significantly outperforms the SF and CI methods. For the PETS 2009 experiments, our results are significantly better than those of the SF method on both scenarios and than those of the CI method on S1; the improvements over BM on both scenarios, and over CI on S2, are not significant at the 95% confidence level. Finally, for the UCSD dataset, all the results from Table <xref ref-type="table" rid="T5">5</xref> are statistically significant when comparing our multi-grid method against all the others (including our single-grid method).</p>
</sec>
</sec>
<sec id="S6">
<label>6</label> <title>Conclusion</title>
<p>We have presented an original and efficient anomalous motion detection-and-localization method, which can capture diverse kinds of anomalous motion in common real scenarios. It works in a fully unsupervised and online way in crowded scenes. The LAF histogram and the data-driven detection criterion based on the LOF measure are the two distinctive contributions of our approach. The threshold value for the anomaly detection decision can be automatically and locally adapted, based on statistical arguments. The current implementation of our algorithm is fast, and it could be further accelerated significantly by massive parallelization over blocks. Our method supplies state-of-the-art results in several experiments and competitive results in the others. It successfully deals with datasets comprising different camera viewpoints and dynamic contents involving both local and global anomalous motion. Localization of local anomalous motion is inherent in the proposed block-based method and was experimentally demonstrated to be accurate enough.</p>
</sec>
<sec id="S7">
<title>Author Contributions</title>
<p>The authors contributed equally to the research and to the writing of the paper.</p>
</sec>
<sec id="S8">
<title>Conflict of Interest Statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ack>
<p>We thank the authors of Antic and Ommer (<xref ref-type="bibr" rid="B4">2011</xref>) and Cong et al. (<xref ref-type="bibr" rid="B21">2013</xref>) for providing us with useful experimental data. We would also like to thank the authors of Ferryman and Shahrokni (<xref ref-type="bibr" rid="B26">2009</xref>), Li et al. (<xref ref-type="bibr" rid="B46">2014</xref>), and Papanikolopoulos (<xref ref-type="bibr" rid="B53">2005</xref>) for generating the datasets used in the experimental section of this paper, and for making them publicly available for use and reproduction without requiring any permission. This work was partially supported by R&#x000E9;gion Bretagne (Brittany Council) through a contribution to AB&#x02019;s PhD student grant.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Adam</surname> <given-names>A.</given-names></name> <name><surname>Rivlin</surname> <given-names>E.</given-names></name> <name><surname>Shimshoni</surname> <given-names>I.</given-names></name> <name><surname>Reinitz</surname> <given-names>D.</given-names></name></person-group> (<year>2008</year>). <article-title>Robust real-time unusual event detection using multiple fixed-location monitors</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>30</volume>, <fpage>555</fpage>&#x02013;<lpage>560</lpage>.<pub-id pub-id-type="doi">10.1109/TPAMI.2007.70825</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aggarwal</surname> <given-names>J. K.</given-names></name> <name><surname>Ryoo</surname> <given-names>M. S.</given-names></name></person-group> (<year>2011</year>). <article-title>Human activity analysis: a review</article-title>. <source>ACM Comput. Surv.</source> <volume>43</volume>, <fpage>16</fpage>.<pub-id pub-id-type="doi">10.1145/1922649.1922653</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Andersson</surname> <given-names>M.</given-names></name> <name><surname>Gustafsson</surname> <given-names>F.</given-names></name> <name><surname>St-Laurent</surname> <given-names>L.</given-names></name> <name><surname>Prevost</surname> <given-names>D.</given-names></name></person-group> (<year>2013</year>). <article-title>Recognition of anomalous motion patterns in urban surveillance</article-title>. <source>IEEE J. Sel. Top. Signal Process.</source> <volume>7</volume>, <fpage>102</fpage>&#x02013;<lpage>110</lpage>.<pub-id pub-id-type="doi">10.1109/JSTSP.2013.2237882</pub-id></citation></ref>
<ref id="B4"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Antic</surname> <given-names>B.</given-names></name> <name><surname>Ommer</surname> <given-names>B.</given-names></name></person-group> (<year>2011</year>). &#x0201C;<article-title>Video parsing for abnormality detection</article-title>,&#x0201D; in <source>ICCV</source>, <publisher-loc>Barcelona</publisher-loc>.</citation></ref>
<ref id="B5"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Basharat</surname> <given-names>A.</given-names></name> <name><surname>Gritai</surname> <given-names>A.</given-names></name> <name><surname>Shah</surname> <given-names>M.</given-names></name></person-group> (<year>2008</year>). &#x0201C;<article-title>Learning object motion patterns for anomaly detection and improved object detection</article-title>,&#x0201D; in <source>CVPR</source>, <publisher-loc>Anchorage</publisher-loc>.</citation></ref>
<ref id="B6"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Basset</surname> <given-names>A.</given-names></name> <name><surname>Bouthemy</surname> <given-names>P.</given-names></name> <name><surname>Kervrann</surname> <given-names>C.</given-names></name></person-group> (<year>2014</year>). &#x0201C;<article-title>Recovery of motion patterns and dominant paths in videos of crowded scenes</article-title>,&#x0201D; in <source>ICIP</source>.</citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Benezeth</surname> <given-names>Y.</given-names></name> <name><surname>Jodoin</surname> <given-names>P.-M.</given-names></name> <name><surname>Saligrama</surname> <given-names>V.</given-names></name></person-group> (<year>2011</year>). <article-title>Abnormality detection using low-level co-occurring events</article-title>. <source>Pattern Recognit. Lett.</source> <volume>32</volume>, <fpage>423</fpage>&#x02013;<lpage>431</lpage>.<pub-id pub-id-type="doi">10.1016/j.patrec.2010.10.008</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bertini</surname> <given-names>M.</given-names></name> <name><surname>Del Bimbo</surname> <given-names>A.</given-names></name> <name><surname>Seidenari</surname> <given-names>L.</given-names></name></person-group> (<year>2012</year>). <article-title>Multi-scale and real-time non-parametric approach for anomaly detection and localization</article-title>. <source>Comput. Vis. Image Underst.</source> <volume>116</volume>, <fpage>320</fpage>&#x02013;<lpage>329</lpage>.<pub-id pub-id-type="doi">10.1016/j.cviu.2011.09.009</pub-id></citation></ref>
<ref id="B9"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Biswas</surname> <given-names>S.</given-names></name> <name><surname>Babu</surname> <given-names>R. V.</given-names></name></person-group> (<year>2014</year>). &#x0201C;<article-title>Sparse representation based anomaly detection with enhanced local dictionaries</article-title>,&#x0201D; in <source>ICIP</source>, <publisher-loc>Paris</publisher-loc>.</citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Boiman</surname> <given-names>O.</given-names></name> <name><surname>Irani</surname> <given-names>M.</given-names></name></person-group> (<year>2007</year>). <article-title>Detecting irregularities in images and in video</article-title>. <source>Int. J. Comput. Vis.</source> <volume>74</volume>, <fpage>17</fpage>&#x02013;<lpage>31</lpage>.<pub-id pub-id-type="doi">10.1007/s11263-006-0009-9</pub-id></citation></ref>
<ref id="B11"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Bouguet</surname> <given-names>J.-Y.</given-names></name></person-group> (<year>2001</year>). <source>Pyramidal Implementation of the Affine Lucas Kanade Feature Tracker. Description of the Algorithm</source>. <publisher-loc>Stanford</publisher-loc>: <publisher-name>Intel Co</publisher-name>, <fpage>5</fpage>.</citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Breunig</surname> <given-names>M. M.</given-names></name> <name><surname>Kriegel</surname> <given-names>H.-P.</given-names></name> <name><surname>Ng</surname> <given-names>R. T.</given-names></name> <name><surname>Sander</surname> <given-names>J.</given-names></name></person-group> (<year>2000</year>). &#x0201C;<article-title>LOF: identifying density-based local outliers</article-title>,&#x0201D; in <source>ACM SIGMOD Record</source>, Vol. <volume>29</volume>, <fpage>93</fpage>&#x02013;<lpage>104</lpage>.</citation></ref>
<ref id="B13"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Brox</surname> <given-names>T.</given-names></name> <name><surname>Bruhn</surname> <given-names>A.</given-names></name> <name><surname>Papenberg</surname> <given-names>N.</given-names></name> <name><surname>Weickert</surname> <given-names>J.</given-names></name></person-group> (<year>2004</year>). &#x0201C;<article-title>High accuracy optical flow estimation based on a theory for warping</article-title>,&#x0201D; in <source>ECCV</source>, <publisher-loc>Prague</publisher-loc>.</citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cavanaugh</surname> <given-names>J. E.</given-names></name></person-group> (<year>1997</year>). <article-title>Unifying the derivations for the Akaike and corrected Akaike information criteria</article-title>. <source>Stat. Probab. Lett.</source> <volume>33</volume>, <fpage>201</fpage>&#x02013;<lpage>208</lpage>.<pub-id pub-id-type="doi">10.1016/S0167-7152(96)00128-9</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cedras</surname> <given-names>C.</given-names></name> <name><surname>Shah</surname> <given-names>M.</given-names></name></person-group> (<year>1995</year>). <article-title>Motion-based recognition a survey</article-title>. <source>Image Vis. Comput.</source> <volume>13</volume>, <fpage>129</fpage>&#x02013;<lpage>155</lpage>.<pub-id pub-id-type="doi">10.1016/0262-8856(95)93154-K</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cha</surname> <given-names>S.-H.</given-names></name> <name><surname>Srihari</surname> <given-names>S. N.</given-names></name></person-group> (<year>2002</year>). <article-title>On measuring the distance between histograms</article-title>. <source>Pattern Recognit.</source> <volume>35</volume>, <fpage>1355</fpage>&#x02013;<lpage>1370</lpage>.<pub-id pub-id-type="doi">10.1016/S0031-3203(01)00118-2</pub-id></citation></ref>
<ref id="B17"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Chandola</surname> <given-names>V.</given-names></name> <name><surname>Banerjee</surname> <given-names>A.</given-names></name> <name><surname>Kumar</surname> <given-names>V.</given-names></name></person-group> (<year>2009</year>). &#x0201C;<article-title>Anomaly detection: a survey</article-title>,&#x0201D; in <source>ACM CSUR</source> (<publisher-loc>New York</publisher-loc>), <fpage>41</fpage>.</citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>D.-Y.</given-names></name> <name><surname>Huang</surname> <given-names>P.-C.</given-names></name></person-group> (<year>2011</year>). <article-title>Motion-based unusual event detection in human crowds</article-title>. <source>J. Vis. Commun. Image Represent.</source> <volume>22</volume>, <fpage>178</fpage>&#x02013;<lpage>186</lpage>.<pub-id pub-id-type="doi">10.1016/j.jvcir.2010.12.004</pub-id></citation></ref>
<ref id="B19"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>K.-W.</given-names></name> <name><surname>Chen</surname> <given-names>Y.-T.</given-names></name> <name><surname>Fang</surname> <given-names>W.-H.</given-names></name></person-group> (<year>2015</year>). &#x0201C;<article-title>Video anomaly detection and localization using hierarchical feature representation and Gaussian process regression</article-title>,&#x0201D; in <source>CVPR</source>, <publisher-loc>Boston</publisher-loc>.</citation></ref>
<ref id="B20"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Chockalingam</surname> <given-names>T.</given-names></name> <name><surname>Emonet</surname> <given-names>R.</given-names></name> <name><surname>Odobez</surname> <given-names>J.-M.</given-names></name></person-group> (<year>2013</year>). &#x0201C;<article-title>Localized anomaly detection via hierarchical integrated activity discovery</article-title>,&#x0201D; in <source>AVSS</source>, <publisher-loc>Krakow</publisher-loc>.</citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cong</surname> <given-names>Y.</given-names></name> <name><surname>Yuan</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). <article-title>Abnormal event detection in crowded scenes using sparse representation</article-title>. <source>Pattern Recognit.</source> <volume>46</volume>, <fpage>1851</fpage>&#x02013;<lpage>1864</lpage>.<pub-id pub-id-type="doi">10.1016/j.patcog.2012.11.021</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Crivelli</surname> <given-names>T.</given-names></name> <name><surname>Bouthemy</surname> <given-names>P.</given-names></name> <name><surname>Cernuschi-Frias</surname> <given-names>B.</given-names></name> <name><surname>Yao</surname> <given-names>J.-F.</given-names></name></person-group> (<year>2011</year>). <article-title>Simultaneous motion detection and background reconstruction with a conditional mixed-state Markov random field</article-title>. <source>Int. J. Comput. Vis.</source> <volume>94</volume>, <fpage>295</fpage>&#x02013;<lpage>316</lpage>.<pub-id pub-id-type="doi">10.1007/s11263-011-0429-z</pub-id></citation></ref>
<ref id="B23"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Cui</surname> <given-names>X.</given-names></name> <name><surname>Liu</surname> <given-names>Q.</given-names></name> <name><surname>Gao</surname> <given-names>M.</given-names></name> <name><surname>Metaxas</surname> <given-names>D. N.</given-names></name></person-group> (<year>2011</year>). &#x0201C;<article-title>Abnormal detection using interaction energy potentials</article-title>,&#x0201D; in <source>CVPR</source>, <publisher-loc>Colorado Springs</publisher-loc>.</citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fang</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>Z.</given-names></name> <name><surname>Lin</surname> <given-names>W.</given-names></name> <name><surname>Fang</surname> <given-names>Z.</given-names></name></person-group> (<year>2014</year>). <article-title>Video saliency incorporating spatiotemporal cues and uncertainty weighting</article-title>. <source>IEEE Trans. Image Process.</source> <volume>23</volume>, <fpage>3910</fpage>&#x02013;<lpage>3921</lpage>.<pub-id pub-id-type="doi">10.1109/TIP.2014.2336549</pub-id><pub-id pub-id-type="pmid">25051549</pub-id></citation></ref>
<ref id="B25"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Farneb&#x000E4;ck</surname> <given-names>G.</given-names></name></person-group> (<year>2003</year>). &#x0201C;<article-title>Two-frame motion estimation based on polynomial expansion</article-title>,&#x0201D; in <source>SCIA</source>, <publisher-loc>Halmstad</publisher-loc>.</citation></ref>
<ref id="B26"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Ferryman</surname> <given-names>J.</given-names></name> <name><surname>Shahrokni</surname> <given-names>A.</given-names></name></person-group> (<year>2009</year>). &#x0201C;<article-title>An overview of the PETS 2009 challenge</article-title>,&#x0201D; in <source>PETS</source>.</citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fortun</surname> <given-names>D.</given-names></name> <name><surname>Bouthemy</surname> <given-names>P.</given-names></name> <name><surname>Kervrann</surname> <given-names>C.</given-names></name></person-group> (<year>2015</year>). <article-title>Optical flow modeling and computation: a survey</article-title>. <source>Comput. Vis. Image Underst.</source> <volume>134</volume>, <fpage>1</fpage>&#x02013;<lpage>21</lpage>.<pub-id pub-id-type="doi">10.1016/j.cviu.2015.02.008</pub-id></citation></ref>
<ref id="B28"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Georgiadis</surname> <given-names>G.</given-names></name> <name><surname>Ayvaci</surname> <given-names>A.</given-names></name> <name><surname>Soatto</surname> <given-names>S.</given-names></name></person-group> (<year>2012</year>). &#x0201C;<article-title>Actionable saliency detection: independent motion detection without independent motion estimation</article-title>,&#x0201D; in <source>CVPR</source>, <publisher-loc>Providence</publisher-loc>.</citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goyette</surname> <given-names>N.</given-names></name> <name><surname>Jodoin</surname> <given-names>P.-M.</given-names></name> <name><surname>Porikli</surname> <given-names>F.</given-names></name> <name><surname>Konrad</surname> <given-names>J.</given-names></name> <name><surname>Ishwar</surname> <given-names>P.</given-names></name></person-group> (<year>2014</year>). <article-title>A novel video dataset for change detection benchmarking</article-title>. <source>IEEE Trans. Image Process.</source> <volume>23</volume>, <fpage>4663</fpage>&#x02013;<lpage>4679</lpage>.<pub-id pub-id-type="doi">10.1109/TIP.2014.2346013</pub-id><pub-id pub-id-type="pmid">25122568</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hospedales</surname> <given-names>T.</given-names></name> <name><surname>Gong</surname> <given-names>S.</given-names></name> <name><surname>Xiang</surname> <given-names>T.</given-names></name></person-group> (<year>2012</year>). <article-title>Video behaviour mining using a dynamic topic model</article-title>. <source>Int. J. Comput. Vis.</source> <volume>98</volume>, <fpage>303</fpage>&#x02013;<lpage>323</lpage>.<pub-id pub-id-type="doi">10.1007/s11263-011-0510-7</pub-id></citation></ref>
<ref id="B31"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Davis</surname> <given-names>L.</given-names></name></person-group> (<year>2013</year>). &#x0201C;<article-title>Unsupervised abnormal crowd activity detection using semiparametric scan statistic</article-title>,&#x0201D; in <source>CVPRW</source>, <publisher-loc>Portland</publisher-loc>.</citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>C.-R.</given-names></name> <name><surname>Chang</surname> <given-names>Y.-J.</given-names></name> <name><surname>Yang</surname> <given-names>Z.-X.</given-names></name> <name><surname>Lin</surname> <given-names>Y.-Y.</given-names></name></person-group> (<year>2014</year>). <article-title>Video saliency map detection by dominant camera motion removal</article-title>. <source>IEEE Trans. Circuits Syst. Video Technol.</source> <volume>24</volume>, <fpage>1336</fpage>&#x02013;<lpage>1349</lpage>.<pub-id pub-id-type="doi">10.1109/TCSVT.2014.2308652</pub-id></citation></ref>
<ref id="B33"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Huber</surname> <given-names>P. J.</given-names></name></person-group> (<year>1981</year>). <source>Robust Statistics</source>. <publisher-loc>New Jersey</publisher-loc>: <publisher-name>Wiley</publisher-name>.</citation></ref>
<ref id="B34"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Itti</surname> <given-names>L.</given-names></name> <name><surname>Baldi</surname> <given-names>P.</given-names></name></person-group> (<year>2005</year>). &#x0201C;<article-title>A principled approach to detecting surprising events in video</article-title>,&#x0201D; in <source>CVPR</source>, <publisher-loc>San Diego</publisher-loc>.</citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jiang</surname> <given-names>F.</given-names></name> <name><surname>Yuan</surname> <given-names>J.</given-names></name> <name><surname>Tsaftaris</surname> <given-names>S. A.</given-names></name> <name><surname>Katsaggelos</surname> <given-names>A. K.</given-names></name></person-group> (<year>2011</year>). <article-title>Anomalous video event detection using spatiotemporal context</article-title>. <source>Comput. Vis. Image Underst.</source> <volume>115</volume>, <fpage>323</fpage>&#x02013;<lpage>333</lpage>.<pub-id pub-id-type="doi">10.1016/j.cviu.2010.10.008</pub-id></citation></ref>
<ref id="B36"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Jiang</surname> <given-names>M.</given-names></name> <name><surname>Xu</surname> <given-names>J.</given-names></name> <name><surname>Zhao</surname> <given-names>Q.</given-names></name></person-group> (<year>2014</year>). &#x0201C;<article-title>Saliency in crowd</article-title>,&#x0201D; in <source>ECCV</source>, <publisher-loc>Zurich</publisher-loc>.</citation></ref>
<ref id="B37"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>J.</given-names></name> <name><surname>Grauman</surname> <given-names>K.</given-names></name></person-group> (<year>2009</year>). &#x0201C;<article-title>Observe locally, infer globally: a space-time <sc>mrf</sc> for detecting abnormal activities with incremental updates</article-title>,&#x0201D; in <source>CVPR</source>, <publisher-loc>Miami Beach</publisher-loc>.</citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>W.</given-names></name> <name><surname>Kim</surname> <given-names>C.</given-names></name></person-group> (<year>2014</year>). <article-title>Spatiotemporal saliency detection using textural contrast and its applications</article-title>. <source>IEEE Trans. Circuits Syst. Video Technol.</source> <volume>24</volume>, <fpage>646</fpage>&#x02013;<lpage>659</lpage>.<pub-id pub-id-type="doi">10.1109/TCSVT.2013.2290579</pub-id></citation></ref>
<ref id="B39"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Kratz</surname> <given-names>L.</given-names></name> <name><surname>Nishino</surname> <given-names>K.</given-names></name></person-group> (<year>2009</year>). &#x0201C;<article-title>Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models</article-title>,&#x0201D; in <source>CVPR</source>, <publisher-loc>Miami Beach</publisher-loc>.</citation></ref>
<ref id="B40"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Lazarevic</surname> <given-names>A.</given-names></name> <name><surname>Ert&#x000F6;z</surname> <given-names>L.</given-names></name> <name><surname>Kumar</surname> <given-names>V.</given-names></name> <name><surname>Ozgur</surname> <given-names>A.</given-names></name> <name><surname>Srivastava</surname> <given-names>J.</given-names></name></person-group> (<year>2003</year>). &#x0201C;<article-title>A comparative study of anomaly detection schemes in network intrusion detection</article-title>,&#x0201D; in <source>SIAM Data Mining</source>, <publisher-loc>San Francisco</publisher-loc>.</citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leach</surname> <given-names>M. J.</given-names></name> <name><surname>Sparks</surname> <given-names>E. P.</given-names></name> <name><surname>Robertson</surname> <given-names>N. M.</given-names></name></person-group> (<year>2014</year>). <article-title>Contextual anomaly detection in crowded surveillance scenes</article-title>. <source>Pattern Recognit. Lett.</source> <volume>44</volume>, <fpage>71</fpage>&#x02013;<lpage>79</lpage>.<pub-id pub-id-type="doi">10.1016/j.patrec.2013.11.018</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>D.-G.</given-names></name> <name><surname>Suk</surname> <given-names>H.-I.</given-names></name> <name><surname>Park</surname> <given-names>S.-K.</given-names></name> <name><surname>Lee</surname> <given-names>S.-W.</given-names></name></person-group> (<year>2015</year>). <article-title>Motion influence map for unusual human activity detection and localization in crowded scenes</article-title>. <source>IEEE Trans. Circuits Syst. Video Technol.</source> <volume>25</volume>, <fpage>1612</fpage>&#x02013;<lpage>1623</lpage>.<pub-id pub-id-type="doi">10.1109/TCSVT.2015.2395752</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Han</surname> <given-names>Z.</given-names></name> <name><surname>Ye</surname> <given-names>Q.</given-names></name> <name><surname>Jiao</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). <article-title>Visual abnormal behavior detection based on trajectory sparse reconstruction analysis</article-title>. <source>Neurocomputing</source> <volume>119</volume>, <fpage>94</fpage>&#x02013;<lpage>100</lpage>.<pub-id pub-id-type="doi">10.1016/j.neucom.2012.03.040</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Le Meur</surname> <given-names>O.</given-names></name> <name><surname>Shen</surname> <given-names>L.</given-names></name></person-group> (<year>2015a</year>). <article-title>Spatiotemporal saliency detection based on superpixel-level trajectory</article-title>. <source>Signal Process. Image Commun.</source> <volume>38</volume>, <fpage>100</fpage>&#x02013;<lpage>114</lpage>.<pub-id pub-id-type="doi">10.1016/j.image.2015.04.014</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>T.</given-names></name> <name><surname>Chang</surname> <given-names>H.</given-names></name> <name><surname>Wang</surname> <given-names>M.</given-names></name> <name><surname>Ni</surname> <given-names>B.</given-names></name> <name><surname>Hong</surname> <given-names>R.</given-names></name> <name><surname>Yan</surname> <given-names>S.</given-names></name></person-group> (<year>2015b</year>). <article-title>Crowded scene analysis: a survey</article-title>. <source>IEEE Trans. Circuits Syst. Video Technol.</source> <volume>25</volume>, <fpage>367</fpage>&#x02013;<lpage>386</lpage>.<pub-id pub-id-type="doi">10.1109/TCSVT.2014.2358029</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>W.</given-names></name> <name><surname>Mahadevan</surname> <given-names>V.</given-names></name> <name><surname>Vasconcelos</surname> <given-names>N.</given-names></name></person-group> (<year>2014</year>). <article-title>Anomaly detection and localization in crowded scenes</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>36</volume>, <fpage>18</fpage>&#x02013;<lpage>32</lpage>.<pub-id pub-id-type="doi">10.1109/TPAMI.2013.111</pub-id><pub-id pub-id-type="pmid">24231863</pub-id></citation></ref>
<ref id="B47"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Lu</surname> <given-names>C.</given-names></name> <name><surname>Shi</surname> <given-names>J.</given-names></name> <name><surname>Jia</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). &#x0201C;<article-title>Abnormal event detection at 150 fps in Matlab</article-title>,&#x0201D; in <source>ICCV</source>, <publisher-loc>Sydney</publisher-loc>.</citation></ref>
<ref id="B48"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Mahadevan</surname> <given-names>V.</given-names></name> <name><surname>Li</surname> <given-names>W.</given-names></name> <name><surname>Bhalodia</surname> <given-names>V.</given-names></name> <name><surname>Vasconcelos</surname> <given-names>N.</given-names></name></person-group> (<year>2010</year>). &#x0201C;<article-title>Anomaly detection in crowded scenes</article-title>,&#x0201D; in <source>CVPR</source>, <publisher-loc>San Francisco</publisher-loc>.</citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mahadevan</surname> <given-names>V.</given-names></name> <name><surname>Vasconcelos</surname> <given-names>N.</given-names></name></person-group> (<year>2010</year>). <article-title>Spatiotemporal saliency in dynamic scenes</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>32</volume>, <fpage>171</fpage>&#x02013;<lpage>177</lpage>.<pub-id pub-id-type="doi">10.1109/TPAMI.2009.112</pub-id><pub-id pub-id-type="pmid">19926907</pub-id></citation></ref>
<ref id="B50"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Mehran</surname> <given-names>R.</given-names></name> <name><surname>Oyama</surname> <given-names>A.</given-names></name> <name><surname>Shah</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). &#x0201C;<article-title>Abnormal crowd behavior detection using social force model</article-title>,&#x0201D; in <source>CVPR</source>, <publisher-loc>Miami Beach</publisher-loc>.</citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mo</surname> <given-names>X.</given-names></name> <name><surname>Monga</surname> <given-names>V.</given-names></name> <name><surname>Bala</surname> <given-names>R.</given-names></name> <name><surname>Fan</surname> <given-names>Z.</given-names></name></person-group> (<year>2014</year>). <article-title>Adaptive sparse representations for video anomaly detection</article-title>. <source>IEEE Trans. Circuits Syst. Video Technol.</source> <volume>24</volume>, <fpage>631</fpage>&#x02013;<lpage>645</lpage>.<pub-id pub-id-type="doi">10.1109/TCSVT.2013.2280061</pub-id></citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Odobez</surname> <given-names>J.-M.</given-names></name> <name><surname>Bouthemy</surname> <given-names>P.</given-names></name></person-group> (<year>1995</year>). <article-title>Robust multiresolution estimation of parametric motion models</article-title>. <source>J. Vis. Commun. Image Represent.</source> <volume>6</volume>, <fpage>348</fpage>&#x02013;<lpage>365</lpage>.<pub-id pub-id-type="doi">10.1006/jvci.1995.1029</pub-id></citation></ref>
<ref id="B53"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Papanikolopoulos</surname> <given-names>N.</given-names></name></person-group> (<year>2005</year>). <source>Unusual Crowd Behaviour Dataset</source>. <publisher-name>University of Minnesota</publisher-name>. Available at: <uri xlink:href="http://mha.cs.umn.edu/Movies/Crowd-Activity-All.avi">http://mha.cs.umn.edu/Movies/Crowd-Activity-All.avi</uri></citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>P&#x000E9;cot</surname> <given-names>T.</given-names></name> <name><surname>Bouthemy</surname> <given-names>P.</given-names></name> <name><surname>Boulanger</surname> <given-names>J.</given-names></name> <name><surname>Chessel</surname> <given-names>A.</given-names></name> <name><surname>Bardin</surname> <given-names>S.</given-names></name> <name><surname>Salamero</surname> <given-names>J.</given-names></name> <etal/></person-group> (<year>2015</year>). <article-title>Background fluorescence estimation and vesicle segmentation in live cell imaging with conditional random fields</article-title>. <source>IEEE Trans. Image Process.</source> <volume>24</volume>, <fpage>667</fpage>&#x02013;<lpage>680</lpage>.<pub-id pub-id-type="doi">10.1109/TIP.2014.2380178</pub-id><pub-id pub-id-type="pmid">25531952</pub-id></citation></ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Piciarelli</surname> <given-names>C.</given-names></name> <name><surname>Micheloni</surname> <given-names>C.</given-names></name> <name><surname>Foresti</surname> <given-names>G. L.</given-names></name></person-group> (<year>2008</year>). <article-title>Trajectory-based anomalous event detection</article-title>. <source>IEEE Trans. Circuits Syst. Video Technol.</source> <volume>18</volume>, <fpage>1544</fpage>&#x02013;<lpage>1554</lpage>.<pub-id pub-id-type="doi">10.1109/TCSVT.2008.2005599</pub-id></citation></ref>
<ref id="B56"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Porikli</surname> <given-names>F.</given-names></name> <name><surname>Haga</surname> <given-names>T.</given-names></name></person-group> (<year>2004</year>). &#x0201C;<article-title>Event detection by eigenvector decomposition using object and frame features</article-title>,&#x0201D; in <source>CVPRW</source>, <publisher-loc>Washington</publisher-loc>.</citation></ref>
<ref id="B57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roshtkhari</surname> <given-names>M. J.</given-names></name> <name><surname>Levine</surname> <given-names>M. D.</given-names></name></person-group> (<year>2013</year>). <article-title>An on-line, real-time learning method for detecting anomalies in videos using spatio-temporal compositions</article-title>. <source>Comput. Vis. Image Underst.</source> <volume>117</volume>, <fpage>1436</fpage>&#x02013;<lpage>1452</lpage>.<pub-id pub-id-type="doi">10.1016/j.cviu.2013.06.007</pub-id></citation></ref>
<ref id="B58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Solmaz</surname> <given-names>B.</given-names></name> <name><surname>Moore</surname> <given-names>B. E.</given-names></name> <name><surname>Shah</surname> <given-names>M.</given-names></name></person-group> (<year>2012</year>). <article-title>Identifying behaviors in crowd scenes using stability analysis for dynamical systems</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>34</volume>, <fpage>2064</fpage>&#x02013;<lpage>2070</lpage>.<pub-id pub-id-type="doi">10.1109/TPAMI.2012.123</pub-id><pub-id pub-id-type="pmid">22641705</pub-id></citation></ref>
<ref id="B59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stauffer</surname> <given-names>C.</given-names></name> <name><surname>Grimson</surname> <given-names>W. E. L.</given-names></name></person-group> (<year>2000</year>). <article-title>Learning patterns of activity using real-time tracking</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>22</volume>, <fpage>747</fpage>&#x02013;<lpage>757</lpage>.<pub-id pub-id-type="doi">10.1109/34.868677</pub-id></citation></ref>
<ref id="B60"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Thida</surname> <given-names>M.</given-names></name> <name><surname>Yong</surname> <given-names>Y. L.</given-names></name> <name><surname>Climent-P&#x000E9;rez</surname> <given-names>P.</given-names></name> <name><surname>Eng</surname> <given-names>H.-L.</given-names></name> <name><surname>Remagnino</surname> <given-names>P.</given-names></name></person-group> (<year>2013</year>). &#x0201C;<article-title>A literature review on video analytics of crowded scenes</article-title>,&#x0201D; in <source>IMS</source> (<publisher-loc>Berlin Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>17</fpage>&#x02013;<lpage>36</lpage>.</citation></ref>
<ref id="B61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Varadarajan</surname> <given-names>J.</given-names></name> <name><surname>Emonet</surname> <given-names>R.</given-names></name> <name><surname>Odobez</surname> <given-names>J.-M.</given-names></name></person-group> (<year>2013</year>). <article-title>A sequential topic model for mining recurrent activities from long term video logs</article-title>. <source>Int. J. Comput. Vis.</source> <volume>103</volume>, <fpage>100</fpage>&#x02013;<lpage>126</lpage>.<pub-id pub-id-type="doi">10.1007/s11263-012-0596-6</pub-id></citation></ref>
<ref id="B62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Veit</surname> <given-names>T.</given-names></name> <name><surname>Cao</surname> <given-names>F.</given-names></name> <name><surname>Bouthemy</surname> <given-names>P.</given-names></name></person-group> (<year>2006</year>). <article-title>An a contrario decision framework for region-based motion detection</article-title>. <source>Int. J. Comput. Vis.</source> <volume>68</volume>, <fpage>163</fpage>&#x02013;<lpage>178</lpage>.<pub-id pub-id-type="doi">10.1007/s11263-006-6661-2</pub-id></citation></ref>
<ref id="B63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vishwakarma</surname> <given-names>S.</given-names></name> <name><surname>Agrawal</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>A survey on activity recognition and behavior understanding in video surveillance</article-title>. <source>Vis. Comput.</source> <volume>29</volume>, <fpage>983</fpage>&#x02013;<lpage>1009</lpage>.<pub-id pub-id-type="doi">10.1007/s00371-012-0752-6</pub-id></citation></ref>
<ref id="B64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>W.</given-names></name> <name><surname>Shen</surname> <given-names>J.</given-names></name> <name><surname>Shao</surname> <given-names>L.</given-names></name></person-group> (<year>2015</year>). <article-title>Consistent video saliency using local gradient flow optimization and global refinement</article-title>. <source>IEEE Trans. Image Process.</source> <volume>24</volume>, <fpage>4185</fpage>&#x02013;<lpage>4196</lpage>.<pub-id pub-id-type="doi">10.1109/TIP.2015.2460013</pub-id><pub-id pub-id-type="pmid">26208348</pub-id></citation></ref>
<ref id="B65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Ma</surname> <given-names>K. T.</given-names></name> <name><surname>Ng</surname> <given-names>G.-W.</given-names></name> <name><surname>Grimson</surname> <given-names>W. E. L.</given-names></name></person-group> (<year>2011</year>). <article-title>Trajectory analysis and semantic region modeling using nonparametric hierarchical Bayesian models</article-title>. <source>Int. J. Comput. Vis.</source> <volume>95</volume>, <fpage>287</fpage>&#x02013;<lpage>312</lpage>.<pub-id pub-id-type="doi">10.1007/s11263-011-0459-6</pub-id></citation></ref>
<ref id="B66"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Ma</surname> <given-names>X.</given-names></name> <name><surname>Grimson</surname> <given-names>W. E. L.</given-names></name></person-group> (<year>2009</year>). <article-title>Unsupervised activity perception in crowded and complicated scenes using hierarchical Bayesian models</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>31</volume>, <fpage>539</fpage>&#x02013;<lpage>555</lpage>.<pub-id pub-id-type="doi">10.1109/TPAMI.2008.87</pub-id><pub-id pub-id-type="pmid">19147880</pub-id></citation></ref>
<ref id="B67"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>S.</given-names></name> <name><surname>Moore</surname> <given-names>B. E.</given-names></name> <name><surname>Shah</surname> <given-names>M.</given-names></name></person-group> (<year>2010</year>). &#x0201C;<article-title>Chaotic invariants of Lagrangian particle trajectories for anomaly detection in crowded scenes</article-title>,&#x0201D; in <source>CVPR</source>, <publisher-loc>San Francisco</publisher-loc>.</citation></ref>
<ref id="B68"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>S.</given-names></name> <name><surname>Wong</surname> <given-names>H.-S.</given-names></name> <name><surname>Yu</surname> <given-names>Z.</given-names></name></person-group> (<year>2014</year>). <article-title>A Bayesian model for crowd escape behavior detection</article-title>. <source>IEEE Trans. Circuits Syst. Video Technol.</source> <volume>24</volume>, <fpage>85</fpage>&#x02013;<lpage>98</lpage>.<pub-id pub-id-type="doi">10.1109/TCSVT.2013.2276151</pub-id></citation></ref>
<ref id="B69"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Zen</surname> <given-names>G.</given-names></name> <name><surname>Ricci</surname> <given-names>E.</given-names></name> <name><surname>Sebe</surname> <given-names>N.</given-names></name></person-group> (<year>2012</year>). &#x0201C;<article-title>Exploiting sparse representations for robust analysis of noisy complex video scenes</article-title>,&#x0201D; in <source>ECCV</source>, <publisher-loc>Firenze</publisher-loc>.</citation></ref>
<ref id="B70"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhan</surname> <given-names>B.</given-names></name> <name><surname>Monekosso</surname> <given-names>D. N.</given-names></name> <name><surname>Remagnino</surname> <given-names>P.</given-names></name> <name><surname>Velastin</surname> <given-names>S. A.</given-names></name> <name><surname>Xu</surname> <given-names>L.-Q.</given-names></name></person-group> (<year>2008</year>). <article-title>Crowd analysis: a survey</article-title>. <source>Mach. Vis. Appl.</source> <volume>19</volume>, <fpage>345</fpage>&#x02013;<lpage>357</lpage>.<pub-id pub-id-type="doi">10.1007/s00138-008-0132-4</pub-id></citation></ref>
<ref id="B71"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>D.</given-names></name> <name><surname>Lu</surname> <given-names>G.</given-names></name></person-group> (<year>2001</year>). <article-title>Segmentation of moving objects in image sequence: a review</article-title>. <source>Circuits Syst. Signal Process.</source> <volume>20</volume>, <fpage>143</fpage>&#x02013;<lpage>183</lpage>.<pub-id pub-id-type="doi">10.1007/BF01201137</pub-id></citation></ref>
<ref id="B72"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Lu</surname> <given-names>H.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Ruan</surname> <given-names>X.</given-names></name></person-group> (<year>2016</year>). <article-title>Combining motion and appearance cues for anomaly detection</article-title>. <source>Pattern Recognit.</source> <volume>51</volume>, <fpage>443</fpage>&#x02013;<lpage>452</lpage>.<pub-id pub-id-type="doi">10.1016/j.patcog.2015.09.005</pub-id></citation></ref>
<ref id="B73"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Qin</surname> <given-names>L.</given-names></name> <name><surname>Ji</surname> <given-names>R.</given-names></name> <name><surname>Yao</surname> <given-names>H.</given-names></name> <name><surname>Huang</surname> <given-names>Q.</given-names></name></person-group> (<year>2015</year>). <article-title>Social attribute-aware force model: exploiting richness of interaction for abnormal crowd detection</article-title>. <source>IEEE Trans. Circuits Syst. Video Technol.</source> <volume>25</volume>, <fpage>1231</fpage>&#x02013;<lpage>1245</lpage>.<pub-id pub-id-type="doi">10.1109/TCSVT.2014.2355711</pub-id></citation></ref>
<ref id="B74"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>B.</given-names></name> <name><surname>Fei-Fei</surname> <given-names>L.</given-names></name> <name><surname>Xing</surname> <given-names>E. P.</given-names></name></person-group> (<year>2011</year>). &#x0201C;<article-title>Online detection of unusual events in videos via dynamic sparse coding</article-title>,&#x0201D; in <source>CVPR</source>, <publisher-loc>Colorado Springs</publisher-loc>.</citation></ref>
<ref id="B75"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhong</surname> <given-names>H.</given-names></name> <name><surname>Shi</surname> <given-names>J.</given-names></name> <name><surname>Visontai</surname> <given-names>M.</given-names></name></person-group> (<year>2004</year>). &#x0201C;<article-title>Detecting unusual activity in video</article-title>,&#x0201D; in <source>CVPR</source>, <publisher-loc>Washington</publisher-loc>.</citation></ref>
<ref id="B76"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>X.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Lu</surname> <given-names>H.</given-names></name></person-group> (<year>2014</year>). <article-title>Sparse representation for robust abnormality detection in crowded scenes</article-title>. <source>Pattern Recognit.</source> <volume>47</volume>, <fpage>1791</fpage>&#x02013;<lpage>1799</lpage>.<pub-id pub-id-type="doi">10.1016/j.patcog.2013.11.018</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="fn1"><p><sup>1</sup><uri xlink:href="http://www.irisa.fr/vista/Motion2D/">http://www.irisa.fr/vista/Motion2D/</uri>.</p></fn>
</fn-group>
</back>
</article>