<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">594302</article-id>
<article-id pub-id-type="doi">10.3389/fdata.2020.594302</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Detecting Group Anomalies in Tera-Scale Multi-Aspect Data via Dense-Subtensor Mining</article-title>
<alt-title alt-title-type="left-running-head">Shin et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">Detecting Group Anomalies in Tensors</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Shin</surname>
<given-names>Kijung</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1056700/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Hooi</surname>
<given-names>Bryan</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kim</surname>
<given-names>Jisu</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Faloutsos</surname>
<given-names>Christos</given-names>
</name>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>Graduate School of AI and School of Electrical Engineering, KAIST, <addr-line>Daejeon</addr-line>, <country>South Korea</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>School of Computing and Institute of Data Science, National University of Singapore, <addr-line>Singapore</addr-line>, <country>Singapore</country>
</aff>
<aff id="aff3">
<label>
<sup>3</sup>
</label>DataShape, Inria Saclay, <addr-line>Palaiseau</addr-line>, <country>France</country>
</aff>
<aff id="aff4">
<label>
<sup>4</sup>
</label>School of Computer Science, Carnegie Mellon University, <addr-line>Pittsburgh</addr-line>, <addr-line>PA</addr-line>, <country>United&#x20;States</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/892446/overview">Meng Jiang</ext-link>, University of Notre Dame, United&#x20;States</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/735551/overview">Kai Shu</ext-link>, Illinois Institute of Technology, United&#x20;States</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/903512/overview">Kun Kuang</ext-link>, Zhejiang University, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1072038/overview">Tong Zhao</ext-link>, University of Notre Dame, United&#x20;States</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Kijung Shin, <email>kijungs@kaist.ac.kr</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Big Data Networks, a section of the journal Frontiers in Big&#x20;Data</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>29</day>
<month>04</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2020</year>
</pub-date>
<volume>3</volume>
<elocation-id>594302</elocation-id>
<history>
<date date-type="received">
<day>13</day>
<month>08</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>17</day>
<month>12</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Shin, Hooi, Kim and Faloutsos.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Shin, Hooi, Kim and Faloutsos</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>How can we detect fraudulent lockstep behavior in large-scale multi-aspect data (i.e.,&#x20;tensors)? Can we detect it when the data are too large to fit in memory or even on a disk? Past studies have shown that dense subtensors in real-world tensors (e.g., social media, Wikipedia, and TCP dumps) signal anomalous or fraudulent behavior such as retweet boosting, bot activities, and network attacks. Thus, various approaches, including tensor decomposition and search, have been proposed for detecting dense subtensors rapidly and accurately. However, existing methods suffer from low accuracy, or they assume that tensors are small enough to fit in main memory, which is unrealistic in many real-world applications such as social media and the Web. To overcome these limitations, we propose <sc>D-Cube</sc>, a disk-based dense-subtensor detection method that can also run in a distributed manner across multiple machines. Compared to state-of-the-art methods, <sc>D-Cube</sc> is (1) Memory Efficient: it requires up to <italic>1,561&#xd7; less memory</italic> and handles <italic>1,000&#xd7; larger</italic> data (<italic>2.6TB</italic>), (2) Fast: it is up to <italic>7&#xd7; faster</italic> due to its near-linear scalability, (3) Provably Accurate: it gives a guarantee on the densities of the detected subtensors, and (4) Effective: it spotted network attacks from TCP dumps and synchronized behavior in rating data most accurately.</p>
</abstract>
<kwd-group>
<kwd>tensor</kwd>
<kwd>dense subtensor</kwd>
<kwd>anomaly detection</kwd>
<kwd>fraud detection</kwd>
<kwd>out-of-core algorithm</kwd>
<kwd>distributed algorithm</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Given a tensor that is too large to fit in memory, how can we detect dense subtensors? In particular, can we spot dense subtensors without sacrificing the speed and accuracy that in-memory algorithms provide?</p>
<p>A common application of this problem is review fraud detection, where we aim to spot suspicious lockstep behavior among groups of fraudulent user accounts who review suspiciously similar sets of products. Previous work (<xref ref-type="bibr" rid="B26">Maruhashi et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B18">Jiang et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B36">Shin et&#x20;al., 2018</xref>) has shown the benefit of incorporating extra information, such as timestamps, ratings, and review keywords, by modeling review data as a tensor. Tensors allow us to consider additional dimensions in order to identify suspicious behavior of interest more accurately and specifically. That is, extraordinarily dense subtensors indicate groups of users with lockstep behaviors both in the products they review and along the additional dimensions (e.g., multiple users reviewing the same products at the exact same time).</p>
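<p>To make the modeling concrete, the following toy sketch (illustrative values and names, not from the paper) builds a tiny (user, item, date) tensor from review tuples and shows that a lockstep fraud ring forms a block that is far denser, under a simple mass-over-average-side-length density, than the tensor as a whole.</p>

```python
# Toy review log as a sparse 3-way tensor: (user, item, date) -> count.
# Users u1 and u2 review the same items i1, i2 on the same day d1,
# forming a dense 2 x 2 x 1 block; u3 is an ordinary, sparse user.
reviews = {
    ("u1", "i1", "d1"): 5, ("u1", "i2", "d1"): 4,
    ("u2", "i1", "d1"): 6, ("u2", "i2", "d1"): 5,
    ("u3", "i3", "d2"): 1,
}

def block_density(entries, users, items, dates):
    """Block mass divided by the average side length of the block
    (one common density measure for subtensors)."""
    mass = sum(v for (u, i, d), v in entries.items()
               if u in users and i in items and d in dates)
    avg_side = (len(users) + len(items) + len(dates)) / 3
    return mass / avg_side

dense = block_density(reviews, {"u1", "u2"}, {"i1", "i2"}, {"d1"})   # 20 / (5/3)
whole = block_density(reviews, {"u1", "u2", "u3"},
                      {"i1", "i2", "i3"}, {"d1", "d2"})              # 21 / (8/3)
print(dense > whole)  # True: the fraud block stands out
```

<p>Real detectors use more refined density measures and search procedures, but the intuition is the same: lockstep behavior concentrates mass into a small block of the tensor.</p>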
<p>In addition to review-fraud detection, spotting dense subtensors has been found effective for many anomaly-detection tasks. Examples include network-intrusion detection in TCP dumps (<xref ref-type="bibr" rid="B26">Maruhashi et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B36">Shin et&#x20;al., 2018</xref>), retweet-boosting detection in online social networks (<xref ref-type="bibr" rid="B18">Jiang et&#x20;al., 2015</xref>), bot-activity detection in Wikipedia (<xref ref-type="bibr" rid="B36">Shin et&#x20;al., 2018</xref>), and genetics applications (<xref ref-type="bibr" rid="B33">Saha et&#x20;al., 2010</xref>; <xref ref-type="bibr" rid="B26">Maruhashi et&#x20;al., 2011</xref>).</p>
<p>Due to these wide applications, several methods have been proposed for rapid and accurate dense-subtensor detection, among which search-based methods have shown the best performance. Specifically, search-based methods (<xref ref-type="bibr" rid="B18">Jiang et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B36">Shin et&#x20;al., 2018</xref>) outperform methods based on tensor decomposition, such as CP Decomposition and HOSVD (<xref ref-type="bibr" rid="B26">Maruhashi et&#x20;al., 2011</xref>), in terms of accuracy and flexibility with regard to the choice of density metrics. Moreover, the latest search-based methods (<xref ref-type="bibr" rid="B36">Shin et&#x20;al., 2018</xref>) provide a guarantee on the densities of the subtensors they find, while methods based on tensor decomposition do&#x20;not.</p>
<p>However, existing search methods for dense-subtensor detection assume that input tensors are small enough to fit in memory. Moreover, they are not directly applicable to tensors stored on disk, since applying them to such tensors incurs too many disk I/Os due to their highly iterative nature. Yet real applications, such as social media and the Web, often involve disk-resident tensors of terabyte or even petabyte scale, which in-memory algorithms cannot handle. This leaves a growing gap that needs to be filled.</p>
<sec id="s1-1">
<title>1.1 Our Contributions</title>
<p>To overcome these limitations, we propose <sc>D-Cube</sc>, a dense-subtensor detection method for disk-resident tensors. <sc>D-Cube</sc> works under the W-Stream model (<xref ref-type="bibr" rid="B32">Ruhl, 2003</xref>), where data are only sequentially read and written during computation. As seen in <xref ref-type="table" rid="T1">Table&#x20;1</xref>, only <sc>D-Cube</sc> supports out-of-core computation, which allows it to process data too large to fit in main memory. <sc>D-Cube</sc> is optimized for this setting by carefully minimizing the amount of disk I/O and the number of steps requiring disk accesses, without losing the accuracy guarantees it provides. Moreover, we present a distributed version of <sc>D-Cube</sc> using the <sc>MapReduce</sc> framework (<xref ref-type="bibr" rid="B11">Dean and Ghemawat, 2008</xref>), specifically its open-source implementation <sc>Hadoop</sc>.</p>
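<p>As a toy illustration (hypothetical data and names, not the actual <sc>D-Cube</sc> code) of the W-Stream-style access pattern: tuples are consumed in a single sequential pass, never held in memory all at once or accessed randomly, so working memory grows with the number of distinct attribute values rather than with the number of tuples.</p>

```python
import csv
import io
from collections import defaultdict

# Stand-in for a disk-resident tuple file: one "user,item,measure" per line.
data = io.StringIO(
    "u1,i1,3\n"
    "u1,i2,1\n"
    "u2,i1,2\n"
)

total = 0.0                  # mass of the whole relation
masses = defaultdict(float)  # attribute-value mass per user

# One strictly sequential pass over the tuples (W-Stream-style read).
for user, item, x in csv.reader(data):
    total += float(x)
    masses[user] += float(x)

print(total, dict(masses))  # 6.0 {'u1': 4.0, 'u2': 2.0}
```

<p>The real algorithm streams over far larger files and also writes pruned tuples back to disk sequentially; the point here is only the access pattern.</p>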
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Comparison of <sc>D-Cube</sc> and state-of-the-art dense-subtensor detection methods. &#x2713; denotes &#x2018;supported&#x2019;.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">
<sc>M-Zoom</sc> and <sc>M-Biz</sc> (<xref ref-type="bibr" rid="B36">Shin et&#x20;al., 2018</xref>)</th>
<th align="center">
<sc>DenseStream</sc> and <sc>DenseAlert</sc> (<xref ref-type="bibr" rid="B38">Shin et&#x20;al., 2017a</xref>)</th>
<th align="center">
<sc>CrossSpot</sc> (<xref ref-type="bibr" rid="B18">Jiang et&#x20;al., 2015</xref>)</th>
<th align="center">MAF (<xref ref-type="bibr" rid="B26">Maruhashi et&#x20;al., 2011</xref>)</th>
<th align="center">
<sc>Fraudar</sc> (<xref ref-type="bibr" rid="B16">Hooi et&#x20;al., 2017</xref>)</th>
<th align="center">
<sc>D-Cube</sc> (proposed)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">High-order tensors</td>
<td align="char" char=".">&#x2713;</td>
<td align="char" char=".">&#x2713;</td>
<td align="char" char=".">&#x2713;</td>
<td align="char" char=".">&#x2713;</td>
<td align="left"/>
<td align="char" char=".">&#x2713;</td>
</tr>
<tr>
<td align="left">Flexibility in density measures</td>
<td align="char" char=".">&#x2713;</td>
<td align="left"/>
<td align="char" char=".">&#x2713;</td>
<td align="left"/>
<td align="char" char=".">&#x2713;</td>
<td align="char" char=".">&#x2713;</td>
</tr>
<tr>
<td align="left">Accuracy guarantees</td>
<td align="char" char=".">&#x2713;</td>
<td align="char" char=".">&#x2713;</td>
<td align="left"/>
<td align="left"/>
<td align="char" char=".">&#x2713;</td>
<td align="char" char=".">&#x2713;</td>
</tr>
<tr>
<td align="left">Out-of-core computation</td>
<td align="left"/>
<td align="left"/>
<td align="left"/>
<td align="left"/>
<td align="left"/>
<td align="char" char=".">&#x2713;</td>
</tr>
<tr>
<td align="left">Distributed computation</td>
<td align="left"/>
<td align="left"/>
<td align="left"/>
<td align="left"/>
<td align="left"/>
<td align="char" char=".">&#x2713;</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The main strengths of <sc>D-Cube</sc> are as follows:<list list-type="simple">
<list-item>
<p>Memory Efficient: <sc>D-Cube</sc> requires up to <italic>1,561&#xd7;</italic> less memory and successfully handles <italic>1,000&#xd7;</italic> larger data (<italic>2.6TB</italic>) than its best competitors (<xref ref-type="fig" rid="F1">Figures&#x20;1A</xref>,<xref ref-type="fig" rid="F1">B</xref>).</p>
</list-item>
<list-item>
<p>Fast: <sc>D-Cube</sc> detects dense subtensors up to <italic>7&#xd7;</italic> faster in real-world tensors and <italic>12&#xd7;</italic> faster in synthetic tensors than its best competitors due to its near-linear scalability with all aspects of tensors (<xref ref-type="fig" rid="F1">Figure&#x20;1A</xref>).</p>
</list-item>
<list-item>
<p>Provably Accurate: <sc>D-Cube</sc> provides a guarantee on the densities of the subtensors it finds (Theorem 3), and it shows similar or higher accuracy in dense-subtensor detection than its best competitors on real-world tensors (<xref ref-type="fig" rid="F1">Figure&#x20;1B</xref>).</p>
</list-item>
<list-item>
<p>Effective: <sc>D-Cube</sc> successfully spotted network attacks from TCP dumps, and lockstep behavior in rating data, with the highest accuracy (<xref ref-type="fig" rid="F1">Figure&#x20;1C</xref>).</p>
</list-item>
</list>
</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Strengths of <sc>D-Cube</sc>. &#x2018;O.O.M.&#x2019; stands for &#x2018;out of memory&#x2019;. <bold>(A)</bold> Fast and Scalable: <sc>D-Cube</sc> was 12&#xd7; faster and successfully handled 1,000&#xd7; larger data (2.6TB) than its best competitors. <bold>(B)</bold> Efficient and Accurate: <sc>D-Cube</sc> required 47&#xd7; less memory and found subtensors as dense as those found by its best competitors from English Wikipedia revision history. <bold>(C)</bold> Effective: <sc>D-Cube</sc> accurately spotted network attacks from TCP dumps. See <xref ref-type="sec" rid="s4">Section 4</xref> for the detailed experimental settings.</p>
</caption>
<graphic xlink:href="fdata-03-594302-g001.tif"/>
</fig>
<p>Reproducibility: The code and data used in the paper are available at <ext-link ext-link-type="uri" xlink:href="http://dmlab.kaist.ac.kr/dcube">http://dmlab.kaist.ac.kr/dcube</ext-link>.</p>
</sec>
<sec id="s1-2">
<title>1.2 Related Work</title>
<p>We discuss previous work on (a) dense-subgraph detection, (b) dense-subtensor detection, (c) large-scale tensor decomposition, and (d) other anomaly/fraud detection methods.</p>
<p>
<italic>Dense Subgraph Detection</italic>. Dense-subgraph detection in graphs has been extensively studied in theory; see <xref ref-type="bibr" rid="B24">Lee et&#x20;al. (2010)</xref> for a survey. Exact algorithms (<xref ref-type="bibr" rid="B15">Goldberg, 1984</xref>; <xref ref-type="bibr" rid="B22">Khuller and Saha, 2009</xref>) and approximate algorithms (<xref ref-type="bibr" rid="B10">Charikar, 2000</xref>; <xref ref-type="bibr" rid="B22">Khuller and Saha, 2009</xref>) have been proposed for finding subgraphs with maximum average degree. These have been extended for incorporating size restrictions (<xref ref-type="bibr" rid="B4">Andersen and Chellapilla, 2009</xref>), alternative metrics for denser subgraphs (<xref ref-type="bibr" rid="B40">Tsourakakis et&#x20;al., 2013</xref>), evolving graphs (<xref ref-type="bibr" rid="B13">Epasto et&#x20;al., 2015</xref>), subgraphs with limited overlap (<xref ref-type="bibr" rid="B7">Balalau et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B14">Galbrun et&#x20;al., 2016</xref>), and streaming or distributed settings (<xref ref-type="bibr" rid="B6">Bahmani et&#x20;al., 2012</xref>, <xref ref-type="bibr" rid="B5">2014</xref>). Dense subgraph detection has been applied to fraud detection in social or review networks (<xref ref-type="bibr" rid="B9">Beutel et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B19">Jiang et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B34">Shah et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B35">Shin et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B16">Hooi et&#x20;al., 2017</xref>).</p>
<p>
<italic>Dense Subtensor Detection</italic>. Extending dense-subgraph detection to tensors (<xref ref-type="bibr" rid="B18">Jiang et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B38">Shin et&#x20;al., 2017a</xref>, <xref ref-type="bibr" rid="B36">2018</xref>) incorporates additional dimensions, such as time, to identify dense regions of interest with greater accuracy and specificity. <xref ref-type="bibr" rid="B18">Jiang et&#x20;al. (2015)</xref> proposed <sc>CrossSpot</sc>, which starts from a seed subtensor and adjusts it greedily until it reaches a local optimum; it shows high accuracy in practice but provides no theoretical guarantees on its running time or accuracy. <xref ref-type="bibr" rid="B36">Shin et&#x20;al. (2018)</xref> proposed <sc>M-Zoom</sc>, which starts from the entire tensor and only shrinks it by greedily removing attribute values one by one; it improves on <sc>CrossSpot</sc> in terms of speed and approximation guarantees. <sc>M-Biz</sc>, also proposed in <xref ref-type="bibr" rid="B36">Shin et&#x20;al. (2018)</xref>, starts from the output of <sc>M-Zoom</sc> and repeatedly adds or removes an attribute value greedily until a local optimum is reached. Given a dynamic tensor, <sc>DenseStream</sc> and <sc>DenseAlert</sc>, which were proposed in <xref ref-type="bibr" rid="B38">Shin et&#x20;al. (2017a)</xref>, incrementally compute a single dense subtensor in it. <sc>CrossSpot</sc>, <sc>M-Zoom</sc>, <sc>M-Biz</sc>, and <sc>DenseStream</sc> require all tuples of the relation to be loaded into memory at once and to be accessed randomly, which limits their applicability to large-scale datasets. <sc>DenseAlert</sc> maintains only the tuples created within a time window, and thus it can find a dense subtensor only within that window. 
Dense-subtensor detection in tensors has been found useful for detecting retweet boosting (<xref ref-type="bibr" rid="B18">Jiang et&#x20;al., 2015</xref>), network attacks (<xref ref-type="bibr" rid="B26">Maruhashi et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B38">Shin et&#x20;al., 2017a</xref>, <xref ref-type="bibr" rid="B36">2018</xref>), bot activities (<xref ref-type="bibr" rid="B36">Shin et&#x20;al., 2018</xref>), and vandalism on Wikipedia (<xref ref-type="bibr" rid="B38">Shin et&#x20;al., 2017a</xref>), and also for genetics applications (<xref ref-type="bibr" rid="B33">Saha et&#x20;al., 2010</xref>; <xref ref-type="bibr" rid="B26">Maruhashi et&#x20;al., 2011</xref>).</p>
<p>
<italic>Large-Scale Tensor Decomposition</italic>. Tensor decomposition such as HOSVD and CP decomposition (<xref ref-type="bibr" rid="B23">Kolda and Bader, 2009</xref>) can be used to spot dense subtensors, as shown in <xref ref-type="bibr" rid="B26">Maruhashi et&#x20;al. (2011)</xref>. Scalable algorithms for tensor decomposition have been developed, including disk-based algorithms (<xref ref-type="bibr" rid="B39">Shin and Kang, 2014</xref>; <xref ref-type="bibr" rid="B29">Oh et&#x20;al., 2017</xref>), distributed algorithms (<xref ref-type="bibr" rid="B20">Kang et&#x20;al., 2012</xref>; <xref ref-type="bibr" rid="B39">Shin and Kang, 2014</xref>; <xref ref-type="bibr" rid="B17">Jeon et&#x20;al., 2015</xref>), and approximate algorithms based on sampling (<xref ref-type="bibr" rid="B30">Papalexakis et&#x20;al., 2012</xref>) and count-min sketch (<xref ref-type="bibr" rid="B41">Wang et&#x20;al., 2015</xref>). However, dense-subtensor detection based on tensor decomposition has serious limitations: it usually detects subtensors with significantly lower density (see <xref ref-type="sec" rid="s4-3">Section 4.3</xref>) than search-based methods, provides no flexibility with regard to the choice of density metric, and does not provide any approximation guarantee.</p>
<p>
<italic>Other Anomaly/Fraud Detection Methods</italic>. In addition to dense-subtensor detection, many approaches, including those based on egonet features (<xref ref-type="bibr" rid="B2">Akoglu et&#x20;al., 2010</xref>), coreness (<xref ref-type="bibr" rid="B35">Shin et&#x20;al., 2016</xref>), and behavior models (<xref ref-type="bibr" rid="B31">Rossi et&#x20;al., 2013</xref>), have been used for anomaly and fraud detection in graphs. See <xref ref-type="bibr" rid="B3">Akoglu et&#x20;al. (2015)</xref> for a survey.</p>
</sec>
<sec id="s1-3">
<title>1.3 Organization of the Paper</title>
<p>In <xref ref-type="sec" rid="s2">Section 2</xref>, we provide notations and a formal problem definition. In <xref ref-type="sec" rid="s3">Section 3</xref>, we propose <sc>D-Cube</sc>, a disk-based dense-subtensor detection method. In <xref ref-type="sec" rid="s4">Section 4</xref>, we present experimental results and discuss them. In <xref ref-type="sec" rid="s5">Section 5</xref>, we offer conclusions.</p>
</sec>
</sec>
<sec id="s2">
<title>2 Preliminaries and Problem Definition</title>
<p>In this section, we first introduce notations and concepts used in the paper. Then, we define density measures and the problem of top-<italic>k</italic> dense-subtensor detection.</p>
<sec id="s2-1">
<title>2.1 Notations and Concepts</title>
<p>
<xref ref-type="table" rid="T2">Table&#x20;2</xref> lists the symbols frequently used in the paper. We use <inline-formula id="inf1">
<mml:math id="minf1">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>1,2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> for brevity. Let <inline-formula id="inf2">
<mml:math id="minf2">
<mml:mrow>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> be a relation with <italic>N</italic> dimension attributes, denoted by <inline-formula id="inf3">
<mml:math id="minf3">
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and a nonnegative measure attribute, denoted by <italic>X</italic> (see Example 1 for a running example). For each tuple <inline-formula id="inf4">
<mml:math id="minf4">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and for each <inline-formula id="inf5">
<mml:math id="minf5">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf6">
<mml:math id="minf6">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <italic>t</italic>[<italic>X</italic>] indicate the values of <italic>A</italic>
<sub>
<italic>n</italic>
</sub> and <italic>X</italic>, resp., in <italic>t</italic>. For each <inline-formula id="inf7">
<mml:math id="minf7">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, we use <inline-formula id="inf8">
<mml:math id="minf8">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> to denote the set of distinct values of <italic>A</italic>
<sub>
<italic>n</italic>
</sub> in <inline-formula id="inf9">
<mml:math id="minf9">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula>. The relation <inline-formula id="inf10">
<mml:math id="minf10">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> is naturally represented as an <italic>N</italic>-way tensor of size <inline-formula id="inf11">
<mml:math id="minf11">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mo>&#x22ef;</mml:mo>
<mml:mo>&#xd7;</mml:mo>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. The value of each entry in the tensor is <italic>t</italic>[<italic>X</italic>], if the corresponding tuple <italic>t</italic> exists, and 0 otherwise. Let <inline-formula id="inf12">
<mml:math id="minf12">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> be a subset of <inline-formula id="inf13">
<mml:math id="minf13">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. Then, a <italic>subtensor</italic> <inline-formula id="inf14">
<mml:math id="minf14">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> in <inline-formula id="inf15">
<mml:math id="minf15">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> is defined as <inline-formula id="inf16">
<mml:math id="minf16">
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mo>:</mml:mo>
<mml:mo>&#x2200;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, the set of tuples where each attribute <italic>A</italic>
<sub>
<italic>n</italic>
</sub> has a value in <inline-formula id="inf17">
<mml:math id="minf17">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. The relation <inline-formula id="inf18">
<mml:math id="minf18">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> is a &#x2018;subtensor&#x2019; because it forms a subtensor of size <inline-formula id="inf19">
<mml:math id="minf19">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mo>&#x22ef;</mml:mo>
<mml:mo>&#xd7;</mml:mo>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> in the tensor representation of <inline-formula id="inf20">
<mml:math id="minf20">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula>, as in <xref ref-type="fig" rid="F2">Figure&#x20;2B</xref>. We define the mass of <inline-formula id="inf21">
<mml:math id="minf21">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> as <inline-formula id="inf22">
<mml:math id="minf22">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:msub>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>, the sum of attribute <italic>X</italic> in the tuples of <inline-formula id="inf23">
<mml:math id="minf23">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula>. We denote the set of tuples of <inline-formula id="inf24">
<mml:math id="minf24">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> whose attribute <italic>A</italic>
<sub>
<italic>n</italic>
</sub> &#x3d; <italic>a</italic> by <inline-formula id="inf25">
<mml:math id="minf25">
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and its mass, called the <italic>attribute-value mass of a in A</italic>
<sub>
<italic>n</italic>
</sub>, by <inline-formula id="inf26">
<mml:math id="minf26">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:msub>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
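To make the mass definitions above concrete, the following minimal Python sketch (hypothetical helper code of ours, not part of the paper) computes the mass and an attribute-value mass of a relation stored as a list of tuples whose last entry is the measure attribute:

```python
# Hypothetical helper functions (ours, not the paper's): a relation is a list
# of tuples (a_1, ..., a_N, x) whose last entry is the measure attribute X.

def mass(tuples):
    """Mass: the sum of the measure attribute X over all tuples."""
    return sum(t[-1] for t in tuples)

def attribute_value_mass(tuples, a, n):
    """Mass of the tuples whose n-th dimension attribute (0-indexed) equals a."""
    return sum(t[-1] for t in tuples if t[n] == a)

# Toy 3-way relation R(user, page, date, count):
R = [("Alice", "A", "May-29", 4),
     ("Alice", "B", "May-29", 5),
     ("Bob",   "A", "May-29", 10),
     ("Carol", "C", "May-30", 1)]

print(mass(R))                              # M_R = 4 + 5 + 10 + 1 = 20
print(attribute_value_mass(R, "Alice", 0))  # mass of Alice in the user mode: 9
```

The same two functions suffice for both masses because an attribute-value mass is simply the mass of a restriction of the tuple set.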
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Table of symbols.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Symbol</th>
<th align="left">Definition</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<inline-formula id="inf27">
<mml:math id="minf27">
<mml:mrow>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="left">Relation representing an <italic>N</italic>-way tensor</td>
</tr>
<tr>
<td align="left">
<italic>N</italic>
</td>
<td align="left">Number of dimension attributes in <inline-formula id="inf28">
<mml:math id="minf28">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">
<italic>A</italic>
<sub>
<italic>n</italic>
</sub>
</td>
<td align="left">
<italic>n</italic>th dimension attribute in <inline-formula id="inf29">
<mml:math id="minf29">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">
<italic>X</italic>
</td>
<td align="left">Measure attribute in <inline-formula id="inf30">
<mml:math id="minf30">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">
<italic>t</italic>[<italic>A</italic>
<sub>
<italic>n</italic>
</sub>] (or <italic>t</italic>[<italic>X</italic>])</td>
<td align="left">Value of attribute <italic>A</italic>
<sub>
<italic>n</italic>
</sub> (or <italic>X</italic>) in tuple <italic>t</italic> in <inline-formula id="inf31">
<mml:math id="minf31">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf32">
<mml:math id="minf32">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula>
</td>
<td align="left">A subtensor in <inline-formula id="inf33">
<mml:math id="minf33">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf34">
<mml:math id="minf34">
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="left">Density of subtensor <inline-formula id="inf35">
<mml:math id="minf35">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> in <inline-formula id="inf36">
<mml:math id="minf36">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf37">
<mml:math id="minf37">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> (or <inline-formula id="inf38">
<mml:math id="minf38">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>)</td>
<td align="left">Set of distinct values of <italic>A</italic>
<sub>
<italic>n</italic>
</sub> in <inline-formula id="inf39">
<mml:math id="minf39">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> (or <inline-formula id="inf40">
<mml:math id="minf40">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula>)</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf41">
<mml:math id="minf41">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> (or <inline-formula id="inf42">
<mml:math id="minf42">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>)</td>
<td align="left">Mass of <inline-formula id="inf43">
<mml:math id="minf43">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> (or <inline-formula id="inf44">
<mml:math id="minf44">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula>)</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf45">
<mml:math id="minf45">
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="left">Set of tuples with attribute <italic>A</italic>
<sub>
<italic>n</italic>
</sub> &#x3d; <italic>a</italic> in <inline-formula id="inf46">
<mml:math id="minf46">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf47">
<mml:math id="minf47">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="left">Attribute-value mass of <italic>a</italic> in <italic>A</italic>
<sub>
<italic>n</italic>
</sub>
</td>
</tr>
<tr>
<td align="left">
<italic>k</italic>
</td>
<td align="left">Number of subtensors we aim to find</td>
</tr>
<tr>
<td align="left">&#x3b8;</td>
<td align="left">Mass-threshold parameter in <sc>D-Cube</sc>
</td>
</tr>
<tr>
<td align="left">[<italic>x</italic>]</td>
<td align="left">
<inline-formula id="inf48">
<mml:math id="minf48">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>1,2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Pictorial description of Example 1. <bold>(A)</bold> Relation <inline-formula id="inf49">
<mml:math id="minf49">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> where the colored tuples compose relation <inline-formula id="inf50">
<mml:math id="minf50">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula>. <bold>(B)</bold> Tensor representation of <inline-formula id="inf51">
<mml:math id="minf51">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> where the relation <inline-formula id="inf52">
<mml:math id="minf52">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> forms a subtensor.</p>
</caption>
<graphic xlink:href="fdata-03-594302-g002.tif"/>
</fig>
<p>
<sc>Example</sc> <bold>1</bold>. (Wikipedia Revision History). <italic>As in</italic> <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>
<italic>, assume a relation</italic> <inline-formula id="inf53">
<mml:math id="minf53">
<mml:mrow>
<mml:mtext>R</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:munder accentunder="true">
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
<mml:mo>,</mml:mo>
<mml:munder accentunder="true">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
<mml:mo>,</mml:mo>
<mml:munder accentunder="true">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mo stretchy="true">&#xaf;</mml:mo>
</mml:munder>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
<italic>, where each tuple</italic> (<italic>u</italic>, <italic>p</italic>, <italic>d</italic>, <italic>c</italic>) <italic>in</italic> <inline-formula id="inf54">
<mml:math id="minf54">
<mml:mtext>R</mml:mtext>
</mml:math>
</inline-formula> <italic>indicates that user u revised page p, c times, on date d. The first three attributes</italic>, <italic>A</italic>
<sub>1</sub> <italic>&#x3d; user</italic>, <italic>A</italic>
<sub>2</sub> <italic>&#x3d; page, and A</italic>
<sub>3</sub> <italic>&#x3d; date, are dimension attributes, and the other one, X&#x3d;count, is the measure attribute. Let</italic> <inline-formula id="inf55">
<mml:math id="minf55">
<mml:mrow>
<mml:msub>
<mml:mtext>B</mml:mtext>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>e</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>B</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
<italic>,</italic> <inline-formula id="inf56">
<mml:math id="minf56">
<mml:mrow>
<mml:msub>
<mml:mtext>B</mml:mtext>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>B</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
<italic>, and</italic> <inline-formula id="inf57">
<mml:math id="minf57">
<mml:mrow>
<mml:msub>
<mml:mtext>B</mml:mtext>
<mml:mn>3</mml:mn>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>y</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>29</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
<italic>. Then,</italic> <bold>B</bold> <italic>is the set of tuples regarding the revision of page A or B by Alice or Bob on May-29, and its mass M</italic>
<sub>
<bold>B</bold>
</sub> <italic>is</italic> 19<italic>, the total number of such revisions. The attribute-value mass of Alice (i.e.,</italic> <inline-formula id="inf58">
<mml:math id="minf58">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mtext>B</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>e</mml:mi>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
<italic>) is</italic> 9<italic>, the number of revisions of page A or B made by Alice on May-29. In the tensor representation,</italic> <bold>B</bold> <italic>forms a subtensor in</italic> <bold>R</bold>
<italic>, as depicted in</italic> <xref ref-type="fig" rid="F2">Figure&#x20;2B</xref>
<italic>.</italic>
</p>
</sec>
<sec id="s2-2">
<title>2.2 Density Measures</title>
<p>We present density measures that past studies have proven useful for anomaly detection. We use them throughout the paper, although our dense-subtensor detection method, explained in <xref ref-type="sec" rid="s3">Section 3</xref>, is flexible and not restricted to specific measures. Below, we slightly abuse notation to emphasize that the density measures are functions of <inline-formula id="inf59">
<mml:math id="minf59">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf60">
<mml:math id="minf60">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf61">
<mml:math id="minf61">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf62">
<mml:math id="minf62">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula id="inf63">
<mml:math id="minf63">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> is a subtensor of a relation <inline-formula id="inf64">
<mml:math id="minf64">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula>.</p>
<p>Arithmetic Average Mass (Definition 1) and Geometric Average Mass (Definition 2), which were used for detecting network intrusions and bot activities in <xref ref-type="bibr" rid="B36">Shin et&#x20;al. (2018)</xref>, are extensions of density measures widely used for graphs (<xref ref-type="bibr" rid="B21">Kannan and Vinay, 1999</xref>; <xref ref-type="bibr" rid="B10">Charikar, 2000</xref>).</p>
<p>
<sc>Definition</sc> 1 (Arithmetic Average Mass <inline-formula id="inf65">
<mml:math id="minf65">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>). <italic>The arithmetic average mass of a subtensor</italic> <bold>B</bold> <italic>of a relation</italic> <bold>R</bold> <italic>is defined as</italic>
<disp-formula id="equ1">
<mml:math id="mequ1">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>N</mml:mi>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mfrac>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<sc>Definition</sc> 2 (Geometric Average Mass <inline-formula id="inf66">
<mml:math id="minf66">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>o</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>). <italic>The geometric average mass of a subtensor</italic> <bold>B</bold> <italic>of a relation</italic> <bold>R</bold> <italic>is defined as</italic>
<disp-formula id="equ2">
<mml:math id="mequ2">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>o</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>o</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mstyle displaystyle="true">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x220f;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mstyle>
<mml:mtext>&#x200b;</mml:mtext>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>N</mml:mi>
</mml:mfrac>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>Suspiciousness (Definition 3), which was used for detecting &#x2018;retweet-boosting&#x2019; activities in <xref ref-type="bibr" rid="B19">Jiang et&#x20;al. (2014)</xref>, is the negative log-likelihood that <inline-formula id="inf67">
<mml:math id="minf67">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> has mass <inline-formula id="inf68">
<mml:math id="minf68">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> under the assumption that each entry of <inline-formula id="inf69">
<mml:math id="minf69">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> is drawn i.i.d. from a Poisson distribution.</p>
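Written out in code, Definitions 1 and 2 reduce to simple arithmetic on the mass and the per-mode sizes; the sketch below is illustrative only (the function and variable names are ours, not the paper's):

```python
# Illustrative sketch of Definitions 1 and 2 (names are ours): both measures
# depend only on the mass M_B and the per-mode sizes |B_1|, ..., |B_N|.

def rho_ari(mass_B, sizes_B):
    """Arithmetic average mass: M_B over the arithmetic mean of |B_1|, ..., |B_N|."""
    N = len(sizes_B)
    return mass_B / (sum(sizes_B) / N)

def rho_geo(mass_B, sizes_B):
    """Geometric average mass: M_B over the geometric mean of |B_1|, ..., |B_N|."""
    N = len(sizes_B)
    prod = 1
    for s in sizes_B:
        prod *= s
    return mass_B / (prod ** (1.0 / N))

# A 2 x 2 x 1 block of mass 19, as in Example 1:
print(rho_ari(19, [2, 2, 1]))  # 19 / ((2 + 2 + 1) / 3) = 11.4
print(rho_geo(19, [2, 2, 1]))  # 19 / (2 * 2 * 1) ** (1/3), roughly 11.97
```

The geometric variant penalizes blocks that are elongated in one mode more strongly than the arithmetic one does.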
<p>
<sc>Definition</sc> 3 (Suspiciousness <inline-formula id="inf70">
<mml:math id="minf70">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>). <italic>The suspiciousness of a subtensor</italic> <bold>B</bold> <italic>of a relation</italic> <bold>R</bold> <italic>is defined as</italic>
<disp-formula id="equ3">
<mml:math id="mequ3">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>log</mml:mi>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:msub>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x220f;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mstyle>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>log</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x220f;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>Entry Surplus (Definition 4) is the observed mass of <inline-formula id="inf71">
<mml:math id="minf71">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> minus &#x3b1; times the expected mass, under the assumption that the value of each entry (in the tensor representation) of <inline-formula id="inf72">
<mml:math id="minf72">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> is i.i.d. It is a multi-dimensional extension of edge surplus, which was proposed in <xref ref-type="bibr" rid="B40">Tsourakakis et&#x20;al. (2013)</xref> as a density metric for graphs.</p>
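Definition 3 also has a direct translation into code. The sketch below (our own illustrative naming) makes a useful sanity check visible: the suspiciousness is zero when the block's mass equals its expected mass, and positive when mass is concentrated in the block.

```python
import math

def rho_susp(mass_B, sizes_B, mass_R, sizes_R):
    """Suspiciousness (Definition 3) of a block with mass M_B and mode sizes
    |B_n| inside a relation with mass M_R and mode sizes |R_n|."""
    ratio = 1.0
    for b, r in zip(sizes_B, sizes_R):
        ratio *= b / r  # product of |B_n| / |R_n| over the N modes
    return (mass_B * (math.log(mass_B / mass_R) - 1)
            + mass_R * ratio
            - mass_B * math.log(ratio))

# A 2 x 2 block holding half the mass of a 10 x 10 relation is suspicious:
print(rho_susp(50.0, [2, 2], 100.0, [10, 10]))
# A block holding exactly its expected mass (100 * 0.04 = 4) scores zero
# up to floating-point error:
print(rho_susp(4.0, [2, 2], 100.0, [10, 10]))
```

Algebraically, substituting the expected mass M_R·∏(|B_n|/|R_n|) for M_B makes the three terms cancel, which is why the second call returns (numerically) zero.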
<p>
<sc>Definition</sc> 4 (Entry Surplus). The entry surplus of a subtensor <bold>B</bold> of a relation <bold>R</bold> is defined as<disp-formula id="equ4">
<mml:math id="mequ4">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:msub>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x220f;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mstyle>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>Which subtensors attain high entry surplus can be configured by adjusting <italic>&#x3b1;</italic>. With large <italic>&#x3b1;</italic> values, relatively small compact subtensors have higher entry surplus than large sparse subtensors, while the opposite holds with small <italic>&#x3b1;</italic> values. We show this tendency experimentally in <xref ref-type="sec" rid="s4-7">Section&#x20;4.7</xref>.</p>
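The role of &#x3b1; can be seen numerically. In the hypothetical sketch below (toy numbers of ours, not the experiments of Section 4.7), a small compact block loses to a larger sparser block at &#x3b1; = 1 but wins at &#x3b1; = 30:

```python
def rho_es(alpha, mass_B, sizes_B, mass_R, sizes_R):
    """Entry surplus (Definition 4): observed mass minus alpha times expected mass."""
    ratio = 1.0
    for b, r in zip(sizes_B, sizes_R):
        ratio *= b / r  # fraction of the relation's cells covered by the block
    return mass_B - alpha * mass_R * ratio

small_dense = (9, [1, 2, 1])     # mass 9 in a 1 x 2 x 1 block
large_sparse = (19, [2, 2, 1])   # mass 19 in a 2 x 2 x 1 block
R_stats = (100, [10, 10, 5])     # the whole relation: mass 100, shape 10 x 10 x 5

for alpha in (1, 30):
    print(alpha,
          rho_es(alpha, *small_dense, *R_stats),
          rho_es(alpha, *large_sparse, *R_stats))
```

At &#x3b1; = 1 the surpluses are about 8.6 vs. 18.2 (the larger block wins); at &#x3b1; = 30 they are about -3 vs. -5 (the compact block wins), matching the tendency described above.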
</sec>
<sec id="s2-3">
<title>2.3 Problem Definition</title>
<p>Based on the concepts and density measures in the previous sections, we define the problem of top-<italic>k</italic> dense-subtensor detection in a large-scale tensor in Problem&#x20;1.</p>
<p>
<bold>Problem 1</bold> (Large-scale Top-k Densest Subtensor Detection). <bold>(1) Given:</bold> a large-scale relation <bold>R</bold> not fitting in memory, the number of subtensors k, and a density measure &#x3c1;, <bold>(2) Find:</bold> the top-k subtensors of <bold>R</bold> with the highest density in terms of &#x3c1;.</p>
<p>Even when we restrict our attention to finding one subtensor in a matrix fitting in memory (i.e.,&#x20;<italic>k</italic>&#x20;&#x3d; 1 and <italic>N</italic>&#x20;&#x3d; 2), obtaining an exact solution takes <inline-formula id="inf73">
<mml:math id="minf73">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mstyle displaystyle="true">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mstyle>
<mml:mtext>&#x200b;</mml:mtext>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>6</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> time (<xref ref-type="bibr" rid="B15">Goldberg, 1984</xref>; <xref ref-type="bibr" rid="B22">Khuller and Saha, 2009</xref>), which is infeasible for large-scale tensors. Thus, our focus in this work is to design an approximate algorithm with (1) near-linear scalability in all aspects of <inline-formula id="inf74">
<mml:math id="minf74">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula>, which does not fit in memory, (2) an approximation guarantee at least for some density measures, and (3) meaningful results on real-world&#x20;data.</p>
</sec>
</sec>
<sec id="s3">
<title>3 Proposed Method</title>
<p>In this section, we propose <sc>D-Cube</sc>, a disk-based dense-subtensor detection method. We first describe <sc>D-Cube</sc> in <xref ref-type="sec" rid="s3-1">Section 3.1</xref>. Then, we prove its theoretical properties in <xref ref-type="sec" rid="s3-2">Section 3.2</xref>. Lastly, we present our <sc>MapReduce</sc> implementation of <sc>D-Cube</sc> in <xref ref-type="sec" rid="s3-3">Section 3.3</xref>. Throughout these subsections, we assume that the entries of tensors (i.e.,&#x20;the tuples of relations) are stored on disk and read and written only sequentially, while all other data (e.g., the distinct attribute-value sets and the mass of each attribute value) are assumed to fit in memory.</p>
<p>
<statement content-type="algorithm" id="Algorithm_1">
<p>Algorithm_1</p>
</statement>
<statement content-type="algorithm" id="Algorithm_2">
<p>Algorithm_2</p>
</statement>
</p>
<sec id="s3-1">
<title>3.1 Algorithm</title>
<p>
<sc>D-Cube</sc> is a search method that starts with the given relation and sequentially removes attribute values (together with the tuples containing them) until a dense subtensor is left. Unlike previous approaches, <sc>D-Cube</sc> removes multiple attribute values (and the tuples containing them) at a time, which reduces both the number of iterations and the amount of disk I/O. Moreover, <sc>D-Cube</sc> carefully chooses which attribute values to remove, so it gives the same accuracy guarantee as if attribute values were removed one by one, and it empirically achieves similar or even higher accuracy.</p>
<sec id="s3-1-1">
<title>3.1.1 Overall Structure of D-Cube (<xref ref-type="statement" rid="Algorithm_1">Algorithm 1</xref>)</title>
<p>
<xref ref-type="statement" rid="Algorithm_1">Algorithm 1</xref> describes the overall structure of <sc>D-Cube</sc>. It first copies and assigns the given relation <inline-formula id="inf75">
<mml:math id="minf75">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> to <inline-formula id="inf76">
<mml:math id="minf76">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> (line 1); and computes the sets of distinct attribute values composing <inline-formula id="inf77">
<mml:math id="minf77">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> (line 2). Then, it finds <italic>k</italic> dense subtensors one by one from <inline-formula id="inf78">
<mml:math id="minf78">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> (line 6) using its mass as a parameter (line 5). The detailed procedure for detecting a single dense subtensor from <inline-formula id="inf79">
<mml:math id="minf79">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> is explained in <xref ref-type="sec" rid="s3-1-2">Section 3.1.2</xref>. After each subtensor <inline-formula id="inf80">
<mml:math id="minf80">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> is found, the tuples included in <inline-formula id="inf81">
<mml:math id="minf81">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> are removed from <inline-formula id="inf82">
<mml:math id="minf82">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> (line 7) to prevent the same subtensor from being found again. Due to this change in <inline-formula id="inf83">
<mml:math id="minf83">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula>, subtensors found from <inline-formula id="inf84">
<mml:math id="minf84">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> are not necessarily the subtensors of the original relation <inline-formula id="inf85">
<mml:math id="minf85">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>. Thus, instead of <inline-formula id="inf86">
<mml:math id="minf86">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula>, the subtensor in <inline-formula id="inf87">
<mml:math id="minf87">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> formed by the same attribute values forming <inline-formula id="inf88">
<mml:math id="minf88">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> is added to the list of <italic>k</italic> dense subtensors (lines 8&#x2013;9). Notice that, due to this step, <sc>D-Cube</sc> can detect overlapping dense subtensors. That is, a tuple can be included in multiple dense subtensors.</p>
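As an illustration, the overall loop just described can be sketched as follows. This is a minimal in-memory sketch with assumed names, not the actual disk-based implementation; here <monospace>find_one</monospace> stands in for Algorithm 2 and is assumed to return the attribute-value sets of the detected subtensor.

```python
# Minimal in-memory sketch of Algorithm 1 (illustrative only; the real
# D-Cube streams tuples from disk). Each tuple is (a_1, ..., a_N, measure).

def d_cube_skeleton(relation, n_dims, k, find_one):
    R = list(relation)                  # working copy of R^ori
    results = []
    for _ in range(k):
        # find_one stands in for Algorithm 2: it returns the N sets of
        # attribute values forming the detected dense subtensor B.
        block_values = find_one(R, n_dims)
        # Remove B's tuples from R so the same subtensor is not found again.
        R = [t for t in R
             if not all(t[n] in block_values[n] for n in range(n_dims))]
        # Report the subtensor of the ORIGINAL relation formed by the same
        # attribute values (lines 8-9), so reported subtensors may overlap.
        results.append([t for t in relation
                        if all(t[n] in block_values[n] for n in range(n_dims))])
    return results
```

Because each reported subtensor is re-formed from the original relation, a tuple removed from the working copy can still appear in later results, which is how overlaps arise.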
<p>Based on our assumption that the sets of distinct attribute values (i.e.,&#x20;<inline-formula id="inf89">
<mml:math id="minf89">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf90">
<mml:math id="minf90">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>) are stored in memory and can be randomly accessed, all the steps in <xref ref-type="statement" rid="Algorithm_1">Algorithm 1</xref> can be performed by sequentially reading and writing tuples in relations (i.e.,&#x20;tensor entries) on disk without loading all the tuples in memory at once. For example, the filtering steps in lines 7&#x2013;8 can be performed by sequentially reading each tuple from disk and writing the tuple to disk only if it satisfies the given condition.</p>
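The streaming filter just described can be sketched as follows (assumed names, not the authors' code); an iterable stands in for the sequential disk read, and yielding stands in for the sequential disk write.

```python
# Sketch of the sequential filtering in lines 7-8 of Algorithm 1: tuples
# are read one at a time, and a tuple survives only if every attribute
# value it contains is still in the in-memory attribute-value sets.

def filter_tuples(tuples, kept_values):
    for t in tuples:                    # sequential read from "disk"
        if all(t[n] in kept_values[n] for n in range(len(kept_values))):
            yield t                     # stands in for a sequential write
```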
<table-wrap id="T8" position="float">
<label>Algorithm 1</label>
<caption>
<p>D&#x2010;CUBE</p>
</caption>
<table>
<tbody>
<tr>
<td>
<inline-graphic xlink:href="fdata-03-594302-fx1.tif"/>
</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Note that this overall structure of <sc>D-Cube</sc> is similar to that of <sc>M-Zoom</sc> (<xref ref-type="bibr" rid="B36">Shin et&#x20;al., 2018</xref>) except that tuples are stored on disk. However, the methods differ significantly in the way each dense subtensor is found from <inline-formula id="inf91">
<mml:math id="minf91">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula>, which is explained in the following section.</p>
</sec>
<sec id="s3-1-2">
<title>3.1.2 Single Subtensor Detection (<xref ref-type="statement" rid="Algorithm_2">Algorithm 2</xref>)</title>
<p>
<xref ref-type="statement" rid="Algorithm_2">Algorithm 2</xref> describes how <sc>D-Cube</sc> detects each dense subtensor from the given relation <inline-formula id="inf92">
<mml:math id="minf92">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula>. It first initializes a subtensor <inline-formula id="inf93">
<mml:math id="minf93">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> to <inline-formula id="inf94">
<mml:math id="minf94">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> (lines 1&#x2013;2) then repeatedly removes attribute values and the tuples of <inline-formula id="inf95">
<mml:math id="minf95">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> with those attribute values until all values are removed (line&#x20;5).</p>
<p>Specifically, in each iteration, <sc>D-Cube</sc> first chooses a dimension attribute <italic>A</italic>
<sub>
<italic>i</italic>
</sub> from which attribute values are removed (line 7). Then, it computes <italic>D</italic>
<sub>
<italic>i</italic>
</sub>, the set of attribute values whose masses are at most <inline-formula id="inf96">
<mml:math id="minf96">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> times the average (line 8). We explain how the dimension attribute is chosen in <xref ref-type="sec" rid="s3-1-3">Section 3.1.3</xref> and analyze the effects of &#x3b8; on the accuracy and the time complexity in <xref ref-type="sec" rid="s3-2">Section 3.2</xref>. The tuples whose attribute values of <italic>A</italic>
<sub>
<italic>i</italic>
</sub> are in <italic>D</italic>
<sub>
<italic>i</italic>
</sub> are removed from <inline-formula id="inf97">
<mml:math id="minf97">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> at once within a single scan of <inline-formula id="inf98">
<mml:math id="minf98">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> (line 16). However, deleting a subset of <italic>D</italic>
<sub>
<italic>i</italic>
</sub> may achieve a higher value of the density metric &#x3c1;. Hence, <sc>D-Cube</sc> computes the changes in the density of <inline-formula id="inf99">
<mml:math id="minf99">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> (line 11) as if the attribute values in <italic>D</italic>
<sub>
<italic>i</italic>
</sub> were removed one by one, in increasing order of their masses. This allows <sc>D-Cube</sc> to optimize &#x3c1; as if attribute values were removed one by one, while still benefiting from the computational speedup of removing multiple attribute values in each scan. Note that these changes in &#x3c1; can be computed exactly without actually removing the tuples from <inline-formula id="inf100">
<mml:math id="minf100">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> or even accessing the tuples in <inline-formula id="inf101">
<mml:math id="minf101">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> since its mass (i.e.,&#x20;<inline-formula id="inf102">
<mml:math id="minf102">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>) and the number of distinct attribute values (i.e.,&#x20;<inline-formula id="inf103">
<mml:math id="minf103">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>) are maintained up-to-date (lines 11&#x2013;12). This is because removing an attribute value from a dimension attribute does not affect the masses of the other values of the same attribute. The order in which attribute values are removed, and the point at which the density of <inline-formula id="inf104">
<mml:math id="minf104">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> is maximized are maintained (lines 13&#x2013;15) so that the subtensor <inline-formula id="inf105">
<mml:math id="minf105">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> maximizing the density can be restored and returned (lines 17&#x2013;18), as the result of <xref ref-type="statement" rid="Algorithm_2">Algorithm&#x20;2</xref>.</p>
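The single-subtensor procedure can be sketched compactly as follows. This is an in-memory illustration under assumptions of ours: the arithmetic-average-cardinality density (mass divided by the average dimension cardinality), the maximum-cardinality policy of Algorithm 3, and nonnegative measure values; the real implementation streams the tuples of <bold>&#x212c;</bold> from disk in two scans per iteration.

```python
# In-memory sketch of Algorithm 2 (find_one). Each tuple in `relation`
# is (a_1, ..., a_N, measure). Assumed density: mass / avg cardinality.

def density(mass, sizes):
    return mass / (sum(sizes) / len(sizes)) if sum(sizes) else -1.0

def find_one(relation, n_dims, theta=1.0):
    B = list(relation)
    values = [set(t[n] for t in B) for n in range(n_dims)]
    order = []                                   # removal order of (dim, value)
    best_len = 0
    best_rho = density(sum(t[-1] for t in B), [len(s) for s in values])
    while any(values):
        # First scan of B: attribute-value masses (line 6).
        masses = [{} for _ in range(n_dims)]
        for t in B:
            for n in range(n_dims):
                masses[n][t[n]] = masses[n].get(t[n], 0) + t[-1]
        mass_B = sum(t[-1] for t in B)
        i = max(range(n_dims), key=lambda n: len(values[n]))   # Algorithm 3
        thresh = theta * mass_B / len(values[i])
        D = sorted((a for a in values[i] if masses[i].get(a, 0) <= thresh),
                   key=lambda a: masses[i].get(a, 0))
        # Evaluate rho as if the values in D were removed one by one
        # (lines 9-15); no access to the tuples of B is needed here.
        m, sizes = mass_B, [len(s) for s in values]
        for a in D:
            m -= masses[i].get(a, 0)
            sizes[i] -= 1
            values[i].discard(a)
            order.append((i, a))
            rho = density(m, sizes)
            if rho > best_rho:
                best_rho, best_len = rho, len(order)
        # Second scan of B: drop the tuples with removed values (line 16).
        B = [t for t in B if t[i] in values[i]]
    # Restore the snapshot of maximum density (lines 17-18).
    kept = [set(t[n] for t in relation) for n in range(n_dims)]
    for n, a in order[:best_len]:
        kept[n].discard(a)
    return [t for t in relation
            if all(t[n] in kept[n] for n in range(n_dims))]
```

On a toy 2-way relation containing a 2&#xd7;2 block of heavy entries plus one light entry, the sketch peels off the light attribute values and restores the dense block.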
<table-wrap id="T9" position="float">
<label>Algorithm 2</label>
<caption>
<p>
<italic>find</italic>_<italic>one</italic> in <sc>D-Cube</sc>
</p>
</caption>
<table>
<tbody>
<tr>
<td>
<inline-graphic xlink:href="fdata-03-594302-fx2.tif"/>
</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Note that, in each iteration (lines 5&#x2013;16) of <xref ref-type="statement" rid="Algorithm_2">Algorithm 2</xref>, the tuples of <inline-formula id="inf106">
<mml:math id="minf106">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula>, which are stored on disk, need to be scanned only twice, once in line 6 and once in line 16. Moreover, both steps can be performed by simply sequentially reading and/or writing tuples in <inline-formula id="inf107">
<mml:math id="minf107">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> without loading all the tuples in memory at once. For example, to compute attribute-value masses in line 6, <sc>D-Cube</sc> increases <inline-formula id="inf108">
<mml:math id="minf108">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> by <inline-formula id="inf109">
<mml:math id="minf109">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> for each dimension attribute <italic>A</italic>
<sub>
<italic>n</italic>
</sub> after reading each tuple <italic>t</italic> in <inline-formula id="inf110">
<mml:math id="minf110">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> sequentially from&#x20;disk.</p>
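The single-pass mass computation just described can be sketched as follows (names are ours); the last field of each tuple plays the role of the measure attribute <italic>t</italic>[<italic>X</italic>].

```python
# Sketch of the single sequential scan in line 6 of Algorithm 2: the
# mass of every attribute value is accumulated in in-memory
# dictionaries while each tuple is read from "disk" exactly once.

from collections import defaultdict

def attribute_value_masses(tuples, n_dims):
    masses = [defaultdict(float) for _ in range(n_dims)]
    for t in tuples:                    # one sequential pass over B
        for n in range(n_dims):
            masses[n][t[n]] += t[-1]    # M_B(t[A_n], n) += t[X]
    return masses
```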
<p>
<statement content-type="algorithm" id="Algorithm_3">
<p>Algorithm_3</p>
</statement>
</p>
<p>
<statement content-type="algorithm" id="Algorithm_4">
<p>Algorithm_4</p>
</statement>
</p>
</sec>
<sec id="s3-1-3">
<title>3.1.3 Dimension Selection (<xref ref-type="statement" rid="Algorithm_3">Algorithms 3</xref> and <xref ref-type="statement" rid="Algorithm_4">4</xref>)</title>
<p>We discuss two policies for choosing the dimension attribute from which attribute values are removed. They are used in line 7 of <xref ref-type="statement" rid="Algorithm_2">Algorithm 2</xref> and offer different advantages.</p>
<p>
<italic>Maximum Cardinality Policy (Algorithm 3)</italic>: The dimension attribute with the largest cardinality is chosen, as described in <xref ref-type="statement" rid="Algorithm_3">Algorithm 3</xref>. Despite its simplicity, this policy provides an accuracy guarantee (see Theorem 3 in <xref ref-type="sec" rid="s3-2-2">Section&#x20;3.2.2</xref>).</p>
<p>
<italic>Maximum Density Policy (Algorithm 4)</italic>: The density of <inline-formula id="inf111">
<mml:math id="minf111">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> when attribute values are removed from each dimension attribute is computed. Then, the dimension attribute leading to the highest density is chosen. Note that the tuples in <inline-formula id="inf112">
<mml:math id="minf112">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula>, stored on disk, do not need to be accessed for this computation, as described in <xref ref-type="statement" rid="Algorithm_4">Algorithm 4</xref>. Although this policy does not provide the accuracy guarantee of the maximum cardinality policy, it works well with various density measures and tends to spot denser subtensors in our experiments with real-world&#x20;data.</p>
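The two policies can be sketched as follows, assuming the in-memory remaining-value sets and per-dimension masses are available from a prior scan, and assuming (ours, for concreteness) the arithmetic-average-cardinality density; neither policy touches the tuples on disk.

```python
# Sketches of the two dimension-selection policies (assumed names).

def select_by_cardinality(values):
    """Algorithm 3: pick the dimension with the most remaining values."""
    return max(range(len(values)), key=lambda n: len(values[n]))

def select_by_density(values, masses, mass_B, theta=1.0):
    """Algorithm 4: pick the dimension whose removal step would leave the
    densest subtensor, without accessing the tuples on disk."""
    N = len(values)
    best_n, best_rho = -1, float("-inf")
    for n in range(N):
        if not values[n]:
            continue
        thresh = theta * mass_B / len(values[n])
        D = [a for a in values[n] if masses[n][a] <= thresh]
        m = mass_B - sum(masses[n][a] for a in D)   # mass after removal
        sizes = [len(values[j]) for j in range(N)]
        sizes[n] -= len(D)
        rho = m / (sum(sizes) / N) if sum(sizes) else float("-inf")
        if rho > best_rho:
            best_n, best_rho = n, rho
    return best_n
```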
<table-wrap id="T10" position="float">
<label>Algorithm 3</label>
<caption>
<p>
<italic>select</italic>_<italic>dimension</italic> by cardinality</p>
</caption>
<table>
<tbody>
<tr>
<td>
<inline-graphic xlink:href="fdata-03-594302-fx3.tif"/>
</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T11" position="float">
<label>Algorithm 4</label>
<caption>
<p>
<italic>select</italic>_<italic>dimension</italic> by density</p>
</caption>
<table>
<tbody>
<tr>
<td>
<inline-graphic xlink:href="fdata-03-594302-fx4.tif"/>
</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3-1-4">
<title>3.1.4 Efficient Implementation</title>
<p>We present the optimization techniques used for the efficient implementation of <sc>D-Cube</sc>.</p>
<p>
<italic>Combining Disk-Accessing Steps</italic>
<bold>.</bold> The amount of disk I/O can be reduced by combining multiple steps involving disk accesses. In <xref ref-type="statement" rid="Algorithm_1">Algorithm 1</xref>, updating <inline-formula id="inf113">
<mml:math id="minf113">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> (line 7) in an iteration can be combined with computing the mass of <inline-formula id="inf114">
<mml:math id="minf114">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> (line 5) in the next iteration. That is, if we aggregate the values of the tuples of <inline-formula id="inf115">
<mml:math id="minf115">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> while they are written for the update, we do not need to scan <inline-formula id="inf116">
<mml:math id="minf116">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> again for computing its mass in the next iteration. Likewise, in <xref ref-type="statement" rid="Algorithm_2">Algorithm 2</xref>, updating <inline-formula id="inf117">
<mml:math id="minf117">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> (line 16) in an iteration can be combined with computing attribute-value masses (line 6) in the next iteration. This optimization reduces the amount of disk I/O in <sc>D-Cube</sc> by about&#x20;30%.</p>
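The combined update-and-aggregate pass can be sketched as follows (assumed names); one pass both filters the tuples and accumulates the mass needed by the next iteration, so a separate scan is avoided.

```python
# Sketch of combining disk-accessing steps: while tuples are written back
# during the update of R (line 7 of Algorithm 1), their measures are
# aggregated on the fly, replacing the separate mass-computing scan (line 5).

def update_and_measure(tuples, removed_values):
    kept, mass = [], 0.0
    for t in tuples:                    # one sequential read
        if any(t[n] in removed_values[n] for n in range(len(removed_values))):
            continue
        kept.append(t)                  # stands in for a sequential write
        mass += t[-1]                   # aggregate for the next iteration
    return kept, mass
```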
<p>
<italic>Caching Tensor Entries in Memory</italic>
<bold>.</bold> Although we assume that tuples are stored on disk, storing them in memory, up to the memory capacity, speeds up <sc>D-Cube</sc> by up to 3&#x20;times in our experiments (see <xref ref-type="sec" rid="s4-4">Section 4.4</xref>). We cache the tuples in <inline-formula id="inf118">
<mml:math id="minf118">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula>, which are more frequently accessed than those in <inline-formula id="inf119">
<mml:math id="minf119">
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:math>
</inline-formula> or <inline-formula id="inf120">
<mml:math id="minf120">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, in memory with the highest priority.</p>
</sec>
</sec>
<sec id="s3-2">
<title>3.2 Analyses</title>
<p>In this section, we prove the time and space complexities of <sc>D-Cube</sc> and the accuracy guarantee that it provides. Then, we theoretically compare <sc>D-Cube</sc> with <sc>M-Zoom</sc> and <sc>M-Biz</sc> (<xref ref-type="bibr" rid="B36">Shin et&#x20;al., 2018</xref>).</p>
<sec id="s3-2-1">
<title>3.2.1 Complexity Analyses</title>
<p>
<statement>
<p>
<sc>Theorem</sc> 1 states the worst-case time complexity, which equals the worst-case I/O complexity, of <sc>D-Cube</sc>.</p>
</statement>
</p>
<p>
<statement content-type="lemma" id="Lemma_1">
<label>
<sc>Lemma</sc> 1</label>
<p>(Maximum Number of Iterations in <xref ref-type="statement" rid="Algorithm_2">Algorithm 2</xref>). Let <inline-formula id="inf121">
<mml:math id="minf121">
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Then, the number of iterations (lines 5&#x2013;16) in <xref ref-type="statement" rid="Algorithm_2">Algorithm 2</xref> is at most<disp-formula id="equ5">
<mml:math id="mequ5">
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mi>min</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mi>L</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
</statement>
</p>
<p>
<statement content-type="proof" id="uProof_1">
<label>
<sc>Proof</sc>
</label>
<p>In each iteration (lines 5&#x2013;16) of <xref ref-type="statement" rid="Algorithm_2">Algorithm 2</xref>, among the values of the chosen dimension attribute <inline-formula id="inf122">
<mml:math id="minf122">
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, attribute values whose masses are at most <inline-formula id="inf123">
<mml:math id="minf123">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula id="inf124">
<mml:math id="minf124">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>&#x2265;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, are removed. The set of such attribute values is denoted by <inline-formula id="inf125">
<mml:math id="minf125">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. We will show that, if <inline-formula id="inf126">
<mml:math id="minf126">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, then<disp-formula id="e1">
<mml:math id="me1">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>\</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x3c;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>Note that, when <inline-formula id="inf127">
<mml:math id="minf127">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>\</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, <xref ref-type="disp-formula" rid="e1">Eq. (1)</xref> trivially holds. When <inline-formula id="inf128">
<mml:math id="minf128">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>\</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf129">
<mml:math id="minf129">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> can be factorized and lower bounded as<disp-formula id="equ6">
<mml:math id="mequ6">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>\</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:munder>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:munder>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>&#x2265;</mml:mo>
<mml:munder>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>\</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3e;</mml:mo>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>\</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>where the last strict inequality is from the definition of <inline-formula id="inf130">
<mml:math id="minf130">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and that <inline-formula id="inf131">
<mml:math id="minf131">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>\</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. This strict inequality implies <inline-formula id="inf132">
<mml:math id="minf132">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, and thus dividing both sides by <inline-formula id="inf133">
<mml:math id="minf133">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula> gives <xref ref-type="disp-formula" rid="e1">Eq. 1</xref>. Now, <xref ref-type="disp-formula" rid="e1">Eq. 1</xref> implies that the number of remaining values of the chosen attribute after each iteration is less than <inline-formula id="inf134">
<mml:math id="minf134">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> of that before the iteration. Hence each attribute can be chosen at most <inline-formula id="inf135">
<mml:math id="minf135">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mi>L</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> times before all of its values are removed. Thus, the maximum number of iterations is at most <inline-formula id="inf136">
<mml:math id="minf136">
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mi>L</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. Also, by <xref ref-type="disp-formula" rid="e1">Eq. 1</xref>, at least one attribute value is removed per iteration. Hence, the maximum number of iterations is at most the number of attribute values, which is upper bounded by <inline-formula id="inf137">
<mml:math id="minf137">
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mi>L</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. Combining the two bounds, the number of iterations is upper bounded by <inline-formula id="inf138">
<mml:math id="minf138">
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mi>min</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mi>L</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.&#x220e;</p>
</statement>
</p>
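As a toy numeric illustration of Lemma 1 (ours, not part of the proof): if every iteration keeps strictly fewer than 1/&#x3b8; of the chosen attribute's values, as Eq. 1 guarantees, and removes at least one value, then even an adversarial run stays within roughly <italic>N</italic> min(log<sub>&#x3b8;</sub> <italic>L</italic>, <italic>L</italic>) iterations.

```python
# Adversarial simulation of the iteration bound in Lemma 1: every
# iteration keeps as many values of the chosen (largest) dimension as
# Eq. (1) allows, i.e., remaining < size/theta, with at least one removed.

import math

def simulate_iterations(N, L, theta):
    sizes, iters = [L] * N, 0
    while any(sizes):
        i = max(range(N), key=lambda n: sizes[n])
        sizes[i] = min(sizes[i] - 1, math.ceil(sizes[i] / theta) - 1)
        iters += 1
    return iters

# theta = 2: 27 iterations, within the bound 3 * log2(1000) ~ 29.9.
print(simulate_iterations(3, 1000, 2))
```

With &#x3b8; = 1 the per-iteration shrinkage guarantee vanishes, and the count degrades gracefully to the other bound, <italic>NL</italic>.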
<p>
<statement>
<p>
<sc>Theorem</sc> 1 (Worst-case Time Complexity). <italic>Let</italic> <inline-formula id="inf139">
<mml:math id="minf139">
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
<italic>. If</italic> <inline-formula id="inf140">
<mml:math id="minf140">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>L</mml:mi>
</mml:mfrac>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
<italic>, which is a weaker condition than</italic> <inline-formula id="inf141">
<mml:math id="minf141">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
<italic>, the worst-case time complexity of</italic> <xref ref-type="statement" rid="Algorithm_1">Algorithm 1</xref> <italic>is</italic>
<disp-formula id="e2">
<mml:math id="me2">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mi>min</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mi>L</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>
</p>
</statement>
</p>
<p>
<statement content-type="proof" id="uProof_2">
<label>
<sc>Proof</sc>
</label>
<p>From Lemma 1, the number of iterations (lines 5&#x2013;16) in <xref ref-type="statement" rid="Algorithm_2">Algorithm 2</xref> is <inline-formula id="inf142">
<mml:math id="minf142">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mi>min</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mi>L</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Executing lines 6 and 16&#x20;<inline-formula id="inf143">
<mml:math id="minf143">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mi>min</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mi>L</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> times takes <inline-formula id="inf144">
<mml:math id="minf144">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mi>min</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mi>L</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, which dominates the time complexity of the other parts. For example, repeatedly executing line 9 takes <inline-formula id="inf145">
<mml:math id="minf145">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mi>L</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, and by our assumption, it is dominated by <inline-formula id="inf146">
<mml:math id="minf146">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mi>min</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mi>L</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Thus, the worst-case time complexity of <xref ref-type="statement" rid="Algorithm_2">Algorithm 2</xref> is <inline-formula id="inf147">
<mml:math id="minf147">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mi>min</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mi>L</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, and that of <xref ref-type="statement" rid="Algorithm_1">Algorithm 1</xref>, which executes <xref ref-type="statement" rid="Algorithm_2">Algorithm 2</xref> k times, is <inline-formula id="inf148">
<mml:math id="minf148">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mi>min</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mi>L</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.&#x220e;</p>
<p>
<italic>However, this worst-case time complexity, which assumes the worst possible distribution of the measure attribute values of the tuples, is too pessimistic. In</italic> <xref ref-type="sec" rid="s4-4">Section 4.4</xref>, <italic>we experimentally show that</italic> <sc>
<italic>D-Cube</italic>
</sc> <italic>scales linearly with k, N, and</italic> <inline-formula id="inf149">
<mml:math id="minf149">
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
<italic>; and sub-linearly with L even when &#x3b8; is set to its smallest possible value, 1.</italic>
</p>
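<p>As a rough numeric illustration of the iteration bound above, the following minimal Python sketch (ours, not the authors' implementation) assumes the geometric-shrinkage reading of Lemma 1, under which each peeling round of Algorithm 2 keeps at most a 1/&#x3b8; fraction of a mode's remaining attribute values, and counts the rounds needed to exhaust a mode of cardinality L:</p>

```python
import math

# Hedged sketch (assumption: each peeling round keeps at most a 1/theta
# fraction of a mode's remaining attribute values, as Lemma 1 suggests).
# A mode of cardinality L is then exhausted after about log_theta(L) rounds,
# matching the min(log_theta L, L) factor in the stated complexity.
def rounds_to_exhaust(L, theta):
    assert theta > 1, "geometric shrinkage needs theta > 1"
    rounds = 0
    while L > 1:
        L = math.floor(L / theta)  # keep at most a 1/theta fraction
        rounds += 1
    return rounds

print(rounds_to_exhaust(10**6, theta=2))   # 19, about log2(10**6)
print(rounds_to_exhaust(10**6, theta=10))  # 6 == log10(10**6)
```

<p>For &#x3b8; close to 1 the shrinkage is slow and the trivial bound of L rounds takes over, which is why the complexity carries the minimum of the two terms.</p>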
<p>
<italic>Theorem 2 states the memory requirement of</italic> <sc>
<italic>D-Cube</italic>
</sc> <italic>. Since the tuples do not need to be stored in memory all at once in</italic> <sc>
<italic>D-Cube</italic>
</sc>
<italic>, its memory requirement does not depend on the number of tuples (i.e.,</italic> <inline-formula id="inf150">
<mml:math id="minf150">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
<italic>).</italic>
</p>
</statement>
</p>
<p>
<statement content-type="theorem" id="Theorem_2">
<label>
<sc>Theorem</sc> 2</label>
<p> (Memory Requirements). <italic>The amount of memory space required by</italic> <xref ref-type="statement" rid="Algorithm_1">Algorithm 1</xref> <italic>is</italic> <inline-formula id="inf151">
<mml:math id="minf151">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mstyle displaystyle="true">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mstyle>
<mml:mtext>&#x200b;</mml:mtext>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</statement>
</p>
<p>
<statement content-type="proof" id="uProof_3">
<label>
<sc>Proof</sc>
</label>
<p> In <xref ref-type="statement" rid="Algorithm_1">Algorithm 1</xref>, <inline-formula id="inf152">
<mml:math id="minf152">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf153">
<mml:math id="minf153">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf154">
<mml:math id="minf154">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> need to be loaded into memory at once. Each has at most <inline-formula id="inf155">
<mml:math id="minf155">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula> values. Thus, the memory requirement is <inline-formula id="inf156">
<mml:math id="minf156">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mstyle displaystyle="true">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mstyle>
<mml:mtext>&#x200b;</mml:mtext>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. &#x220e;</p>
</statement>
</p>
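<p>The proof above can be made concrete with a small sketch. The Python fragment below (ours, not the paper's code; names and the tuple encoding are illustrative) builds, in one pass over a tuple stream, exactly the per-mode state the proof says must reside in memory, so the footprint is proportional to the number of distinct attribute values rather than to the number of tuples:</p>

```python
# Illustrative sketch: the in-memory state kept per mode is the attribute-value
# set B_n and one mass M_B(a, n) per attribute value -- sum_n |R_n| numbers in
# total -- while the tuples themselves are streamed and never stored whole.
def in_memory_state(tuple_stream, N):
    attr_sets = [set() for _ in range(N)]   # B_1, ..., B_N
    masses = [dict() for _ in range(N)]     # a -> M_B(a, n)
    for *attrs, measure in tuple_stream:    # one sequential pass over "disk"
        for n, a in enumerate(attrs):
            attr_sets[n].add(a)
            masses[n][a] = masses[n].get(a, 0) + measure
    return attr_sets, masses

# toy 2-way relation: (user, item, count) tuples streamed from disk
stream = [("u1", "i1", 3), ("u1", "i2", 2), ("u2", "i1", 4)]
sets_, masses_ = in_memory_state(iter(stream), N=2)
# in-memory entries = sum over modes of |R_n|: here 2 users + 2 items = 4
print(sum(len(m) for m in masses_))  # 4
```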
</sec>
<sec id="s3-2-2">
<title>3.2.2 Accuracy in Dense-Subtensor Detection</title>
<p>We show that, when <italic>&#x3b8;</italic> is set to 1, <sc>D-Cube</sc> gives the same accuracy guarantee as the in-memory algorithms proposed in <xref ref-type="bibr" rid="B36">Shin et&#x20;al. (2018)</xref>, even though <sc>D-Cube</sc> restricts accesses to the tuples (stored on disk) to reduce disk I/O. Specifically, Theorem 3 states that the subtensor found by <xref ref-type="statement" rid="Algorithm_2">Algorithm 2</xref> with the maximum cardinality policy has density at least <inline-formula id="inf157">
<mml:math id="minf157">
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula> of the optimum when <inline-formula id="inf158">
<mml:math id="minf158">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is used as the density measure.</p>
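<p>Before stating the theorem, the density measure can be checked on a toy example. The following minimal Python sketch (ours; the dictionary encoding of the relation is illustrative) computes the arithmetic-average density of Definition 1, i.e., the mass of the subtensor divided by the arithmetic mean of its attribute-value set cardinalities:</p>

```python
# Hedged sketch of rho_ari(B, R) = M_B / ((1/N) * sum_n |B_n|), Definition 1.
# A relation is encoded as {attribute tuple: measure value}; a subtensor is a
# list of N attribute-value sets, one per mode.
def rho_ari(block_attrs, relation):
    mass = sum(v for t, v in relation.items()
               if all(t[n] in block_attrs[n] for n in range(len(block_attrs))))
    avg_card = sum(len(b) for b in block_attrs) / len(block_attrs)
    return mass / avg_card

# toy 2-way relation: (user, item) -> count
R = {("u1", "i1"): 5, ("u1", "i2"): 4, ("u2", "i1"): 4,
     ("u2", "i2"): 5, ("u3", "i3"): 1}

dense = [{"u1", "u2"}, {"i1", "i2"}]              # mass 18, mean cardinality 2
whole = [{"u1", "u2", "u3"}, {"i1", "i2", "i3"}]  # mass 19, mean cardinality 3
print(rho_ari(dense, R))  # 9.0
print(rho_ari(whole, R))  # 6.33...
```

<p>The dense 2&#xd7;2 block scores higher than the whole relation, which is the behavior the peeling algorithm exploits.</p>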
<p>
<statement content-type="theorem" id="Theorem_3">
<label>
<sc>Theorem</sc> 3</label>
<p>(<italic>&#x3b8;N</italic>-Approximation Guarantee). <italic>Let</italic> <inline-formula id="inf245">
<mml:math id="minf245">
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
</mml:math>
</inline-formula> <italic>be the subtensor</italic> <inline-formula id="inf246">
<mml:math id="minf246">
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> <italic>maximizing</italic> <inline-formula id="inf159">
<mml:math id="minf159">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> <italic>in the given relation </italic>R<italic>. Let</italic> <inline-formula id="inf160">
<mml:math id="minf160">
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo stretchy="true">&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> <italic>be the subtensor returned by</italic> <xref ref-type="statement" rid="Algorithm_2">Algorithm 2</xref> <italic>with</italic> <inline-formula id="inf161">
<mml:math id="minf161">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> <italic>and the maximum cardinality policy. Then,</italic>
<disp-formula id="equ7">
<mml:math id="mequ7">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo stretchy="true">&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
</statement>
</p>
<p>
<statement content-type="proof" id="uProof_4">
<label>
<sc>Proof</sc>
</label>
<p>First, the maximal subtensor <inline-formula id="inf249">
<mml:math id="minf249">
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
</mml:math>
</inline-formula> satisfies that, for any <inline-formula id="inf162">
<mml:math id="minf162">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and for any attribute value <inline-formula id="inf163">
<mml:math id="minf163">
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msubsup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, its attribute-value mass <inline-formula id="inf164">
<mml:math id="minf164">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is at least <inline-formula id="inf165">
<mml:math id="minf165">
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>N</mml:mi>
</mml:mfrac>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. This holds because the maximality of <inline-formula id="inf166">
<mml:math id="minf166">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> implies <inline-formula id="inf167">
<mml:math id="minf167">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, and substituting Definition 1 of <inline-formula id="inf168">
<mml:math id="minf168">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> gives <inline-formula id="inf169">
<mml:math id="minf169">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>N</mml:mi>
</mml:mfrac>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mstyle displaystyle="true">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msubsup>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mstyle>
<mml:mtext>&#x200b;</mml:mtext>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>N</mml:mi>
</mml:mfrac>
<mml:msup>
<mml:mstyle displaystyle="true">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msubsup>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mstyle>
<mml:mtext>&#x200b;</mml:mtext>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>, which reduces to<disp-formula id="e3">
<mml:math id="me3">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>N</mml:mi>
</mml:mfrac>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>
</p>
<p>Consider the earliest iteration (lines 5&#x2013;16) in <xref ref-type="statement" rid="Algorithm_2">Algorithm 2</xref> where an attribute value <italic>a</italic> of <inline-formula id="inf170">
<mml:math id="minf170">
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
</mml:math>
</inline-formula> is included in <inline-formula id="inf171">
<mml:math id="minf171">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. Let <inline-formula id="inf172">
<mml:math id="minf172">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> be <inline-formula id="inf173">
<mml:math id="minf173">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:math>
</inline-formula> at the beginning of the iteration. Our goal is to prove <inline-formula id="inf174">
<mml:math id="minf174">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, which we will show as <inline-formula id="inf175">
<mml:math id="minf175">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo stretchy="true">&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mfrac>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mfrac>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
<p>First, <inline-formula id="inf176">
<mml:math id="minf176">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo stretchy="true">&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is from the maximality of <inline-formula id="inf177">
<mml:math id="minf177">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo stretchy="true">&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> among the densities of the subtensors generated in the iterations (lines 5&#x2013;16 in <xref ref-type="statement" rid="Algorithm_2">Algorithm 2</xref>). Second, applying <inline-formula id="inf178">
<mml:math id="minf178">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>N</mml:mi>
</mml:mfrac>
<mml:msup>
<mml:mstyle displaystyle="true">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mstyle>
<mml:mtext>&#x200b;</mml:mtext>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> from the maximum cardinality policy (<xref ref-type="statement" rid="Algorithm_3">Algorithm 3</xref>) to Definition 1 of <inline-formula id="inf179">
<mml:math id="minf179">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> gives <inline-formula id="inf180">
<mml:math id="minf180">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>N</mml:mi>
</mml:mfrac>
<mml:msup>
<mml:mstyle displaystyle="true">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mstyle>
<mml:mtext>&#x200b;</mml:mtext>
</mml:msup>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>i</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>. Moreover, <inline-formula id="inf181">
<mml:math id="minf181">
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> gives <inline-formula id="inf182">
<mml:math id="minf182">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2265;</mml:mo>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. So combining these gives <inline-formula id="inf183">
<mml:math id="minf183">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>. Third, <inline-formula id="inf184">
<mml:math id="minf184">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mfrac>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula> is from <inline-formula id="inf185">
<mml:math id="minf185">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>&#x2283;</mml:mo>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>. Fourth, <inline-formula id="inf186">
<mml:math id="minf186">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mfrac>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is from <xref ref-type="disp-formula" rid="e3">Eq. (3)</xref>. Hence, <inline-formula id="inf187">
<mml:math id="minf187">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mo>&#x2217;</mml:mo>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> holds. &#x220e;</p>
</statement>
</p>
</sec>
<sec id="s3-2-3">
<title>3.2.3 Theoretical Comparison with <sc>M-Zoom</sc> and <sc>M-Biz</sc> (<xref ref-type="bibr" rid="B36">Shin et&#x20;al., 2018</xref>)</title>
<p>While <sc>D-Cube</sc> requires only <inline-formula id="inf188">
<mml:math id="minf188">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> memory space (see Theorem 2), which does not depend on the number of tuples (i.e.,&#x20;<inline-formula id="inf189">
<mml:math id="minf189">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>), <sc>M-Zoom</sc> and <sc>M-Biz</sc> require additional <inline-formula id="inf190">
<mml:math id="minf190">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> space for storing all tuples in main memory. The worst-case time complexity of <sc>D-Cube</sc> is <inline-formula id="inf191">
<mml:math id="minf191">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mi>min</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>log</mml:mi>
</mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:msub>
<mml:mi>L</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> (see Theorem 1), and it is slightly higher than that of <sc>M-Zoom</sc>, which is <inline-formula id="inf192">
<mml:math id="minf192">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mi>log</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Empirically, however, <sc>D-Cube</sc> is up to 7&#xd7; faster than <sc>M-Zoom</sc>, as we show in <xref ref-type="sec" rid="s4">Section 4</xref>. The main reason is that <sc>D-Cube</sc> reads and writes tuples only sequentially, allowing efficient caching based on spatial locality. On the other hand, <sc>M-Zoom</sc> requires tuples to be stored and accessed in hash tables, making efficient caching difficult.<xref ref-type="fn" rid="FN1">
<sup>1</sup>
</xref> The time complexity of <sc>M-Biz</sc> depends on the number of iterations until reaching a local optimum, and there is no known upper bound on the number of iterations tighter than <inline-formula id="inf193">
<mml:math id="minf193">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mn>2</mml:mn>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. If <inline-formula id="inf194">
<mml:math id="minf194">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is used, <sc>M-Zoom</sc> and <sc>M-Biz</sc>
<xref ref-type="fn" rid="FN2">
<sup>2</sup>
</xref> give an approximation ratio of N, which is the approximation ratio of <sc>D-Cube</sc> when <italic>&#x3b8;</italic> is set to 1 (see Theorem&#x20;3).</p>
</sec>
</sec>
<sec id="s3-3">
<title>3.3 MapReduce Implementation</title>
<p>We present our <sc>MapReduce</sc> implementation of <sc>D-Cube</sc>, assuming that the tuples in relations are stored in a distributed file system. Specifically, we describe four <sc>MapReduce</sc> algorithms that cover the steps of <sc>D-Cube</sc> that access tuples.</p>
<p>(1) <italic>Filtering Tuples</italic>. In lines 7-8 of <xref ref-type="statement" rid="Algorithm_1">Algorithm 1</xref> and line 16 of <xref ref-type="statement" rid="Algorithm_3">Algorithm 2</xref>, <sc>D-Cube</sc> keeps only the tuples that satisfy the given conditions. These steps are performed by the following map-only algorithm, where we broadcast the data used in each condition (e.g., <inline-formula id="inf195">
<mml:math id="minf195">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> in line 7 of <xref ref-type="statement" rid="Algorithm_1">Algorithm 1</xref>) to mappers using the distributed cache functionality.<list list-type="simple">
<list-item>
<p>Map-stage: Take a tuple t (i.e.,&#x20;<inline-formula id="inf196">
<mml:math id="minf196">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2329;</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x232a;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>) and emit t if t satisfies the given condition. Otherwise, the tuple is ignored.</p>
</list-item>
</list>
</p>
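In plain Python, this map-only filtering stage can be sketched as follows (a minimal single-process illustration only, not the authors' Hadoop/Java implementation; the broadcast attribute-value sets are modeled as an in-memory dict `kept`):

```python
def filter_mapper(tuples, kept):
    """Map-only stage: emit a tuple t = <t[A1], ..., t[AN], t[X]> only if
    every dimension attribute value of t appears in the broadcast
    attribute-value sets (kept[n] models the set B_n for dimension n)."""
    for t in tuples:
        *attrs, measure = t
        if all(a in kept[n] for n, a in enumerate(attrs)):
            yield t

# Example: keep tuples whose user and item attribute values both survive.
data = [("u1", "i1", 3), ("u2", "i1", 5), ("u1", "i2", 2)]
kept = {0: {"u1"}, 1: {"i1", "i2"}}
print(list(filter_mapper(data, kept)))  # [('u1', 'i1', 3), ('u1', 'i2', 2)]
```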
<p>(2) <italic>Computing Attribute-value Masses</italic>. Line 6 of <xref ref-type="statement" rid="Algorithm_3">Algorithm 2</xref> is performed by the following algorithm, where we reduce the amount of shuffled data by combining the intermediate results within each mapper.<list list-type="simple">
<list-item>
<p>Map-stage: Take a tuple t (i.e.,&#x20;<inline-formula id="inf197">
<mml:math id="minf197">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2329;</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x232a;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>) and emit N key/value pairs <inline-formula id="inf198">
<mml:math id="minf198">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2329;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x232a;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
<list-item>
<p>Combine-stage/Reduce-stage: Take <inline-formula id="inf199">
<mml:math id="minf199">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2329;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">values</mml:mi>
</mml:mrow>
<mml:mo>&#x232a;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and emit <inline-formula id="inf200">
<mml:math id="minf200">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2329;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">sum</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">values</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x232a;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
</list>
</p>
<p>Each tuple <inline-formula id="inf201">
<mml:math id="minf201">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2329;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">value</mml:mi>
</mml:mrow>
<mml:mo>&#x232a;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of the final output indicates that <inline-formula id="inf202">
<mml:math id="minf202">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mi mathvariant="bold">&#x212c;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="normal">value</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
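The attribute-value mass computation above is a sum-by-key pattern; a minimal Python sketch of the map and combine/reduce stages (an illustration only, not the authors' Hadoop code):

```python
from collections import defaultdict

def mass_mapper(t):
    """Map stage: for t = <t[A1], ..., t[AN], t[X]>, emit the N pairs
    <(n, t[A_n]), t[X]>."""
    *attrs, measure = t
    for n, a in enumerate(attrs):
        yield (n, a), measure

def sum_reducer(pairs):
    """Combine/reduce stage: emit <(n, a), sum(values)>, i.e., the mass
    of attribute value a in dimension n."""
    masses = defaultdict(int)
    for key, value in pairs:
        masses[key] += value
    return dict(masses)

data = [("u1", "i1", 3), ("u2", "i1", 5)]
masses = sum_reducer(kv for t in data for kv in mass_mapper(t))
print(masses[(1, "i1")])  # 8: item "i1" (dimension 1) has mass 3 + 5
```

Running the combiner inside each mapper with the same `sum_reducer` logic is what shrinks the shuffled data, since each mapper then emits one partial sum per key rather than one pair per tuple.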
<p>(3) <italic>Computing Mass</italic>. Line 5 of <xref ref-type="statement" rid="Algorithm_1">Algorithm 1</xref> can be performed by the following algorithm, where we reduce the amount of shuffled data by combining the intermediate results within each mapper.<list list-type="simple">
<list-item>
<p>Map-stage: Take a tuple t (i.e.,&#x20;<inline-formula id="inf203">
<mml:math id="minf203">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2329;</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x232a;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>) and emit <inline-formula id="inf204">
<mml:math id="minf204">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2329;</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x232a;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
<list-item>
<p>Combine-stage/Reduce-stage: Take <inline-formula id="inf205">
<mml:math id="minf205">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2329;</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">values</mml:mi>
</mml:mrow>
<mml:mo>&#x232a;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and emit <inline-formula id="inf206">
<mml:math id="minf206">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2329;</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">sum</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">values</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x232a;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
</list>
</p>
<p>The value of the final tuple corresponds to <inline-formula id="inf207">
<mml:math id="minf207">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>(4) <italic>Computing Attribute-value Sets</italic>. Line 2 of <xref ref-type="statement" rid="Algorithm_1">Algorithm 1</xref> can be performed by the following algorithm, where we reduce the amount of shuffled data by combining the intermediate results within each mapper.<list list-type="simple">
<list-item>
<p>Map-stage: Take a tuple t (i.e.,&#x20;<inline-formula id="inf208">
<mml:math id="minf208">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2329;</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>N</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x232a;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>) and emit N key/value pairs <inline-formula id="inf209">
<mml:math id="minf209">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2329;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mo>&#x232a;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
<list-item>
<p>Combine-stage/Reduce-stage: Take <inline-formula id="inf210">
<mml:math id="minf210">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2329;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">values</mml:mi>
</mml:mrow>
<mml:mo>&#x232a;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and emit <inline-formula id="inf211">
<mml:math id="minf211">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2329;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">0</mml:mi>
</mml:mrow>
<mml:mo>&#x232a;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
</list>
</p>
<p>Each tuple <inline-formula id="inf212">
<mml:math id="minf212">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x2329;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">0</mml:mi>
</mml:mrow>
<mml:mo>&#x232a;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of the final output indicates that a is a member of <inline-formula id="inf213">
<mml:math id="minf213">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
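This distinct-key pattern can be sketched in Python (illustrative only; the dummy value 0 carries no information and exists just to form key/value pairs):

```python
def value_set_mapper(t):
    """Map stage: for t = <t[A1], ..., t[AN], t[X]>, emit the N pairs
    <(n, t[A_n]), 0>."""
    *attrs, _ = t
    for n, a in enumerate(attrs):
        yield (n, a), 0

def distinct_reducer(pairs):
    """Combine/reduce stage: one surviving key (n, a) per distinct
    attribute value, i.e., the attribute-value sets R_n."""
    sets = {}
    for (n, a), _ in pairs:
        sets.setdefault(n, set()).add(a)
    return sets

data = [("u1", "i1", 3), ("u2", "i1", 5)]
print(distinct_reducer(kv for t in data for kv in value_set_mapper(t)))
# dimension 0 -> {'u1', 'u2'}, dimension 1 -> {'i1'}
```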
</sec>
</sec>
<sec sec-type="results|discussion" id="s4">
<title>4 Results and Discussion</title>
<p>We designed and conducted experiments to answer the following questions:<list list-type="simple">
<list-item>
<p>
<bold>Q1. Memory Efficiency</bold>: How much memory space does <sc>D-Cube</sc> require for analyzing real-world tensors? How large a tensor can <sc>D-Cube</sc> handle?</p>
</list-item>
<list-item>
<p>
<bold>Q2. Speed and Accuracy in Dense-subtensor Detection</bold>: How rapidly and accurately does <sc>D-Cube</sc> identify dense subtensors? Does <sc>D-Cube</sc> outperform its best competitors?</p>
</list-item>
<list-item>
<p>
<bold>Q3. Scalability</bold>: Does <sc>D-Cube</sc> scale linearly with all aspects of data? Does <sc>D-Cube</sc> scale&#x20;out?</p>
</list-item>
<list-item>
<p>
<bold>Q4. Effectiveness in Anomaly Detection</bold>: Which anomalies does <sc>D-Cube</sc> detect in real-world tensors?</p>
</list-item>
<list-item>
<p>
<bold>Q5. Effect of</bold> &#x3b8;: How does the mass-threshold parameter &#x3b8; affect the speed and accuracy of <sc>D-Cube</sc> in dense-subtensor detection?</p>
</list-item>
<list-item>
<p>
<bold>Q6. Effect of</bold> &#x3b1;: How does the parameter &#x3b1; in density metric <inline-formula id="inf214">
<mml:math id="minf214">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> affect subtensors that <sc>D-Cube</sc> detects?</p>
</list-item>
</list>
</p>
<sec id="s4-1">
<title>4.1 Experimental Settings</title>
<sec id="s4-1-1">
<title>4.1.1 Machines</title>
<p>We ran all serial algorithms on a machine with 2.67 GHz Intel Xeon E7-8837 CPUs and 1 TB memory. We ran <sc>MapReduce</sc> algorithms on a 40-node Hadoop cluster, where each node has an Intel Xeon E3-1230 3.3 GHz CPU and 32 GB memory.</p>
</sec>
<sec id="s4-1-2">
<title>4.1.2 Datasets</title>
<p>We describe the real-world and synthetic tensors used in our experiments. Real-world tensors are categorized into four groups: (a) Rating data (SWM, Yelp, Android, Netflix, and YahooM.), (b) Wikipedia revision histories (KoWiki and EnWiki), (c) Temporal social networks (Youtube and SMS), and (d) TCP dumps (DARPA and AirForce). Some statistics of these datasets are summarized in <xref ref-type="table" rid="T3">Table&#x20;3</xref>.</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Summary of real-world datasets.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Name</th>
<th align="center">Volume</th>
<th align="center">&#x23;Tuples</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td colspan="3" align="left">Rating data (<underline>user</underline>, <underline>item</underline>, <underline>timestamp</underline>, <underline>rating</underline>, &#x23;reviews)</td>
</tr>
<tr>
<td align="left">&#x2003;SWM</td>
<td align="left">967&#xa0;K &#xd7; 15.1&#xa0;K &#xd7; 1.38&#xa0;K &#xd7; 5</td>
<td align="left">1.13&#xa0;M</td>
</tr>
<tr>
<td align="left">&#x2003;Yelp</td>
<td align="left">552&#xa0;K &#xd7; 77.1&#xa0;K &#xd7; 3.80&#xa0;K &#xd7; 5</td>
<td align="left">2.23&#xa0;M</td>
</tr>
<tr>
<td align="left">&#x2003;Android</td>
<td align="left">1.32&#xa0;M &#xd7; 61.3&#xa0;K &#xd7; 1.28&#xa0;K &#xd7; 5</td>
<td align="left">2.64&#xa0;M</td>
</tr>
<tr>
<td align="left">&#x2003;Netflix</td>
<td align="left">480&#xa0;K &#xd7; 17.8&#xa0;K &#xd7; 2.18&#xa0;K &#xd7; 5</td>
<td align="left">99.1&#xa0;M</td>
</tr>
<tr>
<td align="left">&#x2003;YahooM.</td>
<td align="left">1.00&#xa0;M &#xd7; 625&#xa0;K &#xd7; 84.4&#xa0;K &#xd7; 101</td>
<td align="left">253&#xa0;M</td>
</tr>
<tr>
<td colspan="3" align="left">Wiki revision histories (<underline>user</underline>, <underline>page</underline>, <underline>timestamp</underline>, &#x23;revisions)</td>
</tr>
<tr>
<td align="left">&#x2003;KoWiki</td>
<td align="left">470&#xa0;K &#xd7; 1.18&#xa0;M &#xd7; 101&#xa0;K</td>
<td align="left">11.0&#xa0;M</td>
</tr>
<tr>
<td align="left">&#x2003;EnWiki</td>
<td align="left">44.1&#xa0;M &#xd7; 38.5&#xa0;M &#xd7; 129&#xa0;K</td>
<td align="left">483&#xa0;M</td>
</tr>
<tr>
<td colspan="3" align="left">Social networks (<underline>user</underline>, <underline>user</underline>, <underline>timestamp</underline>, &#x23;interactions)</td>
</tr>
<tr>
<td align="left">&#x2003;Youtube</td>
<td align="left">3.22&#xa0;M &#xd7; 3.22&#xa0;M &#xd7; 203</td>
<td align="left">18.7&#xa0;M</td>
</tr>
<tr>
<td align="left">&#x2003;SMS</td>
<td align="left">1.25&#xa0;M &#xd7; 7.00&#xa0;M &#xd7; 4.39&#xa0;K</td>
<td align="left">103&#xa0;M</td>
</tr>
<tr>
<td colspan="3" align="left">TCP dumps (<underline>src IP</underline>, <underline>dst IP</underline>, <underline>timestamp</underline>, &#x23;connections)</td>
</tr>
<tr>
<td align="left">&#x2003;DARPA</td>
<td align="left">9.48&#xa0;K &#xd7; 23.4&#xa0;K &#xd7; 46.6&#xa0;K</td>
<td align="left">522&#xa0;K</td>
</tr>
<tr>
<td colspan="3" align="left">TCP dumps (<underline>protocol</underline>, <underline>service</underline>, <underline>src bytes</underline>, &#x2026;, &#x23;connections)</td>
</tr>
<tr>
<td align="left">&#x2003;AirForce</td>
<td align="left">3 &#xd7; 70 &#xd7; 11 &#xd7; 7.20&#xa0;K</td>
<td align="left">648&#xa0;K</td>
</tr>
<tr>
<td align="left"/>
<td align="left">&#xd7; 21.5&#xa0;K &#xd7; 512 &#xd7; 512</td>
<td align="left"/>
</tr>
</tbody>
</table>
</table-wrap>
<p>
<italic>Rating data</italic>. Rating data are relations with schema (<underline>user</underline>, <underline>item</underline>, <underline>timestamp</underline>, <underline>score</underline>, &#x23;ratings). Each tuple (u,i,t,s,r) indicates that user u gave item i score s, r times, at timestamp t. In the SWM dataset (<xref ref-type="bibr" rid="B1">Akoglu et&#x20;al., 2013</xref>), the timestamps are in dates, and the items are entertainment software from a popular online software marketplace. In the Yelp dataset, the timestamps are in dates, and the items are businesses listed on Yelp, a review site. In the Android dataset (<xref ref-type="bibr" rid="B27">McAuley et&#x20;al., 2015</xref>), the timestamps are in hours, and the items are Android apps on Amazon, an online store. In the Netflix dataset (<xref ref-type="bibr" rid="B8">Bennett and Lanning, 2007</xref>), the timestamps are in dates, and the items are movies listed on Netflix, a movie rental and streaming service. In the YahooM. dataset (<xref ref-type="bibr" rid="B12">Dror et&#x20;al., 2012</xref>), the timestamps are in hours, and the items are musical items listed on Yahoo! Music, a provider of various music services.</p>
<p>
<italic>Wikipedia revision history</italic>. Wikipedia revision histories are relations with schema (<underline>user</underline>, <underline>page</underline>, <underline>timestamp</underline>, &#x23;revisions). Each tuple (u,p,t,r) indicates that user u revised page p, r times, at timestamp t (in hour) in Wikipedia, a crowd-sourcing online encyclopedia. In the KoWiki dataset, the pages are from Korean Wikipedia. In the EnWiki dataset, the pages are from English Wikipedia.</p>
<p>
<italic>Temporal social networks</italic>. Temporal social networks are relations with schema (<underline>source</underline>, <underline>destination</underline>, <underline>timestamp</underline>, &#x23;interactions). Each tuple (s,d,t,i) indicates that user s interacts with user d, i times, at timestamp t. In the Youtube dataset (<xref ref-type="bibr" rid="B28">Mislove et&#x20;al., 2007</xref>), the timestamps are in hours, and the interactions are becoming friends on Youtube, a video-sharing website. In the SMS dataset, the timestamps are in hours, and the interactions are sending text messages.</p>
<p>
<italic>TCP Dumps</italic>. The DARPA dataset (<xref ref-type="bibr" rid="B25">Lippmann et&#x20;al., 2000</xref>), collected by the Cyber Systems and Technology Group in 1998, is a relation with schema (<underline>source IP</underline>, <underline>destination IP</underline>, <underline>timestamp</underline>, &#x23;connections). Each tuple (s,d,t,c) indicates that c connections were made from IP s to IP d at timestamp t (in minutes). The AirForce dataset, used for KDD Cup 1999, is a relation with schema (<underline>protocol</underline>, <underline>service</underline>, <underline>src bytes</underline>, <underline>dst bytes</underline>, <underline>flag</underline>, <underline>host count</underline>, <underline>srv count</underline>, &#x23;connections). The description of each attribute is as follows:<list list-type="simple">
<list-item>
<p>protocol: type of protocol (tcp, udp,&#x20;etc.).</p>
</list-item>
<list-item>
<p>service: service on destination (http, telnet,&#x20;etc.).</p>
</list-item>
<list-item>
<p>src bytes: bytes sent from source to destination.</p>
</list-item>
<list-item>
<p>dst bytes: bytes sent from destination to source.</p>
</list-item>
<list-item>
<p>flag: normal or error status.</p>
</list-item>
<list-item>
<p>host count: number of connections made to the same host in the past two seconds.</p>
</list-item>
<list-item>
<p>srv count: number of connections made to the same service in the past two seconds.</p>
</list-item>
<list-item>
<p>&#x23;connections: number of connections with the given dimension attribute values.</p>
</list-item>
</list>
</p>
<p>
<italic>Synthetic Tensors</italic>: We used synthetic tensors for scalability tests. Each tensor was created by generating a random binary tensor and injecting ten random dense subtensors, whose volumes are 10<sup>
<italic>N</italic>
</sup> and densities (in terms of <inline-formula id="inf215">
<mml:math id="minf215">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>) are between 10&#xd7; and 100&#xd7; that of the entire tensor.</p>
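As an illustration of this construction, the following sketch builds a small 2-way binary tensor (stored sparsely as a set of nonzero cells) and injects a single dense block; the sizes and densities here are hypothetical, and the paper's tensors are N-way with ten injected blocks at random positions:

```python
import random

def synth_tensor(shape, base_density, block_shape, block_density, seed=0):
    """Random binary tensor with one dense subtensor injected at the
    corner (indices below block_shape). Cells are stored sparsely."""
    rng = random.Random(seed)
    cells = {(i, j)
             for i in range(shape[0]) for j in range(shape[1])
             if rng.random() < base_density}
    # Inject a dense block over the subrange [0, block_shape).
    cells |= {(i, j)
              for i in range(block_shape[0]) for j in range(block_shape[1])
              if rng.random() < block_density}
    return cells

t = synth_tensor((100, 100), base_density=0.01,
                 block_shape=(10, 10), block_density=0.5)
```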
</sec>
<sec id="s4-1-3">
<title>4.1.3 Implementations</title>
<p>We implemented the following dense-subtensor detection methods for our experiments.<list list-type="simple">
<list-item>
<p>
<sc>D-Cube</sc> (Proposed): We implemented <sc>D-Cube</sc> in Java with Hadoop 1.2.1. We set the mass-threshold parameter &#x3b8; to 1 and used the maximum density policy for dimension selection, unless otherwise stated.</p>
</list-item>
<list-item>
<p>
<sc>M-Zoom</sc> and <sc>M-Biz</sc> (<xref ref-type="bibr" rid="B36">Shin et&#x20;al., 2018</xref>): We used the open-source Java implementations of <sc>M-Zoom</sc> and <sc>M-Biz</sc>
<xref ref-type="fn" rid="FN3">
<sup>3</sup>
</xref>. As suggested in <xref ref-type="bibr" rid="B36">Shin et&#x20;al. (2018)</xref>, we used the outputs of <sc>M-Zoom</sc> as the initial states in <sc>M-Biz</sc>.</p>
</list-item>
<list-item>
<p>
<sc>CrossSpot</sc> (<xref ref-type="bibr" rid="B18">Jiang et&#x20;al., 2015</xref>): We used a Java version of the open-source implementation of <sc>CrossSpot</sc>
<xref ref-type="fn" rid="FN4">
<sup>4</sup>
</xref>. Although <sc>CrossSpot</sc> was originally designed to maximize <inline-formula id="inf216">
<mml:math id="minf216">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, we used its variants that directly maximize the density metric compared in each experiment. We used CPD as the seed selection method of <sc>CrossSpot</sc> as in <xref ref-type="bibr" rid="B36">Shin et&#x20;al. (2018)</xref>.</p>
</list-item>
<list-item>
<p>CPD (CP Decomposition): Let <inline-formula id="inf217">
<mml:math id="minf217">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold">A</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>N</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> be the factor matrices obtained by CP Decomposition (<xref ref-type="bibr" rid="B23">Kolda and Bader, 2009</xref>). The <italic>i</italic>th dense subtensor is composed of every attribute value <inline-formula id="inf218">
<mml:math id="minf218">
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> whose corresponding element in the <italic>i</italic>th column of <inline-formula id="inf219">
<mml:math id="minf219">
<mml:mrow>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> is greater than or equal to <inline-formula id="inf220">
<mml:math id="minf220">
<mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. We used the Tensor Toolbox<xref ref-type="fn" rid="FN5">
<sup>5</sup>
</xref> for CP Decomposition.</p>
</list-item>
<list-item>
<p>MAF (<xref ref-type="bibr" rid="B26">Maruhashi et&#x20;al., 2011</xref>): We used the Tensor Toolbox for CP Decomposition, on which MAF is largely based.</p>
</list-item>
</list>
</p>
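The CPD-based extraction rule described above — keep, in each dimension, every attribute value whose entry in the <italic>i</italic>th factor-matrix column is at least 1/sqrt(|R_n|) — can be sketched as below; the factor-matrix columns and cardinalities are illustrative, not taken from any dataset.

```python
import math

def subtensor_from_factors(factor_columns, cardinalities):
    """Given the i-th column of each factor matrix A^(n) (one list per
    dimension) and the cardinality |R_n| of each dimension attribute,
    select attribute values a_n with entry >= 1 / sqrt(|R_n|)."""
    selected = []
    for column, card in zip(factor_columns, cardinalities):
        threshold = 1.0 / math.sqrt(card)
        selected.append([a for a, v in enumerate(column) if v >= threshold])
    return selected

# Two dimensions, each with cardinality 4, so the threshold is 0.5.
cols = [[0.9, 0.1, 0.6, 0.0], [0.2, 0.8, 0.0, 0.7]]
members = subtensor_from_factors(cols, [4, 4])
# members == [[0, 2], [1, 3]]
```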
</sec>
</sec>
<sec id="s4-2">
<title>4.2 Q1. Memory Efficiency</title>
<p>We compare the amount of memory required by different methods for handling the real-world datasets. As seen in <xref ref-type="fig" rid="F3">Figure&#x20;3</xref>, <sc>D-Cube</sc>, which does not require tuples to be stored in memory, needed up to <bold>1,561&#xd7; less memory</bold> than the second most memory-efficient method, which stores tuples in memory.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>
<sc>D-Cube</sc> is memory efficient. <sc>D-Cube</sc> requires up to 1,561&#xd7; less memory than the second most memory-efficient method.</p>
</caption>
<graphic xlink:href="fdata-03-594302-g003.tif"/>
</fig>
<p>Due to its memory efficiency, <sc>D-Cube</sc> successfully handled <bold>1,000&#xd7; larger data</bold> than its competitors within the same memory budget. We ran the methods on 3-way synthetic tensors with different numbers of tuples (i.e.,&#x20;<inline-formula id="inf221">
<mml:math id="minf221">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>), with a memory budget of 16GB per machine. In every tensor, the cardinality of each dimension attribute was <inline-formula id="inf222">
<mml:math id="minf222">
<mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mn>1000</mml:mn>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of the number of tuples, i.e.,&#x20;<inline-formula id="inf223">
<mml:math id="minf223">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi mathvariant="bold">&#x211b;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mn>1000</mml:mn>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf224">
<mml:math id="minf224">
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. <xref ref-type="fig" rid="F1">Figure&#x20;1A</xref> in <xref ref-type="sec" rid="s1">Section 1</xref> shows the result. The <sc>Hadoop</sc> implementation of <sc>D-Cube</sc> successfully spotted dense subtensors in a tensor with <inline-formula id="inf225">
<mml:math id="minf225">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>11</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> tuples (<bold>2.6TB</bold>), and the serial version of <sc>D-Cube</sc> successfully spotted dense subtensors in a tensor with 10<sup>10</sup> tuples (<bold>240GB</bold>), which was the largest tensor that could be stored on the disk. However, all other methods ran out of memory even on a tensor with 10<sup>9</sup> tuples (21GB).</p>
</sec>
<sec id="s4-3">
<title>4.3 Q2. Speed and Accuracy in Dense-Subtensor Detection</title>
<p>We compare how rapidly and accurately <sc>D-Cube</sc> (the serial version) and its competitors detect dense subtensors in the real-world datasets. We measured the wall-clock time (averaged over three runs) taken by each method to detect three subtensors, and the maximum density of the three subtensors found by each method under each of the density measures in <xref ref-type="sec" rid="s2-2">Section 2.2</xref>. For this experiment, we did not limit the memory budget so that every method could handle every dataset. <sc>D-Cube</sc> also utilized the extra memory space by caching tuples in memory, as explained in <xref ref-type="sec" rid="s3-1-4">Section&#x20;3.1.4</xref>.</p>
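As a concrete reference for two of the density measures compared in this experiment, the sketch below restates the arithmetic and geometric average mass densities of <xref ref-type="bibr" rid="B36">Shin et&#x20;al. (2018)</xref>, under the assumption that a subtensor is summarized by its mass and the cardinalities of its dimension attributes; the example values are hypothetical.

```python
import math

def rho_ari(mass, cardinalities):
    """Arithmetic average mass: subtensor mass divided by the
    arithmetic mean of its attribute cardinalities."""
    return mass / (sum(cardinalities) / len(cardinalities))

def rho_geo(mass, cardinalities):
    """Geometric average mass: subtensor mass divided by the
    geometric mean of its attribute cardinalities."""
    n = len(cardinalities)
    return mass / math.prod(cardinalities) ** (1.0 / n)

# Example: a 3-way subtensor with attribute cardinalities (4, 5, 1)
# and mass 100.
ari = rho_ari(100, [4, 5, 1])  # ~ 30.0
geo = rho_geo(100, [4, 5, 1])  # ~ 36.8
```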
<p>
<xref ref-type="fig" rid="F4">Figure&#x20;4</xref> shows the results averaged over all considered datasets.<xref ref-type="fn" rid="FN6">
<sup>6</sup>
</xref> The results in each data set can be found in the supplementary material. <sc>D-Cube</sc> provided the best trade-off between speed and accuracy. Specifically, <sc>D-Cube</sc> <bold>was up to 7&#xd7; faster</bold> (on average 3.6<bold>&#xd7;</bold> faster) than the second fastest method <sc>M-Zoom</sc>. Moreover, <sc>D-Cube</sc> <bold>with the maximum density policy spotted high-density subtensors consistently regardless of target density measures</bold>. Specifically, on average, <sc>D-Cube</sc> with the maximum density policy was most accurate in dense-subtensor detection when <inline-formula id="inf226">
<mml:math id="minf226">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>o</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf227">
<mml:math id="minf227">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> were used; and it was second most accurate when <inline-formula id="inf228">
<mml:math id="minf228">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf229">
<mml:math id="minf229">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> were used. When <inline-formula id="inf230">
<mml:math id="minf230">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> was used, <sc>M-Zoom</sc>, <sc>M-Biz</sc>, and <sc>D-Cube</sc> with the maximum cardinality policy were on average more accurate than <sc>D-Cube</sc> with the maximum density policy. Although MAF does not appear in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>, it consistently provided sparser subtensors than CPD at a similar speed.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>
<sc>D-Cube</sc> rapidly and accurately detects dense subtensors. In each plot, points indicate the densities of the subtensors detected by different methods and their running times, averaged over all considered real-world tensors. The upper-left region indicates better performance. <sc>D-Cube</sc> is about 3.6&#xd7; faster than the second-fastest method, <sc>M-Zoom</sc>. Moreover, <sc>D-Cube</sc> with the maximum density policy consistently finds dense subtensors regardless of target density measures.</p>
</caption>
<graphic xlink:href="fdata-03-594302-g004.tif"/>
</fig>
</sec>
<sec id="s4-4">
<title>4.4 Q3. Scalability</title>
<p>We show that <sc>D-Cube</sc> scales (sub-)linearly with every input factor, i.e.,&#x20;the number of tuples, the number of dimension attributes, the cardinality of dimension attributes, and the number of subtensors that we aim to find. To measure the scalability with each factor, we started by finding a dense subtensor in a synthetic tensor with 10<sup>8</sup> tuples and 3 dimension attributes, each with cardinality 10<sup>5</sup>. Then, we measured the running time as we changed one factor at a time while fixing the others. The threshold parameter <italic>&#x3b8;</italic> was fixed to 1. As seen in <xref ref-type="fig" rid="F5">Figure&#x20;5</xref>, <sc>D-Cube</sc> scaled linearly with every factor and sub-linearly with the cardinality of attributes even when <italic>&#x3b8;</italic> was set to its minimum value 1. This supports our claim in <xref ref-type="sec" rid="s3-2-1">Section 3.2.1</xref> that the worst-case time complexity of <sc>D-Cube</sc> (Theorem 1) is too pessimistic. This linear scalability held both with a memory budget large enough (blue solid lines in <xref ref-type="fig" rid="F5">Figure&#x20;5</xref>) to store all tuples and with the minimum memory budget (red dashed lines in <xref ref-type="fig" rid="F5">Figure&#x20;5</xref>) that barely meets the requirements, although <sc>D-Cube</sc> was up to 3&#xd7; faster in the former&#x20;case.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>
<sc>D-Cube</sc> scales (sub-)linearly with all input factors regardless of memory budgets.</p>
</caption>
<graphic xlink:href="fdata-03-594302-g005.tif"/>
</fig>
<p>We also evaluate the machine scalability of the <sc>MapReduce</sc> implementation of <sc>D-Cube</sc>. We measured the running time taken to find a dense subtensor in a synthetic tensor with 10<sup>10</sup> tuples and 3 dimension attributes, each with cardinality 10<sup>7</sup>, as we increased the number of machines running in parallel from 1 to 40. <xref ref-type="fig" rid="F6">Figure&#x20;6</xref> shows the changes in the running time and the speed-up, which is defined as <italic>T</italic>
<sub>1</sub>/<italic>T</italic>
<sub>
<italic>M</italic>
</sub> where <italic>T</italic>
<sub>
<italic>M</italic>
</sub> is the running time with <italic>M</italic> machines. The speed-up increased near-linearly when a small number of machines were used, and flattened as more machines were added due to the overhead of the distributed system.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>
<sc>D-Cube</sc> scales out. The <sc>MapReduce</sc> implementation of <sc>D-Cube</sc> is sped up 8&#xd7; with 10 machines, and 20&#xd7; with 40 machines.</p>
</caption>
<graphic xlink:href="fdata-03-594302-g006.tif"/>
</fig>
</sec>
<sec id="s4-5">
<title>4.5 Q4. Effectiveness in Anomaly Detection</title>
<p>We demonstrate the effectiveness of <sc>D-Cube</sc> in four applications using real-world tensors.</p>
<sec id="s4-5-1">
<title>4.5.1 Network Intrusion Detection from TCP Dumps</title>
<p>
<sc>D-Cube</sc> detected network attacks from TCP dumps accurately by spotting corresponding dense subtensors. We consider two TCP dumps that are modeled differently. The DARPA dataset is a 3-way tensor where the dimension attributes are source IPs, destination IPs, and timestamps in minutes; and the measure attribute is the number of connections. The AirForce dataset, which does not include IP information, is a 7-way tensor where the measure attribute is the same but the dimension attributes are the features of the connections, including protocols and services. Both datasets include labels indicating whether each connection is malicious or&#x20;not.</p>
<p>
<xref ref-type="fig" rid="F1">Figure&#x20;1C</xref> in <xref ref-type="sec" rid="s1">Section 1</xref> lists the five densest subtensors (in terms of <inline-formula id="inf231">
<mml:math id="minf231">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>o</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>) found by <sc>D-Cube</sc> in each dataset. Notice that the dense subtensors are composed mostly of various types of network attacks. Based on this observation, we classified each connection as malicious or benign based on the density of the densest subtensor including the connection (i.e.,&#x20;the denser the subtensor including a connection is, the more suspicious the connection is). This led to a high area under the ROC curve (AUROC), as seen in <xref ref-type="table" rid="T4">Table&#x20;4</xref>, where we report the AUROC when each method was used with the density measure giving the highest AUROC. In both datasets, using <sc>D-Cube</sc> resulted in the highest AUROC.</p>
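The evaluation above scores each connection by the density of the densest detected subtensor containing it and then measures AUROC. A minimal sketch of that pipeline, using the rank-statistic form of AUROC (the probability that a random malicious connection outscores a random benign one, with ties counting one half); the scores and labels below are hypothetical.

```python
def auroc(scores, labels):
    """AUROC via the rank statistic: P(score of a random positive >
    score of a random negative), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Each connection's score is the density of the densest detected
# subtensor containing it (0 if it belongs to none); labels mark
# malicious (1) vs. benign (0) connections.
scores = [273.0, 273.0, 44.0, 0.0, 0.0]
labels = [1, 1, 1, 0, 0]
# Here every malicious connection outscores every benign one,
# so auroc(scores, labels) == 1.0.
```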
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>
<sc>D-Cube</sc> spots network attacks and synchronized behavior fastest and most accurately from TCP dumps and rating datasets, respectively.</p>
</caption>
<table>
<thead>
<tr>
<th rowspan="3" align="left">Datasets</th>
<th colspan="2" align="center">AirForce</th>
<th colspan="2" align="center">DARPA</th>
<th colspan="2" align="center">Android</th>
<th colspan="2" align="center">Yelp</th>
</tr>
<tr>
<th align="center">Elapsed</th>
<th rowspan="2" align="center">AUROC</th>
<th align="center">Elapsed</th>
<th rowspan="2" align="center">AUROC</th>
<th align="center">Elapsed</th>
<th align="center">Recall @</th>
<th align="center">Elapsed</th>
<th align="center">Recall @</th>
</tr>
<tr>
<td align="center">Time (s)</td>
<td align="center">Time (s)</td>
<td align="center">Time (s)</td>
<td align="center">Top-10</td>
<td align="center">Time (s)</td>
<td align="center">Top-10</td>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">CPD</td>
<td align="char" char=".">413.2</td>
<td align="char" char=".">0.854</td>
<td align="char" char=".">105.0</td>
<td align="char" char=".">0.926</td>
<td align="char" char=".">59.9</td>
<td align="char" char=".">0.54</td>
<td align="char" char=".">47.5</td>
<td align="char" char=".">0.52</td>
</tr>
<tr>
<td align="left">MAF</td>
<td align="char" char=".">486.6</td>
<td align="char" char=".">0.912</td>
<td align="char" char=".">102.4</td>
<td align="char" char=".">0.514</td>
<td align="char" char=".">95.0</td>
<td align="char" char=".">0.54</td>
<td align="char" char=".">49.4</td>
<td align="char" char=".">0.52</td>
</tr>
<tr>
<td align="left">
<sc>CrossSpot</sc>
</td>
<td align="char" char=".">575.5</td>
<td align="char" char=".">0.924</td>
<td align="char" char=".">132.2</td>
<td align="char" char=".">0.923</td>
<td align="char" char=".">71.3</td>
<td align="char" char=".">0.54</td>
<td align="char" char=".">56.7</td>
<td align="char" char=".">0.52</td>
</tr>
<tr>
<td align="left">
<sc>M-Zoom</sc>
</td>
<td align="char" char=".">27.7</td>
<td align="char" char=".">0.975</td>
<td align="char" char=".">22.7</td>
<td align="char" char=".">0.923</td>
<td align="char" char=".">28.4</td>
<td align="char" char=".">0.70</td>
<td align="char" char=".">17.7</td>
<td align="char" char=".">0.30</td>
</tr>
<tr>
<td align="left">
<sc>M-Biz</sc>
</td>
<td align="char" char=".">29.8</td>
<td align="char" char=".">0.977</td>
<td align="char" char=".">22.7</td>
<td align="char" char=".">0.923</td>
<td align="char" char=".">30.6</td>
<td align="char" char=".">0.70</td>
<td align="char" char=".">19.5</td>
<td align="char" char=".">0.30</td>
</tr>
<tr>
<td align="left">
<sc>D-Cube</sc>
</td>
<td align="char" char=".">15.6</td>
<td align="char" char=".">0.987</td>
<td align="char" char=".">9.1</td>
<td align="char" char=".">0.930</td>
<td align="char" char=".">7.0</td>
<td align="char" char=".">0.90</td>
<td align="char" char=".">4.9</td>
<td align="char" char=".">0.60</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4-5-2">
<title>4.5.2 Synchronized Behavior Detection in Rating Data</title>
<p>
<sc>D-Cube</sc> spotted suspicious synchronized behavior accurately in rating data. Specifically, we assume an attack scenario where fraudsters in a review site, who aim to boost (or lower) the ratings of a set of items, create multiple user accounts and give the same score to the items within a short period of time. This lockstep behavior forms a dense subtensor with volume (&#x23; fake accounts &#xd7; &#x23; target items &#xd7; 1&#x20;&#xd7; 1) in the rating dataset, whose dimension attributes are users, items, timestamps, and rating scores.</p>
<p>We injected 10 such random dense subtensors, whose volumes varied from 15&#xd7;15&#xd7;1&#xd7;1 to 60&#xd7;60&#xd7;1&#xd7;1, into the Yelp and Android datasets. We compared the ratio of the injected subtensors detected by each dense-subtensor detection method. We considered each injected subtensor as overlooked by a method if the subtensor did not belong to any of the top-10 dense subtensors spotted by the method or it was hidden in a natural dense subtensor at least 10&#x20;times larger than the injected subtensor. That is, we measured the recall at top 10. We repeated this experiment 10 times, and the averaged results are summarized in <xref ref-type="table" rid="T4">Table&#x20;4</xref>. For each method, we report the results with the density measure giving the highest recall. In both datasets, <sc>D-Cube</sc> detected the largest number of the injected subtensors. In particular, in the Android dataset, <sc>D-Cube</sc> detected 9 out of the 10 injected subtensors, while the second-best method detected only 7 injected subtensors on average.</p>
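The recall-at-top-10 criterion above — an injection counts as detected only if some top-k subtensor contains it and is not more than 10 times its size — can be sketched as follows, representing each subtensor as a set of cells; the block contents are illustrative.

```python
def recall_at_k(injected_blocks, detected_blocks, k=10, hide_ratio=10):
    """Fraction of injected blocks detected: an injected block counts
    only if some top-k detected block contains it and is at most
    hide_ratio times larger (otherwise it is deemed hidden in a
    natural dense block), mirroring the criterion in the text."""
    hits = 0
    for inj in injected_blocks:
        for det in detected_blocks[:k]:
            if inj <= det and len(det) <= hide_ratio * len(inj):
                hits += 1
                break
    return hits / len(injected_blocks)

injected = [{(0, 0), (0, 1)}, {(5, 5), (5, 6)}]
detected = [
    {(0, 0), (0, 1), (1, 1)},  # contains injected[0], only 1.5x larger
    {(i, j) for i in range(10) for j in range(10)},  # 100-cell natural block
]
recall = recall_at_k(injected, detected)
# injected[1] only appears inside the 100-cell block, 50x its size,
# so it counts as hidden: recall == 0.5.
```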
</sec>
<sec id="s4-5-3">
<title>4.5.3&#x20;Spam-Review Detection in Rating Data</title>
<p>
<sc>D-Cube</sc> successfully spotted spam reviews in the SWM dataset, which contains reviews from an online software marketplace. We modeled the SWM dataset as a 4-way tensor whose dimension attributes are users, software, ratings, and timestamps in dates, and we applied <sc>D-Cube</sc> (with <inline-formula id="inf232">
<mml:math id="minf232">
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>) to the dataset. <xref ref-type="table" rid="T6">Table&#x20;6</xref> shows the statistics of the top-3 dense subtensors. Although ground-truth labels were not available, as the examples in <xref ref-type="table" rid="T5">Table&#x20;5</xref> show, all the reviews composing the first and second dense subtensors were obvious spam reviews. In addition, at least 48% of the reviews composing the third dense subtensor were obvious spam reviews.</p>
<table-wrap id="T5" position="float">
<label>TABLE 5</label>
<caption>
<p>D-Cube successfully detects spam reviews in the SWM dataset.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th colspan="3" align="left">Subtensor 1 (100% spam)</th>
</tr>
<tr>
<th align="left">User</th>
<th align="left">Review</th>
<th align="left">Date</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">&#x2003;Ti&#x2a;</td>
<td align="left">Type in &#x2a;&#x2a;&#x2a; and you will get &#x2026;</td>
<td align="left">Mar-4</td>
</tr>
<tr>
<td align="left">&#x2003;Fo&#x2a;</td>
<td align="left">Type in for the bonus code: &#x2026;</td>
<td align="left">Mar-4</td>
</tr>
<tr>
<td align="left">&#x2003;dj&#x2a;</td>
<td align="left">Typed in the code: &#x2a;&#x2a;&#x2a; &#x2026;</td>
<td align="left">Mar-4</td>
</tr>
<tr>
<td align="left">&#x2003;Di&#x2a;</td>
<td align="left">Enter this code to start with &#x2026;</td>
<td align="left">Mar-4</td>
</tr>
<tr>
<td align="left">&#x2003;Fe&#x2a;</td>
<td align="left">Enter code: &#x2a;&#x2a;&#x2a; to win even &#x2026;</td>
<td align="left">Mar-4</td>
</tr>
<tr>
<td colspan="3" align="left">Subtensor 2 (100% spam)</td>
</tr>
<tr>
<td align="left">&#x2003;Sk&#x2a;</td>
<td align="left">Invite code&#x2a;&#x2a;&#x2a;, referral &#x2026;</td>
<td align="left">Apr-18</td>
</tr>
<tr>
<td align="left">&#x2003;fu&#x2a;</td>
<td align="left">Use my code for bonus &#x2026;</td>
<td align="left">Apr-18</td>
</tr>
<tr>
<td align="left">&#x2003;Ta&#x2a;</td>
<td align="left">Enter the code &#x2a;&#x2a;&#x2a; for &#x2026;</td>
<td align="left">Apr-18</td>
</tr>
<tr>
<td align="left">&#x2003;Ap&#x2a;</td>
<td align="left">Bonus code &#x2a;&#x2a;&#x2a; for points &#x2026;</td>
<td align="left">Apr-18</td>
</tr>
<tr>
<td align="left">&#x2003;De&#x2a;</td>
<td align="left">Bonus code: &#x2a;&#x2a;&#x2a;, be one &#x2026;</td>
<td align="left">Apr-18</td>
</tr>
<tr>
<td colspan="3" align="left">Subtensor 3 (at least 48% spam)</td>
</tr>
<tr>
<td align="left">&#x2003;Mr&#x2a;</td>
<td align="left">Entered this code and got &#x2026;</td>
<td align="left">Nov-23</td>
</tr>
<tr>
<td align="left">&#x2003;Max&#x2a;</td>
<td align="left">Enter the bonus code: &#x2a;&#x2a;&#x2a; &#x2026;</td>
<td align="left">Nov-23</td>
</tr>
<tr>
<td align="left">&#x2003;Je&#x2a;</td>
<td align="left">Enter &#x2a;&#x2a;&#x2a; when it asks&#x2026;</td>
<td align="left">Nov-23</td>
</tr>
<tr>
<td align="left">&#x2003;Man&#x2a;</td>
<td align="left">Just enter &#x2a;&#x2a;&#x2a; for a boost &#x2026;</td>
<td align="left">Nov-23</td>
</tr>
<tr>
<td align="left">&#x2003;Ty&#x2a;</td>
<td align="left">Enter &#x2a;&#x2a;&#x2a; ro receive a &#x2026;</td>
<td align="left">Nov-23</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4-5-4">
<title>4.5.4 Anomaly Detection in Wikipedia Revision Histories</title>
<p>
<sc>D-Cube</sc> detected interesting anomalies in Wikipedia revision histories, which we model as 3-way tensors whose dimension attributes are users, pages, and timestamps in hours. <xref ref-type="table" rid="T6">Table&#x20;6</xref> gives the statistics of the top-3 dense subtensors detected by <sc>D-Cube</sc> (with <inline-formula id="inf234">
<mml:math id="minf234">
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and the maximum cardinality policy) in the KoWiki dataset and by <sc>D-Cube</sc> (with <inline-formula id="inf235">
<mml:math id="minf235">
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>o</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and the maximum density policy) in the EnWiki dataset. All three subtensors detected in the KoWiki dataset indicated edit wars. For example, the second subtensor corresponded to an edit war where 4 users changed 4 pages 1,011 times within 5&#xa0;h. On the other hand, all three subtensors detected in the EnWiki dataset indicated bot activities. For example, the third subtensor corresponded to 3 bots that edited 1,067 pages 973,747 times. The users composing the top-5 dense subtensors in the EnWiki dataset are listed in <xref ref-type="table" rid="T7">Table&#x20;7</xref>. Notice that all of them are&#x20;bots.</p>
<table-wrap id="T6" position="float">
<label>TABLE 6</label>
<caption>
<p>Summary of the dense subtensors that <sc>D-Cube</sc> detects in the SWM, KoWiki, and EnWiki datasets.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="center">Order</th>
<th align="center">Volume</th>
<th align="center">Mass</th>
<th align="center">
<inline-formula id="inf233">
<mml:math id="minf233">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
</th>
<th align="center">Type</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">SWM</td>
<td align="char" char=".">1</td>
<td align="center">120</td>
<td align="center">308</td>
<td align="center">44.0</td>
<td align="left">Spam reviews</td>
</tr>
<tr>
<td align="left"/>
<td align="char" char=".">2</td>
<td align="center">612</td>
<td align="center">435</td>
<td align="center">31.6</td>
<td align="left">Spam reviews</td>
</tr>
<tr>
<td align="left"/>
<td align="char" char=".">3</td>
<td align="center">231,240</td>
<td align="center">771</td>
<td align="center">20.3</td>
<td align="left">Spam reviews</td>
</tr>
<tr>
<td align="left">KoWiki</td>
<td align="char" char=".">1</td>
<td align="center">8</td>
<td align="center">546</td>
<td align="center">273.0</td>
<td align="left">Edit war</td>
</tr>
<tr>
<td align="left"/>
<td align="char" char=".">2</td>
<td align="center">80</td>
<td align="center">1,011</td>
<td align="center">233.3</td>
<td align="left">Edit war</td>
</tr>
<tr>
<td align="left"/>
<td align="char" char=".">3</td>
<td align="center">270</td>
<td align="center">1,126</td>
<td align="center">168.9</td>
<td align="left">Edit war</td>
</tr>
<tr>
<td align="left">EnWiki</td>
<td align="char" char=".">1</td>
<td align="center">9.98&#xa0;M</td>
<td align="center">1.71&#xa0;M</td>
<td align="center">7,931</td>
<td align="left">Bot activities</td>
</tr>
<tr>
<td align="left"/>
<td align="char" char=".">2</td>
<td align="center">541&#xa0;K</td>
<td align="center">343&#xa0;K</td>
<td align="center">4,211</td>
<td align="left">Bot activities</td>
</tr>
<tr>
<td align="left"/>
<td align="char" char=".">3</td>
<td align="center">23.5&#xa0;M</td>
<td align="center">973&#xa0;K</td>
<td align="center">3,395</td>
<td align="left">Bot activities</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T7" position="float">
<label>TABLE 7</label>
<caption>
<p>D-Cube successfully spots bot activities in the EnWiki dataset.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Subtensor &#x23;</th>
<th align="center">Users in each subtensor (100% bots)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">1</td>
<td align="left">WP 1.0 bot</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">AAlertBot</td>
</tr>
<tr>
<td align="left">3</td>
<td align="left">AlexNewArtBot, VeblenBot, InceptionBot</td>
</tr>
<tr>
<td align="left">4</td>
<td align="left">WP 1.0 bot</td>
</tr>
<tr>
<td align="left">5</td>
<td align="left">Cydebot, VeblenBot</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s4-6">
<title>4.6 Q5. Effects of Parameter <italic>&#x3b8;</italic> on Speed and Accuracy in Dense-Subtensor Detection</title>
<p>We investigate the effects of the mass-threshold parameter <italic>&#x3b8;</italic> on the speed and accuracy of <sc>D-Cube</sc> in dense-subtensor detection. We used the serial version of <sc>D-Cube</sc> with a memory budget of 16GB, and we measured the relative density of the detected subtensors and the running time, as in <xref ref-type="sec" rid="s4-3">Section 4.3</xref>. <xref ref-type="fig" rid="F7">Figure&#x20;7</xref> shows the results averaged over all considered datasets. Different <italic>&#x3b8;</italic> values provided a trade-off between speed and accuracy in dense-subtensor detection. Specifically, increasing <italic>&#x3b8;</italic> tended to make <sc>D-Cube</sc> faster but also to make it detect sparser subtensors. This tendency is consistent with our theoretical analyses (Theorems 1&#x2013;3 in <xref ref-type="sec" rid="s3-2">Section 3.2</xref>). The sensitivity of the dense-subtensor detection accuracy to <italic>&#x3b8;</italic> depended on the density measure used. Specifically, the sensitivity was lower with <inline-formula id="inf236">
<mml:math id="minf236">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> than with the other density measures.</p>
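The role of &#x3b8; described above — a larger threshold deletes more low-mass attribute values per pass, making the search faster but coarser — can be illustrated with the following toy, in-memory, two-way sketch of &#x3b8;-based bulk deletion. This is only an illustration of the trade-off, not the actual D-Cube algorithm, which operates on disk-resident N-way tensors; the density used here is the two-way arithmetic average mass.

```python
def dense_block_theta(cells, theta=1.0):
    """Repeatedly delete every remaining row/column whose mass is at
    most theta times the average attribute-value mass, and return the
    intermediate state with the highest arithmetic-average density."""
    assert theta >= 1.0  # theta >= 1 guarantees progress each pass
    cells = dict(cells)  # {(row, col): value}
    best, best_density = dict(cells), float("-inf")
    while cells:
        rows, cols = {}, {}
        for (r, c), v in cells.items():
            rows[r] = rows.get(r, 0) + v
            cols[c] = cols.get(c, 0) + v
        mass = sum(cells.values())
        density = mass / ((len(rows) + len(cols)) / 2)  # 2-way rho_ari
        if density > best_density:
            best, best_density = dict(cells), density
        # average mass over all remaining attribute values (rows and cols)
        avg = 2.0 * mass / (len(rows) + len(cols))
        dead_r = {r for r, m in rows.items() if m <= theta * avg}
        dead_c = {c for c, m in cols.items() if m <= theta * avg}
        cells = {(r, c): v for (r, c), v in cells.items()
                 if r not in dead_r and c not in dead_c}
    return best, best_density

cells = {(0, 0): 10, (0, 1): 10, (1, 0): 10, (1, 1): 10, (2, 2): 1}
block, density = dense_block_theta(cells, theta=1.0)
# the light entry (2, 2) is deleted first; the surviving 2x2 block
# has density 40 / 2 == 20.0
```

A larger &#x3b8; would delete more rows and columns in each pass, reducing the number of passes (and hence disk scans in the real algorithm) at the risk of discarding part of a dense block.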
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>The mass-threshold parameter &#x3b8; gives a trade-off between the speed and accuracy of <sc>D-Cube</sc> in dense-subtensor detection. We report the running time and the density of detected subtensors, averaged over all considered real-world datasets. As &#x3b8; increases, <sc>D-Cube</sc> tends to be faster, detecting sparser subtensors.</p>
</caption>
<graphic xlink:href="fdata-03-594302-g007.tif"/>
</fig>
</sec>
<sec id="s4-7">
<title>4.7 Q6. Effects of Parameter &#x3b1; in <inline-formula id="inf237">
<mml:math id="minf237">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> on Subtensors Detected by <sc>D-Cube</sc>
</title>
<p>We show that the dense subtensors detected by <sc>D-Cube</sc> can be configured through the parameter &#x3b1; in the density measure <inline-formula id="inf238">
<mml:math id="minf238">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. <xref ref-type="fig" rid="F8">Figure&#x20;8</xref> shows the volumes and masses of the subtensors detected in the YouTube and Yelp datasets by <sc>D-Cube</sc> when <inline-formula id="inf239">
<mml:math id="minf239">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> with different &#x3b1; values was used as the density measure. With large <italic>&#x3b1;</italic> values, <sc>D-Cube</sc> tended to spot relatively small but compact subtensors; with small &#x3b1; values, in contrast, it tended to spot relatively large but sparse subtensors. Similar tendencies were observed in the other datasets.</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Subtensors detected by <sc>D-Cube</sc> are configurable by the parameter &#x3b1; in density metric <inline-formula id="inf240">
<mml:math id="minf240">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. As &#x3b1; increases, <sc>D-Cube</sc> spots smaller but more compact subtensors.</p>
</caption>
<graphic xlink:href="fdata-03-594302-g008.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="conclusion" id="s5">
<title>5 Conclusion</title>
<p>In this work, we propose <sc>D-Cube</sc>, a disk-based dense-subtensor detection method, to deal with disk-resident tensors too large to fit in main memory. <sc>D-Cube</sc> is optimized to minimize disk I/Os while providing a guarantee on the quality of the subtensors it finds. Moreover, we propose a distributed version of <sc>D-Cube</sc> running on <sc>MapReduce</sc> for terabyte-scale or larger data distributed across multiple machines. In summary, <sc>D-Cube</sc> achieves the following advantages over its state-of-the-art competitors:<list list-type="simple">
<list-item>
<p>
<italic>Memory Efficient</italic>: <sc>D-Cube</sc> handles 1,000&#xd7; larger data (2.6TB) by reducing memory usage up to 1,561&#xd7; compared to in-memory algorithms (<xref ref-type="sec" rid="s4-2">Section&#x20;4.2</xref>).</p>
</list-item>
<list-item>
<p>
<italic>Fast</italic>: Even when data fit in memory, <sc>D-Cube</sc> is up to 7&#xd7; faster than its competitors (<xref ref-type="sec" rid="s4-3">Section&#x20;4.3</xref>) with near-linear scalability (<xref ref-type="sec" rid="s4-4">Section&#x20;4.4</xref>).</p>
</list-item>
<list-item>
<p>
<italic>Provably Accurate</italic>: <sc>D-Cube</sc> is among the methods that guarantee the best approximation ratio (Theorem 3) in dense-subtensor detection, and it spots the densest subtensors in practice (<xref ref-type="sec" rid="s4-3">Section&#x20;4.3</xref>).</p>
</list-item>
<list-item>
<p>
<italic>Effective</italic>: <sc>D-Cube</sc> was most accurate in two applications: detecting network attacks from TCP dumps and lockstep behavior in rating data (<xref ref-type="sec" rid="s4-5">Section&#x20;4.5</xref>).</p>
</list-item>
</list>
<italic>Reproducibility</italic>: The code and data used in the paper are available at <ext-link ext-link-type="uri" xlink:href="http://dmlab.kaist.ac.kr/dcube">http://dmlab.kaist.ac.kr/dcube</ext-link>.
</p>
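The greedy slice-peeling idea underlying this family of dense-subtensor detectors can be illustrated with a minimal in-memory sketch. This is a simplified 2-way (matrix) version using an arithmetic-average density, mass divided by the average mode size; it is not the authors' implementation (<sc>D-Cube</sc> itself is disk-based, processes slices in blocks, and supports N-way tensors and several density measures):

```python
from collections import defaultdict

def greedy_densest(entries):
    """Greedy slice-peeling sketch for a 2-way tensor (matrix).

    `entries` maps (row, col) -> nonnegative mass. Repeatedly remove the
    attribute value (row or col) with the smallest remaining mass, and
    return the row/col sets maximizing the arithmetic-average density
    rho = mass / ((|rows| + |cols|) / 2). In-memory illustration only.
    """
    rows = {r for r, _ in entries}
    cols = {c for _, c in entries}
    live = dict(entries)

    def density():
        total = sum(live.values())
        n = (len(rows) + len(cols)) / 2
        return total / n if n else 0.0

    best, best_rho = (set(rows), set(cols)), density()
    while rows and cols:
        # Mass of each remaining row/col slice.
        rmass, cmass = defaultdict(float), defaultdict(float)
        for (r, c), v in live.items():
            rmass[r] += v
            cmass[c] += v
        # Peel the minimum-mass slice across both modes.
        r_min = min(rows, key=lambda r: rmass.get(r, 0.0))
        c_min = min(cols, key=lambda c: cmass.get(c, 0.0))
        if rmass.get(r_min, 0.0) <= cmass.get(c_min, 0.0):
            rows.discard(r_min)
            live = {k: v for k, v in live.items() if k[0] != r_min}
        else:
            cols.discard(c_min)
            live = {k: v for k, v in live.items() if k[1] != c_min}
        rho = density()
        if rho > best_rho:
            best, best_rho = (set(rows), set(cols)), rho
    return best, best_rho

# Toy example: a dense 2x2 block plus one stray entry.
entries = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 1.0, (2, 5): 1.0}
(best_rows, best_cols), best_rho = greedy_densest(entries)
# best_rows == {0, 1}, best_cols == {0, 1}, best_rho == 2.0
```

The peeling order is what approximation guarantees of this style hinge on; making it work with bounded memory and few disk I/Os is the contribution of the disk-based design summarized above.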
</sec>
</body>
<back>
<sec id="s6">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. This data can be found here: <ext-link ext-link-type="uri" xlink:href="http://dmlab.kaist.ac.kr/dcube">http://dmlab.kaist.ac.kr/dcube</ext-link>.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>KS, BH, and CF contributed to conception and design of the study. KS performed the experiments. JK performed the mathematical analysis. KS wrote the first draft of the manuscript. KS and BH wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.</p>
</sec>
<sec id="s8">
<title>Funding</title>
<p>This research was supported by National Research Foundation of Korea (NRF) Grant funded by the Korea government (MSIT) (No. NRF-2020R1C1C1008296) and Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00075, Artificial Intelligence Graduate School Program (KAIST)). This research was also supported by the National Science Foundation under Grant Nos. CNS-1314632 and IIS-1408924. This research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-09-2-0053. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, or other funding parties. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here&#x20;on.</p>
</sec>
<sec sec-type="COI-statement" id="s9">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<ack>
<p>The content of the manuscript has been presented in part at the 10th ACM International Conference on Web Search and Data Mining (<xref ref-type="bibr" rid="B37">Shin et&#x20;al., 2017b</xref>). In this extended version, we refined <sc>D-Cube</sc> with a new parameter <italic>&#x3b8;</italic>, and we proved that the time complexity of <sc>D-Cube</sc> is significantly improved with the refinement (Lemma 1 and Theorem 1). We also proved that, for <italic>N</italic>-way tensors, <sc>D-Cube</sc> gives a <italic>&#x3b8;N</italic>-approximation guarantee for Problem 1 (Theorem 3). Additionally, we considered an extra density measure (Definition 3) and an extra competitor (i.e.,&#x20;<sc>M-Biz</sc>); and we applied <sc>D-Cube</sc> to three more real-world datasets (i.e.,&#x20;KoWiki, EnWiki, and SWM) and successfully detected edit wars, bot activities, and spam reviews (<xref ref-type="table" rid="T5">Tables 5</xref>&#x2013;<xref ref-type="table" rid="T7">7</xref>). Lastly, we conducted experiments showing the effects of parameters <italic>&#x3b8;</italic> and <italic>&#x3b1;</italic> on the speed and accuracy of <sc>D-Cube</sc> in dense-subtensor detection (<xref ref-type="fig" rid="F7">Figures 7</xref> and <xref ref-type="fig" rid="F8">8</xref>). Most of this work was also included in the PhD thesis of the first author&#x20;(KS).</p>
</ack>
<sec id="s10">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.frontiersin.org/articles/10.3389/fdata.2020.594302/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fdata.2020.594302/full#supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet1.PDF" id="SM1" mimetype="application/PDF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<fn-group>
<fn id="FN1">
<label>1</label>
<p>M-Zoom repeats retrieving all tuples with a given attribute value, and thus it requires storing and accessing tuples in hash tables for quick retrievals.</p>
</fn>
<fn id="FN2">
<label>2</label>
<p>We assume that M-Biz uses the outputs of M-Zoom as its initial states, as suggested in <xref ref-type="bibr" rid="B36">Shin et&#x20;al. (2018)</xref>.</p>
</fn>
<fn id="FN3">
<label>3</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://github.com/kijungs/mzoom">https://github.com/kijungs/mzoom</ext-link>
</p>
</fn>
<fn id="FN4">
<label>4</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://github.com/mjiang89/CrossSpot">https://github.com/mjiang89/CrossSpot</ext-link>
</p>
</fn>
<fn id="FN5">
<label>5</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://www.sandia.gov/~tgkolda/TensorToolbox/">https://www.sandia.gov/~tgkolda/TensorToolbox/</ext-link>
</p>
</fn>
<fn id="FN6">
<label>6</label>
<p>In each dataset, we measured the relative running time of each method (compared to the running time of D-Cube with the maximum density policy) and the relative density of detected dense subtensors (compared to the density of subtensors detected by D-Cube with the maximum density policy). Then, we averaged them over all considered datasets.</p>
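A hypothetical sketch of this normalization (all numbers and method names invented): per-dataset metrics are divided by the D-Cube maximum-density-policy baseline, and the resulting ratios are averaged across datasets.

```python
def relative_averages(results, baseline="D-Cube (max density)"):
    """results: {dataset: {method: (time_sec, density)}}.

    Returns, per method, the mean over datasets of (time / baseline time,
    density / baseline density). Illustration of the averaging described
    in the footnote, not the authors' measurement code.
    """
    methods = {m for per_ds in results.values() for m in per_ds}
    out = {}
    for m in methods:
        rel_t, rel_d = [], []
        for per_ds in results.values():
            bt, bd = per_ds[baseline]
            t, d = per_ds[m]
            rel_t.append(t / bt)
            rel_d.append(d / bd)
        out[m] = (sum(rel_t) / len(rel_t), sum(rel_d) / len(rel_d))
    return out

results = {  # invented numbers for two datasets
    "A": {"D-Cube (max density)": (10.0, 5.0), "Other": (20.0, 4.0)},
    "B": {"D-Cube (max density)": (8.0, 2.0), "Other": (24.0, 1.0)},
}
avg = relative_averages(results)
# The baseline is (1.0, 1.0) by construction in both metrics.
```

Averaging ratios rather than raw values keeps datasets of very different scales from dominating the summary.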
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Akoglu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Chandy</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Opinion fraud detection in online reviews by network effects</article-title>. <comment>ICWSM</comment>. </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Akoglu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>McGlohon</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Oddball: spotting anomalies in weighted graphs</article-title>. <comment>PAKDD</comment>. </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Akoglu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Tong</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Koutra</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Graph based anomaly detection and description: a survey</article-title>. <source>Data Mining Knowl. Discov.</source> <volume>29</volume>, <fpage>626</fpage>&#x2013;<lpage>688</lpage>. <pub-id pub-id-type="doi">10.1007/s10618-014-0365-y</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Andersen</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Chellapilla</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Finding dense subgraphs with size bounds</article-title>. <comment>WAW</comment>. </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bahmani</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Goel</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Munagala</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Efficient primal-dual graph algorithms for mapreduce</article-title>. <comment>WAW</comment>. </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bahmani</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Vassilvitskii</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Densest subgraph in streaming and mapreduce</article-title>. <source>PVLDB</source> <volume>5</volume>, <fpage>454</fpage>&#x2013;<lpage>465</lpage>. <pub-id pub-id-type="doi">10.14778/2140436.2140442</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Balalau</surname>
<given-names>O. D.</given-names>
</name>
<name>
<surname>Bonchi</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Gullo</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Sozio</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Finding subgraphs with maximum total density and limited overlap</article-title>. <comment>WSDM</comment>. </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bennett</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Lanning</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>The netflix prize</article-title>. <comment>KDD Cup</comment>. </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beutel</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Guruswami</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Palow</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Copycatch: stopping group attacks by spotting lockstep behavior in social networks</article-title>. <comment>WWW</comment>. </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Charikar</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2000</year>). <article-title>Greedy approximation algorithms for finding dense components in a graph</article-title>. <comment>APPROX</comment>. </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dean</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ghemawat</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Mapreduce: simplified data processing on large clusters</article-title>. <source>Commun. ACM</source> <volume>51</volume>, <fpage>107</fpage>&#x2013;<lpage>113</lpage>. <pub-id pub-id-type="doi">10.1145/1327452.1327492</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dror</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Koenigstein</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Koren</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Weimer</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>The yahoo! music dataset and kdd-cup&#x2019;11</article-title>. <comment>KDD Cup</comment>. </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Epasto</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lattanzi</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sozio</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Efficient densest subgraph computation in evolving graphs</article-title>. <comment>WWW</comment>. </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Galbrun</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Gionis</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Tatti</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Top-k overlapping densest subgraphs</article-title>. <source>Data Mining Knowl. Discov.</source> <volume>30</volume>, <fpage>1134</fpage>&#x2013;<lpage>1165</lpage>. <pub-id pub-id-type="doi">10.1007/s10618-016-0464-z</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goldberg</surname>
<given-names>A. V.</given-names>
</name>
</person-group> (<year>1984</year>). <article-title>Finding a maximum density subgraph</article-title>. <comment>Technical Report</comment>. </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hooi</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Shin</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>H. A.</given-names>
</name>
<name>
<surname>Beutel</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Shah</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Graph-based fraud detection in the face of camouflage</article-title>. <source>ACM Trans. Knowl. Discov. Data</source> <volume>11</volume>, <fpage>44</fpage>. <pub-id pub-id-type="doi">10.1145/3056563</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jeon</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Papalexakis</surname>
<given-names>E. E.</given-names>
</name>
<name>
<surname>Kang</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Haten2: billion-scale tensor decompositions</article-title>. <comment>ICDE</comment>, <fpage>1047</fpage>&#x2013;<lpage>1058</lpage>. </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Beutel</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Hooi</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>A general suspiciousness metric for dense blocks in multimodal data</article-title>. <comment>ICDM</comment>. </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Beutel</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Catchsync: catching synchronized behavior in large directed graphs</article-title>. <comment>KDD</comment>. </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kang</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Papalexakis</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Harpale</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Gigatensor: scaling tensor analysis up by 100&#x20;times-algorithms and discoveries</article-title>. <comment>KDD</comment>. </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kannan</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Vinay</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>1999</year>). <article-title>Analyzing the structure of large graphs</article-title>. <comment>Technical Report</comment>. </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khuller</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Saha</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>On finding dense subgraphs</article-title>. <comment>ICALP</comment>, <fpage>597</fpage>&#x2013;<lpage>608</lpage>. </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kolda</surname>
<given-names>T. G.</given-names>
</name>
<name>
<surname>Bader</surname>
<given-names>B. W.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Tensor decompositions and applications</article-title>. <source>SIAM Rev.</source> <volume>51</volume>, <fpage>455</fpage>&#x2013;<lpage>500</lpage>. <pub-id pub-id-type="doi">10.1137/07070111X</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Lee</surname>
<given-names>V. E.</given-names>
</name>
<name>
<surname>Ruan</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Aggarwal</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>A survey of algorithms for dense subgraph discovery</article-title>. <publisher-name>Managing and Mining Graph Data</publisher-name>, <fpage>303</fpage>&#x2013;<lpage>336</lpage>. </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lippmann</surname>
<given-names>R. P.</given-names>
</name>
<name>
<surname>Fried</surname>
<given-names>D. J.</given-names>
</name>
<name>
<surname>Graf</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Haines</surname>
<given-names>J.&#x20;W.</given-names>
</name>
<name>
<surname>Kendall</surname>
<given-names>K. R.</given-names>
</name>
<name>
<surname>McClung</surname>
<given-names>D.</given-names>
</name>
<etal/>
</person-group> (<year>2000</year>). <article-title>Evaluating intrusion detection systems: the 1998 darpa off-line intrusion detection evaluation</article-title>. <comment>DISCEX</comment>. </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maruhashi</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Multiaspectforensics: pattern mining on large-scale heterogeneous networks with tensor analysis</article-title>. <comment>ASONAM</comment>. </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>McAuley</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Pandey</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Leskovec</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Inferring networks of substitutable and complementary products</article-title>. <comment>KDD</comment>. </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mislove</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Marcon</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gummadi</surname>
<given-names>K. P.</given-names>
</name>
<name>
<surname>Druschel</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Bhattacharjee</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Measurement and analysis of online social networks</article-title>. <comment>IMC</comment>. </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Oh</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Shin</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Papalexakis</surname>
<given-names>E. E.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>S-hot: scalable high-order tucker decomposition</article-title>. <comment>WSDM</comment>. </citation>
</ref>
<ref id="B30">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Papalexakis</surname>
<given-names>E. E.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Sidiropoulos</surname>
<given-names>N. D.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Parcube: sparse parallelizable tensor decompositions</article-title>. <publisher-name>PKDD</publisher-name>. </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rossi</surname>
<given-names>R. A.</given-names>
</name>
<name>
<surname>Gallagher</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Neville</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Henderson</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Modeling dynamic behavior in large evolving graphs</article-title>. <comment>WSDM</comment>. </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ruhl</surname>
<given-names>J.&#x20;M.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>Efficient algorithms for new computational models</article-title>. <comment>Ph.D. thesis, Massachusetts Institute of Technology</comment>. </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saha</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Hoch</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Khuller</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Raschid</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X. N.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Dense subgraphs with restrictions and applications to gene annotation graphs</article-title>. <comment>RECOMB</comment>. </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shah</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Beutel</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Gallagher</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Spotting suspicious link behavior with fbox: an adversarial perspective</article-title>. <comment>ICDM</comment>. </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shin</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Eliassi-Rad</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Corescope: graph&#x20;mining&#x20;using k-core analysis&#x2014;patterns, anomalies and algorithms</article-title>. <comment>ICDM</comment>. </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shin</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Hooi</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Fast, accurate, and flexible algorithms for dense subtensor mining</article-title>. <source>ACM Trans. Knowledge Discov. Data</source> <volume>12</volume>, <fpage>28</fpage>. <pub-id pub-id-type="doi">10.1145/3154414</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shin</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Hooi</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2017b</year>). <article-title>D-cube: dense-block detection in terabyte-scale tensors</article-title>. <comment>WSDM</comment>. </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shin</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Hooi</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2017a</year>). <article-title>Densealert: incremental dense-subtensor detection in tensor streams</article-title>. <comment>KDD</comment>. </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shin</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Kang</surname>
<given-names>U.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Distributed methods for high-dimensional and large-scale tensor factorization</article-title>. <comment>ICDM</comment>. </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tsourakakis</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bonchi</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Gionis</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Gullo</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Tsiarli</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees</article-title>. <comment>KDD</comment>. </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tung</surname>
<given-names>H. Y.</given-names>
</name>
<name>
<surname>Smola</surname>
<given-names>A. J.</given-names>
</name>
<name>
<surname>Anandkumar</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Fast and guaranteed tensor decomposition via sketching</article-title>. <comment>NIPS</comment>. </citation>
</ref>
</ref-list>
</back>
</article>