<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fdata.2023.899345</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>CURTAINs for your sliding window: Constructing unobserved regions by transforming adjacent intervals</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Raine</surname> <given-names>John Andrew</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1583126/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Klein</surname> <given-names>Samuel</given-names></name>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Sengupta</surname> <given-names>Debajyoti</given-names></name>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1899607/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Golling</surname> <given-names>Tobias</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/1132780/overview"/>
</contrib>
</contrib-group>
<aff><institution>D&#x000E9;partement de Physique Nucl&#x000E9;aire et Corpusculaire, Universit&#x000E9; de Gen&#x000E8;ve</institution>, <addr-line>Geneva</addr-line>, <country>Switzerland</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Thea Aarrestad, European Organization for Nuclear Research (CERN), Switzerland</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: David Shih, Rutgers University, United States; Raffaele D&#x00027;Agnolo, Commissariat &#x000E0; l&#x00027;Energie Atomique et aux Energies Alternatives (CEA), France</p></fn>
<corresp id="c001">&#x0002A;Correspondence: John Andrew Raine <email>john.raine&#x00040;unige.ch</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Big Data and AI in High Energy Physics, a section of the journal Frontiers in Big Data</p></fn>
<fn fn-type="equal" id="fn002"><p>&#x02020;These authors have contributed equally to this work</p></fn></author-notes>
<pub-date pub-type="epub">
<day>21</day>
<month>03</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>6</volume>
<elocation-id>899345</elocation-id>
<history>
<date date-type="received">
<day>18</day>
<month>03</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>28</day>
<month>02</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2023 Raine, Klein, Sengupta and Golling.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Raine, Klein, Sengupta and Golling</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>We propose a new model independent technique for constructing background data templates for use in searches for new physics processes at the LHC. This method, called Curtains, uses invertible neural networks to parameterise the distribution of side band data as a function of the resonant observable. The network learns a transformation to map any data point from its value of the resonant observable to another chosen value. Using Curtains, a template for the background data in the signal window is constructed by mapping the data from the side-bands into the signal region. We perform anomaly detection using the Curtains background template to enhance the sensitivity to new physics in a bump hunt. We demonstrate its performance in a sliding window search across a wide range of mass values. Using the LHC Olympics dataset, we demonstrate that Curtains matches the performance of other leading approaches which aim to improve the sensitivity of bump hunts, can be trained on a much smaller range of the invariant mass, and is fully data driven.</p></abstract>
<kwd-group>
<kwd>machine learning</kwd>
<kwd>anomaly detection</kwd>
<kwd>invertible neural network</kwd>
<kwd>particle physics</kwd>
<kwd>bump hunting</kwd>
<kwd>new physics</kwd>
<kwd>model independent</kwd>
<kwd>unsupervised learning</kwd>
</kwd-group>
<contract-num rid="cn001">200020_18198</contract-num>
<contract-num rid="cn001">CRSII5_193716</contract-num>
<contract-sponsor id="cn001">Schweizerischer Nationalfonds zur F&#x000F6;rderung der Wissenschaftlichen Forschung<named-content content-type="fundref-id">10.13039/501100001711</named-content></contract-sponsor>
<counts>
<fig-count count="11"/>
<table-count count="1"/>
<equation-count count="4"/>
<ref-count count="54"/>
<page-count count="14"/>
<word-count count="9583"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>In the ongoing search for new physics phenomena to explain the fundamental nature of the universe, particle colliders such as the Large Hadron Collider (LHC) provide an unparalleled window into the energy and intensity frontiers in particle physics. Searches for new particles not contained within the Standard Model of particle physics (SM) are a core focus of the physics programme, in the hope of explaining observations in the universe which are inconsistent with predictions from the SM, such as dark matter, gravity, and the observed matter&#x02013;antimatter asymmetry.</p>
<p>Many searches at the LHC target specific models built upon theories which contain new particles with particular attributes. However, each of these searches is only sensitive to the specific model it targets. Due to the vast space of models which could extend the SM, it is unfeasible to perform dedicated searches for all of them.</p>
<p>One of the cornerstones in the model independent hunt for new physics phenomena at the LHC is the bump hunt, a search for a localised excess on top of a smooth background. The most sensitive observable for the bump hunt is an invariant mass spectrum, which corresponds to the mass of the particle produced at resonance in particle collisions or decays. The invariant mass spectrum comprises non-resonant events, which produce a falling background across all mass values, with particles appearing as bumps on top of this background. The width of a bump is driven by the decay width of the particle and the detector&#x00027;s resolution. At the ATLAS and CMS Collaborations (ATLAS Collaboration, <xref ref-type="bibr" rid="B5">2008</xref>; CMS Collaboration, <xref ref-type="bibr" rid="B17">2008</xref>) bump hunt techniques are employed to search for new fundamental particles, and were crucial in the observation of the Higgs boson (ATLAS Collaboration, <xref ref-type="bibr" rid="B6">2012</xref>; CMS Collaboration, <xref ref-type="bibr" rid="B18">2012</xref>). At the LHCb experiment (LHCb Collaboration, <xref ref-type="bibr" rid="B41">2008</xref>), these techniques have also been successfully employed to observe new resonances in composite particles (LHCb Collaboration, <xref ref-type="bibr" rid="B42">2020</xref>, <xref ref-type="bibr" rid="B43">2021</xref>, <xref ref-type="bibr" rid="B44">2022</xref>).</p>
<p>In a bump hunt, the assumption is made that any resonant signal will be localised. With this assumption, a sliding window fit can be performed using a signal region with a side-band region on either side. As the signal is assumed to be localised, the expected background contribution in the signal region can be extrapolated from the two side-bands. The data in the signal region can be compared to the extrapolated background to test for a significant excess. This test is performed across the whole spectrum by sliding the window. In a standard bump hunt, only the resonant observable is used in the sliding window fit to extrapolate the background and test for localised excesses.</p>
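The extrapolate-and-compare step can be sketched numerically. The toy below is illustrative only, not the paper's procedure: the geometric-mean interpolation stands in for the parametric side-band fit used in practice, and `bump_hunt_window` is a hypothetical helper.

```python
import numpy as np

def bump_hunt_window(masses, sb1, sr, sb2):
    """Toy sliding-window test: estimate the expected signal-region (SR)
    background from the side-band (SB) densities and compare it to the
    observed SR count with a naive Poisson significance."""
    count = lambda lo, hi: int(np.sum((masses >= lo) & (masses < hi)))
    n_sb1, n_sr, n_sb2 = count(*sb1), count(*sr), count(*sb2)
    # per-GeV event densities in the two side-bands
    rho1 = n_sb1 / (sb1[1] - sb1[0])
    rho2 = n_sb2 / (sb2[1] - sb2[0])
    # geometric mean of the densities interpolates exactly for an
    # exponentially falling spectrum (a toy assumption standing in
    # for a fitted background shape)
    n_bkg = np.sqrt(rho1 * rho2) * (sr[1] - sr[0])
    z = (n_sr - n_bkg) / np.sqrt(n_bkg)
    return n_sr, n_bkg, z
```

In a real analysis the expected background comes from a parametric fit over the full side-band spectrum rather than from two bin counts, but the logic of the test is the same.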
<p>However, given the enormous amounts of data collected by the ATLAS and CMS experiments and the lack of evidence for new particles (ATLAS Collaboration, <xref ref-type="bibr" rid="B9">2021a</xref>,<xref ref-type="bibr" rid="B10">b</xref>,<xref ref-type="bibr" rid="B11">c</xref>; CMS Collaboration, <xref ref-type="bibr" rid="B20">2022a</xref>,<xref ref-type="bibr" rid="B21">b</xref>,<xref ref-type="bibr" rid="B22">c</xref>), the prospect of observing a bump in a single spectrum simply by collecting more data is growing ever more unlikely. Therefore, attention has turned to using advanced machine learning techniques to improve the sensitivity of searches for new physics, and in particular to improving the reach of the bump hunt approach. Such approaches typically utilise additional discriminatory variables to separate signal from background.</p>
<p>If an accurate background template over discriminatory features can be constructed for the signal region, then the classification without labels method (<sc>CWoLa</sc>) (Metodiev et al., <xref ref-type="bibr" rid="B45">2017</xref>) can be used to extend the bump hunt. As shown in Collins et al. (<xref ref-type="bibr" rid="B23">2019</xref>), the data in the side-bands can be used to construct the template for training the classifier if the discriminatory features are uncorrelated with the resonant variable.</p>
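A CWoLa-style classifier can be illustrated with a minimal numpy-only sketch, using logistic regression as a stand-in for the neural network classifiers used in practice; the function name and training details are assumptions, not the method as published.

```python
import numpy as np

def train_cwola(x_sr, x_template, epochs=300, lr=0.5):
    """Minimal CWoLa-style classifier (toy): logistic regression trained to
    separate signal-region data (label 1) from the background template
    (label 0).  If the template is accurate, any learnable difference
    between the two samples indicates signal contamination in the SR."""
    x = np.vstack([x_sr, x_template]).astype(float)
    y = np.concatenate([np.ones(len(x_sr)), np.zeros(len(x_template))])
    mu, sd = x.mean(0), x.std(0)
    xs = (x - mu) / sd                       # standardise features
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(xs @ w + b)))
        grad = p - y                         # d(log-loss)/d(logit)
        w -= lr * xs.T @ grad / len(y)
        b -= lr * grad.mean()
    # return a scoring function over raw (unstandardised) features
    return lambda q: 1.0 / (1.0 + np.exp(-(((q - mu) / sd) @ w + b)))
```

Cutting on the classifier score then enhances any localised excess before the final bump hunt fit.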
<p>In this paper we introduce a new method, Constructing Unobserved Regions by Transforming Adjacent Intervals (C<sc>urtain</sc>s). By combining invertible neural networks (INNs) with an optimal transport loss (Rubner et al., <xref ref-type="bibr" rid="B50">2000</xref>; Villani, <xref ref-type="bibr" rid="B54">2009</xref>; Cuturi, <xref ref-type="bibr" rid="B24">2013</xref>), we learn the optimal transport function between the two side-bands, and use this trained network (henceforth referred to as the &#x0201C;transformer&#x0201D;) to construct a background template by transforming the data from each side-band into the signal region.</p>
<p>C<sc>urtain</sc>s is able to construct a background template for any set of observables, and thus classifiers can be trained on observables which are strongly correlated with the resonant feature. Such variables provide additional information and are often among the best discriminators between signal and background, thereby increasing the sensitivity of the search. Furthermore, C<sc>urtain</sc>s is a fully data driven approach, requiring no simulated data.</p>
<p>In this paper, we apply C<sc>urtain</sc>s to a search for new physics processes in dijet events produced at the LHC and recorded by a general purpose detector, similar to the ATLAS or CMS experiments. We demonstrate the performance of this method using the R&#x00026;D dataset provided from the LHC Olympics (LHCO) (Kasieczka et al., <xref ref-type="bibr" rid="B37">2019</xref>), a community challenge for applying anomaly detection and other machine learning approaches to the search for new physics (Kasieczka et al., <xref ref-type="bibr" rid="B36">2021</xref>).</p>
<p>We demonstrate that C<sc>urtain</sc>s can accurately learn the conditional transformation of background data given the original and target invariant mass of the events. Classifiers trained using the background template provided by C<sc>urtain</sc>s outperform leading approaches, and the improved sensitivity to signal processes matches or improves upon the performance in an idealised anomaly detection scenario.</p>
<p>Finally, to demonstrate its applicability to a bump hunt and to observing potential new signals, we apply the C<sc>urtain</sc>s method in a sliding window approach for various levels of injected signal and show that excesses above the expected background can be observed, with no biases or spurious excesses arising in the absence of a signal process.</p>
</sec>
<sec id="s2">
<title>2. The dataset</title>
<p>The LHCO R&#x00026;D dataset comprises two sets of labelled data. Background data from the Standard Model is produced through QCD dijet production, and signal events from the decay of a new particle to two lighter new particles, which each decay to two quarks <inline-formula><mml:math id="M1"><mml:msup><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x02192;</mml:mo><mml:mi>X</mml:mi><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mo>&#x02192;</mml:mo><mml:mi>q</mml:mi><mml:mover accent="true"><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:mi>Y</mml:mi><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mo>&#x02192;</mml:mo><mml:mi>q</mml:mi><mml:mover accent="true"><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:math></inline-formula>, where the three new particles have mass <inline-formula><mml:math id="M2"><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>3</mml:mn><mml:mo>.</mml:mo><mml:mn>5</mml:mn></mml:math></inline-formula> TeV, <italic>m</italic><sub><italic>X</italic></sub> &#x0003D; 500 GeV, and <italic>m</italic><sub><italic>Y</italic></sub> &#x0003D; 100 GeV. Both samples are generated with <monospace>Pythia</monospace><monospace> 8.219</monospace> (Sj&#x000F6;strand et al., <xref ref-type="bibr" rid="B52">2008</xref>) and interfaced to <monospace>Delphes 3.4.1</monospace> (de Favereau et al., <xref ref-type="bibr" rid="B27">2014</xref>) for the detector simulation. 
The reconstructed particles are clustered into jets with the anti-<italic>k</italic><sub><italic>t</italic></sub> algorithm (Cacciari et al., <xref ref-type="bibr" rid="B14">2008</xref>), as implemented in the <monospace>FastJet</monospace> package (Cacciari et al., <xref ref-type="bibr" rid="B15">2012</xref>), with a radius parameter <italic>R</italic> &#x0003D; 1.0. Each event is required to have two jets, with at least one jet passing a cut on its transverse momentum <inline-formula><mml:math id="M3"><mml:msubsup><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mtext>T</mml:mtext></mml:mrow><mml:mrow><mml:mi>J</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x0003E;</mml:mo><mml:mn>1</mml:mn><mml:mo>.</mml:mo><mml:mn>2</mml:mn></mml:math></inline-formula> TeV to simulate a jet trigger in the detector.</p>
<p>In total 1 million QCD dijet events and 100,000 signal events are generated. C<sc>urtain</sc>s uses all the QCD dijet events as the standard background sample, and in addition doped samples are constructed using all the QCD events together with a small number of the 100,000 available <italic>W</italic>&#x02032; signal events. The standard benchmark datasets used to assess the performance of C<sc>urtain</sc>s comprise the full background dataset with 0, 500, 667, 1,000, or 8,000 injected signal events.</p>
<p>All event observables are constructed from the two highest <italic>p</italic><sub>T</sub> jets, with the two jets ordered by their invariant mass, such that <italic>J</italic><sub>1</sub> has <italic>m</italic><sub><italic>J</italic><sub>1</sub></sub> &#x0003E; <italic>m</italic><sub><italic>J</italic><sub>2</sub></sub>. The studied features include the base set of variables introduced in Nachman and Shih (<xref ref-type="bibr" rid="B46">2020</xref>) and applied in Hallin et al. (<xref ref-type="bibr" rid="B33">2022</xref>),</p>
<disp-formula id="E1"><mml:math id="M4"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>J</mml:mi><mml:mi>J</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;&#x00394;</mml:mtext><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>J</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msubsup><mml:mrow><mml:mi>&#x003C4;</mml:mi></mml:mrow><mml:mrow><mml:mn>21</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msubsup><mml:mrow><mml:mi>&#x003C4;</mml:mi></mml:mrow><mml:mrow><mml:mn>21</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003C4;<sub>21</sub> is the <italic>n</italic>-subjettiness ratio of a large radius jet (Thaler and Van Tilburg, <xref ref-type="bibr" rid="B53">2011</xref>), which measures whether the underlying substructure of a jet is more consistent with a two-pronged or a one-pronged decay, and <italic>m</italic><sub><italic>JJ</italic></sub> is the invariant mass of the dijet system. As an additional feature we include</p>
<disp-formula id="E2"><mml:math id="M5"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mtext>&#x00394;</mml:mtext><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>J</mml:mi><mml:mi>J</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>which is the angular separation between the two jets in <italic>&#x003B7;</italic> &#x02212; <italic>&#x003D5;</italic> space. This additional feature is included as it can bring additional sensitivity to some signal models. Furthermore, it is strongly correlated with the resonant feature, <italic>m</italic><sub><italic>JJ</italic></sub>, and so including it when training the transformer and classifier provides a stringent test of the C<sc>urtain</sc>s method.</p>
<p>The width of the signal region in the sliding window is set to 200 GeV by default, with 200 GeV wide side-bands on either side. In this paper, we simplify the sliding window approach by shifting the window by 200 GeV such that there is no overlap between signal regions. This would reduce the sensitivity in cases where the signal peak falls on the boundary of a signal region; we avoid this by defining our bins such that the signal is centred within a signal region. Where the signal location is unknown, overlapping windows would need to be employed, with a strategy in place to avoid selecting the same data twice in the final analysis. The turn-on in the dijet invariant mass spectrum caused by the trigger requirements on both jets is removed by only performing the sliding window scan with signal regions above 3.0 TeV. The full range used for the sliding window scan extends up to a dijet invariant mass of 4.6 TeV.</p>
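Under these choices the scan bins can be enumerated deterministically. The sketch below follows the numbers above; the dictionary layout and the helper name `sliding_windows` are illustrative assumptions.

```python
def sliding_windows(lo=3000.0, hi=4600.0, sr_width=200.0, sb_width=200.0):
    """Enumerate the non-overlapping signal regions (SR) between `lo` and
    `hi` (values in GeV), each flanked by a low-mass (SB1) and a high-mass
    (SB2) side-band.  Side-bands of the outermost windows may extend past
    the SR scan range."""
    windows = []
    edge = lo
    while edge + sr_width <= hi:
        sr = (edge, edge + sr_width)
        windows.append({
            "SB1": (sr[0] - sb_width, sr[0]),
            "SR": sr,
            "SB2": (sr[1], sr[1] + sb_width),
        })
        edge += sr_width  # step by the SR width: adjacent SRs do not overlap
    return windows
```

With the default 3.0&#x02013;4.6 TeV range and 200 GeV regions this yields eight adjacent signal regions.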
<p>To evaluate the performance of classifiers using this dataset, a <italic>k</italic>-fold procedure with five folds is employed, using three fifths of the dataset for training, one fifth for validation, and one fifth as a hold-out set in each fold. No optimisation is performed on the hold-out sets, and all optimisation criteria are satisfied using the validation set of each fold. This ensures all available data can be used in a statistical analysis, which is especially crucial in data driven approaches, where statistical precision is key in the search for new physics. The remaining 92,000 signal events not used to construct the doped datasets are used to evaluate the classifier performance, maximising the statistical precision.</p>
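The five-fold scheme can be sketched as follows; the exact rotation of validation folds is an assumption, as the text only fixes the three-fifths/one-fifth/one-fifth proportions.

```python
import numpy as np

def five_fold_splits(n, seed=0):
    """Split `n` event indices into 5 folds.  Per fold k: one fifth is the
    hold-out set, one fifth is used for validation, and the remaining
    three fifths for training (validation fold rotation is an assumed
    convention)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), 5)
    splits = []
    for k in range(5):
        val_k = (k + 1) % 5
        splits.append({
            "hold": folds[k],
            "val": folds[val_k],
            "train": np.concatenate([folds[j] for j in range(5)
                                     if j not in (k, val_k)]),
        })
    return splits
```

Because the hold-out folds rotate, every event appears in exactly one hold-out set, so the full dataset is available for the final statistical analysis.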
</sec>
<sec id="s3">
<title>3. Method</title>
<sec>
<title>3.1. CURTAINs</title>
<p>In C<sc>urtain</sc>s, conditional invertible neural networks (cINNs) (Ardizzone et al., <xref ref-type="bibr" rid="B3">2019a</xref>,<xref ref-type="bibr" rid="B4">b</xref>) are employed to transform data points from an input distribution to those from the target distribution. The transformation is conditioned on a function <italic>f</italic> of the resonant feature <italic>m</italic><sub><italic>JJ</italic></sub> of the input and target data points. Unlike normalising flows (Rezende and Mohamed, <xref ref-type="bibr" rid="B48">2016</xref>; Kobyzev et al., <xref ref-type="bibr" rid="B39">2021</xref>), which are trained by maximising the exact likelihood of transforming data to a desired base distribution, usually a multivariate normal distribution, we use an optimal transport loss to train the network to transform data between the two desired distributions.</p>
<p>As the cINN can be used in both directions, the inputs to the conditional function are referred to as the lower and higher values <inline-formula><mml:math id="M6"><mml:msubsup><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>J</mml:mi><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M7"><mml:msubsup><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>J</mml:mi><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>. In the case of a forward pass through the network, <inline-formula><mml:math id="M8"><mml:msubsup><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>J</mml:mi><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> are the true values of the input data, with <inline-formula><mml:math id="M9"><mml:msubsup><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>J</mml:mi><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> the target values, and vice versa in the case of an inverse pass. Furthermore, instead of training the cINN in only the forward direction, we iterate between both the forward and inverse directions to ensure better closure between the output and target distributions and to prevent a bias toward transformations in one direction.</p>
<p>Several different network architectures for the transformer were studied in the development of C<sc>urtain</sc>s. The transformers presented in this paper are built on the invertible transformations introduced in Durkan et al. (<xref ref-type="bibr" rid="B28">2019</xref>) which use rational-quadratic (RQ) splines, which are found to be very expressive and easy to train. The conditioning function <italic>f</italic> is chosen to be</p>
<disp-formula id="E3"><label>(1)</label><mml:math id="M10"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>J</mml:mi><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>J</mml:mi><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>J</mml:mi><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>J</mml:mi><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msubsup><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The features which are to be transformed determine the input and target dimensions of the C<sc>urtain</sc>s transformer.</p>
<p>To train the network, batches of data are drawn from the low-mass side-band SB1 and the high-mass side-band SB2. The data from SB1 are first fed through the network in a forward pass, conditioned using <inline-formula><mml:math id="M11"><mml:msubsup><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>J</mml:mi><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M12"><mml:msubsup><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>J</mml:mi><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, with target values for each event assigned by randomly pairing the masses drawn from each side-band. The loss between the transformed data and target data is calculated using the Sinkhorn divergence (Cuturi, <xref ref-type="bibr" rid="B24">2013</xref>) across the whole batch, in order to measure the distance between the distributions of the two sets of data. The gradient of this loss is used to update the network weights. For the next batch, the data from SB2 are fed through the network in an inverse pass and the same procedure is performed. This alternating procedure is repeated throughout the training. A schematic overview of the C<sc>urtain</sc>s transformer model is shown in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
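The batch-level distance can be illustrated with a small numpy implementation of an entropy-regularised, debiased Sinkhorn loss; this is a sketch, and the paper's implementation details and regularisation settings may differ.

```python
import numpy as np

def sinkhorn_divergence(x, y, eps=0.5, n_iter=200):
    """Debiased entropic optimal-transport cost between two batches with
    uniform weights: S(x, y) = OT(x, y) - (OT(x, x) + OT(y, y)) / 2.
    Used as a distance between the transformed batch and the target batch."""
    def ot_eps(p, q):
        # squared-Euclidean cost matrix between the two point clouds
        c = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
        k = np.exp(-c / eps)                 # Gibbs kernel
        a = np.full(len(p), 1.0 / len(p))    # uniform source weights
        b = np.full(len(q), 1.0 / len(q))    # uniform target weights
        v = np.ones(len(q))
        for _ in range(n_iter):              # Sinkhorn fixed-point iterations
            u = a / (k @ v)
            v = b / (k.T @ u)
        plan = u[:, None] * k * v[None, :]   # approximate transport plan
        return float(np.sum(plan * c))
    return ot_eps(x, y) - 0.5 * (ot_eps(x, x) + ot_eps(y, y))
```

The divergence vanishes when the two batches coincide and grows as the distributions separate, which is the property the training loop exploits in both the forward and inverse directions.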
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>A schematic overview of the C<sc>urtain</sc>s model. A feature <italic>x</italic> is correlated with <italic>m</italic>, as can be seen from the 2D contour plots for each side-band in blue. Samples <inline-formula><mml:math id="M13"><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M14"><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> of batch size <italic>n</italic> are drawn randomly from the two side-bands. 
In the forward pass the samples from SB1, <inline-formula><mml:math id="M15"><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, are passed through the conditional INN where each sample <inline-formula><mml:math id="M16"><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is conditioned on <inline-formula><mml:math id="M17"><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:math></inline-formula>, producing the set <inline-formula><mml:math id="M18"><mml:mrow><mml:msubsup><mml:mrow><mml:mo>&#x0007B;</mml:mo><mml:msubsup><mml:mi>z</mml:mi><mml:mn>2</mml:mn><mml:mi>i</mml:mi></mml:msubsup><mml:mo>&#x0007D;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula>. 
The cost function is defined as the distance between this output and the sample from SB2 <inline-formula><mml:math id="M19"><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>. In the inverse pass the roles of each side-band are exchanged. In applying the model, any value for <italic>m</italic> can be chosen as long as the correct inverse or forward pass is applied.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-06-899345-g0001.tif"/>
</fig>
<p>With this training procedure the learned function is not exactly the optimal transport function, since the conditional information is only implicit and a transformed event is not necessarily paired in the loss calculation with the event whose mass it was mapped to. However, after training the network we observe that the learned transformation is a good approximation of the true optimal transformation.</p>
<p>In order to improve the closure of the transformed data to regions other than the side-bands, an additional training step is performed. After an epoch of training the network between SB1 and SB2, each side-band is itself split into two equal-width sub side-bands. The network is then trained for an epoch on each intra side-band mapping, following the same procedure as for the inter side-band training. Although not necessary for the C<sc>urtain</sc>s method, this extra step is performed in order to extend the range of values of the conditional information used to train the network. Instead of having a minimum value of <inline-formula><mml:math id="M20"><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>J</mml:mi><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>J</mml:mi><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:math></inline-formula> equal to the width of the signal region separating SB1 and SB2, its minimum value is now zero. This ensures that the conditioning variables used to map data to the signal region always lie in the distribution of values used during training.</p>
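The band pairings used in one training cycle can then be enumerated explicitly; `training_pairs` is a hypothetical helper, not part of the published code.

```python
def training_pairs(sb1, sb2):
    """Return the (input band, target band) pairs trained on per cycle:
    the inter side-band pair SB1 -> SB2, plus one intra side-band pair
    inside each side-band obtained by halving it.  The intra pairs extend
    the support of the conditioner f = m_hi - m_low down toward zero."""
    halve = lambda b: ((b[0], 0.5 * (b[0] + b[1])), (0.5 * (b[0] + b[1]), b[1]))
    return [(sb1, sb2), halve(sb1), halve(sb2)]
```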
<p>The C<sc>urtain</sc>s transformer is trained for 1,000 epochs with a batch size of 256 using the <monospace>Adam</monospace> optimiser (Kingma and Ba, <xref ref-type="bibr" rid="B38">2017</xref>). A cosine annealing learning rate schedule is used with an initial learning rate of 10<sup>&#x02212;4</sup>. Training typically takes 6 h on an NVIDIA<sup>&#x000AE;</sup> RTX 3080 GPU for a central window encompassing 10<sup>5</sup> samples across the two side-bands.</p>
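<p>The cosine annealing schedule described above can be sketched as follows; the function name and per-epoch granularity are illustrative assumptions, not code from the paper.</p>

```python
import math

def cosine_annealed_lr(epoch, total_epochs=1000, lr_init=1e-4, lr_min=0.0):
    """Cosine annealing: decay the learning rate from lr_init to lr_min
    following half a cosine period over the full training run."""
    cos_term = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return lr_min + (lr_init - lr_min) * cos_term

# The schedule starts at the initial learning rate and reaches zero
# at the final epoch.
```

<p>This is the same quantity computed internally by, for example, PyTorch's <monospace>CosineAnnealingLR</monospace> scheduler.</p>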
<p>The C<sc>urtain</sc>s transformers are trained separately for each step in the sliding window, using all the available data in the side-bands. In order to construct a background template in another region, all the data from SB1 and SB2 are transformed in either a forward or inverse pass to mass values sampled from the target window. To create the background template in the signal region, the data from SB1 (SB2) are transformed to values of <italic>m</italic><sub><italic>JJ</italic></sub> corresponding to the signal region in a forward (inverse) pass with the C<sc>urtain</sc>s transformer. These two transformed datasets are combined to create the background template in the signal region.</p>
<p>To validate the C<sc>urtain</sc>s transformer, the side-band data can be transformed to a target window with the same width as the signal region but lying in the opposite direction in <italic>m</italic><sub><italic>JJ</italic></sub>, defining outer-band regions for SB1 (OB1) and SB2 (OB2). These regions can be used to validate and tune the C<sc>urtain</sc>s method in a real-world setting. The five bands of one sliding window are illustrated in <xref ref-type="fig" rid="F2">Figure 2</xref>, with the depicted signal region centred on the invariant mass of the injected signal. In the studies presented in this paper the width of the side-bands and validation regions is set to 200 GeV, unless otherwise specified. To increase the statistics of the constructed datasets, the transformer can be applied many times to the same data with different <italic>m</italic><sub><italic>JJ</italic></sub> target values in each pass.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Schematic showing the relative locations of the two side-bands (SB1 and SB2), the signal region (SR) and the two outer-bands (OB1 and OB2) on the resonant observable <italic>m</italic><sub><italic>JJ</italic></sub>. In this example, the non-resonant background is shown as a falling blue line, and the signal region is centered at 3.5 TeV, corresponding to the mass of the injected signal, shown not to scale in red.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-06-899345-g0002.tif"/>
</fig>
<p>The hyperparameters and architecture of the C<sc>urtain</sc>s transformer were optimised in a grid search by measuring the agreement between data transformed into the two outer-band regions from the two side-bands, for one step of the sliding window without any doping of signal events. The agreement is measured by training a classifier to separate the two datasets and checking that the Receiver Operating Characteristic (ROC) curve has a linear response with an area under the curve close to 0.5, which indicates that the network is unable to differentiate between real and transformed data in this region. The optimal C<sc>urtain</sc>s transformer is made up of eight stacked RQ spline coupling layers. Each coupling layer is constructed from three residual blocks, each with two hidden layers of 32 nodes and L<sc>EAKY</sc> R<sc>E</sc>LU activations, producing an output spline with four bins. The <monospace>nflows</monospace> package (Durkan et al., <xref ref-type="bibr" rid="B29">2020</xref>) is used to implement the network architecture in PyTorch 1.8.0 (Paszke et al., <xref ref-type="bibr" rid="B47">2019</xref>). These settings are then used to train all C<sc>urtain</sc>s transformers for each step of the sliding window, and for all doping levels.</p>
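<p>The key property of the stacked coupling layers is exact invertibility under the same conditioning. A minimal affine coupling layer, a simpler stand-in for the rational-quadratic spline layers used here, illustrates the forward/inverse structure; all function names below are hypothetical and the parameter network is a toy.</p>

```python
import numpy as np

def coupling_forward(x, cond, param_fn, mask):
    """Features where mask is True pass through unchanged; together with
    the conditioning value they parameterise an invertible (here affine)
    transform of the remaining features."""
    y = x.copy()
    log_s, t = param_fn(x[:, mask], cond)
    y[:, ~mask] = x[:, ~mask] * np.exp(log_s) + t
    return y

def coupling_inverse(y, cond, param_fn, mask):
    """Exact inverse of coupling_forward, reusing the untouched features."""
    x = y.copy()
    log_s, t = param_fn(y[:, mask], cond)
    x[:, ~mask] = (y[:, ~mask] - t) * np.exp(-log_s)
    return x

def toy_param_fn(passthrough, cond):
    # Stand-in for the residual-block network that produces the transform
    # parameters from the untouched features and the conditioning mass.
    log_s = 0.1 * passthrough.sum(axis=1, keepdims=True) + 0.01 * cond[:, None]
    t = passthrough.mean(axis=1, keepdims=True)
    return log_s, t

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 4))
cond = rng.uniform(-1.0, 1.0, size=100)
mask = np.array([True, True, False, False])

y = coupling_forward(x, cond, toy_param_fn, mask)
x_back = coupling_inverse(y, cond, toy_param_fn, mask)
```

<p>Stacking several such layers with alternating masks, as the optimal configuration above does with spline transforms, yields an expressive map that remains exactly invertible.</p>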
</sec>
<sec>
<title>3.2. Mass fitting and sampler</title>
<p>In order to sample target values for the C<sc>urtain</sc>s transformer without being biased by the presence of any excess of events in the signal region, the distribution of the resonant feature in the signal region needs to be extrapolated from the side-band data. Here we model the QCD dijet background with the functional form</p>
<disp-formula id="E4"><label>(2)</label><mml:math id="M21"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>z</mml:mi></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M22"><mml:mi>z</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>J</mml:mi><mml:mi>J</mml:mi></mml:mrow></mml:msub><mml:mo>/</mml:mo><mml:msqrt><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msqrt></mml:math></inline-formula> with the centre of mass energy of the collision <inline-formula><mml:math id="M23"><mml:msqrt><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msqrt><mml:mo>=</mml:mo></mml:math></inline-formula>13 TeV. The parameters <italic>p</italic><sub>1</sub>, <italic>p</italic><sub>2</sub>, and <italic>p</italic><sub>3</sub> are obtained from an unbinned fit to the side-band data in using the <monospace>zfit</monospace> package (Eschle et al., <xref ref-type="bibr" rid="B30">2020</xref>). This Ansatz has been used previously in analyses performed at the LHC (ATLAS Collaboration, <xref ref-type="bibr" rid="B7">2016</xref>) and is similar to that used in more recent searches with the omission of the last free parameter (CMS Collaboration, <xref ref-type="bibr" rid="B19">2018</xref>; ATLAS Collaboration, <xref ref-type="bibr" rid="B8">2020</xref>). Once fit to the side-band data, the learned parameters are used in the PDF from which to sample target <italic>m</italic><sub><italic>JJ</italic></sub> values for the transformer.</p>
</sec>
<sec>
<title>3.3. Anomaly detection</title>
<p>Once the background data have been transformed into the signal region from the side-bands, they can be used as the background template to test for the presence of signal in the data from that region. Several approaches could be used for anomaly detection with data transformed by the C<sc>urtain</sc>s method; in this paper we focus on the <sc>CWoLa</sc> classifier, as applied to this dataset in Collins et al. (<xref ref-type="bibr" rid="B23">2019</xref>), Benkendorfer et al. (<xref ref-type="bibr" rid="B12">2021</xref>), and Hallin et al. (<xref ref-type="bibr" rid="B33">2022</xref>).</p>
<p>For a <sc>CWoLa</sc> classifier, it can be shown that a classifier trained to distinguish two sets of data, each containing a different mixture of signal and background, converges to the optimal classifier trained on pure sets of signal and background data. Here, we assume our transformed data represent a sample of pure background events, and test the hypothesis that the signal region data contain a mixture of signal and background. If signal events are present in the signal region, the classifier will be able to separate the signal region data from the background template, with the true signal events receiving higher classification scores than the true background data. By applying a cut on the classifier output to reject a given fraction of the background, calculated from the scores of the background template, the significance of the signal events can be enhanced.</p>
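<p>The mixed-sample argument can be checked numerically in a one-dimensional toy: the optimal classifier between two mixtures is a monotone function of the signal-to-background likelihood ratio, so the two rank events identically. All densities and signal fractions below are illustrative assumptions.</p>

```python
import numpy as np

def gauss(x, mu, sigma):
    # Normal density, used as a stand-in for the per-class feature densities.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-3.0, 5.0, 200)
p_b = gauss(x, 0.0, 1.0)   # background density
p_s = gauss(x, 2.0, 1.0)   # signal density

# Two mixed samples with different (unknown) signal fractions, as in CWoLa.
f1, f2 = 0.3, 0.05
p_m1 = f1 * p_s + (1.0 - f1) * p_b
p_m2 = f2 * p_s + (1.0 - f2) * p_b

# The optimal mixture classifier is monotone in the true signal/background
# likelihood ratio, so both rank events in the same order.
mixture_lr = p_m1 / p_m2
true_lr = p_s / p_b
```

<p>Because only the ordering of scores matters for selecting events, a classifier trained on the mixtures is as good as one trained on pure samples, provided the fractions differ.</p>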
<p>In cases where there is signal contamination in at least one of the side-bands of the sliding window, the background template constructed with C<sc>urtain</sc>s will also contain a non-zero signal fraction. Under the assumption that the signal is localised, and that the bin widths are not too small, the relative fraction of signal in the signal region will differ from that in the background template. As such, the <sc>CWoLa</sc> method will still be able to approach the performance of the ideal classifier. The background template provided by C<sc>urtain</sc>s will have a lower signal-to-background ratio than the signal region in at least one step of the sliding window, and in this bin an excess can be expected.</p>
<p>In the event of the signal being fully localised within a side-band, the opposite labels will effectively be used in the training of the <sc>CWoLa</sc> classifier with regard to which dataset contains the higher fraction of signal. After applying a cut on the classifier, a slight reduction in events with respect to the prediction could therefore be expected. However, in practice we observe no significant deviation with the dataset under consideration.</p>
<p>The values used as acceptance thresholds on the classifier output are determined independently for each classifier in the signal regions across all sliding windows and levels of doping. These cuts are used to enhance the sensitivity to the presence of signal data in each window of the fit. The number of events remaining after the cut can be compared to the expected background, determined by multiplying the total number of events in each signal region by the background retention factor. In the presence of a signal, a significant excess of data will be observed above the expected background.</p>
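<p>The comparison of the post-cut yield to the expected background reduces to a simple counting exercise, sketched here with a naive Poisson significance; the function name and numbers are illustrative.</p>

```python
import math

def expected_excess_significance(n_observed, n_total, background_retention):
    """Naive counting significance after a classifier cut: the expected
    background is the pre-cut yield times the retention factor, and the
    excess is compared to its Poisson fluctuation sqrt(B)."""
    n_expected = n_total * background_retention
    return (n_observed - n_expected) / math.sqrt(n_expected)

# E.g. 120 events survive a cut retaining 0.1% of 100,000 events,
# where 100 background events are expected.
z = expected_excess_significance(120, 100_000, 0.001)  # -> 2.0
```
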
<p>A further test of the performance of the C<sc>urtain</sc>s method is to compare against three benchmark classifiers. The first is a fully supervised classifier, trained with knowledge of which events were from the signal process and which were QCD background. Two further classifiers, the idealised classifiers, are trained in the same manner as with the C<sc>urtain</sc>s background template, except that the background template comprises true background data from the signal region itself.</p>
<p>Both the supervised and idealised classifiers are only trained for the window in which the signal region is aligned with the peak of the signal data. The supervised classifier provides an upper bound on the achievable performance on the dataset. The idealised classifier sets the target level of performance which can be achieved with a perfect background template, and can be used to validate the performance of C<sc>urtain</sc>s for use in anomaly detection.</p>
<p>All the classifiers for all signal regions and all levels of signal doping share the same architecture and hyperparameters. The classifiers used in this paper have been chosen as they are robust to changes in datasets and initial conditions, in particular when using <italic>k</italic>-fold training and low training statistics. The classifiers are constructed from multilayer perceptrons with three hidden layers with 32 nodes and R<sc>E</sc>LU activations. The classifiers are trained for 20 epochs using the <monospace>Adam</monospace> optimiser with a batch size of 128, and an initial learning rate of 0.001 which anneals to zero following a cosine curve over 20 epochs.</p>
</sec>
</sec>
<sec id="s4">
<title>4. Comparison to other work</title>
<p>Our method is one of several approaches that aim to enhance the sensitivity to new physics processes arising from the resonant production of a new particle using machine learning (Collins et al., <xref ref-type="bibr" rid="B23">2019</xref>; Andreassen et al., <xref ref-type="bibr" rid="B2">2020</xref>; Nachman and Shih, <xref ref-type="bibr" rid="B46">2020</xref>; Benkendorfer et al., <xref ref-type="bibr" rid="B12">2021</xref>; Hallin et al., <xref ref-type="bibr" rid="B33">2022</xref>).</p>
<p>In comparison to the C<sc>athode</sc> method introduced in Hallin et al. (<xref ref-type="bibr" rid="B33">2022</xref>), which is one of the current best anomaly detection methods for resonant signals using the <sc>CWoLa</sc> approach, our method shares some similarities but differs on key points. Although both approaches make use of INNs, C<sc>urtain</sc>s does not train a flow with maximum likelihood but instead uses an optimal transport loss to minimise the distance between the output of the model and the target data, aiming to approximate the optimal transport function between two points in feature space when moving along the resonant spectrum. As a result, C<sc>urtain</sc>s does not generate new samples to construct the background template, but instead transforms the data in the side-bands to equivalent datapoints with a mass in the signal region. This approach avoids the need to match data encodings to an intermediate prior distribution, normally a multidimensional Gaussian, which can lead to mismodelling of underlying correlations between the observables in the data if the trained posterior is not in perfect agreement with the prior distribution. The C<sc>athode</sc> method has no regularisation on the model&#x00027;s dependence on the resonant variable, and this dependence is non-trivial, so extrapolating to unseen datapoints&#x02014;such as the signal region&#x02014;can be unreliable. In contrast, the C<sc>urtain</sc>s method can be constructed such that at evaluation the conditioning variable never lies outside of the values seen in the training data.</p>
<p>Furthermore, in comparison to C<sc>athode</sc>, C<sc>urtain</sc>s is designed to be trained only in the sliding window, with all information extracted over a narrow range of the resonant observable, as is standard in a bump hunt. This means C<sc>urtain</sc>s is less sensitive to effects from multiple resonances on the same spectrum, and is not dominated by areas of the distribution with more data. In addition, thanks to the optimal transformation learned between the side-bands, it can also be applied to transform side-band data into additional validation regions and not just to construct the background template in the signal region.</p>
<p>In contrast to the methods proposed in Andreassen et al. (<xref ref-type="bibr" rid="B2">2020</xref>) (S<sc>alad</sc>) and Benkendorfer et al. (<xref ref-type="bibr" rid="B12">2021</xref>) (SA-CW<sc>o</sc>L<sc>a</sc>), C<sc>urtain</sc>s does not rely on any simulation and is a completely data-driven technique. In C<sc>urtain</sc>s the side-band data are transformed directly into the signal region, instead of deriving a reweighting between the data and simulated data in the side-bands, which is subsequently applied to transform the simulated data in the signal region into a background template. Due to the resampling of the value of the resonant observable, C<sc>urtain</sc>s is also able to produce a background template with additional statistics, rather than being limited by the number of events in the signal region of the simulated sample.</p>
<p>There is also a wide range of approaches looking for new physics that do not rely on resonant signals. Many techniques are built on autoencoders (Aguilar-Saavedra et al., <xref ref-type="bibr" rid="B1">2017</xref>; Blance et al., <xref ref-type="bibr" rid="B13">2019</xref>; Cerri et al., <xref ref-type="bibr" rid="B16">2019</xref>; Heimel et al., <xref ref-type="bibr" rid="B34">2019</xref>; Roy and Vijay, <xref ref-type="bibr" rid="B49">2019</xref>; Farina et al., <xref ref-type="bibr" rid="B31">2020</xref>; Hajer et al., <xref ref-type="bibr" rid="B32">2020</xref>; Jawahar et al., <xref ref-type="bibr" rid="B35">2022</xref>), looking to identify uncommon events or objects. These models are subsequently used to reject SM-like processes in favour of potential new physics. Other approaches are motivated by the ratio of probability densities and directly measure a test statistic by comparing a sample of events with a set of reference-distributed events (D&#x00027;Agnolo and Wulzer, <xref ref-type="bibr" rid="B26">2019</xref>; Simone and Jacques, <xref ref-type="bibr" rid="B51">2019</xref>; D&#x00027;Agnolo et al., <xref ref-type="bibr" rid="B25">2021</xref>; Letizia et al., <xref ref-type="bibr" rid="B40">2022</xref>). A comparison of a wide range of methods is performed in Kasieczka et al. (<xref ref-type="bibr" rid="B36">2021</xref>), which summarises a community challenge for anomaly detection in high energy physics.</p>
</sec>
<sec sec-type="results" id="s5">
<title>5. Results</title>
<sec>
<title>5.1. Validating CURTAINs transformer</title>
<p>The first test of performance of C<sc>urtain</sc>s is to demonstrate that the transformation learned between the two side-bands is accurate, and further to determine whether the learned transformation extrapolates well to the validation regions. As Monte Carlo simulation is used for these studies, we can control the composition of the samples. The performance of the approach is evaluated using a sample containing only background data, as well as with various levels of signal doping. The same model configuration is used for all samples, and the sliding window is chosen such that it is centred on the true signal peak, with a signal region width of 200 GeV.</p>
<p>The input features and their correlations for the input, target, and transformed data distributions are shown in <xref ref-type="fig" rid="F3">Figure 3</xref> for the two side-bands, and in <xref ref-type="fig" rid="F4">Figure 4</xref> for the two validation regions, for a model trained in the case of no signal. As can be seen, the target data distributions are well reproduced by the C<sc>urtain</sc>s approach. The ability of C<sc>urtain</sc>s to handle features which are strongly correlated with <italic>m</italic><sub><italic>JJ</italic></sub> can be seen from the agreement of the &#x00394;<italic>R</italic><sub><italic>JJ</italic></sub> distributions between SB1 and SB2, which exhibit very different shapes.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Input, target, and transformed data distributions for the base variable set with the addition of &#x00394;<italic>R</italic><sub><italic>JJ</italic></sub>, for transforming data from SB1 to SB2 <bold>(left)</bold> and SB2 to SB1 <bold>(right)</bold>, with the model trained on SB1 (3,200 &#x02264; <italic>m</italic><sub><italic>JJ</italic></sub> &#x0003C; 3,400 GeV) and SB2 (3,600 &#x02264; <italic>m</italic><sub><italic>JJ</italic></sub> &#x0003C; 3,800 GeV). The data from SB1 (SB2) is transformed with a forward (inverse) pass of the C<sc>urtain</sc>s model into the target region. The diagonal elements show the individual features with the off diagonal elements showing a contour plot between the two observables for the transformed and trained data.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-06-899345-g0003.tif"/>
</fig>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Input, target, and transformed data distributions for the base variable set with the addition of &#x00394;<italic>R</italic><sub><italic>JJ</italic></sub>, for transforming data from SB1 to OB1 <bold>(left)</bold> and SB2 to OB2 <bold>(right)</bold>, with the model trained on SB1 (3,200 &#x02264; <italic>m</italic><sub><italic>JJ</italic></sub> &#x0003C; 3,400 GeV) and SB2 (3,600 &#x02264; <italic>m</italic><sub><italic>JJ</italic></sub> &#x0003C; 3,800 GeV), with OB1 and OB2 defined as 200 GeV wide windows directly next to SB1 and SB2 away from the signal region. The data from SB1 (SB2) is transformed with an inverse (forward) pass of the C<sc>urtain</sc>s model into the target region. The diagonal elements show the individual features with the off diagonal elements showing a contour plot between the two observables for the transformed and trained data.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-06-899345-g0004.tif"/>
</fig>
<p>In the case of no signal being present, we can also verify whether the background template constructed by transforming data from the side-bands with C<sc>urtain</sc>s matches the target data in the signal region. The performance of the C<sc>urtain</sc>s method can be seen in <xref ref-type="fig" rid="F5">Figure 5</xref>, with the transformed data closely matching the data distributions and correlations.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Input, target, and transformed data distributions for the base variable set with the addition of &#x00394;<italic>R</italic><sub><italic>JJ</italic></sub>, for transforming data from SB1 and SB2 to the signal region to create the background template, with the model trained on SB1 (3,200 &#x02264; <italic>m</italic><sub><italic>JJ</italic></sub> &#x0003C; 3,400 GeV) and SB2 (3,600 &#x02264; <italic>m</italic><sub><italic>JJ</italic></sub> &#x0003C; 3,800 GeV). The data from SB1 (SB2) is transformed with a forward (inverse) pass of the C<sc>urtain</sc>s model into the target region. The diagonal elements show the individual features with the off diagonal elements showing a contour plot between the two observables for the transformed and trained data.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-06-899345-g0005.tif"/>
</fig>
<p>To quantify the level of agreement between the transformed distributions and the target data, classifiers are trained to separate the two datasets, and the area under the ROC curve is measured. The level of agreement between the C<sc>urtain</sc>s transformed data and target data is shown for several levels of signal doping in <xref ref-type="table" rid="T1">Table 1</xref>. C<sc>urtain</sc>s achieves very good agreement with the target distribution in all signal regions, and in all cases the agreement is better than in the validation regions. The reduced performance in OB1 and OB2 results from the transformer extrapolating outside of the trained sliding window. This demonstrates the suitability of these regions for validating the C<sc>urtain</sc>s method and future classification architectures.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Quantitative agreement between the data distributions of the transformed data and the target data as measured by the AUC of the ROC curve trained on the two samples, as measured for various levels of signal doping with a 200 GeV wide signal region.</p></caption>
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th/>
<th valign="top" align="center"><bold>SB1 &#x02192; SB2</bold></th>
<th valign="top" align="center"><bold>SB2 &#x02192; SB1</bold></th>
<th valign="top" align="center"><bold>SB1 &#x02192; OB1</bold></th>
<th valign="top" align="center"><bold>SB2 &#x02192; OB2</bold></th>
<th valign="top" align="center"><bold>SB1 &#x02192; SR &#x0222A;SB2 &#x02192; SR</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">0 signal</td>
<td valign="top" align="center">0.504</td>
<td valign="top" align="center">0.504</td>
<td valign="top" align="center">0.519</td>
<td valign="top" align="center">0.512</td>
<td valign="top" align="center">0.509</td>
</tr>
<tr>
<td valign="top" align="left">500 signal</td>
<td valign="top" align="center">0.503</td>
<td valign="top" align="center">0.503</td>
<td valign="top" align="center">0.518</td>
<td valign="top" align="center">0.506</td>
<td valign="top" align="center">0.506</td>
</tr>
<tr>
<td valign="top" align="left">667 signal</td>
<td valign="top" align="center">0.505</td>
<td valign="top" align="center">0.504</td>
<td valign="top" align="center">0.516</td>
<td valign="top" align="center">0.514</td>
<td valign="top" align="center">0.505</td>
</tr>
<tr>
<td valign="top" align="left">1,000 signal</td>
<td valign="top" align="center">0.499</td>
<td valign="top" align="center">0.502</td>
<td valign="top" align="center">0.520</td>
<td valign="top" align="center">0.502</td>
<td valign="top" align="center">0.512</td>
</tr>
<tr>
<td valign="top" align="left">8,000 signal</td>
<td valign="top" align="center">0.508</td>
<td valign="top" align="center">0.511</td>
<td valign="top" align="center">0.523</td>
<td valign="top" align="center">0.521</td>
<td valign="top" align="center">0.522</td>
</tr>
</tbody>
</table>
</table-wrap>
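<p>The agreement metric in Table 1 can be computed without a full ROC scan via the rank (Mann-Whitney) form of the AUC. In this sketch the classifier scores are replaced by identically distributed random values, purely to illustrate the AUC &#x02248; 0.5 case.</p>

```python
import numpy as np

def roc_auc(scores_a, scores_b):
    """AUC as the probability that a random sample from the first set
    scores above a random sample from the second (Mann-Whitney U / n1*n2).
    A value near 0.5 means the two samples are indistinguishable."""
    diff = scores_a[:, None] - scores_b[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

rng = np.random.default_rng(0)
# Identically distributed "classifier scores": expect AUC close to 0.5.
auc = roc_auc(rng.normal(size=2000), rng.normal(size=2000))
```
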
</sec>
<sec>
<title>5.2. Application to anomaly detection</title>
<p>To demonstrate the ability of C<sc>urtain</sc>s to produce a robust background template, the sliding window is centred on the resonant mass of the signal events, and the performance of the <sc>CWoLa</sc> classifier is compared against one using a background template produced with the C<sc>athode</sc> method. The signal region width is set to 400 GeV to contain the majority of the signal events, resulting in 120,000 background events. The background template is produced with oversampling, with a total of nine times the number of expected events in the signal region. Two comparisons to C<sc>athode</sc> can be performed: one using the same training windows as for C<sc>urtain</sc>s, which we refer to as C<sc>athode</sc> (local), and one using the full invariant mass distribution outside the signal region, as presented in Hallin et al. (<xref ref-type="bibr" rid="B33">2022</xref>), which we refer to as C<sc>athode</sc> (full).</p>
<p>For reference, the methods are compared to a classifier trained using an idealised background template and to a fully supervised classifier. The idealised background template is constructed using true background events from the signal region, and the supervised classifier is trained to separate the signal data from the background data using class labels. To construct the idealised background dataset we use either an equal number of background events to those in the signal region (Eq-Idealised), measuring the performance assuming we had access to a perfect model of the background data, or the same number of data points as produced with the C<sc>urtain</sc>s and C<sc>athode</sc> approaches (Over-Idealised), which should approach the best possible performance for models which can oversample.</p>
<p>The performance of the classifiers with the different methods is shown for the doped sample with 3,000 injected signal events (of which 2,214 are in the signal region) in <xref ref-type="fig" rid="F6">Figure 6</xref>, comparing the background rejection as a function of signal efficiency and the significance improvement as a function of the background rejection. In order to maintain a fair comparison to the Eq-Idealised classifier, which requires true background data from the signal region for the background template, only half of the available data in the signal region is used for training with the <italic>k</italic>-fold strategy for all other approaches. The maximum significance improvement is shown for a wide range of doping levels in <xref ref-type="fig" rid="F7">Figure 7</xref>. This metric is a better measure of performance for anomaly detection than the area under the ROC curve, as it translates directly to the expected gain in significance when applying an optimal cut on the classifier output.</p>
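<p>The significance improvement characteristic used here is, as is standard in this literature, the factor by which S/&#x0221A;B grows after a cut; the function name and numbers below are illustrative.</p>

```python
import math

def significance_improvement(eps_signal, eps_background):
    """Significance improvement characteristic: the factor by which
    S / sqrt(B) grows after a cut with signal efficiency eps_signal
    and background efficiency eps_background."""
    return eps_signal / math.sqrt(eps_background)

# A cut keeping 50% of signal at 99.9% background rejection improves
# the naive significance by a factor of roughly 16.
sic = significance_improvement(0.5, 0.001)
```
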
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Background rejection as a function of signal efficiency <bold>(left)</bold> and significance improvement as a function of background rejection <bold>(right)</bold> for the different background template models (C<sc>urtain</sc>s&#x02014;red, C<sc>athode</sc>&#x02014;blue, Eq-Idealised&#x02014;green, Over-Idealised&#x02014;dashed green) and a fully supervised classifier (black). The sample with 3,000 injected signal events is used to train all classifiers in the signal region 3,300 &#x02264; <italic>m</italic><sub><italic>JJ</italic></sub> &#x0003C; 3,700 GeV. The solid lines show the mean value of fifty classifier trainings with different random seeds. The uncertainty encompasses 68% of the runs either side of the mean.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-06-899345-g0006.tif"/>
</fig>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>The significance improvement as a function of decreasing signal purity (raw signal events) for the different background template models [C<sc>urtain</sc>s&#x02014;red, C<sc>athode</sc> (local)&#x02014;blue, Eq-Idealised&#x02014;green, Over-Idealised&#x02014;dashed green] and a fully supervised classifier (black). All classifiers are trained in the signal region 3,300 &#x02264; <italic>m</italic><sub><italic>JJ</italic></sub> &#x0003C; 3,700 GeV for varying levels of signal doping. The solid lines show the mean value of fifty classifier trainings with different random seeds. The uncertainty encompasses 68% of the runs either side of the mean.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-06-899345-g0007.tif"/>
</fig>
<p>We can see that C<sc>urtain</sc>s not only outperforms C<sc>athode</sc> (local), but also approaches the performance of the Over-Idealised and supervised scenarios. When the full range outside of the signal region is used to train C<sc>athode</sc> (full), its performance recovers, and C<sc>urtain</sc>s only matches it at high levels of background rejection, as seen in <xref ref-type="fig" rid="F8">Figure 8</xref>. However, this demonstrates that C<sc>urtain</sc>s is able to reach a higher level of performance when trained on far fewer events.</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>Background rejection as a function of signal efficiency <bold>(left)</bold> and significance improvement as a function of background rejection <bold>(right)</bold> for the C<sc>urtain</sc>s (red), C<sc>athode</sc> (local) (blue, solid), and C<sc>athode</sc> (full) (blue, dashed) background template models compared to a supervised classifier (black). The dashed C<sc>athode</sc> (full) model is trained using all data outside of the signal region, whereas the two solid lines are trained using the default 200 GeV side-bands. All classifiers are trained on the sample with 3,000 injected signal events for the signal region 3,300 &#x02264; <italic>m</italic><sub><italic>JJ</italic></sub> &#x0003C; 3,700 GeV. The lines show the mean value of fifty classifier trainings with different random seeds. The uncertainty encompasses 68% of the runs either side of the mean.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-06-899345-g0008.tif"/>
</fig>
</sec>
<sec>
<title>5.3. Application in a sliding window</title>
<p>As it is not possible to know the location of the signal events when applying C<sc>urtain</sc>s to data, the real test of the performance and robustness of the method is in the sliding window setting.</p>
<p>Both C<sc>urtain</sc>s and C<sc>athode</sc> (local and full) are used to generate the background templates in a sliding window scan in the range 3,000&#x02013;4,600 GeV, with steps of 200 GeV and equal 200 GeV wide signal regions. Classifiers are trained to separate the signal region data from the background template, and cuts are applied to retain 20, 5, and 0.1% of the background events.</p>
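<p>The enumeration of signal-region edges for such a scan can be sketched as below; the exact placement of the first and last windows, and of the flanking side-bands, is an illustrative assumption rather than the paper's precise configuration.</p>

```python
def signal_regions(scan_lo, scan_hi, width, step):
    """Enumerate (low, high) signal-region edges in GeV for a sliding
    window scan. Side-bands of the same width would sit either side of
    each region (not returned here)."""
    edges = []
    lo = scan_lo
    while lo + width <= scan_hi:
        edges.append((lo, lo + width))
        lo += step
    return edges

# 200 GeV regions stepped by 200 GeV across the 3,000-4,600 GeV scan range.
regions = signal_regions(3000, 4600, width=200, step=200)
```
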
<p>These scans are performed for several levels of signal doping and are shown in <xref ref-type="fig" rid="F9">Figure 9</xref> for the case where there is no signal present, and in <xref ref-type="fig" rid="F10">Figure 10</xref> for doped samples with 500, 667, 1,000, and 8,000 injected signal events. Each signal region is subdivided into two bins of equal width in <italic>m</italic><sub><italic>JJ</italic></sub> for the plot. The expected background is determined by multiplying the original yield of each bin by the chosen background retention factor.</p>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p>The dijet invariant mass for the range of signal regions probed in the sliding window, from 3,300 to 4,600 GeV, for the case of zero doping. Each signal region is 200 GeV wide and split into two 100 GeV wide bins. The dashed lines show the expected background after applying a cut on a classifier trained using the background predictions from the C<sc>urtain</sc>s (red), C<sc>athode</sc> (local) (blue), and C<sc>athode</sc> (full) (green) methods at specific background rejections. Three cut levels are applied, retaining 20, 5, and 0.1% of background events, respectively. The cut values are calculated per signal region using the background template.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-06-899345-g0009.tif"/>
</fig>
<fig id="F10" position="float">
<label>Figure 10</label>
<caption><p>The dijet invariant mass for the range of signal regions probed in the sliding window, from 3,300 to 4,600 GeV, for the case of samples doped with 500 <bold>(top left)</bold>, 667 <bold>(top right)</bold>, 1,000 <bold>(bottom left)</bold>, and 8,000 <bold>(bottom right)</bold> signal events. Each signal region is 200 GeV wide and split into two 100 GeV wide bins. The dashed lines show the expected background after applying a cut on a classifier trained using the background predictions from the C<sc>urtain</sc>s (red), C<sc>athode</sc> (local) (blue), and C<sc>athode</sc> (full) (green) methods at specific background rejections. Three cut levels are applied, retaining 20, 5, and 0.1% of background events, respectively. The cut values are calculated per signal region using the background template.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-06-899345-g0010.tif"/>
</fig>
<p>In contrast to the <sc>CWoLa</sc> bump hunt approach introduced in Collins et al. (<xref ref-type="bibr" rid="B23">2019</xref>), which uses the classifier trained in the signal region to apply a cut on all events in the invariant mass spectrum before performing a traditional bump hunt, we treat each signal region as independent and do not apply the classifiers outside of the regions in which they are trained. This sliding window approach tests how C<sc>urtain</sc>s and C<sc>athode</sc> perform as the side-bands used to train the networks, and the signal region itself, transition from containing no signal, to signal contaminating one of the side-bands, to the signal aligning perfectly with the signal region. It does not test the ability of the trained classifiers to extrapolate outside of the invariant mass range used to train them. Were they applied outside of their respective regions, cuts on the classifier would be expected to sculpt the invariant mass distribution, due to the strong correlation between &#x00394;<italic>R</italic><sub><italic>JJ</italic></sub> and <italic>m</italic><sub><italic>JJ</italic></sub>.</p>
<p>As can be seen from the sliding window scans in <xref ref-type="fig" rid="F9">Figures 9</xref>, <xref ref-type="fig" rid="F10">10</xref>, using C<sc>urtain</sc>s and C<sc>athode</sc> (full) we are able to correctly identify the location of the signal events even for reasonably low levels of signal. Where there are no or very few signal events, the yields after each cut do not deviate far from the expected background. The corresponding significance of the excess seen in each bin for three cuts on background efficiency is shown in <xref ref-type="fig" rid="F11">Figure 11</xref>. In the case where no signal is injected into the sample, both C<sc>urtain</sc>s and C<sc>athode</sc> (full) show relatively low local excesses, reaching a 4&#x003C3; deviation in only one bin at the 1% background efficiency and not exceeding 3&#x003C3; for tighter cuts, when considering only statistical uncertainties on the yields in each bin.</p>
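A counting-experiment significance of the kind quoted above, considering statistical uncertainties only, can be sketched with the standard asymptotic formula for an excess over a known background expectation. This is our illustration; the paper does not specify the exact prescription used:

```python
import math

def asymptotic_z(n_obs, n_exp):
    """Asymptotic significance Z = sqrt(2*(n*ln(n/b) - (n - b))) of an
    observed excess n_obs over expectation n_exp (statistical only);
    returns 0 when there is no excess."""
    if n_obs <= n_exp:
        return 0.0
    n, b = float(n_obs), float(n_exp)
    return math.sqrt(2.0 * (n * math.log(n / b) - (n - b)))
```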
<fig id="F11" position="float">
<label>Figure 11</label>
<caption><p>Measured excesses in each of the signal regions probed in the sliding window, from 3,300 to 4,600 GeV, for the case of samples doped with 0 <bold>(left)</bold>, 667 <bold>(middle)</bold>, and 1,000 <bold>(right)</bold> signal events. Each signal region is 200 GeV wide and split into two 100 GeV wide bins. The solid, dashed, and dotted lines show the probability of the observed excesses (<italic>p</italic><sub>0</sub>) over the background after applying a cut on a classifier trained using the background predictions from the C<sc>urtain</sc>s (red) and C<sc>athode</sc> (full) (green) methods at 1, 0.1, and 0.01% background efficiency, respectively. The cut values are calculated per signal region using the background template.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fdata-06-899345-g0011.tif"/>
</fig>
<p>However, at looser cuts significant local excesses over the expectation are observed: at 5% background efficiency the maximum local deviation reaches 4&#x003C3; for C<sc>athode</sc> (full) and 5&#x003C3; for C<sc>urtain</sc>s. In the presence of signal, both C<sc>urtain</sc>s and C<sc>athode</sc> (full) observe excesses at the signal mass peak at each cut level, even for the lower numbers of injected signal events, with C<sc>athode</sc> (full) producing a more prominent excess.</p>
<p>The C<sc>athode</sc> (local) approach yields an excess across the whole spectrum, both in the absence of signal and for all levels of injected signal. However, it also finds an excess under the signal peak when signal is injected, which at higher levels of signal exceeds that found by C<sc>urtain</sc>s.</p>
<p>In an analysis, a systematic excess over the expectation calculated from the original per-bin yields would not necessarily be problematic, as the expectation could instead be determined from a side-band fit in <italic>m</italic><sub><italic>JJ</italic></sub> after applying the cut. Additionally, these values take no systematic uncertainties into account, and consider only the statistical uncertainties on the number of events passing each cut.</p>
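A side-band fit of the kind mentioned above can be sketched as follows. This is a minimal illustration assuming a log-linear falling spectrum; the fit functions actually used in dijet bump hunts are more elaborate, and the function names here are our own:

```python
import numpy as np

def sideband_expectation(bin_centres, yields, sr_mask):
    """Fit log(yield) linearly in m_JJ using only the side-band bins
    (sr_mask marks the signal region bins to exclude), then return the
    fitted background expectation evaluated in every bin."""
    sb = ~sr_mask & (yields > 0)
    slope, intercept = np.polyfit(bin_centres[sb], np.log(yields[sb]), 1)
    return np.exp(intercept + slope * bin_centres)
```

The prediction in the signal region bins then replaces the naive "original yield times retention" expectation, absorbing any overall shift of the spectrum induced by the cut.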
<p>Although the ability to isolate the signal events when using C<sc>urtain</sc>s in the window scan decreases at low numbers of signal events and low signal purity, the same degradation is seen for both idealised cases in <xref ref-type="fig" rid="F7">Figure 7</xref>, suggesting that it is the classifier architecture and anomaly detection method, rather than the background template, that need to be optimised in this regime. The performance of C<sc>urtain</sc>s in this setting could be further improved by optimising the binning used in the sliding window and the number of subdivisions within each signal region.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="s6">
<title>6. Conclusions</title>
<p>In this paper we have proposed a new method, C<sc>urtain</sc>s, for use in weakly supervised anomaly detection, which can be used to extend the sensitivity of bump hunt searches for new resonances. The method stays true to the bump hunt approach by remaining completely data-driven, with all templates and signal extraction performed in a local region in a sliding window configuration.</p>
<p>C<sc>urtain</sc>s is able to produce a background template in the signal region which closely matches the true background distributions. When applied in conjunction with anomaly detection techniques to identify signal events, C<sc>urtain</sc>s matches the performance of an idealised setting in which the background template is defined using background events from the signal region. It also does not produce spurious excesses in the absence of signal events.</p>
<p>As real data points are used by the C<sc>urtain</sc>s transformer to produce the background template, we avoid the problems which can arise from sampling a prior distribution, namely imperfect agreement in the distributions of features and their correlations. By conditioning the transformation on the difference between the input and target <italic>m</italic><sub><italic>JJ</italic></sub>, we also avoid the need to interpolate or extrapolate outside of the values seen in training. With this approach C<sc>urtain</sc>s reaches similar levels of performance to state-of-the-art methods, and delivers this performance even when using much less training data, as seen when using side-bands as opposed to the full data distribution outside of the signal region.</p>
<p>Another key advantage of C<sc>urtain</sc>s over other proposed techniques is the ability to apply it to validation regions. By transforming the side-band data to regions other than the signal region, validation regions can be defined in which the transformer and classifier architectures can be optimised on real data. Here the C<sc>urtain</sc>s transformer can be validated and optimised by ensuring that the agreement between the transformed and target data distributions is as close as possible, and the classifier architecture can be optimised to ensure it does not pick up on residual differences between transformed and target data. In this paper, only the former optimisation procedure was performed, with the classifier architecture instead chosen for its robustness to variability in initial conditions.</p>
<p>However, care must be taken to optimise the width of the signal region when training the C<sc>urtain</sc>s model, to make sure that the signal-to-background ratio is not constant across the side-band and signal regions.</p>
<p>It may be possible to extend C<sc>urtain</sc>s to extrapolation tasks, where a model would be trained on one control region and applied to all other regions. This could allow one model to be trained per bump hunt, or a model could be trained to extrapolate to the tails of distributions, allowing these regions to be probed in a model independent fashion. Thanks to its performance and ability to be applied to a sliding window fit, C<sc>urtain</sc>s is simple to apply to current sliding window fits and should bring significant gains in sensitivity in the search for new physics at the LHC and other domains.</p>
</sec>
<sec sec-type="data-availability" id="s7">
<title>Data availability statement</title>
<p>Publicly available datasets were analyzed in this study. This data can be found at: <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/4536377">https://zenodo.org/record/4536377</ext-link>.</p>
</sec>
<sec sec-type="author-contributions" id="s8">
<title>Author contributions</title>
<p>SK and DS: training, optimisation, and modelling studies. JR and TG: conceptualisation. JR: strategy, approach, and editing. All authors have read and agreed on the content of this draft and are accountable for the content of the work.</p>
</sec>
</body>
<back>
<sec sec-type="funding-information" id="s9">
<title>Funding</title>
<p>The authors would like to acknowledge funding through the SNSF Sinergia grant called Robust Deep Density Models for High-Energy Particle Physics and Solar Flare Analysis (RODEM) with funding number CRSII5_193716, and the SNSF project grant 200020_181984 called Exploiting LHC data with machine learning and preparations for HL-LHC.</p>
</sec>
<ack><p>The authors would like to thank Matthias Schlaffer, our resident C<sc>athode</sc> Guru, for his invaluable input in establishing a reliable baseline for comparisons and useful discussions, and Knut Zoch for input on the initial studies and samples used. Both Knut and Matthias are also thanked for their feedback on this manuscript.</p>
</ack>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec sec-type="supplementary-material" id="s11">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fdata.2023.899345/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fdata.2023.899345/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.PDF" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aguilar-Saavedra</surname> <given-names>J. A.</given-names></name> <name><surname>Collins</surname> <given-names>J. H.</given-names></name> <name><surname>Mishra</surname> <given-names>R. K.</given-names></name></person-group> (<year>2017</year>). <article-title>A generic anti-QCD jet tagger</article-title>. <source>J. High Energy Phys</source>. <volume>11</volume>:<fpage>163</fpage>. <pub-id pub-id-type="doi">10.1007/JHEP11(2017)163</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Andreassen</surname> <given-names>A.</given-names></name> <name><surname>Nachman</surname> <given-names>B.</given-names></name> <name><surname>Shih</surname> <given-names>D.</given-names></name></person-group> (<year>2020</year>). <article-title>Simulation assisted likelihood-free anomaly detection</article-title>. <source>Phys. Rev. D</source> <volume>101</volume>:<fpage>095004</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevD.101.095004</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Ardizzone</surname> <given-names>L.</given-names></name> <name><surname>Kruse</surname> <given-names>J.</given-names></name> <name><surname>Wirkert</surname> <given-names>S.</given-names></name> <name><surname>Rahner</surname> <given-names>D.</given-names></name> <name><surname>Pellegrini</surname> <given-names>E. W.</given-names></name> <name><surname>Klessen</surname> <given-names>R. S.</given-names></name> <etal/></person-group>. (<year>2019a</year>). <source>Analyzing Inverse Problems With Invertible Neural Networks</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1808.04730">https://arxiv.org/abs/1808.04730</ext-link> (accessed March 15, 2022).</citation>
</ref>
<ref id="B4">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Ardizzone</surname> <given-names>L.</given-names></name> <name><surname>L&#x000FC;th</surname> <given-names>C.</given-names></name> <name><surname>Kruse</surname> <given-names>J.</given-names></name> <name><surname>Rother</surname> <given-names>C.</given-names></name> <name><surname>K&#x000F6;the</surname> <given-names>U.</given-names></name></person-group> (<year>2019b</year>). <source>Guided Image Generation With Conditional Invertible Neural Networks</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1907.02392">https://arxiv.org/abs/1907.02392</ext-link> (accessed March 15, 2022).</citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><collab>ATLAS Collaboration</collab></person-group> (<year>2008</year>). <article-title>The ATLAS experiment at the CERN large Hadron collider</article-title>. <source>J. Instrum</source> <volume>3</volume>, <fpage>S08003</fpage>. <pub-id pub-id-type="doi">10.1088/1748-0221/3/08/S08003</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><collab>ATLAS Collaboration</collab></person-group> (<year>2012</year>). <article-title>Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC</article-title>. <source>Phys. Lett. B</source> <volume>716</volume>, <fpage>1</fpage>&#x02013;<lpage>29</lpage>. <pub-id pub-id-type="doi">10.1016/j.physletb.2012.08.020</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><collab>ATLAS Collaboration</collab></person-group> (<year>2016</year>). <article-title>Search for new phenomena in dijet mass and angular distributions from <italic>pp</italic> collisions at <inline-formula><mml:math id="M24"><mml:msqrt><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msqrt></mml:math></inline-formula> = 13 TeV with the ATLAS detector</article-title>. <source>Phys. Lett. B</source> <volume>754</volume>, <fpage>302</fpage>&#x02013;<lpage>322</lpage>. <pub-id pub-id-type="doi">10.48550/arXiv.1512.01530</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><collab>ATLAS Collaboration</collab></person-group> (<year>2020</year>). <article-title>Search for new resonances in mass distributions of jet pairs using 139 fb<sup>&#x02212;1</sup> of <italic>pp</italic> collisions at <inline-formula><mml:math id="M25"><mml:msqrt><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msqrt></mml:math></inline-formula>=13 TeV with the ATLAS detector</article-title>. <source>J. High Energy Phys</source>. <volume>3</volume>:<fpage>145</fpage>.</citation>
</ref>
<ref id="B9">
<citation citation-type="web"><person-group person-group-type="author"><collab>ATLAS Collaboration</collab></person-group> (<year>2021a</year>). <source>Summary Plots for Heavy Particle Searches and Long-lived Particle Searches - <italic>July 2021</italic></source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://cds.cern.ch/record/2777015">https://cds.cern.ch/record/2777015</ext-link> (accessed March 15, 2022).</citation>
</ref>
<ref id="B10">
<citation citation-type="web"><person-group person-group-type="author"><collab>ATLAS Collaboration</collab></person-group> (<year>2021b</year>). <source>Summary Plots from ATLAS Searches for Pair-Produced Leptoquarks - <italic>June 2021</italic></source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://cds.cern.ch/record/2771726">https://cds.cern.ch/record/2771726</ext-link> (accessed March 15, 2022).</citation>
</ref>
<ref id="B11">
<citation citation-type="web"><person-group person-group-type="author"><collab>ATLAS Collaboration</collab></person-group> (<year>2021c</year>). <source>SUSY Summary Plots - <italic>June 2021</italic></source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://cds.cern.ch/record/2771785">https://cds.cern.ch/record/2771785</ext-link> (accessed March 15, 2022).</citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Benkendorfer</surname> <given-names>K.</given-names></name> <name><surname>Pottier</surname> <given-names>L. L.</given-names></name> <name><surname>Nachman</surname> <given-names>B.</given-names></name></person-group> (<year>2021</year>). <article-title>Simulation-assisted decorrelation for resonant anomaly detection</article-title>. <source>Phys. Rev. D</source> <volume>104</volume>:<fpage>035003</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevD.104.035003</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blance</surname> <given-names>A.</given-names></name> <name><surname>Spannowsky</surname> <given-names>M.</given-names></name> <name><surname>Waite</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <article-title>Adversarially-trained autoencoders for robust unsupervised new physics searches</article-title>. <source>J. High Energy Phys</source>. <volume>2019</volume>:<fpage>47</fpage>. <pub-id pub-id-type="doi">10.1007/JHEP10(2019)047</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cacciari</surname> <given-names>M.</given-names></name> <name><surname>Salam</surname> <given-names>G. P.</given-names></name> <name><surname>Soyez</surname> <given-names>G.</given-names></name></person-group> (<year>2008</year>). <article-title>The anti-<italic>k</italic><sub><italic>t</italic></sub> jet clustering algorithm</article-title>. <source>J. High Energy Phys</source>. <volume>4</volume>:<fpage>63</fpage>. <pub-id pub-id-type="doi">10.1088/1126-6708/2008/04/063</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cacciari</surname> <given-names>M.</given-names></name> <name><surname>Salam</surname> <given-names>G. P.</given-names></name> <name><surname>Soyez</surname> <given-names>G.</given-names></name></person-group> (<year>2012</year>). <article-title>FastJet user manual</article-title>. <source>Eur. Phys. J. C</source> <volume>72</volume>:<fpage>1896</fpage>. <pub-id pub-id-type="doi">10.1140/epjc/s10052-012-1896-2</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cerri</surname> <given-names>O.</given-names></name> <name><surname>Nguyen</surname> <given-names>T. Q.</given-names></name> <name><surname>Pierini</surname> <given-names>M.</given-names></name> <name><surname>Spiropulu</surname> <given-names>M.</given-names></name> <name><surname>Vlimant</surname> <given-names>J.-R.</given-names></name></person-group> (<year>2019</year>). <article-title>Variational autoencoders for new physics mining at the large hadron collider</article-title>. <source>J. High Energy Phys</source>. <volume>2019</volume>:<fpage>36</fpage>. <pub-id pub-id-type="doi">10.1007/JHEP05(2019)036</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><collab>CMS Collaboration</collab></person-group> (<year>2008</year>). <article-title>The CMS experiment at the CERN LHC</article-title>. <source>J. Instrum</source>. <volume>3</volume>, <fpage>S08004</fpage>. <pub-id pub-id-type="doi">10.1088/1748-0221/3/08/S08004</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><collab>CMS Collaboration</collab></person-group> (<year>2012</year>). <article-title>Observation of a New Boson at a mass of 125 GeV with the CMS experiment at the LHC</article-title>. <source>Phys. Lett. B</source> <volume>716</volume>, <fpage>30</fpage>&#x02013;<lpage>61</lpage>. <pub-id pub-id-type="doi">10.1016/j.physletb.2012.08.021</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><collab>CMS Collaboration</collab></person-group> (<year>2018</year>). <article-title>Search for narrow and broad dijet resonances in proton-proton collisions at <inline-formula><mml:math id="M26"><mml:msqrt><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msqrt></mml:math></inline-formula>=13 TeV and constraints on dark matter mediators and other new particles</article-title>. <source>J. High Energy Phys</source>. <volume>8</volume>:<fpage>130</fpage>.</citation>
</ref>
<ref id="B20">
<citation citation-type="web"><person-group person-group-type="author"><collab>CMS Collaboration</collab></person-group> (<year>2022a</year>). <source>CMS Summary Plots EXO 13 TeV</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://twiki.cern.ch/twiki/bin/view/CMSPublic/SummaryPlotsEXO13TeV">https://twiki.cern.ch/twiki/bin/view/CMSPublic/SummaryPlotsEXO13TeV</ext-link> (accessed September 14, 2022).</citation>
</ref>
<ref id="B21">
<citation citation-type="web"><person-group person-group-type="author"><collab>CMS Collaboration</collab></person-group> (<year>2022b</year>). <source>CMS Physics Results B2G</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://twiki.cern.ch/twiki/bin/view/CMSPublic/PhysicsResultsB2G">https://twiki.cern.ch/twiki/bin/view/CMSPublic/PhysicsResultsB2G</ext-link> (accessed September 14, 2022).</citation>
</ref>
<ref id="B22">
<citation citation-type="web"><person-group person-group-type="author"><collab>CMS Collaboration</collab></person-group> (<year>2022c</year>). <source>CMS Physics Results SUS</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://twiki.cern.ch/twiki/bin/view/CMSPublic/PhysicsResultsSUS">https://twiki.cern.ch/twiki/bin/view/CMSPublic/PhysicsResultsSUS</ext-link> (accessed September 14, 2022).</citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Collins</surname> <given-names>J. H.</given-names></name> <name><surname>Howe</surname> <given-names>K.</given-names></name> <name><surname>Nachman</surname> <given-names>B.</given-names></name></person-group> (<year>2019</year>). <article-title>Extending the search for new resonances with machine learning</article-title>. <source>Phys. Rev. D</source> <volume>99</volume>, <fpage>014038</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevD.99.014038</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Cuturi</surname> <given-names>M.</given-names></name></person-group> (<year>2013</year>). <source>Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Distances</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1306.0895">https://arxiv.org/abs/1306.0895</ext-link> (accessed March 15, 2022).</citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>D&#x00027;Agnolo</surname> <given-names>R. T.</given-names></name> <name><surname>Grosso</surname> <given-names>G.</given-names></name> <name><surname>Pierini</surname> <given-names>M.</given-names></name> <name><surname>Wulzer</surname> <given-names>A.</given-names></name> <name><surname>Zanetti</surname> <given-names>M.</given-names></name></person-group> (<year>2021</year>). <article-title>Learning multivariate new physics</article-title>. <source>Eur. Phys. J. C</source> <volume>81</volume>:<fpage>89</fpage>. <pub-id pub-id-type="doi">10.1140/epjc/s10052-021-08853-y</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>D&#x00027;Agnolo</surname> <given-names>R. T.</given-names></name> <name><surname>Wulzer</surname> <given-names>A.</given-names></name></person-group> (<year>2019</year>). <article-title>Learning new physics from a machine</article-title>. <source>Phys. Rev. D</source> <volume>99</volume>:<fpage>015014</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevD.99.015014</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>de Favereau</surname> <given-names>J.</given-names></name> <name><surname>Delaere</surname> <given-names>C.</given-names></name> <name><surname>Demin</surname> <given-names>P.</given-names></name> <name><surname>Giammanco</surname> <given-names>A.</given-names></name> <name><surname>Lema&#x000EE;tre</surname> <given-names>V.</given-names></name> <name><surname>Mertens</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>DELPHES 3, A modular framework for fast simulation of a generic collider experiment</article-title>. <source>J. High Energy Phys</source>. <volume>2</volume>:<fpage>57</fpage>. <pub-id pub-id-type="doi">10.1007/JHEP02(2014)057</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Durkan</surname> <given-names>C.</given-names></name> <name><surname>Bekasov</surname> <given-names>A.</given-names></name> <name><surname>Murray</surname> <given-names>I.</given-names></name> <name><surname>Papamakarios</surname> <given-names>G.</given-names></name></person-group> (<year>2019</year>). <source>Neural Spline Flows</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1906.04032">https://arxiv.org/abs/1906.04032</ext-link> (accessed March 15, 2022).</citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Durkan</surname> <given-names>C.</given-names></name> <name><surname>Bekasov</surname> <given-names>A.</given-names></name> <name><surname>Murray</surname> <given-names>I.</given-names></name> <name><surname>Papamakarios</surname> <given-names>G.</given-names></name></person-group> (<year>2020</year>). <source>nflows: normalizing flows in PyTorch</source> (<publisher-loc>Zenodo</publisher-loc>). <pub-id pub-id-type="doi">10.5281/zenodo.4296287</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eschle</surname> <given-names>J.</given-names></name> <name><surname>Puig Navarro</surname> <given-names>A.</given-names></name> <name><surname>Silva Coutinho</surname> <given-names>R.</given-names></name> <name><surname>Serra</surname> <given-names>N.</given-names></name></person-group> (<year>2020</year>). <article-title>zfit: Scalable pythonic fitting</article-title>. <source>SoftwareX</source> <volume>11</volume>:<fpage>100508</fpage>. <pub-id pub-id-type="doi">10.1016/j.softx.2020.100508</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Farina</surname> <given-names>M.</given-names></name> <name><surname>Nakai</surname> <given-names>Y.</given-names></name> <name><surname>Shih</surname> <given-names>D.</given-names></name></person-group> (<year>2020</year>). <article-title>Searching for new physics with deep autoencoders</article-title>. <source>Phys. Rev. D</source> <volume>101</volume>:<fpage>075021</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevD.101.075021</pub-id></citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hajer</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>Y.-Y.</given-names></name> <name><surname>Liu</surname> <given-names>T.</given-names></name> <name><surname>Wang</surname> <given-names>H.</given-names></name></person-group> (<year>2020</year>). <article-title>Novelty detection meets collider physics</article-title>. <source>Phys. Rev. D</source> <volume>101</volume>:<fpage>076015</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevD.101.076015</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hallin</surname> <given-names>A.</given-names></name> <name><surname>Isaacson</surname> <given-names>J.</given-names></name> <name><surname>Kasieczka</surname> <given-names>G.</given-names></name> <name><surname>Krause</surname> <given-names>C.</given-names></name> <name><surname>Nachman</surname> <given-names>B.</given-names></name> <name><surname>Quadfasel</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Classifying anomalies through outer density estimation</article-title>. <source>Phys. Rev. D.</source> <volume>106</volume>, <fpage>055006</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevD.106.055006</pub-id></citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heimel</surname> <given-names>T.</given-names></name> <name><surname>Kasieczka</surname> <given-names>G.</given-names></name> <name><surname>Plehn</surname> <given-names>T.</given-names></name> <name><surname>Thompson</surname> <given-names>J. M.</given-names></name></person-group> (<year>2019</year>). <article-title>QCD or what?</article-title> <source>SciPost Phys</source>. <volume>6</volume>:<fpage>30</fpage>. <pub-id pub-id-type="doi">10.21468/SciPostPhys.6.3.030</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jawahar</surname> <given-names>P.</given-names></name> <name><surname>Aarrestad</surname> <given-names>T.</given-names></name> <name><surname>Chernyavskaya</surname> <given-names>N.</given-names></name> <name><surname>Pierini</surname> <given-names>M.</given-names></name> <name><surname>Wozniak</surname> <given-names>K. A.</given-names></name> <name><surname>Ngadiuba</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Improving variational autoencoders for new physics detection at the LHC with normalizing flows</article-title>. <source>Front. Big Data</source> <volume>5</volume>:<fpage>803685</fpage>. <pub-id pub-id-type="doi">10.3389/fdata.2022.803685</pub-id><pub-id pub-id-type="pmid">35295683</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kasieczka</surname> <given-names>G.</given-names></name> <etal/></person-group> (<year>2021</year>). <article-title>The LHC Olympics 2020: a community challenge for anomaly detection in high energy physics</article-title>. <source>Rept. Prog. Phys</source>. <volume>84</volume>:<fpage>124201</fpage>. <pub-id pub-id-type="doi">10.1088/1361-6633/ac36b9</pub-id><pub-id pub-id-type="pmid">34736231</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kasieczka</surname> <given-names>G.</given-names></name> <name><surname>Nachman</surname> <given-names>B.</given-names></name> <name><surname>Shih</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <source>R&#x00026;D Dataset for LHC Olympics 2020 Anomaly Detection Challenge</source> (<publisher-loc>Zenodo</publisher-loc>). <pub-id pub-id-type="doi">10.5281/zenodo.4536377</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Kingma</surname> <given-names>D. P.</given-names></name> <name><surname>Ba</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <source>Adam: A Method for Stochastic Optimization</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1412.6980">https://arxiv.org/abs/1412.6980</ext-link> (accessed March 15, 2022).</citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kobyzev</surname> <given-names>I.</given-names></name> <name><surname>Prince</surname> <given-names>S. J.</given-names></name> <name><surname>Brubaker</surname> <given-names>M. A.</given-names></name></person-group> (<year>2021</year>). <article-title>Normalizing flows: an introduction and review of current methods</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell</source>. <volume>43</volume>, <fpage>3964</fpage>&#x02013;<lpage>3979</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2020.2992934</pub-id><pub-id pub-id-type="pmid">32396070</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Letizia</surname> <given-names>M.</given-names></name> <name><surname>Losapio</surname> <given-names>G.</given-names></name> <name><surname>Rando</surname> <given-names>M.</given-names></name> <name><surname>Grosso</surname> <given-names>G.</given-names></name> <name><surname>Wulzer</surname> <given-names>A.</given-names></name> <name><surname>Pierini</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Learning new physics efficiently with nonparametric methods</article-title>. <source>Eur. Phys. J. C</source> <volume>82</volume>:<fpage>879</fpage>. <pub-id pub-id-type="doi">10.1140/epjc/s10052-022-10830-y</pub-id><pub-id pub-id-type="pmid">36212113</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><collab>LHCb Collaboration</collab></person-group> (<year>2008</year>). <article-title>The LHCb Detector at the LHC</article-title>. <source>J. Instrum</source>. <volume>3</volume>:<fpage>S08005</fpage>. <pub-id pub-id-type="doi">10.1088/1748-0221/3/08/S08005</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><collab>LHCb Collaboration</collab></person-group> (<year>2020</year>). <article-title>Observation of structure in the <italic>J</italic>/&#x003C8; -pair mass spectrum</article-title>. <source>Sci. Bull</source>. <volume>65</volume>, <fpage>1983</fpage>&#x02013;<lpage>1993</lpage>. <pub-id pub-id-type="doi">10.1016/j.scib.2020.08.032</pub-id><pub-id pub-id-type="pmid">36659056</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><collab>LHCb Collaboration</collab></person-group> (<year>2021</year>). <source>Observation of an Exotic Narrow Doubly Charmed Tetraquark</source>. CERN-EP-2021-165, LHCb-PAPER-2021-031.</citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><collab>LHCb Collaboration</collab></person-group> (<year>2022</year>). <source>Observation of the Doubly Charmed Baryon Decay</source> <inline-formula><mml:math id="M27"><mml:msubsup><mml:mrow><mml:mi>&#x0039E;</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> to <inline-formula><mml:math id="M28"><mml:msubsup><mml:mrow><mml:mi>&#x0039E;</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msubsup><mml:msup><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002B;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>. LHCb-PAPER-2021-052, CERN-EP-2022-016.</citation>
</ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Metodiev</surname> <given-names>E. M.</given-names></name> <name><surname>Nachman</surname> <given-names>B.</given-names></name> <name><surname>Thaler</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>Classification without labels: learning from mixed samples in high energy physics</article-title>. <source>J. High Energy Phys</source>. <volume>10</volume>:<fpage>174</fpage>. <pub-id pub-id-type="doi">10.1007/JHEP10(2017)174</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nachman</surname> <given-names>B.</given-names></name> <name><surname>Shih</surname> <given-names>D.</given-names></name></person-group> (<year>2020</year>). <article-title>Anomaly detection with density estimation</article-title>. <source>Phys. Rev. D</source> <volume>101</volume>:<fpage>075042</fpage>. <pub-id pub-id-type="doi">10.1103/PhysRevD.101.075042</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paszke</surname> <given-names>A.</given-names></name> <name><surname>Gross</surname> <given-names>S.</given-names></name> <name><surname>Massa</surname> <given-names>F.</given-names></name> <name><surname>Lerer</surname> <given-names>A.</given-names></name> <name><surname>Bradbury</surname> <given-names>J.</given-names></name> <name><surname>Chanan</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>&#x0201C;PyTorch: an imperative style, high-performance deep learning library,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems, Vol. 32</source> (<publisher-loc>Curran Associates, Inc.</publisher-loc>), <fpage>8024</fpage>&#x02013;<lpage>8035</lpage>.</citation>
</ref>
<ref id="B48">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Rezende</surname> <given-names>D. J.</given-names></name> <name><surname>Mohamed</surname> <given-names>S.</given-names></name></person-group> (<year>2016</year>). <source>Variational Inference With Normalizing Flows</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1505.05770">https://arxiv.org/abs/1505.05770</ext-link> (accessed March 15, 2022).<pub-id pub-id-type="pmid">32200210</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Roy</surname> <given-names>T. S.</given-names></name> <name><surname>Vijay</surname> <given-names>A. H.</given-names></name></person-group> (<year>2019</year>). <source>A Robust Anomaly Finder Based on Autoencoders</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1903.02032">https://arxiv.org/abs/1903.02032</ext-link> (accessed March 14, 2022).</citation>
</ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rubner</surname> <given-names>Y.</given-names></name> <name><surname>Tomasi</surname> <given-names>C.</given-names></name> <name><surname>Guibas</surname> <given-names>L. J.</given-names></name></person-group> (<year>2000</year>). <article-title>The earth mover&#x00027;s distance as a metric for image retrieval</article-title>. <source>Int. J. Comput. Vis</source>. <volume>40</volume>, <fpage>99</fpage>&#x02013;<lpage>121</lpage>. <pub-id pub-id-type="doi">10.1023/A:1026543900054</pub-id></citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Simone</surname> <given-names>A.</given-names></name> <name><surname>Jacques</surname> <given-names>T.</given-names></name></person-group> (<year>2019</year>). <article-title>Guiding new physics searches with unsupervised learning</article-title>. <source>Eur. Phys. J. C</source> <volume>79</volume>:<fpage>289</fpage>. <pub-id pub-id-type="doi">10.1140/epjc/s10052-019-6787-3</pub-id></citation>
</ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sj&#x000F6;strand</surname> <given-names>T.</given-names></name> <name><surname>Mrenna</surname> <given-names>S.</given-names></name> <name><surname>Skands</surname> <given-names>P. Z.</given-names></name></person-group> (<year>2008</year>). <article-title>A brief introduction to PYTHIA 8.1</article-title>. <source>Comput. Phys. Commun</source>. <volume>178</volume>:<fpage>852</fpage>. <pub-id pub-id-type="doi">10.1016/j.cpc.2008.01.036</pub-id></citation>
</ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thaler</surname> <given-names>J.</given-names></name> <name><surname>Van Tilburg</surname> <given-names>K.</given-names></name></person-group> (<year>2011</year>). <article-title>Identifying boosted objects with n-subjettiness</article-title>. <source>J. High Energy Phys</source>. <volume>2011</volume>:<fpage>15</fpage>. <pub-id pub-id-type="doi">10.1007/JHEP03(2011)015</pub-id></citation>
</ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Villani</surname> <given-names>C.</given-names></name></person-group> (<year>2009</year>). <source>Optimal Transport: Old and New, Vol. 338</source>. <publisher-loc>Berlin; Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>, <fpage>93</fpage>&#x02013;<lpage>111</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-540-71050-9</pub-id></citation>
</ref>
</ref-list> 
</back>
</article>