<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">659986</article-id>
<article-id pub-id-type="doi">10.3389/fdata.2021.659986</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>CutLang v2: Advances in a Runtime-Interpreted Analysis Description Language for HEP Data</article-title>
<alt-title alt-title-type="left-running-head">Unel et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">CutLang V2</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Unel</surname>
<given-names>G.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1215490/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Sekmen</surname>
<given-names>S.</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/932232/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Toon</surname>
<given-names>A. M.</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Gokturk</surname>
<given-names>B.</given-names>
</name>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1268409/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Orgen</surname>
<given-names>B.</given-names>
</name>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1290343/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Paul</surname>
<given-names>A.</given-names>
</name>
<xref ref-type="aff" rid="aff5">
<sup>5</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1216578/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ravel</surname>
<given-names>N.</given-names>
</name>
<xref ref-type="aff" rid="aff6">
<sup>6</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1218548/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Setpal</surname>
<given-names>J.</given-names>
</name>
<xref ref-type="aff" rid="aff7">
<sup>7</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1215673/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>Department of Physics and Astronomy, University of California, Irvine, <addr-line>Irvine</addr-line>, <addr-line>CA</addr-line>, <country>United&#x20;States</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>Department of Physics, Kyungpook National University, <addr-line>Daegu</addr-line>, <country>South Korea</country>
</aff>
<aff id="aff3">
<label>
<sup>3</sup>
</label>Department of Computer Software Engineering, Saint Joseph University of Beirut, <addr-line>Beirut</addr-line>, <country>Lebanon</country>
</aff>
<aff id="aff4">
<label>
<sup>4</sup>
</label>Department of Physics, Bogazici University, <addr-line>Istanbul</addr-line>, <country>Turkey</country>
</aff>
<aff id="aff5">
<label>
<sup>5</sup>
</label>The Abdus Salam International Centre for Theoretical Physics, <addr-line>Trieste</addr-line>, <country>Italy</country>
</aff>
<aff id="aff6">
<label>
<sup>6</sup>
</label>Department of Physics, University of Ankatso, <addr-line>Antananarivo</addr-line>, <country>Madagascar</country>
</aff>
<aff id="aff7">
<label>
<sup>7</sup>
</label>R.N. Podar School, <addr-line>Mumbai</addr-line>, <country>India</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/689617/overview">Bo Jayatilaka</ext-link>, Fermilab (DOE), United&#x20;States</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/696037/overview">Stefano Belforte</ext-link>, National Institute of Nuclear Physics of Trieste, Italy</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/691851/overview">Lindsey Andrew Gray</ext-link>, Fermilab (DOE), United&#x20;States</p>
</fn>
<corresp id="c001">
<italic>&#x2a;Correspondence:</italic> G. Unel, <email>gokhan.unel@cern.ch</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Big Data and AI in High Energy Physics, a section of the journal Frontiers in Big&#x20;Data</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>07</day>
<month>06</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>4</volume>
<elocation-id>659986</elocation-id>
<history>
<date date-type="received">
<day>28</day>
<month>01</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>03</day>
<month>05</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Unel, Sekmen, Toon, Gokturk, Orgen, Paul, Ravel and Setpal.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Unel, Sekmen, Toon, Gokturk, Orgen, Paul, Ravel and Setpal</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>We present the latest developments in CutLang, the runtime interpreter of a recently developed analysis description language (ADL) for collider data analysis. ADL is a domain-specific, declarative language that describes the contents of an analysis in a standard and unambiguous way, independent of any computing framework. In ADL, analyses are written in human-readable plain text files, separating object, variable and event selection definitions in blocks, with a syntax that includes mathematical and logical operations, comparison and optimization operators, reducers, four-vector algebra and commonly used functions. Adopting ADLs would bring numerous benefits to the LHC experimental and phenomenological communities, ranging from analysis preservation beyond the lifetimes of experiments or analysis software to facilitating the abstraction, design, visualization, validation, combination, reproduction, interpretation and overall communication of the analysis contents. Since their initial release, ADL and CutLang have been used for implementing and running numerous LHC analyses. In this process, the original syntax of CutLang v1 has been modified for better ADL compatibility, and the interpreter has been adapted to work with that syntax, resulting in the current release, v2. Furthermore, CutLang has been enhanced to handle object combinatorics, to include tables and weights, to save events at any analysis stage and to benefit from multi-core/multi-CPU hardware, among other improvements. In this contribution, these and other enhancements are discussed in detail. In addition, real-life examples from LHC analyses are presented together with a user manual.</p>
</abstract>
<kwd-group>
<kwd>LHC</kwd>
<kwd>collider</kwd>
<kwd>run time analysis</kwd>
<kwd>analysis description language</kwd>
<kwd>CutLang</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction: Domain Specific Languages for High Energy Physics Analysis</title>
<p>High energy physics (HEP) collider data analyses are nowadays performed using complex software frameworks that integrate a diverse set of operations, from data access to event selection and from histogramming to statistical analysis. Mastering these frameworks requires high-level knowledge of general purpose languages and software architecture. Such requirements erect a barrier between the data and a physicist who may simply wish to try an analysis idea. Moreover, even for experienced physicists, obtaining a complete view of an analysis is difficult because the physics content (e.g., object definitions, event selections, background estimation methods) is often scattered throughout the different components of the framework. This makes developing, understanding, communicating and interpreting analyses very challenging. At the LHC, almost all analysis teams have their own frameworks. There are also frameworks like CheckMate (<xref ref-type="bibr" rid="B25">Drees et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B30">Kim et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B45">Tattersall et&#x20;al., 2016</xref>) and MadAnalysis (<xref ref-type="bibr" rid="B23">Conte et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B22">Conte and Fuks, 2014</xref>) for phenomenology studies, and Rivet (<xref ref-type="bibr" rid="B48">Waugh et&#x20;al., 2006</xref>; <xref ref-type="bibr" rid="B21">Buckley et&#x20;al., 2013</xref>), which focuses on preserving LHC analyses with unfolded results for comparison with Monte Carlo event generator predictions. Yet working with multiple frameworks is an extra challenge, since each framework implements the physics content in a different way.</p>
<p>It is therefore crucial to invest time in alternative approaches that aim for the rather elusive goal of an easy-to-learn, expressive, extensible and effective analysis ecosystem, one that would shift the focus away from programming technicalities and towards physics analysis design. One way to achieve this is via a well-constructed set of libraries in a general purpose language (GPL), supplemented with well-designed interfaces that intrinsically imply a standard and user-friendly analysis structure. A most promising example in this area is the Scientific Python ecosystem SciPy<xref ref-type="fn" rid="FN1">
<sup>1</sup>
</xref>, which brings together a popular GPL and a rich collection of existing building blocks for classic numerical methods, plotting and data processing. Frameworks for effective analysis can be built on the SciPy ecosystem, such as the Coffea framework (<xref ref-type="bibr" rid="B27">Gray et&#x20;al., 2020</xref>), which provides a user interface for columnar analysis of HEP&#x20;data.</p>
<p>The approach that we propose in this paper to address these difficulties is a domain specific language (DSL) capable of describing the analysis flow in a standard and unambiguous way. A DSL could be based on a completely original syntax, or on the syntax of a general purpose language such as Python. The important aspect is to provide a unique and organized way of expressing the analysis content. Applying the DSL concept to HEP analysis was first thoroughly explored as a community initiative by a group of experimentalists and phenomenologists at the 2015 Les Houches PhysTeV workshop, which led to the initial design of LHADA (Les Houches Analysis Description Accord), aimed at systematically documenting and running the contents of LHC physics analyses (<xref ref-type="bibr" rid="B16">Brooijmans et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B17">Brooijmans et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B18">Brooijmans et&#x20;al., 2020</xref>). At the same time, some of the LHADA designers were already developing CutLang (<xref ref-type="bibr" rid="B42">Sekmen and Unel, 2018</xref>; <xref ref-type="bibr" rid="B46">Unel et&#x20;al., 2019</xref>), an interpreted language directly executable on events. Since both were based on the same principles, LHADA and CutLang were merged in 2019 by combining the best ideas from each into a unified DSL called &#x201c;Analysis Description Language (ADL)&#x201d; (<xref ref-type="bibr" rid="B38">Prosper et&#x20;al., 2020</xref>), which is described in this&#x20;paper.</p>
<p>While the prototyping of LHADA, CutLang and ADL was in progress, parallel efforts arose in the LHC community with the aim of improving and systematizing analysis development infrastructures. One approach views each event as a database that can be queried using a language inspired by SQL, and has been prototyped in LINQtoROOT (<xref ref-type="bibr" rid="B26">Gordon, 2010</xref>) and FemtoCode (<xref ref-type="bibr" rid="B36">Pivarski, 2006a</xref>). The SQL-like model is being further explored in hep_tables and dataframe_expressions (<xref ref-type="bibr" rid="B47">Watts, 2020</xref>), which work together to allow easy columnar-like access to hierarchical data, and in the recent experimental language PartiQL (<xref ref-type="bibr" rid="B37">Pivarski, 2006b</xref>), designed to inject new ideas into DSL development, together with its extension AwkwardQL (<xref ref-type="bibr" rid="B28">Gray, 2020</xref>), designed to perform set operations on data expressed as awkward arrays. Another study explored building a DSL embedded within YAML to describe and manage analysis content such as definitions, event selection and histogramming, as well as to perform data processing. The YAML-based language was integrated into the generic Python framework F.A.S.T. (<xref ref-type="bibr" rid="B32">Krikler, 2020</xref>).</p>
<p>Dedicated DSL developments for analyses are relatively new, but a DSL has long been embedded within the ROOT framework (<xref ref-type="bibr" rid="B20">Brun and Rademakers, 1997</xref>) in the form of TTreeFormula, TTree::Draw and TTree::Scan, which allow visual or textual representation of TTree contents for simple and quick exploratory analysis. This DSL is, however, limited to simple arithmetic operations, mathematical functions and basic selection criteria. Recently, ROOT developers introduced RDataFrame, a tool to process and analyze columnar datasets as a modern alternative for data analysis (<xref ref-type="bibr" rid="B35">Piparo et&#x20;al., 2019</xref>). Although RDataFrame is not a DSL itself, it implements declarative analysis by using keywords for transformations (e.g., filtering data, defining new variables) and actions (e.g., creating histograms), and is interfaced to the ROOT classes TTreeReader and TTreeDraw. RDataFrame recently led to the development of the preliminary version of another DSL and its interpreter, called NAIL (Natural Analysis Implementation Language) (<xref ref-type="bibr" rid="B40">Rizzi, 2020a</xref>). NAIL is written in Python; it takes CMS NanoAOD (<xref ref-type="bibr" rid="B39">Rizzi, 2020b</xref>) as its input event format and generates RDataFrame-based C&#x2b;&#x2b; code, either as a C&#x2b;&#x2b; program or as a C&#x2b;&#x2b; library loadable with&#x20;ROOT.</p>
<p>All these different approaches and developments were discussed among experimentalists, phenomenologists and computer scientists in the first dedicated workshop &#x201c;Analysis Description Languages for the LHC&#x201d; at Fermilab, in May 2019.<xref ref-type="fn" rid="FN2">
<sup>2</sup>
</xref> The workshop resulted in an overall agreement on the potential usefulness of DSLs for HEP analysis, on the elements of a DSL scope, and on an inclination to pursue multiple alternatives with the ultimate goal of a common DSL for the LHC that combines the best elements of the different approaches (<xref ref-type="bibr" rid="B43">Sekmen et&#x20;al., 2020</xref>). DSL development activities are therefore ongoing at a fast&#x20;pace.</p>
<p>This initial positive feedback has motivated further progress in ADL, which is described here. ADL is a declarative language that can express the mathematical and logical algorithm of a physics analysis in a human-readable and standalone way, independent of any computing framework. Being declarative, ADL expresses the analysis logic without explicitly coding the control flow: it describes what needs to be done, but not how to do it. This leads to a tidier and more concise expression of the analysis and removes the opportunity for errors in hand-written control flow. In its current state, ADL is capable of describing many standard operations in LHC analyses, and it is being continuously improved and generalized to address an even wider range of analysis operations.</p>
<p>ADL is designed as a language that can be executed on data and used in real-life data analyses. An analysis written in ADL can be executed by any computing framework capable of parsing and interpreting ADL, which is how framework independence is achieved. Two approaches have been studied so far to realize this. One is the transpiler approach, where ADL is first converted into a general purpose language, which is in turn compiled into code executable on events. A transpiler called adl2tnm, converting ADL to C&#x2b;&#x2b; code, is currently under development (<xref ref-type="bibr" rid="B17">Brooijmans et&#x20;al., 2018</xref>). Earlier prototype transpilers converting LHADA into code snippets that could be integrated within the CheckMate (<xref ref-type="bibr" rid="B25">Drees et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B30">Kim et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B45">Tattersall et&#x20;al., 2016</xref>) and Rivet (<xref ref-type="bibr" rid="B48">Waugh et&#x20;al., 2006</xref>; <xref ref-type="bibr" rid="B21">Buckley et&#x20;al., 2013</xref>) frameworks were also studied. The other approach is runtime interpretation, where ADL is executed directly on events without first being converted into code that requires compilation. This approach was used for developing CutLang (<xref ref-type="bibr" rid="B42">Sekmen and Unel, 2018</xref>; <xref ref-type="bibr" rid="B46">Unel et&#x20;al., 2019</xref>).</p>
<p>In this paper, we focus on CutLang and present in detail its current state, denoted CutLang v2, which was reached after many improvements to the early prototype CutLang v1 introduced in (<xref ref-type="bibr" rid="B42">Sekmen and Unel, 2018</xref>). Hereafter, CutLang v2 will be referred to as CutLang for brevity. The main text emphasizes the novelties that led to ADL and the improved CutLang. We start with an overview of ADL in <xref ref-type="sec" rid="s2">Section 2</xref>, then describe the technicalities of runtime interpretation with CutLang in <xref ref-type="sec" rid="s3">Section 3</xref>. We next present the ADL file structure and the analysis components that can be expressed in ADL, focusing on new developments and recently added functionalities, in <xref ref-type="sec" rid="s4">Section 4</xref>. This is followed by <xref ref-type="sec" rid="s5">Section 5</xref> describing the analysis output, again focusing on new additions, <xref ref-type="sec" rid="s6">Section 6</xref> explaining the newly added multi-threaded run functionality, <xref ref-type="sec" rid="s7">Section 7</xref> on CutLang code maintenance and recently incorporated continuous integration, <xref ref-type="sec" rid="s8">Section 8</xref> detailing studies on analysis implementations, and conclusions in <xref ref-type="sec" rid="s9">Section 9</xref>. The full description of the current language syntax is given in the form of a user manual in <xref ref-type="sec" rid="s13">Supplementary Appendix A</xref>, followed by a note on the CutLang framework and external user functions in <xref ref-type="sec" rid="s13">Supplementary Appendix&#x20;B</xref>.</p>
</sec>
<sec id="s2">
<title>2 Analysis Description Language Overview: File and Functions</title>
<p>In ADL, the analysis flow is described in a plain, easy-to-read text file. In this ADL file, object, variable and event selection definitions are clearly separated into blocks with a keyword value/expression structure, where keywords specify analysis concepts and operations. The syntax includes mathematical and logical operations, comparison and optimization operators, reducers, 4-vector algebra and HEP-specific functions (e.g., <inline-formula id="inf1">
<mml:math id="m1">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf2">
<mml:math id="m2">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>). However, an analysis may contain variables whose algorithms are too complex to express easily in the ADL syntax [e.g., <inline-formula id="inf3">
<mml:math id="m3">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> (<xref ref-type="bibr" rid="B15">Barr et&#x20;al., 2003</xref>), aplanarity] or non-analytic variables (e.g., efficiency tables, machine learning discriminators). Such variables are encapsulated in self-contained, standalone functions that accompany the ADL file, and the variables they define are referred to from within the ADL file. As a general rule, all keywords, operators and function names are case-insensitive. The language content, syntax rules and working examples of self-contained functions are presented in the coming sections, after a technical introduction to the CutLang interpreter.</p>
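<p>To make the HEP-specific helper functions concrete, the following minimal Python sketch computes <italic>d&#x3d5;</italic> and <italic>dR</italic>, assuming the common convention that the azimuthal difference is folded into [&#x2212;&#x3c0;, &#x3c0;), and dispatches function names case-insensitively. It is an illustration only, not CutLang&#x2019;s C&#x2b;&#x2b; code; the names <monospace>delta_phi</monospace>, <monospace>delta_r</monospace>, <monospace>FUNCTIONS</monospace> and <monospace>call_function</monospace> are hypothetical.</p>

```python
import math


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal-angle difference folded into [-pi, pi)."""
    dphi = phi1 - phi2
    # Python's % with a positive divisor returns a value in [0, 2*pi),
    # so shifting by pi before and after maps dphi into [-pi, pi).
    return (dphi + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Angular distance dR = sqrt(deta**2 + dphi**2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


# Hypothetical function table: names are normalized to lower case before
# lookup, mirroring ADL's case-insensitivity rule for function names.
FUNCTIONS = {"dphi": delta_phi, "dr": delta_r}


def call_function(name: str, *args: float) -> float:
    return FUNCTIONS[name.lower()](*args)
```

<p>For example, <monospace>call_function("dR", 0.0, 0.0, 0.3, 0.4)</monospace> returns 0.5 up to floating-point rounding, and spelling the name as <monospace>DR</monospace> or <monospace>dr</monospace> gives the same result.</p>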
</sec>
<sec id="s3">
<title>3 Technical Background of the CutLang Interpreter</title>
<p>An interpreted analysis system makes it more practical to add new event selection criteria, change the execution order or cancel analysis steps. CutLang was therefore designed as a runtime interpreter, bypassing the inherent inefficiency of the modify-compile-run cycle. Keeping the analysis description out of the framework code also brings the major advantage of being able to run many alternative analysis ideas in parallel without any code changes, which makes the analysis design phase more flexible than in the conventional compiled-framework approach.</p>
<p>The CutLang runtime interpreter is written in C&#x2b;&#x2b; and built around function pointer trees representing different operations such as event selection or histogramming. Processing an event through a cutflow table is therefore equivalent to traversing multiple expression trees of arbitrary complexity, such as the one shown in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>, where physics objects are passed to the functions as arguments.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>An expression tree example: the program traverses the tree from right to left evaluating the encountered functions from bottom to&#x20;top.</p>
</caption>
<graphic xlink:href="fdata-04-659986-g001.tif"/>
</fig>
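<p>The expression-tree idea can be sketched in a few lines of Python, with callables standing in for C&#x2b;&#x2b; function pointers. This is a hypothetical illustration, not the CutLang source: the event layout (a list of jet transverse momenta) and the node helpers are assumptions. Each node evaluates its children before applying its own function, so values flow from the bottom of the tree to the top as in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>; the children are simply visited left to right here.</p>

```python
from typing import Callable, Dict, List

# Hypothetical event layout: collection name -> list of jet pT values in GeV.
Event = Dict[str, List[float]]


class Node:
    """Expression-tree node: a function applied to the results of its children."""

    def __init__(self, func: Callable, *children: "Node") -> None:
        self.func = func  # stands in for a C++ function pointer
        self.children = children

    def evaluate(self, event: Event):
        # Children are evaluated first, so values flow from the leaves to the root.
        args = [child.evaluate(event) for child in self.children]
        return self.func(event, *args)


def const(value: float) -> Node:
    return Node(lambda event: value)


def n_jets() -> Node:
    return Node(lambda event: len(event["jets"]))


def leading_jet_pt() -> Node:
    return Node(lambda event: max(event["jets"], default=0.0))


def greater(a: Node, b: Node) -> Node:
    return Node(lambda event, x, y: x > y, a, b)


def greater_equal(a: Node, b: Node) -> Node:
    return Node(lambda event, x, y: x >= y, a, b)


def both(a: Node, b: Node) -> Node:
    return Node(lambda event, x, y: bool(x and y), a, b)


# Hypothetical selection: at least two jets and a leading jet above 50 GeV.
cut = both(greater_equal(n_jets(), const(2)), greater(leading_jet_pt(), const(50.0)))
```

<p>With this tree, an event with jets of 72 and 31&#x20;GeV passes the cut, while an event with a single 72&#x20;GeV jet fails it. Changing the selection only means building a different tree at runtime, with no recompilation.</p>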
<p>Lorentz vector operations, pseudo-random number generation, and input/output file and histogram manipulation are all handled by classes of the ROOT data analysis framework (<xref ref-type="bibr" rid="B20">Brun and Rademakers, 1997</xref>). The parsing of the ADL text itself relies on dictionaries and a grammar automatically generated with the traditional Unix tools Lex and Yacc.<xref ref-type="fn" rid="FN3">
<sup>3</sup>
</xref> The ADL file is split into tokens by Lex, and the hierarchical structure of the algorithm is found by Yacc. Consequently, CutLang can be compiled and operated in any modern Unix-like environment. The interpreter needs to be compiled only once, during installation or when optional external functions for complex variables are added. Once the work environment is set up, the remainder is mostly a think-edit-run-observe cycle. The parsing tools also catch user syntax mistakes: CutLang output clearly indicates the problem and the line number of the offending syntax. However, logical inconsistencies, such as imposing a selection on the third jet&#x2019;s momentum while requiring only at least two jets, are not yet detected, so ensuring the consistency of the algorithm is left to the user. CutLang reads its input from and writes its output to ROOT files. The input files and event formats are described below, while the output file and its contents are described in <xref ref-type="sec" rid="s5">Section&#x20;5</xref>.</p>
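<p>The link between tokenization and line-numbered error reporting can be illustrated with a small Python tokenizer built on the standard <monospace>re</monospace> module. This is a sketch under stated assumptions rather than CutLang&#x2019;s Lex specification: the token classes are hypothetical and cover only a small ADL-like subset, and no Yacc-style grammar is applied. It only shows how tokens can carry line numbers so that an offending character is reported by line.</p>

```python
import re
from typing import List, Tuple

# Hypothetical token classes for a small ADL-like subset; order matters,
# because the first alternative that matches wins.
TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),
    ("NUMBER", r"\d+(?:\.\d*)?"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[\x3c>=!]=|[\x3c>=+\-*/(),\[\]]"),  # \x3c is the less-than sign
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t]+"),
    ("ERROR", r"."),
]
# One capturing group per token class; m.lastindex tells which one matched.
MASTER = re.compile("|".join("(" + pattern + ")" for _, pattern in TOKEN_SPEC))


def tokenize(text: str) -> List[Tuple[str, str, int]]:
    """Split text into (kind, value, line) tuples, reporting bad input by line."""
    tokens = []
    line = 1
    for m in MASTER.finditer(text):
        kind = TOKEN_SPEC[m.lastindex - 1][0]
        value = m.group()
        if kind == "NEWLINE":
            line += 1
        elif kind == "ERROR":
            raise SyntaxError(f"line {line}: unexpected character {value!r}")
        elif kind not in ("SKIP", "COMMENT"):
            tokens.append((kind, value, line))
    return tokens
```

<p>For instance, tokenizing a file whose second line is <monospace>select met &gt; 50.5</monospace> yields <monospace>("OP", "&gt;", 2)</monospace> among its tokens, and an unexpected character such as <monospace>@</monospace> raises a <monospace>SyntaxError</monospace> that names the line.</p>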
<sec id="s3-1">
<title>3.1 Event Input</title>
<p>The CutLang framework takes the input event information in the ROOT ntuple format and can work with different input event data types, each implemented as a plug-in. Widely used event formats such as ATLAS and CMS open data,<xref ref-type="fn" rid="FN4">
<sup>4</sup>
</xref> CMS NanoAOD (<xref ref-type="bibr" rid="B39">Rizzi, 2020b</xref>), Delphes (<xref ref-type="bibr" rid="B24">de Favereau et&#x20;al., 2014</xref>) and LHCO events are recognized by default and can be used directly. New or custom input event formats can also be added using event class headers, following a well-defined procedure described in <xref ref-type="sec" rid="s13">Supplementary Appendix B.3</xref>. Changes to existing event formats and the addition of new ones currently have to be handled manually following this procedure. CutLang has its own internal event format, called LVL0. The contents of the input event formats, including all particle types and event properties, are passed through an internal abstraction layer and adapted to LVL0, which in turn connects to the ADL syntax. The purpose of this approach is to make ADL independent of the input file format, so that the same ADL analysis can be run with CutLang on any supported input file. As a consequence, only a subset of the event content is readily recognized by CutLang when expressed in the ADL syntax. However, any event variables or attributes included in the existing event files and formats can easily be added through external user functions, after which they can be referred to within ADL files and recognized by CutLang. The practical details of this procedure can be found in <xref ref-type="sec" rid="s13">Supplementary Appendix&#x20;B.2</xref>.</p>
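<p>The plug-in idea can be sketched in Python as a registry of reader functions that translate each input format into one common internal representation. Everything here is hypothetical: the real plug-ins are C&#x2b;&#x2b; classes built from event class headers, the dictionary is only a stand-in for LVL0, and the raw events are modelled as dictionaries of per-branch arrays whose names imitate the Delphes and NanoAOD naming style for illustration.</p>

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the internal LVL0 representation:
# standard collection name -> list of objects (attribute dictionaries).
InternalEvent = Dict[str, List[Dict[str, float]]]
Reader = Callable[[dict], InternalEvent]

READERS: Dict[str, Reader] = {}


def register(fmt: str) -> Callable[[Reader], Reader]:
    """Decorator that adds a reader plug-in for one input format."""
    def decorate(reader: Reader) -> Reader:
        READERS[fmt] = reader
        return reader
    return decorate


@register("delphes")
def read_delphes(raw: dict) -> InternalEvent:
    jets = [{"pt": pt, "eta": eta} for pt, eta in zip(raw["Jet.PT"], raw["Jet.Eta"])]
    return {"Jet": jets}


@register("nanoaod")
def read_nanoaod(raw: dict) -> InternalEvent:
    jets = [{"pt": pt, "eta": eta} for pt, eta in zip(raw["Jet_pt"], raw["Jet_eta"])]
    return {"Jet": jets}


def to_internal(fmt: str, raw: dict) -> InternalEvent:
    """Dispatch a raw event to the reader registered for its format."""
    if fmt not in READERS:
        raise KeyError(f"no reader plug-in registered for format '{fmt}'")
    return READERS[fmt](raw)
```

<p>Because both readers return the same structure, code downstream of the registry, which plays the role of the ADL layer here, never needs to know which format an event came from; supporting another format amounts to registering one more reader.</p>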
</sec>
</sec>
<sec id="s4">
<title>4 Description of the Analysis Contents</title>
<p>We now explain in detail which analysis components and physics algorithms can be described in ADL and processed with CutLang, highlighting in particular the many novelties and improvements introduced since the original versions, CutLang v1 and LHADA. The descriptions here concentrate on the concepts and content that can be expressed and processed with ADL and on the functionalities of CutLang v2, rather than attempting to give a full layout of the syntax rules, which is provided separately in the user manual in <xref ref-type="sec" rid="s13">Supplementary Appendix&#x20;A</xref>.</p>
<sec id="s4-1">
<title>4.1 Analysis Description Language File Structure in CutLang</title>
<p>As a runtime interpreter, CutLang processes events in a well-defined order, executing the commands in the ADL file from top to bottom. The ADL file is therefore required to describe the analysis flow in a certain order. Some dedicated execution commands are also used within the ADL file to facilitate runtime interpretation. Throughout the ADL file, masses, energies and momenta are all expressed in GeV, and angles in radians. User comments and explanations should be preceded by a hash (&#x23;) sign. To be executable with CutLang, an ADL file consists of up to six sections, described below, of which only the event categorization section is mandatory:<list list-type="simple">
<list-item>
<p>
<bold>initializations:</bold> This section contains commands related to analysis initialization and set-up; the relevant keywords are summarized in <xref ref-type="table" rid="T1">Table&#x20;1</xref>. Keywords and values are separated by an equal sign. The TRGm and TRGe keywords refer to the muon and electron triggers, respectively. Their utilization is described in <xref ref-type="sec" rid="s13">Supplementary Appendix A.2</xref>; it is worth noting here that Monte Carlo (MC) simulation weights are not taken into account when the trigger value is set to&#x20;data.</p>
</list-item>
<list-item>
<p>
<bold>countformats:</bold> This section is used for recording already existing event counts and errors, e.g., from an experimental publication. It is therefore not directly relevant for event processing, but rather for studying the interplay between the results of the current analysis and its published experimental counterpart. More generally, it is used to express any set of pre-existing counts of various signals, backgrounds and data (together with their errors) of an analysis.</p>
</list-item>
<list-item>
<p>
<bold>definitions1:</bold> This section is used for defining aliases for objects and variables, in order to make them easier to refer to and read in the rest of the analysis description. For example, it can introduce shortcuts like Zhreco for a hadronically reconstructed Z boson, or values like mH, i.e.,&#x20;the mass of a reconstructed Higgs boson. These definitions can only be based on the predefined keywords and objects.</p>
</list-item>
<list-item>
<p>
<bold>objects:</bold> This section can be used to define new objects based on predefined physics objects and shorthand notations declared in definitions1.</p>
</list-item>
<list-item>
<p>
<bold>definitions2:</bold> This section is allocated for further alias or shorthand definitions. Definitions here can be based on objects in the previous section and predefined particles.</p>
</list-item>
<list-item>
<p>
<bold>event categorization:</bold> This section is used for defining event selection regions and criteria in each region. Running with CutLang requires having at least one selection region with at least one command, which may include either a selection criterion or a special instruction to include MC weight factors or to fill histograms.</p>
</list-item>
</list>
</p>
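<p>The section layout described above can be checked mechanically. The following Python sketch assumes, as one reading of the text, that the sections must appear in the listed order, each at most once, and it enforces the stated requirements that an event categorization section exists and that at least one region contains at least one command. The section labels and the function <monospace>check_layout</monospace> are hypothetical and are not CutLang keywords or API.</p>

```python
from typing import Dict, List

# Section labels in the order listed in the text (illustrative strings only).
SECTION_ORDER = [
    "initializations", "countformats", "definitions1",
    "objects", "definitions2", "event categorization",
]


def check_layout(sections: List[str], regions: Dict[str, List[str]]) -> None:
    """Raise ValueError if the section order or region content is invalid."""
    last = -1
    for name in sections:
        if name not in SECTION_ORDER:
            raise ValueError(f"unknown section: {name}")
        idx = SECTION_ORDER.index(name)
        # Assumed rule: each section appears at most once, in the listed order.
        if last >= idx:
            raise ValueError(f"section '{name}' is out of order")
        last = idx
    if "event categorization" not in sections:
        raise ValueError("the event categorization section is mandatory")
    # At least one region must hold at least one command.
    if not any(regions.values()):
        raise ValueError("at least one region with at least one command is required")
```

<p>A file containing only object definitions and one region with a single selection passes, whereas placing the objects section before definitions1 or leaving every region empty is rejected.</p>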
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Initialization keywords and their possible values.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Keyword</th>
<th align="center">Explanation</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">SkipHistos</td>
<td align="left">Skip (&#x3d;1) or Display (&#x3d;0) the histograms in final efficiency table</td>
</tr>
<tr>
<td align="left">SkipEffs</td>
<td align="left">Skip (&#x3d;1) or Display (&#x3d;0) the final efficiency table</td>
</tr>
<tr>
<td align="left">TRGm</td>
<td align="left">0&#x3d;Off, 1&#x3d;Data, 2&#x3d;MC for muons</td>
</tr>
<tr>
<td align="left">TRGe</td>
<td align="left">0&#x3d;Off, 1&#x3d;Data, 2&#x3d;MC for electrons</td>
</tr>
<tr>
<td align="left">RandomSeed</td>
<td align="left">random number generator seed, an integer</td>
</tr>
</tbody>
</table>
</table-wrap>
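<p>As a sketch of how the <monospace>keyword = value</monospace> lines of the initialization section could be read and validated against <xref ref-type="table" rid="T1">Table&#x20;1</xref>, the following Python code lowercases keywords (ADL keywords being case-insensitive), strips &#x23; comments and rejects values outside the listed ranges. The parser is hypothetical, and <monospace>mc_weights_enabled</monospace> encodes our assumed reading that MC weights are ignored whenever either trigger keyword is set to data (1).</p>

```python
from typing import Dict, Iterable

# Allowed values taken from Table 1; None means any integer (RandomSeed).
ALLOWED = {
    "skiphistos": {0, 1},
    "skipeffs": {0, 1},
    "trgm": {0, 1, 2},  # muons: 0 = off, 1 = data, 2 = MC
    "trge": {0, 1, 2},  # electrons: 0 = off, 1 = data, 2 = MC
    "randomseed": None,
}


def parse_initializations(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'keyword = value' lines into a dict keyed by lower-case keyword."""
    settings: Dict[str, int] = {}
    for raw in lines:
        text = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not text:
            continue
        key, sep, value = text.partition("=")
        if not sep:
            raise ValueError(f"expected 'keyword = value', got {raw!r}")
        key = key.strip().lower()  # keywords are case-insensitive
        if key not in ALLOWED:
            raise ValueError(f"unknown initialization keyword {key!r}")
        number = int(value.strip())
        allowed = ALLOWED[key]
        if allowed is not None and number not in allowed:
            raise ValueError(f"{key} = {number} is outside {sorted(allowed)}")
        settings[key] = number
    return settings


def mc_weights_enabled(settings: Dict[str, int]) -> bool:
    # Assumed reading of the text: MC weights are ignored when a trigger is set to data.
    return 1 not in (settings.get("trgm", 0), settings.get("trge", 0))
```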
<p>We next describe the detailed contents and usage of these sections.</p>
</sec>
<sec id="s4-2">
<title>4.2 Object Definitions</title>
<p>Generally, the starting point in an analysis algorithm is defining and selecting the collections of objects, such as jets, <italic>b</italic> jets, electrons, muons, taus and missing transverse energy, that will be used in the next steps of the analysis. Usually, the input events contain very generic, loosely selected object collections, which need to be further refined for analysis needs. CutLang is capable of performing a large variety of operations on objects, including deriving new objects via selection, combining objects to reconstruct new objects, and accessing the daughters and constituents of objects. Once an object is defined, it is also possible to find the objects with the minimum or maximum value of a given attribute within the object&#x2019;s collection, or to sort the collection according to an attribute.</p>
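<p>The object operations listed above (selection, sorting by an attribute, finding an extremum and combining objects into new ones) can be sketched in Python as follows. The <monospace>Particle</monospace> class and helper names are hypothetical; the four-vector arithmetic uses the standard relations p<sub>x</sub> &#x3d; p<sub>T</sub>&#x2009;cos&#x2009;&#x3d5;, p<sub>y</sub> &#x3d; p<sub>T</sub>&#x2009;sin&#x2009;&#x3d5;, p<sub>z</sub> &#x3d; p<sub>T</sub>&#x2009;sinh&#x2009;&#x3b7;, and the pairing criterion (mass closest to a target value) is just one example of a combinatoric choice.</p>

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Tuple


@dataclass
class Particle:
    pt: float  # GeV
    eta: float
    phi: float  # radians
    m: float = 0.0  # GeV


def select(objs: List[Particle], keep: Callable[[Particle], bool]) -> List[Particle]:
    """Derive a new collection by selection."""
    return [o for o in objs if keep(o)]


def sort_by(objs: List[Particle], attr: str) -> List[Particle]:
    """Sort a collection by an attribute, highest value first."""
    return sorted(objs, key=lambda o: getattr(o, attr), reverse=True)


def four_vector(p: Particle) -> Tuple[float, float, float, float]:
    """Return (E, px, py, pz) from (pt, eta, phi, m)."""
    px, py = p.pt * math.cos(p.phi), p.pt * math.sin(p.phi)
    pz = p.pt * math.sinh(p.eta)
    e = math.sqrt(px * px + py * py + pz * pz + p.m * p.m)
    return e, px, py, pz


def invariant_mass(a: Particle, b: Particle) -> float:
    """Mass of the combined four-vector a + b."""
    e, px, py, pz = (x + y for x, y in zip(four_vector(a), four_vector(b)))
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def best_pair(objs: List[Particle], target: float) -> Tuple[Particle, Particle]:
    """Combine objects pairwise and keep the pair whose mass is closest to target."""
    return min(combinations(objs, 2), key=lambda pair: abs(invariant_mass(*pair) - target))
```

<p>The object with the largest value of an attribute is the first element returned by <monospace>sort_by</monospace>, and the one with the smallest value is the last.</p>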
<p>In the ADL notation, object collection definitions are clearly separated from the other analysis tasks. Here the term object is used interchangeably with object collection. Each object is defined within an individual object block, uniquely identified by the object&#x2019;s name. Each block starts with the name(s) of the input object collection(s), followed by a list of operations of different types.</p>
<p>CutLang automatically retrieves all standard object collections from the input event file without the need for any explicit user statements within the ADL file. It can read events in different formats, such as Delphes fast simulation (<xref ref-type="bibr" rid="B24">de Favereau et&#x20;al., 2014</xref>) output, CMS NanoAOD (<xref ref-type="bibr" rid="B39">Rizzi, 2020b</xref>), and ATLAS or CMS open data<sup>4</sup> ntuples, and recognize the object collections in these. One property unique to CutLang is that it maps input object collections to common, standard object names with a standard set of attributes, as described in <xref ref-type="sec" rid="s13">Supplementary Appendices A.2 and A.3</xref>. For example, the AK4jets collection in CMS NanoAOD and the JET collection in Delphes are both mapped to Jet. This approach allows the same ADL file to be processed on different input event formats, and has proven very useful in several simple practical applications. However, we also recognize that this approach only works when different input collections have matching properties, e.g., when Delphes electrons and CMS electrons have the same identification criteria, which can be mapped to the same identification attribute, or when a Delphes jet and an ATLAS jet use the same b-tagging algorithm, which can be mapped to the same b-tagging attribute. Therefore, other interpreters of ADL may choose to use input collection and attribute names as they are, in order to avoid ambiguity. Allowing different approaches suited to different use cases, while still adhering to the principle of clarity, is a significant aspect of&#x20;ADL.</p>
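<p>The name mapping described above can be pictured as a lookup table keyed by input format and collection name. The following Python sketch is illustrative only: it is not CutLang&#x2019;s implementation, the function names are hypothetical, and only the two jet entries are taken from the text.</p>
<preformat preformat-type="code">
```python
# Hypothetical sketch: map format-specific collection names onto common names.
# The two jet entries follow the example in the text; everything else is illustrative.
STANDARD_NAMES = {
    ("CMSNanoAOD", "AK4jets"): "Jet",
    ("Delphes", "JET"): "Jet",
}


def standard_name(input_format, collection):
    """Return the common collection name, or the input name if no mapping is known."""
    return STANDARD_NAMES.get((input_format, collection), collection)


def map_event(input_format, event):
    """Rename the collections of one event (a dict: name -> list of objects)."""
    return {standard_name(input_format, name): objs for name, objs in event.items()}
```
</preformat>
<p>Unmapped collections keep their original names, which corresponds to the alternative approach, mentioned above, of using input names as they are.</p>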
<p>The most common object operation is to take an input object collection and filter a subset of it by applying a set of selection criteria on object attributes. This is done straightforwardly in ADL by listing each selection criterion on consecutive lines. The objects in the input collection satisfying the criteria can be either selected or rejected using the select or reject keywords. Comparison operators such as <inline-formula id="inf4">
<mml:math id="m4">
<mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mo>!</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mo>&#x3e;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mo>&#x3c;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mo>&#x3e;</mml:mo>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mo>&#x3c;</mml:mo>
<mml:mo>&#x3d;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> can be used directly for expressing the criteria. The logical operators AND, OR and NOT can be used for expressing composite or inverted criteria. A complete list of these operators can be found in <xref ref-type="sec" rid="s13">Supplementary Appendix&#x20;A.5</xref>.</p>
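<p>The consecutive select and reject lines of an object block act as successive filters on the collection. A minimal Python sketch of these semantics follows. It does not reflect CutLang&#x2019;s internal code, and the dictionary-based objects, the function name and the thresholds are assumptions for illustration.</p>
<preformat preformat-type="code">
```python
def filter_objects(objects, criteria):
    """Apply (action, predicate) pairs in order, like consecutive ADL lines.

    "select" keeps the objects passing the predicate, "reject" drops them.
    """
    result = list(objects)
    for action, predicate in criteria:
        if action == "select":
            result = [o for o in result if predicate(o)]
        elif action == "reject":
            result = [o for o in result if not predicate(o)]
        else:
            raise ValueError("unknown action: " + action)
    return result


jets = [{"pt": 45.0, "eta": 1.2}, {"pt": 25.0, "eta": 0.3}, {"pt": 60.0, "eta": 3.1}]
good_jets = filter_objects(jets, [
    ("select", lambda j: j["pt"] > 30.0),       # illustrative pT threshold
    ("reject", lambda j: abs(j["eta"]) > 2.4),  # drop forward jets
])
```
</preformat>
<p>Composite criteria built with AND, OR and NOT correspond to Python&#x2019;s and, or and not inside a single predicate.</p>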
<p>It is also possible to filter an object collection based on other object collections, such as in the cases of object cleaning or matching. For example, one can reject jets overlapping with photons, or select boosted W jets matching generator level W bosons. Such operations involve intrinsic loops, which are readily handled by CutLang. Functions such as <inline-formula id="inf5">
<mml:math id="m5">
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> or angular distance <inline-formula id="inf6">
<mml:math id="m6">
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> can be readily used when comparing objects. Given an initial object collection, one can consecutively derive several objects. For example, jets can be filtered to obtain cleanjets, while cleanjets can be further filtered to obtain verycleanjets. One can also use the same initial collection to define different collections, e.g., by imposing different criteria on muons to obtain loosemuons and tightmuons.</p>
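<p>The implicit loop in such cleaning criteria can be made explicit in a short Python sketch. It assumes objects are dictionaries with eta and phi keys and uses an illustrative cone size of 0.4; the function names are hypothetical and not part of CutLang.</p>
<preformat preformat-type="code">
```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the interval [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(a, b):
    """Angular distance sqrt(deta^2 + dphi^2) between two objects."""
    return math.hypot(a["eta"] - b["eta"], delta_phi(a["phi"], b["phi"]))


def clean(objects, others, min_dr=0.4):
    """Keep objects farther than min_dr from every object in 'others' (implicit loop)."""
    return [o for o in objects if all(delta_r(o, x) > min_dr for x in others)]


jets = [{"eta": 0.0, "phi": 0.1}, {"eta": 1.0, "phi": 3.1}]
photons = [{"eta": 0.05, "phi": 0.0}]
clean_jets = clean(jets, photons)                            # first jet overlaps a photon
very_clean_jets = clean(clean_jets, photons, min_dr=1.0)     # derived from a derived collection
```
</preformat>
<p>The second call derives a further collection from an already derived one, as in the cleanjets and verycleanjets example.</p>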
<p>Another very common operation is to combine objects to reconstruct new objects, such as combining two leptons to form a <italic>Z</italic> boson. Sometimes the reconstruction is very straightforward, as when only a single <italic>Z</italic> boson is to be reconstructed per event. In other cases, however, one might have to reconstruct as many <italic>Z</italic> bosons as possible. In each case, the reconstructed candidates might be filtered, or the single most optimal candidate might be selected among all candidates. Combination operations are very diverse, and finding a completely generic expression for them is non-trivial. In v1, CutLang could reconstruct an explicitly defined number of objects per event, and could find the object satisfying given criteria by performing optimization operations. In v2, CutLang has been generalized to reconstruct any number of objects by taking the combinatorics into account. Selection criteria can also be imposed on both the input and the reconstructed objects. Technical information on how to perform combinations is provided in <xref ref-type="sec" rid="s13">Supplementary Appendix&#x20;A.9.3</xref>.</p>
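<p>The combinatorics behind such a reconstruction can be sketched in Python for the <italic>Z</italic> example. All opposite-charge lepton pairs are formed, and the pair whose invariant mass is closest to the <italic>Z</italic> mass is kept. This is an illustration under assumed dictionary inputs (pt, eta, phi and m in GeV, charge q), not CutLang&#x2019;s algorithm, and the helper names are hypothetical.</p>
<preformat preformat-type="code">
```python
import itertools
import math

Z_MASS = 91.1876  # GeV, PDG value of the Z boson mass


def four_vector(p):
    """Return (E, px, py, pz) from pt, eta, phi (radians) and mass m, in GeV."""
    px = p["pt"] * math.cos(p["phi"])
    py = p["pt"] * math.sin(p["phi"])
    pz = p["pt"] * math.sinh(p["eta"])
    e = math.sqrt(px**2 + py**2 + pz**2 + p["m"] ** 2)
    return e, px, py, pz


def invariant_mass(a, b):
    """Invariant mass of the sum of two four-vectors."""
    e, px, py, pz = (x + y for x, y in zip(four_vector(a), four_vector(b)))
    return math.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))


def z_candidates(leptons):
    """Every opposite-charge pair as (i, j, mass): the full combinatorics."""
    return [
        (i, j, invariant_mass(leptons[i], leptons[j]))
        for i, j in itertools.combinations(range(len(leptons)), 2)
        if leptons[i]["q"] + leptons[j]["q"] == 0
    ]


def best_z(leptons):
    """The single candidate with mass closest to the Z mass, or None."""
    candidates = z_candidates(leptons)
    if not candidates:
        return None
    return min(candidates, key=lambda c: abs(c[2] - Z_MASS))


leptons = [
    {"pt": 50.0, "eta": 0.0, "phi": 0.0, "m": 0.0, "q": +1},
    {"pt": 50.0, "eta": 0.0, "phi": math.pi, "m": 0.0, "q": -1},
    {"pt": 10.0, "eta": 0.0, "phi": math.pi / 2, "m": 0.0, "q": -1},
]
```
</preformat>
<p>Keeping the full candidate list corresponds to the &#x201c;as many as possible&#x201d; case, while best_z corresponds to selecting a single optimal candidate.</p>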
<p>Another common situation is when objects in a collection are individually associated with other collections. Examples include the mothers or daughters of generator-level particles, subjets or constituents of jets, and tracks associated with leptons or jets. As a first example, CutLang was adapted in v2 to work with jet constituents using the syntax described in <xref ref-type="sec" rid="s13">Supplementary Appendix A.9.7</xref>. Another example of association is the daughters of generator truth-level particles. If an analysis is performed directly on generator-level particles, or if a study of truth-level particles is required, information such as PDGID codes or the decay chain becomes relevant. CutLang is now capable of accessing the PDGID and the decay products of a particle (referred to as &#x201c;daughters&#x201d; in HEP), with the syntax described in <xref ref-type="sec" rid="s13">Supplementary Appendices A.3.1 and A.9.8</xref>. CutLang provides both the number of daughters and a modifier to refer to the daughters. Work is in progress on finding a generalizable syntax for object association expressions.</p>
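<p>Access to daughters can be pictured with a simple index-based representation of generator-level particles. In the Python sketch below, the data layout (a list of daughter indices per particle) and all names are assumptions. Only the PDGID codes (6 for the top quark, 24 for the <italic>W</italic> boson) are standard.</p>
<preformat preformat-type="code">
```python
from dataclasses import dataclass, field


@dataclass
class GenParticle:
    pdg_id: int
    daughters: list = field(default_factory=list)  # indices into the event's particle list


def daughters_of(particles, i):
    """Daughter particles of particle i."""
    return [particles[d] for d in particles[i].daughters]


def tops_with_w_daughter(particles):
    """Indices of top quarks (|PDGID| = 6) with a W boson (|PDGID| = 24) among their daughters."""
    return [
        i for i, p in enumerate(particles)
        if abs(p.pdg_id) == 6
        and any(abs(d.pdg_id) == 24 for d in daughters_of(particles, i))
    ]


event = [GenParticle(6, [2, 3]), GenParticle(-6), GenParticle(24), GenParticle(5)]
```
</preformat>
<p>In this representation, the number of daughters of a particle is simply len(p.daughters).</p>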
<p>Members of object collections can be directly accessed via their indices. Being declarative, ADL syntax does not include explicit statements for looping over object collections, and CutLang is capable of interpreting this implicit looping. For example, when&#x20;filtering a jet collection, one might apply a cleaning criterion requiring that no electron lies within a given radius of the jet. Applying this criterion requires looping over electrons; however, it suffices to write the electron object&#x2019;s name for CutLang to infer the implicit loop from&#x20;the context. In other cases, it might be necessary to access only a subset of the collection, such as when imposing a selection on the <inline-formula id="inf7">
<mml:math id="m7">
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>&#x3d5;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> between each of the first three jets with the highest <inline-formula id="inf8">
<mml:math id="m8">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and the&#x20;missing transverse momentum. ADL and CutLang were&#x20;updated to allow such operations. The Python slice notation has been adapted for expressing subset ranges in&#x20;object collections, as described in <xref ref-type="sec" rid="s13">Supplementary Appendix&#x20;A.9.4</xref>.</p>
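<p>Because the slice notation is borrowed from Python, the subset selection can be illustrated directly in Python. The sketch below computes the smallest azimuthal separation between the three leading jets and the missing transverse momentum. The function name and the 0.5 cut value are hypothetical, and whether ADL ranges follow Python&#x2019;s half-open convention exactly should be checked against Supplementary Appendix A.9.4.</p>
<preformat preformat-type="code">
```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def min_dphi_jets_met(jet_phis, met_phi, n=3):
    """Smallest |dphi| between the n leading jets and the missing transverse momentum.

    jet_phis must already be ordered by decreasing pT. Slicing tolerates fewer
    than n jets; with no jets at all, infinity is returned.
    """
    leading = jet_phis[:n]  # Python half-open slice: indices 0 .. n-1
    return min((abs(delta_phi(phi, met_phi)) for phi in leading), default=math.inf)


jet_phis = [0.1, 2.0, -2.5, 0.3]  # already pT-ordered; the 4th jet is ignored
passes = min_dphi_jets_met(jet_phis, met_phi=1.0) > 0.5  # illustrative cut
```
</preformat>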
<p>Input or defined object collections are by default sorted by CutLang in the order of decreasing transverse momentum <inline-formula id="inf9">
<mml:math id="m9">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. ADL can express the sorting of object collections according to any attribute, in ascending or descending order, and CutLang is capable of performing such sorting operations. Moreover, so-called &#x201c;reducers&#x201d; can be applied for extracting values from existing object collections. One example is extracting the maximum or minimum value of a given attribute in an object collection. For example, CutLang can give the maximum <inline-formula id="inf10">
<mml:math id="m10">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> among the jets in a jet collection, or the minimum isolation value among the electrons in an electron collection. Another case is the summation operation, where the values of a given attribute are summed over the whole collection. The most common use case here is the summation of object <inline-formula id="inf11">
<mml:math id="m11">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>s to obtain event variables such as the hadronic transverse energy <inline-formula id="inf12">
<mml:math id="m12">
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. Sorting and reducers are recent additions to ADL and CutLang and the details on their implementation and usage are given in <xref ref-type="sec" rid="s13">Supplementary Appendices A.9.2, A.9.5, A.9.6</xref> and in the examples referred to in <xref ref-type="sec" rid="s8">Section&#x20;8</xref>.</p>
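<p>In Python terms, sorting and reducers correspond to sorted() with a key function and to the built-ins max(), min(), sum() and len(). The sketch below only illustrates these semantics. The attribute names (pt, eta, btag) and values are made up, and CutLang&#x2019;s own syntax is given in the appendices cited above.</p>
<preformat preformat-type="code">
```python
jets = [
    {"pt": 35.0, "eta": 0.4, "btag": 0.2},
    {"pt": 80.0, "eta": -1.1, "btag": 0.9},
    {"pt": 52.0, "eta": 2.0, "btag": 0.6},
]

# Default ordering: decreasing pT.
by_pt = sorted(jets, key=lambda j: j["pt"], reverse=True)
# Sorting on any other attribute, here ascending |eta|.
by_abs_eta = sorted(jets, key=lambda j: abs(j["eta"]))

# Reducers: collapse a collection into a single number.
max_pt = max(j["pt"] for j in jets)
min_btag = min(j["btag"] for j in jets)
ht = sum(j["pt"] for j in jets)  # scalar sum of jet pT
n_jets = len(jets)
```
</preformat>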
</sec>
<sec id="s4-3">
<title>4.3 Object or Event Variables</title>
<p>An object variable is a quantity defined once per object, such as a jet&#x2019;s transverse momentum <inline-formula id="inf13">
<mml:math id="m13">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> or an electron&#x2019;s relative isolation. An event variable is a quantity defined once per event, such as missing transverse energy <inline-formula id="inf14">
<mml:math id="m14">
<mml:mrow>
<mml:msubsup>
<mml:mi>E</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, number of electrons selected using the tight criteria, <inline-formula id="inf15">
<mml:math id="m15">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> of the highest <inline-formula id="inf16">
<mml:math id="m16">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> jet, transverse mass calculated using the highest <inline-formula id="inf17">
<mml:math id="m17">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> lepton and <inline-formula id="inf18">
<mml:math id="m18">
<mml:mrow>
<mml:msubsup>
<mml:mi>E</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>. Object and event variables used in object definitions or event categorization are not always directly available in the input event data. These quantities therefore need to be computed during the analysis from the existing inputs. ADL is designed to allow&#x20;the definition of such new variables in two ways. Simple variables that can be described analytically by a single-line&#x20;formula&#x20;can be expressed within the ADL file using mathematical operations. A classic example is the transverse mass computed from a visible object, such as a lepton, and the missing transverse energy. To enable writing these simple formulas, CutLang is capable of parsing and processing operators such as <inline-formula id="inf19">
<mml:math id="m19">
<mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mo>&#x2212;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mtext>&#x2a;</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mo>/</mml:mo>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mo>&#x5e;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>. CutLang also incorporates a series of internal functions for other operations, such as abs(), sqrt(), sin(), cos(), tan() and log(). Reducer operators that reduce collections to a single value, e.g., size(), sum(), min() and max(), are also available for computing quantities. For example, the hadronic transverse energy <inline-formula id="inf20">
<mml:math id="m20">
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> can be computed from all jets in an event using the sum() reducer as sum(pT(jets)).</p>
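<p>As a worked example of a single-line formula, the transverse mass of a lepton and the missing transverse energy is commonly written, for massless objects, as mT = sqrt(2 pT ETmiss (1 &#x2212; cos &#x394;&#x3c6;)). The Python sketch below evaluates this expression and the H<sub>T</sub> sum. The function names are hypothetical, and in an ADL file the same quantities would be written with the operators and functions listed above.</p>
<preformat preformat-type="code">
```python
import math


def transverse_mass(pt, phi, met, met_phi):
    """mT = sqrt(2 pT MET (1 - cos dphi)) for a massless visible object."""
    return math.sqrt(2.0 * pt * met * (1.0 - math.cos(phi - met_phi)))


def ht(jet_pts):
    """Equivalent of sum(pT(jets)): scalar sum of jet transverse momenta."""
    return sum(jet_pts)


# Back-to-back lepton and missing momentum, 40 GeV each: mT = 80 GeV.
mt = transverse_mass(pt=40.0, phi=0.0, met=40.0, met_phi=math.pi)
```
</preformat>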
<p>However, in many cases, variables are defined by complex algorithms that are non-trivial to express. Examples such as the angular separation <inline-formula id="inf21">
<mml:math id="m21">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, aplanarity, stransverse mass <inline-formula id="inf22">
<mml:math id="m22">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> (<xref ref-type="bibr" rid="B15">Barr et&#x20;al., 2003</xref>), razor variables (<xref ref-type="bibr" rid="B41">Rogan, 2010</xref>), etc., either cannot be easily written using the available operators or require multiple steps of calculation. Some of these algorithms, such as angular separation and the razor variables, were already predefined as internal functions in CutLang, and others such as <inline-formula id="inf23">
<mml:math id="m23">
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf24">
<mml:math id="m24">
<mml:mrow>
<mml:msub>
<mml:mi>M</mml:mi>
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> were added recently. A list of existing variables can be found in <xref ref-type="sec" rid="s13">Supplementary Appendix A.3</xref>. Other algorithms can easily be incorporated by the user following the recently generalized recipe in <xref ref-type="sec" rid="s13">Supplementary Appendix B</xref>. Another class of sophisticated variables includes quantities defined by numerical functions, such as object or trigger efficiencies provided in tables or histograms and used to compute object or event weights, or discriminators and efficiencies computed via machine learning models. All these variables are incorporated by defining them in independent, self-contained functions outside the ADL file and referring to them from within the ADL file. These external user functions should be seen as a natural extension of the language. The ultimate aim is to provide these functions in a well-defined and easily extendable database.</p>
<p>The expressions for variables, whether built directly from the available mathematical operators or indirectly via internal or user functions, can be written explicitly at the place of usage, e.g., on the line where a selection is applied on the variable. Alternatively, if the variable is used multiple times in an analysis, e.g., in different selection regions, it can be defined once using the define keyword, which assigns an alias to the variable. Currently, defining aliases with the define keyword is only possible for event variables in CutLang, not for object variables. In CutLang, the define expressions are placed only after the object blocks and before the beginning of the event selection.</p>
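<p>The two mechanisms just described, external user functions referred to by name and aliases created with define, can be sketched together in Python. Everything here is an assumption for illustration: the registry, the decorator, the function name scalarSumPt and the caching class are not CutLang&#x2019;s interface, whose recipe for user functions is given in Supplementary Appendix B.</p>
<preformat preformat-type="code">
```python
# Hypothetical registry of self-contained user functions, looked up by name.
USER_FUNCTIONS = {}


def user_function(name):
    """Decorator registering a function under the name used in the analysis text."""
    def register(func):
        USER_FUNCTIONS[name] = func
        return func
    return register


@user_function("scalarSumPt")  # hypothetical name
def scalar_sum_pt(objects):
    return sum(o["pt"] for o in objects)


class EventVariables:
    """Evaluates each defined alias at most once per event and caches the result."""

    def __init__(self, event, definitions):
        self.event = event
        self.definitions = definitions  # alias -> function of the event
        self.cache = {}

    def __getitem__(self, alias):
        if alias not in self.cache:
            self.cache[alias] = self.definitions[alias](self.event)
        return self.cache[alias]


definitions = {"HT": lambda ev: USER_FUNCTIONS["scalarSumPt"](ev["jets"])}
event = {"jets": [{"pt": 120.0}, {"pt": 45.0}]}
variables = EventVariables(event, definitions)
signal_region = variables["HT"] > 150.0   # the same alias used in two regions,
control_region = variables["HT"] > 100.0  # but computed only once
```
</preformat>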
</sec>
<sec id="s4-4">
<title>4.4 Event Categorization</title>
<p>In a typical collider analysis, events are categorized, using different sets of selection criteria applied on event variables, into a multitude of signal regions that enhance the presence of the signal of interest, or control or validation regions used for estimating backgrounds. These regions can be derived from each other, and can be correlated or uncorrelated depending on the case. ADL organizes event categorization by defining each selection region in an independent region block<xref ref-type="fn" rid="FN5">
<sup>5</sup>
</xref> and labeling each region with a unique name. The region blocks mainly consist of a list of selection criteria. As in the case of objects, each criterion is stated on a line starting with a select or a reject keyword, which selects or rejects, respectively, the events satisfying the criterion. Comparison operators, logical operators and the ternary operator, whose syntax is described in <xref ref-type="sec" rid="s13">Supplementary Appendix A.5</xref>, are used for expressing the criteria. Another operation that can be performed within the context of event categorization is <inline-formula id="inf25">
<mml:math id="m25">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3c7;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> optimization for reconstructed quantities, whose syntax is described in <xref ref-type="sec" rid="s13">Supplementary Appendix A.6</xref>. An example is finding, among several top quark candidates, the candidate with the mass closest to the top quark mass, and using the optimal candidate&#x2019;s properties for further selection.</p>
<p>ADL and CutLang allow deriving selection regions from each other, e.g., deriving multiple signal regions from a baseline selection region. This is done by simply referring to the baseline region by name in the new region&#x2019;s block, instead of repeating the whole selection every&#x20;time.</p>
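<p>Region blocks, the &#x3c7;<sup>2</sup> optimization and derived regions can be illustrated together in Python. The sketch is hypothetical: regions are modeled as lists of (action, predicate) pairs, and a derived region is built by extending the baseline list rather than rewriting it. The &#x3c7;<sup>2</sup> has a single mass term, so minimizing it picks the candidate closest to an assumed top mass of 172.5 GeV, and all cut values are placeholders.</p>
<preformat preformat-type="code">
```python
TOP_MASS = 172.5  # GeV; assumed reference value for this sketch


def chi2(mass):
    """Single-term chi-square; minimizing it picks the mass closest to TOP_MASS."""
    return (mass - TOP_MASS) ** 2


def best_top_mass(candidate_masses):
    return min(candidate_masses, key=chi2)


def passes_region(event, criteria):
    """criteria: list of (action, predicate); True if the event survives every line."""
    for action, predicate in criteria:
        result = predicate(event)
        if action == "select" and not result:
            return False
        if action == "reject" and result:
            return False
    return True


baseline = [
    ("select", lambda ev: ev["n_jets"] >= 4),
    ("reject", lambda ev: ev["n_leptons"] > 0),
]
# Derived region: the baseline criteria plus one more line.
signal = baseline + [
    ("select", lambda ev: 30.0 > abs(best_top_mass(ev["top_masses"]) - TOP_MASS)),
]

events = [
    {"n_jets": 5, "n_leptons": 0, "top_masses": [140.0, 168.0, 230.0]},
    {"n_jets": 6, "n_leptons": 0, "top_masses": [100.0, 250.0]},
    {"n_jets": 3, "n_leptons": 0, "top_masses": [172.0]},
]
# Every region sees every event, so regions may overlap.
selected = {
    name: [i for i, ev in enumerate(events) if passes_region(ev, crit)]
    for name, crit in (("baseline", baseline), ("signal", signal))
}
```
</preformat>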
<p>In many analyses, especially those targeting searches for new physics, events in given search regions are partitioned into many bins based on one or more variables, e.g., <inline-formula id="inf26">
<mml:math id="m26">
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf27">
<mml:math id="m27">
<mml:mrow>
<mml:msubsup>
<mml:mi>E</mml:mi>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> or some invariant mass. Data counts and background estimates in these bins constitute the result of the analysis. With increasing amounts of data, recent LHC analyses, especially inclusive searches for new physics, may contain hundreds of bins. Treating each bin as an independent search region and writing a separate block for each would be highly impractical. Instead, the capability of binning the events in a given region was recently added to ADL and CutLang through the bin keyword. Bins in a region are, by definition, non-overlapping. The CutLang interpreter and framework operate on this principle and skip an event once it is classified into a bin. This property distinguishes bins from regions, as different regions can overlap, and a given event is evaluated for all regions, independent of whether it is selected by the preceding regions. Bins can be described in two ways. When the binning uses only a single variable, all bins can be defined in a single line by specifying the variable name and the bin intervals. When bins are defined based on multiple variables, this description can become ambiguous, and a more explicit description, where each bin is defined on a dedicated line, can be used. The usage and syntax of the bin keyword are described in <xref ref-type="sec" rid="s13">Supplementary Appendix A.11.1</xref>. When multiple regions have the same binning (e.g., a signal region and several control regions from which the background is estimated), the binning definitions currently must be specified separately in each region. We are searching for a more practical form of expression that would avoid this repetition while keeping with the human readability principle.</p>
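<p>The two ways of describing bins can be sketched in Python: a one-variable binning given by a list of edges, and explicit multi-variable bins evaluated in order, with the event assigned to the first matching bin only. The half-open interval convention, the labels and the bin boundaries are assumptions for illustration; the actual syntax is described in Supplementary Appendix A.11.1.</p>
<preformat preformat-type="code">
```python
import bisect


def bin_index(value, edges):
    """Index i of the interval [edges[i], edges[i+1]) holding value, or None if outside."""
    i = bisect.bisect_right(edges, value) - 1
    return i if i >= 0 and len(edges) - 1 > i else None


def classify(event, bins):
    """bins: list of (label, predicate); the event goes to the first matching bin only."""
    for label, predicate in bins:
        if predicate(event):
            return label  # stop here: bins are non-overlapping by construction
    return None


# Single-variable binning, defined by its edges (GeV).
ht_edges = [200.0, 400.0, 800.0, 1500.0]

# Multi-variable binning, one explicit definition per bin.
two_d_bins = [
    ("HT 200-400, MET 100-250",
     lambda ev: 400.0 > ev["ht"] >= 200.0 and 250.0 > ev["met"] >= 100.0),
    ("HT 400+, MET 100+",
     lambda ev: ev["ht"] >= 400.0 and ev["met"] >= 100.0),
]
```
</preformat>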
</sec>
<sec id="s4-5">
<title>4.5 Event Weights</title>
<p>In an analysis, events, especially simulated events, are usually weighted in order to match the luminosity of the real data or to correct for detector effects. CutLang was recently extended to apply event weights. Event weights can be applied within the region blocks using the weight keyword, as described in <xref ref-type="sec" rid="s13">Supplementary Appendix A.10.2</xref>. A particular event selected by two different regions can receive different weights. Event weights can be either constant numbers or functions of variables. These functions may include analytical or numerical internal or user functions. Weights based on numerical functions, such as efficiencies (e.g., trigger efficiencies), can also be applied from tables written within the ADL file, as described in <xref ref-type="sec" rid="s13">Supplementary Appendix A.8</xref>. A systematic way of expressing efficiencies in tables and applying them to objects and events was recently incorporated in ADL and CutLang.</p>
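<p>A weight that combines a constant factor with a numerical function read from a table can be sketched in Python as below. The table values, the constant factor, the region names and the choice of returning zero outside the table range are all hypothetical; the ADL table syntax itself is described in Supplementary Appendix A.8.</p>
<preformat preformat-type="code">
```python
import bisect

# Hypothetical trigger-efficiency table, binned in lepton pT (GeV).
PT_EDGES = [20.0, 40.0, 60.0, 1000.0]
EFFICIENCIES = [0.80, 0.90, 0.95]


def table_lookup(x, edges, values):
    """Value of the bin [edges[i], edges[i+1]) containing x; 0.0 outside the table."""
    i = bisect.bisect_right(edges, x) - 1
    if i >= 0 and len(values) > i:
        return values[i]
    return 0.0  # assumption of this sketch: no efficiency outside the table range


LUMI_WEIGHT = 0.12  # hypothetical constant normalization factor

# The same event can receive a different weight in each region.
region_weights = {
    "SR": lambda ev: LUMI_WEIGHT * table_lookup(ev["lep_pt"], PT_EDGES, EFFICIENCIES),
    "CR": lambda ev: LUMI_WEIGHT,
}

event = {"lep_pt": 45.0}
weights = {region: w(event) for region, w in region_weights.items()}
```
</preformat>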
</sec>
<sec id="s4-6">
<title>4.6 Applying Efficiencies to Objects and Events Using the Hit-and-Miss Method</title>
<p>As mentioned above, applying efficiencies to events and objects, such as trigger efficiencies or object reconstruction, identification and isolation efficiencies, is a common part of many analyses. <xref ref-type="sec" rid="s4-5">Section 4.5</xref> described how to apply event efficiencies as event weights. Another approach is to emulate the effect of efficiencies by randomly accepting events or objects having a certain property, such that the selected fraction reflects the efficiency. For example, if the overall reconstruction and identification efficiency for an electron with <inline-formula id="inf28">
<mml:math id="m28">
<mml:mrow>
<mml:mn>20</mml:mn>
<mml:mo>&#x3c;</mml:mo>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
<mml:mo>&#x3c;</mml:mo>
<mml:mn>40</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> GeV and <inline-formula id="inf29">
<mml:math id="m29">
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>&#x3b7;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
<mml:mo>&#x3c;</mml:mo>
<mml:mn>2.1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> is 60%, a given MC truth electron in that <inline-formula id="inf30">
<mml:math id="m30">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf31">
<mml:math id="m31">
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>&#x3b7;</mml:mi>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> range passes the selection with a probability of only 0.6. The decision is made by sampling a uniform random number between 0 and 1, and accepting the event or object if the random number is smaller than the efficiency value. Usually, the uncertainty on the efficiency is also taken into account when making the pass/fail decision. This is called the hit-and-miss method.</p>
<p>Emulating efficiencies using the hit-and-miss method is common in parameterized fast simulation frameworks. It is also becoming increasingly relevant to incorporate this functionality in the analysis step, especially for phenomenological studies targeting interpretation or testing new analysis ideas. These studies generally use events produced by fast simulation, or even at truth level, instead of the real collision data or the MC events produced by full detector simulation used in experimental analyses. Experimental analyses use complicated object identification criteria, which cannot be implemented in fast simulation. Moreover, different analyses commonly use different identification methods for a given object (e.g., cut-based versus multivariate analysis-based identification for electrons), as different methods may perform better for different physics cases. Consequently, working with several phenomenological analyses, each using different identification criteria, would require implementing all these criteria in the simulation step, which is highly impractical. It is therefore helpful for the infrastructure handling the analysis step to be able to emulate efficiencies.</p>
<p>Emulating efficiencies with uncertainties was recently incorporated in CutLang. The hit-and-miss method is applied via the internal function applyHM. In the current implementation, the efficiency values and errors versus object properties are input via table blocks in the ADL file. This will be generalized to reading efficiencies from other formats, e.g., input histograms or numerical external functions in the near future.</p>
<p>The applyHM function uses a uniform random number to decide whether the efficiency value was hit (the random number falls below the value) or missed (it falls above the value). If the table contains errors, the efficiency value itself is resampled each time from a double (split) Gaussian function whose negative and positive widths are the errors of the associated bin in the efficiency table:<disp-formula id="e1">
<mml:math id="m32">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>g</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2261;</mml:mo>
<mml:mfrac>
<mml:msqrt>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>/</mml:mo>
<mml:mi>&#x3c0;</mml:mi>
</mml:mrow>
</mml:msqrt>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="italic">&#x3f5;</mml:mi>
<mml:mi>u</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="italic">&#x3f5;</mml:mi>
<mml:mi>d</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:msubsup>
<mml:mi mathvariant="italic">&#x3f5;</mml:mi>
<mml:mi>d</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:msup>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:msubsup>
<mml:mi mathvariant="italic">&#x3f5;</mml:mi>
<mml:mi>u</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:msup>
<mml:mi>&#x3b8;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3bc;</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>where <italic>&#x3bc;</italic> is the central value of the relevant bin in the efficiency table, <inline-formula id="inf32">
<mml:math id="m33">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="italic">&#x3f5;</mml:mi>
<mml:mi>u</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf33">
<mml:math id="m34">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="italic">&#x3f5;</mml:mi>
<mml:mi>d</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the upper and lower errors in the same bin, respectively, and <italic>&#x3b8;</italic> is the unit step function. The applyHM function can be used in the object blocks for defining derived object collections. It can also be used in the region blocks to apply efficiencies to a particular object, e.g., to check whether the jet with the highest <inline-formula id="inf34">
<mml:math id="m35">
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is a b-tagged jet or not. Syntax for the applyHM function can be found in <xref ref-type="sec" rid="s13">Supplementary Appendix&#x20;A.9.9</xref>.</p>
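<p>The hit-and-miss decision with a resampled efficiency can be sketched in Python. This is not the applyHM implementation. The sampling scheme, the clamping of the sampled efficiency to [0, 1] and the function names are assumptions of the sketch. In particular, choosing the lower or upper half-Gaussian with probability proportional to its width is one way to reproduce the split Gaussian shape of Eq. 1.</p>
<preformat preformat-type="code">
```python
import random


def sample_efficiency(mu, err_up, err_down, rng):
    """Draw an efficiency from a split Gaussian: width err_down below mu, err_up above.

    The lower half is chosen with probability err_down / (err_up + err_down),
    then a half-normal offset of the corresponding width is applied.
    """
    total = err_up + err_down
    if total == 0.0:
        return mu  # no errors in the table: keep the central value
    z = abs(rng.gauss(0.0, 1.0))  # half-normal draw
    if err_down > rng.random() * total:
        return mu - err_down * z
    return mu + err_up * z


def hit_and_miss(mu, err_up, err_down, rng):
    """Hit (accept) if a uniform number falls below the sampled efficiency."""
    eff = sample_efficiency(mu, err_up, err_down, rng)
    eff = min(max(eff, 0.0), 1.0)  # clamping to [0, 1] is an assumption of this sketch
    return eff > rng.random()


rng = random.Random(12345)  # fixed seed for reproducible decisions
n = 100000
accepted = sum(hit_and_miss(0.6, 0.0, 0.0, rng) for _ in range(n))
fraction = accepted / n  # close to 0.6
```
</preformat>
<p>With a fixed seed (compare the RandomSeed keyword in Table 1), repeated runs give identical decisions, and with an efficiency of 0.6 and no errors, about 60% of the objects are accepted.</p>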
</sec>
<sec id="s4-7">
<title>4.7 Histogramming</title>
<p>As described in the introduction, the main scope of ADL is the description of the physics content and algorithmic flow of an analysis. The language content presented up to this point serves this purpose. However, further auxiliary functionality is required for practicality when running the analysis on events. One such functionality is histogramming. Since the start of its design, CutLang has been capable of filling one-dimensional histograms of event variables. Recently, the capability of drawing two-dimensional histograms has been added. The syntax for histogramming can be found in <xref ref-type="sec" rid="s13">Supplementary Appendix A.11.3</xref>. Histogramming is currently only available for event variables; it will be extended to object properties in the near future.</p>
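<p>What one- and two-dimensional histograms of event variables amount to can be shown with a small Python sketch: fixed-width binning with a count per bin. CutLang itself stores its histograms in the ROOT output file described in Section 5. The helper names, the binning and the values here are hypothetical, and the sketch ignores underflow and overflow.</p>
<preformat preformat-type="code">
```python
def bin_of(v, n, low, high):
    """Bin index on a fixed-width axis, or None if v lies outside [low, high)."""
    if not (high > v >= low):
        return None
    return min(int((v - low) * n / (high - low)), n - 1)


def fill_1d(values, n, low, high):
    counts = [0] * n
    for v in values:
        i = bin_of(v, n, low, high)
        if i is not None:
            counts[i] += 1
    return counts


def fill_2d(pairs, nx, xlow, xhigh, ny, ylow, yhigh):
    counts = [[0] * ny for _ in range(nx)]
    for x, y in pairs:
        i, j = bin_of(x, nx, xlow, xhigh), bin_of(y, ny, ylow, yhigh)
        if i is not None and j is not None:
            counts[i][j] += 1
    return counts


met = [35.0, 120.0, 310.0, 95.0]
ht = [250.0, 640.0, 900.0, 410.0]
h_met = fill_1d(met, 4, 0.0, 400.0)
h_met_ht = fill_2d(zip(met, ht), 2, 0.0, 400.0, 2, 0.0, 1000.0)
```
</preformat>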
</sec>
<sec id="s4-8">
<title>4.8 Alternative Vocabulary and Syntax</title>
<p>The main priority of the ongoing developments is to establish the principles of ADL as a language. Here, we refer to a language as a set of instructions to implement algorithms that produce various kinds of output through abstractions for defining and manipulating data structures or controlling the flow of execution. It is, however, important to recognize that a language can be expressed using alternative vocabulary or syntax. Here, vocabulary refers to the words with a particular meaning in the language, such as block or keyword names, and syntax is the set of rules defining the combinations of symbols that are considered correctly structured expressions of the language. Our experience in moving from CutLang v1 and LHADA to ADL showed that there might not always be a single best syntax for expressing a given content. Alternative syntax options may be more favorable in different use cases, due to practicality or simply due to the different tastes of users. Recognizing this, we recently opted to host multiple syntactic alternatives in ADL and CutLang for several cases. The most obvious case is the syntax for expressing object attributes, as described in <xref ref-type="sec" rid="s13">Supplementary Appendix A.2</xref>. It should be noted that such alternatives can only exist for simple, localized syntactic expressions, not for the overall content and structure of the language. A minor example is the keyword for the event categorization block, where both region and algo are valid. Another is the specification of the input object collection in an object block, where the take keyword, the using keyword or a colon &#x201c;:&#x201d; can be used. CutLang was recently updated to parse and interpret the different alternatives in such cases. We believe such flexibility will allow users to find the best ways to express their ideas and will moreover help CutLang grow its overall user&#x20;base.</p>
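<p>A simple way to host synonymous keywords is to map them onto a canonical form before interpretation, as in the Python sketch below. Only the keyword alternatives come from the text. The line-based handling, the function name and the assumed position of the colon at the start of a line are illustrative, and CutLang&#x2019;s actual parser may work differently.</p>
<preformat preformat-type="code">
```python
# Hypothetical sketch: reduce synonymous keywords to one canonical form.
REGION_KEYWORDS = {"region", "algo"}
INPUT_KEYWORDS = {"take", "using"}


def canonical(line):
    """Return (keyword, rest) with synonyms replaced by a canonical keyword."""
    line = line.strip()
    if line.startswith(":"):  # assumed form: ": Jet" on its own line
        return "take", line[1:].strip()
    keyword, _, rest = line.partition(" ")
    rest = rest.strip()
    if keyword in REGION_KEYWORDS:
        return "region", rest
    if keyword in INPUT_KEYWORDS:
        return "take", rest
    return keyword, rest
```
</preformat>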
</sec>
</sec>
<sec id="s5">
<title>5 Analysis Output</title>
<p>CutLang as an analysis framework is designed to output information and data for further analysis. The main output obtained after running an analysis in CutLang is a ROOT file. First of all, the file includes a copy of the ADL file content in order to document the provenance of the analysis. It also includes histograms with all the event counts and uncertainties obtained from the analysis, and all histograms defined by the user. CutLang is also capable of skimming and saving events in its internal format LVL0 using the auxiliary save keyword, as described in <xref ref-type="sec" rid="s13">Supplementary Appendix A.10.3</xref>. If event saving is specified in the ADL file, the ROOT file also stores the saved events.</p>
<p>The output ROOT file includes a directory for each event categorization region, i.e.,&#x20;each region block. These directories contain all user-defined histograms specified in the ADL file. The prototype version of CutLang also had a basic cutflow histogram listing the number of events surviving each step of the selection in the given region. The cutflows, including the statistical errors on the counts, are also given as text output. In the current version, the cutflow histograms have been improved to include the selection criteria as bin labels. Moreover, if binning is used in a region, a bincounts histogram is also added, in which each histogram bin shows the event count and error in the corresponding selection bin, and the histogram bin labels show the bin definitions. The cutflow and bincounts histograms can be used directly in the subsequent statistical analysis of the results. A screenshot of a simple example output can be seen in Figure&#x20;4 in <xref ref-type="sec" rid="s13">Supplementary Appendix&#x20;A.11.3</xref>.</p>
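<p>To illustrate what a labelled cutflow with statistical errors contains, the sketch below applies sequential selection steps to a list of events and records, for each step, the sum of event weights and its statistical error, taken here as the square root of the sum of squared weights (which reduces to the square root of the event count for unweighted events). This is not CutLang code: the event format (plain dictionaries), the function name cutflow and the example cuts are hypothetical.</p>
<preformat>
```python
import math


def cutflow(events, cuts, weight_key="weight"):
    """Return (label, sum of weights, statistical error) for each cutflow bin.

    events: iterable of dicts (hypothetical event format)
    cuts:   ordered list of (label, predicate) pairs, applied sequentially
    """
    labels = ["all events"] + [label for label, _ in cuts]
    sumw = [0.0] * len(labels)
    sumw2 = [0.0] * len(labels)
    for ev in events:
        w = ev.get(weight_key, 1.0)  # unweighted events count as 1
        sumw[0] += w
        sumw2[0] += w * w
        for i, (_, passes) in enumerate(cuts, start=1):
            if not passes(ev):
                break  # later cuts are not evaluated once one fails
            sumw[i] += w
            sumw2[i] += w * w
    return [(lab, s, math.sqrt(s2)) for lab, s, s2 in zip(labels, sumw, sumw2)]


events = [{"njets": 3, "met": 250.0},
          {"njets": 1, "met": 300.0},
          {"njets": 4, "met": 120.0}]
cuts = [("njets >= 2", lambda ev: ev["njets"] >= 2),
        ("MET > 200", lambda ev: ev["met"] > 200.0)]
for label, count, err in cutflow(events, cuts):
    print(f"{label:12s} {count:5.1f} +- {err:.2f}")
```
</preformat>
<p>The cut labels play the role of the bin labels described above; a bincounts histogram could be filled in the same way, with one entry per selection bin instead of per sequential step.</p>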
<sec id="s5-1">
<title>5.1 Incorporation of Existing Counts</title>
<p>In some cases, event counts and uncertainties from external sources need to be systematically accessible so that they can be processed together with the counts and uncertainties obtained by running the analysis with CutLang. One example is phenomenological interpretation studies, where the analysis is run only on signal samples, while the experimental results, consisting of data counts and background estimates, are usually taken from the experimental publication. Having the data counts and background estimates directly available in a format compatible with the signal counts is necessary for the subsequent statistical analysis. Moreover, in this particular case, it is also highly desirable to document this information directly within the ADL file. Another example is validation studies, in which either multiple teams in an experimental group synchronize their cutflows, or an analysis reimplemented for a phenomenological interpretation study is validated against a cutflow provided in the original experimental publication. Here too, having the validation counts and uncertainties in the same format makes comparison very practical.</p>
<p>Recently, a syntax was developed in ADL for systematically storing external counts and uncertainties within the ADL file. The physics process for which the information is given, and the format of that information, are declared within the countsformat block using the process keyword, while the values are given in the relevant region blocks, right after the definition of the relevant selection criteria, using the counts keyword. The syntax is detailed in <xref ref-type="sec" rid="s13">Supplementary Appendix A.11.2</xref>. When an ADL file including external counts and errors is run with CutLang, the counts and errors are converted into cutflow and bincounts histograms with a format similar to those hosting the CutLang output. These histograms are placed under the relevant region directories, and the name of the physics process is included in the histogram&#x20;names.</p>
</sec>
</sec>
<sec id="s6">
<title>6 Performance and Multi-Threaded Runs</title>
<p>The CutLang run-time interpreter is ultimately intended for the analysis of very large amounts of experimental data. Therefore, its speed and performance need to be close to those of analysis tools based on GPLs. Run-time interpretation is expected to decrease performance due to additional tasks, including lexical analysis, tokenization, etc. Yet, in its current state, CutLang is only moderately slower than a C&#x2b;&#x2b; analyzer. For a numerical test, a sufficiently complicated supersymmetry search analysis (<xref ref-type="bibr" rid="B44">CMS et&#x20;al., 2019</xref>)<xref ref-type="fn" rid="FN6">
<sup>6</sup>
</xref> involving multiple objects, 12 event categorization regions and several variable calculations based on external functions was run both with CutLang and with the C&#x2b;&#x2b;-based ADL transpiler adl2tnm, using up to 1M supersymmetry signal events in the CMS NanoAOD format. The speed comparison for running on a Mac OS setup is shown in <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>. Overall, CutLang is about 20% slower than the same analysis performed with pure C&#x2b;&#x2b;&#x20;code.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Speed comparison of CutLang versus the C&#x2b;&#x2b; ADL transpiler adl2tnm on a CMS supersymmetry analysis (<xref ref-type="bibr" rid="B44">CMS et&#x20;al., 2019</xref>)<xref ref-type="fn" rid="FN6">
<sup>6</sup>
</xref> using up to 1M supersymmetry signal events in the CMS NanoAOD format on a Mac OS&#x20;setup.</p>
</caption>
<graphic xlink:href="fdata-04-659986-g002.tif"/>
</fig>
<p>CutLang has also recently been enhanced with the capability of multi-threaded execution of an analysis, to make optimal use of the available resources and thereby obtain results faster. Adding -j n to the command that starts the analysis enables the use of n cores, e.g.,&#x20;as</p>
<disp-quote>
<p>./CLA.sh [inputrootfile] [inputeventformat] -i [adlfilename] -j&#x20;2</p>
</disp-quote>
<p>for 2 cores. The value of n must be an integer between 0 and the total number of cores on the processor. The special case -j 0 selects one less than the total number of cores, which maximizes performance for demanding analyses while leaving the operating system the resources it needs.</p>
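<p>The rule for the -j argument can be summarized in a few lines. The Python sketch below is a hypothetical illustration, not CutLang&#x2019;s implementation: the function name resolve_threads is ours, and it assumes that &#x201c;cores&#x201d; means the logical CPUs reported by os.cpu_count(). The guard that keeps at least one thread on a single-core machine is also our assumption.</p>
<preformat>
```python
import os
from typing import Optional


def resolve_threads(n, total_cores: Optional[int] = None) -> int:
    """Translate the value passed with -j into a number of worker threads.

    Assumes "cores" means the logical CPUs reported by os.cpu_count().
    """
    if total_cores is None:
        total_cores = os.cpu_count() or 1
    if not isinstance(n, int) or n not in range(0, total_cores + 1):
        raise ValueError(f"-j expects an integer from 0 to {total_cores}, got {n!r}")
    if n == 0:
        # "-j 0": leave one core to the operating system, but use at least one
        return max(1, total_cores - 1)
    return n


print(resolve_threads(2, total_cores=8))
print(resolve_threads(0, total_cores=8))
```
</preformat>
<p>On the 8-thread test machine described below, -j 0 would therefore correspond to 7 worker threads under this assumption.</p>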
<p>
<xref ref-type="fig" rid="F3">Figure&#x20;3</xref> shows how the event processing rate depends on the number of threads. The mean and standard deviation of these results are given in <xref ref-type="table" rid="T2">Table&#x20;2</xref>. The test machine has an Intel(R) Core(TM) i5-8300H processor with 4 cores and 8 threads and runs Ubuntu 18.04.4 LTS. The number of events analyzed was limited to 3 million due to memory restrictions in the current ROOT-based implementation, which keeps the collected results in memory. Although this is not the only possible way to collect results, it was convenient enough for a first implementation. The implementation can be improved when the need arises, for example by saving data to disk to free memory while the run continues.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Events processed per second when the analysis is divided into 1, 2, 4, 6 and 8 threads, for varying numbers of events. Error bars are magnified by a factor of 10 for visibility.</p>
</caption>
<graphic xlink:href="fdata-04-659986-g003.tif"/>
</fig>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Mean and standard deviation (SD) of the event processing rates shown in <xref ref-type="fig" rid="F3">Figure&#x20;3</xref>.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Threads</th>
<th align="center">Mean rate (events/s)</th>
<th align="center">SD (events/s)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">1</td>
<td align="char" char=".">3,063.4</td>
<td align="char" char=".">14.5</td>
</tr>
<tr>
<td align="left">2</td>
<td align="char" char=".">5,853.5</td>
<td align="char" char=".">18.5</td>
</tr>
<tr>
<td align="left">4</td>
<td align="char" char=".">10,223.3</td>
<td align="char" char=".">22.3</td>
</tr>
<tr>
<td align="left">6</td>
<td align="char" char=".">11,028.0</td>
<td align="char" char=".">29.6</td>
</tr>
<tr>
<td align="left">8</td>
<td align="char" char=".">11,272.0</td>
<td align="char" char=".">119.6</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>As can be seen from the results, the total event processing rate increases nearly linearly with the number of threads up to 4. Since the processor has only 4 physical cores, each providing 2 logical cores, runs using more than 4 threads show minimal improvement. The efficiency of simultaneous processing, the resource demands of background processes, and the recombination of results obtained in parallel also contribute to the decline in multi-threaded performance.</p>
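<p>The scaling described above can be quantified from the mean rates in Table&#x20;2 by computing the speed-up relative to one thread and the parallel efficiency, i.e.,&#x20;the speed-up divided by the number of threads. The short Python sketch below is an illustration added here; only the rates come from the table, and the function name speedup_table is ours.</p>
<preformat>
```python
# Mean event-processing rates in events/s, copied from Table 2.
RATES = {1: 3063.4, 2: 5853.5, 4: 10223.3, 6: 11028.0, 8: 11272.0}


def speedup_table(rates):
    """Return (threads, speed-up, parallel efficiency) relative to 1 thread."""
    base = rates[1]
    rows = []
    for threads, rate in sorted(rates.items()):
        speedup = rate / base            # how many times faster than 1 thread
        rows.append((threads, speedup, speedup / threads))
    return rows


for threads, speedup, efficiency in speedup_table(RATES):
    print(f"{threads} threads: speed-up {speedup:.2f}, efficiency {efficiency:.0%}")
```
</preformat>
<p>This gives speed-ups of about 1.9, 3.3, 3.6 and 3.7 for 2, 4, 6 and 8 threads, so the efficiency falls from about 96% with 2 threads to about 46% with 8 threads, once the 4 physical cores are saturated.</p>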
<p>In a different performance test, run times for analyses using 1, 2, 4 and 8 threads with varying numbers of events are given in <xref ref-type="table" rid="T3">Table&#x20;3</xref>. For easier comparison, a normalized version of <xref ref-type="table" rid="T3">Table&#x20;3</xref> is provided in <xref ref-type="table" rid="T4">Table&#x20;4</xref>, where each run time is expressed as a percentage of the single-thread run time for the same number of events. These tables show that the benefit of multi-threading increases with the size of the workload: for 10,000 events, additional threads bring no gain, and 8 threads are even about 50% slower than a single thread, whereas for 4.5 million events, 4 and 8 threads reduce the run time to about 30% and 27% of the single-thread value, respectively.</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Run times for varying numbers of processed events and threads.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Processed events</th>
<th colspan="4" align="center">Run time (s) by number of threads</th>
</tr>
<tr>
<th align="left"/>
<th align="center">1</th>
<th align="center">2</th>
<th align="center">4</th>
<th align="center">8</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<inline-formula id="inf35">
<mml:math id="m36">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>4</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="char" char=".">3.081</td>
<td align="char" char=".">3.041</td>
<td align="char" char=".">3.124</td>
<td align="char" char=".">4.600</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf36">
<mml:math id="m37">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>5</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="char" char=".">21.085</td>
<td align="char" char=".">12.062</td>
<td align="char" char=".">8.316</td>
<td align="char" char=".">9.630</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf37">
<mml:math id="m38">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>6</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="char" char=".">306.064</td>
<td align="char" char=".">155.195</td>
<td align="char" char=".">91.201</td>
<td align="char" char=".">97.968</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf38">
<mml:math id="m39">
<mml:mrow>
<mml:mn>2.5</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>6</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="char" char=".">776.133</td>
<td align="char" char=".">402.723</td>
<td align="char" char=".">227.817</td>
<td align="char" char=".">209.623</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf39">
<mml:math id="m40">
<mml:mrow>
<mml:mn>4.5</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>6</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="char" char=".">1,409.416</td>
<td align="char" char=".">722.901</td>
<td align="char" char=".">416.964</td>
<td align="char" char=".">374.946</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
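<p>Run times from <xref ref-type="table" rid="T3">Table&#x20;3</xref> expressed as percentages of the single-thread run time for the same number of events.</p>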
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Processed events</th>
<th colspan="4" align="center">Normalized run time (%) by number of threads</th>
</tr>
<tr>
<th align="left"/>
<th align="center">1</th>
<th align="center">2</th>
<th align="center">4</th>
<th align="center">8</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<inline-formula id="inf40">
<mml:math id="m41">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>4</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">100</td>
<td align="char" char=".">98.7</td>
<td align="char" char=".">101</td>
<td align="char" char=".">149</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf41">
<mml:math id="m42">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>5</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">100</td>
<td align="char" char=".">57.2</td>
<td align="char" char=".">38.6</td>
<td align="char" char=".">45.7</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf42">
<mml:math id="m43">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>6</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">100</td>
<td align="char" char=".">50.7</td>
<td align="char" char=".">29.8</td>
<td align="char" char=".">32.0</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf43">
<mml:math id="m44">
<mml:mrow>
<mml:mn>2.5</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>6</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">100</td>
<td align="char" char=".">51.9</td>
<td align="char" char=".">29.4</td>
<td align="char" char=".">27.0</td>
</tr>
<tr>
<td align="left">
<inline-formula id="inf44">
<mml:math id="m45">
<mml:mrow>
<mml:mn>4.5</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>6</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">100</td>
<td align="char" char=".">51.3</td>
<td align="char" char=".">29.6</td>
<td align="char" char=".">26.6</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>A simple analysis task spends most of its time reading data from disk and performing memory transfers. Having a multicore system makes little extra contribution in this scenario, as there is only one disk. As the analysis becomes more complicated, the share of time spent on read and copy operations decreases and CPU-intensive calculations take up more of the run time. Therefore, the benefit of having multiple cores becomes more pronounced in a CPU-intensive, complex analysis.</p>
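<p>One rough way to see this effect in the measured numbers is to split each run time into a part that does not benefit from extra threads and a part that scales as one over the number of threads. This Amdahl-type model is our own illustrative assumption and was not used in the original study; the sketch solves for the two parts from the 1- and 2-thread times in Table&#x20;3. The &#x201c;serial&#x201d; part lumps together everything that does not parallelize, such as start-up, disk reading and the merging of results, and the function names are ours.</p>
<preformat>
```python
# Amdahl-type model (our illustrative assumption, not the authors' analysis):
#     T(n) = T_serial + T_parallel / n
# Solving T(1) and T(2) from Table 3 for the two unknowns gives
#     T_parallel = 2 * (T(1) - T(2)),   T_serial = 2 * T(2) - T(1).

RUN_TIMES_1_2 = {  # processed events -> (T(1), T(2)) in seconds, Table 3
    1e4: (3.081, 3.041),
    1e5: (21.085, 12.062),
    1e6: (306.064, 155.195),
    2.5e6: (776.133, 402.723),
    4.5e6: (1409.416, 722.901),
}


def split_serial_parallel(t1, t2):
    """Return (T_serial, T_parallel) from the 1- and 2-thread run times."""
    t_parallel = 2.0 * (t1 - t2)
    t_serial = t1 - t_parallel
    return t_serial, t_parallel


def predicted_time(t_serial, t_parallel, threads):
    """Run time the model predicts for a given number of threads."""
    return t_serial + t_parallel / threads


for events, (t1, t2) in RUN_TIMES_1_2.items():
    s, p = split_serial_parallel(t1, t2)
    print(f"{events:.2g} events: serial {s:7.3f} s ({s / t1:5.1%}), "
          f"predicted 4-thread time {predicted_time(s, p, 4):7.1f} s")
```
</preformat>
<p>For 10,000 events, the fitted serial part is about 3.0&#x20;s of the 3.08&#x20;s single-thread time, so extra threads cannot help; for one million events and more, it is only a few percent of the total. The measured 4- and 8-thread times in Table&#x20;3 are all longer than this simple model predicts, consistent with the additional overheads discussed above.</p>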
</sec>
<sec id="s7">
<title>7 Code Maintenance and Continuous Integration</title>
<p>The CutLang source code is public and resides in the popular software development platform GitHub<xref ref-type="fn" rid="FN7">
<sup>7</sup>
</xref>:<list list-type="simple">
<list-item>
<p>
<ext-link ext-link-type="uri" xlink:href="https://github.com/unelg/CutLang">https://github.com/unelg/CutLang</ext-link>
</p>
</list-item>
</list>
</p>
<p>CutLang uses GitHub functionalities for parallel code development by multiple developers. Apart from a wiki for documentation and a facility for error reporting, this development platform also offers a continuous integration setup, i.e.,&#x20;a series of tasks that can be started at a scheduled time or by a trigger such as a commit to the main branch. Continuous integration was recently incorporated to validate the code automatically. The setup compiles the CutLang source code from scratch and runs the resulting executable with a set of example ADL files from the package on a multitude of input data files and formats. By comparing the output of these examples to reference output from earlier runs that were successfully executed and validated, coding errors can be automatically detected and reported by email. The total compilation and execution time is greatly reduced by using a pre-compiled version of ROOT and by pre-installing the necessary event files in a Docker<xref ref-type="fn" rid="FN8">
<sup>8</sup>
</xref> image running on a recent Linux (Ubuntu) virtual machine made available by the development platform.</p>
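<p>The comparison step described above can be sketched as follows. The sketch is hypothetical: it assumes that the cutflows of a fresh run and of the validated reference run have already been extracted into dictionaries mapping region names to lists of counts, and the function name compare_cutflows and the tolerance are illustrative. The actual setup compares the outputs of the example runs with stored reference outputs, and its implementation may differ.</p>
<preformat>
```python
import math


def compare_cutflows(new, reference, rel_tol=1e-9):
    """Compare cutflows {region: [counts per cut step]}; return a list of problems."""
    problems = []
    for region in sorted(set(reference) | set(new)):
        if region not in new:
            problems.append(f"{region}: missing from new output")
            continue
        if region not in reference:
            problems.append(f"{region}: not present in reference output")
            continue
        ref_counts, new_counts = reference[region], new[region]
        if len(ref_counts) != len(new_counts):
            problems.append(
                f"{region}: {len(new_counts)} cut steps, expected {len(ref_counts)}")
            continue
        for step, (expected, got) in enumerate(zip(ref_counts, new_counts)):
            # integer counts compare exactly; the tolerance only absorbs
            # floating-point noise in weighted counts
            if not math.isclose(got, expected, rel_tol=rel_tol):
                problems.append(f"{region}, step {step}: got {got}, expected {expected}")
    return problems


reference = {"SR1": [1000, 420, 37], "SR2": [1000, 120, 5]}
new = {"SR1": [1000, 420, 37], "SR2": [1000, 121, 5]}
for line in compare_cutflows(new, reference):
    print(line)
```
</preformat>
<p>In a CI job, a non-empty list of problems would typically be turned into a non-zero exit status, which marks the run as failed and can trigger the email report.</p>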
</sec>
<sec id="s8">
<title>8 Analysis Examples</title>
<p>ADL and CutLang are continuously being used for implementing a diverse set of LHC analyses and running these on events. The analyses implemented are being collected in the following GitHub repository<xref ref-type="fn" rid="FN9">
<sup>9</sup>
</xref>:<list list-type="simple">
<list-item>
<p>
<ext-link ext-link-type="uri" xlink:href="https://github.com/ADL4HEP/ADLLHCanalyses">https://github.com/ADL4HEP/ADLLHCanalyses</ext-link>
</p>
</list-item>
</list>
</p>
<p>The main focus so far has been on implementing analyses designed for new physics searches, in particular supersymmetry searches. These supersymmetry analyses are intended for direct use in creating model efficiency maps for the reinterpretation framework SModelS (<xref ref-type="bibr" rid="B31">Kraml et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B13">Ambrogi et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B14">Ambrogi et&#x20;al., 2020</xref>). The results obtained by running some of the implemented analyses have also been validated against other analysis frameworks within dedicated exercises performed during the Les Houches PhysTeV workshops (<xref ref-type="bibr" rid="B18">Brooijmans et&#x20;al., 2020</xref>). The range of available analyses is currently being extended to cover Higgs and other SM analyses. Furthermore, studies are ongoing to improve the functionality of ADL and CutLang for searches and interpretation studies with long-lived particles, which involve highly non-conventional objects and signatures. More recently, analysis examples for CMS Open Data<xref ref-type="fn" rid="FN4"><sup>4</sup></xref> and a sensitivity study case for the High Luminosity LHC and the Future Circular Collider were also added (<xref ref-type="bibr" rid="B34">Paul et&#x20;al., 2021</xref>). In addition, ADL and CutLang were used as the main tools in an analysis school for undergraduate students held in Istanbul in February 2020, where several analyses were implemented by the participating students (<xref ref-type="bibr" rid="B12">Adiguzel et&#x20;al., 2008</xref>). ADL and CutLang were also used to prepare hands-on data analysis exercises for the 26th Vietnam School of Physics (VSOP) in December 2020.<xref ref-type="fn" rid="FN10">
<sup>10</sup>
</xref> The VSOP exercises, which involve running CutLang and further analyzing the resulting histograms with ROOT, were also adapted for direct use via Jupyter notebooks and are documented in detail in the VSOP hands-on exercises.<xref ref-type="fn" rid="FN11">
<sup>11</sup>
</xref> The experience at both schools showed that ADL and CutLang are highly intuitive tools for introducing high energy physics data analysis to undergraduate and master&#x2019;s students with almost no prior analysis experience.</p>
<p>Implementing analyses with a variety of physics content led to the incorporation of a wider range of object and selection operations and helped make the ADL syntax more generic and inclusive. Syntax for generalized object combinations, the application of numerical efficiencies, the hit-and-miss method, bins, counts and many other features was introduced as a result of these studies. Consequently, the scope and functionality of the CutLang interpreter and framework were also enhanced. Many internal and external functions were added to CutLang to address the direct requirements of the various implemented analyses. Running different analyses on events also made it possible to thoroughly test the capability of CutLang to perform complete, realistic analysis&#x20;tasks.</p>
</sec>
<sec id="s9">
<title>9 Conclusions</title>
<p>We presented the recent developments in CutLang, leading towards a more complete analysis description language and a more robust runtime interpreter. The original syntax of the earlier CutLang prototype and its event processing methods have been modified following many discussions with other scientists in the field who are interested in decoupling physics analysis algorithms from computational details, and after implementing many HEP analyses. Modifications include significant enhancements of object definition and event classification expressions, the addition of more functions for calculating event variables, the incorporation of tables for applying efficiencies, the introduction of a system for including external counts, and more. Although these modifications broke strict backward compatibility with the earlier version of the language, we believe they should be considered improvements, as they will lead to a cleaner, more robust and more widely accepted analysis description language. The improved syntax processing relies on formal lexical and grammar definition tools that are widely available on Unix-like operating systems.</p>
<p>One direct result of the syntax modifications originating from community-wide discussions is that, in the presented version, there is more than one way of expressing the same idea in CutLang. We believe this is a desirable property: after all, in human languages, which we try to imitate, the same idea can also be expressed in multiple ways. For example, rejecting events with a property smaller than a certain threshold amounts to accepting events with the property greater than or equal to the same threshold. Such redundancy should not be regarded as a source of confusion and error, but as a sign of the richness of the language.</p>
<p>CutLang still follows the approach of runtime interpretation. We strongly believe that direct interpretation of human-readable commands and algorithms, although slower in execution than a compiled binary, leads to faster and less error-prone algorithm development. Possible event processing speed issues can be mitigated by processing independent events and regions in parallel. The interpreted and human-readable nature of CutLang and ADL also offers a potential area of growth and development: with the advance of machine learning hardware and software tools, the dream of performing an LHC-type analysis just by talking to the computer in one&#x2019;s native tongue might not be too far-fetched.</p>
<p>The advances described in this paper have brought ADL and CutLang to a state where they can handle many standard analysis expressions and operations, developing the earlier prototype into a practically usable infrastructure. In its current stage, CutLang can directly perform phenomenological studies and some simple experimental studies. However, there are still some limitations to address in the language and the interpreter. In the near future, the ADL syntax will be further expanded to include a generic way to describe arbitrary combinations of objects to form new ones, the capability of adding new object attributes, and the definition of object associations, lower-level objects, or non-standard objects such as long-lived particles. One major addition would be the capability to express and handle variations due to systematic uncertainties. Moreover, the interpreter would benefit from further automating the incorporation of new input data types or external functions, which currently requires manual intervention from the users. Automated syntax verification with explicit guidance on possible syntax errors would further facilitate the analysis process. Plans are underway to improve the design of the CutLang infrastructure based on current best practices in compiler construction, in order to accommodate all these features and arrive at a more robust, yet flexible and user-friendly analysis ecosystem. With the growing data, our field will undoubtedly continue to conceive new analysis concepts and methods that may not be immediately expressible in ADL and CutLang. The current developer team is dedicated to following and implementing these features. We expect that the planned improvements in the fundamental design of ADL and CutLang will drive progress towards the ultimate goal of analysis automation.</p>
<p>Finally, like any language, CutLang/ADL grows with the people who use it to solve new problems. With every analysis requiring a new functionality, the list of already-solved problems grows. We hope that such an internal library, together with the script-assisted addition of external user functions, will allow future analysts to spend less time on previously solved problems and to focus their energy on innovative solutions to the analysis problems of post-LHC colliders.</p>
</sec>
</body>
<back>
<sec id="s10">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="sec" rid="s13">Supplementary Material</xref>; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s11">
<title>Author Contributions</title>
<p>GU: principal CutLang developer. SS: CutLang developer, tester, ADL syntax designer. AT: developer focusing on parsing with lex/yacc. BO: multi-CPU version developer. AP, NR: developers of many internal library functions. BG: developer focusing on external function handling. JS: developer focusing on continuous integration and testing.</p>
</sec>
<sec sec-type="COI-statement" id="s12">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<ack>
<p>We thank Harrison B. Prosper for useful discussions on the language content and help with validation of analysis results. We also thank the SModelS team for a collaboration that is helping to gradually improve CutLang. SS is supported by the National Research Foundation of Korea (NRF), funded by the Ministry of Science &#x0026; ICT under contracts 2021R1I1A3048138 and NRF-2008-00460.</p>
</ack>
<sec id="s13">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fdata.2021.659986/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fdata.2021.659986/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet1.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<fn-group>
<fn id="FN1">
<label>1</label>
<p>&#x201c;SciPy: Scientific Computing Tools for Python.&#x201d;</p>
</fn>
<fn id="FN2">
<label>2</label>
<p>&#x201c;Workshop: Analysis Description Languages for the LHC, 6&#x2013;8 May 2019, Fermilab LHC Physics Center.&#x201d; <ext-link ext-link-type="uri" xlink:href="https://indico.cern.ch/event/769263/">https://indico.cern.ch/event/769263/</ext-link>.</p>
</fn>
<fn id="FN3">
<label>3</label>
<p>&#x201c;Lex and Yacc Page.&#x201d; <ext-link ext-link-type="uri" xlink:href="http://dinosaur.compilertools.net">http://dinosaur.compilertools.net</ext-link>.</p>
</fn>
<fn id="FN4">
<label>4</label>
<p>&#x201c;CERN Open Data Portal.&#x201d; <ext-link ext-link-type="uri" xlink:href="http://opendata.cern.ch">http://opendata.cern.ch</ext-link>.</p>
</fn>
<fn id="FN5">
<label>5</label>
<p>This block was called algo in the original CutLang syntax. Even though algo is still valid in CutLang, we generally refer to the block as region, as the latter is a more domain-specific&#x20;word.</p>
</fn>
<fn id="FN6">
<label>6</label>
<p>&#x201c;ADL Implementation of the CMS SUSY Razor Analysis.&#x201d; <ext-link ext-link-type="uri" xlink:href="https://github.com/ADL4HEP/ADLLHCanalyses/tree/master/CMS-SUS-16-017">https://github.com/ADL4HEP/ADLLHCanalyses/tree/master/CMS-SUS-16-017</ext-link>.</p>
</fn>
<fn id="FN7">
<label>7</label>
<p>&#x201c;CutLang GitHub Repository.&#x201d; <ext-link ext-link-type="uri" xlink:href="https://github.com/unelg/CutLang">https://github.com/unelg/CutLang</ext-link>.</p>
</fn>
<fn id="FN8">
<label>8</label>
<p>&#x201c;Docker Web Page.&#x201d; <ext-link ext-link-type="uri" xlink:href="https://www.docker.com/">https://www.docker.com/</ext-link>.</p>
</fn>
<fn id="FN9">
<label>9</label>
<p>&#x201c;ADL LHC Analyses Repository.&#x201d; <ext-link ext-link-type="uri" xlink:href="https://github.com/ADL4HEP/ADLLHCanalyses">https://github.com/ADL4HEP/ADLLHCanalyses</ext-link>.</p>
</fn>
<fn id="FN10">
<label>10</label>
<p>&#x201c;26th Vietnam School of Physics: Particles and Dark Matter, 29 Nov 2020&#x2013;11 Dec 2020, Quy Nhon.&#x201d; <ext-link ext-link-type="uri" xlink:href="https://indico.in2p3.fr/event/19437/overview">https://indico.in2p3.fr/event/19437/overview</ext-link>.</p>
</fn>
<fn id="FN11">
<label>11</label>
<p>&#x201c;VSOP Hands-On Exercises.&#x201d; <ext-link ext-link-type="uri" xlink:href="https://github.com/unelg/CutLang/wiki/VSOP26HandsOnEx">https://github.com/unelg/CutLang/wiki/VSOP26HandsOnEx</ext-link>.</p>
</fn>
<fn id="FN12">
<label>12</label>
<p>&#x201c;PDG Particle Identification Numbers.&#x201d; <ext-link ext-link-type="uri" xlink:href="https://pdg.lbl.gov/2013/pdgid/PDGIdentifiers.html">https://pdg.lbl.gov/2013/pdgid/PDGIdentifiers.html</ext-link>.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Adiguzel</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Cakir</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Kaya</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Ozcan</surname>
<given-names>V. E.</given-names>
</name>
<name>
<surname>Ozturk</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sekmen</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2008</year>). <article-title>Evaluating Analysis Description Language Concept as a First Introduction to Analysis in Particle Physics</article-title>. <source>arXiv</source>, <fpage>12034</fpage>. <pub-id pub-id-type="doi">10.1088/1361-6404/abdf67</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ambrogi</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Dutta</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Heisig</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kraml</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kulkarni</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Laa</surname>
<given-names>U.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>SModelS v1.2: Long-Lived Particles, Combination of Signal Regions, and Other Novelties</article-title>. <source>Computer Phys. Commun.</source> <volume>251</volume>, <fpage>106848</fpage>. <comment>[arXiv:1811.10624]</comment>. <pub-id pub-id-type="doi">10.1016/j.cpc.2019.07.013</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ambrogi</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Kraml</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kulkarni</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Laa</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Lessa</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Magerl</surname>
<given-names>V.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>SModelS v1.1 User Manual: Improving Simplified Model Constraints with Efficiency Maps</article-title>. <source>Computer Phys. Commun.</source> <volume>227</volume>, <fpage>72</fpage>&#x2013;<lpage>98</lpage>. <comment>[arXiv:1701.06586]</comment>. <pub-id pub-id-type="doi">10.1016/j.cpc.2018.02.007</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barr</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lester</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Stephens</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>A Variable for Measuring Masses at Hadron Colliders when Missing Energy Is Expected; mT2: The Truth behind the Glamour</article-title>. <source>J.&#x20;Phys. G: Nucl. Part. Phys.</source> <volume>29</volume>, <fpage>2343</fpage>&#x2013;<lpage>2363</lpage>. <comment>[hep-ph/0304226]</comment>. <pub-id pub-id-type="doi">10.1088/0954-3899/29/10/304</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Brooijmans</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Delaunay</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Delgado</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Englert</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Falkowski</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Fuks</surname>
<given-names>B.</given-names>
</name>
<etal/>
</person-group> (<year>2016</year>). &#x201c;<article-title>Les Houches 2015: Physics at TeV Colliders - New Physics Working Group Report</article-title>,&#x201d; in <conf-name>9th Les Houches Workshop on Physics at TeV Colliders (PhysTeV 2015)</conf-name>, <conf-loc>Les Houches, France</conf-loc>, <conf-date>June 1&#x2013;19, 2015</conf-date>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1605.02684">arXiv:1605.02684</ext-link>
</comment> (<comment>Accessed October 10, 2020</comment>). </citation>
</ref>
<ref id="B17">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Brooijmans</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Buckley</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Caron</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Falkowski</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Fuks</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Les Houches 2017: Physics at TeV Colliders New Physics Working Group Report</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1803.10379">arXiv:1803.10379</ext-link>
</comment> (<comment>Accessed October 10, 2020</comment>). </citation>
</ref>
<ref id="B18">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Brooijmans</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Buckley</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Caron</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Falkowski</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Fuks</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). &#x201c;<article-title>Les Houches 2019 Physics at TeV Colliders: New Physics Working Group Report</article-title>,&#x201d; in <conf-name>11th Les Houches Workshop on Physics at TeV Colliders: PhysTeV Les Houches</conf-name>, <conf-loc>Les Houches, France</conf-loc>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/2002.12220">arXiv:2002.12220</ext-link>
</comment>. </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brun</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Rademakers</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>ROOT &#x2013; an Object Oriented Data Analysis Framework</article-title>. <source>Nucl. Instr. Methods Phys. Res. A</source> <volume>389</volume>, <fpage>81</fpage>&#x2013;<lpage>86</lpage>. <pub-id pub-id-type="doi">10.1016/s0168-9002(97)00048-x</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buckley</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Butterworth</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Grellscheid</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Hoeth</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>L&#xf6;nnblad</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Monk</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>Rivet User Manual</article-title>. <source>Computer Phys. Commun.</source> <volume>184</volume>, <fpage>2803</fpage>&#x2013;<lpage>2819</lpage>. <pub-id pub-id-type="doi">10.1016/j.cpc.2013.05.021</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<collab>CMS</collab>; <person-group person-group-type="author">
<name>
<surname>Sirunyan</surname>
<given-names>A. M.</given-names>
</name>
<name>
<surname>Tumasyan</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Adam</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Ambrogi</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Asilar</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Bergauer</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Inclusive Search for Supersymmetry in pp Collisions at <inline-formula id="inf45">
<mml:math id="m46">
<mml:mrow>
<mml:msqrt>
<mml:mi>s</mml:mi>
</mml:msqrt>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>13</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> TeV Using Razor Variables and Boosted Object Identification in Zero and One Lepton Final States</article-title>. <source>JHEP</source> <volume>03</volume>, <fpage>031</fpage>. <comment>[arXiv:1812.06302]</comment>. <pub-id pub-id-type="doi">10.1007/JHEP03(2019)031</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Conte</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Fuks</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>MadAnalysis 5: Status and New Developments</article-title>. <source>J.&#x20;Phys. Conf. Ser.</source> <volume>523</volume>, <fpage>012032</fpage>. <pub-id pub-id-type="doi">10.1088/1742-6596/523/1/012032</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Conte</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Fuks</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Serret</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>MadAnalysis 5, a User-Friendly Framework for Collider Phenomenology</article-title>. <source>Computer Phys. Commun.</source> <volume>184</volume>, <fpage>222</fpage>&#x2013;<lpage>256</lpage>. <pub-id pub-id-type="doi">10.1016/j.cpc.2012.09.009</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>de Favereau</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Delaere</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Demin</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Giammanco</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lema&#xee;tre</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Mertens</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). <article-title>DELPHES 3: a Modular Framework for Fast Simulation of a Generic Collider Experiment</article-title>. <source>J.&#x20;High Energ. Phys.</source> <volume>2014</volume>, <fpage>1</fpage>&#x2013;<lpage>25</lpage>. <pub-id pub-id-type="doi">10.1007/jhep02(2014)057</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Drees</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Dreiner</surname>
<given-names>H. K.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J.&#x20;S.</given-names>
</name>
<name>
<surname>Schmeier</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Tattersall</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>CheckMATE: Confronting Your Favourite New Physics Model with LHC Data</article-title>. <source>Computer Phys. Commun.</source> <volume>187</volume>, <fpage>227</fpage>&#x2013;<lpage>265</lpage>. <pub-id pub-id-type="doi">10.1016/j.cpc.2014.10.018</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Watts</surname>
<given-names>G. T.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>LINQtoROOT</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://github.com/gordonwatts/LINQtoROOT">https://github.com/gordonwatts/LINQtoROOT</ext-link>
</comment> (<comment>Accessed October 10, 2020</comment>). </citation>
</ref>
<ref id="B28">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Gray</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>AwkwardQL</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://github.com/lgray/AwkwardQL">https://github.com/lgray/AwkwardQL</ext-link>
</comment> (<comment>Accessed October 10, 2020</comment>). </citation>
</ref>
<ref id="B27">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Gray</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Novak</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Thain</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Chakraborty</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>D.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Coffea: Columnar Object Framework for Effective Analysis</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/4660697">https://zenodo.org/record/4660697</ext-link>
</comment> (<comment>Accessed October 10, 2020</comment>). </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>J.&#x20;S.</given-names>
</name>
<name>
<surname>Schmeier</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Tattersall</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Rolbiecki</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>A Framework to Create Customised LHC Analyses within CheckMATE</article-title>. <source>Computer Phys. Commun.</source> <volume>196</volume>, <fpage>535</fpage>&#x2013;<lpage>562</lpage>. <pub-id pub-id-type="doi">10.1016/j.cpc.2015.06.002</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kraml</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kulkarni</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Laa</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Lessa</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Magerl</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Proschofsky-Spindler</surname>
<given-names>D.</given-names>
</name>
<etal/>
</person-group> (<year>2014</year>). <article-title>SModelS: a Tool for Interpreting Simplified-Model Results from the LHC and its Application to Supersymmetry</article-title>. <source>Eur. Phys. J.&#x20;C</source> <volume>74</volume>, <fpage>2868</fpage>. <comment>[arXiv:1312.4175]</comment>. <pub-id pub-id-type="doi">10.1140/epjc/s10052-014-2868-5</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Krikler</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>FAST</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://fast-carpenter.readthedocs.io/en/latest/">https://fast-carpenter.readthedocs.io/en/latest/</ext-link>
</comment> (<comment>Accessed October 10, 2020</comment>). </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Paul</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sekmen</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Unel</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Down Type Iso-Singlet Quarks at the HL-LHC and FCC-hh</article-title>. <source>Eur. Phys. J.&#x20;C</source> <volume>81</volume>, <fpage>214</fpage>. <pub-id pub-id-type="doi">10.1140/epjc/s10052-021-08982-4</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Piparo</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Canal</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Guiraud</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Valls Pla</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ganis</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Amadio</surname>
<given-names>G.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>RDataFrame: Easy Parallel ROOT Analysis at 100 Threads</article-title>. <source>EPJ&#x20;Web Conf.</source> <volume>214</volume>, <fpage>06029</fpage>. <pub-id pub-id-type="doi">10.1051/epjconf/201921406029</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Pivarski</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2006a</year>). <article-title>Femtocode</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://github.com/diana-hep/femtocode">https://github.com/diana-hep/femtocode</ext-link>
</comment> (<comment>Accessed October 10, 2020</comment>). </citation>
</ref>
<ref id="B37">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Pivarski</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2006b</year>). <article-title>PartiQL</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://github.com/jpivarski/PartiQL">https://github.com/jpivarski/PartiQL</ext-link>
</comment> (<comment>Accessed October 10, 2020</comment>). </citation>
</ref>
<ref id="B38">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Prosper</surname>
<given-names>H. B.</given-names>
</name>
<name>
<surname>Sekmen</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Unel</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>ADL Web Portal</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://cern.ch/adl">cern.ch/adl</ext-link>
</comment> (<comment>Accessed October 10, 2020</comment>). </citation>
</ref>
<ref id="B40">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Rizzi</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2020a</year>). <article-title>NAIL</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://indico.cern.ch/event/769263/timetable/#25-nail-a-prototype-analysis-l">https://indico.cern.ch/event/769263/timetable/&#x23;25-nail-a-prototype-analysis-l</ext-link>
</comment> (<comment>Accessed October 10, 2020</comment>). </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rizzi</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2020b</year>). <article-title>The Evolution of Analysis Models for HL-LHC</article-title>. <source>EPJ&#x20;Web Conf.</source> <volume>245</volume>, <fpage>11001</fpage>. <pub-id pub-id-type="doi">10.1051/epjconf/202024511001</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rogan</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>Kinematical Variables towards New Dynamics at the LHC</article-title>. <comment>arXiv:1006.2727</comment>. <pub-id pub-id-type="doi">10.2172/1128827</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sekmen</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Gras</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Gray</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Krikler</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Pivarski</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Prosper</surname>
<given-names>H.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Analysis Description Languages for the LHC</article-title>. <source>PoS</source> <volume>LHCP2020</volume>, <fpage>065</fpage>. <comment>[arXiv:2011.01950]</comment>. <pub-id pub-id-type="doi">10.22323/1.382.0065</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sekmen</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Unel</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>CutLang: A Particle Physics Analysis Description Language and Runtime Interpreter</article-title>. <source>Comput. Phys. Commun.</source> <volume>233</volume>, <fpage>215</fpage>&#x2013;<lpage>236</lpage>. <comment>[arXiv:1801.05727]</comment>. <pub-id pub-id-type="doi">10.1016/j.cpc.2018.06.023</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Tattersall</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Dercks</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Desai</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J. S.</given-names>
</name>
<name>
<surname>Poncza</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Rolbiecki</surname>
<given-names>K.</given-names>
</name>
<etal/>
</person-group> (<year>2016</year>). &#x201c;<article-title>CheckMATE: Checkmating New Physics at the LHC</article-title>,&#x201d; in <conf-name>Proceedings of the 38th International Conference on High Energy Physics (ICHEP2016)</conf-name>, <conf-loc>Chicago</conf-loc>, <conf-date>August 3&#x2013;10, 2016</conf-date>. <fpage>120</fpage>. </citation>
</ref>
<ref id="B46">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Unel</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Sekmen</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Toon</surname>
<given-names>A. M.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>CutLang: a Cut-Based HEP Analysis Description Language and Runtime Interpreter</article-title>,&#x201d; in <conf-name>19th International Workshop on Advanced Computing and Analysis Techniques in Physics Research: Empowering the Revolution: Bringing Machine Learning to High Performance Computing (ACAT 2019)</conf-name>, <conf-loc>Saas-Fee, Switzerland</conf-loc>, <conf-date>March 11&#x2013;15, 2019</conf-date>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1909.10621">arXiv:1909.10621</ext-link>
</comment> (<comment>Accessed October 10, 2020</comment>). </citation>
</ref>
<ref id="B47">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Watts</surname>
<given-names>G. T.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>hep_tables and dataframe_expressions</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://gordonwatts.github.io/hep_tables_docs/intro">https://gordonwatts.github.io/hep_tables_docs/intro</ext-link>
</comment> (<comment>Accessed October 10, 2020</comment>). </citation>
</ref>
<ref id="B48">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Waugh</surname>
<given-names>B. M.</given-names>
</name>
<name>
<surname>Jung</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Buckley</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>L&#xf6;nnblad</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Butterworth</surname>
<given-names>J. M.</given-names>
</name>
</person-group> (<year>2006</year>). &#x201c;<article-title>HZTool and Rivet: Toolkit and Framework for the Comparison of Simulated Final States and Data at Colliders</article-title>,&#x201d; in <conf-name>15th International Conference on Computing in High Energy and Nuclear Physics</conf-name>, <conf-loc>Mumbai, India</conf-loc>, <conf-date>February 13&#x2013;17, 2006</conf-date>. <publisher-name>Tata Institute of Fundamental Research</publisher-name>. <comment>[hep-ph/0605034]</comment>. </citation>
</ref>
</ref-list>
</back>
</article>