<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="other" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Robot. AI</journal-id>
<journal-title>Frontiers in Robotics and AI</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Robot. AI</abbrev-journal-title>
<issn pub-type="epub">2296-9144</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">716007</article-id>
<article-id pub-id-type="doi">10.3389/frobt.2021.716007</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Robotics and AI</subject>
<subj-group>
<subject>Systematic Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>The Impact of Machine Learning on 2D/3D Registration for Image-Guided Interventions: A Systematic Review and Perspective</article-title>
<alt-title alt-title-type="left-running-head">Unberath et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">Machine Learning in 2D/3D Registration</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Unberath</surname>
<given-names>Mathias</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1256088/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Gao</surname>
<given-names>Cong</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Hu</surname>
<given-names>Yicheng</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Judish</surname>
<given-names>Max</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Taylor</surname>
<given-names>Russell H</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1110569/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Armand</surname>
<given-names>Mehran</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/99822/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Grupp</surname>
<given-names>Robert</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1400493/overview"/>
</contrib>
</contrib-group>
<aff>Advanced Robotics and Computationally Augmented Environments (ARCADE) Lab, Department of Computer Science, Johns Hopkins University, <addr-line>Baltimore</addr-line>, <addr-line>MD</addr-line>, <country>United&#x20;States</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/267225/overview">Ka-Wai Kwok</ext-link>, The University of Hong Kong, Hong Kong, SAR China</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/816111/overview">Luigi Manfredi</ext-link>, University of Dundee, United&#x20;Kingdom</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/699671/overview">Changsheng Li</ext-link>, Beijing Institute of Technology, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Mathias Unberath, <email>mathias@jhu.edu</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Biomedical Robotics, a section of the journal Frontiers in Robotics and&#x20;AI</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>30</day>
<month>08</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>8</volume>
<elocation-id>716007</elocation-id>
<history>
<date date-type="received">
<day>28</day>
<month>05</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>30</day>
<month>07</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Unberath, Gao, Hu, Judish, Taylor, Armand and Grupp.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Unberath, Gao, Hu, Judish, Taylor, Armand and Grupp</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Image-based navigation is widely considered the next frontier of minimally invasive surgery. It is believed that image-based navigation will increase access to reproducible, safe, and high-precision surgery as it may then be performed at acceptable costs and effort. This is because image-based techniques avoid the need for specialized equipment and seamlessly integrate with contemporary workflows. Furthermore, it is expected that image-based navigation techniques will play a major role in enabling mixed reality environments, as well as autonomous and robot-assisted workflows. A critical component of image guidance is 2D/3D registration, a technique to estimate the spatial relationships between 3D structures, e.g., preoperative volumetric imagery or models of surgical instruments, and 2D images thereof, such as intraoperative X-ray fluoroscopy or endoscopy. While image-based 2D/3D registration is a mature technique, its transition from the bench to the bedside has been restrained by well-known challenges, including brittleness with respect to the optimization objective, hyperparameter selection, and initialization, difficulties in dealing with inconsistencies or multiple objects, and limited single-view performance. One reason these challenges persist today is that analytical solutions are likely inadequate considering the complexity, variability, and high-dimensionality of generic 2D/3D registration problems. The recent advent of machine learning-based approaches to imaging problems that, rather than specifying the desired functional mapping, approximate it using highly expressive parametric models holds promise for solving some of the notorious challenges in 2D/3D registration. In this manuscript, we review the impact of machine learning on 2D/3D registration to systematically summarize the recent advances made by the introduction of this novel technology. 
Grounded in these insights, we then offer our perspective on the most pressing needs, significant open problems, and possible next steps.</p>
</abstract>
<kwd-group>
<kwd>artificial intelligence</kwd>
<kwd>deep learning</kwd>
<kwd>surgical data science</kwd>
<kwd>image registration</kwd>
<kwd>computer-assisted interventions</kwd>
<kwd>robotic surgery</kwd>
<kwd>augmented reality</kwd>
</kwd-group>
<contract-num rid="cn001">R21 EB028505</contract-num>
<contract-sponsor id="cn001">National Institute of Biomedical Imaging and Bioengineering<named-content content-type="fundref-id">10.13039/100000070</named-content>
</contract-sponsor>
<contract-sponsor id="cn002">Malone Center for Engineering in Healthcare at Johns Hopkins University, Internal Funds</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<sec id="s1-1">
<title>1.1 Background</title>
<p>Advances in interventional imaging, including the miniaturization of high-resolution endoscopes and the increased availability of C-arm X-ray systems, have driven the development and adoption of minimally invasive alternatives to conventional, invasive and open surgical techniques across a wide variety of&#x20;clinical specialities. While minimally invasive approaches are generally considered safe and effective, the indirect visualization of surgical instruments relative to anatomical structures complicates spatial cognition, and the more confined space for maneuvering requires precise command of the surgical instruments. It is well known that due to the aforementioned challenges, among others, outcomes after minimally invasive surgery are positively correlated with technical proficiency, experience, and procedural volume of the operator (<xref ref-type="bibr" rid="B8">Birkmeyer et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B76">Pfandler et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B39">Hafezi-Nejad et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B22">Foley and Hsu, 2021</xref>). To mitigate the impact of experience on complication risk and outcomes, surgical navigation solutions that register specialized tools with 3D models of the anatomy using additional tracking hardware are now commercially available (<xref ref-type="bibr" rid="B63">Mezger et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B21">Ewurum et&#x20;al., 2018</xref>). Surgical navigation promotes reproducibly good patient outcomes, and when combined with robotic assistance systems, may enable novel treatment options and improved techniques (<xref ref-type="bibr" rid="B97">van der List et&#x20;al., 2016</xref>). 
Unfortunately, navigation systems are not widely adopted due to, among other things, high purchase price despite limited versatility, increased procedural time and cost, and potential for disruptions to surgical workflows due to line-of-sight occlusions or other system complications (<xref ref-type="bibr" rid="B77">Picard et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B49">Joskowicz and Hazan, 2016</xref>). While frustrations induced by workflow disruption affect every operator equally, the aforementioned limitations regarding cost particularly inhibit the adoption of surgical navigation systems in geographical areas with less specialized healthcare providers and lower procedural volumes; areas where routine use of navigation would perhaps be most impactful.</p>
<p>To mitigate the challenges of conventional surgical navigation systems that introduce dedicated tracking hardware and instrumentation as well as workflow alterations, the computer-assisted interventions community has contributed purely image-based alternatives to surgical navigation, e.g., (<xref ref-type="bibr" rid="B72">Nolte et&#x20;al., 2000</xref>; <xref ref-type="bibr" rid="B67">Mirota et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B53">Leonard et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B91">Tucker et&#x20;al., 2018</xref>). Image-based navigation techniques do not require specialized equipment but rely on traditional intra-operative imaging that enabled the minimally invasive technique in the first place. Therefore, these techniques do not introduce economic trade-offs. Further, because image-based navigation techniques are designed to seamlessly integrate into conventional surgical workflows, their use should&#x2014;in theory&#x2014;not cause frustration or prolonged procedure times (<xref ref-type="bibr" rid="B101">Vercauteren et&#x20;al., 2019</xref>). A central component of many, if not most, image-guided navigation solutions is image-based 2D/3D registration, which estimates the spatial relationship between a 3D model of the scene (potentially including anatomy and instrumentation) and 2D interventional images thereof (<xref ref-type="bibr" rid="B62">Markelj et&#x20;al., 2012</xref>; <xref ref-type="bibr" rid="B57">Liao et&#x20;al., 2013</xref>). Two examples of using 2D/3D registration for intra-operative guidance are shown in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>: Image-guidance of&#x20;periacetabular osteotomy (left, and discussed in greater detail in <xref ref-type="sec" rid="s3-3">Section 3.3</xref>) and robot-assisted femoroplasty (right). One may be tempted to assume that after several decades of research on this topic, image-based 2D/3D registration is a largely solved problem. 
While, indeed, analytical solutions now exist to precisely recover 2D/3D spatial relations under&#x20;certain conditions (<xref ref-type="bibr" rid="B62">Markelj et&#x20;al., 2012</xref>; <xref ref-type="bibr" rid="B95">Uneri et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B27">Gao et&#x20;al., 2020b</xref>; <xref ref-type="bibr" rid="B33">Grupp et&#x20;al., 2020b</xref>), several hard challenges prevail.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>
<bold>(A)</bold> A high-level overview of the workflow proposed by <xref ref-type="bibr" rid="B32">Grupp et&#x20;al. (2019)</xref>, which uses 2D/3D registration for estimating the relative pose of a periacetabular osteotomy (PAO) fragment. By enabling intra-operative 3D visualizations and the calculation of biomechanical measurements, this pose information should allow surgeons to better assess when a PAO fragment requires further adjustments and potentially reduce post-operative complications. The utility of the proposed workflow is diminished by the traditional registration strategy&#x2019;s requirement for manual annotations, which are needed to initialize the pelvis pose and reconstruct the fragment shape. <bold>(B)</bold> Image-based navigation for robot-assisted femoroplasty by <xref ref-type="bibr" rid="B26">Gao et&#x20;al. (2020a)</xref>. The intra-operative poses of the robot and the femur anatomy are estimated using X-ray image-based 2D/3D registration. The robot-held drilling/injection device is positioned according to the pre-planned trajectory that is propagated intra-operatively using pose estimates from 2D/3D registration. Image-based navigation is less invasive than fiducial-based alternatives and simplifies the procedure.
</caption>
<graphic xlink:href="frobt-08-716007-g001.tif"/>
</fig>
</sec>
<sec id="s1-2">
<title>1.2 Problem Formulation</title>
<p>Generally speaking, in 2D/3D registration we are interested in finding the optimal geometric transformation that aligns a (typically pre-operative) 3D representation of objects or anatomy with (typically intra-operative) 2D observations thereof. For the purposes of this review, we will assume that the reduction in dimensionality originates from a projective, not an affine, transformation.</p>
<p>Given a set of 3D data <italic>x</italic>
<sub>
<italic>i</italic>
</sub> and 2D observations <italic>y</italic>
<sub>
<italic>v</italic>
</sub>, where subscripts <italic>i</italic>, <italic>v</italic> suggest that there may be multiple objects and multiple 2D observations, respectively, a generic way of writing the optimization problem for the common case of a single object but multiple views is:<disp-formula id="e1">
<mml:math id="m1">
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:munder>
<mml:mrow>
<mml:mtext>arg min</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munder>
</mml:mrow>
<mml:mrow>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mrow>
<mml:mi>S</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x25e6;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>R</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
</mml:mrow>
</mml:mfenced>
<mml:mo>.</mml:mo>
</mml:math>
<label>(1)</label>
</disp-formula>
</p>
<p>In <xref ref-type="disp-formula" rid="e1">Eq. 1</xref>, <inline-formula id="inf1">
<mml:math id="m2">
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> is a 3D non-rigid deformation model with parameters <inline-formula id="inf2">
<mml:math id="m3">
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>, <inline-formula id="inf3">
<mml:math id="m4">
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is a rigid projection operation using a camera with intrinsic parameters <inline-formula id="inf4">
<mml:math id="m5">
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula> and pose <inline-formula id="inf5">
<mml:math id="m6">
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> with respect to the 3D data <italic>x</italic>, and <italic>S</italic>(&#x22c5;, &#x22c5;) is a cost function (in the case of images, often a similarity metric). We use the composite function <italic>P</italic>&#x25e6;<italic>D</italic>(&#x22c5;) to capture the variability with which these operations may be applied, including their order. Finally, <italic>R</italic>(&#x22c5;) is a regularizing term that can act and/or depend on any combination of variables and parameters; its choice most often depends on the specific application since regularization can represent &#x201c;common sense&#x201d; or prior knowledge, which tends to vary with the problem domain.</p>
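To make the notation concrete, the following Python sketch evaluates the objective of Eq. 1 for the rigid, single-object, multi-view case, with the deformation D fixed to the identity. The pinhole `project`, the sum-of-squared-differences `similarity`, and the zero `regularizer` are illustrative stand-ins, not the method of any cited work.

```python
import numpy as np

def project(K, T, x):
    """Pinhole projection P(K, T): map 3D points x (N, 3) through a rigid
    pose T (4, 4 homogeneous) and intrinsics K (3, 3) to 2D points (N, 2)."""
    x_h = np.hstack([x, np.ones((len(x), 1))])   # homogeneous coordinates
    cam = (T @ x_h.T)[:3]                        # points in the camera frame
    uv = K @ cam
    return (uv[:2] / uv[2]).T                    # perspective divide

def objective(Ks, Ts, x, ys, similarity, regularizer):
    """Multi-view cost of Eq. 1: sum over views v of
    S(P(K_v, T_v)(x), y_v), plus a regularizing term R."""
    data_term = sum(similarity(project(K, T, x), y)
                    for K, T, y in zip(Ks, Ts, ys))
    return data_term + regularizer(Ts)

# illustrative choices for S and R
ssd = lambda a, b: float(np.sum((a - b) ** 2))   # sum of squared differences
no_reg = lambda Ts: 0.0                          # no regularization
```

For noise-free keypoint observations, the data term vanishes exactly at the true intrinsics and pose, which is the property the estimation schemes described in the following rely on.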
<p>2D/3D registration then amounts to estimating the 5&#x20;&#x2b; 6&#x20;&#x2b; <italic>N</italic>
<sub>
<italic>D</italic>
</sub> degrees of freedom (DoFs) for <inline-formula id="inf6">
<mml:math id="m7">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, respectively, that minimize the optimization objective. Clearly, there are special cases to <xref ref-type="disp-formula" rid="e1">Eq. 1</xref>, e.g., for rigid registration where only <inline-formula id="inf7">
<mml:math id="m8">
<mml:msub>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
</mml:math>
</inline-formula> must be estimated and <inline-formula id="inf8">
<mml:math id="m9">
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> is known, or vice&#x20;versa.</p>
<p>In traditional image-based 2D/3D registration, this optimization is usually performed iteratively where parameters are initialized with some values <inline-formula id="inf9">
<mml:math id="m10">
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> and then adjusted incrementally. The updates <inline-formula id="inf10">
<mml:math id="m11">
<mml:mi>&#x3b4;</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> are derived from <xref ref-type="disp-formula" rid="e1">Eq. 1</xref> using gradient-based or gradient-free methods, such as BFGS (<xref ref-type="bibr" rid="B59">Liu and Nocedal, 1989</xref>; <xref ref-type="bibr" rid="B5">Berger et&#x20;al., 2016</xref>) or CMA-ES (<xref ref-type="bibr" rid="B41">Hansen et&#x20;al., 2003</xref>) and BOBYQA (<xref ref-type="bibr" rid="B79">Powell, 2009</xref>), respectively. In certain cases when 2D and 3D data representations are not pixel or voxel grid-based but sparse, e.g., keypoints, analytic solutions to <xref ref-type="disp-formula" rid="e1">Eq. 1</xref>, such as the perspective-n-point (PnP) algorithm (<xref ref-type="bibr" rid="B54">Lepetit et&#x20;al., 2009</xref>),&#x20;exist.
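As a toy illustration of this iterative scheme, the sketch below refines a rigid pose with SciPy's gradient-free Nelder-Mead simplex, which stands in here for derivative-free optimizers such as BOBYQA or CMA-ES (neither ships with SciPy). The small-angle pose parameterization and the synthetic keypoints are hypothetical simplifications.

```python
import numpy as np
from scipy.optimize import minimize

def project(pose, x, K):
    """Apply a rigid pose (tx, ty, tz, rx, ry, rz), using a small-angle
    rotation approximation, then project with intrinsics K (3, 3)."""
    t, r = pose[:3], pose[3:]
    R = np.array([[1.0, -r[2], r[1]],
                  [r[2], 1.0, -r[0]],
                  [-r[1], r[0], 1.0]])
    cam = x @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]               # perspective divide

K = np.array([[1000.0, 0.0, 256.0],
              [0.0, 1000.0, 256.0],
              [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
x = rng.uniform(-20.0, 20.0, (30, 3)) + np.array([0.0, 0.0, 500.0])
pose_true = np.array([4.0, -3.0, 10.0, 0.01, -0.02, 0.015])
y = project(pose_true, x, K)                    # simulated 2D observations

# iterative, gradient-free refinement from a (zero) initialization
cost = lambda p: float(np.mean((project(p, x, K) - y) ** 2))
res = minimize(cost, np.zeros(6), method="Nelder-Mead",
               options={"maxiter": 4000, "fatol": 1e-12, "xatol": 1e-10})
```

Note that the depth component tz is only weakly constrained by a single view in this toy setup, a first glimpse of the single-view ambiguity discussed below.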
<p>The above traditional approach to solving 2D/3D image registration has spawned solutions that precisely recover the desired geometric transformations under certain conditions (<xref ref-type="bibr" rid="B62">Markelj et&#x20;al., 2012</xref>; <xref ref-type="bibr" rid="B95">Uneri et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B27">Gao et&#x20;al., 2020b</xref>; <xref ref-type="bibr" rid="B33">Grupp et&#x20;al., 2020b</xref>). Unfortunately, despite substantial efforts over the past decades, 2D/3D registration is not yet enjoying wide popularity as a workhorse component in image-based navigation platforms at the bedside. Rather, it is shackled to the bench top because several hard open challenges inhibit its widespread adoption. They include:<list list-type="simple">
<list-item>
<p>&#x2022; <bold>Narrow capture range of similarity metrics</bold>: Conventional intensity-based methods mostly use hand-crafted similarity metrics between <italic>y</italic>
<sub>
<italic>v</italic>
</sub> and its current best estimate <inline-formula id="inf11">
<mml:math id="m12">
<mml:msub>
<mml:mrow>
<mml:mover>
<mml:mi>y</mml:mi>
<mml:mo>&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x25e6;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> as the loss function. Common choices for the similarity metric are Normalized Cross Correlation (NCC), gradient information (<xref ref-type="bibr" rid="B5">Berger et&#x20;al., 2016</xref>), or Mutual Information (MI) (<xref ref-type="bibr" rid="B61">Maes et&#x20;al., 1997</xref>). While these metrics are positively correlated with pose differences when the perturbations in <inline-formula id="inf12">
<mml:math id="m13">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> are small, they are generally non-convex and fail to accurately represent pose offsets when perturbations are large. Thus, without proper initialization, the optimization is prone to get stuck in local minima, returning wrong registration results. The initial estimate of the target parameters <inline-formula id="inf13">
<mml:math id="m14">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> must hence be close enough to the true solution in order for the optimization to converge to the global minimum. Estimating good initial parameters is commonly achieved using some manual interaction, which is cumbersome and time-consuming, or&#x2014;in research papers&#x2014;neglected altogether. The magnitude by which the initial parameter guesses may be incorrect for the downstream algorithm to still produce a successful registration is referred to as the capture range (<xref ref-type="bibr" rid="B62">Markelj et&#x20;al., 2012</xref>; <xref ref-type="bibr" rid="B20">Esteban et&#x20;al., 2019</xref>). Its magnitude depends, among other things, on the similarity function as well as the optimizer, and it can be stated quantitatively as the mean target registration error (mTRE) between 3D keypoints at initialization<xref ref-type="fn" rid="FN1">
<sup>1</sup>
</xref>.</p>
</list-item>
</list>
</p>
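The mTRE used to quantify the capture range can be computed directly; a minimal sketch follows, in which the keypoints and the 4&#xd7;4 homogeneous pose matrices are hypothetical inputs.

```python
import numpy as np

def mtre(T_est, T_true, keypoints):
    """Mean target registration error: average Euclidean distance between
    3D keypoints (N, 3) mapped by the estimated and the ground-truth
    rigid transforms (4x4 homogeneous matrices)."""
    pts = np.hstack([keypoints, np.ones((len(keypoints), 1))]).T
    diff = (T_est @ pts)[:3] - (T_true @ pts)[:3]
    return float(np.mean(np.linalg.norm(diff, axis=0)))
```

Evaluated at the initial parameter guess, this number quantifies how far an initialization is from the truth and hence whether it falls within a given method's capture range.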
<p>The resulting challenges are two-fold: On the one hand, it is important to develop robust and automated initialization strategies for the existing image-based 2D/3D registration algorithms to succeed. On the other hand, there is interest in and opportunity for the development of similarity metrics that better capture the evolution of image (dis)similarity. Doing so is challenging, however, because of the complexity of the task including various contrast mechanisms, imaging geometries, and inconsistencies.<list list-type="simple">
<list-item>
<p>&#x2022; <bold>Ambiguity</bold>: The aforementioned complexity also leads to registration ambiguity, which is most pronounced in single-view registration (<xref ref-type="bibr" rid="B74">Otake et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B95">Uneri et&#x20;al., 2013</xref>). Because the spatial information along the projection line is collapsed onto the imaging plane, it is hard to precisely recover the information in the projective direction. A well-known example is the difficulty of accurately estimating the depth of a 3D scene from the camera center using a single 2D image. These challenges already exist for rigid 2D/3D registration and are further exacerbated in rigid plus deformable registration settings.</p>
</list-item>
<list-item>
<p>&#x2022; <bold>High dimensional optimization problems</bold>: Even in the simplest case, 2D/3D registration describes a non-convex optimization problem with at least six DoFs to describe a rigid body transform. In the context of deformable 2D/3D registration, the high dimensional parameter <italic>&#x3c9;</italic>
<sub>
<italic>D</italic>
</sub> that describes the 3D deformation drastically increases the optimization search space. However, since the information contained in the 2D and 3D images remains constant, the optimization problem easily becomes ill-posed as the number of parameters grows. Although statistical modeling techniques exist to limit the parameter search space, the registration accuracy and sensitivity to key features remain an area of concern (<xref ref-type="bibr" rid="B119">Zhu et&#x20;al., 2021</xref>).</p>
</list-item>
<list-item>
<p>&#x2022; <bold>Verification and uncertainty</bold>: As a central component of image-based surgical navigation platforms, 2D/3D registration supplies critical information to enable precise manipulation of anatomy. To enable users to assess risk and make better decisions, there is a strong desire for registration algorithms to verify the resulting geometric parameters or supply uncertainty estimates. Perhaps the most straightforward way of verifying a registration result is to visually inspect the 2D overlay of the projected 3D data&#x2014;this approach, however, is neither quantitative nor scalable, since it relies on human interaction.</p>
</list-item>
</list>
</p>
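To make the intensity-based optimization view concrete, the sketch below pairs a normalized cross-correlation (NCC) similarity with a simple derivative-free coordinate search. Here, <monospace>project</monospace> is a hypothetical stand-in for a DRR renderer, the toy pose has only two parameters rather than the six rigid DoFs used in practice, and the coordinate search stands in for the more sophisticated optimizers found in the literature.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation, a common intensity similarity."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())

def register(fixed, project, x0, step=0.1, shrink=0.5, n_levels=12):
    """Derivative-free coordinate search over the pose parameters,
    maximizing similarity between the rendered and fixed images."""
    pose = np.asarray(x0, dtype=float)
    best = ncc(project(pose), fixed)
    for _ in range(n_levels):
        improved = True
        while improved:
            improved = False
            for i in range(pose.size):
                for delta in (step, -step):
                    cand = pose.copy()
                    cand[i] += delta
                    score = ncc(project(cand), fixed)
                    if score > best:
                        pose, best, improved = cand, score, True
        step *= shrink  # refine the search at a finer scale
    return pose
```

The non-convexity of such objectives in realistic imaging conditions is precisely what limits the capture range discussed above: a local search of this kind succeeds only when started sufficiently close to the true pose.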
<p>These open problems can be largely attributed to the variability in the problem settings (e.g., regarding image appearance and contrast mechanisms, pose variability, &#x2026;) that cannot easily be handled algorithmically because the desired properties cannot be formalized explicitly. Machine learning methods, including deep neural networks (NNs), have enjoyed a growing popularity across a variety of image analysis problems (<xref ref-type="bibr" rid="B101">Vercauteren et&#x20;al., 2019</xref>), precisely because they do not require explicit definitions of complex functional mappings. Rather, they optimize parametric functions, such as convolutional NNs (CNNs), on training data such that the model learns to approximate the desired mapping between input and output variables. As such, they provide opportunities to supersede heuristic components of traditional registration pipelines with learning-based alternatives that were optimized for the same task on much larger amounts of data. This allows us to expand <xref ref-type="disp-formula" rid="e1">Eq. 1</xref> into:<disp-formula id="e2">
<mml:math id="m15">
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mi mathvariant="normal">a</mml:mi>
<mml:mi mathvariant="normal">r</mml:mi>
<mml:mi mathvariant="normal">g</mml:mi>
<mml:mspace width="0.17em"/>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:munder>
<mml:msup>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msup>
<mml:mspace width="-0.17em"/>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x25e6;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:math>
<label>(2)</label>
</disp-formula>where we have introduced parameters <italic>&#x3b8;</italic> for several components of the objective function to indicate that they may now be machine learning models, such as CNNs. Similarly, the registration may not rely on the original 3D and 2D data itself but on some higher-level representation thereof, e.g., anatomical landmarks, that is generated using some learned function <italic>G</italic>
<sup>
<italic>&#x3b8;</italic>
</sup>(&#x22c5;).</p>
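As a minimal illustration of replacing a heuristic component with a learned one, the sketch below fits a least-squares regressor that maps intensity residuals to pose updates. The linear 64-pixel &#x201c;projector&#x201d; and the 2-DoF pose are hypothetical simplifications chosen so the example is exact; they are not taken from any of the reviewed methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "projector": a 2-DoF pose maps linearly to a 64-pixel
# image. A real DRR renderer is non-linear; linearity keeps the sketch exact.
basis = rng.standard_normal((2, 64))

def project(pose):
    return np.asarray(pose) @ basis

# Training: regress the pose update from the intensity residual between
# the observed image and the projection at the current (zero) estimate.
offsets = rng.uniform(-1.0, 1.0, size=(200, 2))
residuals = np.array([project(o) - project(np.zeros(2)) for o in offsets])
W, *_ = np.linalg.lstsq(residuals, offsets, rcond=None)

# Inference: a single regression step predicts the pose correction.
true_pose = np.array([0.3, -0.5])
pred = (project(true_pose) - project(np.zeros(2))) @ W
```

In this idealized linear setting one regression step recovers the pose exactly; with a real, non-linear projector such learned updates are instead applied iteratively, which corresponds to the &#x201c;pose regr., updates&#x201d; entries tabulated below.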
<p>In this manuscript, we first summarize a systematic review of the recent literature on machine learning-based techniques for image-based 2D/3D registration and explain how these techniques relate to <xref ref-type="disp-formula" rid="e2">Eq. 2</xref>. Based on those observations, we identify the impact that the introduction of contemporary machine learning methodology has had on 2D/3D registration for image-guided interventions. Concurrently, we identify open challenges and contribute our perspective on possible solutions.</p>
</sec>
</sec>
<sec id="s2">
<title>2 Systematic Review</title>
<sec id="s2-1">
<title>2.1 Search Methodology</title>
<p>The aim of the systematic review is to survey those machine learning-enhanced 2D/3D registration methods in which the 3D data and 2D observations thereof are related through one or multiple perspective projections (and potentially some non-rigid deformation). This scenario arises, for example, in the registration between 3D CT and 2D X-ray, 3D magnetic resonance angiography (MRA) and 2D digital subtraction angiography (DSA), or 3D anatomical models and 2D endoscopy images. 3D/3D registration methods (such as between 3D CT and intra-operative CBCT) or 2D/3D slice-to-volume registration (as it arises, among others, in ultrasound to CT/MR registration) are beyond the scope of this review. Because we are primarily interested in surveying the impact of contemporary machine learning techniques, such as deep CNNs, on 2D/3D registration, we limit our analysis to records that appeared after January 2012, which pre-dates the onset of the ongoing surge of interest in learning-based image processing (<xref ref-type="bibr" rid="B51">Krizhevsky et&#x20;al., 2012</xref>).</p>
<p>To this end, we conducted a systematic literature review in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) method (<xref ref-type="bibr" rid="B70">Moher et&#x20;al., 2009</xref>) (cf. <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>). We searched PubMed, Embase, Business Source Ultimate, and Compendex to find articles pertinent to machine learning for 2D/3D image registration. The following search terms were used to screen titles, abstracts, and keywords of all available records from January 2012 through February 2021:</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>PRISMA flow chart illustrating the screening and inclusion process. Duplicate studies were the result of searching multiple databases. Exclusion screening was performed individually on each study&#x2019;s abstract with the assistance of the Covidence tool. An additional twelve studies were excluded after full-text review, resulting in a pool of 48 studies included for full review.</p>
</caption>
<graphic xlink:href="frobt-08-716007-g002.tif"/>
</fig>
<p>(&#x201c;2D3D registration&#x201d; OR &#x201c;2D 3D registration&#x201d; OR &#x201c;3D2D registration&#x201d; OR &#x201c;3D 2D registration&#x201d; OR &#x201c;2D/3D registration&#x201d; OR &#x201c;3D/2D registration&#x201d; OR &#x201c;2D-3D registration&#x201d; OR &#x201c;3D-2D registration&#x201d; OR &#x201c;two-dimensional/three-dimensional registration&#x201d; OR &#x201c;three-dimensional/two-dimensional registration&#x201d;) AND (&#x201c;learning&#x201d; OR &#x201c;training&#x201d; OR &#x201c;testing&#x201d; OR &#x201c;trained&#x201d; OR &#x201c;tested&#x201d;)</p>
<p>The initial search resulted in 559 records, and after removal of duplicates, 495 unique studies were included for screening. From those, 447 were excluded because they either did not describe a machine learning-based method for 2D/3D registration, or considered a slice-to-volume registration problem (e.g., as in ultrasound or magnetic resonance imaging). The remaining 48 articles were included for an in-depth full text review, analysis, and data extraction, which was performed by five of the authors (MU, CG, MJ, YH, and RG). Initially, every reviewer analyzed five articles to develop and refine the data extraction template and coding approach. The final template involved the extraction of the following information: 1) A brief summary of the method including the key contribution; 2) modalities and registration phase (including the 3D modality, 2D modality, the registration goal, whether the method requires manual interactions, whether the method is anatomy or patient-specific, and the clinical speciality); 3) the spatial transformation to be recovered (including the number of objects to be registered, the number of views used for registration, and the transformation model used); 4) information on the machine learning model and training setup (including the explicit machine learning technique, the approach to training data curation as well as to data labeling and supervision, and the application of domain generalization or adaptation techniques); 5) the evaluation strategy (including the data source used for evaluation as well as its annotation, the metrics and techniques used for quantitative and qualitative assessment, and most importantly the deterioration of performance in presence of domain shift); and finally, 6) a more subjective summary of concerns with respect to the experimental or methodological approach or the assumptions made in the design or evaluation of the method.</p>
<p>Every one of the 48 articles was analyzed and coded by at least two of the five authors and one author (MU) merged the individual reports into a final consensus document.</p>
</sec>
<sec id="s2-2">
<title>2.2 Limitations</title>
<p>Despite our efforts to broaden the search terms regarding 2D/3D registration, we acknowledge that the list may not be exhaustive. Newer or less popular terminology, such as &#x201c;pose regression&#x201d;, was not used. We also did not use modality-specific terms, such as &#x201c;video to CT registration&#x201d;, which may have excluded some manuscripts that focus on endoscopy or other RGB camera-based modalities. The search included terms like &#x201c;learning&#x201d;, &#x201c;training&#x201d;, or &#x201c;testing&#x201d; as per our interest in machine learning methods for 2D/3D image registration. This search may have excluded some studies that do not explicitly characterize their work as machine or deep &#x201c;learning&#x201d; and do not describe their training and testing approach in either their title or abstract. The terminology used in the search may also have resulted in the exclusion of relevant work from the general computer vision literature. Finally, the review is limited to published manuscripts. Publication bias may have resulted in the exclusion of works relevant to this review.</p>
</sec>
<sec id="s2-3">
<title>2.3 Concise Summary of the Overall Trends</title>
<p>We first summarize the general application domain and problem setting of the 48 included papers and then review the role that machine learning plays in those applications. The registration problem settings and the machine learning details of all included papers are summarized in <xref ref-type="table" rid="T1">Tables 1</xref> and <xref ref-type="table" rid="T2">2</xref>, respectively. We state the number of papers either in the running text or in parentheses.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Parameters defining the registration problems described in the studies included for review. Registration purpose refers to the registration stage being addressed, such as initialization (init.), precise retrieval of geometric parameters (fine regis.), or others.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">
<bold>Study ID</bold>
</th>
<th align="center">
<bold>3D Modality</bold>
</th>
<th align="center">
<bold>2D Modality</bold>
</th>
<th align="center">
<bold>Regis. Action</bold>
</th>
<th align="center">
<bold>Regis. Purpose</bold>
</th>
<th align="center">
<bold>Speciality</bold>
</th>
<th align="center">
<bold>Rigid/Non-rigid</bold>
</th>
<th align="center">
<bold>No. Views</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<xref ref-type="bibr" rid="B9">Brost et&#x20;al. (2012)</xref>
</td>
<td align="center">Catheter Model</td>
<td align="center">X-ray</td>
<td align="center">Pre-proc</td>
<td align="center">fine regis</td>
<td align="center">Catheter</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B58">Lin and Winey (2012)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">Pre-proc</td>
<td align="center">fine regis</td>
<td align="center">Radiotherapy</td>
<td align="center">R</td>
<td align="center">2</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B15">Chou and Pizer (2013)</xref>
</td>
<td align="center">CT (3D &#x2b; t)</td>
<td align="center">X-ray</td>
<td align="center">Deformable regr</td>
<td align="center">fine regis</td>
<td align="center">Lung tumors</td>
<td align="center">NR</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B13">Chou et&#x20;al. (2013)</xref>
</td>
<td align="center">CBCT, s-NST</td>
<td align="center">X-ray</td>
<td align="center">Pose updates</td>
<td align="center">fine regis</td>
<td align="center">Head-and-neck, lungs</td>
<td align="center">Both</td>
<td align="center">R: 4, NR: 2</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B100">Varnavas et&#x20;al. (2013)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">Pre-proc</td>
<td align="center">init., verif</td>
<td align="center">Spine/Vertebrae</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B14">Chou and Pizer (2014)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">Deformable regr</td>
<td align="center">fine regis</td>
<td align="center">Lung</td>
<td align="center">NR</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B69">Mitrovi&#x107; et&#x20;al. (2014)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">N/A</td>
<td align="center">verif</td>
<td align="center">Spine/Vertebra</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B116">Zhao et&#x20;al. (2014)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">Deformable regr</td>
<td align="center">fine regis</td>
<td align="center">Abdomen</td>
<td align="center">NR</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B4">Baka et&#x20;al. (2015)</xref>
</td>
<td align="center">CTA</td>
<td align="center">XA</td>
<td align="center">Pre-proc</td>
<td align="center">fine regis</td>
<td align="center">Coronary artery</td>
<td align="center">Both</td>
<td align="center">1 (temporal)</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B68">Mitrovi&#x107; et&#x20;al. (2015)</xref>
</td>
<td align="center">DSA</td>
<td align="center">DSA</td>
<td align="center">Pose regr., updates</td>
<td align="center">init., fine regis</td>
<td align="center">Angiography</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B98">Varnavas et&#x20;al. (2015a)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">Pre-proc</td>
<td align="center">init., verif</td>
<td align="center">Spine</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B103">Wu et&#x20;al. (2015)</xref>
</td>
<td align="center">mdl</td>
<td align="center">X-ray</td>
<td align="center">Pose regr</td>
<td align="center">init</td>
<td align="center">Knee</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B66">Miao et&#x20;al. (2016b)</xref>
</td>
<td align="center">Tool Model</td>
<td align="center">X-ray</td>
<td align="center">Pose regr</td>
<td align="center">fine regis</td>
<td align="center">Implants</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B65">Miao et&#x20;al. (2016a)</xref>
</td>
<td align="center">Implant and TEE mdl</td>
<td align="center">X-ray</td>
<td align="center">Pose regr</td>
<td align="center">fine regis</td>
<td align="center">Implants/TEE transducer</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B88">Tang and Scalzo (2016)</xref>
</td>
<td align="center">MRA</td>
<td align="center">DSA</td>
<td align="center">cost func</td>
<td align="center">fine regis</td>
<td align="center">Angiography</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B104">Wu et&#x20;al. (2016)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">N/A</td>
<td align="center">verif</td>
<td align="center">Skull/Head</td>
<td align="center">R</td>
<td align="center">2</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B44">Hou et&#x20;al. (2017)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">Pose regr</td>
<td align="center">init</td>
<td align="center">Thorax</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B75">Pei et&#x20;al. (2017)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">Deformable regr</td>
<td align="center">fine regis</td>
<td align="center">Skull</td>
<td align="center">NR</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B107">Xie et&#x20;al. (2017)</xref>
</td>
<td align="center">CTA</td>
<td align="center">X-ray</td>
<td align="center">Pose regr</td>
<td align="center">fine regis</td>
<td align="center">Angiography</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B7">Bier et&#x20;al. (2018)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">Pre-proc</td>
<td align="center">init</td>
<td align="center">Pelvis</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B12">Chen et&#x20;al. (2018)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">Pre-proc</td>
<td align="center">feat. extract</td>
<td align="center">Spine</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B45">Hou et&#x20;al. (2018)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">Pose regr</td>
<td align="center">init</td>
<td align="center">Thorax</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B64">Miao et&#x20;al. (2018)</xref>
</td>
<td align="center">CBCT</td>
<td align="center">X-ray</td>
<td align="center">Pose regr., updates</td>
<td align="center">fine regis</td>
<td align="center">Spine</td>
<td align="center">R</td>
<td align="center">2</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B90">Toth et&#x20;al. (2018)</xref>
</td>
<td align="center">Left ventr mdl</td>
<td align="center">X-ray</td>
<td align="center">Pose regr., updates</td>
<td align="center">fine regis</td>
<td align="center">Heart</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B115">Zhang et&#x20;al. (2018)</xref>
</td>
<td align="center">PCA Deformation Field</td>
<td align="center">X-ray</td>
<td align="center">Deformable regr</td>
<td align="center">init., fine regis</td>
<td align="center">Skull</td>
<td align="center">NR</td>
<td align="center">1 (temporal)</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B117">Zheng et&#x20;al. (2018)</xref>
</td>
<td align="center">CT, implant mdl</td>
<td align="center">X-ray</td>
<td align="center">Domain adaptation</td>
<td align="center">Pose regr., updates</td>
<td align="center">Spine and TEE transducer</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B6">Bier et&#x20;al. (2019)</xref>
</td>
<td align="center">CT</td>
<td align="center">X&#x2013;ray</td>
<td align="center">Pre-proc</td>
<td align="center">init</td>
<td align="center">Pelvis</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B23">Foote et&#x20;al. (2019)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">Deformable regr</td>
<td align="center">fine regis</td>
<td align="center">Lung</td>
<td align="center">NR</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B37">Guan et&#x20;al. (2019)</xref>
</td>
<td align="center">Vasc mdl. (CT)</td>
<td align="center">DSA</td>
<td align="center">Pose regr</td>
<td align="center">fine regis</td>
<td align="center">Cardiovascular</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B56">Liao et&#x20;al. (2019)</xref>
</td>
<td align="center">CT or CBCT</td>
<td align="center">X-ray</td>
<td align="center">cost func</td>
<td align="center">fine regis</td>
<td align="center">Thorax</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B60">Luo et&#x20;al. (2019)</xref>
</td>
<td align="center">CT</td>
<td align="center">Bronchoscopy</td>
<td align="center">Pre-proc</td>
<td align="center">fine regis</td>
<td align="center">Bronchoscopy</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B82">Schaffert et&#x20;al. (2019)</xref>
</td>
<td align="center">CBCT</td>
<td align="center">X-ray</td>
<td align="center">cost func</td>
<td align="center">fine regis</td>
<td align="center">Spine</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B109">Yang and Chen (2019)</xref>
</td>
<td align="center">CT/MRI</td>
<td align="center">Stereo RGB</td>
<td align="center">Pre-proc</td>
<td align="center">feat. extract</td>
<td align="center">Head/face</td>
<td align="center">R</td>
<td align="center">2</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B17">Doerr et&#x20;al. (2020)</xref>
</td>
<td align="center">N/A</td>
<td align="center">X-ray</td>
<td align="center">Pre-proc</td>
<td align="center">init</td>
<td align="center">Spine, pedicle screws</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B24">Francois et&#x20;al. (2020)</xref>
</td>
<td align="center">MRI</td>
<td align="center">Laparoscopy</td>
<td align="center">Pre-proc, cost func</td>
<td align="center">fine regis</td>
<td align="center">Uterus</td>
<td align="center">N/A</td>
<td align="center">N/A</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B28">Gao et&#x20;al. (2020c)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">Pose regr</td>
<td align="center">Pose updates</td>
<td align="center">Pelvis</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B34">Grupp et&#x20;al. (2020c)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">Pose regr</td>
<td align="center">Pose updates</td>
<td align="center">Pelvis</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B36">Gu et&#x20;al. (2020)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">cost func</td>
<td align="center">Pose updates</td>
<td align="center">Pelvis</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B38">Guan et&#x20;al. (2020)</xref>
</td>
<td align="center">Aorta mdl</td>
<td align="center">DSA (synth)</td>
<td align="center">Deformable regr</td>
<td align="center">fine regis</td>
<td align="center">Cardiovascular</td>
<td align="center">NR</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B50">Karner et&#x20;al. (2020)</xref>
</td>
<td align="center">CT/MR</td>
<td align="center">RGB face img</td>
<td align="center">Pre-proc</td>
<td align="center">fine regis</td>
<td align="center">Face</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B55">Li et&#x20;al. (2020)</xref>
</td>
<td align="center">CBCT</td>
<td align="center">X-ray</td>
<td align="center">Deformable regr</td>
<td align="center">fine regis</td>
<td align="center">Skull</td>
<td align="center">NR</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B71">Neumann et&#x20;al. (2020)</xref>
</td>
<td align="center">MRA</td>
<td align="center">DSA</td>
<td align="center">cost func</td>
<td align="center">fine regis</td>
<td align="center">Angiography</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B81">Schaffert et&#x20;al. (2020a)</xref>
</td>
<td align="center">CBCT</td>
<td align="center">X-ray</td>
<td align="center">cost func</td>
<td align="center">fine regis</td>
<td align="center">Spine</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B83">Schaffert et&#x20;al. (2020b)</xref>
</td>
<td align="center">CBCT</td>
<td align="center">X-ray</td>
<td align="center">cost func</td>
<td align="center">fine regis</td>
<td align="center">Spine, head</td>
<td align="center">R</td>
<td align="center">1&#x2013;2</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B102">Wang et&#x20;al. (2020)</xref>
</td>
<td align="center">CT</td>
<td align="center">Bi-plane Fluoroscopy</td>
<td align="center">Pre-proc</td>
<td align="center">init</td>
<td align="center">Knee</td>
<td align="center">R</td>
<td align="center">2</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B106">Xiangqian et&#x20;al. (2020)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">Pose regr</td>
<td align="center">fine regis</td>
<td align="center">Pelvis</td>
<td align="center">R</td>
<td align="center">1</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B114">Zhang et&#x20;al. (2020)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">Post-proc</td>
<td align="center">fine regis</td>
<td align="center">Liver tumors</td>
<td align="center">NR</td>
<td align="center">20</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B19">Esfandiari et&#x20;al. (2021)</xref>
</td>
<td align="center">CT</td>
<td align="center">X-ray</td>
<td align="center">Pre-proc</td>
<td align="center">fine regis</td>
<td align="center">Spine, pedicle screws</td>
<td align="center">R</td>
<td align="center">2</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>A summary of the training and testing details for each study reviewed.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">Network</th>
<th colspan="5" align="center">Training</th>
<th align="center">Testing</th>
</tr>
<tr>
<th align="left">
<bold>Study ID</bold>
</th>
<th align="center">Architecture</th>
<th align="left">Data</th>
<th align="center">Domain Transfer</th>
<th align="center">Object&#x20;Number</th>
<th align="center">Anatomy&#x20;Specificity</th>
<th align="center">Technique</th>
<th align="center">Data</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<xref ref-type="bibr" rid="B9">Brost et&#x20;al. (2012)</xref>
</td>
<td align="center">PCA</td>
<td align="center">Real</td>
<td align="center">Segmentation</td>
<td align="center">Single</td>
<td align="center">N/A</td>
<td align="center">Unsupervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B58">Lin and Winey (2012)</xref>
</td>
<td align="center">N/A</td>
<td align="center">N/A</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Patient</td>
<td align="center">Unsupervised</td>
<td align="center">N/A</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B13">Chou et&#x20;al. (2013)</xref>
</td>
<td align="center">Linear Regression</td>
<td align="center">Synthetic</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B15">Chou and Pizer (2013)</xref>
</td>
<td align="center">PCA</td>
<td align="center">Synthetic and Real</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic and Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B100">Varnavas et&#x20;al. (2013)</xref>
</td>
<td align="center">GHT</td>
<td align="center">Real</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy and Patient</td>
<td align="center">Unsupervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B14">Chou and Pizer (2014)</xref>
</td>
<td align="center">Random Forest</td>
<td align="center">Synthetic</td>
<td align="center">Gaussian Normalization</td>
<td align="center">Single</td>
<td align="center">Patient</td>
<td align="center">Supervised</td>
<td align="center">Synthetic and Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B69">Mitrovi&#x107; et&#x20;al. (2014)</xref>
</td>
<td align="center">N/A</td>
<td align="center">Synthetic</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Patient</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B116">Zhao et&#x20;al. (2014)</xref>
</td>
<td align="center">Linear Regression</td>
<td align="center">Synthetic</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Patient</td>
<td align="center">Supervised</td>
<td align="center">Real and Synthetic</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B4">Baka et&#x20;al. (2015)</xref>
</td>
<td align="center">N/A</td>
<td align="center">Real</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Patient</td>
<td align="center">N/A</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B68">Mitrovi&#x107; et&#x20;al. (2015)</xref>
</td>
<td align="center">N/A</td>
<td align="center">N/A</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Patient</td>
<td align="center">N/A</td>
<td align="center">Synthetic and Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B98">Varnavas et&#x20;al. (2015a)</xref>
</td>
<td align="center">GHT</td>
<td align="center">Synthetic</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy and Patient</td>
<td align="center">Unsupervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B103">Wu et&#x20;al. (2015)</xref>
</td>
<td align="center">PCA</td>
<td align="center">Synthetic</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Unsupervised</td>
<td align="center">Synthetic and Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B65">Miao et&#x20;al. (2016a)</xref>
</td>
<td align="center">Siamese CNNs</td>
<td align="center">Synthetic</td>
<td align="center">Realism Tuning</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B65">Miao et&#x20;al. (2016a)</xref>
</td>
<td align="center">Siamese CNNs</td>
<td align="center">Synthetic</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B88">Tang and Scalzo (2016)</xref>
</td>
<td align="center">Spectral Regression</td>
<td align="center">Real</td>
<td align="center">Abstraction</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B104">Wu et&#x20;al. (2016)</xref>
</td>
<td align="center">MLP</td>
<td align="center">Real</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">N/A</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B44">Hou et&#x20;al. (2017)</xref>
</td>
<td align="center">CaffeNet</td>
<td align="center">Synthetic</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B75">Pei et&#x20;al. (2017)</xref>
</td>
<td align="center">CNNs</td>
<td align="center">Real</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B107">Xie et&#x20;al. (2017)</xref>
</td>
<td align="center">CNNs</td>
<td align="center">Synthetic</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B7">Bier et&#x20;al. (2018)</xref>
</td>
<td align="center">Sequential CNNs</td>
<td align="center">Synthetic</td>
<td align="center">Domain Generalization</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B12">Chen et&#x20;al. (2018)</xref>
</td>
<td align="center">PCA</td>
<td align="center">Synthetic</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">N/A</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B45">Hou et&#x20;al. (2018)</xref>
</td>
<td align="center">N/A</td>
<td align="center">Synthetic</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B64">Miao et&#x20;al. (2018)</xref>
</td>
<td align="center">Dilated CNNs</td>
<td align="center">Synthetic</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B90">Toth et&#x20;al. (2018)</xref>
</td>
<td align="center">CNNs</td>
<td align="center">Synthetic</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B115">Zhang et&#x20;al. (2018)</xref>
</td>
<td align="center">VGG</td>
<td align="center">Synthetic</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B117">Zheng et&#x20;al. (2018)</xref>
</td>
<td align="center">DA Module</td>
<td align="center">Pairwise Synthetic and Real</td>
<td align="center">Domain Adaptation</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Unsupervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B6">Bier et&#x20;al. (2019)</xref>
</td>
<td align="center">Sequential CNNs</td>
<td align="center">Synthetic</td>
<td align="center">Domain Generalization</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic and Cadaver</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B23">Foote et&#x20;al. (2019)</xref>
</td>
<td align="center">DenseNet</td>
<td align="center">Synthetic</td>
<td align="center">Equalization methods</td>
<td align="center">Single</td>
<td align="center">Patient</td>
<td align="center">Supervised</td>
<td align="center">Synthetic</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B37">Guan et&#x20;al. (2019)</xref>
</td>
<td align="center">CNNs</td>
<td align="center">Synthetic</td>
<td align="center">Retrain on patient data</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B56">Liao et&#x20;al. (2019)</xref>
</td>
<td align="center">U-Net</td>
<td align="center">Real</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B60">Luo et&#x20;al. (2019)</xref>
</td>
<td align="center">Instance Learning</td>
<td align="center">Real</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B82">Schaffert et&#x20;al. (2019)</xref>
</td>
<td align="center">PointNet</td>
<td align="center">Real</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B109">Yang and Chen (2019)</xref>
</td>
<td align="center">Stacked Hourglass</td>
<td align="center">Real</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B17">Doerr et&#x20;al. (2020)</xref>
</td>
<td align="center">Fast R-CNN</td>
<td align="center">Synthetic</td>
<td align="center">N/A</td>
<td align="center">Multiple</td>
<td align="center">Patient</td>
<td align="center">Supervised</td>
<td align="center">Synthetic</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B24">Francois et&#x20;al. (2020)</xref>
</td>
<td align="center">U-Net</td>
<td align="center">Real</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B28">Gao et&#x20;al. (2020c)</xref>
</td>
<td align="center">Spatial Transformer</td>
<td align="center">Synthetic</td>
<td align="center">Realistic Simulation</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic and Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B34">Grupp et&#x20;al. (2020c)</xref>
</td>
<td align="center">U-Net</td>
<td align="center">Real</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B36">Gu et&#x20;al. (2020)</xref>
</td>
<td align="center">DenseNet</td>
<td align="center">Synthetic</td>
<td align="center">Realistic Simulation</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic and Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B38">Guan et&#x20;al. (2020)</xref>
</td>
<td align="center">CNNs</td>
<td align="center">Synthetic</td>
<td align="center">Refine&#x20;on&#x20;patient&#x20;data</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B50">Karner et&#x20;al. (2020)</xref>
</td>
<td align="center">Face-to-3D</td>
<td align="center">N/A</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">N/A</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B55">Li et&#x20;al. (2020)</xref>
</td>
<td align="center">ResNet</td>
<td align="center">Synthetic</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Self-supervised</td>
<td align="center">Synthetic&#x20;and&#x20;Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B71">Neumann et&#x20;al. (2020)</xref>
</td>
<td align="center">Siamese ResNet</td>
<td align="center">Synthetic</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B81">Schaffert et&#x20;al. (2020a)</xref>
</td>
<td align="center">PointNet</td>
<td align="center">Real</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B83">Schaffert et&#x20;al. (2020b)</xref>
</td>
<td align="center">FlowNet-S</td>
<td align="center">Real</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B102">Wang et&#x20;al. (2020)</xref>
</td>
<td align="center">VGGs</td>
<td align="center">Real</td>
<td align="center">Domain Adaptation</td>
<td align="center">Multiple</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B106">Xiangqian et&#x20;al. (2020)</xref>
</td>
<td align="center">GoogLeNet</td>
<td align="center">Synthetic</td>
<td align="center">Histogram Matching</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B114">Zhang et&#x20;al. (2020)</xref>
</td>
<td align="center">U-Net</td>
<td align="center">Real</td>
<td align="center">N/A</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Real</td>
</tr>
<tr>
<td align="left">
<xref ref-type="bibr" rid="B19">Esfandiari et&#x20;al. (2021)</xref>
</td>
<td align="center">PConvS</td>
<td align="center">Synthetic</td>
<td align="center">Heavy Augmentation</td>
<td align="center">Single</td>
<td align="center">Anatomy</td>
<td align="center">Supervised</td>
<td align="center">Synthetic</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The vast majority of papers (34) consider the 2D/3D registration between X-ray images and CT or cone-beam CT (CBCT) volumes, with the registration of X-ray images and 3D object models a distant second (10). Other modality combinations included 2D RGB to 3D CT (2) and 3D MR (1), or did not specify the 3D modality (1). The clinical applications that motivate the development of these methods include orthopedics (19), with a focus on the pelvis and spine; angiography (9); radiation therapy (7), e.g., for tracking lung or liver tumors; and cephalometry (4). We observe that eleven methods are explicitly concerned with finding a good initial parameter set from which to begin optimization, while 37 papers (also) describe approaches to achieve high-fidelity estimates of the true geometric parameters. Further, four methods consider verification of the registration result. Reflecting the distribution of clinical tasks and registration phases, most methods perform only rigid alignment (36), while nine methods consider non-rigid registration only and two approaches address both rigid and non-rigid registration. To solve the alignment problem, 38 papers relied on a single 2D view, seven approaches used multiple views of the same static 3D scene, and two methods assumed the same view but used multiple images of a temporally dynamic 3D scene. Perhaps the most striking observation is that all but three (45) of the included studies consider the registration of only a single object. Two of the studies that deal with multiple objects, however, are limited to object detection (<xref ref-type="bibr" rid="B17">Doerr et&#x20;al., 2020</xref>) and inpainting (<xref ref-type="bibr" rid="B19">Esfandiari et&#x20;al., 2021</xref>), respectively, and do not report registration results. The remaining study (<xref ref-type="bibr" rid="B34">Grupp et&#x20;al., 2020c</xref>) performs a 2D segmentation of multiple bones but does not apply any additional learning to the registration itself. 
While there are three methods that do, in fact, describe the 2D/3D registration of multiple objects, i.e., vertebral bodies (<xref ref-type="bibr" rid="B100">Varnavas et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B99">2015b</xref>) and knee anatomy (<xref ref-type="bibr" rid="B102">Wang et&#x20;al., 2020</xref>), the individual registrations are solved independently, which inhibits the information sharing that could ease the optimization problem.</p>
<p>The focus of this review is the impact that machine learning has had on the contemporary state of 2D/3D registration. Here, we briefly introduce the five main themes that we identified and then discuss them in greater detail in subsequent sections. At a high level of abstraction, 2D/3D registration problems follow the flow of acquiring <italic>Data</italic>, fitting the <italic>Model</italic>, and solving the <italic>Objective</italic>; each of the five themes aims to improve a particular aspect of this flow. The themes are:<list list-type="simple">
<list-item>
<p>&#x2022; Contextualization (<xref ref-type="sec" rid="s2-4">Section 2.4</xref>): Instead of relying solely on the images themselves, the 14 methods in this theme use machine learning algorithms to extract semantic information from the 2D or 3D data, including landmark or object detection, semantic segmentation, or data quality classification. Doing so enables automatic initialization techniques, sophisticated regularizers, as well as techniques that handle inconsistencies between 2D and 3D&#x20;data.</p>
</list-item>
<list-item>
<p>&#x2022; Representation learning (<xref ref-type="sec" rid="s2-5">Section 2.5</xref>): Principal component analysis (PCA), among other techniques, is a common way to reduce the dimensionality of highly complex data&#x2014;in this case, rigid and non-rigid geometric transformations. Twelve papers used representation learning techniques as part of the registration pipeline.</p>
</list-item>
<list-item>
<p>&#x2022; Similarity modeling (<xref ref-type="sec" rid="s2-6">Section 2.6</xref>): Optimization-based image registration techniques conventionally rely on image similarity metrics, which ideally should capture appearance differences arising from both coarse and very fine-scale geometric misalignment. Ten studies describe learning-based approaches to improve similarity quantification.</p>
</list-item>
<list-item>
<p>&#x2022; Direct parameter regression (<xref ref-type="sec" rid="s2-7">Section 2.7</xref>): In contrast to iterative methods, direct parameter regression techniques seek to infer the&#x20;correct geometric parameters for 2D/3D alignment (either absolute with respect to a canonical 3D coordinate frame, or relative between a source and a target coordinate frame) directly from the 2D observation. A total of 22 manuscripts reported such approaches for either rigid or non-rigid registration.</p>
</list-item>
<list-item>
<p>&#x2022; Verification (<xref ref-type="sec" rid="s2-8">Section 2.8</xref>): Four studies used machine learning-based techniques to assess whether the estimated geometric parameters should be considered reliable.</p>
</list-item>
</list>
</p>
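The flow above can be made concrete with a small sketch. The following is a minimal, hypothetical illustration (our own, not any reviewed method) of where the themes attach to an optimization-based pipeline: a similarity metric, possibly learned, is iteratively ascended over the geometric parameters by finite differences, with `render` standing in for the DRR/projection operator; all function names are ours.

```python
import numpy as np

def register(y2d, render, similarity, nu0, steps=400, lr=0.05, eps=1e-4):
    """Schematic optimization-based 2D/3D registration loop (illustration only).

    render(nu):        forward projector (e.g., a DRR generator) -- a stand-in here.
    similarity(a, b):  image similarity; under "similarity modeling" this
                       would be a learned metric.
    nu0:               initial parameters; "contextualization" or "direct
                       parameter regression" would supply a good nu0.
    """
    nu = np.asarray(nu0, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(nu)
        for i in range(nu.size):            # finite-difference gradient in nu
            d = np.zeros_like(nu)
            d[i] = eps
            grad[i] = (similarity(render(nu + d), y2d)
                       - similarity(render(nu - d), y2d)) / (2 * eps)
        nu = nu + lr * grad                 # ascend the similarity
    return nu                               # "verification" would vet this result
```

On a toy problem where `render` shifts a 1D Gaussian profile and the similarity is negative mean squared error, the loop recovers the true shift from a nearby initialization.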
<p>High-level depictions of these themes are shown in <xref ref-type="fig" rid="F3">Figure&#x20;3</xref> and the respective sections below provide details for each, along with references to the individual studies. Refer to <xref ref-type="table" rid="T3">Table 3</xref> for a summary of themes attributed to each included study.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Illustrations of the main themes of machine learning in 2D/3D registration. The logic relationships of these themes are shown on top. We use a spine CT volume and a spine X-ray image as an example to show the generic 2D/3D projection geometry. Machine learning models are represented with a neural network icon. Key labels and parameters are presented and map to <xref ref-type="disp-formula" rid="e1">Eqs. 1</xref>, <xref ref-type="disp-formula" rid="e2">2</xref>.</p>
</caption>
<graphic xlink:href="frobt-08-716007-g003.tif"/>
</fig>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>A summary of each study&#x2019;s relation with the five themes of Contextualization, Representation learning, Direct parameter regression, Similarity modeling, and Verification.</p>
</caption>
<table>
<tbody valign="top">
<tr>
<td>
<inline-graphic xlink:href="frobt-08-716007-fx1.tif"/>
</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s2-4">
<title>2.4 Contextualization</title>
<p>Studies summarized in this theme use machine learning techniques to increase the information available to the 2D/3D registration problem by extracting semantic information from the 2D or 3D data (<xref ref-type="bibr" rid="B58">Lin and Winey, 2012</xref>; <xref ref-type="bibr" rid="B100">Varnavas et&#x20;al., 2013</xref>, <xref ref-type="bibr" rid="B99">2015b</xref>; <xref ref-type="bibr" rid="B7">Bier et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B12">Chen et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B6">Bier et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B60">Luo et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B109">Yang and Chen, 2019</xref>; <xref ref-type="bibr" rid="B34">Grupp et&#x20;al., 2020c</xref>; <xref ref-type="bibr" rid="B17">Doerr et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B24">Francois et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B50">Karner et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B102">Wang et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B19">Esfandiari et&#x20;al., 2021</xref>).</p>
<p>Using the notation of <xref ref-type="disp-formula" rid="e2">Eq. 2</xref>, these methods specify <inline-formula id="inf14">
<mml:math id="m16">
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, or <inline-formula id="inf15">
<mml:math id="m17">
<mml:msup>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> although not all methods are necessarily integrated into the iterative optimization procedure. Perhaps the most prevalent approach here is the detection of anatomical landmarks on 2D images (<xref ref-type="bibr" rid="B7">Bier et&#x20;al., 2018</xref>, <xref ref-type="bibr" rid="B6">2019</xref>; <xref ref-type="bibr" rid="B109">Yang and Chen, 2019</xref>; <xref ref-type="bibr" rid="B34">Grupp et&#x20;al., 2020c</xref>; <xref ref-type="bibr" rid="B102">Wang et&#x20;al., 2020</xref>) to define correspondences with the respective 3D locations, which allows for either explicit determination of <inline-formula id="inf16">
<mml:math id="m18">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> using PnP (<xref ref-type="bibr" rid="B6">Bier et&#x20;al., 2019</xref>, <xref ref-type="bibr" rid="B7">2018</xref>) or stereo-reconstruction following 3D-3D matching (<xref ref-type="bibr" rid="B109">Yang and Chen, 2019</xref>; <xref ref-type="bibr" rid="B102">Wang et&#x20;al., 2020</xref>), or for the introduction of soft re-projection constraints as a regularizing term (<xref ref-type="bibr" rid="B34">Grupp et&#x20;al., 2020c</xref>). Another way in which contextualization can benefit initialization is object detection (<xref ref-type="bibr" rid="B58">Lin and Winey, 2012</xref>; <xref ref-type="bibr" rid="B100">Varnavas et&#x20;al., 2013</xref>, <xref ref-type="bibr" rid="B99">2015b</xref>; <xref ref-type="bibr" rid="B12">Chen et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B17">Doerr et&#x20;al., 2020</xref>). While all methods for landmark detection rely on deep CNNs, object detection achieved satisfactory results even with less complex learning models, i.e., templates in (<xref ref-type="bibr" rid="B58">Lin and Winey, 2012</xref>), PCA over object contours in (<xref ref-type="bibr" rid="B12">Chen et&#x20;al., 2018</xref>), and the Generalized Hough Transform in (<xref ref-type="bibr" rid="B100">Varnavas et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B99">2015b</xref>). The drawback, however, is that most of these less complex approaches require patient-specific training, since the models are unable to generalize beyond a single shape. <xref ref-type="bibr" rid="B17">Doerr et&#x20;al. (2020)</xref> describe a deep learning-based alternative in which the Fast R-CNN object detector is re-trained to return bounding boxes of 30 different screw types in varied&#x20;poses.</p>
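To make the landmark-based route concrete: given n &#x2265; 6 detected 2D landmarks and their known 3D counterparts, the projective mapping can be recovered linearly. The sketch below uses the classic Direct Linear Transform as a stand-in for the PnP step referenced above; the function names are ours, and practical systems would typically use a calibrated, outlier-robust PnP solver instead.

```python
import numpy as np

def dlt_projection_matrix(X3d, x2d):
    """Estimate a 3x4 projection matrix P from n >= 6 non-coplanar 3D-2D
    landmark correspondences via the Direct Linear Transform (DLT)."""
    A = []
    for (X, Y, Z), (u, v) in zip(X3d, x2d):
        # Each correspondence yields two linear constraints on the 12 entries of P.
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)

def project(P, X3d):
    """Apply P to 3D points and dehomogenize to 2D pixel coordinates."""
    Xh = np.hstack([X3d, np.ones((X3d.shape[0], 1))])
    xh = Xh @ P.T
    return xh[:, :2] / xh[:, 2:3]
```

With noise-free correspondences the recovered matrix reproduces the 2D landmarks exactly (up to scale of P); with noisy detections, the least-squares nature of the SVD solution still yields a usable initialization for subsequent intensity-based refinement.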
<p>A complementary trend is the identification of image regions (<xref ref-type="bibr" rid="B24">Francois et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B19">Esfandiari et&#x20;al., 2021</xref>) or whole images (<xref ref-type="bibr" rid="B60">Luo et&#x20;al., 2019</xref>) that should not contribute to the optimization problem because they are inconsistent with the 3D data. <xref ref-type="bibr" rid="B24">Francois et&#x20;al. (2020)</xref> use a U-Net-like fully convolutional network (FCN) to segment occluding contours of the uterus and reject unreliable regions, while <xref ref-type="bibr" rid="B60">Luo et&#x20;al. (2019)</xref> identify and reject poor-quality frames in bronchoscopy. <xref ref-type="bibr" rid="B19">Esfandiari et&#x20;al. (2021)</xref> consider mismatch introduced by intra-operative instrumentation. They contribute a non-blind image inpainting method using an FCN that seeks to restore the background anatomy after image regions corresponding to instruments have been identified.</p>
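One simple way to operationalize such rejection is to exclude the flagged pixels from the similarity computation itself. The sketch below is our own minimal illustration, not a reimplementation of the cited methods: it computes normalized cross-correlation over an inlier mask, so that pixels flagged as inconsistent (e.g., covered by instruments) do not bias the metric.

```python
import numpy as np

def masked_ncc(fixed, moving, mask):
    """Normalized cross-correlation restricted to pixels where mask is True,
    so regions flagged as inconsistent do not contribute to the similarity."""
    f = fixed[mask].astype(float)
    m = moving[mask].astype(float)
    f = f - f.mean()
    m = m - m.mean()
    denom = np.linalg.norm(f) * np.linalg.norm(m)
    if denom == 0.0:
        return 0.0          # constant region: correlation undefined, return neutral
    return float(f @ m / denom)
```

If a corrupted region is excluded by the mask, two otherwise identical images score a correlation of 1; including the corrupted pixels lowers the score and, in an optimization loop, would drag the pose estimate toward spurious alignment.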
<p>It is undeniable that the introduction of machine learning to contextualize 2D and 3D data enables novel techniques that substantially expand the tools one may rely on when designing a 2D/3D registration algorithm, and these techniques are therefore likely to become impactful. However, a general trend that we observed in most of these studies was that the impact of the contextualization component on the downstream registration task was not, in fact, evaluated. For example, while <xref ref-type="bibr" rid="B6">Bier et&#x20;al. (2019)</xref> report quantitative results on real data of cadaveric specimens, it remains unclear whether the performance would be sufficient to actually initialize an image similarity-based 2D/3D registration algorithm. There are, of course, positive examples, including (<xref ref-type="bibr" rid="B99">Varnavas et&#x20;al., 2015b</xref>; <xref ref-type="bibr" rid="B60">Luo et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B34">Grupp et&#x20;al., 2020c</xref>), that demonstrate the benefit of contextualization on overall pipeline performance. Such empirical demonstrations should be strongly preferred over arguments from authority.</p>
</sec>
<sec id="s2-5">
<title>2.5 Representation Learning</title>
<p>As highlighted in <xref ref-type="sec" rid="s1-2">Section 1.2</xref>, 2D/3D registration, especially in deformable scenarios, suffers from high-dimensional parameter spaces; because the 2D observations carry limited information, changes in any one of those parameters are not easily resolved, which creates ambiguity. In our review, we found that unsupervised representation learning is widely adopted to reduce the dimensionality of the parameter space while introducing implicit regularization by confining possible solutions to the principal modes of variation across population- or patient-level observations. We identified 12 studies that propose such techniques or use them as part of the registration pipeline (<xref ref-type="bibr" rid="B9">Brost et&#x20;al., 2012</xref>; <xref ref-type="bibr" rid="B58">Lin and Winey, 2012</xref>; <xref ref-type="bibr" rid="B15">Chou and Pizer, 2013</xref>, <xref ref-type="bibr" rid="B14">2014</xref>; <xref ref-type="bibr" rid="B13">Chou et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B116">Zhao et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B4">Baka et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B75">Pei et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B12">Chen et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B115">Zhang et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B23">Foote et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B55">Li et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B114">Zhang et&#x20;al., 2020</xref>).</p>
<p>PCA is by far the most prevalent method for representation learning and is used in all but one study. That one study, however, used by far the most views (<italic>v</italic>&#x20;&#x3d; 20) for the initial estimation of a low-resolution vector field, which was then regularized by projection onto a deep learning-based population model (<xref ref-type="bibr" rid="B114">Zhang et&#x20;al., 2020</xref>). We found that methods designed for cephalometry were distinct from all other approaches, as their primary goal is generally not 2D/3D registration but 3D reconstruction of the skull from a 2D X-ray. Among the papers included in this review, this problem is often formulated as the deformable 2D/3D registration between a lateral X-ray image of the skull and a 3D atlas using a PCA deformation model, the principal components <italic>&#x3c9;</italic>
<sub>
<italic>D</italic>
</sub> of which are estimated via a prior set of 3D/3D registrations (<xref ref-type="bibr" rid="B75">Pei et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B115">Zhang et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B55">Li et&#x20;al., 2020</xref>). Consequently, these methods rely on population-level models and are thus different from methods used for radiation therapy (<xref ref-type="bibr" rid="B13">Chou et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B15">Chou and Pizer, 2013</xref>, <xref ref-type="bibr" rid="B14">2014</xref>; <xref ref-type="bibr" rid="B116">Zhao et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B23">Foote et&#x20;al., 2019</xref>) and angiography (<xref ref-type="bibr" rid="B9">Brost et&#x20;al., 2012</xref>; <xref ref-type="bibr" rid="B4">Baka et&#x20;al., 2015</xref>), which rely on patient-specific models that are built pre- and intra-operatively, respectively. It is worth mentioning that, while most methods rely on PCA to condense deformable motion parametrizations, it has also been found useful to identify and focus on the primary modes of variation in rigid registration (<xref ref-type="bibr" rid="B9">Brost et&#x20;al., 2012</xref>).</p>
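The PCA-based dimensionality reduction described above can be made concrete with a minimal NumPy sketch. The data, variable names, and the choice of two retained modes below are purely illustrative and are not taken from any of the cited studies; the point is only that a deformable registration can then optimize a handful of coefficients <italic>&#x3c9;</italic><sub><italic>D</italic></sub> instead of a dense displacement field.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: 20 "subjects", each a flattened displacement field with
# 900 entries, generated (for illustration only) from two underlying modes
# of variation plus a small amount of noise.
n_subjects, n_params = 20, 900
true_modes = rng.standard_normal((2, n_params))
coeffs = rng.standard_normal((n_subjects, 2))
fields = coeffs @ true_modes + 0.01 * rng.standard_normal((n_subjects, n_params))

# PCA model: mean field plus principal modes from the SVD of the centered data.
mean_field = fields.mean(axis=0)
_, _, Vt = np.linalg.svd(fields - mean_field, full_matrices=False)
k = 2                      # number of retained modes; omega_D lives in R^k
modes = Vt[:k]             # shape (k, n_params)

def encode(field):
    """Project a displacement field onto the retained modes -> omega_D."""
    return (field - mean_field) @ modes.T

def decode(omega_D):
    """Reconstruct a dense displacement field from k coefficients."""
    return mean_field + omega_D @ modes

# A deformable registration now only has to optimize k coefficients
# instead of n_params displacement values.
recon = decode(encode(fields[0]))
rel_err = np.linalg.norm(recon - fields[0]) / np.linalg.norm(fields[0])
```

Whether such a model is fit across a population or per patient then corresponds exactly to the distinction drawn above between cephalometric and radiation therapy or angiography applications.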
<p>We note that most studies in this theme do not consider rigid alignment prior to deformable parameter estimation, e.g., by assuming perfectly lateral radiographs of the skull in cephalometry or perfectly known imaging geometry in radiation therapy. Except in a few cases with highly specialized instrumentation, for example (<xref ref-type="bibr" rid="B15">Chou and Pizer, 2013</xref>), which relied on on-board CBCT imaging, such rigid-alignment assumptions seem unjustified, suggesting that the reported performance estimates must be interpreted with care. This concern is amplified by the fact that many studies are evaluated only on synthetic data (which, we fear, may sometimes have been generated by sampling the very PCA model used for registration, committing an inverse crime) and may lack paired 3D data for extensive quantitative evaluation.</p>
</sec>
<sec id="s2-6">
<title>2.6 Similarity Modeling</title>
<p>As we established in <xref ref-type="sec" rid="s1-2">Section 1.2</xref> for optimization-based 2D/3D registration algorithms, the cost function&#x2014;or similarity metric&#x2014;<italic>S</italic>(&#x22c5;, &#x22c5;) is among the most important components, since it determines the parameter updates. It is well known that the most commonly used metrics fail to accurately reflect the distance in geometric parameter space that gives rise to the mismatch between the current observations. It is thus not surprising that ten studies describe methods to better model and quantify the similarity between the source and target images in order to increase the capture range, and thus, the likelihood of registration success (<xref ref-type="bibr" rid="B88">Tang and Scalzo, 2016</xref>; <xref ref-type="bibr" rid="B56">Liao et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B82">Schaffert et&#x20;al., 2019</xref>, <xref ref-type="bibr" rid="B83">2020b</xref>; <xref ref-type="bibr" rid="B81">Schaffert et&#x20;al., 2020a</xref>; <xref ref-type="bibr" rid="B28">Gao et&#x20;al., 2020c</xref>; <xref ref-type="bibr" rid="B34">Grupp et&#x20;al., 2020c</xref>; <xref ref-type="bibr" rid="B24">Francois et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B36">Gu et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B71">Neumann et&#x20;al., 2020</xref>).</p>
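The gap between intensity-based similarity and parameter-space distance can be illustrated with a toy one-parameter example (entirely hypothetical; the pattern and amplitudes are ours): a periodic image content makes the intensity metric multi-modal over the pose offset, while the underlying geometric distance remains convex with a single minimum.

```python
import numpy as np

x = np.linspace(0, 4 * np.pi, 200)

def image(pose):
    """Toy 1-DoF 'radiograph': a fixed periodic pattern shifted by the pose."""
    return 0.3 * np.sin(x + pose) + np.sin(5 * (x + pose))

target = image(0.0)                       # observation at the (unknown) true pose
offsets = np.linspace(-np.pi, np.pi, 201)

# Intensity-based (dis)similarity: sum of squared differences. Its profile
# over the pose offset exhibits several spurious local minima...
ssd = np.array([np.sum((image(d) - target) ** 2) for d in offsets])

# ...whereas the distance in geometric parameter space that actually causes
# the mismatch is convex in the offset, with a single minimum at zero.
param_dist = np.abs(offsets)
n_local_min = int(((ssd[1:-1] < ssd[:-2]) & (ssd[1:-1] < ssd[2:])).sum())
```

An optimizer started outside the central basin of `ssd` converges to a spurious minimum; this is precisely the narrow capture range that the methods in this theme try to widen.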
<p>Some studies propose novel image similarity functions <inline-formula id="inf17">
<mml:math id="m19">
<mml:msup>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mo>&#x22c5;</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> that, analogous to traditional similarity metrics, accept as input the source and target image and return a scalar or vector that is related to the mismatch in parameter space (<xref ref-type="bibr" rid="B24">Francois et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B36">Gu et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B88">Tang and Scalzo, 2016</xref>; <xref ref-type="bibr" rid="B71">Neumann et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B34">Grupp et&#x20;al., 2020c</xref>; <xref ref-type="bibr" rid="B27">Gao et&#x20;al., 2020b</xref>). Among those, two methods rely on regularization: <xref ref-type="bibr" rid="B34">Grupp et&#x20;al. (2020c)</xref> detect anatomical landmarks to expand an analytic similarity function with landmark-reprojection constraints to enhance the capture range of an intensity-based strategy, while <xref ref-type="bibr" rid="B24">Francois et&#x20;al. (2020)</xref> segment occluding contours to constrain similarity evaluation to salient regions. The other four methods use machine learning models to approximate a geometric parameter distance function based on the input images. To this end, <xref ref-type="bibr" rid="B36">Gu et&#x20;al. (2020)</xref> and <xref ref-type="bibr" rid="B27">Gao et&#x20;al. (2020b)</xref> estimate the geodesic in Riemannian tangent space between the source and target camera poses, which in an ideal case results in a convex similarity function. <xref ref-type="bibr" rid="B88">Tang and Scalzo (2016)</xref> learn a more expressive feature descriptor to better quantify the mismatch in vasculature registration, and for a similar application, <xref ref-type="bibr" rid="B71">Neumann et&#x20;al. (2020)</xref> regress the disparity between corresponding points in source and target images to quantify image dissimilarity. Both methods rely on concepts that are limited to sparse objects, such as vessels. 
In contrast to image-based similarity metrics, four studies describe methods that compute image similarity via keypoint matching (<xref ref-type="bibr" rid="B82">Schaffert et&#x20;al., 2019</xref>, <xref ref-type="bibr" rid="B81">2020a</xref>,<xref ref-type="bibr" rid="B83">b</xref>; <xref ref-type="bibr" rid="B56">Liao et&#x20;al., 2019</xref>). To this end, <xref ref-type="bibr" rid="B56">Liao et&#x20;al. (2019)</xref> train a network to establish keypoint correspondences between the source and target images. Because the geometric parameters of the source image are known, the unknown target parameters can be recovered relative to the source image using PnP-like methods. Finally, a series of three papers (<xref ref-type="bibr" rid="B82">Schaffert et&#x20;al., 2019</xref>, <xref ref-type="bibr" rid="B81">2020a</xref>,<xref ref-type="bibr" rid="B83">b</xref>) describes a learning-based method to adaptively weight the point correspondences that are established to quantify the degree of misalignment.</p>
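As a simplified illustration of the geodesic regression targets mentioned above, the distance between two rotations can be computed in closed form. Learned metrics of this kind regress an approximation of this quantity from the image pair; the exact value computed below would only serve as a training label. The helper names and angles are ours, not taken from the cited studies.

```python
import numpy as np

def rot_z(a):
    """Rotation by angle a about the z-axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotation_geodesic(R1, R2):
    """Geodesic distance on SO(3): the angle of the relative rotation R1^T R2."""
    cos_theta = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

# Two hypothetical C-arm orientations 30 degrees apart about the same axis:
theta = rotation_geodesic(rot_z(0.2), rot_z(0.2 + np.pi / 6))
# theta equals pi/6. Unlike an intensity metric, this target varies smoothly
# and monotonically with the pose offset, which is what makes regressing it
# attractive for enlarging the capture range.
```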
<p>As methods in this theme have primarily focused on expanding the capture range of contemporary similarity metrics, the potential shortcomings of other components of the registration pipeline, such as the optimizers, remain unaffected. While we introduced (<xref ref-type="bibr" rid="B27">Gao et&#x20;al., 2020b</xref>) in the context of similarity learning, the method also describes a fully differentiable 2D/3D registration pipeline that addresses optimization aspects. This enables both end-to-end learning of <inline-formula id="inf18">
<mml:math id="m20">
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> and/or <inline-formula id="inf19">
<mml:math id="m21">
<mml:msup>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mo>&#x22c5;</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> during training and analytic gradient-based optimization using backpropagation during application. Further, we found that most emphasis was placed on increasing the capture range of the registration pipeline, while little, if any, attention was paid to increasing the resolution and precision of these metrics. Especially in single-view registration scenarios, which we have identified to be the most prevalent, it is well known that certain DoFs cannot be resolved with high accuracy. Therefore, developing methods that increase not only the capture range but also the precision of 2D/3D registration pipelines should be a high priority.</p>
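The gradient-based pose optimization underlying such differentiable pipelines can be sketched on a toy problem. Everything below is illustrative: a point-cloud projection stands in for DRR rendering, and central finite differences stand in for backpropagation through a differentiable renderer, solely to keep the sketch dependency-free.

```python
import numpy as np

rng = np.random.default_rng(1)
pts3d = rng.standard_normal((40, 3))       # toy "anatomy": a 3D point cloud

def project(points, angle):
    """Rotate about the z-axis and drop depth: a stand-in for DRR rendering."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return (points @ R.T)[:, :2]

target_angle = 0.4
target = project(pts3d, target_angle)      # the fixed 2D observation

def similarity(angle):
    """Simple L2 mismatch between the moving and fixed projections."""
    return np.sum((project(pts3d, angle) - target) ** 2)

# Gradient descent on the single pose parameter. A differentiable pipeline
# obtains dS/d(angle) by backpropagation through the renderer; here we
# approximate it with central finite differences.
angle, lr, eps = 0.0, 1e-3, 1e-6
for _ in range(200):
    grad = (similarity(angle + eps) - similarity(angle - eps)) / (2 * eps)
    angle -= lr * grad
```

In a real pipeline the same gradient flow also reaches the network parameters of the learned similarity and image representations, which is what enables end-to-end training.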
</sec>
<sec id="s2-7">
<title>2.7 Direct Parameter Regression</title>
<p>So far and especially in the context of <xref ref-type="sec" rid="s2-6">Section 2.6</xref>, 2D/3D registration was motivated as an optimization-based process that compares the source image, generated using the current geometric parameter estimate, with the desired target <italic>y</italic>
<sub>
<italic>v</italic>
</sub> using some cost function. However, this problem can also be formulated in the context of regression learning, where a machine learning algorithm directly predicts the desired geometric parameters <inline-formula id="inf20">
<mml:math id="m22">
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
</mml:math>
</inline-formula>, and/or <italic>&#x3c9;</italic>
<sub>
<italic>D</italic>
</sub> from <italic>y</italic>
<sub>
<italic>v</italic>
</sub>, or from both <italic>y</italic>
<sub>
<italic>v</italic>
</sub> and <inline-formula id="inf21">
<mml:math id="m23">
<mml:msub>
<mml:mrow>
<mml:mover>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>. Such methods partially or completely skip the step of precise modeling of image formation or similarity, and instead build up this knowledge in a data-driven manner. We have identified 22 studies that describe methods for direct parameter regression (<xref ref-type="bibr" rid="B15">Chou and Pizer, 2013</xref>, <xref ref-type="bibr" rid="B14">2014</xref>; <xref ref-type="bibr" rid="B13">Chou et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B116">Zhao et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B68">Mitrovi&#x107; et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B103">Wu et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B65">Miao et&#x20;al., 2016a</xref>; <xref ref-type="bibr" rid="B66">Miao et&#x20;al., 2016b</xref>; <xref ref-type="bibr" rid="B44">Hou et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B75">Pei et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B107">Xie et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B45">Hou et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B64">Miao et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B90">Toth et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B115">Zhang et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B117">Zheng et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B37">Guan et&#x20;al., 2019</xref>, <xref ref-type="bibr" rid="B38">2020</xref>; <xref ref-type="bibr" rid="B23">Foote et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B28">Gao et&#x20;al., 2020c</xref>; <xref ref-type="bibr" rid="B55">Li et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B106">Xiangqian et&#x20;al., 2020</xref>).</p>
<p>Relying on parameter regression solely based on the target image <italic>y</italic>
<sub>
<italic>v</italic>
</sub> is particularly prevalent for radiation therapy, where the main application is the regression of the principal components of a patient-specific PCA motion model (<xref ref-type="bibr" rid="B116">Zhao et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B13">Chou et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B14">Chou and Pizer, 2014</xref>, <xref ref-type="bibr" rid="B15">2013</xref>; <xref ref-type="bibr" rid="B23">Foote et&#x20;al., 2019</xref>). The importance of regression learning is primarily attributed to its substantially decreased run-time, which enables close to real-time tumor tracking in 3D. Methods directed at cephalometry (<xref ref-type="bibr" rid="B55">Li et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B115">Zhang et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B75">Pei et&#x20;al., 2017</xref>) are identical in methodology to the radiation therapy methods. As noted in <xref ref-type="sec" rid="s2-5">Section 2.5</xref>, most methods here limit themselves to shape estimation and assume that a global rigid alignment has either been performed beforehand or is unnecessary. The remaining 14 methods consider rigid parameter regression, and we differentiate methods that infer pose directly from the target <italic>y</italic>
<sub>
<italic>v</italic>
</sub> (<xref ref-type="bibr" rid="B106">Xiangqian et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B103">Wu et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B44">Hou et&#x20;al., 2017</xref>, <xref ref-type="bibr" rid="B45">2018</xref>; <xref ref-type="bibr" rid="B107">Xie et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B37">Guan et&#x20;al., 2019</xref>, <xref ref-type="bibr" rid="B38">2020</xref>), and methods that process both <italic>y</italic>
<sub>
<italic>v</italic>
</sub> and <inline-formula id="inf22">
<mml:math id="m24">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> (<xref ref-type="bibr" rid="B90">Toth et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B65">Miao et&#x20;al., 2016a</xref>,<xref ref-type="bibr" rid="B66">b</xref>, <xref ref-type="bibr" rid="B64">2018</xref>; <xref ref-type="bibr" rid="B117">Zheng et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B28">Gao et&#x20;al., 2020c</xref>; <xref ref-type="bibr" rid="B68">Mitrovi&#x107; et&#x20;al., 2015</xref>), and therefore, can run iteratively. Methods that rely on the target image only are relatively straightforward and generally train a standard feed-forward CNN architecture to regress pose on large datasets comprising multiple independent objects or anatomies. While the simplicity of these approaches is appealing, a general concern is that poses are absolute, which requires the definition of a canonical 3D coordinate system. This challenge is mitigated for applications that consider an instrument or tool, because such a canonical system can be readily defined; however, establishing this reference frame is considerably more laborious for patient anatomy and may require (group-wise) 3D/3D registrations. Unfortunately, none of the studies reviewed here describes a dedicated effort to establish such a canonical reference frame, suggesting that those methods will eventually hit a performance ceiling that is determined by the misalignment within the reference coordinate systems in the training data. Methods that regress pose between <italic>y</italic>
<sub>
<italic>v</italic>
</sub> and <inline-formula id="inf23">
<mml:math id="m25">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> avoid the aforementioned concern because poses are relative rather than absolute; however, similarly to conventional techniques, they require an initialization. There is some flexibility in how the information from the source and target images is combined. <xref ref-type="bibr" rid="B65">Miao et&#x20;al. (2016a</xref>,<xref ref-type="bibr" rid="B66">b</xref>, <xref ref-type="bibr" rid="B64">2018)</xref> and <xref ref-type="bibr" rid="B117">Zheng et&#x20;al. (2018)</xref> use the image residual between <italic>y</italic>
<sub>
<italic>v</italic>
</sub> and <inline-formula id="inf24">
<mml:math id="m26">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula> in a region of interest around the projected tool model. Because only a small part of the target image is used, the initialization must be sufficiently close. <xref ref-type="bibr" rid="B90">Toth et&#x20;al. (2018)</xref> extract features using a CNN from both source and target image independently and then concatenate them for pose regression using fully connected layers. Rather than regressing pose directly, <xref ref-type="bibr" rid="B28">Gao et&#x20;al. (2020c)</xref> introduce a fully differentiable 2D/3D pipeline and compare <inline-formula id="inf25">
<mml:math id="m27">
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf26">
<mml:math id="m28">
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> using a simple <italic>L</italic>
<sub>2</sub> distance as the similarity function <italic>S</italic>(&#x22c5;, &#x22c5;). The parameters <italic>&#x3b8;</italic>
<sub>
<italic>y</italic>
</sub> and <italic>&#x3b8;</italic>
<sub>
<italic>x</italic>
</sub> are then optimized using a double backward pass on the computational graph, such that the gradient <inline-formula id="inf27">
<mml:math id="m29">
<mml:mi>&#x2202;</mml:mi>
<mml:mi>S</mml:mi>
<mml:mo>/</mml:mo>
<mml:mi>&#x2202;</mml:mi>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
</mml:math>
</inline-formula> aligns with the geodesic between the current pose estimate and the desired target&#x20;pose.</p>
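The iterative, residual-driven pose regression described above can be sketched on a one-parameter toy problem. The reviewed methods use CNN regressors on real image residuals; here, a linear least-squares regressor and a synthetic 1D "image" stand in for both, purely to make the control flow concrete, and none of the names or values below come from the cited work.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 4 * np.pi, 64)

def render(pose):
    """Toy 1-DoF image formation: a fixed pattern shifted by the pose value."""
    return np.sin(x + pose) + 0.5 * np.sin(2 * x + 2 * pose)

# Training: sample pose offsets around a working point and fit a linear map
# from the image residual to the offset (a stand-in for a CNN regressor).
offsets = rng.uniform(-0.3, 0.3, size=500)
base = render(0.0)
residuals = np.stack([render(d) - base for d in offsets])
w, *_ = np.linalg.lstsq(residuals, offsets, rcond=None)

# Application: iteratively refine an initial estimate toward the target pose.
target_pose = 0.25
target = render(target_pose)
pose = 0.0                                  # initialization must be close enough
for _ in range(10):
    residual = target - render(pose)        # compare observation with rendering
    pose += float(residual @ w)             # regressed relative pose update
```

The loop makes explicit why such methods require an initialization: the regressor is only valid for residuals within the offset range it was trained on.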
<p>Pose regression directly from images is appealing because it may result in substantially faster convergence, potentially with a single forward pass of a CNN. We found that all methods included in this review are limited to the registration of a single object, and it remains unclear how these methods would apply to multiple objects, in part because of the combinatorial explosion of relative poses. Further, methods that rely solely on the target image involve neither the 3D data nor the source images created from it. This may be problematic, because it is unclear how these methods would verify that a specific 2D/3D registration dataset satisfies, among other things, the canonical coordinate frame assumption.</p>
</sec>
<sec id="s2-8">
<title>2.8 Verification</title>
<p>We identified four studies that leverage machine learning techniques for verifying whether a registration process produced satisfactory geometric parameter estimates (<xref ref-type="bibr" rid="B104">Wu et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B69">Mitrovi&#x107; et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B100">Varnavas et&#x20;al., 2013</xref>, <xref ref-type="bibr" rid="B99">2015b</xref>).</p>
<p>An interesting observation is that none of the included methods makes use of the resulting images or overlays; rather, they rely on low-dimensional data. <xref ref-type="bibr" rid="B100">Varnavas et&#x20;al. (2013</xref>, <xref ref-type="bibr" rid="B99">2015b)</xref>, for example, rely on the cost function value and the relative poses of multiple independently registered objects as input to a support vector machine classifier. Similarly, <xref ref-type="bibr" rid="B104">Wu et&#x20;al. (2016)</xref> train a shallow NN to classify registration success based on hand-crafted features of the objective function surface around the registration estimate. Finally, <xref ref-type="bibr" rid="B69">Mitrovi&#x107; et&#x20;al. (2014)</xref> compare a registration estimate to known local minima and thresholds to determine success or failure, which worked well but may be limited in practice, as the approach seems to assume knowledge of the correct solution.</p>
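A sketch loosely in the spirit of such cost-surface verification may help fix ideas. The toy cost function, the two features, and the decision threshold below are all ours (illustrative, not reproduced from the cited studies); in practice, a classifier such as an SVM or shallow NN would be trained on such feature vectors from labeled successful and failed registrations.

```python
import numpy as np

def cost(t):
    """Toy 1-DoF registration objective: a deep global basin near t = 0.3
    plus periodic side minima that mimic capture-range failures."""
    return (t - 0.3) ** 2 - 0.4 * np.cos(8.0 * (t - 0.3))

def surface_features(t_hat, radius=0.05, n=21):
    """Hand-crafted features of the cost surface around an estimate: the
    cost value itself and the local curvature (quadratic fit coefficient)."""
    ts = np.linspace(t_hat - radius, t_hat + radius, n)
    curvature = np.polyfit(ts - t_hat, cost(ts), 2)[0]
    return np.array([cost(t_hat), curvature])

def verify(features, max_cost=0.0):
    """Toy decision rule on the feature vector; the threshold is chosen for
    this particular surface and would normally be learned from labeled data."""
    return bool(features[0] < max_cost and features[1] > 0.0)

good = surface_features(0.3)               # estimate in the global basin
bad = surface_features(0.3 + np.pi / 4)    # estimate trapped in a side minimum
```

Note that, as in the reviewed studies, only low-dimensional summaries of the objective surface enter the classifier, not the images or overlays themselves.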
<p>Some studies included in this theme stand out in that they are notably mature and were demonstrated to work well on comparatively large amounts of real clinical data, such as (<xref ref-type="bibr" rid="B99">Varnavas et&#x20;al., 2015b</xref>) and (<xref ref-type="bibr" rid="B104">Wu et&#x20;al., 2016</xref>); unfortunately, however, these methods are not general purpose, as they rely on the registration of multiple objects and on the availability of two orthogonal views, respectively. Compared to the other four themes, and perhaps in general, there has been very little emphasis on and innovation in the development of more robust and general-purpose methods for the verification of 2D/3D registration results, which we perceive to be a regrettable omission.</p>
</sec>
</sec>
<sec id="s3">
<title>3 Perspective</title>
<p>The introduction of machine learning methodology to the 2D/3D registration workflow was partly motivated by persistent challenges, which were not yet satisfactorily addressed by heuristic and purely algorithmic approaches. Upon review of the recent literature in <xref ref-type="sec" rid="s2">Section 2</xref>, perhaps the most pressing question is: Has machine learning resolved any of those open problems? We begin our discussion using the categorization of <xref ref-type="sec" rid="s1-2">Section 1.2</xref>:<list list-type="simple">
<list-item>
<p>&#x2022; <bold>Narrow capture range of similarity metrics</bold>: We identified many methods that quantitatively demonstrate an increased capture range of 2D/3D registration pipelines. The means of accomplishing this, however, are diverse. Several methods describe innovative learning-based similarity metrics that better reflect distances in geometric parameter space, while other studies present (semi-)global initialization or regularization techniques that may rely on contextual data. The demonstrated improvements are generally substantial, suggesting great potential for machine learning with regard to this particular challenge. Contextualization, such as landmark detection and segmentation, can mimic user input to enable novel paradigms for initialization while also providing a clear interface for human-computer interaction (<xref ref-type="bibr" rid="B3">Amrehn et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B18">Du et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B113">Zapaishchykova et&#x20;al., 2021</xref>). Similarity modeling using learning-based techniques&#x2014;potentially combined with contextual information&#x2014;is a similarly powerful concept that also finds broad application in slice-to-volume registration, e.g., for ultrasound to MRI (<xref ref-type="bibr" rid="B47">Hu et&#x20;al., 2018</xref>), which was beyond the scope of this review.</p>
</list-item>
<list-item>
<p>&#x2022; <bold>Ambiguity</bold>: While several methods report overall improved registration performance when using their novel, machine learning-enhanced algorithms, we did not identify any method with a particular focus on reducing ambiguity. Because performance is usually reported as a summary statistic over multiple DoFs and instances, it is unclear to what extent performance increases should be attributed to 1) registering individual instances more precisely (which would suggest reduced ambiguity) or 2) succeeding more often (which would rather emphasize the importance of the capture range).</p>
</list-item>
<list-item>
<p>&#x2022; <bold>High dimensional optimization problems</bold>: We found that representation learning techniques, currently dominated by PCA, are a clearly established tool to reduce the dimensionality of deformable 2D/3D registration problems, and may even be useful for rigid alignment. In those lower dimensional spaces, e.g., the six parameters of a rigid transformation or the principal components of a PCA model, direct pose regression from the target and/or source images is a clearly established line of research. These approaches replace optimization with a few forward passes of a machine learning model, which makes them comparatively fast. This is particularly appealing for traditionally time-critical applications, such as tumor tracking in radiation therapy. Complementary approaches that seek to enable fully differentiable 2D/3D registration pipelines for end-to-end training and analytical optimization of geometric parameters, such as (<xref ref-type="bibr" rid="B28">Gao et&#x20;al., 2020c</xref>; <xref ref-type="bibr" rid="B85">Shetty et&#x20;al., 2021</xref>), combine elements of optimization and inference. While these methods are still at an early stage of development, their flexibility may prove a great strength in developing solutions that meet clinical&#x20;needs.</p>
</list-item>
</list>
</p>
<p>Certainly, some of the above ideas will become increasingly important in the quest to accelerate and improve 2D/3D registration pipelines. This is because image-based navigation techniques (<xref ref-type="bibr" rid="B87">Sugano, 2003</xref>; <xref ref-type="bibr" rid="B48">Hummel et&#x20;al., 2008</xref>; <xref ref-type="bibr" rid="B91">Tucker et&#x20;al., 2018</xref>) as well as visual servoing of surgical robots (<xref ref-type="bibr" rid="B110">Yi et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B29">Gao et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B92">Unberath et&#x20;al., 2019</xref>) will also require high-precision 2D/3D registration at video frame-rates.<list list-type="simple">
<list-item>
<p>&#x2022; <bold>Verification and uncertainty</bold>: Compared to the other challenges and themes, very few studies used machine learning to benefit verification of registration results. The studies that did, however, reported promising performance even with rather simple machine learning techniques on low dimensional data, i.e.,&#x20;cost function properties rather than images themselves. Quantifying uncertainty in registration slowly emerges as a research thrust in 2D/2D and 3D/3D registration (<xref ref-type="bibr" rid="B78">Pluim et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B86">Sinha et&#x20;al., 2019</xref>). It is our firm belief that the first generally applicable methods for confidence assignment and uncertainty estimation in 2D/3D registration will become trend-setting due to the nature of the clinical applications that 2D/3D registration enables.</p>
</list-item>
</list>
</p>
<p>Despite this positive prospect on the utility of machine learning for 2D/3D registration, we noted certain trends and recurring shortcomings in our review that we will discuss next. As was done before, we either specify the number of studies satisfying a specific condition in the text or state it in parentheses.</p>
<sec id="s3-1">
<title>3.1 Preserving Improvements Under Domain Shift From Training to Deployment</title>
<p>An omnipresent concern in the development of machine learning-based components for 2D/3D registration (as in machine learning generally) is the availability of, or access to, large amounts of relevant data. In some cases, the data problem amounts to a simple opportunity cost, e.g., for the automation of manually performed tasks such as landmark detection. It should be noted, however, that even for this &#x201c;simple&#x201d; case to succeed, many conditions must be met, including ethical review board approval, digital medicine infrastructure, and methods to reliably annotate the data. In many other&#x2014;from a research perspective perhaps more exciting&#x2014;cases, this retrospective data collection paradigm is infeasible because the task to be performed with a machine learning algorithm is not currently performed in clinical practice. The more obvious examples are visual servoing of novel robotic surgery platforms (<xref ref-type="bibr" rid="B29">Gao et&#x20;al., 2019</xref>) or robotic imaging paradigms that alter how data is acquired (<xref ref-type="bibr" rid="B112">Zaech et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B89">Thies et&#x20;al., 2020</xref>). Although most studies included in this review address use-cases that fall under the &#x201c;opportunity cost&#x201d; category, we found that only 16 out of the 48 studies used real clinical or cadaveric data to train the machine learning algorithms. All remaining papers relied on synthetic data, namely digitally reconstructed radiographs (DRRs), that were simulated from 3D CT scans to either replace or supplement (small) real datasets.</p>
<p>Training on synthetic data has clear advantages because large datasets and corresponding annotations can be generated with relatively little effort. In addition, rigid pose and deformation parameters are perfectly known by design thus creating an unbiased learning target. Contemporary deep learning-based techniques enable the mapping of very complex functions directly from high-dimensional input data at the cost of heavily over-parameterized models that require as much data as possible to learn sensible associations. These unrelenting requirements, especially with respect to annotation, are not easily met with clinical data collection. Indeed, of the 32 studies that describe deep learning-based methods, 24 trained on synthetic data (seven trained on real data, and one did not train at all but used a pre-trained network). It is evident that data synthesis is an important idea that enables research on creative approaches that contribute to the advancement of 2D/3D registration.</p>
<p>Unfortunately, there are also substantial drawbacks of synthetic data training. Trained machine learning algorithms approximate the target function only on a compact domain (<xref ref-type="bibr" rid="B118">Zhou, 2020</xref>), and their behaviour outside this domain is unspecified. Because synthesized data is unlikely to capture all characteristics of data acquired using real systems and from real patients, the domains defined by the synthetic data used for training and the real data used during application will not, or only partially, overlap. This phenomenon is known as domain shift. Therefore, applying a synthetic data-trained machine learning model to real data is likely to result in substantially deteriorated performance (<xref ref-type="bibr" rid="B93">Unberath et&#x20;al., 2018</xref>, <xref ref-type="bibr" rid="B92">2019</xref>). While this problem exists for all machine learning algorithms, it is particularly prevalent for modern deep learning algorithms as they operate on high-resolution images directly, where mismatch in characteristics (such as noise, contrast, &#x2026; ) is most pronounced.<xref ref-type="fn" rid="FN2">
<sup>2</sup>
</xref> Indeed, among the seven studies that trained deep CNNs on synthetic data and evaluated in a way that allowed for comparisons between synthetic and real data performance, we found quite substantial performance drops (<xref ref-type="bibr" rid="B64">Miao et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B6">Bier et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B28">Gao et&#x20;al., 2020c</xref>; <xref ref-type="bibr" rid="B17">Doerr et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B36">Gu et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B38">Guan et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B55">Li et&#x20;al., 2020</xref>). Worse, three studies used different evaluation metrics in synthetic and real experiments so that comparison was not possible (<xref ref-type="bibr" rid="B65">Miao et&#x20;al., 2016a</xref>; <xref ref-type="bibr" rid="B90">Toth et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B19">Esfandiari et&#x20;al., 2021</xref>), and perhaps worst, ten studies that trained on synthetic data never even tested (meaningfully) on real data (<xref ref-type="bibr" rid="B44">Hou et&#x20;al., 2017</xref>, <xref ref-type="bibr" rid="B45">2018</xref>; <xref ref-type="bibr" rid="B75">Pei et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B107">Xie et&#x20;al., 2017</xref>; <xref ref-type="bibr" rid="B7">Bier et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B23">Foote et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B37">Guan et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B109">Yang and Chen, 2019</xref>; <xref ref-type="bibr" rid="B71">Neumann et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B114">Zhang et&#x20;al., 2020</xref>). To mitigate the negative impact of domain shift, there is growing interest in domain adaptation and generalization techniques. Several methods do, in fact, already incorporate some of those techniques (<xref ref-type="bibr" rid="B117">Zheng et&#x20;al. 
(2018)</xref>; <xref ref-type="bibr" rid="B102">Wang et&#x20;al. (2020)</xref> use domain adaptation to align feature representations of real and synthetic data; <xref ref-type="bibr" rid="B36">Gu et&#x20;al. (2020)</xref>; <xref ref-type="bibr" rid="B19">Esfandiari et&#x20;al. (2021)</xref>; <xref ref-type="bibr" rid="B23">Foote et&#x20;al. (2019)</xref> use heavy pixel-level transformations that approximate domain randomization; and <xref ref-type="bibr" rid="B106">Xiangqian et&#x20;al. (2020)</xref>; <xref ref-type="bibr" rid="B7">Bier et&#x20;al. (2018</xref>, <xref ref-type="bibr" rid="B6">2019)</xref>; <xref ref-type="bibr" rid="B36">Gu et&#x20;al. (2020)</xref>; <xref ref-type="bibr" rid="B28">Gao et&#x20;al. (2020c)</xref>; <xref ref-type="bibr" rid="B65">Miao et&#x20;al. (2016a)</xref> rely on realistic synthesis to reduce domain shift, e.g., using open-source physics-based DRR engines (<xref ref-type="bibr" rid="B93">Unberath et&#x20;al., 2018</xref>, <xref ref-type="bibr" rid="B92">2019</xref>)). However, as we have outlined above, their impact is not yet strongly felt. For the new and exciting 2D/3D registration pipelines reviewed here to impact image-based navigation, we must develop novel techniques that increase robustness under domain shift, preserving a method&#x2019;s performance when transferring from the training to the deployment domain.</p>
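The pixel-level transformations used to approximate domain randomization can be illustrated with a minimal sketch. The following is not the augmentation pipeline of any of the cited studies; the perturbation types and parameter ranges are illustrative assumptions:

```python
import numpy as np

def randomize_domain(drr, rng=None):
    """Apply random pixel-level perturbations to a synthetic DRR so that a
    network trained on the perturbed images becomes less sensitive to the
    synthetic-to-real appearance gap (noise, contrast, exposure, ...)."""
    rng = rng or np.random.default_rng()
    img = np.clip(drr.astype(np.float64), 0.0, 1.0)
    # Random gamma: perturbs the contrast characteristics.
    img = img ** rng.uniform(0.5, 2.0)
    # Random linear intensity scaling and offset (exposure variation).
    img = rng.uniform(0.8, 1.2) * img + rng.uniform(-0.05, 0.05)
    # Additive Gaussian noise approximating detector noise.
    img = img + rng.normal(0.0, rng.uniform(0.01, 0.05), size=img.shape)
    return np.clip(img, 0.0, 1.0)
```

At training time, each synthetic image would be passed through such a function with freshly drawn parameters, so the network never sees the same rendering characteristics twice.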
</sec>
<sec id="s3-2">
<title>3.2 Experimental Design, Reporting, and Reproducibility</title>
<p>Quantitatively evaluating registration performance is clearly important. The error metrics used include, in 3D, the standard registration pose error (commonly separated into translational and rotational DoFs), keypoint distances (<xref ref-type="bibr" rid="B14">Chou and Pizer, 2014</xref>; <xref ref-type="bibr" rid="B114">Zhang et&#x20;al., 2020</xref>), and the mean target registration error (mTRE), and, in 2D, various other metrics such as reprojection distances (<xref ref-type="bibr" rid="B6">Bier et&#x20;al., 2019</xref>), segmentation or overlap DICE scores (<xref ref-type="bibr" rid="B114">Zhang et&#x20;al., 2020</xref>), or contour differences (<xref ref-type="bibr" rid="B12">Chen et&#x20;al., 2018</xref>). Other metrics that are not uniquely attributable to a domain include the registration capture range (<xref ref-type="bibr" rid="B82">Schaffert et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B19">Esfandiari et&#x20;al., 2021</xref>) and the registration success rate (<xref ref-type="bibr" rid="B100">Varnavas et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B69">Mitrovi&#x107; et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B66">Miao et&#x20;al., 2016b</xref>; <xref ref-type="bibr" rid="B109">Yang and Chen, 2019</xref>; <xref ref-type="bibr" rid="B81">Schaffert et&#x20;al., 2020a</xref>). Notably, there is no standard definition of what constitutes a successful registration.</p>
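For concreteness, a minimal sketch of two of these metrics, mTRE and success rate, for the rigid case follows; the 10&#x2009;mm success threshold is an illustrative assumption, precisely because no standard definition exists:

```python
import numpy as np

def mtre(points, T_est, T_gt):
    """Mean target registration error: average Euclidean distance between
    target points mapped by the estimated and ground-truth rigid transforms.
    `points` is (N, 3); `T_est` and `T_gt` are 4x4 homogeneous matrices."""
    pts_h = np.c_[points, np.ones(len(points))]          # homogeneous coords
    diff = (pts_h @ T_est.T)[:, :3] - (pts_h @ T_gt.T)[:, :3]
    return float(np.linalg.norm(diff, axis=1).mean())

def success_rate(mtres, threshold=10.0):
    """Fraction of registrations whose mTRE falls below a chosen threshold.
    The 10 mm default is illustrative, not a community standard."""
    return float((np.asarray(mtres) < threshold).mean())
```

A pure translation of 1&#x2009;mm in the estimate, for instance, yields an mTRE of exactly 1&#x2009;mm regardless of the target points chosen.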
<p>The wealth of evaluation strategies and metrics can, in part, be attributed to the fact that different clinical applications necessitate different conditions to be met. For example, in 2D/3D deformable registration for tumor tracking during radiation therapy, accurately recovering the 3D tumor shape and position (quantified well using, e.g., the DICE score of true and estimated 3D position over time) is much more relevant than a Euclidean distance between the deformation field parameters, which would describe irrelevant errors far from the region of interest. We very clearly advocate <italic>for</italic> the use of task-specific evaluation metrics, since ultimately those metrics are the ones that will distinguish success from failure in the specific clinical application. However, we also believe that the lack of universally accepted reporting guidelines, error metrics, and datasets is a severe shortcoming that has unfortunate consequences, such as a high risk of duplicated efforts and non-interpretable performance reporting. We understand the most pressing needs to be:<list list-type="simple">
<list-item>
<p>&#x2022; <bold>Standardizing evaluation metrics</bold>: An issue that appears to have become more prevalent with the introduction of machine learning methods that are developed and trained on synthetic data is the lack of substantial results on clinically relevant, real data. While experimental conditions, including ground truth targets, are perfectly known for simulation, they are much harder to obtain for clinical or cadaveric data. A common approach to dealing with this situation is to provide detailed quantification of mTRE, registration accuracy, etc. on synthesized data where the algorithm will perform well (cf. <xref ref-type="sec" rid="s3-1">Section 3.1</xref>) while only providing much simpler, less informative, and sometimes purely qualitative metrics for real data experiments. Clearly, this practice is undesirable because 1) synthetic data experiments now cannot serve as a baseline (since they use different metrics and are thus incomparable), and 2) the true quantities of interest remain unknown (for example, a 3D mTRE is more informative than a 2D reprojection TRE since it can adequately resolve depth).</p>
</list-item>
</list>
</p>
<p>While it is evident that not all evaluation paradigms that are easily available on synthetic data can be readily transferred to clinical data, the reverse is not true. If real data experiments require simplified evaluation protocols because some gold standard quantities cannot be assessed, then these simplified approaches should at a minimum also be implemented on synthetic data to further complement the evaluation. While this approach may still leave some questions regarding real data performance unanswered, it will at least provide reliable information to assess the deterioration from sandbox to real life.<list list-type="simple">
<list-item>
<p>&#x2022; <bold>Reporting problem difficulty</bold>: A confounding factor that needs to be considered even when consistent metrics are used is that different datasets are likely to pose 2D/3D registration problems of varied difficulty. For example, evaluation on synthetic data may include data sampled from a broad range of viewpoints or deformations that are approximately uniformly distributed. Real data, on the other hand, is not uniformly sampled from such a distribution, but rather clustered around certain viewpoints (see (<xref ref-type="bibr" rid="B34">Grupp et&#x20;al., 2020c</xref>) for a visualization of the viewpoints used during a cadaveric surgery vs. the synthetic data for the same machine learning task in (<xref ref-type="bibr" rid="B6">Bier et&#x20;al., 2019</xref>)). Because those viewpoints are optimized for human interpretation, they are likely to contain more easily interpretable information, which suggests that algorithmic assessment is also more likely to succeed. In such cases, even if the quantitative metrics reported across the two datasets suggest similar performance, performance has in fact degraded because the evaluation was conducted on a simpler problem.</p>
</list-item>
</list>
</p>
<p>One way to address this challenge would be to attempt the harmonization of problem complexity by recreating the real dataset synthetically. Another, perhaps more feasible approach would be to develop reporting guidelines that allow for a more precise quantification of problem complexity, e.g., by more carefully describing the variation in the respective datasets.<list list-type="simple">
<list-item>
<p>&#x2022; <bold>Enabling reproduction</bold>: The recent interest in deep learning has brought about a transformational move towards open science, where large parts of the community share their source code publicly and demonstrate the performance of their solutions on public benchmarks, which allows for fair comparisons. While too strong a focus on &#x201c;winning benchmarks&#x201d; is certainly detrimental to the creativity of novel approaches, we feel that the lack of public benchmarks for 2D/3D registration and related problems is perhaps an even greater worry. In addition to the use of different metrics for validation purposes discussed above, studies may even use different definitions of the same quantity (such as the capture range, which is defined, e.g., using the decision boundary of an SVM classifier in (<xref ref-type="bibr" rid="B19">Esfandiari et&#x20;al., 2021</xref>) and using mTRE in (<xref ref-type="bibr" rid="B81">Schaffert et&#x20;al., 2020a</xref>)). Further, most code bases and, more importantly, datasets are kept private, which inhibits reproduction, since re-implementation is particularly prone to biased conclusions. Consequently, creating public datasets with well-defined and standardized testing conditions should be a continued and reinforced effort, and wherever possible, the release of source code should be considered.</p>
</list-item>
</list>
</p>
<p>To this end, our group has previously released a relatively large dataset of CTs and &#x3e;350&#x20;X-rays across multiple viewpoints of six cadaveric specimens prior to undergoing periacetabular osteotomy (cf. <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>) (<xref ref-type="bibr" rid="B35">Grupp et&#x20;al., 2020</xref>). Further, we have made available the core registration components of our intensity-based framework, xReg, as well as our open-source framework for fast and physics-based synthesis of DRRs from CT, DeepDRR, which allows for the creation of semi-realistic but very large and precisely controlled datasets. Increasing the rate at which we share code and data will likely result in research that truly advances contemporary capabilities, since baselines become transparent and much more clearly defined.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Fluoroscopic images of chiseling performed during a cadaveric periacetabular osteotomy procedure. A cut along the ischium of the pelvis is shown by the oblique view in <bold>(A)</bold>. Due to the difficulty of manually interpreting the lateral orientation of the tool in an oblique view, the very next fluoroscopy frame was collected at an approximate anterior-posterior (AP) view, shown in <bold>(B)</bold>. Although the osteotome location was confirmed by changing viewpoints, the process of adjusting the C-arm by such a large offset can potentially increase operative time or cause the clinician to lose some context in the previous oblique view. Another oblique view is shown in <bold>(C)</bold>, where the flat osteotome is used to complete the posterior cut, resulting in the creation of two bone fragments from the pelvis. In <bold>(C)</bold>, the angled osteotome was left in the field of view and used as a visual aid for navigating the flat chisel and completing the osteotomy.</p>
</caption>
<graphic xlink:href="frobt-08-716007-g004.tif"/>
</fig>
<p>These challenges clearly restrict research, but even more dramatically inhibit translational efforts, simply because we cannot reliably understand whether a specific 2D/3D registration problem should be considered &#x201c;solved&#x201d; or what the open problems are. Finding answers to the posited questions will become especially important as 2D/3D registration technology matures and is integrated in image-based navigation solutions that are subject to regulatory approval. Then, compelling evidence will need to be provided on potential sources and extent of algorithmic bias as well as reliable estimates of real world performance (<xref ref-type="bibr" rid="B96">US Food and Drug Administration, 2021</xref>). Adopting good habits around standardized reporting will certainly be a good first step to translate research successes into patient outcomes through productization.</p>
</sec>
<sec id="s3-3">
<title>3.3 Registration of Multiple Objects, Compound or Non-rigid Motion, and Presence of Foreign Objects</title>
<p>Image-guidance systems often need to process and report information regarding the relative poses between several objects of interest, such as bones, organs, and surgical instruments. We found that only three studies in our review register multiple objects through learning-based approaches (<xref ref-type="bibr" rid="B100">Varnavas et&#x20;al., 2013</xref>, <xref ref-type="bibr" rid="B99">2015b</xref>; <xref ref-type="bibr" rid="B102">Wang et&#x20;al., 2020</xref>), and moreover, these studies independently registered single objects in order to obtain relative poses. Although combining the results of two distinct single object registrations is perhaps the most straightforward approach for obtaining the relative pose between two objects, it fails to leverage information about their relative pose during optimization, which could potentially yield a less challenging search landscape. As an example, consider the case of two adjacent objects whose independent single-view depth estimates are erroneous in opposite directions. The relative pose computed from these two independent poses will have an exacerbated translation error resulting from the compounding effect of the independent depth errors. A multiple object registration strategy could alternatively parameterize the problem to <italic>simultaneously</italic> solve for the pose of one object with respect to the imaging device and also for the relative pose between the objects. This approach partially couples the motion of the objects during registration and completely eliminates the possibility of conflicting depth estimates.</p>
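The compounding effect described above can be made concrete with a toy numerical example; the poses and error magnitudes below are hypothetical and serve only to show how opposite-signed single-view depth errors add up in the relative pose:

```python
import numpy as np

def translation(t):
    """Build a 4x4 homogeneous transform for a pure translation."""
    T = np.eye(4)
    T[:3, 3] = t
    return T

# Hypothetical ground-truth poses of two adjacent objects in the camera
# frame (z is the depth axis); the objects are actually 5 mm apart in depth.
T_a = translation([0.0, 0.0, 500.0])
T_b = translation([0.0, 0.0, 505.0])

# Independent single-view registrations with depth errors in opposite
# directions: +8 mm for object A, -8 mm for object B.
T_a_est = translation([0.0, 0.0, 508.0])
T_b_est = translation([0.0, 0.0, 497.0])

# Relative pose of B in the frame of A, from ground truth and from the
# two independent estimates.
rel_gt = np.linalg.inv(T_a) @ T_b
rel_est = np.linalg.inv(T_a_est) @ T_b_est

# The two 8 mm errors compound into a 16 mm relative depth error.
depth_err = abs(rel_est[2, 3] - rel_gt[2, 3])
```

Even though each single-object registration is only 8&#x2009;mm off in depth, the relative pose is off by 16&#x2009;mm, which is exactly the failure mode a simultaneous parameterization avoids.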
<p>Despite the small number of learning-based multiple object registration strategies, traditional intensity-based registration methods are now routinely employed to solve compound, multiple object, registration problems across broad applications, such as kinematic measurements of bone (<xref ref-type="bibr" rid="B10">Chen et&#x20;al., 2012</xref>; <xref ref-type="bibr" rid="B73">Otake et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B1">Abe et&#x20;al., 2019</xref>), rib motion and respiratory analysis (<xref ref-type="bibr" rid="B43">Hiasa et&#x20;al., 2019</xref>), intra-operative assessment of adjusted bone fragments (<xref ref-type="bibr" rid="B32">Grupp et&#x20;al., 2019</xref>, <xref ref-type="bibr" rid="B33">2020b</xref>; <xref ref-type="bibr" rid="B40">Han et&#x20;al., 2021</xref>), confirmation of screw implant placement during spine surgery (<xref ref-type="bibr" rid="B94">Uneri et&#x20;al., 2017</xref>) and the positioning of a surgical robot with respect to target anatomy (<xref ref-type="bibr" rid="B26">Gao et&#x20;al., 2020a</xref>). We believe that the lack of new learning-based multiple object registration strategies is indicative of the substantial challenges involved with their development, rather than any perceived lack of the problem&#x2019;s importance by the community. In order to better understand this &#x201c;gap&#x201d; between learning-based and traditional intensity-based methods in the multiple object domain, we first revisit (1) and update it to account for <italic>N</italic> 3D objects. For <italic>i</italic>&#x20;&#x3d; 1, <italic>&#x2026;</italic> , <italic>N</italic>, let <inline-formula id="inf28">
<mml:math id="m30">
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> denote the deformation parameters of the <italic>i</italic>th object and let <inline-formula id="inf29">
<mml:math id="m31">
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
</mml:math>
</inline-formula> denote its pose with respect to the <italic>v</italic>th view. Since the vast majority of studies examined concern 2D X-ray images, we also assume that the 2D view modality is X-ray. This allows us to take advantage of the line-integral nature of X-ray projection physics and represent the synthetic X-ray images formed from multiple objects as the sums of independent synthetic images created from each individual object, yielding the updated registration objective function:<disp-formula id="e3">
<mml:math id="m32">
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msup>
<mml:mrow>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msup>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mi mathvariant="normal">a</mml:mi>
<mml:mi mathvariant="normal">r</mml:mi>
<mml:mi mathvariant="normal">g</mml:mi>
<mml:mspace width="0.17em"/>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:munder>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mi>S</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x25e6;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>R</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
</mml:mrow>
</mml:mfenced>
<mml:mo>.</mml:mo>
</mml:math>
<label>(3)</label>
</disp-formula>
</p>
<p>Solving this optimization becomes more challenging as objects are added and the dimensionality of the search space grows. These challenges are somewhat mitigated by the compositional nature of the individual components of <xref ref-type="disp-formula" rid="e3">(Eq. 3)</xref>. Indeed, updating an intensity-based registration framework to compute (3) instead of <xref ref-type="disp-formula" rid="e1">(Eq. 1)</xref> is relatively straightforward from an implementation perspective: compute <italic>N</italic> synthetic radiographs instead of one and sum the synthetic images pixel-wise before calculating the image similarity metric, <italic>S</italic>(&#x22c5;, &#x22c5;). The compositional structure of <xref ref-type="disp-formula" rid="e3">(Eq. 3)</xref> also enables the high-dimensional problem to be solved by successively solving lower-dimensional sub-problems. For example, the pose of a single object may be optimized while keeping the poses of all other objects fixed at their most recent estimates. Once this optimization is complete, another object&#x2019;s pose is optimized while all others are kept constant. This process is cycled until all objects have been registered once or some other termination criterion is&#x20;met.</p>
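The cyclic scheme just described can be sketched as follows. This is a structural sketch only: `render_drr`, `similarity`, and `optimize_pose` are hypothetical placeholders for a concrete framework&#x2019;s projector, image similarity metric, and single-object pose optimizer, respectively; only the pixel-wise summation of per-object DRRs and the object-by-object cycling are taken from the text above:

```python
import numpy as np

def register_multi_object(poses, volumes, measured_img, render_drr,
                          similarity, optimize_pose, n_cycles=3):
    """Cycle through the objects, optimizing one pose at a time while all
    others stay fixed at their most recent estimates. Per-object DRRs are
    summed pixel-wise (line-integral additivity of X-ray projection) before
    the similarity metric is evaluated against the measured image."""
    poses = list(poses)
    for _ in range(n_cycles):
        for i in range(len(poses)):
            # Render the fixed objects once; they act as a static background
            # while object i's pose is optimized.
            fixed = [render_drr(v, p) for j, (v, p) in
                     enumerate(zip(volumes, poses)) if j != i]
            background = np.sum(fixed, axis=0) if fixed else 0.0

            def cost(pose_i):
                drr = background + render_drr(volumes[i], pose_i)
                return -similarity(drr, measured_img)  # maximize similarity

            poses[i] = optimize_pose(cost, poses[i])
    return poses
```

With a toy 1D &#x201c;projector&#x201d; that places an intensity bump at an integer position, a negative-SSD similarity, and an exhaustive grid search as the single-object optimizer, the loop recovers the positions of two objects from their summed image.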
<p>Extending (3) to the ML case requires new per-object model parameters to be introduced: <inline-formula id="inf30">
<mml:math id="m33">
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula id="inf31">
<mml:math id="m34">
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> for <italic>i</italic>&#x20;&#x3d; 1, <italic>&#x2026;</italic> , <italic>N</italic>. The updated ML-based objective function for multiple objects is written as:<disp-formula id="e4">
<mml:math id="m35">
<mml:mtable class="align" columnalign="left">
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msup>
<mml:mrow>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msup>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mi mathvariant="normal">a</mml:mi>
<mml:mi mathvariant="normal">r</mml:mi>
<mml:mi mathvariant="normal">g</mml:mi>
<mml:mspace width="0.17em"/>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:munder>
<mml:mo>&#x00D7;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:munder>
<mml:msup>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msup>
<mml:mspace width="-0.17em"/>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="0.17em"/>
<mml:mmultiscripts>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:none/>
<mml:mprescripts/>
<mml:none/>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:mmultiscripts>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x25e6;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3c9;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:msubsup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msubsup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
</mml:mrow>
</mml:mfenced>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<label>(4)</label>
</disp-formula>
</p>
<p>When considering the <inline-formula id="inf32">
<mml:math id="m36">
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> terms in <xref ref-type="disp-formula" rid="e4">(Eq. 4)</xref>, it appears that contextualization methods are able to isolate some object-dependent parameters from other components of the registration problem. However, we found that none of the contextualization studies that considered multiple objects (<xref ref-type="bibr" rid="B34">Grupp et&#x20;al., 2020c</xref>; <xref ref-type="bibr" rid="B17">Doerr et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B102">Wang et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B19">Esfandiari et&#x20;al., 2021</xref>) explicitly isolated the parameters corresponding to each object; instead, they used a single NN with the output layer indexed according to the appropriate object. This is likely effective for very specific and simple applications, as the NN is able to learn the relative spatial relationships of the objects. For more complex applications, it may be more appropriate to learn separate, well-trained models that may be composed together. As an example, consider two separate models, one trained to identify contextual information of the hip joint and the other trained to produce contextual information of a robotic device. Although these two models may be used in conjunction, or combined through &#x201c;fine-tuning,&#x201d; for the application of robotic hip surgery, the models may also be composed with models developed for other applications, such as measuring hip biomechanics or robotic shoulder surgery. However, a single model trained to jointly contextualize hip and robot features would most likely fail to generalize to any additional applications. We anticipate that the development of independent and robust contextualization models, capable of composition, will increase the number of learning-based methods applied to the multiple object problem.</p>
<p>Although representation learning methods may function in the presence of multiple objects without additional modification, better performance is usually obtained by adding structure to account for the known spatial relationships between objects (<xref ref-type="bibr" rid="B111">Yokota et&#x20;al., 2013</xref>). We anticipate that methods relying on representation learning will migrate away from PCA towards autoencoder-style approaches with embedded rigid transformer modules, so that all spatial and shape relationships may be learned. Because representation learning methods have the potential to reconstruct the bony anatomy of joints from a sparse number of 2D views, they could become very popular by enabling navigation to a 3D anatomical model with substantially reduced radiation exposure.</p>
<p>Similarity modeling approaches are implicitly affected by the introduction of additional objects due to the inputs of <inline-formula id="inf33">
<mml:math id="m37">
<mml:msup>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mo>&#x22c5;</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> having a dependence on <inline-formula id="inf34">
<mml:math id="m38">
<mml:msubsup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf35">
<mml:math id="m39">
<mml:msubsup>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>. None of the studies reviewed in this paper used multiple objects as part of similarity modeling. Although <xref ref-type="bibr" rid="B34">Grupp et&#x20;al. (2020c)</xref> registered multiple objects, the regularization function employed learned landmark annotations from only a single object. We envision several challenges with extending these methods to properly accommodate multiple objects. Obtaining convexity of the learned similarity models in the ideal case for (<xref ref-type="bibr" rid="B28">Gao et&#x20;al., 2020c</xref>; <xref ref-type="bibr" rid="B36">Gu et&#x20;al., 2020</xref>) is most likely not attainable when using a weighted sum of the rotation and translation components of each object&#x2019;s pose offset, especially as the number of objects increases. Methods which model 2D/3D correspondences (<xref ref-type="bibr" rid="B82">Schaffert et&#x20;al., 2019</xref>, <xref ref-type="bibr" rid="B81">2020a</xref>,<xref ref-type="bibr" rid="B83">b</xref>; <xref ref-type="bibr" rid="B56">Liao et&#x20;al., 2019</xref>) will require an additional dimension to handle the assignment of points to the various objects, adding complexity and potentially increasing the challenges associated with training.</p>
<p>Probably most handicapped by the introduction of multiple objects to the registration problem are methods relying on the direct regression of pose parameters, as they attempt to model the entire objective function. One may be tempted to solve this problem by simply adding additional model outputs for each object&#x2019;s estimated pose, but this does not guarantee that the limitations of independent single object pose regression are addressed. Therefore, direct multiple object pose regression methods should attempt to model the relative poses between objects in addition to a single absolute pose (or a single pose relative to some initialization). Another limitation of regression approaches lies in the combinatorial explosion which occurs as new objects are added to the registration problem, making training difficult in the presence of large fluctuations of the loss function.</p>
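The benefit of modeling relative poses can be made concrete with a short sketch. The transforms, frame names, and numeric values below are purely illustrative assumptions, not taken from any reviewed study; the sketch only shows that the relative pose between two rigid objects is invariant to a common motion of the imaging frame, which is the property that independent regression of absolute poses fails to exploit.

```python
import numpy as np

def rigid_transform(rot_z_deg, t):
    """Build a 4x4 rigid transform: rotation about z, then translation."""
    a = np.deg2rad(rot_z_deg)
    T = np.eye(4)
    T[:2, :2] = [[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]]
    T[:3, 3] = t
    return T

# Hypothetical absolute poses of two objects (e.g., pelvis and osteotome)
# expressed in the C-arm/camera frame; values chosen arbitrarily.
T_cam_pelvis = rigid_transform(30.0, [5.0, 0.0, 100.0])
T_cam_tool = rigid_transform(45.0, [7.0, -2.0, 95.0])

# Relative pose of the tool w.r.t. the pelvis:
# T_pelvis_tool = inv(T_cam_pelvis) @ T_cam_tool.
T_pelvis_tool = np.linalg.inv(T_cam_pelvis) @ T_cam_tool

# A common motion of the camera frame leaves the relative pose unchanged,
# which is why regressing relative poses can be more stable than regressing
# each absolute pose independently.
T_motion = rigid_transform(10.0, [1.0, 1.0, 0.0])
T_rel_after = np.linalg.inv(T_motion @ T_cam_pelvis) @ (T_motion @ T_cam_tool)
assert np.allclose(T_pelvis_tool, T_rel_after)
```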
<p>Although not unique to learning-based methods, multiple object registration verification also becomes a much more complex problem as the number of objects considered increases. Questions arise, such as: should verification be reported on each relative pose or should an overall pass/fail be reported? The three verification studies examined in this paper only considered verification of a single object&#x2019;s pose estimate. As single object verification methods mature, their issues when expanding to multiple objects will likely become more apparent.</p>
<p>In light of these challenges associated with learning-based approaches, it is easy to see how contemporary intensity-based methods currently dominate the multiple object domain given their relative ease of implementation and reasonable performance. However, there are multiple object registration problems which remain unsolved, since multiple object intensity-based methods continue to suffer from the limitations previously identified in Section 1.2. Some frequent properties of these unsolved problems are misleading views with several objects having substantial overlap in 2D, the potential for a varying number of surgical instruments to be present in a view, and the existence of objects which are dynamically changing shape or possibly &#x201c;splitting&#x201d; into several new objects. Real-time osteotome navigation using single view fluoroscopy, illustrated in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>, is exemplary of the many unsolved challenges outlined above and will be used as a motivating example throughout this discussion.</p>
<p>Each time the osteotome is advanced through bone, a single fluoroscopic view is collected and interpreted by the surgeon in order to accurately adjust the chiseling trajectory and safely avoid sensitive components of the anatomy which must not be damaged, such as the acetabulum shown in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>. Although the oblique views shown in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref> (a) and (c) help ensure that the osteotome tip is distinguishable from the acetabulum, the lateral orientation of the tool is challenging to interpret manually and may need to be confirmed by collecting subsequent fluoroscopic views at substantially different orientations, such as the approximate anterior-posterior (AP) view shown in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref> (b). <xref ref-type="fig" rid="F4">Figure&#x20;4</xref> (c) also demonstrates the intraoperative creation of a new object which may move independently of all others, further compounding the difficulty associated with non-navigated interpretation of these views. An additional challenge with osteotomy cases is determining when all osteotomies are complete and the new bone fragment is completely free from its parent. <xref ref-type="fig" rid="F4">Figure&#x20;4</xref> (c) demonstrates that retaining the angled osteotome as a static object in the field of view during performance of the posterior cut helps to guide the flat osteotome along the desired trajectory, but does not guarantee that the acetabular fragment has in fact been freed from the pelvis. A navigation system capable of accurately tracking the osteotome with respect to the anatomy using a series of oblique views would eliminate the need for additional &#x201c;confirmation&#x201d; views, potentially reduce operative time, decrease radiation exposure to the patient and clinical team, and reduce the frequency of breaches into sensitive anatomy.</p>
<p>We envision that this problem, and others like it, will eventually be solvable as the capabilities of learning-based methods advance further. Future contextualization and similarity modeling methods could enable a large enough capture range to provide an automatic registration to the initial oblique view used for chiseling, and regression methods, coupled with contextualization, could enable real-time pose estimates in subsequent views. Reconstructions of newly created bone fragments should be possible through similarity modeling approaches, such as by extending the framework introduced by <xref ref-type="bibr" rid="B28">Gao et&#x20;al. (2020c)</xref> to include multiple objects and non-rigid deformation. Finally, recurrent NN approaches, such as long short-term memory (LSTM) components, could provide some <italic>temporal</italic> segmentation of the intervention into phases and gestures, as is already done for laparoscopic surgery (<xref ref-type="bibr" rid="B101">Vercauteren et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B30">Garrow et&#x20;al., 2021</xref>; <xref ref-type="bibr" rid="B105">Wu et&#x20;al., 2021</xref>). This segmentation could be useful in identifying <italic>when</italic> 1) certain objects need to be tracked, 2) a radically different view has been collected, or 3) new objects have been split off from an existing object.</p>
<p>Although foreign objects such as screws, K-wires, retractors, and osteotomes frequently confound traditional registration methods, their locations or poses often have clinical relevance. Therefore, contrary to the inpainting approach of <xref ref-type="bibr" rid="B19">Esfandiari et&#x20;al. (2021)</xref>, we advocate that learning-based methods should attempt to register these objects.</p>
</sec>
<sec id="s3-4">
<title>3.4 Estimating Uncertainty and Assuring Quality</title>
<p>The four studies which consider registration verification in this review all attempt to provide a low-dimensional classification of registration success, e.g., pass/fail or correct/poor/incorrect (<xref ref-type="bibr" rid="B69">Mitrovi&#x107; et&#x20;al., 2014</xref>; <xref ref-type="bibr" rid="B100">Varnavas et&#x20;al., 2013</xref>, <xref ref-type="bibr" rid="B98">2015a</xref>; <xref ref-type="bibr" rid="B104">Wu et&#x20;al., 2016</xref>). These strategies effectively attempt to label registration estimates as either a global (correct) or local minimum (poor/incorrect) of <xref ref-type="disp-formula" rid="e1">(Eq. 1)</xref>. Unfortunately, even when correctly classifying a global optimum, these low-dimensional categorizations of a registration result may unintentionally fail to report small, but clinically relevant, errors. This is perhaps easiest to recognize by first revisiting <xref ref-type="disp-formula" rid="e1">(Eq. 1)</xref> and noting that registration strategies attempt to find singular solutions, or <italic>point</italic> estimates, which best minimize the appropriate objective function. However, several factors, including sensor noise, modeling error, or numerical imprecision, may influence the landscape of the objective function and potentially even cause some variation in the location of the global minimum. This possibility of obtaining several different pose estimates under nominally equivalent conditions reveals the inherent <italic>uncertainty</italic> of registration. Even though a failure to identify cases of large uncertainty may lead surgeons to take unintentional risks with potentially catastrophic implications for patients, to our knowledge, only one prior work has attempted to estimate the error associated with 2D/3D registration estimates (<xref ref-type="bibr" rid="B46">Hu et&#x20;al., 2016</xref>). We therefore believe that the development of methods which quantify 2D/3D registration uncertainty is of paramount importance and essential for the eventual adoption of 2D/3D registration into routine use, and even more so with the advent of autonomous robotic surgery.</p>
<p>Inspiration can be drawn from the applications of 3D/3D and 2D/2D deformable image registration (<xref ref-type="bibr" rid="B11">Chen et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B25">Fu et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B42">Haskins et&#x20;al., 2020</xref>), where machine learning techniques have dominated most existing research into registration uncertainty. Some of these approaches report <italic>interval</italic> estimates, either by reformulating the registration objective function as a probability distribution and drawing samples (<xref ref-type="bibr" rid="B80">Risholm et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B52">Le Folgoc et&#x20;al., 2016</xref>; <xref ref-type="bibr" rid="B84">Schultz et&#x20;al., 2018</xref>), approximating the sampling process using test-time NN drop-out (<xref ref-type="bibr" rid="B108">Yang et&#x20;al., 2017</xref>) or sampling using the test-time deformation covariance matrices embedded within a variational autoencoder (<xref ref-type="bibr" rid="B16">Dalca et&#x20;al., 2019</xref>). Due to the interventional nature of most 2D/3D registration applications, care needs to be taken in order to ensure that program runtimes are compatible with intra-operative workflows.</p>
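As a minimal illustration of the sampling-based interval estimates described above (a toy 1D objective and a plain Metropolis sampler; real 2D/3D objectives are higher-dimensional and far more expensive to evaluate), the sketch below reformulates a registration cost as an unnormalized Boltzmann distribution and reports an interval rather than a point estimate.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy 1D "registration objective": lower is better, with small wiggles
# superimposed on a quadratic, so the minimum sits near t = 1.9.
def objective(t):
    return (t - 2.0) ** 2 + 0.3 * np.cos(5.0 * t)

# Reformulate as an unnormalized probability: p(t) ~ exp(-objective(t)/T).
temperature = 0.1

def log_prob(t):
    return -objective(t) / temperature

# Simple Metropolis sampler over the pose parameter.
samples, t = [], 0.0
for _ in range(20000):
    t_new = t + rng.normal(scale=0.3)
    if np.log(rng.uniform()) < log_prob(t_new) - log_prob(t):
        t = t_new
    samples.append(t)
samples = np.array(samples[5000:])  # discard burn-in

# Interval estimate instead of a point estimate.
lo, hi = np.percentile(samples, [2.5, 97.5])
print(f"point estimate: {samples.mean():.2f}, 95% interval: [{lo:.2f}, {hi:.2f}]")
```

The width of the reported interval, rather than the point estimate alone, is what would allow a navigation system to flag registrations that should not be trusted.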
<p>Although we have mostly advocated for the development of new uncertainty quantification methods, there is likely still room for improvement on the local/global minima classification problem. Given the computational capacity of contemporary GPUs and the sophistication of learning frameworks, it should be feasible to extend the approach of <xref ref-type="bibr" rid="B104">Wu et&#x20;al. (2016)</xref> and pass densely sampled regions of the objective function through a NN, relying on the learning process to extract optimal features for distinguishing local and global minima.</p>
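A toy version of this idea can be sketched as follows. The double-well objective, the patch construction, and the logistic-regression stand-in for the NN are all illustrative assumptions rather than the approach of Wu et al. (2016); the point is only that densely sampled neighborhoods of candidate minima carry enough signal for a learned classifier to separate local from global optima.

```python
import numpy as np

rng = np.random.default_rng(7)

def make_patch(is_global):
    """Densely sample a toy double-well objective around one of its minima.

    f(t) = (t^2 - 1)^2 + s*t has its global minimum near t = -1 (for s > 0)
    and a local minimum near t = +1. The "patch" is f evaluated on a wide
    window centered on the candidate, relative to the candidate's value.
    """
    s = rng.uniform(0.2, 1.0)            # tilt: controls well-depth difference
    f = lambda t: (t ** 2 - 1.0) ** 2 + s * t
    c = -1.0 if is_global else 1.0       # candidate minimum (approximate)
    grid = np.linspace(-3.0, 3.0, 32)
    return f(c + grid) - f(c)

# Labeled training set: alternating local (0) and global (1) candidates.
X = np.array([make_patch(bool(i % 2)) for i in range(400)])
y = np.array([i % 2 for i in range(400)], dtype=float)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize features

# Tiny logistic-regression stand-in for the NN, trained by gradient descent.
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * np.mean(p - y)

acc = np.mean(((X @ w + b) > 0) == (y == 1))
print("training accuracy:", acc)
```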
<p>As registration methods will inevitably report failure or large uncertainties under certain conditions, we also believe that a promising topic of future research extends to intelligent agents which would determine subsequent actions to optimally reduce uncertainty, such as collecting another 2D view from a specific viewpoint. Finally, we note that registration uncertainties have the potential to augment existing robotic control methods which rely on intermittent imaging (<xref ref-type="bibr" rid="B2">Alambeigi et&#x20;al., 2019</xref>).</p>
</sec>
<sec id="s3-5">
<title>3.5 The Need for More Generic Solutions</title>
<p>Traditional image-based 2D/3D registration that relies on optimization of a cost function is limited in many ways (<xref ref-type="sec" rid="s1-2">Section 1.2</xref>); however, a major advantage of the algorithmic approach is that it is very generic, i.e., a pipeline configured for 2D/3D registration of the pelvis would be equally applicable to the spine. The introduction of machine learning to the 2D/3D pipeline, however, has in a way taken away this generality, and methods have become substantially more specialized. This is not because specific machine learning models are only suitable for one specific task or anatomy; on the contrary, pose or PCA component regression techniques, e.g., are largely identical across clinical targets. Rather, it is yet another consequence of the compact domain assumption of machine learning models, which confines solutions to the data domain they were developed on. While this specificity of solutions may be acceptable, and perhaps unavoidable, for methods that seek to contextualize data, it inhibits the use of 2D/3D registration pipelines as a tool to answer clinical or research questions. We feel that, in part due to its complexity, 2D/3D registration already leads a niche existence, and the necessity to re-train or even re-develop some algorithm components for the specific task at hand further exacerbates this situation.</p>
<p>Some approaches already point towards potential solutions to this issue: Methods like (<xref ref-type="bibr" rid="B100">Varnavas et&#x20;al., 2013</xref>; <xref ref-type="bibr" rid="B99">2015b</xref>) rely on patient- and object-specific training sets, and therefore avoid the task-specificity that comes with one-time training. Another noteworthy method learns to establish sparse point correspondences between random points rather than anatomically relevant ones (<xref ref-type="bibr" rid="B56">Liao et&#x20;al., 2019</xref>). However, despite matching random points, this method is likely still scene-specific because matching is performed using a very deep CNN with a very large receptive field, which may have learned to exploit the global context of the random keypoints; this context would not be preserved in different anatomy. Contributing machine learning-based solutions that address some of the large open challenges in 2D/3D registration while making the resulting tools general purpose and easy to use will be an important goal in the immediate future.</p>
</sec>
</sec>
<sec id="s4">
<title>4 Conclusion</title>
<p>Machine learning-based improvements to image-based 2D/3D registration were already of interest before the deep learning era (<xref ref-type="bibr" rid="B31">Gouveia et&#x20;al., 2012</xref>), and deep learning has only accelerated and diversified the contributions to the field. Contextualization of data, representation learning to reduce problem dimensionality, similarity modeling for increased capture range, direct pose regression to avoid iterative optimization, as well as confidence assessment are all well-established research thrusts, which are geared towards developing automated registration pipelines. While convincing performance improvements are already reported across a variety of clinical tasks and problem settings today, most of those studies are performed &#x201c;on the benchtop.&#x201d;</p>
<p>Coordinated research efforts are desirable towards 1) developing more robust learning paradigms that succeed under domain shift, 2) creating standardized reporting templates and devising evaluation metrics to enhance reproducibility and enable comparisons, 3) researching multi-object registration methods that can deal with the presence of foreign objects, 4) advancing uncertainty quantification and confidence estimation methodology to better support human decisions, and finally 5) developing generalist machine learning components of 2D/3D registration pipelines to improve accessibility.</p>
<p>Even though learning-based methods show great promise and have supplanted traditional methods in many aspects, their rise should not render traditional methods unusable or irrelevant. New researchers typically spend a great deal of time implementing a traditional registration pipeline so that traditional methods may be leveraged in conjunction with the development of learning-based approaches. In order to facilitate more rapid research and development of learning-based methods, we have made the core registration components of our intensity-based framework (xReg<xref ref-type="fn" rid="FN3">
<sup>3</sup>
</xref>), as well as our physics-based DRR generation tools for realistic synthesis and generalizable learning (DeepDRR<xref ref-type="fn" rid="FN4">
<sup>4</sup>
</xref>) available as open source software projects.</p>
<p>We have no doubt that progress on the aforementioned fronts will firmly establish machine learning methodology as an important component for 2D/3D registration workflows that will substantially contribute to 2D/3D registration growing out of its niche existence, establishing itself as a reliable, precise, and easy-to-use component for research, and more importantly, at the bedside.</p>
</sec>
</body>
<back>
<sec id="s5">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>MU conceived of the presented idea and was in charge of overall direction and planning. MU, CG, and RG refined the scope of the presented review and perspective. MU, CG, RG, MJ, and YH screened abstracts and full texts during review and extracted information from the included studies. MU, CG, and RG merged the extracted information and carried out the systematic review. All authors contributed to writing the manuscript, provided critical feedback, and helped shape the research and analysis.</p>
</sec>
<sec id="s7">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s8">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ack>
<p>We gratefully acknowledge financial support from NIH NIBIB Trailblazer R21 EB028505, and internal funds of the Malone Center for Engineering in Healthcare at Johns Hopkins University.</p>
</ack>
<fn-group>
<fn id="FN1">
<label>1</label>
<p>It should be noted that the definition of capture range is by no means standardized or even similar across papers, which we will also comment on in <xref ref-type="sec" rid="s3">Section&#x20;3</xref>.</p>
</fn>
<fn id="FN2">
<label>2</label>
<p>Non-deep learning techniques usually operate on lower dimensional data that is abstracted from the images (such as cost function values (<xref ref-type="bibr" rid="B104">Wu et&#x20;al., 2016</xref>) or centerlines (<xref ref-type="bibr" rid="B88">Tang and Scalzo, 2016</xref>)) such that domain shift is handled elsewhere in the pipeline, e.g., in a segmentation algorithm.</p>
</fn>
<fn id="FN3">
<label>3</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://github.com/rg2/xreg">https://github.com/rg2/xreg</ext-link>
</p>
</fn>
<fn id="FN4">
<label>4</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://github.com/arcadelab/deepdrr">https://github.com/arcadelab/deepdrr</ext-link>
</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abe</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Otake</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tennma</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Hiasa</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Oka</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Tanaka</surname>
<given-names>H.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Analysis of Forearm Rotational Motion Using Biplane Fluoroscopic Intensity-Based 2D-3D Matching</article-title>. <source>J.&#x20;Biomech.</source> <volume>89</volume>, <fpage>128</fpage>&#x2013;<lpage>133</lpage>. <pub-id pub-id-type="doi">10.1016/j.jbiomech.2019.04.017</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alambeigi</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Pedram</surname>
<given-names>S. A.</given-names>
</name>
<name>
<surname>Speyer</surname>
<given-names>J.&#x20;L.</given-names>
</name>
<name>
<surname>Rosen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Iordachita</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>R. H.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Scade: Simultaneous Sensor Calibration and Deformation Estimation of Fbg-Equipped Unmodeled Continuum Manipulators</article-title>. <source>IEEE Trans. Robot</source> <volume>36</volume>, <fpage>222</fpage>&#x2013;<lpage>239</lpage>. <pub-id pub-id-type="doi">10.1109/tro.2019.2946726</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Amrehn</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gaube</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Unberath</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Schebesch</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Horz</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Strumia</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>Ui-net: Interactive Artificial Neural Networks for Iterative Image Segmentation Based on a User Model</article-title>. <source>Proc. Eurographics Workshop Vis. Comput. Biol. Med.</source>, <fpage>143</fpage>&#x2013;<lpage>147</lpage>. </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Baka</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Lelieveldt</surname>
<given-names>B. P. F.</given-names>
</name>
<name>
<surname>Schultz</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Niessen</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Van Walsum</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Respiratory Motion Estimation in X-ray Angiography for Improved Guidance during Coronary Interventions</article-title>. <source>Phys. Med. Biol.</source> <volume>60</volume>, <fpage>3617</fpage>&#x2013;<lpage>3637</lpage>. <pub-id pub-id-type="doi">10.1088/0031-9155/60/9/3617</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berger</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>M&#xfc;ller</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Aichert</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Unberath</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Thies</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Choi</surname>
<given-names>J.&#x20;H.</given-names>
</name>
<etal/>
</person-group> (<year>2016</year>). <article-title>Marker&#x2010;free Motion Correction in Weight&#x2010;bearing Cone&#x2010;beam CT of the Knee Joint</article-title>. <source>Med. Phys.</source> <volume>43</volume>, <fpage>1235</fpage>&#x2013;<lpage>1248</lpage>. <pub-id pub-id-type="doi">10.1118/1.4941012</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bier</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Goldmann</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Zaech</surname>
<given-names>J.-N.</given-names>
</name>
<name>
<surname>Fotouhi</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Hegeman</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Grupp</surname>
<given-names>R.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Learning to Detect Anatomical Landmarks of the Pelvis in X-Rays from Arbitrary Views</article-title>. <source>Int. J.&#x20;CARS</source> <volume>14</volume>, <fpage>1463</fpage>&#x2013;<lpage>1473</lpage>. <pub-id pub-id-type="doi">10.1007/s11548-019-01975-5</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Bier</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Unberath</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zaech</surname>
<given-names>J.-N.</given-names>
</name>
<name>
<surname>Fotouhi</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Armand</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Osgood</surname>
<given-names>G.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <source>X-ray-transform Invariant Anatomical Landmark Detection for Pelvic Trauma Surgery</source>. <publisher-loc>Granada, Spain</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>, <fpage>55</fpage>&#x2013;<lpage>63</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-00937-3_7</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Birkmeyer</surname>
<given-names>J.&#x20;D.</given-names>
</name>
<name>
<surname>Finks</surname>
<given-names>J.&#x20;F.</given-names>
</name>
<name>
<surname>O&#x27;Reilly</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Oerline</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Carlin</surname>
<given-names>A. M.</given-names>
</name>
<name>
<surname>Nunn</surname>
<given-names>A. R.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>Surgical Skill and Complication Rates after Bariatric Surgery</article-title>. <source>N. Engl. J.&#x20;Med.</source> <volume>369</volume>, <fpage>1434</fpage>&#x2013;<lpage>1442</lpage>. <pub-id pub-id-type="doi">10.1056/nejmsa1300625</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brost</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Wimmer</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Liao</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Bourier</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Koch</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Strobel</surname>
<given-names>N.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>Constrained Registration for Motion Compensation in Atrial Fibrillation Ablation Procedures</article-title>. <source>IEEE Trans. Med. Imaging</source> <volume>31</volume>, <fpage>870</fpage>&#x2013;<lpage>881</lpage>. <pub-id pub-id-type="doi">10.1109/tmi.2011.2181184</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Graham</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Hutchinson</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Muir</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Automatic Inference and Measurement of 3d Carpal Bone Kinematics from Single View Fluoroscopic Sequences</article-title>. <source>IEEE Trans. Med. Imaging</source> <volume>32</volume>, <fpage>317</fpage>&#x2013;<lpage>328</lpage>. <pub-id pub-id-type="doi">10.1109/TMI.2012.2226740</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Diaz-Pinto</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ravikumar</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Frangi</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Deep Learning in Medical Image Registration</article-title>. <source>Prog. Biomed. Eng.</source> <pub-id pub-id-type="doi">10.1088/2516-1091/abd37c</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>Z.-Q.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z.-W.</given-names>
</name>
<name>
<surname>Fang</surname>
<given-names>L.-W.</given-names>
</name>
<name>
<surname>Jian</surname>
<given-names>F.-Z.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Y.-H.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Real-time 2d/3d Registration of Vertebra via Machine Learning and Geometric Transformation</article-title>. <source>Zidonghua Xuebao/Acta Automatica Sinica</source> <volume>44</volume>, <fpage>1183</fpage>&#x2013;<lpage>1194</lpage>. </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chou</surname>
<given-names>C.-R.</given-names>
</name>
<name>
<surname>Frederick</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Mageras</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Pizer</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>2d/3d Image Registration Using Regression Learning</article-title>. <source>Comput. Vis. Image Understanding</source> <volume>117</volume>, <fpage>1095</fpage>&#x2013;<lpage>1106</lpage>. <pub-id pub-id-type="doi">10.1016/j.cviu.2013.02.009</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Chou</surname>
<given-names>C.-R.</given-names>
</name>
<name>
<surname>Pizer</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2014</year>). <source>Local Regression Learning via forest Classification for 2d/3d Deformable Registration</source>. <publisher-loc>Nagoya, Japan</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>, <fpage>24</fpage>&#x2013;<lpage>33</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-14104-6_3</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Chou</surname>
<given-names>C.-R.</given-names>
</name>
<name>
<surname>Pizer</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2013</year>). <source>Real-time 2d/3d Deformable Registration Using Metric Learning</source>. <publisher-loc>Nice, France</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>, <fpage>1</fpage>&#x2013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-36620-8_1</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dalca</surname>
<given-names>A. V.</given-names>
</name>
<name>
<surname>Balakrishnan</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Guttag</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Sabuncu</surname>
<given-names>M. R.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Unsupervised Learning of Probabilistic Diffeomorphic Registration for Images and Surfaces</article-title>. <source>Med. Image Anal.</source> <volume>57</volume>, <fpage>226</fpage>&#x2013;<lpage>236</lpage>. <pub-id pub-id-type="doi">10.1016/j.media.2019.07.006</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Doerr</surname>
<given-names>S. A.</given-names>
</name>
<name>
<surname>Uneri</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>C. K.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Ketcha</surname>
<given-names>M. D.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <source>Data-driven Detection and Registration of Spine Surgery Instrumentation in Intraoperative Images</source>. <publisher-loc>Houston, TX, United States</publisher-loc>: <publisher-name>The Society of Photo-Optical Instrumentation Engineers</publisher-name>.
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Du</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Techniques for Interpretable Machine Learning</article-title>. <source>Commun. ACM</source> <volume>63</volume>, <fpage>68</fpage>&#x2013;<lpage>77</lpage>. <pub-id pub-id-type="doi">10.1145/3359786</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Esfandiari</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Weidert</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>K&#xf6;vesh&#xe1;zi</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Anglin</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Street</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Hodgson</surname>
<given-names>A. J.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Deep Learning-Based X-ray Inpainting for Improving Spinal 2d-3d Registration</article-title>. <source>Int. J.&#x20;Med. Robot. Comput. Assist. Surg.</source> </citation>
</ref>
<ref id="B20">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Esteban</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Grimm</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Unberath</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zahnd</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Navab</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Towards Fully Automatic X-ray to CT Registration</article-title>. In <conf-name>International Conference on Medical Image Computing and Computer-Assisted Intervention</conf-name>. <publisher-name>Springer</publisher-name>, <fpage>631</fpage>&#x2013;<lpage>639</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-32226-7_70</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ewurum</surname>
<given-names>C. H.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Pagnha</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Surgical Navigation in Orthopedics: Workflow and System Review</article-title>. <source>Intell. Orthopaedics</source>, <fpage>47</fpage>&#x2013;<lpage>63</lpage>. <pub-id pub-id-type="doi">10.1007/978-981-13-1396-7_4</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Foley</surname>
<given-names>J.&#x20;P.</given-names>
</name>
<name>
<surname>Hsu</surname>
<given-names>W. K.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Effectiveness of Bioskills Training in Spinal Surgery</article-title>. <source>Contemp. Spine Surg.</source> <volume>22</volume>, <fpage>1</fpage>&#x2013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1097/01.css.0000734864.37046.9b</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Foote</surname>
<given-names>M. D.</given-names>
</name>
<name>
<surname>Zimmerman</surname>
<given-names>B. E.</given-names>
</name>
<name>
<surname>Sawant</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Joshi</surname>
<given-names>S. C.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Real-time 2d-3d Deformable Registration with Deep Learning and Application to Lung Radiotherapy Targeting</source>. <publisher-loc>Hong Kong, China</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>, <fpage>265</fpage>&#x2013;<lpage>276</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-20351-1_20</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fran&#xe7;ois</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Calvet</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Madad&#xa0;Zadeh</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Saboul</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Gasparini</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Samarakoon</surname>
<given-names>P.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Detecting the Occluding Contours of the Uterus to Automatise Augmented Laparoscopy: Score, Loss, Dataset, Evaluation and User Study</article-title>. <source>Int. J.&#x20;CARS</source> <volume>15</volume>, <fpage>1177</fpage>&#x2013;<lpage>1186</lpage>. <pub-id pub-id-type="doi">10.1007/s11548-020-02151-w</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lei</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Curran</surname>
<given-names>W. J.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Deep Learning in Medical Image Registration: a Review</article-title>. <source>Phys. Med. Biol.</source> <volume>65</volume>, <fpage>20TR01</fpage>. <pub-id pub-id-type="doi">10.1088/1361-6560/ab843e</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gao</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Farvardin</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Grupp</surname>
<given-names>R. B.</given-names>
</name>
<name>
<surname>Bakhtiarinejad</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Thies</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2020a</year>). <article-title>Fiducial-free 2d/3d Registration for Robot-Assisted Femoroplasty</article-title>. <source>IEEE Trans. Med. Robot. Bionics</source> <volume>2</volume>, <fpage>437</fpage>&#x2013;<lpage>446</lpage>. <pub-id pub-id-type="doi">10.1109/tmrb.2020.3012460</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Gao</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Grupp</surname>
<given-names>R. B.</given-names>
</name>
<name>
<surname>Unberath</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>R. H.</given-names>
</name>
<name>
<surname>Armand</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2020b</year>). <article-title>Fiducial-free 2d/3d Registration of the Proximal Femur for Robot-Assisted Femoroplasty</article-title>. In <conf-name>Medical Imaging 2020: Image-Guided Procedures, Robotic Interventions, and Modeling</conf-name>, <volume>11315</volume>. <publisher-loc>Bellingham, WA</publisher-loc>: <publisher-name>International Society for Optics and Photonics</publisher-name>, <fpage>113151C</fpage>. <pub-id pub-id-type="doi">10.1117/12.2550992</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Gao</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Gu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Killeen</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Armand</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>R.</given-names>
</name>
<etal/>
</person-group> (<year>2020c</year>). <source>Generalizing Spatial Transformers to Projective Geometry with Applications to 2d/3d Registration</source>, <volume>12263</volume>. <publisher-loc>Lima, Peru</publisher-loc>: <publisher-name>Springer Science and Business Media Deutschland GmbH</publisher-name>, <fpage>329</fpage>&#x2013;<lpage>339</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-59716-0_32</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gao</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Unberath</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Armand</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Localizing Dexterous Surgical Tools in X-ray for Image-Based Navigation</article-title>. In <conf-name>Proceedings of the International Conference on Information Processing for Computer-Assisted Interventions</conf-name>. </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Garrow</surname>
<given-names>C. R.</given-names>
</name>
<name>
<surname>Kowalewski</surname>
<given-names>K.-F.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wagner</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Schmidt</surname>
<given-names>M. W.</given-names>
</name>
<name>
<surname>Engelhardt</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Machine Learning for Surgical Phase Recognition</article-title>. <source>Ann. Surg.</source> <volume>273</volume>, <fpage>684</fpage>&#x2013;<lpage>693</lpage>. <pub-id pub-id-type="doi">10.1097/sla.0000000000004425</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Gouveia</surname>
<given-names>A. I. R.</given-names>
</name>
<name>
<surname>Metz</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Freire</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Klein</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Comparative Evaluation of Regression Methods for 3d-2d Image Registration</article-title>. In <conf-name>International Conference on Artificial Neural Networks</conf-name>. <publisher-name>Springer</publisher-name>, <fpage>238</fpage>&#x2013;<lpage>245</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-33266-1_30</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grupp</surname>
<given-names>R. B.</given-names>
</name>
<name>
<surname>Hegeman</surname>
<given-names>R. A.</given-names>
</name>
<name>
<surname>Murphy</surname>
<given-names>R. J.</given-names>
</name>
<name>
<surname>Alexander</surname>
<given-names>C. P.</given-names>
</name>
<name>
<surname>Otake</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>McArthur</surname>
<given-names>B. A.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Pose Estimation of Periacetabular Osteotomy Fragments with Intraoperative X-ray Navigation</article-title>. <source>IEEE Trans. Biomed. Eng.</source> <volume>67</volume>, <fpage>441</fpage>&#x2013;<lpage>452</lpage>. <pub-id pub-id-type="doi">10.1109/TBME.2019.2915165</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grupp</surname>
<given-names>R. B.</given-names>
</name>
<name>
<surname>Murphy</surname>
<given-names>R. J.</given-names>
</name>
<name>
<surname>Hegeman</surname>
<given-names>R. A.</given-names>
</name>
<name>
<surname>Alexander</surname>
<given-names>C. P.</given-names>
</name>
<name>
<surname>Unberath</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Otake</surname>
<given-names>Y.</given-names>
</name>
<etal/>
</person-group> (<year>2020b</year>). <article-title>Fast and Automatic Periacetabular Osteotomy Fragment Pose Estimation Using Intraoperatively Implanted Fiducials and Single-View Fluoroscopy</article-title>. <source>Phys. Med. Biol.</source> <volume>65</volume>, <fpage>245019</fpage>. <pub-id pub-id-type="doi">10.1088/1361-6560/aba089</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grupp</surname>
<given-names>R. B.</given-names>
</name>
<name>
<surname>Unberath</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Hegeman</surname>
<given-names>R. A.</given-names>
</name>
<name>
<surname>Murphy</surname>
<given-names>R. J.</given-names>
</name>
<name>
<surname>Alexander</surname>
<given-names>C. P.</given-names>
</name>
<etal/>
</person-group> (<year>2020c</year>). <article-title>Automatic Annotation of Hip Anatomy in Fluoroscopy for Robust and Efficient 2d/3d Registration</article-title>. <source>Int. J.&#x20;CARS</source> <volume>15</volume>, <fpage>759</fpage>&#x2013;<lpage>769</lpage>. <pub-id pub-id-type="doi">10.1007/s11548-020-02162-7</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grupp</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Unberath</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Hegeman</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Murphy</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Alexander</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2020a</year>). <article-title>Data and Code Associated with the Publication: Automatic Annotation of Hip Anatomy in Fluoroscopy for Robust and Efficient 2D/3D Registration</article-title>. <pub-id pub-id-type="doi">10.7281/T1/IFSXNV</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Gu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Grupp</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Fotouhi</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Unberath</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2020</year>). <source>Extended Capture Range of Rigid 2d/3d Registration by Estimating Riemannian Pose Gradients</source>. <publisher-loc>Lima, Peru</publisher-loc>: <publisher-name>Springer Science and Business Media Deutschland GmbH</publisher-name>, <fpage>281</fpage>&#x2013;<lpage>291</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-59861-7_29</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Guan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Meng</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Transfer Learning for Rigid 2d/3d Cardiovascular Images Registration</source>. <publisher-loc>Xi&#x2019;an, China</publisher-loc>: <publisher-name>Springer Science and Business Media Deutschland GmbH</publisher-name>, <fpage>380</fpage>&#x2013;<lpage>390</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-31723-2_32</pub-id>
</citation>
</ref>
<ref id="B38">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Guan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Meng</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2020</year>). <source>Transfer Learning for Nonrigid 2d/3d Cardiovascular Images Registration.</source>
</citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hafezi-Nejad</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Bailey</surname>
<given-names>C. R.</given-names>
</name>
<name>
<surname>Solomon</surname>
<given-names>A. J.</given-names>
</name>
<name>
<surname>Abou Areda</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Carrino</surname>
<given-names>J.&#x20;A.</given-names>
</name>
<name>
<surname>Khan</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Vertebroplasty and Kyphoplasty in the USA from 2004 to 2017: National Inpatient Trends, Regional Variations, Associated Diagnoses, and Outcomes</article-title>. <source>J.&#x20;NeuroInterventional Surg.</source> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Han</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Uneri</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Vijayan</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Vagdargi</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Sheth</surname>
<given-names>N.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Fracture Reduction Planning and Guidance in Orthopaedic Trauma Surgery via Multi-Body Image Registration</article-title>. <source>Med. Image Anal.</source> <volume>68</volume>, <fpage>101917</fpage>. <pub-id pub-id-type="doi">10.1016/j.media.2020.101917</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hansen</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>M&#xfc;ller</surname>
<given-names>S. D.</given-names>
</name>
<name>
<surname>Koumoutsakos</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES)</article-title>. <source>Evol. Comput.</source> <volume>11</volume>, <fpage>1</fpage>&#x2013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.1162/106365603321828970</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Haskins</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Kruger</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Deep Learning in Medical Image Registration: a Survey</article-title>. <source>Machine Vis. Appl.</source> <volume>31</volume>, <fpage>1</fpage>&#x2013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.1007/s00138-020-01060-x</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hiasa</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Otake</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tanaka</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Sanada</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sato</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Recovery of 3d Rib Motion from Dynamic Chest Radiography and CT Data Using Local Contrast Normalization and Articular Motion Model</article-title>. <source>Med. Image Anal.</source> <volume>51</volume>, <fpage>144</fpage>&#x2013;<lpage>156</lpage>. <pub-id pub-id-type="doi">10.1016/j.media.2018.10.002</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hou</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Alansary</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>McDonagh</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Davidson</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Rutherford</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Hajnal</surname>
<given-names>J.&#x20;V.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <source>Predicting Slice-To-Volume Transformation in Presence of Arbitrary Subject Motion</source>. <publisher-loc>Quebec City, QC, Canada</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>, <fpage>296</fpage>&#x2013;<lpage>304</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-66185-8_34</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hou</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Miolane</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Khanal</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>M. C. H.</given-names>
</name>
<name>
<surname>Alansary</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>McDonagh</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <source>Computing CNN Loss and Gradients for Pose Estimation with Riemannian Geometry</source>. <publisher-loc>Granada, Spain</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>, <fpage>756</fpage>&#x2013;<lpage>764</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-00928-1_85</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Bonmati</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Gibson</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Hipwell</surname>
<given-names>J.&#x20;H.</given-names>
</name>
<name>
<surname>Hawkes</surname>
<given-names>D. J.</given-names>
</name>
<name>
<surname>Bandula</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2016</year>). <article-title>2d-3d Registration Accuracy Estimation for Optimised Planning of Image-Guided Pancreatobiliary Interventions</article-title>. In <conf-name>International Conference on Medical Image Computing and Computer-Assisted Intervention</conf-name>. <publisher-name>Springer</publisher-name>, <fpage>516</fpage>&#x2013;<lpage>524</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-46720-7_60</pub-id> </citation>
</ref>
<ref id="B47">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Hu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Modat</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gibson</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Ghavami</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Bonmati</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Moore</surname>
<given-names>C. M.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Label-driven Weakly-Supervised Learning for Multimodal Deformable Image Registration</article-title>. In <conf-name>2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018)</conf-name>. <publisher-name>IEEE</publisher-name>, <fpage>1070</fpage>&#x2013;<lpage>1074</lpage>. </citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hummel</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Figl</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bax</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bergmann</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Birkfellner</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>2d/3d Registration of Endoscopic Ultrasound to CT Volume Data</article-title>. <source>Phys. Med. Biol.</source> <volume>53</volume>, <fpage>4303</fpage>&#x2013;<lpage>4316</lpage>. <pub-id pub-id-type="doi">10.1088/0031-9155/53/16/006</pub-id> </citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Joskowicz</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Hazan</surname>
<given-names>E. J.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Computer Aided Orthopaedic Surgery: Incremental Shift or Paradigm Change?</article-title> </citation>
</ref>
<ref id="B50">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Karner</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Gsaxner</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Pepe</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Fleck</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Arth</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <source>Single-shot Deep Volumetric Regression for Mobile Medical Augmented Reality</source>. <publisher-loc>Lima, Peru</publisher-loc>: <publisher-name>Springer Science and Business Media Deutschland GmbH</publisher-name>, <fpage>64</fpage>&#x2013;<lpage>74</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-60946-7_7</pub-id> </citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Krizhevsky</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sutskever</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Hinton</surname>
<given-names>G. E.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>ImageNet Classification with Deep Convolutional Neural Networks</article-title>. <source>Adv. Neural Inf. Process. Syst.</source> <volume>25</volume>, <fpage>1097</fpage>&#x2013;<lpage>1105</lpage>. </citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Le Folgoc</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Delingette</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Criminisi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ayache</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Quantifying Registration Uncertainty with Sparse Bayesian Modelling</article-title>. <source>IEEE Trans. Med. Imaging</source> <volume>36</volume>, <fpage>607</fpage>&#x2013;<lpage>617</lpage>. <pub-id pub-id-type="doi">10.1109/TMI.2016.2623608</pub-id> </citation>
</ref>
<ref id="B53">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Leonard</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Reiter</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sinha</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ishii</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>R. H.</given-names>
</name>
<name>
<surname>Hager</surname>
<given-names>G. D.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Image-Based Navigation for Functional Endoscopic Sinus Surgery Using Structure from Motion</article-title>. In <conf-name>Medical Imaging 2016: Image Processing</conf-name>. <publisher-loc>Bellingham, WA</publisher-loc>: <publisher-name>International Society for Optics and Photonics</publisher-name>, <fpage>97840V</fpage>. <pub-id pub-id-type="doi">10.1117/12.2217279</pub-id> </citation>
</ref>
<ref id="B54">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lepetit</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Moreno-Noguer</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Fua</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>EPnP: An Accurate O(n) Solution to the PnP Problem</article-title>. <source>Int. J.&#x20;Comput. Vis.</source> <volume>81</volume>, <fpage>155</fpage>&#x2013;<lpage>166</lpage>. <pub-id pub-id-type="doi">10.1007/s11263-008-0152-6</pub-id> </citation>
</ref>
<ref id="B55">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Pei</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Zha</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2020</year>). <source>Non-rigid 2D-3D Registration Using Convolutional Autoencoders</source>, <volume>2020</volume>. <publisher-loc>Iowa City, IA, United States</publisher-loc>: <publisher-name>IEEE Computer Society</publisher-name>, <fpage>700</fpage>&#x2013;<lpage>704</lpage>.</citation>
</ref>
<ref id="B56">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Liao</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>W.-A.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>S. K.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Multiview 2D/3D Rigid Registration via a Point-of-Interest Network for Tracking and Triangulation</source>, <volume>2019</volume>. <publisher-loc>Long Beach, CA, United States</publisher-loc>: <publisher-name>IEEE Computer Society</publisher-name>, <fpage>12630</fpage>&#x2013;<lpage>12639</lpage>.</citation>
</ref>
<ref id="B57">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liao</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Miao</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chefd&#x27;Hotel</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>A Review of Recent Advances in Registration Techniques Applied to Minimally Invasive Therapy</article-title>. <source>IEEE Trans. Multimedia</source> <volume>15</volume>, <fpage>983</fpage>&#x2013;<lpage>1000</lpage>. <pub-id pub-id-type="doi">10.1109/tmm.2013.2244869</pub-id> </citation>
</ref>
<ref id="B58">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lin</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Winey</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Shape Distribution-Based 2D/3D Registration for Fast and Accurate 6 Degrees-of-Freedom Stereotactic Patient Positioning</article-title>. <source>Int. J.&#x20;Radiat. Oncol. Biol. Phys.</source> <volume>84</volume>, <fpage>S724</fpage>. <pub-id pub-id-type="doi">10.1016/j.ijrobp.2012.07.1939</pub-id> </citation>
</ref>
<ref id="B59">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>D. C.</given-names>
</name>
<name>
<surname>Nocedal</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1989</year>). <article-title>On the Limited Memory BFGS Method for Large Scale Optimization</article-title>. <source>Math. Programming</source> <volume>45</volume>, <fpage>503</fpage>&#x2013;<lpage>528</lpage>. <pub-id pub-id-type="doi">10.1007/bf01589116</pub-id> </citation>
</ref>
<ref id="B60">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Luo</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>H.-Q.</given-names>
</name>
<name>
<surname>Du</surname>
<given-names>Y.-P.</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Towards Multiple Instance Learning and Hermann Weyl's Discrepancy for Robust Image-Guided Bronchoscopic Intervention</source>. <publisher-loc>Shenzhen, China</publisher-loc>: <publisher-name>Springer Science and Business Media Deutschland GmbH</publisher-name>, <fpage>403</fpage>&#x2013;<lpage>411</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-32254-0_45</pub-id> </citation>
</ref>
<ref id="B61">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maes</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Collignon</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Vandermeulen</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Marchal</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Suetens</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1997</year>). <article-title>Multimodality Image Registration by Maximization of Mutual Information</article-title>. <source>IEEE Trans. Med. Imaging</source> <volume>16</volume>, <fpage>187</fpage>&#x2013;<lpage>198</lpage>. <pub-id pub-id-type="doi">10.1109/42.563664</pub-id> </citation>
</ref>
<ref id="B62">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Markelj</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Toma&#x17e;evi&#x10d;</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Likar</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Pernu&#x161;</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>A Review of 3d/2d Registration Methods for Image-Guided Interventions</article-title>. <source>Med. image Anal.</source> <volume>16</volume>, <fpage>642</fpage>&#x2013;<lpage>661</lpage>. <pub-id pub-id-type="doi">10.1016/j.media.2010.03.005</pub-id> </citation>
</ref>
<ref id="B63">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mezger</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Jendrewski</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bartels</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Navigation in Surgery</article-title>. <source>Langenbecks Arch. Surg.</source> <volume>398</volume>, <fpage>501</fpage>&#x2013;<lpage>514</lpage>. <pub-id pub-id-type="doi">10.1007/s00423-013-1059-4</pub-id> </citation>
</ref>
<ref id="B64">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Miao</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Piat</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Fischer</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Tuysuzoglu</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Mewes</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Mansi</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <source>Dilated FCN for Multi-Agent 2D/3D Medical Image Registration</source>. <publisher-loc>New Orleans, LA, United States</publisher-loc>: <publisher-name>AAAI Press</publisher-name>, <fpage>4694</fpage>&#x2013;<lpage>4701</lpage>.</citation>
</ref>
<ref id="B65">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Miao</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z. J.</given-names>
</name>
<name>
<surname>Liao</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2016a</year>). <article-title>A CNN Regression Approach for Real-Time 2D/3D Registration</article-title>. <source>IEEE Trans. Med. Imaging</source> <volume>35</volume>, <fpage>1352</fpage>&#x2013;<lpage>1363</lpage>. <pub-id pub-id-type="doi">10.1109/tmi.2016.2521800</pub-id> </citation>
</ref>
<ref id="B66">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Miao</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z. J.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Liao</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2016b</year>). <source>Real-Time 2D/3D Registration via CNN Regression</source>, <volume>2016</volume>. <publisher-loc>Prague, Czech Republic</publisher-loc>: <publisher-name>IEEE Computer Society</publisher-name>, <fpage>1430</fpage>&#x2013;<lpage>1434</lpage>.</citation>
</ref>
<ref id="B67">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mirota</surname>
<given-names>D. J.</given-names>
</name>
<name>
<surname>Ishii</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Hager</surname>
<given-names>G. D.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Vision-based Navigation in Image-Guided Interventions</article-title>. <source>Annu. Rev. Biomed. Eng.</source> <volume>13</volume>, <fpage>297</fpage>&#x2013;<lpage>319</lpage>. <pub-id pub-id-type="doi">10.1146/annurev-bioeng-071910-124757</pub-id> </citation>
</ref>
<ref id="B68">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mitrovi&#x107;</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Pernu&#x161;</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Likar</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>&#x160;piclin</surname>
<given-names>&#x17d;.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Simultaneous 3D-2D Image Registration and C&#x2010;arm Calibration: Application to Endovascular Image&#x2010;guided Interventions</article-title>. <source>Med. Phys.</source> <volume>42</volume>, <fpage>6433</fpage>&#x2013;<lpage>6447</lpage>. <pub-id pub-id-type="doi">10.1118/1.4932626</pub-id> </citation>
</ref>
<ref id="B69">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Mitrovi&#x107;</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>&#x160;piclin</surname>
<given-names>&#x17d;.</given-names>
</name>
<name>
<surname>Likar</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Pernu&#x161;</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2014</year>). <source>Automatic Detection of Misalignment in Rigid 3d-2d Registration</source>, <volume>8361</volume>. <publisher-loc>Nagoya, Japan</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>, <fpage>117</fpage>&#x2013;<lpage>124</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-14127-5_15</pub-id> </citation>
</ref>
<ref id="B70">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moher</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Liberati</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Tetzlaff</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Altman</surname>
<given-names>D. G.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement</article-title>. <source>BMJ</source> <volume>339</volume>, <fpage>b2535</fpage>. <pub-id pub-id-type="doi">10.1136/bmj.b2535</pub-id> </citation>
</ref>
<ref id="B71">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Neumann</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Tonnies</surname>
<given-names>K. D.</given-names>
</name>
<name>
<surname>Pohle-Frohlich</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2020</year>). <source>Deep Similarity Learning Using a Siamese ResNet Trained on Similarity Labels from Disparity Maps of Cerebral MRA MIP Pairs</source>, <volume>11313</volume>. <publisher-loc>Houston, TX, United States</publisher-loc>: <publisher-name>The Society of Photo-Optical Instrumentation Engineers</publisher-name>.
</citation>
</ref>
<ref id="B72">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nolte</surname>
<given-names>L.-P.</given-names>
</name>
<name>
<surname>Slomczykowski</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Berlemann</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Strauss</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Hofstetter</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Schlenzka</surname>
<given-names>D.</given-names>
</name>
<etal/>
</person-group> (<year>2000</year>). <article-title>A New Approach to Computer-Aided Spine Surgery: Fluoroscopy-Based Surgical Navigation</article-title>. <source>Eur. Spine J.</source> <volume>9</volume>, <fpage>S078</fpage>&#x2013;<lpage>S088</lpage>. <pub-id pub-id-type="doi">10.1007/pl00010026</pub-id> </citation>
</ref>
<ref id="B73">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Otake</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Esnault</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Grupp</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Kosugi</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sato</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Robust Patella Motion Tracking Using Intensity-Based 2D-3D Registration on Dynamic Bi-plane Fluoroscopy: Towards Quantitative Assessment in MPFL Reconstruction Surgery</article-title>. In <conf-name>Medical Imaging 2016: Image-Guided Procedures, Robotic Interventions, and Modeling</conf-name>. <publisher-name>International Society for Optics and Photonics</publisher-name>, <fpage>97860B</fpage>. <pub-id pub-id-type="doi">10.1117/12.2214699</pub-id> </citation>
</ref>
<ref id="B74">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Otake</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>A. S.</given-names>
</name>
<name>
<surname>Webster Stayman</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Uneri</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kleinszig</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Vogt</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>Robust 3D-2D Image Registration: Application to Spine Interventions and Vertebral Labeling in the Presence of Anatomical Deformation</article-title>. <source>Phys. Med. Biol.</source> <volume>58</volume>, <fpage>8535</fpage>&#x2013;<lpage>8553</lpage>. <pub-id pub-id-type="doi">10.1088/0031-9155/58/23/8535</pub-id> </citation>
</ref>
<ref id="B75">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Pei</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Qin</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <source>Non-rigid Craniofacial 2D-3D Registration Using CNN-Based Regression</source>. <publisher-loc>Quebec City, QC, Canada</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>, <fpage>117</fpage>&#x2013;<lpage>125</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-67558-9_14</pub-id> </citation>
</ref>
<ref id="B76">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pfandler</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Stefan</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Mehren</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Lazarovici</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Weigl</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Technical and Nontechnical Skills in Surgery</article-title>. <source>Spine</source> <volume>44</volume>, <fpage>E1396</fpage>&#x2013;<lpage>E1400</lpage>. <pub-id pub-id-type="doi">10.1097/brs.0000000000003154</pub-id> </citation>
</ref>
<ref id="B77">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Picard</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Clarke</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Deep</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Gregori</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Computer Assisted Knee Replacement Surgery: Is the Movement Mainstream?</article-title> <source>Orthop. Muscular Syst.</source> <volume>3</volume>. </citation>
</ref>
<ref id="B78">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Pluim</surname>
<given-names>J.&#x20;P.</given-names>
</name>
<name>
<surname>Muenzing</surname>
<given-names>S. E.</given-names>
</name>
<name>
<surname>Eppenhof</surname>
<given-names>K. A.</given-names>
</name>
<name>
<surname>Murphy</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>The Truth Is Hard to Make: Validation of Medical Image Registration</article-title>. In <conf-name>2016 23rd International Conference on Pattern Recognition (ICPR)</conf-name>. <publisher-name>IEEE</publisher-name>, <fpage>2294</fpage>&#x2013;<lpage>2300</lpage>. <pub-id pub-id-type="doi">10.1109/icpr.2016.7899978</pub-id> </citation>
</ref>
<ref id="B79">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Powell</surname>
<given-names>M. J.</given-names>
</name>
</person-group> (<year>2009</year>). <source>The BOBYQA Algorithm for Bound Constrained Optimization without Derivatives</source>. <comment>Cambridge NA Report NA2009/06</comment>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>University of Cambridge</publisher-name>, <fpage>26</fpage>&#x2013;<lpage>46</lpage>.</citation>
</ref>
<ref id="B80">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Risholm</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Janoos</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Norton</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Golby</surname>
<given-names>A. J.</given-names>
</name>
<name>
<surname>Wells</surname>
<given-names>W. M.</given-names>
<suffix>III</suffix>
</name>
</person-group> (<year>2013</year>). <article-title>Bayesian Characterization of Uncertainty in Intra-subject Non-rigid Registration</article-title>. <source>Med. image Anal.</source> <volume>17</volume>, <fpage>538</fpage>&#x2013;<lpage>555</lpage>. <pub-id pub-id-type="doi">10.1016/j.media.2013.03.002</pub-id> </citation>
</ref>
<ref id="B81">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schaffert</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Fischer</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Borsdorf</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Maier</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2020a</year>). <article-title>Learning an Attention Model for Robust 2-D/3-D Registration Using Point-to-Plane Correspondences</article-title>. <source>IEEE Trans. Med. Imaging</source> <volume>39</volume>, <fpage>3159</fpage>&#x2013;<lpage>3174</lpage>. <pub-id pub-id-type="doi">10.1109/tmi.2020.2988410</pub-id> </citation>
</ref>
<ref id="B82">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Schaffert</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Fischer</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Borsdorf</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Maier</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2019</year>). <source>Metric-driven Learning of Correspondence Weighting for 2-D/3-D Image Registration</source>, <volume>11269</volume>. <publisher-loc>Stuttgart, Germany</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>, <fpage>140</fpage>&#x2013;<lpage>152</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-12939-2_11</pub-id> </citation>
</ref>
<ref id="B83">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Schaffert</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Wei&#xdf;</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Borsdorf</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Maier</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2020b</year>). <source>Learning-based Correspondence Estimation for 2-D/3-D Registration</source>. <publisher-loc>Berlin, Germany</publisher-loc>: <publisher-name>Springer Science and Business Media Deutschland GmbH</publisher-name>, <fpage>222</fpage>&#x2013;<lpage>228</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-658-29267-6_50</pub-id> </citation>
</ref>
<ref id="B84">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Schultz</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Handels</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ehrhardt</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>A Multilevel Markov Chain Monte Carlo Approach for Uncertainty Quantification in Deformable Registration</article-title>. In <conf-name>Medical Imaging 2018: Image Processing</conf-name>, <volume>10574</volume>. <publisher-loc>Bellingham, WA</publisher-loc>: <publisher-name>International Society for Optics and Photonics</publisher-name>, <fpage>105740O</fpage>. <pub-id pub-id-type="doi">10.1117/12.2293588</pub-id> </citation>
</ref>
<ref id="B85">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shetty</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Birkhold</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Strobel</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Egger</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Jaganathan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kowarschik</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Deep Learning Compatible Differentiable X-ray Projections for Inverse Rendering</article-title>. <comment>arXiv preprint arXiv:2102.02912</comment>. <pub-id pub-id-type="doi">10.1007/978-3-658-33198-6_70</pub-id> </citation>
</ref>
<ref id="B86">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sinha</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ishii</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Hager</surname>
<given-names>G. D.</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>R. H.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Endoscopic Navigation in the Clinic: Registration in the Absence of Preoperative Imaging</article-title>. <source>Int. J.&#x20;CARS</source> <volume>14</volume>, <fpage>1495</fpage>&#x2013;<lpage>1506</lpage>. <pub-id pub-id-type="doi">10.1007/s11548-019-02005-0</pub-id> </citation>
</ref>
<ref id="B87">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sugano</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>Computer-assisted Orthopedic Surgery</article-title>. <source>J.&#x20;Orthopaedic Sci.</source> <volume>8</volume>, <fpage>442</fpage>&#x2013;<lpage>448</lpage>. <pub-id pub-id-type="doi">10.1007/s10776-002-0623-6</pub-id> </citation>
</ref>
<ref id="B88">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Tang</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Scalzo</surname>
<given-names>F.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Similarity Metric Learning for 2D to 3D Registration of Brain Vasculature</source>, <volume>10072</volume>. <publisher-loc>Las Vegas, NV, United States</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>, <fpage>3</fpage>&#x2013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-50835-1_1</pub-id> </citation>
</ref>
<ref id="B89">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Thies</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Z&#xe4;ch</surname>
<given-names>J.-N.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Navab</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Maier</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>A Learning-Based Method for Online Adjustment of C-Arm Cone-Beam CT Source Trajectories for Artifact Avoidance</article-title>. <source>Int. J.&#x20;CARS</source> <volume>15</volume>, <fpage>1787</fpage>&#x2013;<lpage>1796</lpage>. <pub-id pub-id-type="doi">10.1007/s11548-020-02249-1</pub-id> </citation>
</ref>
<ref id="B90">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Toth</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Miao</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kurzendorfer</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Rinaldi</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Liao</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Mansi</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>3d/2d Model-To-Image Registration by Imitation Learning for Cardiac Procedures</article-title>. <source>Int. J.&#x20;CARS</source> <volume>13</volume>, <fpage>1141</fpage>&#x2013;<lpage>1149</lpage>. <pub-id pub-id-type="doi">10.1007/s11548-018-1774-y</pub-id> </citation>
</ref>
<ref id="B91">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Tucker</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Fotouhi</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Unberath</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>S. C.</given-names>
</name>
<name>
<surname>Fuerst</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Towards Clinical Translation of Augmented Orthopedic Surgery: From Pre-op CT to Intra-op X-ray via RGBD Sensing</article-title>. In <conf-name>Medical Imaging 2018: Imaging Informatics for Healthcare, Research, and Applications</conf-name>. <publisher-loc>Bellingham, WA</publisher-loc>: <publisher-name>International Society for Optics and Photonics</publisher-name>, <fpage>105790J</fpage>. <pub-id pub-id-type="doi">10.1117/12.2293675</pub-id> </citation>
</ref>
<ref id="B92">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Unberath</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zaech</surname>
<given-names>J.-N.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bier</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Goldmann</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>S. C.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Enabling Machine Learning in x-ray-based Procedures via Realistic Simulation of Image Formation</article-title>. <source>Int. J.&#x20;CARS</source> <volume>14</volume>, <fpage>1517</fpage>&#x2013;<lpage>1528</lpage>. <pub-id pub-id-type="doi">10.1007/s11548-019-02011-2</pub-id> </citation>
</ref>
<ref id="B93">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Unberath</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zaech</surname>
<given-names>J.-N.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>S. C.</given-names>
</name>
<name>
<surname>Bier</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Fotouhi</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Armand</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>DeepDRR - A Catalyst for Machine Learning in Fluoroscopy-Guided Procedures</article-title>. In <conf-name>International Conference on Medical Image Computing and Computer-Assisted Intervention</conf-name>. <publisher-name>Springer</publisher-name>, <fpage>98</fpage>&#x2013;<lpage>106</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-00937-3_12</pub-id> </citation>
</ref>
<ref id="B94">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Uneri</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>De Silva</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Goerres</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Jacobson</surname>
<given-names>M. W.</given-names>
</name>
<name>
<surname>Ketcha</surname>
<given-names>M. D.</given-names>
</name>
<name>
<surname>Reaungamornrat</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>Intraoperative Evaluation of Device Placement in Spine Surgery Using Known-Component 3D-2D Image Registration</article-title>. <source>Phys. Med. Biol.</source> <volume>62</volume>, <fpage>3330</fpage>&#x2013;<lpage>3351</lpage>. <pub-id pub-id-type="doi">10.1088/1361-6560/aa62c5</pub-id> </citation>
</ref>
<ref id="B95">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Uneri</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Otake</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>A. S.</given-names>
</name>
<name>
<surname>Kleinszig</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Vogt</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Khanna</surname>
<given-names>A. J.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>3D-2D Registration for Surgical Guidance: Effect of Projection View Angles on Registration Accuracy</article-title>. <source>Phys. Med. Biol.</source> <volume>59</volume>, <fpage>271</fpage>&#x2013;<lpage>287</lpage>. <pub-id pub-id-type="doi">10.1088/0031-9155/59/2/271</pub-id> </citation>
</ref>
<ref id="B96">
<citation citation-type="book">
<collab>US Food and Drug Administration</collab> (<year>2021</year>). <source>Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan</source>. <publisher-loc>White Oak, MD, USA</publisher-loc>: <publisher-name>US Food Drug Admin.</publisher-name> <comment>Tech. Rep. 145022.</comment>
</citation>
</ref>
<ref id="B97">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>van der List</surname>
<given-names>J.&#x20;P.</given-names>
</name>
<name>
<surname>Chawla</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Joskowicz</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Pearle</surname>
<given-names>A. D.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Current State of Computer Navigation and Robotics in Unicompartmental and Total Knee Arthroplasty: a Systematic Review with Meta-Analysis</article-title>. <source>Knee Surg. Sports Traumatol. Arthrosc.</source> <volume>24</volume>, <fpage>3482</fpage>&#x2013;<lpage>3495</lpage>. <pub-id pub-id-type="doi">10.1007/s00167-016-4305-9</pub-id> </citation>
</ref>
<ref id="B98">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Varnavas</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Carrell</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Penney</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2015a</year>). <article-title>Fully Automated 2D-3D Registration and Verification</article-title>. <source>Med. Image Anal.</source> <volume>26</volume>, <fpage>108</fpage>&#x2013;<lpage>119</lpage>. <pub-id pub-id-type="doi">10.1016/j.media.2015.08.005</pub-id> </citation>
</ref>
<ref id="B99">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Varnavas</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Carrell</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Penney</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2015b</year>). <article-title>Fully Automated 2D-3D Registration and Verification</article-title>. <source>Med. Image Anal.</source> <volume>26</volume>, <fpage>108</fpage>&#x2013;<lpage>119</lpage>. <pub-id pub-id-type="doi">10.1016/j.media.2015.08.005</pub-id> </citation>
</ref>
<ref id="B100">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Varnavas</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Carrell</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Penney</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2013</year>). <source>Fully Automated Initialisation of 2D-3D Image Registration</source>. <publisher-loc>San Francisco, CA, United States</publisher-loc>: <publisher-name>IEEE Computer Society</publisher-name>, <fpage>568</fpage>&#x2013;<lpage>571</lpage>.</citation>
</ref>
<ref id="B101">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vercauteren</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Unberath</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Padoy</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Navab</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>CAI4CAI: The Rise of Contextual Artificial Intelligence in Computer Assisted Interventions</article-title>. <source>Proc. IEEE Inst. Electr. Electron. Eng.</source> <volume>108</volume>, <fpage>198</fpage>&#x2013;<lpage>214</lpage>. <pub-id pub-id-type="doi">10.1109/JPROC.2019.2946993</pub-id> </citation>
</ref>
<ref id="B102">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>L.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Multi-View Point-Based Registration for Native Knee Kinematics Measurement with Feature Transfer Learning</article-title>. </citation>
</ref>
<ref id="B103">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Fatah</surname>
<given-names>E. E.</given-names>
</name>
<name>
<surname>Mahfouz</surname>
<given-names>M. R.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Fully Automatic Initialization of Two-Dimensional-Three-Dimensional Medical Image Registration Using Hybrid Classifier</article-title>. <source>J.&#x20;Med. Imaging (Bellingham)</source> <volume>2</volume>, <fpage>024007</fpage>. <pub-id pub-id-type="doi">10.1117/1.JMI.2.2.024007</pub-id> </citation>
</ref>
<ref id="B104">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Su</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>A Neural Network-Based 2D/3D Image Registration Quality Evaluator for Pediatric Patient Setup in External Beam Radiotherapy</article-title>. <source>J.&#x20;Appl. Clin. Med. Phys.</source> <volume>17</volume>, <fpage>22</fpage>&#x2013;<lpage>33</lpage>. <pub-id pub-id-type="doi">10.1120/jacmp.v17i1.5235</pub-id> </citation>
</ref>
<ref id="B105">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>J.&#x20;Y.</given-names>
</name>
<name>
<surname>Tamhane</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kazanzides</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Unberath</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Cross-modal Self-Supervised Representation Learning for Gesture and Skill Recognition in Robotic Surgery</article-title>. <source>Int. J.&#x20;Comp. Assist. Radiol. Surg.</source>, <fpage>1</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1007/s11548-021-02343-y</pub-id> </citation>
</ref>
<ref id="B106">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xiangqian</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Xiaoqing</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Gang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Yubo</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>2D/3D Medical Image Registration Using Convolutional Neural Network</article-title>. <source>Chin. J.&#x20;Biomed. Eng.</source> <volume>39</volume>, <fpage>394</fpage>&#x2013;<lpage>403</lpage>. </citation>
</ref>
<ref id="B107">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Xie</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Meng</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Guan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2017</year>). <source>Single Shot 2D3D Image Registration</source>, <volume>2018</volume>. <publisher-loc>Shanghai, China</publisher-loc>: <publisher-name>Institute of Electrical and Electronics Engineers Inc</publisher-name>, <fpage>1</fpage>&#x2013;<lpage>5</lpage>.</citation>
</ref>
<ref id="B108">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Kwitt</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Styner</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Niethammer</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Quicksilver: Fast Predictive Image Registration - A Deep Learning Approach</article-title>. <source>NeuroImage</source> <volume>158</volume>, <fpage>378</fpage>&#x2013;<lpage>396</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2017.07.008</pub-id> </citation>
</ref>
<ref id="B109">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2019</year>). <source>A Novel Neurosurgery Registration Pipeline Based on Heat Maps and Anatomic Facial Feature Points</source>. <publisher-loc>Huaqiao, China</publisher-loc>: <publisher-name>Institute of Electrical and Electronics Engineers Inc.</publisher-name>
</citation>
</ref>
<ref id="B110">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yi</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Ramchandran</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Siewerdsen</surname>
<given-names>J.&#x20;H.</given-names>
</name>
<name>
<surname>Uneri</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Robotic Drill Guide Positioning Using Known-Component 3D-2D Image Registration</article-title>. <source>J.&#x20;Med. Imaging (Bellingham)</source> <volume>5</volume>, <fpage>021212</fpage>. <pub-id pub-id-type="doi">10.1117/1.JMI.5.2.021212</pub-id> </citation>
</ref>
<ref id="B111">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Yokota</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Okada</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Takao</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Sugano</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Tada</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tomiyama</surname>
<given-names>N.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>Automated CT Segmentation of Diseased Hip Using Hierarchical and Conditional Statistical Shape Models</article-title>. In <conf-name>International Conference on Medical Image Computing and Computer-Assisted Intervention</conf-name>. <publisher-name>Springer</publisher-name>, <fpage>190</fpage>&#x2013;<lpage>197</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-40763-5_24</pub-id> </citation>
</ref>
<ref id="B112">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Zaech</surname>
<given-names>J.-N.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bier</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Maier</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Navab</surname>
<given-names>N.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Learning to Avoid Poor Images: Towards Task-Aware C-Arm Cone-Beam CT Trajectories</article-title>. In <conf-name>International Conference on Medical Image Computing and Computer-Assisted Intervention</conf-name>. <publisher-name>Springer</publisher-name>, <fpage>11</fpage>&#x2013;<lpage>19</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-32254-0_2</pub-id> </citation>
</ref>
<ref id="B113">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zapaishchykova</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Dreizin</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>J.&#x20;Y.</given-names>
</name>
<name>
<surname>Roohi</surname>
<given-names>S. F.</given-names>
</name>
<name>
<surname>Unberath</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>An Interpretable Approach to Automated Severity Scoring in Pelvic Trauma</article-title>. </citation>
</ref>
<ref id="B114">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Automatic Cone Beam Projection-Based Liver Tumor Localization by Deep Learning and Biomechanical Modeling</article-title>. <source>Int. J.&#x20;Radiat. Oncol. Biol. Phys.</source> <volume>108</volume>, <fpage>S171</fpage>. <pub-id pub-id-type="doi">10.1016/j.ijrobp.2020.07.946</pub-id> </citation>
</ref>
<ref id="B115">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Pei</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Qin</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <source>Temporal Consistent 2D-3D Registration of Lateral Cephalograms and Cone-Beam Computed Tomography Images</source>, <volume>11046</volume>. <publisher-loc>Granada, Spain</publisher-loc>: <publisher-name>Springer-Verlag</publisher-name>, <fpage>371</fpage>&#x2013;<lpage>379</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-00919-9_43</pub-id> </citation>
</ref>
<ref id="B116">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Chou</surname>
<given-names>C.-R.</given-names>
</name>
<name>
<surname>Mageras</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Pizer</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Local Metric Learning in 2D/3D Deformable Registration with Application in the Abdomen</article-title>. <source>IEEE Trans. Med. Imaging</source> <volume>33</volume>, <fpage>1592</fpage>&#x2013;<lpage>1600</lpage>. <pub-id pub-id-type="doi">10.1109/tmi.2014.2319193</pub-id> </citation>
</ref>
<ref id="B117">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zheng</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Miao</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Jane Wang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Liao</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Pairwise Domain Adaptation Module for CNN-Based 2-D/3-D Registration</article-title>. <source>J.&#x20;Med. Imaging (Bellingham)</source> <volume>5</volume>, <fpage>021204</fpage>. <pub-id pub-id-type="doi">10.1117/1.JMI.5.2.021204</pub-id> </citation>
</ref>
<ref id="B118">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>D.-X.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Universality of Deep Convolutional Neural Networks</article-title>. <source>Appl. Comput. Harmon. Anal.</source> <volume>48</volume>, <fpage>787</fpage>&#x2013;<lpage>794</lpage>. <pub-id pub-id-type="doi">10.1016/j.acha.2019.06.004</pub-id> </citation>
</ref>
<ref id="B119">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ai</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Fan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>Y.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Iterative Closest Graph Matching for Non-rigid 3D/2D Coronary Arteries Registration</article-title>. <source>Comput. Methods Programs Biomed.</source> <volume>199</volume>, <fpage>105901</fpage>. <pub-id pub-id-type="doi">10.1016/j.cmpb.2020.105901</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>