<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Med. Technol.</journal-id>
<journal-title>Frontiers in Medical Technology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Med. Technol.</abbrev-journal-title>
<issn pub-type="epub">2673-3129</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fmedt.2021.767836</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Medical Technology</subject>
<subj-group>
<subject>Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Deep Learning for Automatic Image Segmentation in Stomatology and Its Clinical Application</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Luo</surname> <given-names>Dan</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/1460347/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Zeng</surname> <given-names>Wei</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/1569592/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Chen</surname> <given-names>Jinlong</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/1569660/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Tang</surname> <given-names>Wei</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1460336/overview"/>
</contrib>
</contrib-group>
<aff><institution>The State Key Laboratory of Oral Diseases and National Clinical Research Center for Oral Diseases &#x00026; Department of Oral and Maxillofacial Surgery, West China College of Stomatology, Sichuan University</institution>, <addr-line>Chengdu</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Shuaiqi Liu, Hebei University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Wenzheng Bao, Xuzhou University of Technology, China; Geng Peng, Shijiazhuang Tiedao University, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Wei Tang <email>mydrtw&#x00040;vip.sina.com</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Medtech Data Analytics, a section of the journal Frontiers in Medical Technology</p></fn></author-notes>
<pub-date pub-type="epub">
<day>13</day>
<month>12</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>3</volume>
<elocation-id>767836</elocation-id>
<history>
<date date-type="received">
<day>31</day>
<month>08</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>29</day>
<month>10</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2021 Luo, Zeng, Chen and Tang.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Luo, Zeng, Chen and Tang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>Deep learning has become an active research topic in the field of medical image analysis, and great advances have been made in segmentation performance for stomatological images in particular. In this paper, we systematically reviewed the recent literature on deep learning-based segmentation methods for stomatological images and their clinical applications. We categorized these methods by task and analyzed their advantages and disadvantages. The main dimensions that we explored were the data source, the backbone network, and the task formulation. We categorized data sources into panoramic radiography, dental X-rays, cone-beam computed tomography, multi-slice spiral computed tomography, and intraoral scan images. For the backbone network, we distinguished methods based on convolutional neural networks from those based on transformers. We divided task formulations into semantic segmentation tasks and instance segmentation tasks. Toward the end of the paper, we discussed the remaining challenges and provided several directions for further research on the automatic segmentation of stomatological images.</p></abstract>
<kwd-group>
<kwd>deep learning</kwd>
<kwd>convolutional neural networks</kwd>
<kwd>transformer</kwd>
<kwd>automatic segmentation</kwd>
<kwd>stomatological image</kwd>
</kwd-group>
<counts>
<fig-count count="5"/>
<table-count count="7"/>
<equation-count count="0"/>
<ref-count count="92"/>
<page-count count="15"/>
<word-count count="10351"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Because of the complex structure of the oral and maxillofacial region and the wide variety of diseases that affect it, imaging examinations, intraoral scanning, and other technologies are often required to assist diagnosis and treatment. Imaging examinations use dental X-rays, panoramic radiography, cone-beam computed tomography (CBCT), and multi-slice spiral computed tomography (MSCT). These are widely used in stomatology and produce large amounts of medical image data. Efficient and accurate processing of these images is essential for the development of stomatology. The key task is image segmentation, which enables the localization and the qualitative and quantitative analysis of lesions, helps to design a treatment plan, and supports the analysis of treatment efficacy. Traditional manual segmentation is time-consuming, and its quality depends on the experience of the doctor, often leading to unsatisfactory results. Therefore, the application of modern image segmentation technology to stomatology is very important.</p>
<p>Deep learning (DL) is a branch of machine learning and is a promising method of achieving artificial intelligence. Owing to the availability of large-scale annotated data and powerful computing resources, DL-based medical image segmentation algorithms have achieved excellent performance. These methods have successfully assisted the accurate diagnosis and minimally invasive treatment of brain tumors (<xref ref-type="bibr" rid="B1">1</xref>), retinal vessels (<xref ref-type="bibr" rid="B2">2</xref>), pulmonary nodules (<xref ref-type="bibr" rid="B3">3</xref>), cartilage, and bone (<xref ref-type="bibr" rid="B4">4</xref>). This paper reviews current DL-based medical image segmentation methods and their applications in stomatology. Existing automatic segmentation algorithms are classified according to the data source, the form of the automatic segmentation task, and the structure of the backbone network of the algorithm. The feasibility, accuracy, and application prospects of these algorithms are comprehensively analyzed, and their future research prospects are discussed.</p>
</sec>
<sec id="s2">
<title>Stomatological Imaging Data Sources and Comparison</title>
<p>Common stomatological images can be categorized into five types: panoramic radiography, dental X-rays, CBCT, MSCT, and intraoral scanning (IOS). Each type is suitable for specific clinical applications, according to its unique imaging principles. Dental X-rays and panoramic radiography are mainly used for dental caries, alveolar bone resorption, and impacted teeth. CBCT is mainly used for the early diagnosis and comprehensive analysis of cracked teeth, diseases after root canal treatment, jaw lesions, and other conditions. CBCT can also assist the design of implant guides, orthodontic treatment, and maxillofacial disease treatment. MSCT is often conducted to assist the diagnosis, treatment, and postoperative efficacy analysis of soft and hard tissue lesions in the maxillofacial region. IOS is generally employed in chair-side digital restoration, digital orthodontics, and digital implant surgery. Both dental X-rays and panoramic radiography produce 2D images in which anatomical structures overlap; for clinicians without experience in reading films, missed diagnoses and misdiagnoses may easily occur. Although CBCT and MSCT produce 3D images with clear layers, traditional empirical reading can also lead to missed diagnoses and misdiagnoses of early and minor lesions. <xref ref-type="table" rid="T1">Table 1</xref> shows the imaging characteristics, advantages, and disadvantages of different types of data, together with the clinical application prospects of DL.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>The imaging characteristics, advantages, and disadvantages of different data types and the prospects for clinical application of deep learning.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Data types</bold></th>
<th valign="top" align="left"><bold>Imaging characteristics</bold></th>
<th valign="top" align="left"><bold>Advantages</bold></th>
<th valign="top" align="left"><bold>Disadvantages</bold></th>
<th valign="top" align="left"><bold>Prospects for clinical application of deep learning</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Dental X-rays, panoramic radiography</td>
<td valign="top" align="left">2D</td>
<td valign="top" align="left">Easy to operate, low dose, fast imaging</td>
<td valign="top" align="left">Lack of 3D information</td>
<td valign="top" align="left">Assisting in diagnosing and screening diseases quickly and accurately. Reducing missed diagnoses and misdiagnoses.</td>
</tr>
<tr>
<td valign="top" align="left">CBCT</td>
<td valign="top" align="left">3D</td>
<td valign="top" align="left">High spatial resolution, short exposure time, low effective radiation dose and small metal artifacts</td>
<td valign="top" align="left">Low density resolution</td>
<td valign="top" align="left">1. Rapid and accurate segmentation of teeth or lesions can assist early diagnosis and reduce missed diagnoses and misdiagnoses.<break/> 2. Rapid and accurate dentition segmentation can meet the needs of implant guide design and orthodontic treatment planning.</td>
</tr>
<tr>
<td valign="top" align="left">MSCT</td>
<td valign="top" align="left">3D</td>
<td valign="top" align="left">High density resolution</td>
<td valign="top" align="left">Low spatial resolution, long exposure time, high effective radiation dose and large metal artifacts</td>
<td valign="top" align="left">1. Reducing missed diagnoses and misdiagnoses.<break/> 2. Automatically segmented lesions or normal structures can be used in intraoperative interaction, registration, and treatment design.</td>
</tr>
<tr>
<td valign="top" align="left">IOS</td>
<td valign="top" align="left">Surface 3D data</td>
<td valign="top" align="left">Obtaining 3D surface data of the teeth and the soft and hard tissues in real time</td>
<td valign="top" align="left">Lack of internal data within soft and hard tissue</td>
<td valign="top" align="left">1. Pursuing accurate tooth segmentation.<break/> 2. Fast data processing, obtaining results within seconds.</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3">
<title>Automatic Segmentation Algorithms for Medical Images</title>
<p>Image segmentation aims to simplify or change the representation of images, making them easier to understand and analyze. Image segmentation can be divided into the semantic segmentation task and the instance segmentation task. The semantic segmentation task focuses on differences between categories, whereas the instance segmentation task focuses on differences between individuals (<xref ref-type="fig" rid="F1">Figure 1</xref>). In the semantic segmentation task, it is required to separate the teeth, jaws, and background, without distinguishing between individuals in each category (&#x0201C;Tooth&#x0201D; or &#x0201C;Jaw&#x0201D;). Conversely, in the instance segmentation task, both the category label and the instance label (within the class) are required; that is, the individuals in each category (&#x0201C;Tooth&#x0201D; or &#x0201C;Jaw&#x0201D;) must be distinguished.</p>
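<p>The distinction between the two tasks can be sketched with a hypothetical 4 &#x000D7; 4 label map for an image containing two teeth (an illustrative toy example, not real annotation data): a semantic map assigns one label per category, while an instance map additionally separates the two teeth.</p>

```python
# Toy 4x4 label maps for an image containing two teeth (hypothetical data).
# Semantic segmentation: one label per category (0 = background, 1 = "Tooth").
semantic = [
    [0, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Instance segmentation: each tooth also receives its own instance id.
instance = [
    [0, 1, 0, 2],
    [0, 1, 0, 2],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Both teeth share a single semantic class, but form two distinct instances.
semantic_classes = sorted({v for row in semantic for v in row if v > 0})  # [1]
instance_ids = sorted({v for row in instance for v in row if v > 0})      # [1, 2]
```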
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Task definitions for automatic image segmentation. <bold>(A)</bold> The original image. <bold>(B)</bold> Semantic segmentation: it is required to segment the teeth, jaws, and background, without the need to distinguish the individuals in the category &#x0201C;Tooth&#x0201D; or &#x0201C;Jaw.&#x0201D; <bold>(C)</bold> Instance segmentation: not only the category label is required, but also the instance label among the same class is needed, i.e., separating the individuals in the category &#x0201C;Tooth&#x0201D; or &#x0201C;Jaw&#x0201D;.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmedt-03-767836-g0001.tif"/>
</fig>
<p>Traditional image segmentation algorithms (<xref ref-type="bibr" rid="B5">5</xref>&#x02013;<xref ref-type="bibr" rid="B7">7</xref>) cannot be directly applied to complex scenes because of the limitations of their manually designed features. The emergence of DL has made it possible to segment medical images efficiently and effectively. Segmentation algorithms based on convolutional neural networks (CNNs) have already become the <italic>de facto</italic> standard in image segmentation tasks. Their excellent segmentation ability has been demonstrated both experimentally and theoretically, and it transfers well to medical images. In addition to the popularity of CNNs, the transformer structure (<xref ref-type="bibr" rid="B8">8</xref>), originating from the field of natural language processing, has become an active research topic in computer vision because of its excellent long-range modeling ability. Therefore, according to these different types of backbone networks, we divide automatic image segmentation methods into CNN-based methods and transformer-based methods.</p>
<p>We have collected and summarized about 30 articles on image segmentation tasks. An overview of these methods is shown in <xref ref-type="fig" rid="F2">Figure 2</xref>, and their evolution over time is depicted in <xref ref-type="fig" rid="F3">Figure 3</xref>.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Overview of automatic segmentation algorithms. <bold>(A)</bold> For the backbone network, there are CNN-based and Transformer-based methods; the former include AlexNet, VGG, GoogLeNet, ResNet, DenseNet, MobileNet, ShuffleNet, and EfficientNet, and the latter include ViT, Data-efficient image Transformers (DeiT), Convolutional vision Transformer (CvT), and Swin-Transformer. <bold>(B)</bold> For semantic segmentation, the CNN-based methods include FCN, SegNet, PSPNet, DeepLab (v1, v2, v3, v3&#x0002B;), UNet, VNet, and UNet&#x0002B;&#x0002B;, and the Transformer-based methods include SETR, Segmenter, SegFormer, Swin-UNet, Medical Transformer (MedT), UNETR, MBT-Net, TransUNet, and TransFuse. <bold>(C)</bold> The instance segmentation task can also be categorized into CNN-based and Transformer-based methods. It can likewise be divided into detection-based and detection-free instance segmentation methods; the former are divided into single-stage (YOLACT, YOLO, and SSD) and two-stage methods (Mask R-CNN, PANet, Cascade R-CNN, and HTC), and the latter include SOLO, DWT, and DeepMask. The Transformer-based methods, such as Cell-DETR and ISTR, belong to the detection-based methods.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmedt-03-767836-g0002.tif"/>
</fig>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>The development of automatic image segmentation. The black represents CNN-based methods and the red shows Transformer-based methods.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmedt-03-767836-g0003.tif"/>
</fig>
<sec>
<title>Principles and Development of CNNs and Transformers</title>
<sec>
<title>CNNs</title>
<p>The main ingredients of a modern CNN are the convolution layer, nonlinear activation layer, pooling layer, and fully connected layer, of which the convolution layer is the core component. The main principles of the convolution layer are its local receptive fields and its weight-sharing strategy; the first refers to the limited range of data visible within a sliding window, and the second refers to the convolution kernel parameters being shared across all sliding-window positions. The pooling layer reduces the resolution of extracted features, which reduces the amount of computation and selects the most robust features to prevent overfitting. The fully connected layer connects all nodes in two adjacent layers; such a layer realizes the integration and mapping of input features and is usually used to output classification results. The nonlinear activation layer provides nonlinearity to the neural network so that it can approximate arbitrary continuous functions.</p>
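<p>The building blocks described above can be sketched in a few lines of plain Python (an illustrative toy example, not an optimized implementation): a convolution that reuses one shared kernel over local receptive fields, a ReLU nonlinearity, and 2 &#x000D7; 2 max pooling.</p>

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (stride 1, no padding).

    Each output value is computed from a local receptive field, and the
    same kernel weights are reused at every window position (weight sharing).
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

def relu(fmap):
    # Nonlinear activation: negative responses are clipped to zero.
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    # Pooling: keep only the strongest response in each 2x2 window.
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# A vertical-edge kernel applied to a toy 5x5 image with one vertical edge.
image = [[0, 0, 1, 1, 1]] * 5
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]
features = max_pool2x2(relu(conv2d_valid(image, kernel)))
```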
<p>AlexNet (<xref ref-type="bibr" rid="B9">9</xref>), an early CNN model, adopted the ReLU activation function to accelerate network convergence and the dropout technique to prevent overfitting. VGG (<xref ref-type="bibr" rid="B10">10</xref>) achieved better performance than AlexNet by replacing the large 5 &#x000D7; 5 convolution kernel with two continuous 3 &#x000D7; 3 convolution kernels and increasing the network depth. GoogLeNet (<xref ref-type="bibr" rid="B11">11</xref>) used the Inception module to increase the width of the network while reducing the number of parameters. Its subsequent version (<xref ref-type="bibr" rid="B12">12</xref>) improved performance by convolution decomposition, batch normalization, label smoothing, and other techniques. ResNet (<xref ref-type="bibr" rid="B13">13</xref>) solved the problem of network degradation by using skip connections and has been one of the most popular feature extractors in many vision tasks. DenseNet (<xref ref-type="bibr" rid="B14">14</xref>) made full use of extracted features by establishing dense connections between different layers. Moreover, lightweight models [e.g., MobileNet (<xref ref-type="bibr" rid="B15">15</xref>) and ShuffleNet (<xref ref-type="bibr" rid="B16">16</xref>)] and models designed by neural architecture search (NAS) [e.g., EfficientNet (<xref ref-type="bibr" rid="B17">17</xref>)] have already received widespread attention in the DL community.</p>
</sec>
<sec>
<title>Transformers</title>
<p>The transformer structure, which originated from natural language processing (<xref ref-type="bibr" rid="B18">18</xref>&#x02013;<xref ref-type="bibr" rid="B20">20</xref>), has recently attracted substantial attention from the vision community. The transformer (<xref ref-type="bibr" rid="B18">18</xref>) introduced the multi-head self-attention module and a feedforward network to model long-range dependencies within input sequences; it is also well-suited to parallel computation. Witnessing the power of transformers in natural language processing, some pioneering studies (<xref ref-type="bibr" rid="B21">21</xref>&#x02013;<xref ref-type="bibr" rid="B24">24</xref>) have successfully applied the structure to image classification, object detection, and segmentation tasks.</p>
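<p>The core of the multi-head self-attention module is scaled dot-product attention. The sketch below shows a single head with identity query/key/value projections (a simplification for clarity; a real transformer learns these projection matrices and runs several heads in parallel):</p>

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    With identity projections, Q = K = V = X. Every token attends to every
    other token, which is what gives the transformer its long-range
    modeling ability.
    """
    d = len(X[0])
    # Pairwise dot-product scores, scaled by sqrt(d) for numerical stability.
    scores = [[sum(q_i * k_i for q_i, k_i in zip(q, k)) / math.sqrt(d)
               for k in X] for q in X]
    weights = [softmax(row) for row in scores]
    # Output: attention-weighted mixture of the value vectors.
    return [[sum(w * v[j] for w, v in zip(row, X)) for j in range(d)]
            for row in weights]

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(tokens)
```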
<p>Vision Transformer (ViT) (<xref ref-type="bibr" rid="B21">21</xref>) split an image into patches and directly fed these patches, with positional embeddings, into a standard transformer, demonstrating that learning from large-scale data can outweigh the inductive biases built into CNNs. Data-efficient image Transformers (DeiT) (<xref ref-type="bibr" rid="B22">22</xref>) achieved better performance through more careful training strategies and a token-based distillation procedure. Convolutional vision Transformer (CvT) (<xref ref-type="bibr" rid="B23">23</xref>) improved the performance and efficiency of ViT by introducing convolution into the ViT architecture. This was accomplished by two major modifications: a hierarchical transformer containing convolutional token embedding, and a convolutional transformer block using a convolutional projection. Swin-Transformer (<xref ref-type="bibr" rid="B24">24</xref>) is a hierarchical transformer whose features are calculated within shifted windows, providing higher efficiency by limiting the self-attention calculation to non-overlapping local windows while allowing cross-window connection; its computational complexity is linear with respect to image size. These features make Swin-Transformer compatible with a wide range of visual tasks, including image classification, object detection, and semantic segmentation.</p>
</sec>
</sec>
<sec>
<title>Common Algorithms for Semantic Segmentation</title>
<p>The aim of the semantic segmentation task is to assign a unique category label to each pixel or voxel in the image. Semantic segmentation can both identify and mark the boundaries of different categories, such as the boundaries of teeth and jaws. Depending on the backbone network used, CNN-based and transformer-based approaches have been developed (<xref ref-type="fig" rid="F4">Figure 4</xref>). CNN-based semantic segmentation algorithms include FCN (<xref ref-type="bibr" rid="B25">25</xref>), SegNet (<xref ref-type="bibr" rid="B26">26</xref>), Pyramid Scene Parsing Network (PSPNet) (<xref ref-type="bibr" rid="B27">27</xref>), DeepLab (v1, v2, v3, v3&#x0002B;) (<xref ref-type="bibr" rid="B28">28</xref>&#x02013;<xref ref-type="bibr" rid="B31">31</xref>), U-shaped Network (UNet) (<xref ref-type="bibr" rid="B32">32</xref>), 3D U-shaped Network (3D UNet) (<xref ref-type="bibr" rid="B33">33</xref>), V-shaped Network (VNet) (<xref ref-type="bibr" rid="B34">34</xref>), and UNet&#x0002B;&#x0002B; (<xref ref-type="bibr" rid="B35">35</xref>). Algorithms based on transformers include SEgmentation TRansformer (SETR) (<xref ref-type="bibr" rid="B36">36</xref>), Segmenter (<xref ref-type="bibr" rid="B37">37</xref>), SegFormer (<xref ref-type="bibr" rid="B38">38</xref>), Swin-UNet (<xref ref-type="bibr" rid="B39">39</xref>), Medical Transformer (MedT) (<xref ref-type="bibr" rid="B40">40</xref>), UNEt TRansformers (UNETR) (<xref ref-type="bibr" rid="B41">41</xref>), Multi-Branch Hybrid Transformer Network (MBT-Net) (<xref ref-type="bibr" rid="B42">42</xref>), TransUNet (<xref ref-type="bibr" rid="B43">43</xref>), and TransFuse (<xref ref-type="bibr" rid="B44">44</xref>).</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Structure of semantic segmentation network. <bold>(A)</bold> The CNN-based semantic segmentation approach, from UNet. <bold>(B)</bold> The Transformer-based semantic segmentation approach, from Swin-Transformer.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmedt-03-767836-g0004.tif"/>
</fig>
<sec>
<title>CNN-Based Algorithms</title>
<p>As an iconic model in semantic image segmentation, FCN (<xref ref-type="bibr" rid="B25">25</xref>) replaced all fully connected layers with convolutional layers to predict a dense segmentation map. In contrast to FCN, SegNet (<xref ref-type="bibr" rid="B26">26</xref>) performed nonlinear upsampling according to the max-pooling indices of the corresponding encoder, so that the spatial information of the encoding stage was maintained. PSPNet (<xref ref-type="bibr" rid="B27">27</xref>) obtained global context information by aggregating context from different regions, improving the parsing performance for complex scenes.</p>
<p>The DeepLab series focused on enlarging the receptive field and integrating multi-scale feature information. DeepLabV1 (<xref ref-type="bibr" rid="B28">28</xref>) used dilated convolution and conditional random fields to obtain more informative feature maps. DeepLabV2 (<xref ref-type="bibr" rid="B29">29</xref>) featured the atrous spatial pyramid pooling (ASPP) module, which performed atrous convolutions at different sampling rates to obtain a multi-scale feature representation. DeepLabV3 (<xref ref-type="bibr" rid="B30">30</xref>) combined cascaded atrous convolutions, a multi-grid scheme, and an improved ASPP module. DeepLabV3&#x0002B; (<xref ref-type="bibr" rid="B31">31</xref>) used an encoder-decoder structure to perform segmentation tasks and introduced the depthwise separable convolution from Xception into the ASPP module.</p>
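<p>The key idea behind atrous (dilated) convolution can be illustrated in one dimension: inserting gaps between kernel taps enlarges the receptive field without adding parameters (an illustrative sketch with hypothetical signal values):</p>

```python
def dilated_conv1d(signal, kernel, dilation):
    """1-D dilated (atrous) convolution, valid padding, stride 1.

    Spacing the kernel taps `dilation` samples apart enlarges the effective
    receptive field without increasing the number of kernel parameters.
    """
    span = (len(kernel) - 1) * dilation + 1   # effective receptive field
    return [sum(kernel[k] * signal[i + k * dilation]
                for k in range(len(kernel)))
            for i in range(len(signal) - span + 1)]

signal = [1, 2, 3, 4, 5, 6, 7]
# The same 3-tap kernel: dilation 1 spans 3 samples, dilation 2 spans 5.
dense = dilated_conv1d(signal, [1, 1, 1], dilation=1)
atrous = dilated_conv1d(signal, [1, 1, 1], dilation=2)
```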
<p>UNet (<xref ref-type="bibr" rid="B32">32</xref>) was one of the most influential segmentation models dedicated to biomedical fields. Compared with FCN, its major contributions lay in its U-shaped symmetric network and an elastic deformation-based data augmentation strategy. The U-shaped network consisted of symmetric compression paths and expansion paths, and the elastic deformation effectively simulated the normal changes in cell morphology. 3D UNet (<xref ref-type="bibr" rid="B33">33</xref>) implemented a 3D image segmentation task by replacing the 2D convolution kernel in UNet (<xref ref-type="bibr" rid="B32">32</xref>) with a 3D convolution kernel. VNet (<xref ref-type="bibr" rid="B34">34</xref>) used a new loss function, termed Dice loss, to handle the limited number of annotated volumes available for training. UNet&#x0002B;&#x0002B; (<xref ref-type="bibr" rid="B35">35</xref>) introduced a built-in ensemble of UNets of varying depths and had redesigned skip connections to enhance performance for objects of varying sizes.</p>
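<p>The Dice loss introduced by VNet can be written as one minus a smoothed Dice coefficient. A minimal sketch on flattened probability maps, assuming binary targets (illustrative only; framework implementations operate on tensors and batches):</p>

```python
def dice_loss(pred, target, eps=1e-6):
    """1 - Dice coefficient between predicted probabilities and a binary mask.

    Because Dice measures overlap relative to the size of the foreground,
    it is less sensitive than cross-entropy to the severe class imbalance
    typical of medical volumes with small annotated structures.
    """
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

perfect = dice_loss([1.0, 0.0, 1.0], [1, 0, 1])   # full overlap -> ~0.0
disjoint = dice_loss([1.0, 0.0, 0.0], [0, 0, 1])  # no overlap -> ~1.0
```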
</sec>
<sec>
<title>Transformer-Based Algorithms</title>
<p>SETR (<xref ref-type="bibr" rid="B36">36</xref>) used a pure transformer to encode an image as a sequence of patches, without the need for a convolution layer or resolution reduction, and showed the power of the transformer structure for segmentation tasks. In Segmenter (<xref ref-type="bibr" rid="B37">37</xref>), the global context relationship was established from the first layer, and a pointwise linear decoder was employed to obtain the semantic labels. SegFormer (<xref ref-type="bibr" rid="B38">38</xref>) combined a hierarchical transformer structure with a lightweight multi-layer perceptron decoder, without the need for positional encoding or a complex decoder.</p>
<p>Swin-UNet (<xref ref-type="bibr" rid="B39">39</xref>) unified UNet with a pure transformer structure for medical image segmentation tasks: tokenized image blocks are fed into a symmetric transformer-based U-shaped encoder-decoder architecture with skip connections, so that local and global cues are fully exploited. The successful application of Swin-UNet to multi-organ and cardiac segmentation tasks demonstrated the potential benefits of the transformer structure to medical image segmentation. MedT (<xref ref-type="bibr" rid="B40">40</xref>) featured the gated axial-attention model, in which an additional control mechanism was introduced into the self-attention module. In addition, a local-global training strategy (LoGo) was proposed to further improve performance. UNETR (<xref ref-type="bibr" rid="B41">41</xref>) employed a pure transformer as the encoder to capture global multi-scale information effectively. The effectiveness of UNETR in 3D brain tumor and spleen segmentation tasks (CT and MRI modalities) was validated by experiments on the MSD dataset. MBT-Net (<xref ref-type="bibr" rid="B42">42</xref>) applied a multi-branch hybrid transformer network, composed of a body branch and an edge branch, to the corneal endothelial cell segmentation task. Other transformer-based methods for medical image segmentation include TransUNet (<xref ref-type="bibr" rid="B43">43</xref>) and TransFuse (<xref ref-type="bibr" rid="B44">44</xref>).</p>
</sec>
</sec>
<sec>
<title>Common Algorithms for Instance Segmentation</title>
<p>Depending on the backbone network used, instance segmentation methods can also be categorized into CNN-based and transformer-based methods. In addition, from the perspective of algorithms, instance segmentation methods can be divided into detection-based and detection-free methods. Detection-based methods can be regarded as extensions of object detection: they obtain bounding boxes by object detection methods and then perform segmentation within the bounding boxes. Moreover, the detection methods can be divided into single-stage and two-stage methods. Single-stage methods include You Only Look At CoefficienTs (YOLACT) (<xref ref-type="bibr" rid="B45">45</xref>), You Only Look Once (YOLO) (<xref ref-type="bibr" rid="B46">46</xref>), and Single Shot MultiBox Detector (SSD) (<xref ref-type="bibr" rid="B47">47</xref>). Two-stage methods include Mask R-CNN (<xref ref-type="bibr" rid="B48">48</xref>), PANet (<xref ref-type="bibr" rid="B49">49</xref>), Cascade R-CNN (<xref ref-type="bibr" rid="B50">50</xref>), and hybrid task cascade (HTC) (<xref ref-type="bibr" rid="B51">51</xref>). Detection-free methods first predict an embedding vector and then group the corresponding pixels into individual instances by clustering; examples of such methods include Segmenting Objects by Locations (SOLO) (<xref ref-type="bibr" rid="B52">52</xref>), Deep Watershed Transform (DWT) (<xref ref-type="bibr" rid="B53">53</xref>), and DeepMask (<xref ref-type="bibr" rid="B54">54</xref>).</p>
<p>Most existing transformer-based instance segmentation algorithms are built on a detection method, DETR (<xref ref-type="bibr" rid="B55">55</xref>), so they belong to the class of detection-based methods; such methods include Cell-DETR (<xref ref-type="bibr" rid="B56">56</xref>) and ISTR (<xref ref-type="bibr" rid="B57">57</xref>). We have collected 12 articles on instance segmentation tasks; the overall development is shown in <xref ref-type="fig" rid="F5">Figure 5</xref>.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Structure of instance segmentation network. <bold>(A)</bold> The CNN-based instance segmentation approach, from Mask R-CNN. <bold>(B)</bold> The Transformer-based instance segmentation approach, from ISTR.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmedt-03-767836-g0005.tif"/>
</fig>
<sec>
<title>CNN-Based Algorithms</title>
<p>Detection-based instance segmentation methods follow the principle of detecting first and then segmenting. The performance of such methods is heavily dependent on the performance of the object detector, so a better detector improves the quality of instance segmentation. As discussed above, detection methods can be divided into single-stage and two-stage methods. A typical example of a single-stage method is YOLACT (<xref ref-type="bibr" rid="B45">45</xref>), which first generated multiple prototype masks and then combined them using predicted coefficients to formulate the object detection and segmentation results. In addition, a popular single-stage object detector [e.g., YOLO (<xref ref-type="bibr" rid="B46">46</xref>)] can accomplish the instance segmentation task by adding a similar mask branch. A typical example of a two-stage method is Mask R-CNN (<xref ref-type="bibr" rid="B48">48</xref>), which used RoIAlign for feature alignment and added an object mask prediction branch to Faster R-CNN (<xref ref-type="bibr" rid="B58">58</xref>). PANet (<xref ref-type="bibr" rid="B49">49</xref>) further aggregated low-level and high-level features on the basis of Mask R-CNN and performed fusion operations by adaptive feature pooling for subsequent prediction. Cascade R-CNN (<xref ref-type="bibr" rid="B50">50</xref>) continuously refined the prediction results by cascading several detection networks with increasing IoU thresholds. HTC (<xref ref-type="bibr" rid="B51">51</xref>) had a multi-task, multi-stage hybrid cascade structure and incorporated a branch for semantic segmentation to enhance spatial context.</p>
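<p>The intersection-over-union (IoU) measure that Cascade R-CNN varies across its stages is computed as intersection area over union area of two boxes. A minimal sketch, with boxes represented as hypothetical (x1, y1, x2, y2) tuples:</p>

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

# A proposal accepted at a lenient early-stage threshold (e.g. IoU >= 0.3)
# may be rejected at a later, stricter stage (e.g. IoU >= 0.5).
score = iou((0, 0, 10, 10), (5, 0, 15, 10))  # half-overlapping boxes
```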
<p>Detection-free instance segmentation methods learn an affinity relation by projecting each pixel into an embedding space, pushing pixels of different instances apart and pulling pixels of the same instance closer; a postprocessing step such as grouping then formulates the instance segmentation result. SOLO (<xref ref-type="bibr" rid="B52">52</xref>) was an end-to-end detection-free instance segmentation method that directly mapped the original input image to the required instance masks, eliminating the postprocessing required by detection. DWT (<xref ref-type="bibr" rid="B53">53</xref>) combined the traditional watershed transform segmentation algorithm with a CNN to perform instance segmentation. DeepMask (<xref ref-type="bibr" rid="B54">54</xref>) simultaneously generated a mask, indicating whether each pixel of a patch belongs to an object, and an objectness score, indicating the confidence that an object is located at the center of the patch. Compared with detection-based instance segmentation methods, the performance of these detection-free methods is limited, leaving scope for improvement.</p>
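The pull/push objective behind detection-free methods can be illustrated with a minimal discriminative-loss sketch: pixels are pulled toward their instance's mean embedding and instance means are pushed apart. The margins `delta_pull` and `delta_push` are illustrative hyperparameters, not values from any cited paper.

```python
import numpy as np

def discriminative_loss(embeddings, labels, delta_pull=0.5, delta_push=1.5):
    """Pull/push loss over pixel embeddings (sketch of detection-free grouping).

    embeddings: (N, d) pixel embeddings; labels: (N,) instance ids.
    """
    ids = np.unique(labels)
    means = np.stack([embeddings[labels == i].mean(axis=0) for i in ids])
    # Pull term: hinged distance of each pixel to its instance mean.
    pull = 0.0
    for m, i in zip(means, ids):
        d = np.linalg.norm(embeddings[labels == i] - m, axis=1)
        pull += np.mean(np.maximum(0.0, d - delta_pull) ** 2)
    pull /= len(ids)
    # Push term: hinge on pairwise distances between instance means.
    push, npairs = 0.0, 0
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            dist = np.linalg.norm(means[a] - means[b])
            push += max(0.0, delta_push - dist) ** 2
            npairs += 1
    if npairs:
        push /= npairs
    return pull + push

# Two tight, well-separated instances incur (near-)zero loss.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
lab = np.array([0, 0, 1, 1])
loss = discriminative_loss(emb, lab)
```

After training with such an objective, a clustering or grouping step over the embeddings recovers the instance masks.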
</sec>
<sec>
<title>Transformer-Based Algorithms</title>
<p>Applying the transformer structure to instance segmentation tasks is a relatively new research area. Cell-DETR (<xref ref-type="bibr" rid="B56">56</xref>) was one of the first methods to apply the transformer structure to instance segmentation of biomedical data, achieving performance comparable with that of the latest CNN-based instance segmentation methods while having fewer parameters and a simpler structure. ISTR (<xref ref-type="bibr" rid="B57">57</xref>), an end-to-end instance segmentation framework, predicted low-dimensional mask embeddings and matched them with ground-truth mask embeddings to compute a set loss, achieving a significant performance gain on instance segmentation tasks through a recurrent refinement strategy.</p>
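The idea of representing masks as low-dimensional embeddings, which ISTR builds on, can be illustrated with a simple linear (PCA-style) codebook fitted to a set of training masks. This is only a conceptual sketch with hypothetical helper names, not ISTR's actual learned encoding.

```python
import numpy as np

def fit_mask_codebook(masks, dim):
    """Learn a linear projection encoding flattened masks into `dim` numbers.

    masks: (n, H, W) binary training masks. Returns (mean, basis), where
    basis has shape (dim, H*W).
    """
    X = masks.reshape(len(masks), -1).astype(float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:dim]

def encode(mask, mean, basis):
    # Project a flattened mask onto the low-dimensional basis.
    return basis @ (mask.ravel().astype(float) - mean)

def decode(code, mean, basis, shape, threshold=0.5):
    # Reconstruct the full-resolution mask from its embedding.
    return (basis.T @ code + mean).reshape(shape) > threshold

# Toy example: three 4x4 masks span a 2D subspace after centering.
masks = np.zeros((3, 4, 4))
masks[0, :2, :] = 1
masks[1, 2:, :] = 1
masks[2, :, :2] = 1
mean, basis = fit_mask_codebook(masks, dim=2)
code = encode(masks[0], mean, basis)
recon = decode(code, mean, basis, (4, 4))
```

Predicting a handful of embedding coefficients instead of a full-resolution mask is what makes set-based matching against ground-truth masks tractable.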
</sec>
</sec>
<sec>
<title>Characteristics of Automatic Segmentation Algorithms</title>
<p>The characteristics of semantic segmentation algorithms are summarized in <xref ref-type="table" rid="T2">Table 2</xref>, and those of instance segmentation algorithms in <xref ref-type="table" rid="T3">Table 3</xref>. Links to the code and data of the reviewed works are given in the <xref ref-type="app" rid="A1">Appendix</xref>.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Features of semantic segmentation algorithms.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Models</bold></th>
<th valign="top" align="left"><bold>Features</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">FCN (<xref ref-type="bibr" rid="B25">25</xref>)</td>
<td valign="top" align="left">The first fully convolutional network for the semantic segmentation task.<break/>Ignoring global context information and having relatively high GPU memory usage.</td>
</tr>
<tr>
<td valign="top" align="left">SegNet (<xref ref-type="bibr" rid="B26">26</xref>)</td>
<td valign="top" align="left">Improving the segmentation performance at boundaries while reducing the number of model parameters and the calculation cost.</td>
</tr>
<tr>
<td valign="top" align="left">PSPNet (<xref ref-type="bibr" rid="B27">27</xref>)</td>
<td valign="top" align="left">Taking the global context information into consideration, improving the segmentation of small objects and co-occurrent categories.</td>
</tr>
<tr>
<td valign="top" align="left">DeepLab series (<xref ref-type="bibr" rid="B28">28</xref>&#x02013;<xref ref-type="bibr" rid="B31">31</xref>)</td>
<td valign="top" align="left">V1: enlarging the receptive field by atrous convolution. V2: obtaining multi-scale features by the ASPP module. V3: exploring the effect of atrous convolutions, multi-grid, and atrous spatial pyramid pooling, useful for small objects. V3&#x0002B;: utilizing a decoder module to refine the segmentation results, especially along object boundaries, yielding a faster and stronger encoder-decoder network.</td>
</tr>
<tr>
<td valign="top" align="left">UNet (<xref ref-type="bibr" rid="B32">32</xref>), 3D UNet (<xref ref-type="bibr" rid="B33">33</xref>)</td>
<td valign="top" align="left">It is extremely suitable for segmenting medical images and can be trained on small-scale datasets with dedicated data augmentation.</td>
</tr>
<tr>
<td valign="top" align="left">VNet (<xref ref-type="bibr" rid="B34">34</xref>)</td>
<td valign="top" align="left">It is a variant of UNet and suitable for 3D image analysis.</td>
</tr>
<tr>
<td valign="top" align="left">UNet&#x0002B;&#x0002B; (<xref ref-type="bibr" rid="B35">35</xref>)</td>
<td valign="top" align="left">An advanced UNet structure, improving the performance on objects of varying size by unifying a set of UNets with different depths.</td>
</tr>
<tr>
<td valign="top" align="left">SETR (<xref ref-type="bibr" rid="B36">36</xref>)</td>
<td valign="top" align="left">A novel and accurate Transformer-based model for the semantic segmentation task, without the need for convolution layers or resolution reduction.</td>
</tr>
<tr>
<td valign="top" align="left">Segmenter (<xref ref-type="bibr" rid="B37">37</xref>)</td>
<td valign="top" align="left">Applying Transformer structure to obtain global context information and achieving SOTA performance on ADE20K dataset.</td>
</tr>
<tr>
<td valign="top" align="left">SegFormer (<xref ref-type="bibr" rid="B38">38</xref>)</td>
<td valign="top" align="left">Simplifying the design of Transformer-based models: a lightweight multilayer perceptron decoder avoids complex decoder designs and removes the need for positional encoding.</td>
</tr>
<tr>
<td valign="top" align="left">Swin-UNet (<xref ref-type="bibr" rid="B39">39</xref>)</td>
<td valign="top" align="left">A combination of UNet and Swin-Transformer, carefully designed for medical image segmentation, achieving high performance with a small number of parameters.</td>
</tr>
<tr>
<td valign="top" align="left">MedT (<xref ref-type="bibr" rid="B40">40</xref>)</td>
<td valign="top" align="left">A Transformer-based medical image segmentation network without pre-training.</td>
</tr>
<tr>
<td valign="top" align="left">UNETR (<xref ref-type="bibr" rid="B41">41</xref>)</td>
<td valign="top" align="left">Effectively capturing the global and multi-scale information and achieving high performance on 3D brain tumors and spleen tasks.</td>
</tr>
<tr>
<td valign="top" align="left">MBT-Net (<xref ref-type="bibr" rid="B42">42</xref>)</td>
<td valign="top" align="left">Fully exploiting the global and local context information by Transformer and CNN respectively and achieving high performance on segmenting corneal endothelial cells.</td>
</tr>
<tr>
<td valign="top" align="left">TransUNet (<xref ref-type="bibr" rid="B43">43</xref>)</td>
<td valign="top" align="left">It combines the advantages of UNet and Transformer structure to make a strong method on many medical applications including multi-organ segmentation and cardiac segmentation tasks.</td>
</tr>
<tr>
<td valign="top" align="left">TransFuse (<xref ref-type="bibr" rid="B44">44</xref>)</td>
<td valign="top" align="left">It combines Transformers and CNNs in a parallel style, capturing both global and local information respectively, obtaining better results on both 2D and 3D medical image sets including polyp, skin lesion, hip, and prostate segmentation.</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Features of instance segmentation algorithms.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Models</bold></th>
<th valign="top" align="left"><bold>Features</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">YOLACT (<xref ref-type="bibr" rid="B45">45</xref>)</td>
<td valign="top" align="left">A real-time instance segmentation method, achieving an mAP of 29.8% at 33 fps on the MS COCO dataset.</td>
</tr>
<tr>
<td valign="top" align="left">YOLO (<xref ref-type="bibr" rid="B46">46</xref>)</td>
<td valign="top" align="left">It treats object detection as a regression problem over spatially separated bounding boxes and class probabilities, reaching very high speed on many tasks while maintaining a comparable mAP.</td>
</tr>
<tr>
<td valign="top" align="left">SSD (<xref ref-type="bibr" rid="B47">47</xref>)</td>
<td valign="top" align="left">A fast object detection method that predicts bounding box locations by regression and object classes by classification, reaching faster speeds than Faster R-CNN, without the need for bounding box proposals or pixel/feature resampling.</td>
</tr>
<tr>
<td valign="top" align="left">Mask RCNN (<xref ref-type="bibr" rid="B48">48</xref>)</td>
<td valign="top" align="left">Adding a mask branch to the Faster R-CNN detector and proposing RoIAlign for feature alignment.</td>
</tr>
<tr>
<td valign="top" align="left">PANet (<xref ref-type="bibr" rid="B49">49</xref>)</td>
<td valign="top" align="left">It proposes a new fusion strategy for multi-scale features; it won the COCO 2017 Instance Segmentation challenge and took 2nd place in the Object Detection task without large-batch training.</td>
</tr>
<tr>
<td valign="top" align="left">Cascade-RCNN (<xref ref-type="bibr" rid="B50">50</xref>)</td>
<td valign="top" align="left">Continuously optimizing the prediction results by cascading several detection networks with different IoU thresholds.</td>
</tr>
<tr>
<td valign="top" align="left">HTC (<xref ref-type="bibr" rid="B51">51</xref>)</td>
<td valign="top" align="left">Proposing a multi-task and multi-stage hybrid cascade structure and achieving high performance on many tasks.</td>
</tr>
<tr>
<td valign="top" align="left">SOLO (<xref ref-type="bibr" rid="B52">52</xref>)</td>
<td valign="top" align="left">An end-to-end detection-free instance segmentation method.</td>
</tr>
<tr>
<td valign="top" align="left">DWT (<xref ref-type="bibr" rid="B53">53</xref>)</td>
<td valign="top" align="left">Combining the traditional watershed transform algorithm with a CNN model.</td>
</tr>
<tr>
<td valign="top" align="left">DeepMask (<xref ref-type="bibr" rid="B54">54</xref>)</td>
<td valign="top" align="left">An early instance segmentation method with relatively low performance.</td>
</tr>
<tr>
<td valign="top" align="left">Cell-DETR (<xref ref-type="bibr" rid="B56">56</xref>)</td>
<td valign="top" align="left">The first Transformer-based instance segmentation method for biomedical data, achieving SOTA performance.</td>
</tr>
<tr>
<td valign="top" align="left">ISTR (<xref ref-type="bibr" rid="B57">57</xref>)</td>
<td valign="top" align="left">It is the first end-to-end Transformer-based framework for the instance segmentation task, predicting low-dimensional mask embeddings and then matching them with ground-truth mask embeddings to compute the loss.</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s4">
<title>Clinical Application of Automatic Image Segmentation in Stomatology</title>
<p>The segmentation of teeth, jaws, and their related diseases is usually a preprocessing step for tooth matching (<xref ref-type="bibr" rid="B59">59</xref>&#x02013;<xref ref-type="bibr" rid="B62">62</xref>), tooth numbering (<xref ref-type="bibr" rid="B63">63</xref>&#x02013;<xref ref-type="bibr" rid="B65">65</xref>), and the automatic marking of important anatomical landmarks, as well as for the intelligent diagnosis, classification, and prediction of diseases. Traditional methods for stomatological image segmentation include region-based (<xref ref-type="bibr" rid="B66">66</xref>), threshold-based (<xref ref-type="bibr" rid="B67">67</xref>), clustering-based (<xref ref-type="bibr" rid="B68">68</xref>), edge-tracking (<xref ref-type="bibr" rid="B69">69</xref>), and watershed (<xref ref-type="bibr" rid="B8">8</xref>) methods. With the development of DL, many DL-based methods for stomatological image segmentation have been developed; these mainly focus on teeth, jaws, and their related diseases.</p>
<sec>
<title>Application to Teeth and Related Diseases</title>
<p>Automatic segmentation of teeth in stomatological images can contribute to the location of supernumerary teeth and impacted teeth, as well as digital restoration, digital orthodontics, and digital implant surgery. Automatic segmentation of caries and other related lesions is helpful for the early diagnosis of caries, particularly those that are easily missed, such as hidden caries and adjacent caries. At present, the types of medical image data that are commonly used for the segmentation of teeth and related lesions include panoramic radiography, CBCT, dental X-rays, and IOS.</p>
<p>Semantic segmentation is the most common approach for the DL-based automatic segmentation of teeth and their related diseases. This paper reviews 11 articles on semantic segmentation (<xref ref-type="table" rid="T4">Table 4</xref>), finding that semantic segmentation can mark the boundary between teeth and jaws, but the boundaries are often imprecise, particularly for a malposed tooth that overlaps adjacent teeth. Instance segmentation, by contrast, can distinguish individual teeth, as shown in six relevant articles (<xref ref-type="table" rid="T5">Table 5</xref>). Compared with semantic segmentation, instance segmentation is better at marking the boundary of each tooth, but problems remain, such as the loss of data detail and small sample sizes, which may affect segmentation accuracy.</p>
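The outcome metrics reported in Tables 4, 5 (DSC, IoU, precision, recall) are all derived from the same confusion counts between a predicted and a ground-truth mask. A minimal sketch on binary masks:

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Overlap metrics commonly reported for segmentation results.

    pred, gt: boolean arrays of the same shape.
    """
    tp = np.logical_and(pred, gt).sum()    # predicted and true foreground
    fp = np.logical_and(pred, ~gt).sum()   # predicted but not true
    fn = np.logical_and(~pred, gt).sum()   # true but missed
    return {
        "DSC": 2 * tp / (2 * tp + fp + fn),
        "IoU": tp / (tp + fp + fn),
        "Precision": tp / (tp + fp),
        "Recall": tp / (tp + fn),
    }

# Toy example: one true positive, one false positive, one false negative.
pred = np.array([[1, 1, 0, 0]], dtype=bool)
gt = np.array([[1, 0, 1, 0]], dtype=bool)
m = segmentation_metrics(pred, gt)
```

Note that DSC is always at least as large as IoU for the same masks, which is worth keeping in mind when comparing studies that report different metrics.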
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Semantic segmentation in teeth and related diseases.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Study</bold></th>
<th valign="top" align="left"><bold>Year</bold></th>
<th valign="top" align="left"><bold>Algorithm</bold></th>
<th valign="top" align="left"><bold>Image type</bold></th>
<th valign="top" align="left"><bold>Images total</bold></th>
<th valign="top" align="left"><bold>Outcome metrics</bold></th>
<th valign="top" align="left"><bold>Performance</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Wirtz (<xref ref-type="bibr" rid="B70">70</xref>)</td>
<td valign="top" align="left">2018</td>
<td valign="top" align="left">UNet</td>
<td valign="top" align="left">Panoramic</td>
<td valign="top" align="left">24</td>
<td valign="top" align="left">Accuracy, Specificity, Precision, Recall, F1-Score, DSC</td>
<td valign="top" align="left">0.818, 0.799, 0.790, 0.827, 0.803, 0.744</td>
</tr>
<tr>
<td valign="top" align="left">Koch (<xref ref-type="bibr" rid="B71">71</xref>)</td>
<td valign="top" align="left">2019</td>
<td valign="top" align="left">UNet</td>
<td valign="top" align="left">Panoramic</td>
<td valign="top" align="left">1500</td>
<td valign="top" align="left">DSC</td>
<td valign="top" align="left">0.934</td>
</tr>
<tr>
<td valign="top" align="left">Sivagami (<xref ref-type="bibr" rid="B72">72</xref>)</td>
<td valign="top" align="left">2020</td>
<td valign="top" align="left">UNet</td>
<td valign="top" align="left">Panoramic</td>
<td valign="top" align="left">1171</td>
<td valign="top" align="left">Accuracy, Specificity, Precision, Recall, F1-Score, DSC</td>
<td valign="top" align="left">0.97, 0.95, 0.93, 0.94, 0.93, 0.94</td>
</tr>
<tr>
<td valign="top" align="left">Choi (<xref ref-type="bibr" rid="B73">73</xref>)</td>
<td valign="top" align="left">2016</td>
<td valign="top" align="left">FCN</td>
<td valign="top" align="left">dental X-ray</td>
<td valign="top" align="left">475</td>
<td valign="top" align="left">F1-score</td>
<td valign="top" align="left">0.74</td>
</tr>
<tr>
<td valign="top" align="left">Cui (<xref ref-type="bibr" rid="B74">74</xref>)</td>
<td valign="top" align="left">2021</td>
<td valign="top" align="left">ToothPix</td>
<td valign="top" align="left">Panoramic</td>
<td valign="top" align="left">1500</td>
<td valign="top" align="left">IOU, Accuracy, Specificity, Precision, Recall, F1-score</td>
<td valign="top" align="left">0.9042, 0.9808, 0.9852, 0.9407, 0.9591, 0.9486</td>
</tr>
<tr>
<td valign="top" align="left">Zakirov (<xref ref-type="bibr" rid="B75">75</xref>)</td>
<td valign="top" align="left">2018</td>
<td valign="top" align="left">VNet</td>
<td valign="top" align="left">CBCT</td>
<td valign="top" align="left">517</td>
<td valign="top" align="left">IOU, Accuracy</td>
<td valign="top" align="left">0.963, 0.96</td>
</tr>
<tr>
<td valign="top" align="left">Chen (<xref ref-type="bibr" rid="B76">76</xref>)</td>
<td valign="top" align="left">2020</td>
<td valign="top" align="left">FCN&#x0002B;MWT</td>
<td valign="top" align="left">CBCT</td>
<td valign="top" align="left">25</td>
<td valign="top" align="left">DSC, Jaccard, RVD, ASSD</td>
<td valign="top" align="left">0.936, 0.881, 0.072, 0.363 mm</td>
</tr>
<tr>
<td valign="top" align="left">Lee (<xref ref-type="bibr" rid="B77">77</xref>)</td>
<td valign="top" align="left">2020</td>
<td valign="top" align="left">CNN</td>
<td valign="top" align="left">CBCT</td>
<td valign="top" align="left">102</td>
<td valign="top" align="left">DSC, Recall, Precision</td>
<td valign="top" align="left">Validation set: 0.938, 0.952, 0.924;<break/>Testing set: 0.918, 0.932, 0.904</td>
</tr>
<tr>
<td valign="top" align="left">Rao (<xref ref-type="bibr" rid="B78">78</xref>)</td>
<td valign="top" align="left">2020</td>
<td valign="top" align="left">UNet&#x0002B;DCRF</td>
<td valign="top" align="left">CBCT</td>
<td valign="top" align="left">110</td>
<td valign="top" align="left">VD, DSC, ASSD, MSSD</td>
<td valign="top" align="left">18.86 mm<sup>3</sup>, 0.9166, 0.25 mm, 1.18 mm</td>
</tr>
<tr>
<td valign="top" align="left">Ezhov (<xref ref-type="bibr" rid="B79">79</xref>)</td>
<td valign="top" align="left">2019</td>
<td valign="top" align="left">VNet</td>
<td valign="top" align="left">CBCT</td>
<td valign="top" align="left">935</td>
<td valign="top" align="left">IOU, ASD</td>
<td valign="top" align="left">0.94, 0.17 mm</td>
</tr>
<tr>
<td valign="top" align="left">Zanjani (<xref ref-type="bibr" rid="B80">80</xref>)</td>
<td valign="top" align="left">2019</td>
<td valign="top" align="left">PointCNN</td>
<td valign="top" align="left">IOS</td>
<td valign="top" align="left">120</td>
<td valign="top" align="left">IOU</td>
<td valign="top" align="left">0.94</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>Instance segmentation in teeth and related diseases.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Study</bold></th>
<th valign="top" align="left"><bold>Year</bold></th>
<th valign="top" align="left"><bold>Algorithm</bold></th>
<th valign="top" align="left"><bold>Image type</bold></th>
<th valign="top" align="left"><bold>Images total</bold></th>
<th valign="top" align="left"><bold>Outcome metrics</bold></th>
<th valign="top" align="left"><bold>Performance</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Jader (<xref ref-type="bibr" rid="B81">81</xref>)</td>
<td valign="top" align="left">2018</td>
<td valign="top" align="left">Mask RCNN</td>
<td valign="top" align="left">Panoramic</td>
<td valign="top" align="left">193</td>
<td valign="top" align="left">Accuracy, Specificity, Precision, Recall, F1-score</td>
<td valign="top" align="left">0.98, 0.99, 0.94, 0.84, 0.88</td>
</tr>
<tr>
<td valign="top" align="left">Silva (<xref ref-type="bibr" rid="B65">65</xref>)</td>
<td valign="top" align="left">2020</td>
<td valign="top" align="left">Mask RCNN, HTC, ResNeSt, PANet (best)</td>
<td valign="top" align="left">Panoramic</td>
<td valign="top" align="left">1,500</td>
<td valign="top" align="left">Accuracy, Specificity, Precision, Recall, F1-Score</td>
<td valign="top" align="left">PANet: 0.967, 0.987, 0.944, 0.891, 0.916</td>
</tr>
<tr>
<td valign="top" align="left">Gurses (<xref ref-type="bibr" rid="B82">82</xref>)</td>
<td valign="top" align="left">2020</td>
<td valign="top" align="left">Mask RCNN&#x0002B; SURF</td>
<td valign="top" align="left">Panoramic</td>
<td valign="top" align="left">580</td>
<td valign="top" align="left">Jaccard, Precision, Recall, F1-score, Rank-1 accuracy</td>
<td valign="top" align="left">0.82, 0.93, 0.91, 0.95, 0.8039</td>
</tr>
<tr>
<td valign="top" align="left">Wu (<xref ref-type="bibr" rid="B83">83</xref>)</td>
<td valign="top" align="left">2020</td>
<td valign="top" align="left">GH &#x0002B; BADice-DenseASPP-UNet &#x0002B; LO</td>
<td valign="top" align="left">CBCT</td>
<td valign="top" align="left">20</td>
<td valign="top" align="left">DSC, ASD, FA, DA</td>
<td valign="top" align="left">0.962, 0.122, 0.991, 0.995</td>
</tr>
<tr>
<td valign="top" align="left">Cui (<xref ref-type="bibr" rid="B84">84</xref>)</td>
<td valign="top" align="left">2019</td>
<td valign="top" align="left">ToothNet</td>
<td valign="top" align="left">CBCT</td>
<td valign="top" align="left">20</td>
<td valign="top" align="left">DSC</td>
<td valign="top" align="left">0.9264</td>
</tr>
<tr>
<td valign="top" align="left">Zanjani (<xref ref-type="bibr" rid="B85">85</xref>)</td>
<td valign="top" align="left">2021</td>
<td valign="top" align="left">Mask-MCNet</td>
<td valign="top" align="left">IOS</td>
<td valign="top" align="left">164</td>
<td valign="top" align="left">mIOU, mAP, mAR</td>
<td valign="top" align="left">0.98, 0.98, 0.97</td>
</tr>
</tbody>
</table>
</table-wrap>
<sec>
<title>Semantic Segmentation in Teeth and Related Diseases</title>
<p>For 2D images, different models can be trained to segment different areas, such as all teeth (<xref ref-type="bibr" rid="B70">70</xref>&#x02013;<xref ref-type="bibr" rid="B72">72</xref>) or adjacent caries (<xref ref-type="bibr" rid="B73">73</xref>), depending on the artificially defined foreground. Wirtz (<xref ref-type="bibr" rid="B70">70</xref>), Koch (<xref ref-type="bibr" rid="B71">71</xref>), and Sivagami (<xref ref-type="bibr" rid="B72">72</xref>) all used the UNet network for the automatic segmentation of teeth from panoramic radiography. Wirtz et al. (<xref ref-type="bibr" rid="B70">70</xref>) applied the method to complex cases such as tooth loss, defects, fillings, and fixed bridge restorations, achieving a Dice similarity coefficient (DSC) of 0.744 on their dataset. Koch et al. (<xref ref-type="bibr" rid="B71">71</xref>) showed that UNet's segmentation performance can be improved by exploiting data symmetry, network ensembling, test-time augmentation, and bootstrapping; they measured a DSC of 0.934 on the dataset created by Silva (<xref ref-type="bibr" rid="B86">86</xref>). Sivagami et al. (<xref ref-type="bibr" rid="B72">72</xref>) argued that UNet can work well without dense connections, residual connections, or the Inception module; the DSC on the dataset obtained from Ivisionlab (<xref ref-type="bibr" rid="B81">81</xref>) was 0.94. Cui et al. (<xref ref-type="bibr" rid="B74">74</xref>) used a generative adversarial network to exploit comprehensive semantic information for tooth segmentation, with an IoU of 0.9042 on the LNDb dental dataset. Choi et al. (<xref ref-type="bibr" rid="B73">73</xref>) first aligned the teeth horizontally, generated a probability map of dental caries in periapical images with an FCN, then extracted the crowns, and finally refined the caries probability map to achieve automatic detection and segmentation of adjacent dental caries; the F1-score was 0.74 on their own dataset. These applications all perform semantic segmentation of 2D images; the only differences are the artificial definition of the foreground and the choice of semantic segmentation model.</p>
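Test-time augmentation, one of the performance tricks mentioned above, can be sketched as averaging a model's predictions over an image and its mirror. The toy `model` below is a placeholder for a trained segmentation network, not any network from the cited studies.

```python
import numpy as np

def predict_with_tta(model, image):
    """Average per-pixel predictions over the image and its horizontal mirror.

    `model` is any callable mapping an (H, W) array to per-pixel probabilities.
    """
    p = model(image)
    p_flip = model(image[:, ::-1])[:, ::-1]  # flip the prediction back
    return (p + p_flip) / 2.0

# Toy "model": probability is just the normalised pixel intensity.
model = lambda img: img / img.max()
img = np.array([[1.0, 2.0], [3.0, 4.0]])
out = predict_with_tta(model, img)
```

Real pipelines typically average over more transforms (vertical flips, small rotations), but the flip-predict-unflip-average pattern is the same.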
<p>Most 3D images of teeth originate from CBCT data, and semantic segmentation of these 3D images requires a 3D semantic segmentation network, such as VNet (<xref ref-type="bibr" rid="B75">75</xref>), multi-task 3DFCN and marker-controlled Watershed transform (MWT) (<xref ref-type="bibr" rid="B76">76</xref>), modified UNet (<xref ref-type="bibr" rid="B77">77</xref>), and the symmetric fully convolutional residual network with DCRF (<xref ref-type="bibr" rid="B78">78</xref>). Ezhov et al. (<xref ref-type="bibr" rid="B79">79</xref>) proposed a coarse-fine network structure to refine the volumetric segmentation of teeth, with an IoU of 0.94. The segmentation results can be used for applications such as tooth volume prediction (<xref ref-type="bibr" rid="B75">75</xref>), panoramic reconstruction (<xref ref-type="bibr" rid="B75">75</xref>), digital orthodontic simulation (<xref ref-type="bibr" rid="B76">76</xref>), and dental implant design (<xref ref-type="bibr" rid="B77">77</xref>).</p>
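Volumetric networks such as VNet are commonly trained with a soft Dice objective, which directly optimizes the DSC metric reported above. A minimal sketch; the `eps` smoothing term is an illustrative choice.

```python
import numpy as np

def soft_dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss for volumetric segmentation.

    probs:  (D, H, W) predicted foreground probabilities.
    target: (D, H, W) binary ground truth.
    """
    p = probs.ravel()
    t = target.ravel().astype(float)
    intersection = (p * t).sum()
    # 1 - Dice coefficient; eps avoids division by zero on empty volumes.
    return 1.0 - (2.0 * intersection + eps) / (p.sum() + t.sum() + eps)

# A perfect prediction gives a loss of (near) zero.
vol = np.zeros((2, 4, 4))
vol[:, :2, :] = 1.0
loss = soft_dice_loss(vol, vol)
```

Because the loss is normalized by the foreground size, it is less sensitive than cross-entropy to the severe class imbalance typical of teeth-versus-background volumes.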
<p>Gingival tissue cannot be shown on panoramic radiography or CBCT images, yet it is very important for clarifying the relationship between tooth and gingiva in digital restoration, implantology, and orthodontics. For this reason, another imaging method has emerged in stomatology, namely IOS, which can obtain real-time 3D (point cloud) data of teeth and soft tissues. Zanjani et al. (<xref ref-type="bibr" rid="B80">80</xref>) proposed an end-to-end learning framework for semantic segmentation of individual teeth and gingivae from IOS data. Based on PointCNN, the method used a non-uniform resampling mechanism and a compatible loss weighting to improve performance, achieving an IoU of 0.94 on the authors' own dataset.</p>
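Point-based pipelines such as the PointCNN framework above usually downsample the raw IOS point cloud before feeding it to the network. A common generic choice is farthest point sampling, sketched here; this is not necessarily the non-uniform resampling mechanism of the cited work.

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Downsample a point cloud to k well-spread points.

    points: (N, 3) array; returns the indices of the k chosen points.
    """
    rng = np.random.default_rng(seed)
    chosen = [rng.integers(len(points))]      # random starting point
    dist = np.full(len(points), np.inf)
    for _ in range(k - 1):
        # Distance of every point to its nearest already-chosen point.
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[-1]], axis=1))
        chosen.append(int(dist.argmax()))     # pick the farthest point next
    return np.array(chosen)

# Two tight clusters 10 units apart: the two samples land in different clusters.
pts = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0],
                [10.0, 0.0, 0.0], [10.0, 0.0, 1.0]])
idx = farthest_point_sampling(pts, 2)
```

Unlike uniform random sampling, this keeps sparse regions (e.g., thin interproximal areas of a scan) represented in the downsampled cloud.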
<p>The performance of the above methods is shown in <xref ref-type="table" rid="T4">Table 4</xref>.</p>
</sec>
<sec>
<title>Instance Segmentation in Teeth and Related Diseases</title>
<p>Instance segmentation can mark both the boundaries between different categories in the image, such as the boundaries between teeth and jaws and the boundaries of different individuals in the same category, such as the boundaries between different teeth.</p>
<p>Tooth instance segmentation from panoramic radiography is a common task in dentistry. Using the Mask R-CNN algorithm, Jader et al. (<xref ref-type="bibr" rid="B81">81</xref>) performed instance segmentation of teeth from panoramic images, using a transfer learning strategy to address the shortage of annotated data and proposing a data augmentation method based on separating the teeth; they achieved accurate segmentation, with an F1-score of 0.88 on the dataset of (<xref ref-type="bibr" rid="B86">86</xref>). Silva et al. (<xref ref-type="bibr" rid="B65">65</xref>) analyzed the segmentation, detection, and tooth numbering performance of four end-to-end deep neural network frameworks, namely Mask R-CNN, PANet, HTC, and ResNeSt, on a challenging panoramic radiography dataset. Of these algorithms, PANet achieved the best segmentation performance, with an F1-score of 0.916 on the UFBA-UESC Dental Images Deep dataset. Gurses et al. (<xref ref-type="bibr" rid="B82">82</xref>) proposed a method for human identification from panoramic dental images using Mask R-CNN and SURF; they used two datasets [DS1: part of the dataset from (<xref ref-type="bibr" rid="B81">81</xref>), DS2: their own dataset], achieving an F1-score of 0.95.</p>
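Data augmentation for small annotated datasets, as used in the panoramic studies above, must transform the image and its mask identically so the annotation stays aligned. A minimal sketch; the specific flips and rotations are illustrative generic transforms, not the tooth-separation augmentation of Jader et al.

```python
import numpy as np

def augment_pair(image, mask, rng):
    """Apply a random flip/rotation identically to an image and its mask."""
    k = rng.integers(4)                     # rotate by k * 90 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if rng.random() < 0.5:                  # random horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    return image.copy(), mask.copy()

# The image-mask correspondence is preserved by construction.
rng = np.random.default_rng(0)
img = rng.random((4, 4))
msk = img > 0.5
aug_img, aug_msk = augment_pair(img, msk, rng)
```

Because both arrays pass through the same geometric transforms, any per-pixel relation between image and mask survives augmentation.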
<p>Tooth structures are clearer in 3D data, which is an important clinical advantage for instance segmentation of teeth. Wu et al. (<xref ref-type="bibr" rid="B83">83</xref>) used a two-stage deep neural network for tooth instance segmentation from CBCT, comprising a global stage (a heatmap regression UNet) to guide the localization of tooth ROIs and a local stage (an ROI-based DenseASPP-UNet) for fine segmentation and classification; they achieved a DSC of 0.962 on their own dataset. Cui et al. (<xref ref-type="bibr" rid="B84">84</xref>) proposed ToothNet, a two-stage automatic instance segmentation method based on a deep CNN for CBCT images, which exploited a novel learned edge map, a similarity matrix, and the spatial relations between different teeth, obtaining a good result with a DSC of 0.9264 on their own dataset.</p>
<p>Tooth instance segmentation from IOS data is also an important research direction. Zanjani et al. (<xref ref-type="bibr" rid="B85">85</xref>) proposed a model named Mask-MCNet for instance segmentation of teeth from IOS data. The model located each tooth by predicting its 3D bounding box and simultaneously segmented the points belonging to each individual tooth without voxelization or subsampling. It could thus preserve the fine detail in the data, enabling the highly accurate segmentation required in clinical practice while producing results in just a few seconds of processing time. On their own dataset, it achieved an mIoU of 0.98.</p>
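The per-tooth 3D bounding boxes that such models predict can be related to ground-truth masks via a simple axis-aligned box extraction, e.g., when preparing training targets. A minimal sketch with a hypothetical helper of our own, not code from the cited paper.

```python
import numpy as np

def mask_to_bbox3d(mask):
    """Axis-aligned 3D bounding box of a binary volume.

    Returns (zmin, ymin, xmin, zmax, ymax, xmax), max bounds inclusive.
    """
    coords = np.argwhere(mask)              # voxel indices of the foreground
    zmin, ymin, xmin = coords.min(axis=0)
    zmax, ymax, xmax = coords.max(axis=0)
    return (int(zmin), int(ymin), int(xmin), int(zmax), int(ymax), int(xmax))

# Toy volume with a 2x2x2 block of foreground voxels.
vol = np.zeros((4, 4, 4), dtype=bool)
vol[1:3, 0:2, 2:4] = True
box = mask_to_bbox3d(vol)
```

Restricting fine segmentation to the points inside each predicted box is what lets per-tooth processing stay fast without discarding detail.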
<p>The performance of the above methods is shown in <xref ref-type="table" rid="T5">Table 5</xref>.</p>
</sec>
</sec>
<sec>
<title>Application in Jaws and Related Diseases</title>
<p>There are many types of jaw diseases; moreover, the number of benign and malignant samples is unbalanced, which may easily cause missed diagnoses and misdiagnoses. Minimally invasive and precise treatment of these diseases generally requires precise location of lesions through preoperative planning and accurate intraoperative image guidance. Patients with craniomaxillofacial malformations require intelligent 3D symmetry analysis. The analysis of postoperative efficacy requires both subjective evaluations by doctors and patients, in addition to the quantitative and objective evaluation of the progression and outcome of lesions. Precise segmentation of the jaw or related diseases is important for clinical diagnosis and treatment. In the past, manual segmentation of the jaw and its lesions was time-consuming and laborious. Since the development of DL, researchers have used DL methods to learn the features of the jaw and its lesions, realizing automatic segmentation. The main sources of medical image data used for automatic segmentation are panoramic radiography, CBCT, and MSCT.</p>
<p>Six relevant articles showed that current DL-based automatic segmentation methods for jaws and related diseases mainly focus on semantic segmentation (<xref ref-type="table" rid="T6">Table 6</xref>). Two factors affect segmentation performance: (1) the space between the mandibular condyle and the temporal articular surface is very small and contains an articular disc, which often affects the accuracy of mandibular segmentation; (2) the segmentation accuracy of the jaw or teeth can differ between occlusion and non-occlusion scenarios because of contact between the upper and lower teeth.</p>
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption><p>Semantic segmentation in the jaw.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Study</bold></th>
<th valign="top" align="left"><bold>Year</bold></th>
<th valign="top" align="left"><bold>Algorithm</bold></th>
<th valign="top" align="left"><bold>Image type</bold></th>
<th valign="top" align="left"><bold>Images total</bold></th>
<th valign="top" align="left"><bold>Outcome metrics</bold></th>
<th valign="top" align="left"><bold>Performance</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Kong (<xref ref-type="bibr" rid="B87">87</xref>)</td>
<td valign="top" align="left">2020</td>
<td valign="top" align="left">UNet</td>
<td valign="top" align="left">Panoramic</td>
<td valign="top" align="left">2602</td>
<td valign="top" align="left">Accuracy, Jaccard, HD, PPS, Para(M)</td>
<td valign="top" align="left">0.9928, 0.9829, 8.32, 41.0, 0.92</td>
</tr>
<tr>
<td valign="top" align="left">Li (<xref ref-type="bibr" rid="B88">88</xref>)</td>
<td valign="top" align="left">2020</td>
<td valign="top" align="left">Deetal-Perio (based-Mask RCNN)</td>
<td valign="top" align="left">Panoramic</td>
<td valign="top" align="left">470</td>
<td valign="top" align="left">mAP, DSC (all), DSC (single), F1-score, Accuracy</td>
<td valign="top" align="left">Suzhou dataset: 0.826, 0.868, 0.778, 0.878, 0.884; Zhongshan dataset: 0.841, 0.852, 0.748, 0.454, 0.817</td>
</tr>
<tr>
<td valign="top" align="left">Egger (<xref ref-type="bibr" rid="B89">89</xref>)</td>
<td valign="top" align="left">2018</td>
<td valign="top" align="left">FCN-32s, FCN-16s, FCN-8s (best)</td>
<td valign="top" align="left">MSCT</td>
<td valign="top" align="left">20</td>
<td valign="top" align="left">DSC</td>
<td valign="top" align="left">FCN-8s: 0.9203</td>
</tr>
<tr>
<td valign="top" align="left">Zhang (<xref ref-type="bibr" rid="B90">90</xref>)</td>
<td valign="top" align="left">2018</td>
<td valign="top" align="left">UNet</td>
<td valign="top" align="left">CBCT, MSCT</td>
<td valign="top" align="left">CBCT(77), MSCT(30)</td>
<td valign="top" align="left">DSC, SEN, PPV</td>
<td valign="top" align="left">Midface: 0.9319, 0.9282, 0.9361, Mandible: 0.9327, 0.9363, 0.9293</td>
</tr>
<tr>
<td valign="top" align="left">Torosdagli (<xref ref-type="bibr" rid="B91">91</xref>)</td>
<td valign="top" align="left">2019</td>
<td valign="top" align="left">Tiramisu (based on UNet and DenseNet)</td>
<td valign="top" align="left">CBCT</td>
<td valign="top" align="left">50</td>
<td valign="top" align="left">DSC</td>
<td valign="top" align="left">0.9382</td>
</tr>
<tr>
<td valign="top" align="left">Lian (<xref ref-type="bibr" rid="B92">92</xref>)</td>
<td valign="top" align="left">2020</td>
<td valign="top" align="left">DTNet</td>
<td valign="top" align="left">CBCT, MSCT</td>
<td valign="top" align="left">CBCT(77), MSCT(63)</td>
<td valign="top" align="left">DSC, SEN, PPV</td>
<td valign="top" align="left">0.9395, 0.9424, 0.9368</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>For panoramic radiography, Kong et al. (<xref ref-type="bibr" rid="B87">87</xref>) adopted the UNet structure for rapid and accurate segmentation of the maxillofacial region, with an accuracy of 0.9928 on their own dataset. Li et al. (<xref ref-type="bibr" rid="B88">88</xref>) proposed the Deetal-Perio method to predict the severity of periodontitis from panoramic radiography. To calculate alveolar bone absorption, Deetal-Perio first segmented and indexed individual teeth using Mask R-CNN with a novel calibration method. It then segmented the contour of the alveolar bone and calculated a ratio for each tooth to represent alveolar bone absorption. Finally, Deetal-Perio predicted the severity of periodontitis from the ratios of all the teeth. Two datasets were used, the Suzhou dataset and the Zhongshan dataset, with DSCs of 0.868 and 0.852, respectively. Egger et al. (<xref ref-type="bibr" rid="B89">89</xref>) automatically segmented the mandible by using FCNs and carefully evaluated the resulting segmentations. Ten CT datasets were included and three network architectures, namely FCN-32s, FCN-16s, and FCN-8s, were compared; FCN-8s achieved the best performance, with a DSC of 0.9203. Zhang et al. (<xref ref-type="bibr" rid="B90">90</xref>) introduced a context-FCN for joint craniomaxillofacial bone segmentation and landmark digitization; the DSCs for the midface and mandible on their own dataset were 0.9319 and 0.9327, respectively. Torosdagli et al. (<xref ref-type="bibr" rid="B91">91</xref>) presented a new mandible segmentation network based on a unified algorithm combining UNet and DenseNet. Compared with the most advanced mandible segmentation methods, this method achieved better performance on craniofacial abnormalities and disease states; the DSC was 0.9382 on their own CBCT dataset and 0.9386 on the MICCAI Head and Neck Challenge 2015 dataset. Lian et al. (<xref ref-type="bibr" rid="B92">92</xref>) introduced an efficient end-to-end deep network, the multi-task dynamic transformer network (DTNet), to perform concurrent mandible segmentation and large-scale landmark localization in one pass for large-volume CBCT images. The network contributes to the quantitative analysis of craniomaxillofacial deformities and achieved a DSC of 0.9395 on the authors' own dataset.</p>
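<p>As a minimal illustration of how the overlap metrics reported above (DSC, SEN, and PPV) are computed from a predicted and a ground-truth binary mask, the following plain-Python sketch operates on flattened 0/1 masks; the function names are illustrative and are not taken from any of the cited works.</p>

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives
    between two flat binary masks (sequences of 0/1)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, fn

def dice(pred, truth):
    """Dice similarity coefficient (DSC): 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    return 2 * tp / (2 * tp + fp + fn)

def sensitivity(pred, truth):
    """Sensitivity (SEN), i.e., recall: TP / (TP + FN)."""
    tp, _, fn = confusion_counts(pred, truth)
    return tp / (tp + fn)

def ppv(pred, truth):
    """Positive predictive value (PPV), i.e., precision: TP / (TP + FP)."""
    tp, fp, _ = confusion_counts(pred, truth)
    return tp / (tp + fp)
```

A perfect segmentation gives a DSC of 1.0; the values in the table (around 0.92 to 0.94) indicate a high but imperfect voxel-wise overlap.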
<p>The performance of the above methods is shown in <xref ref-type="table" rid="T6">Table 6</xref>.</p>
</sec>
</sec>
<sec id="s5">
<title>Trends and Future Work</title>
<sec>
<title>Integration and Improvement of Data Quality</title>
<p>As a consequence of the need to protect patient privacy and the patient's right to be informed, there are not always enough cases to establish a large-scale dataset dedicated to the segmentation of stomatological images. Moreover, the collected data are usually drawn from diverse hospitals and machines, which further increases the difficulty of formulating universal benchmarks. Methods to effectively integrate, store, and securely share these data are therefore of vital importance and urgently required. The establishment of a shared dento-maxillofacial database can help to solve this problem to some extent. Differences in the data acquisition settings and conditions (such as exposure time) used in each hospital lead to variation in image quality (such as contrast and signal-to-noise ratio), which affects the accuracy and robustness of image segmentation. Further study should focus on standardizing and normalizing image data and improving data quality.</p>
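<p>One common preprocessing step toward the intensity standardization discussed above is to window and rescale CT values so that scans from different devices occupy a comparable range. The plain-Python sketch below uses an illustrative bone-window level and width; the function name and parameter values are assumptions, not a prescribed protocol.</p>

```python
def window_normalize(hu_values, level=400.0, width=1800.0):
    """Clip CT intensities (in Hounsfield units) to a window defined by
    a level and width (illustrative bone-window defaults), then rescale
    to [0, 1] so that scans from different scanners share a comparable
    intensity range before segmentation."""
    lo, hi = level - width / 2, level + width / 2
    out = []
    for v in hu_values:
        v = min(max(v, lo), hi)           # clip to the window
        out.append((v - lo) / (hi - lo))  # min-max rescale to [0, 1]
    return out
```

Applying the same window to every scan removes scanner-dependent intensity offsets, which is one simple step toward the cross-hospital consistency the text calls for.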
<p>Currently, most data used for image segmentation in stomatology are produced by a single modality (CBCT or MSCT alone). In the future, multi-modality data from the same case could be employed collaboratively, to fully exploit the correlated and complementary information across modalities; this may further boost performance.</p>
</sec>
<sec>
<title>Model Design in the Fully Supervised Case</title>
<p>The most common image segmentation methods in stomatology are built on top of a CNN. However, the transformer architecture has gradually emerged in the field of computer vision because of its global modeling ability, and researchers have applied it to mandible segmentation, outperforming current CNN-based models. Transformer-based methods therefore have the potential to obtain satisfactory results in medical image segmentation in the future. In addition, how to reduce the number of model parameters while preserving accuracy is an active research topic, which is particularly important for deploying medical image segmentation models and promoting the related technology in clinical settings.</p>
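<p>The global modeling ability mentioned above comes from self-attention, in which every output position is a weighted mixture of all input positions, rather than of a local neighborhood as in a convolution. The following plain-Python sketch of scaled dot-product attention is purely illustrative and is independent of any specific segmentation model cited here.</p>

```python
import math

def attention(queries, keys, values):
    """Minimal scaled dot-product attention on lists of vectors.
    Each output is a softmax-weighted average of ALL value vectors,
    so every position can attend to every other position (the
    'global modeling' property of transformers)."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        # numerically stable softmax over the scores
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # weighted average of the value vectors
        outputs.append([
            sum(w * v[t] for w, v in zip(weights, values))
            for t in range(len(values[0]))
        ])
    return outputs
```

Because the score matrix couples every position with every other, attention costs grow quadratically with the number of positions, which is one reason parameter- and compute-efficient variants matter for clinical deployment.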
</sec>
<sec>
<title>Model Design for Insufficient Data Annotation</title>
<p>Existing DL-based algorithms rely heavily on large-scale data to learn discriminative features, but labeling stomatological data is time-consuming and labor-intensive; how to learn effectively from insufficient and imperfect datasets is therefore an active research topic. There are several ways to address such problems. First, to reduce the burden of the annotation process, weakly supervised and semi-supervised methods, which do not require complete annotations, can be adopted. Second, to handle noise in manual labels, algorithms that learn from noisy labels can be employed. Third, to address the problem that existing methods do not generalize to new categories, techniques such as transfer learning, domain adaptation, and few-shot learning can be considered. In addition, unsupervised and self-supervised learning could be used to explore the structural properties of the dental images themselves, providing a better prior for downstream tasks.</p>
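<p>As one concrete instance of the semi-supervised strategies mentioned above, confidence-based pseudo-labeling keeps only those unlabeled samples whose predicted class probability exceeds a threshold and reuses the predicted class as a training label. The sketch below is illustrative only; the function name and the threshold value are assumptions, not a method from the cited literature.</p>

```python
def select_pseudo_labels(probs, threshold=0.9):
    """Confidence-based pseudo-labeling (illustrative threshold).
    probs: one probability distribution per unlabeled sample.
    Returns (sample_index, predicted_class) pairs for samples whose
    top probability reaches the threshold; these pairs can then be
    added to the labeled training set."""
    selected = []
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            selected.append((i, p.index(conf)))
    return selected
```

Low-confidence samples are left unlabeled rather than risk injecting noisy labels, which is the same trade-off the noisy-label methods in the text try to manage.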
</sec>
<sec>
<title>Interpretability of Deep Learning</title>
<p>Although existing DL methods have shown excellent performance in stomatology, they have not been widely promoted because of the limitations of DL. In addition to the high computing cost and the need for large-scale datasets, the &#x0201C;black box&#x0201D; characteristic of DL methods is the main factor that hinders their application. To gain the trust of doctors, regulators, and patients, a medical diagnostic system must be transparent, interpretable, and explicable. Ideally, it should explain the complete logic of how a decision is made. Therefore, research on interpretability is most urgently needed for the application of DL to clinical diagnosis and treatment.</p>
</sec>
<sec>
<title>Clinical Application in Stomatology</title>
<p>First, from the perspective of what to segment, current studies focus mainly on the teeth and tooth-related diseases, whereas little attention is paid to the jaw and jaw-related diseases, particularly the soft tissues and their diseases. Studies on the latter have greater clinical significance, so more of them are required. Second, the accuracy of image segmentation is key to whether DL methods can be applied clinically; more studies are therefore needed to enhance the accuracy and precision of image segmentation and promote its use in the clinic. Finally, the first step of digital surgical technologies such as guide templates, surgical navigation, and augmented reality is to segment important structures or lesions, for which traditional manual segmentation is currently the mainstay. In the future, DL-based automatic segmentation could be integrated with these technologies to assist clinical practice more accurately and efficiently.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="s6">
<title>Conclusion</title>
<p>This paper comprehensively reviews DL-based automatic segmentation algorithms and introduces their clinical applications. The review shows that DL has great potential to segment stomatological images accurately, which can further promote the transformation of clinical practice from experientialism to digitization, precision, and individuation. In the future, more research is needed to further improve the accuracy of automatic image segmentation and to realize intelligent diagnosis and treatment.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>DL, WZ, JC, and WT contributed to the conception and design of the study. DL wrote the first draft of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.</p>
</sec>
<sec sec-type="funding-information" id="s8">
<title>Funding</title>
<p>This study was supported by Sichuan Province Regional Innovation Cooperation Project (2020YFQ0012).</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Karayegen</surname> <given-names>G</given-names></name> <name><surname>Aksahin</surname> <given-names>MF</given-names></name></person-group>. <article-title>Brain tumor prediction on MR images with semantic segmentation by using deep learning network and 3D imaging of tumor region</article-title>. <source>Biomed Signal Process Control.</source> (<year>2021</year>) <volume>66</volume>:<fpage>102458</fpage>. <pub-id pub-id-type="doi">10.1016/j.bspc.2021.102458</pub-id></citation>
</ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Atli</surname> <given-names>I</given-names></name> <name><surname>Gedik</surname> <given-names>OS</given-names></name></person-group>. <article-title>Sine-Net: a fully convolutional deep learning architecture for retinal blood vessel segmentation</article-title>. <source>Eng Sci Technol an Int J.</source> (<year>2021</year>) <volume>24</volume>:<fpage>271</fpage>&#x02013;<lpage>83</lpage>. <pub-id pub-id-type="doi">10.1016/j.jestch.2020.07.008</pub-id></citation>
</ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Messay</surname> <given-names>T</given-names></name> <name><surname>Hardie</surname> <given-names>RC</given-names></name> <name><surname>Tuinstra</surname> <given-names>TR</given-names></name></person-group>. <article-title>Segmentation of pulmonary nodules in computed tomography using a regression neural network approach and its application to the lung image database consortium and image database resource initiative dataset</article-title>. <source>Med Image Anal.</source> (<year>2015</year>) <volume>22</volume>:<fpage>48</fpage>&#x02013;<lpage>62</lpage>. <pub-id pub-id-type="doi">10.1016/j.media.2015.02.002</pub-id><pub-id pub-id-type="pmid">25791434</pub-id></citation></ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ambellan</surname> <given-names>F</given-names></name> <name><surname>Tack</surname> <given-names>A</given-names></name> <name><surname>Ehlke</surname> <given-names>M</given-names></name> <name><surname>Zachow</surname> <given-names>S</given-names></name></person-group>. <article-title>Automated segmentation of knee bone and cartilage combining statistical shape knowledge and convolutional neural networks: data from the osteoarthritis initiative</article-title>. <source>Med Image Anal.</source> (<year>2019</year>) <volume>52</volume>:<fpage>109</fpage>&#x02013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.1016/j.media.2018.11.009</pub-id><pub-id pub-id-type="pmid">30529224</pub-id></citation></ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hojjatoleslami</surname> <given-names>SA</given-names></name> <name><surname>Kruggel</surname> <given-names>F</given-names></name></person-group>. <article-title>Segmentation of large brain lesions</article-title>. <source>IEEE Trans on Med Imaging.</source> (<year>2001</year>) <volume>20</volume>:<fpage>666</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1109/42.932750</pub-id><pub-id pub-id-type="pmid">11465472</pub-id></citation></ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alsmadi</surname> <given-names>MK</given-names></name></person-group>. <article-title>A hybrid Fuzzy C-Means and neutrosophic for jaw lesions segmentation</article-title>. <source>Ain Shams Eng J.</source> (<year>2018</year>) <volume>9</volume>:<fpage>697</fpage>&#x02013;<lpage>706</lpage>. <pub-id pub-id-type="doi">10.1016/j.asej.2016.03.016</pub-id></citation>
</ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>H</given-names></name> <name><surname>Sun</surname> <given-names>G</given-names></name> <name><surname>Sun</surname> <given-names>H</given-names></name> <name><surname>Liu</surname> <given-names>W</given-names></name></person-group>. <article-title>Watershed algorithm based on morphology for dental X-ray images segmentation</article-title>. <source>2012 IEEE 11th International Conference on Signal Processing.</source> (<year>2012</year>) <volume>2</volume>:<fpage>877</fpage>&#x02013;<lpage>80</lpage>. <pub-id pub-id-type="doi">10.1109/ICoSP.2012.6491720</pub-id><pub-id pub-id-type="pmid">27295638</pub-id></citation></ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Devlin</surname> <given-names>J</given-names></name> <name><surname>Chang</surname> <given-names>MW</given-names></name> <name><surname>Lee</surname> <given-names>K</given-names></name> <name><surname>Toutanova</surname> <given-names>K</given-names></name></person-group>. <source>Bert: Pre-Training of Deep Bidirectional Transformers For Language Understanding. arXiv [Preprint]</source>. <volume>arXiv</volume>:<fpage>1810.04805</fpage> (<year>2018</year>).</citation>
</ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krizhevsky</surname> <given-names>A</given-names></name> <name><surname>Sutskever</surname> <given-names>I</given-names></name> <name><surname>Hinton</surname> <given-names>GE</given-names></name></person-group>. <article-title>Imagenet classification with deep convolutional neural networks</article-title>. <source>Adv Neural Inf Process Syst.</source> (<year>2012</year>) <volume>25</volume>:<fpage>1097</fpage>&#x02013;<lpage>105</lpage>. <pub-id pub-id-type="doi">10.1145/3065386</pub-id></citation>
</ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Simonyan</surname> <given-names>K</given-names></name> <name><surname>Zisserman</surname> <given-names>A</given-names></name></person-group>. <source>Very Deep Convolutional Networks For Large-Scale Image Recognition</source>. <publisher-loc>San Diego, CA</publisher-loc>: <publisher-name>ICLR 2015</publisher-name> (<year>2014</year>). arXiv [Preprint]. <volume>arXiv</volume>:<fpage>1409.1556</fpage> (<year>2014</year>).</citation>
</ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Szegedy</surname> <given-names>C</given-names></name> <name><surname>Liu</surname> <given-names>W</given-names></name> <name><surname>Jia</surname> <given-names>Y</given-names></name> <name><surname>Sermanet</surname> <given-names>P</given-names></name> <name><surname>Reed</surname> <given-names>S</given-names></name> <name><surname>Anguelov</surname> <given-names>D</given-names></name> <etal/></person-group>. <article-title>Going deeper with convolutions</article-title>. In: <source>Proceedings of the IEEE conference on computer vision and pattern recognition</source>. <publisher-loc>Boston, MA, USA</publisher-loc> (<year>2015</year>).</citation>
</ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Szegedy</surname> <given-names>C</given-names></name> <name><surname>Vanhoucke</surname> <given-names>V</given-names></name> <name><surname>Ioffe</surname> <given-names>S</given-names></name> <name><surname>Shlens</surname> <given-names>J</given-names></name> <name><surname>Wojna</surname> <given-names>Z</given-names></name></person-group>. <article-title>Rethinking the inception architecture for computer vision</article-title>. In: <source>Proceedings of the IEEE conference on computer vision and pattern recognition</source>. <publisher-loc>Las Vegas, NV, USA</publisher-loc> (<year>2016</year>).</citation>
</ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K</given-names></name> <name><surname>Zhang</surname> <given-names>X</given-names></name> <name><surname>Ren</surname> <given-names>S</given-names></name> <name><surname>Sun</surname> <given-names>J</given-names></name></person-group>. <article-title>Deep residual learning for image recognition</article-title>. In: <source>Proceedings of the IEEE conference on computer vision and pattern recognition</source>. <publisher-loc>Las Vegas, NV, USA</publisher-loc> (<year>2016</year>). <pub-id pub-id-type="pmid">32166560</pub-id></citation></ref>
<ref id="B14">
<label>14.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>G</given-names></name> <name><surname>Liu</surname> <given-names>Z</given-names></name> <name><surname>Van Der Maaten</surname> <given-names>L</given-names></name> <name><surname>Weinberger</surname> <given-names>KQ</given-names></name></person-group>. <article-title>Densely connected convolutional networks</article-title>. In: <source>Proceedings of the IEEE conference on computer vision and pattern recognition</source>. <publisher-loc>Honolulu, HI, USA</publisher-loc> (<year>2017</year>).</citation>
</ref>
<ref id="B15">
<label>15.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>D</given-names></name> <name><surname>Xu</surname> <given-names>Q</given-names></name> <name><surname>Guo</surname> <given-names>H</given-names></name> <name><surname>Zhao</surname> <given-names>C</given-names></name> <name><surname>Lin</surname> <given-names>Y</given-names></name> <name><surname>Li</surname> <given-names>D</given-names></name></person-group>. <article-title>An efficient and lightweight convolutional neural network for remote sensing image scene classification</article-title>. <source>Sensors.</source> (<year>2020</year>) <volume>20</volume>:<fpage>1999</fpage>. <pub-id pub-id-type="doi">10.3390/s20071999</pub-id><pub-id pub-id-type="pmid">32252483</pub-id></citation></ref>
<ref id="B16">
<label>16.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>X</given-names></name> <name><surname>Zhou</surname> <given-names>X</given-names></name> <name><surname>Lin</surname> <given-names>M</given-names></name> <name><surname>Sun</surname> <given-names>J</given-names></name></person-group>. <article-title>Shufflenet: an extremely efficient convolutional neural network for mobile devices</article-title>. In: <source>Proceedings of the IEEE conference on computer vision and pattern recognition</source>. <publisher-loc>Salt Lake City, UT, USA</publisher-loc> (<year>2018</year>).</citation>
</ref>
<ref id="B17">
<label>17.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Tan</surname> <given-names>M</given-names></name> <name><surname>Le</surname> <given-names>Q</given-names></name></person-group>. <article-title>Efficientnet: rethinking model scaling for convolutional neural networks</article-title>. In: <source>International Conference on Machine Learning</source>. <publisher-loc>Long Beach, CA</publisher-loc>: <publisher-name>PMLR</publisher-name> (<year>2019</year>).</citation>
</ref>
<ref id="B18">
<label>18.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Vaswani</surname> <given-names>A</given-names></name> <name><surname>Shazeer</surname> <given-names>N</given-names></name> <name><surname>Parmar</surname> <given-names>N</given-names></name> <name><surname>Uszkoreit</surname> <given-names>J</given-names></name> <name><surname>Jones</surname> <given-names>L</given-names></name> <name><surname>Gomez</surname> <given-names>AN</given-names></name> <etal/></person-group>. <article-title>Attention is all you need</article-title>. In: <source>Advances in neural information processing systems</source>. <publisher-loc>Long Beach, CA</publisher-loc>: <publisher-name>NIPS</publisher-name> (<year>2017</year>).</citation>
</ref>
<ref id="B19">
<label>19.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Dehghani</surname> <given-names>M</given-names></name> <name><surname>Gouws</surname> <given-names>S</given-names></name> <name><surname>Vinyals</surname> <given-names>O</given-names></name> <name><surname>Uszkoreit</surname> <given-names>J</given-names></name> <name><surname>Kaiser</surname> <given-names>L</given-names></name></person-group>. <source>Universal Transformers</source>. <publisher-loc>United States</publisher-loc>: <publisher-name>ICLR 2018</publisher-name> (<year>2018</year>). arXiv [Preprint]. <volume>arXiv</volume>:<fpage>1807.03819</fpage>.</citation>
</ref>
<ref id="B20">
<label>20.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Dai</surname> <given-names>Z</given-names></name> <name><surname>Yang</surname> <given-names>Z</given-names></name> <name><surname>Yang</surname> <given-names>Y</given-names></name> <name><surname>Carbonell</surname> <given-names>J</given-names></name> <name><surname>Le</surname> <given-names>QV</given-names></name> <name><surname>Salakhutdinov</surname> <given-names>R</given-names></name></person-group>. <source>Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context</source>. <publisher-loc>Florence</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>. arXiv [Preprint]. <volume>arXiv</volume>:<fpage>1901.02860</fpage> (<year>2019</year>).</citation>
</ref>
<ref id="B21">
<label>21.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dosovitskiy</surname> <given-names>A</given-names></name> <name><surname>Beyer</surname> <given-names>L</given-names></name> <name><surname>Kolesnikov</surname> <given-names>A</given-names></name> <name><surname>Weissenborn</surname> <given-names>D</given-names></name> <name><surname>Zhai</surname> <given-names>X</given-names></name> <name><surname>Unterthiner</surname> <given-names>T</given-names></name> <etal/></person-group>. <source>An Image is Worth 16x16 Words: Transformers For Image Recognition At Scale</source>. arXiv [Preprint]. <volume>arXiv</volume>:<fpage>2010.11929</fpage> (<year>2020</year>).</citation>
</ref>
<ref id="B22">
<label>22.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Touvron</surname> <given-names>H</given-names></name> <name><surname>Cord</surname> <given-names>M</given-names></name> <name><surname>Douze</surname> <given-names>M</given-names></name> <name><surname>Massa</surname> <given-names>F</given-names></name> <name><surname>Sablayrolles</surname> <given-names>A</given-names></name> <name><surname>J&#x000E9;gou</surname> <given-names>H</given-names></name></person-group>. <article-title>Training data-efficient image transformers &#x00026; distillation through attention</article-title>. In: <source>International Conference on Machine Learning</source>. <publisher-loc>Long Beach, CA</publisher-loc>: <publisher-name>PMLR</publisher-name> (<year>2021</year>).</citation>
</ref>
<ref id="B23">
<label>23.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>H</given-names></name> <name><surname>Xiao</surname> <given-names>B</given-names></name> <name><surname>Codella</surname> <given-names>N</given-names></name> <name><surname>Liu</surname> <given-names>M</given-names></name> <name><surname>Dai</surname> <given-names>X</given-names></name> <name><surname>Yuan</surname> <given-names>L</given-names></name> <name><surname>Zhang</surname> <given-names>L</given-names></name></person-group>. <source>CVT: Introducing Convolutions to Vision Transformers</source>. arXiv [Preprint]. <volume>arXiv</volume>:<fpage>2103.15808</fpage> (<year>2021</year>).</citation>
</ref>
<ref id="B24">
<label>24.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Z</given-names></name> <name><surname>Lin</surname> <given-names>Y</given-names></name> <name><surname>Cao</surname> <given-names>Y</given-names></name> <name><surname>Hu</surname> <given-names>H</given-names></name> <name><surname>Wei</surname> <given-names>Y</given-names></name> <name><surname>Zhang</surname> <given-names>Z</given-names></name> <etal/></person-group>. <source>Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows</source>. arXiv [Preprint]. <volume>arXiv</volume>:<fpage>2103.14030</fpage> (<year>2021</year>).</citation>
</ref>
<ref id="B25">
<label>25.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Long</surname> <given-names>J</given-names></name> <name><surname>Shelhamer</surname> <given-names>E</given-names></name> <name><surname>Darrell</surname> <given-names>T</given-names></name></person-group>. <article-title>Fully convolutional networks for semantic segmentation</article-title>. In: <source>Proceedings of the IEEE conference on computer vision and pattern recognition</source>. <publisher-loc>Boston, MA, USA</publisher-loc> (<year>2015</year>).</citation>
</ref>
<ref id="B26">
<label>26.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Badrinarayanan</surname> <given-names>V</given-names></name> <name><surname>Kendall</surname> <given-names>A</given-names></name> <name><surname>Cipolla</surname> <given-names>R</given-names></name></person-group>. <article-title>Segnet: a deep convolutional encoder-decoder architecture for image segmentation</article-title>. <source>IEEE Trans Pattern Anal Mach Intell.</source> (<year>2017</year>) <volume>39</volume>:<fpage>2481</fpage>&#x02013;<lpage>95</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2016.2644615</pub-id><pub-id pub-id-type="pmid">28060704</pub-id></citation></ref>
<ref id="B27">
<label>27.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>H</given-names></name> <name><surname>Shi</surname> <given-names>J</given-names></name> <name><surname>Qi</surname> <given-names>X</given-names></name> <name><surname>Wang</surname> <given-names>X</given-names></name> <name><surname>Jia</surname> <given-names>J</given-names></name></person-group>. <article-title>Pyramid scene parsing network</article-title>. In: <source>Proceedings of the IEEE conference on computer vision and pattern recognition</source>. <publisher-loc>Honolulu, HI</publisher-loc>: <publisher-name>IEEE Computer Society</publisher-name> (<year>2017</year>).</citation>
</ref>
<ref id="B28">
<label>28.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>LC</given-names></name> <name><surname>Papandreou</surname> <given-names>G</given-names></name> <name><surname>Kokkinos</surname> <given-names>I</given-names></name> <name><surname>Murphy</surname> <given-names>K</given-names></name> <name><surname>Yuille</surname> <given-names>AL</given-names></name></person-group>. <source>Semantic Image Segmentation With Deep Convolutional Nets and Fully Connected Crfs</source>. arXiv [Preprint]. <volume>arXiv</volume>:<fpage>1412.7062</fpage> (<year>2014</year>). <pub-id pub-id-type="pmid">28463186</pub-id></citation></ref>
<ref id="B29">
<label>29.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>LC</given-names></name> <name><surname>Papandreou</surname> <given-names>G</given-names></name> <name><surname>Kokkinos</surname> <given-names>I</given-names></name> <name><surname>Murphy</surname> <given-names>K</given-names></name> <name><surname>Yuille</surname> <given-names>AL</given-names></name></person-group>. <article-title>Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs</article-title>. <source>IEEE Trans Pattern Anal Mach Intell.</source> (<year>2017</year>) <volume>40</volume>:<fpage>834</fpage>&#x02013;<lpage>48</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2017.2699184</pub-id><pub-id pub-id-type="pmid">28463186</pub-id></citation></ref>
<ref id="B30">
<label>30.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>LC</given-names></name> <name><surname>Papandreou</surname> <given-names>G</given-names></name> <name><surname>Schroff</surname> <given-names>F</given-names></name> <name><surname>Adam</surname> <given-names>H</given-names></name></person-group>. <source>Rethinking Atrous Convolution For Semantic Image Segmentation</source>. arXiv [Preprint]. <volume>arXiv</volume>:<fpage>1706.05587</fpage> (<year>2017</year>).</citation>
</ref>
<ref id="B31">
<label>31.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>LC</given-names></name> <name><surname>Zhu</surname> <given-names>Y</given-names></name> <name><surname>Papandreou</surname> <given-names>G</given-names></name> <name><surname>Schroff</surname> <given-names>F</given-names></name> <name><surname>Adam</surname> <given-names>H</given-names></name></person-group>. <article-title>Encoder-decoder with atrous separable convolution for semantic image segmentation</article-title>. In: <source>Proceedings of the European conference on computer vision (ECCV)</source>. <publisher-loc>Munich</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>2018</year>).</citation>
</ref>
<ref id="B32">
<label>32.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ronneberger</surname> <given-names>O</given-names></name> <name><surname>Fischer</surname> <given-names>P</given-names></name> <name><surname>Brox</surname> <given-names>T</given-names></name></person-group>. <article-title>U-net: convolutional networks for biomedical image segmentation</article-title>. In: <source>International Conference on Medical image computing and computer-assisted intervention</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>2015</year>).</citation>
</ref>
<ref id="B33">
<label>33.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>&#x000C7;i&#x000E7;ek</surname> <given-names>&#x000D6;</given-names></name> <name><surname>Abdulkadir</surname> <given-names>A</given-names></name> <name><surname>Lienkamp</surname> <given-names>SS</given-names></name> <name><surname>Brox</surname> <given-names>T</given-names></name> <name><surname>Ronneberger</surname> <given-names>O</given-names></name></person-group>. <article-title>3D U-Net: learning dense volumetric segmentation from sparse annotation</article-title>. In: <source>International conference on medical image computing and computer-assisted intervention</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>2016</year>).</citation>
</ref>
<ref id="B34">
<label>34.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Milletari</surname> <given-names>F</given-names></name> <name><surname>Navab</surname> <given-names>N</given-names></name> <name><surname>Ahmadi</surname> <given-names>SA</given-names></name></person-group>. <article-title>V-net: fully convolutional neural networks for volumetric medical image segmentation</article-title>. In: <source>2016 fourth international conference on 3D vision (3DV)</source>. <publisher-loc>Stanford, CA</publisher-loc>: <publisher-name>IEEE</publisher-name> (<year>2016</year>).</citation>
</ref>
<ref id="B35">
<label>35.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>Z</given-names></name> <name><surname>Siddiquee</surname> <given-names>MMR</given-names></name> <name><surname>Tajbakhsh</surname> <given-names>N</given-names></name> <name><surname>Liang</surname> <given-names>J</given-names></name></person-group>. <article-title>Unet&#x0002B;&#x0002B;: a nested u-net architecture for medical image segmentation</article-title>. In: <source>Deep learning in medical image analysis and multimodal learning for clinical decision support</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>2018</year>). <pub-id pub-id-type="pmid">32613207</pub-id></citation></ref>
<ref id="B36">
<label>36.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zheng</surname> <given-names>S</given-names></name> <name><surname>Lu</surname> <given-names>J</given-names></name> <name><surname>Zhao</surname> <given-names>H</given-names></name> <name><surname>Zhu</surname> <given-names>X</given-names></name> <name><surname>Luo</surname> <given-names>Z</given-names></name> <name><surname>Wang</surname> <given-names>Y</given-names></name> <etal/></person-group>. <article-title>Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers</article-title>. In: <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>. <publisher-loc>Nashville, TN</publisher-loc>: <publisher-name>IEEE</publisher-name> (<year>2021</year>).</citation>
</ref>
<ref id="B37">
<label>37.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Strudel</surname> <given-names>R</given-names></name> <name><surname>Garcia</surname> <given-names>R</given-names></name> <name><surname>Laptev</surname> <given-names>I</given-names></name> <name><surname>Schmid</surname> <given-names>C</given-names></name></person-group>. <source>Segmenter: Transformer for Semantic Segmentation</source>. arXiv [Preprint]. <volume>arXiv</volume>:<fpage>2105.05633</fpage> (<year>2021</year>).</citation>
</ref>
<ref id="B38">
<label>38.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname> <given-names>E</given-names></name> <name><surname>Wang</surname> <given-names>W</given-names></name> <name><surname>Yu</surname> <given-names>Z</given-names></name> <name><surname>Anandkumar</surname> <given-names>A</given-names></name> <name><surname>Alvarez</surname> <given-names>JM</given-names></name> <name><surname>Luo</surname> <given-names>P</given-names></name></person-group>. <source>SegFormer: Simple and Efficient Design For Semantic Segmentation With Transformers</source>. arXiv [Preprint]. <volume>arXiv</volume>:<fpage>2105.15203</fpage> (<year>2021</year>).</citation>
</ref>
<ref id="B39">
<label>39.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cao</surname> <given-names>H</given-names></name> <name><surname>Wang</surname> <given-names>Y</given-names></name> <name><surname>Chen</surname> <given-names>J</given-names></name> <name><surname>Jiang</surname> <given-names>D</given-names></name> <name><surname>Zhang</surname> <given-names>X</given-names></name> <name><surname>Tian</surname> <given-names>Q</given-names></name> <etal/></person-group>. <source>Swin-Unet: Unet-like Pure Transformer For Medical Image Segmentation</source>. arXiv [Preprint]. <volume>arXiv</volume>:<fpage>2105.05537</fpage> (<year>2021</year>).</citation>
</ref>
<ref id="B40">
<label>40.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Valanarasu</surname> <given-names>JMJ</given-names></name> <name><surname>Oza</surname> <given-names>P</given-names></name> <name><surname>Hacihaliloglu</surname> <given-names>I</given-names></name> <name><surname>Patel</surname> <given-names>VM</given-names></name></person-group>. <source>Medical Transformer: Gated Axial-Attention For Medical Image Segmentation</source>. arXiv [Preprint]. <volume>arXiv</volume>:<fpage>2102.10662</fpage> (<year>2021</year>).</citation>
</ref>
<ref id="B41">
<label>41.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hatamizadeh</surname> <given-names>A</given-names></name> <name><surname>Tang</surname> <given-names>Y</given-names></name> <name><surname>Nath</surname> <given-names>V</given-names></name> <name><surname>Yang</surname> <given-names>D</given-names></name> <name><surname>Myronenko</surname> <given-names>A</given-names></name> <name><surname>Landman</surname> <given-names>B</given-names></name> <etal/></person-group>. <source>UNETR: Transformers for 3D Medical Image Segmentation</source>. arXiv [Preprint]. <volume>arXiv</volume>:<fpage>2103.10504</fpage> (<year>2021</year>).</citation>
</ref>
<ref id="B42">
<label>42.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y</given-names></name> <name><surname>Higashita</surname> <given-names>R</given-names></name> <name><surname>Fu</surname> <given-names>H</given-names></name> <name><surname>Xu</surname> <given-names>Y</given-names></name> <name><surname>Zhang</surname> <given-names>Y</given-names></name> <name><surname>Liu</surname> <given-names>H</given-names></name> <etal/></person-group>. <source>A Multi-Branch Hybrid Transformer Network for Corneal Endothelial Cell Segmentation</source>. arXiv [Preprint]. <volume>arXiv</volume>:<fpage>2106.07557</fpage> (<year>2021</year>).</citation>
</ref>
<ref id="B43">
<label>43.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>J</given-names></name> <name><surname>Lu</surname> <given-names>Y</given-names></name> <name><surname>Yu</surname> <given-names>Q</given-names></name> <name><surname>Luo</surname> <given-names>X</given-names></name> <name><surname>Adeli</surname> <given-names>E</given-names></name> <name><surname>Wang</surname> <given-names>Y</given-names></name> <etal/></person-group>. <source>Transunet: Transformers make strong encoders for medical image segmentation</source>. arXiv [Preprint]. <volume>arXiv</volume>:<fpage>2102.04306</fpage> (<year>2021</year>).</citation>
</ref>
<ref id="B44">
<label>44.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y</given-names></name> <name><surname>Liu</surname> <given-names>H</given-names></name> <name><surname>Hu</surname> <given-names>Q</given-names></name></person-group>. <source>Transfuse: Fusing Transformers and CNNs for Medical Image Segmentation</source>. arXiv [Preprint]. <volume>arXiv</volume>:<fpage>2102.08005</fpage> (<year>2021</year>).</citation>
</ref>
<ref id="B45">
<label>45.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bolya</surname> <given-names>D</given-names></name> <name><surname>Zhou</surname> <given-names>C</given-names></name> <name><surname>Xiao</surname> <given-names>F</given-names></name> <name><surname>Lee</surname> <given-names>YJ</given-names></name></person-group>. <article-title>Yolact: real-time instance segmentation</article-title>. In: <source>Proceedings of the IEEE/CVF International Conference on Computer Vision</source>. <publisher-loc>Seoul</publisher-loc>: <publisher-name>IEEE</publisher-name> (<year>2019</year>).</citation>
</ref>
<ref id="B46">
<label>46.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Redmon</surname> <given-names>J</given-names></name> <name><surname>Divvala</surname> <given-names>S</given-names></name> <name><surname>Girshick</surname> <given-names>R</given-names></name> <name><surname>Farhadi</surname> <given-names>A</given-names></name></person-group>. <article-title>You only look once: Unified, real-time object detection</article-title>. In: <source>Proceedings of the IEEE conference on computer vision and pattern recognition</source>. <publisher-loc>Las Vegas, NV</publisher-loc>: <publisher-name>IEEE Computer Society</publisher-name> (<year>2016</year>).</citation>
</ref>
<ref id="B47">
<label>47.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>W</given-names></name> <name><surname>Anguelov</surname> <given-names>D</given-names></name> <name><surname>Erhan</surname> <given-names>D</given-names></name> <name><surname>Szegedy</surname> <given-names>C</given-names></name> <name><surname>Reed</surname> <given-names>S</given-names></name> <name><surname>Fu</surname> <given-names>CY</given-names></name> <name><surname>Berg</surname> <given-names>AC</given-names></name></person-group>. <article-title>SSD: single shot multibox detector</article-title>. In: <source>European conference on computer vision</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>2016</year>).</citation>
</ref>
<ref id="B48">
<label>48.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K</given-names></name> <name><surname>Gkioxari</surname> <given-names>G</given-names></name> <name><surname>Doll&#x000E1;r</surname> <given-names>P</given-names></name> <name><surname>Girshick</surname> <given-names>R</given-names></name></person-group>. <article-title>Mask R-CNN</article-title>. In: <source>Proceedings of the IEEE international conference on computer vision</source>. <publisher-loc>Venice</publisher-loc>: <publisher-name>IEEE Computer Society</publisher-name> (<year>2017</year>).</citation>
</ref>
<ref id="B49">
<label>49.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>S</given-names></name> <name><surname>Qi</surname> <given-names>L</given-names></name> <name><surname>Qin</surname> <given-names>H</given-names></name> <name><surname>Shi</surname> <given-names>J</given-names></name> <name><surname>Jia</surname> <given-names>J</given-names></name></person-group>. <article-title>Path aggregation network for instance segmentation</article-title>. In: <source>Proceedings of the IEEE conference on computer vision and pattern recognition</source>. <publisher-loc>Salt Lake City, UT</publisher-loc>: <publisher-name>Computer Vision Foundation / IEEE Computer Society</publisher-name> (<year>2018</year>).</citation>
</ref>
<ref id="B50">
<label>50.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cai</surname> <given-names>Z</given-names></name> <name><surname>Vasconcelos</surname> <given-names>N</given-names></name></person-group>. <article-title>Cascade R-CNN: delving into high quality object detection</article-title>. In: <source>Proceedings of the IEEE conference on computer vision and pattern recognition</source>. <publisher-loc>Salt Lake City, UT</publisher-loc>: <publisher-name>Computer Vision Foundation / IEEE Computer Society</publisher-name> (<year>2018</year>). p. <fpage>6154</fpage>&#x02013;<lpage>62</lpage>.</citation>
</ref>
<ref id="B51">
<label>51.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>K</given-names></name> <name><surname>Pang</surname> <given-names>J</given-names></name> <name><surname>Wang</surname> <given-names>J</given-names></name> <name><surname>Xiong</surname> <given-names>Y</given-names></name> <name><surname>Li</surname> <given-names>X</given-names></name> <name><surname>Sun</surname> <given-names>S</given-names></name> <etal/></person-group>. <article-title>Hybrid task cascade for instance segmentation</article-title>. In: <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>. <publisher-loc>Long Beach, CA</publisher-loc>: <publisher-name>Computer Vision Foundation / IEEE Computer Society</publisher-name> (<year>2019</year>). p. <fpage>4974</fpage>&#x02013;<lpage>83</lpage>.</citation>
</ref>
<ref id="B52">
<label>52.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X</given-names></name> <name><surname>Kong</surname> <given-names>T</given-names></name> <name><surname>Shen</surname> <given-names>C</given-names></name> <name><surname>Jiang</surname> <given-names>Y</given-names></name> <name><surname>Li</surname> <given-names>L</given-names></name></person-group>. <article-title>Solo: segmenting objects by locations</article-title>. In: <source>European Conference on Computer Vision</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>2020</year>). p. <fpage>649</fpage>&#x02013;<lpage>65</lpage>.</citation></ref>
<ref id="B53">
<label>53.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bai</surname> <given-names>M</given-names></name> <name><surname>Urtasun</surname> <given-names>R</given-names></name></person-group>. <article-title>Deep watershed transform for instance segmentation</article-title>. In: <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>. <publisher-loc>Honolulu, HI</publisher-loc>: <publisher-name>IEEE Computer Society</publisher-name> (<year>2017</year>). p. <fpage>5221</fpage>&#x02013;<lpage>29</lpage>.</citation></ref>
<ref id="B54">
<label>54.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>K</given-names></name> <name><surname>Guan</surname> <given-names>K</given-names></name> <name><surname>Peng</surname> <given-names>J</given-names></name> <name><surname>Luo</surname> <given-names>Y</given-names></name> <name><surname>Wang</surname> <given-names>S</given-names></name></person-group>. <source>DeepMask: An Algorithm For Cloud and Cloud Shadow Detection in Optical Satellite Remote Sensing Images Using Deep Residual Network</source>. arXiv [Preprint]. <volume>arXiv</volume>:<fpage>1911.03607</fpage> (<year>2019</year>).</citation>
</ref>
<ref id="B55">
<label>55.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Carion</surname> <given-names>N</given-names></name> <name><surname>Massa</surname> <given-names>F</given-names></name> <name><surname>Synnaeve</surname> <given-names>G</given-names></name> <name><surname>Usunier</surname> <given-names>N</given-names></name> <name><surname>Kirillov</surname> <given-names>A</given-names></name> <name><surname>Zagoruyko</surname> <given-names>S</given-names></name></person-group>. <article-title>End-to-end object detection with transformers</article-title>. In: <source>European Conference on Computer Vision</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>2020</year>). p. <fpage>213</fpage>&#x02013;<lpage>29</lpage>.</citation>
</ref>
<ref id="B56">
<label>56.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Prangemeier</surname> <given-names>T</given-names></name> <name><surname>Reich</surname> <given-names>C</given-names></name> <name><surname>Koeppl</surname> <given-names>H</given-names></name></person-group>. <article-title>Attention-based transformers for instance segmentation of cells in microstructures</article-title>. In: <source>2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)</source>. <publisher-name>IEEE</publisher-name> (<year>2020</year>). p. <fpage>700</fpage>&#x02013;<lpage>7</lpage>.</citation>
</ref>
<ref id="B57">
<label>57.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>J</given-names></name> <name><surname>Cao</surname> <given-names>L</given-names></name> <name><surname>Lu</surname> <given-names>Y</given-names></name> <name><surname>Zhang</surname> <given-names>S</given-names></name> <name><surname>Wang</surname> <given-names>Y</given-names></name> <name><surname>Li</surname> <given-names>K</given-names></name> <etal/></person-group>. <source>ISTR: End-to-End Instance Segmentation With Transformers</source>. arXiv [Preprint]. <volume>arXiv</volume>:<fpage>2105.00637</fpage> (<year>2021</year>).</citation>
</ref>
<ref id="B58">
<label>58.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ren</surname> <given-names>S</given-names></name> <name><surname>He</surname> <given-names>K</given-names></name> <name><surname>Girshick</surname> <given-names>R</given-names></name> <name><surname>Sun</surname> <given-names>J</given-names></name></person-group>. <article-title>Faster R-CNN: Towards real-time object detection with region proposal networks</article-title>. <source>Adv Neural Inf Process Syst.</source> (<year>2015</year>) <volume>28</volume>:<fpage>91</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2016.2577031</pub-id><pub-id pub-id-type="pmid">27295650</pub-id></citation></ref>
<ref id="B59">
<label>59.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jain</surname> <given-names>AK</given-names></name> <name><surname>Chen</surname> <given-names>H</given-names></name></person-group>. <article-title>Matching of dental X-ray images for human identification</article-title>. <source>Pattern Recognit.</source> (<year>2004</year>) <volume>37</volume>:<fpage>1519</fpage>&#x02013;<lpage>32</lpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2003.12.016</pub-id></citation>
</ref>
<ref id="B60">
<label>60.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fahmy</surname> <given-names>GF</given-names></name> <name><surname>Nassar</surname> <given-names>DEM</given-names></name> <name><surname>Said</surname> <given-names>EH</given-names></name> <name><surname>Chen</surname> <given-names>H</given-names></name> <name><surname>Nomir</surname> <given-names>O</given-names></name> <name><surname>Zhou</surname> <given-names>J</given-names></name> <etal/></person-group>. <article-title>Toward an automated dental identification system</article-title>. <source>J Electron Imaging.</source> (<year>2005</year>) <volume>14</volume>:<fpage>043018</fpage>. <pub-id pub-id-type="doi">10.1117/1.2135310</pub-id></citation></ref>
<ref id="B61">
<label>61.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>J</given-names></name> <name><surname>Abdel-Mottaleb</surname> <given-names>M</given-names></name></person-group>. <article-title>A content-based system for human identification based on bitewing dental X-ray images</article-title>. <source>Pattern Recognit.</source> (<year>2005</year>) <volume>38</volume>:<fpage>2132</fpage>&#x02013;<lpage>42</lpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2005.01.011</pub-id></citation>
</ref>
<ref id="B62">
<label>62.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nomir</surname> <given-names>O</given-names></name> <name><surname>Abdel-Mottaleb</surname> <given-names>M</given-names></name></person-group>. <article-title>A system for human identification from X-ray dental radiographs</article-title>. <source>Pattern Recognit.</source> (<year>2005</year>) <volume>38</volume>:<fpage>1295</fpage>&#x02013;<lpage>305</lpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2004.12.010</pub-id></citation></ref>
<ref id="B63">
<label>63.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mahoor</surname> <given-names>MH</given-names></name> <name><surname>Abdel-Mottaleb</surname> <given-names>M</given-names></name></person-group>. <article-title>Classification and numbering of teeth in dental bitewing images</article-title>. <source>Pattern Recognit.</source> (<year>2005</year>) <volume>38</volume>:<fpage>577</fpage>&#x02013;<lpage>86</lpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2004.08.012</pub-id></citation>
</ref>
<ref id="B64">
<label>64.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>PL</given-names></name> <name><surname>Lai</surname> <given-names>YH</given-names></name> <name><surname>Huang</surname> <given-names>PW</given-names></name></person-group>. <article-title>An effective classification and numbering system for dental bitewing radiographs using teeth region and contour information</article-title>. <source>Pattern Recognit.</source> (<year>2010</year>) <volume>43</volume>:<fpage>1380</fpage>&#x02013;<lpage>92</lpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2009.10.005</pub-id></citation>
</ref>
<ref id="B65">
<label>65.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Silva</surname> <given-names>B</given-names></name> <name><surname>Pinheiro</surname> <given-names>L</given-names></name> <name><surname>Oliveira</surname> <given-names>L</given-names></name> <name><surname>Pithon</surname> <given-names>M</given-names></name></person-group>. <article-title>A study on tooth segmentation and numbering using end-to-end deep neural networks</article-title>. In: <source>2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)</source>. <publisher-name>IEEE</publisher-name> (<year>2020</year>). p. <fpage>164</fpage>&#x02013;<lpage>71</lpage>. <pub-id pub-id-type="doi">10.1109/SIBGRAPI51738.2020.00030</pub-id></citation></ref>
<ref id="B66">
<label>66.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lurie</surname> <given-names>A</given-names></name> <name><surname>Tosoni</surname> <given-names>GM</given-names></name> <name><surname>Tsimikas</surname> <given-names>J</given-names></name> <name><surname>Walker Jr</surname> <given-names>F</given-names></name></person-group>. <article-title>Recursive hierarchic segmentation analysis of bone mineral density changes on digital panoramic images</article-title>. <source>Oral Surg Oral Med Oral Pathol Oral Radiol.</source> (<year>2012</year>) <volume>113</volume>:<fpage>549</fpage>&#x02013;<lpage>58.e1</lpage>. <pub-id pub-id-type="doi">10.1016/j.oooo.2011.10.002</pub-id><pub-id pub-id-type="pmid">22668434</pub-id></citation></ref>
<ref id="B67">
<label>67.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Tikhe</surname> <given-names>SV</given-names></name> <name><surname>Naik</surname> <given-names>AM</given-names></name> <name><surname>Bhide</surname> <given-names>SD</given-names></name> <name><surname>Saravanan</surname> <given-names>T</given-names></name> <name><surname>Kaliyamurthie</surname> <given-names>KP</given-names></name></person-group>. <article-title>Algorithm to identify enamel caries and interproximal caries using dental digital radiographs</article-title>. In: <source>2016 IEEE 6th International Conference on Advanced Computing (IACC)</source>. <publisher-name>IEEE</publisher-name> (<year>2016</year>). p. <fpage>225</fpage>&#x02013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1109/IACC.2016.50</pub-id></citation></ref>
<ref id="B68">
<label>68.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tuan</surname> <given-names>TM</given-names></name></person-group>. <article-title>A cooperative semi-supervised fuzzy clustering framework for dental X-ray image segmentation</article-title>. <source>Expert Syst Appl.</source> (<year>2016</year>) <volume>46</volume>:<fpage>380</fpage>&#x02013;<lpage>93</lpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2015.11.001</pub-id></citation>
</ref>
<ref id="B69">
<label>69.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Trivedi</surname> <given-names>DN</given-names></name> <name><surname>Kothari</surname> <given-names>AM</given-names></name> <name><surname>Shah</surname> <given-names>S</given-names></name> <name><surname>Nikunj</surname> <given-names>S</given-names></name></person-group>. <article-title>Dental image matching by Canny algorithm for human identification</article-title>. <source>Int J Adv Comput Res.</source> (<year>2014</year>) <volume>4</volume>:<fpage>985</fpage>.</citation>
</ref>
<ref id="B70">
<label>70.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wirtz</surname> <given-names>A</given-names></name> <name><surname>Mirashi</surname> <given-names>SG</given-names></name> <name><surname>Wesarg</surname> <given-names>S</given-names></name></person-group>. <article-title>Automatic teeth segmentation in panoramic X-ray images using a coupled shape model in combination with a neural network</article-title>. In: <source>International conference on medical image computing and computer-assisted intervention</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>2018</year>). p. <fpage>712</fpage>&#x02013;<lpage>9</lpage>.</citation>
</ref>
<ref id="B71">
<label>71.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Koch</surname> <given-names>TL</given-names></name> <name><surname>Perslev</surname> <given-names>M</given-names></name> <name><surname>Igel</surname> <given-names>C</given-names></name> <name><surname>Brandt</surname> <given-names>SS</given-names></name></person-group>. <article-title>Accurate segmentation of dental panoramic radiographs with U-Nets</article-title>. In: <source>2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019)</source>. <publisher-loc>Venice</publisher-loc>: <publisher-name>IEEE</publisher-name> (<year>2019</year>). p. <fpage>15</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1109/ISBI.2019.8759563</pub-id></citation></ref>
<ref id="B72">
<label>72.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sivagami</surname> <given-names>S</given-names></name> <name><surname>Chitra</surname> <given-names>P</given-names></name> <name><surname>Kailash</surname> <given-names>GSR</given-names></name> <name><surname>Muralidharan</surname> <given-names>SR</given-names></name></person-group>. <article-title>Unet architecture based dental panoramic image segmentation</article-title>. In: <source>2020 International Conference on Wireless Communications Signal Processing and Networking (WiSPNET)</source>. <publisher-name>IEEE</publisher-name> (<year>2020</year>). p. <fpage>187</fpage>&#x02013;<lpage>91</lpage>.</citation>
</ref>
<ref id="B73">
<label>73.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Choi</surname> <given-names>J</given-names></name> <name><surname>Eun</surname> <given-names>H</given-names></name> <name><surname>Kim</surname> <given-names>C</given-names></name></person-group>. <article-title>Boosting proximal dental caries detection <italic>via</italic> combination of variational methods and convolutional neural network</article-title>. <source>J Signal Process Syst.</source> (<year>2018</year>) <volume>90</volume>:<fpage>87</fpage>&#x02013;<lpage>97</lpage>. <pub-id pub-id-type="doi">10.1007/s11265-016-1214-6</pub-id></citation>
</ref>
<ref id="B74">
<label>74.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cui</surname> <given-names>W</given-names></name> <name><surname>Zeng</surname> <given-names>L</given-names></name> <name><surname>Chong</surname> <given-names>B</given-names></name> <name><surname>Zhang</surname> <given-names>Q</given-names></name></person-group>. <article-title>Toothpix: pixel-level tooth segmentation in panoramic X-Ray images based on generative adversarial networks</article-title>. In: <source>2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI).</source> <publisher-name>IEEE</publisher-name> (<year>2021</year>). p. <fpage>1346</fpage>&#x02013;<lpage>50</lpage>.</citation>
</ref>
<ref id="B75">
<label>75.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zakirov</surname> <given-names>A</given-names></name> <name><surname>Ezhov</surname> <given-names>M</given-names></name> <name><surname>Gusarev</surname> <given-names>M</given-names></name> <name><surname>Alexandrovsky</surname> <given-names>V</given-names></name> <name><surname>Shumilov</surname> <given-names>E</given-names></name></person-group>. <article-title>Dental pathology detection in 3D cone-beam CT</article-title>. <source>arXiv [Preprint]</source>. arXiv:<fpage>1810.10309</fpage> (<year>2018</year>).</citation>
</ref>
<ref id="B76">
<label>76.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>Y</given-names></name> <name><surname>Du</surname> <given-names>H</given-names></name> <name><surname>Yun</surname> <given-names>Z</given-names></name> <name><surname>Yang</surname> <given-names>S</given-names></name> <name><surname>Dai</surname> <given-names>Z</given-names></name> <name><surname>Zhong</surname> <given-names>L</given-names></name> <etal/></person-group>. <article-title>Automatic segmentation of individual tooth in dental CBCT images from tooth surface map by a multi-task FCN</article-title>. <source>IEEE Access.</source> (<year>2020</year>) <volume>8</volume>:<fpage>97296</fpage>&#x02013;<lpage>309</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2020.2991799</pub-id></citation></ref>
<ref id="B77">
<label>77.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>S</given-names></name> <name><surname>Woo</surname> <given-names>S</given-names></name> <name><surname>Yu</surname> <given-names>J</given-names></name> <name><surname>Seo</surname> <given-names>J</given-names></name> <name><surname>Lee</surname> <given-names>J</given-names></name> <name><surname>Lee</surname> <given-names>C</given-names></name></person-group>. <article-title>Automated CNN-Based tooth segmentation in cone-beam CT for dental implant planning</article-title>. <source>IEEE Access.</source> (<year>2020</year>) <volume>8</volume>:<fpage>50507</fpage>&#x02013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2020.2975826</pub-id></citation></ref>
<ref id="B78">
<label>78.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rao</surname> <given-names>Y</given-names></name> <name><surname>Wang</surname> <given-names>Y</given-names></name> <name><surname>Meng</surname> <given-names>F</given-names></name> <name><surname>Pu</surname> <given-names>J</given-names></name> <name><surname>Sun</surname> <given-names>J</given-names></name> <name><surname>Wang</surname> <given-names>Q</given-names></name> <etal/></person-group>. <article-title>Symmetric fully convolutional residual network with DCRF for accurate tooth segmentation</article-title>. <source>IEEE Access.</source> (<year>2020</year>) <volume>8</volume>:<fpage>92028</fpage>&#x02013;<lpage>38</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2020.2994592</pub-id></citation></ref>
<ref id="B79">
<label>79.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ezhov</surname> <given-names>M</given-names></name> <name><surname>Zakirov</surname> <given-names>A</given-names></name> <name><surname>Gusarev</surname> <given-names>M</given-names></name></person-group>. <article-title>Coarse-to-fine volumetric segmentation of teeth in cone-beam CT</article-title>. In: <source>2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019)</source>. <publisher-loc>Venice</publisher-loc>: <publisher-name>IEEE</publisher-name> (<year>2019</year>). p. <fpage>52</fpage>&#x02013;<lpage>6</lpage>.</citation>
</ref>
<ref id="B80">
<label>80.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zanjani</surname> <given-names>FG</given-names></name> <name><surname>Moin</surname> <given-names>DA</given-names></name> <name><surname>Verheij</surname> <given-names>B</given-names></name> <name><surname>Claessen</surname> <given-names>F</given-names></name> <name><surname>Cherici</surname> <given-names>T</given-names></name> <name><surname>Tan</surname> <given-names>T</given-names></name></person-group>. <article-title>Deep learning approach to semantic segmentation in 3d point cloud intra-oral scans of teeth</article-title>. In: <source>International Conference on Medical Imaging with Deep Learning</source>. <publisher-name>PMLR</publisher-name> (<year>2019</year>). p. <fpage>557</fpage>&#x02013;<lpage>71</lpage>.</citation>
</ref>
<ref id="B81">
<label>81.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Jader</surname> <given-names>G</given-names></name> <name><surname>Fontineli</surname> <given-names>J</given-names></name> <name><surname>Ruiz</surname> <given-names>M</given-names></name> <name><surname>Abdalla</surname> <given-names>K</given-names></name> <name><surname>Pithon</surname> <given-names>M</given-names></name> <name><surname>Oliveira</surname> <given-names>L</given-names></name></person-group>. <article-title>Deep instance segmentation of teeth in panoramic X-ray images</article-title>. In: <source>2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)</source>. <publisher-name>IEEE</publisher-name> (<year>2018</year>). p. <fpage>400</fpage>&#x02013;<lpage>7</lpage>.</citation>
</ref>
<ref id="B82">
<label>82.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gurses</surname> <given-names>A</given-names></name> <name><surname>Oktay</surname> <given-names>AB</given-names></name></person-group>. <article-title>Human identification with panoramic dental images using Mask R-CNN and SURF</article-title>. In: <source>2020 5th International Conference on Computer Science and Engineering (UBMK)</source>. <publisher-name>IEEE</publisher-name> (<year>2020</year>). p. <fpage>232</fpage>&#x02013;<lpage>7</lpage>.</citation>
</ref>
<ref id="B83">
<label>83.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>X</given-names></name> <name><surname>Chen</surname> <given-names>H</given-names></name> <name><surname>Huang</surname> <given-names>Y</given-names></name> <name><surname>Guo</surname> <given-names>H</given-names></name> <name><surname>Qiu</surname> <given-names>T</given-names></name> <name><surname>Wang</surname> <given-names>L</given-names></name></person-group>. <article-title>Center-sensitive and boundary-aware tooth instance segmentation and classification from cone-beam CT</article-title>. In: <source>2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI)</source>. <publisher-name>IEEE</publisher-name> (<year>2020</year>). p. <fpage>939</fpage>&#x02013;<lpage>42</lpage>.</citation>
</ref>
<ref id="B84">
<label>84.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cui</surname> <given-names>Z</given-names></name> <name><surname>Li</surname> <given-names>C</given-names></name> <name><surname>Wang</surname> <given-names>W</given-names></name></person-group>. <article-title>ToothNet: automatic tooth instance segmentation and identification from cone beam CT images</article-title>. In: <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>. (<year>2019</year>). p. <fpage>6368</fpage>&#x02013;<lpage>77</lpage>.</citation>
</ref>
<ref id="B85">
<label>85.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zanjani</surname> <given-names>FG</given-names></name> <name><surname>Pourtaherian</surname> <given-names>A</given-names></name> <name><surname>Zinger</surname> <given-names>S</given-names></name> <name><surname>Moin</surname> <given-names>DA</given-names></name> <name><surname>Claessen</surname> <given-names>F</given-names></name> <name><surname>Cherici</surname> <given-names>T</given-names></name> <etal/></person-group>. <article-title>Mask-MCNet: tooth instance segmentation in 3D point clouds of intra-oral scans</article-title>. <source>Neurocomputing.</source> (<year>2021</year>) <volume>453</volume>:<fpage>286</fpage>&#x02013;<lpage>98</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2020.06.145</pub-id></citation>
</ref>
<ref id="B86">
<label>86.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Silva</surname> <given-names>G</given-names></name> <name><surname>Oliveira</surname> <given-names>L</given-names></name> <name><surname>Pithon</surname> <given-names>M</given-names></name></person-group>. <article-title>Automatic segmenting teeth in X-ray images: trends, a novel data set, benchmarking and future perspectives</article-title>. <source>Expert Syst Appl.</source> (<year>2018</year>) <volume>107</volume>:<fpage>15</fpage>&#x02013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2018.04.001</pub-id></citation>
</ref>
<ref id="B87">
<label>87.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kong</surname> <given-names>Z</given-names></name> <name><surname>Xiong</surname> <given-names>F</given-names></name> <name><surname>Zhang</surname> <given-names>C</given-names></name> <name><surname>Fu</surname> <given-names>Z</given-names></name> <name><surname>Zhang</surname> <given-names>M</given-names></name> <name><surname>Weng</surname> <given-names>J</given-names></name> <etal/></person-group>. <article-title>Automated maxillofacial segmentation in panoramic dental x-ray images using an efficient encoder-decoder network</article-title>. <source>IEEE Access.</source> (<year>2020</year>) <volume>8</volume>:<fpage>207822</fpage>&#x02013;<lpage>33</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2020.3037677</pub-id></citation></ref>
<ref id="B88">
<label>88.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>H</given-names></name> <name><surname>Zhou</surname> <given-names>J</given-names></name> <name><surname>Zhou</surname> <given-names>Y</given-names></name> <name><surname>Chen</surname> <given-names>J</given-names></name> <name><surname>Gao</surname> <given-names>F</given-names></name> <name><surname>Xu</surname> <given-names>Y</given-names></name> <etal/></person-group>. <article-title>Automatic and interpretable model for periodontitis diagnosis in panoramic radiographs</article-title>. In: <source>International Conference on Medical Image Computing and Computer-Assisted Intervention</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>2020</year>). p. <fpage>454</fpage>&#x02013;<lpage>63</lpage>.</citation>
</ref>
<ref id="B89">
<label>89.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Egger</surname> <given-names>J</given-names></name> <name><surname>Pfarrkirchner</surname> <given-names>B</given-names></name> <name><surname>Gsaxner</surname> <given-names>C</given-names></name> <name><surname>Lindner</surname> <given-names>L</given-names></name> <name><surname>Schmalstieg</surname> <given-names>D</given-names></name> <name><surname>Wallner</surname> <given-names>J</given-names></name></person-group>. <article-title>Fully convolutional mandible segmentation on a valid ground-truth dataset</article-title>. In: <source>2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)</source>. <publisher-loc>Honolulu, HI</publisher-loc>: <publisher-name>IEEE</publisher-name> (<year>2018</year>). p. <fpage>656</fpage>&#x02013;<lpage>60</lpage>. <pub-id pub-id-type="pmid">30440482</pub-id></citation></ref>
<ref id="B90">
<label>90.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>J</given-names></name> <name><surname>Liu</surname> <given-names>M</given-names></name> <name><surname>Wang</surname> <given-names>L</given-names></name> <name><surname>Chen</surname> <given-names>S</given-names></name> <name><surname>Yuan</surname> <given-names>P</given-names></name> <name><surname>Li</surname> <given-names>J</given-names></name> <etal/></person-group>. <article-title>Joint craniomaxillofacial bone segmentation and landmark digitization by context-guided fully convolutional networks</article-title>. In: <source>International Conference on Medical Image Computing and Computer-Assisted Intervention</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>2017</year>). p. <fpage>720</fpage>&#x02013;<lpage>8</lpage>. <pub-id pub-id-type="pmid">31816592</pub-id></citation></ref>
<ref id="B91">
<label>91.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Torosdagli</surname> <given-names>N</given-names></name> <name><surname>Liberton</surname> <given-names>DK</given-names></name> <name><surname>Verma</surname> <given-names>P</given-names></name> <name><surname>Sincan</surname> <given-names>M</given-names></name> <name><surname>Lee</surname> <given-names>JS</given-names></name> <name><surname>Bagci</surname> <given-names>U</given-names></name></person-group>. <article-title>Deep geodesic learning for segmentation and anatomical landmarking</article-title>. <source>IEEE Trans Med Imaging.</source> (<year>2018</year>) <volume>38</volume>:<fpage>919</fpage>&#x02013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1109/TMI.2018.2875814</pub-id><pub-id pub-id-type="pmid">30334750</pub-id></citation></ref>
<ref id="B92">
<label>92.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lian</surname> <given-names>C</given-names></name> <name><surname>Wang</surname> <given-names>F</given-names></name> <name><surname>Deng</surname> <given-names>HH</given-names></name> <name><surname>Wang</surname> <given-names>L</given-names></name> <name><surname>Xiao</surname> <given-names>D</given-names></name> <name><surname>Kuang</surname> <given-names>T</given-names></name> <etal/></person-group>. <article-title>Multi-task dynamic transformer network for concurrent bone segmentation and large-scale landmark localization with dental CBCT</article-title>. In: <source>International Conference on Medical Image Computing and Computer-Assisted Intervention</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>2020</year>). p. <fpage>807</fpage>&#x02013;<lpage>16</lpage>.</citation>
</ref>
</ref-list>
<app-group>
<app id="A1">
<title>Appendix</title>
<table-wrap position="float" id="T7">
<caption><p>Code and data availability for the studies reviewed.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Study</bold></th>
<th valign="top" align="left"><bold>Algorithm</bold></th>
<th valign="top" align="left"><bold>Code</bold></th>
<th valign="top" align="left"><bold>Data</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Wirtz (<xref ref-type="bibr" rid="B70">70</xref>)</td>
<td valign="top" align="left">UNet</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/zhixuhao/unet">https://github.com/zhixuhao/unet</ext-link></td>
<td valign="top" align="left">Their own dataset</td>
</tr>
<tr>
<td valign="top" align="left">Koch (<xref ref-type="bibr" rid="B71">71</xref>)</td>
<td valign="top" align="left">UNet</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/zhixuhao/unet">https://github.com/zhixuhao/unet</ext-link></td>
<td valign="top" align="left">The dataset created by Gil Silva</td>
</tr>
<tr>
<td valign="top" align="left">Sivagami (<xref ref-type="bibr" rid="B72">72</xref>)</td>
<td valign="top" align="left">UNet</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/zhixuhao/unet">https://github.com/zhixuhao/unet</ext-link></td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/IvisionLab/deep-dental-image">https://github.com/IvisionLab/deep-dental-image</ext-link></td>
</tr>
<tr>
<td valign="top" align="left">Choi (<xref ref-type="bibr" rid="B73">73</xref>)</td>
<td valign="top" align="left">FCN</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/shelhamer/fcn.berkeleyvision.org">https://github.com/shelhamer/fcn.berkeleyvision.org</ext-link></td>
<td valign="top" align="left">Their own dataset</td>
</tr>
<tr>
<td valign="top" align="left">Cui (<xref ref-type="bibr" rid="B74">74</xref>)</td>
<td valign="top" align="left">ToothPix</td>
<td valign="top" align="left">Not available</td>
<td valign="top" align="left">lndb dental dataset: <ext-link ext-link-type="uri" xlink:href="https://github.com/IvisionLab/dental-image">https://github.com/IvisionLab/dental-image</ext-link></td>
</tr>
<tr>
<td valign="top" align="left">Zakirov (<xref ref-type="bibr" rid="B75">75</xref>)</td>
<td valign="top" align="left">VNet</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/mattmacy/vnet.pytorch">https://github.com/mattmacy/vnet.pytorch</ext-link></td>
<td valign="top" align="left">Their own dataset</td>
</tr>
<tr>
<td valign="top" align="left">Chen (<xref ref-type="bibr" rid="B76">76</xref>)</td>
<td valign="top" align="left">FCN&#x0002B;MWT</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/shelhamer/fcn.berkeleyvision.org">https://github.com/shelhamer/fcn.berkeleyvision.org</ext-link></td>
<td valign="top" align="left">Their own dataset</td>
</tr>
<tr>
<td valign="top" align="left">Lee (<xref ref-type="bibr" rid="B77">77</xref>)</td>
<td valign="top" align="left">CNN</td>
<td valign="top" align="left">Not available</td>
<td valign="top" align="left">Their own dataset</td>
</tr>
<tr>
<td valign="top" align="left">Rao (<xref ref-type="bibr" rid="B78">78</xref>)</td>
<td valign="top" align="left">UNet&#x0002B;DCRF</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/zhixuhao/unet">https://github.com/zhixuhao/unet</ext-link></td>
<td valign="top" align="left">Their own dataset</td>
</tr>
<tr>
<td valign="top" align="left">Ezhov (<xref ref-type="bibr" rid="B79">79</xref>)</td>
<td valign="top" align="left">VNet</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/mattmacy/vnet.pytorch">https://github.com/mattmacy/vnet.pytorch</ext-link></td>
<td valign="top" align="left">Their own dataset</td>
</tr>
<tr>
<td valign="top" align="left">Zanjani (<xref ref-type="bibr" rid="B80">80</xref>)</td>
<td valign="top" align="left">PointCNN</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/yangyanli/PointCNN">https://github.com/yangyanli/PointCNN</ext-link></td>
<td valign="top" align="left">Their own dataset</td>
</tr>
<tr>
<td valign="top" align="left">Jader (<xref ref-type="bibr" rid="B81">81</xref>)</td>
<td valign="top" align="left">Mask RCNN</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/matterport/Mask_RCNN">https://github.com/matterport/Mask_RCNN</ext-link></td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/IvisionLab/deep-dental-image">https://github.com/IvisionLab/deep-dental-image</ext-link></td>
</tr>
<tr>
<td valign="top" align="left">Silva (<xref ref-type="bibr" rid="B65">65</xref>)</td>
<td valign="top" align="left">Mask RCNN, HTC, ResNeSt, PANet (best)</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/openmmlab/mmdetection">https://github.com/openmmlab/mmdetection</ext-link></td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/IvisionLab/deep-dental-image">https://github.com/IvisionLab/deep-dental-image</ext-link></td>
</tr>
<tr>
<td valign="top" align="left">Gurses (<xref ref-type="bibr" rid="B82">82</xref>)</td>
<td valign="top" align="left">Mask RCNN&#x0002B; SURF</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/openmmlab/mmdetection">https://github.com/openmmlab/mmdetection</ext-link></td>
<td valign="top" align="left">DS1: <ext-link ext-link-type="uri" xlink:href="https://github.com/IvisionLab/deep-dental-image">https://github.com/IvisionLab/deep-dental-image</ext-link>, DS2: their own data set</td>
</tr>
<tr>
<td valign="top" align="left">Wu (<xref ref-type="bibr" rid="B83">83</xref>)</td>
<td valign="top" align="left">GH &#x0002B; BADice-DenseASPP-UNet &#x0002B; LO</td>
<td valign="top" align="left">Not available</td>
<td valign="top" align="left">Their own dataset</td>
</tr>
<tr>
<td valign="top" align="left">Cui (<xref ref-type="bibr" rid="B84">84</xref>)</td>
<td valign="top" align="left">ToothNet</td>
<td valign="top" align="left">Not available</td>
<td valign="top" align="left">Their own dataset</td>
</tr>
<tr>
<td valign="top" align="left">Zanjani (<xref ref-type="bibr" rid="B85">85</xref>)</td>
<td valign="top" align="left">Mask-MCNet</td>
<td valign="top" align="left">Not available</td>
<td valign="top" align="left">Their own dataset</td>
</tr>
<tr>
<td valign="top" align="left">Kong (<xref ref-type="bibr" rid="B87">87</xref>)</td>
<td valign="top" align="left">UNet</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/zhixuhao/unet">https://github.com/zhixuhao/unet</ext-link></td>
<td valign="top" align="left">Their own dataset</td>
</tr>
<tr>
<td valign="top" align="left">Li (<xref ref-type="bibr" rid="B88">88</xref>)</td>
<td valign="top" align="left">Deetal-Perio (based-Mask RCNN)</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/matterport/Mask_RCNN">https://github.com/matterport/Mask_RCNN</ext-link></td>
<td valign="top" align="left">Suzhou Dataset and Zhongshan Dataset</td>
</tr>
<tr>
<td valign="top" align="left">Egger (<xref ref-type="bibr" rid="B89">89</xref>)</td>
<td valign="top" align="left">FCN-32s, FCN-16s, FCN-8s (best)</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/shelhamer/fcn.berkeleyvision.org">https://github.com/shelhamer/fcn.berkeleyvision.org</ext-link></td>
<td valign="top" align="left">Their own dataset</td>
</tr>
<tr>
<td valign="top" align="left">Zhang (<xref ref-type="bibr" rid="B90">90</xref>)</td>
<td valign="top" align="left">UNet</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/zhixuhao/unet">https://github.com/zhixuhao/unet</ext-link></td>
<td valign="top" align="left">Their own dataset</td>
</tr>
<tr>
<td valign="top" align="left">Torosdagli (<xref ref-type="bibr" rid="B91">91</xref>)</td>
<td valign="top" align="left">Tiramisu (based on UNet and DenseNet)</td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://github.com/zhixuhao/unet">https://github.com/zhixuhao/unet</ext-link><break/> <ext-link ext-link-type="uri" xlink:href="https://github.com/liuzhuang13/DenseNet">https://github.com/liuzhuang13/DenseNet</ext-link></td>
<td valign="top" align="left"><ext-link ext-link-type="uri" xlink:href="https://www.aicrowd.com/challenges/miccai-2021-hecktor">https://www.aicrowd.com/challenges/miccai-2021-hecktor</ext-link></td>
</tr>
<tr>
<td valign="top" align="left">Lian (<xref ref-type="bibr" rid="B92">92</xref>)</td>
<td valign="top" align="left">DTNet</td>
<td valign="top" align="left">Not available</td>
<td valign="top" align="left">Their own dataset</td>
</tr>
</tbody>
</table>
</table-wrap>
</app>
</app-group>
</back>
</article>