<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Med.</journal-id>
<journal-title>Frontiers in Medicine</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Med.</abbrev-journal-title>
<issn pub-type="epub">2296-858X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fmed.2023.1190659</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Medicine</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>MAS-UNet: a U-shaped network for prostate segmentation</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Hong</surname> <given-names>YuQi</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/2232373/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Qiu</surname> <given-names>Zhao</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Chen</surname> <given-names>Huajing</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Zhu</surname> <given-names>Bing</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Lei</surname> <given-names>Haodong</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>School of Computer Science and Technology, Hainan University</institution>, <addr-line>Haikou</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>Hainan Provincial Public Security Department</institution>, <addr-line>Haikou</addr-line>, <country>China</country></aff>
<aff id="aff3"><sup>3</sup><institution>Haikou Hospital of the Maternal and Child Health</institution>, <addr-line>Haikou</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Liang Zhao, Dalian University of Technology, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Weiyao Lan, Xiamen University, China; Junyong Ye, Chongqing University, China; Chongwen Wang, Beijing Institute of Technology, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Zhao Qiu <email>qiuzhao&#x00040;hainanu.edu.cn</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>18</day>
<month>05</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>10</volume>
<elocation-id>1190659</elocation-id>
<history>
<date date-type="received">
<day>21</day>
<month>03</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>25</day>
<month>04</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2023 Hong, Qiu, Chen, Zhu and Lei.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Hong, Qiu, Chen, Zhu and Lei</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract>
<p>Prostate cancer is a common disease that seriously endangers the health of middle-aged and elderly men. MRI is the gold standard for assessing the health status of the prostate region, and segmentation of the prostate region is of great significance for the diagnosis of prostate cancer. Several methods have been used to segment the prostate region, but segmentation accuracy still has room for improvement. This study proposes a new image segmentation model based on Attention UNet. The model improves Attention UNet by using GN instead of BN, adding dropout to prevent overfitting, introducing the ASPP module, adding channel attention to the attention gate module, and using different channels to output the segmentation results of different prostate regions. Finally, we conducted comparative experiments with five existing UNet-based models, using the dice coefficient as the metric to evaluate the segmentation results. The proposed model achieves dice scores of 0.807 and 0.907 in the peripheral zone and the transition zone, respectively. The experimental results show that the proposed model outperforms the other UNet-based models.</p></abstract>
<kwd-group>
<kwd>UNet</kwd>
<kwd>attention gate</kwd>
<kwd>ASPP</kwd>
<kwd>prostate</kwd>
<kwd>channel attention</kwd>
<kwd>spatial attention</kwd>
</kwd-group>
<contract-sponsor id="cn001">Hainan Provincial Department of Science and Technology<named-content content-type="fundref-id">10.13039/501100008111</named-content></contract-sponsor>
<contract-sponsor id="cn002">Education Department of Hainan Province<named-content content-type="fundref-id">10.13039/501100010834</named-content></contract-sponsor>
<contract-sponsor id="cn003">Natural Science Foundation of Hainan Province<named-content content-type="fundref-id">10.13039/501100004761</named-content></contract-sponsor>
<counts>
<fig-count count="7"/>
<table-count count="6"/>
<equation-count count="13"/>
<ref-count count="27"/>
<page-count count="10"/>
<word-count count="4927"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Precision Medicine</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1. Introduction</title>
<p>According to statistics from the National Cancer Institute, there were an estimated 161,360 new cases of prostate cancer and 26,730 prostate cancer-related deaths in the United States in 2017, indicating that prostate cancer remains a major threat to men&#x00027;s health. Effective segmentation of the prostate and its different regions helps to predict the pathological stage and to assess the therapeutic effect (<xref ref-type="bibr" rid="B1">1</xref>). Compared with CT, magnetic resonance (MR) imaging causes no harm to the human body and offers better tissue contrast and resolution (<xref ref-type="bibr" rid="B2">2</xref>). On account of these advantages, it has become the mainstream imaging method for evaluating the prostate region (<xref ref-type="bibr" rid="B3">3</xref>).</p>
<p>Segmentation of the prostate region in MR images is ordinarily performed by radiologists based on visual examination of the image slices. Manual segmentation requires considerable expertise and full concentration; it is time-consuming and prone to intra- and inter-operator variability, which makes it unsuitable for segmenting a large number of samples. Therefore, there is an urgent need for reliable automatic segmentation methods for prostate MRI. However, segmentation of the prostate region is quite challenging because the size and shape of the gland in prostate MRI often vary considerably. In addition, the heterogeneity of the signal intensity around the rectal coil, the low contrast between the gland and adjacent structures, and the anisotropic spatial resolution add to the difficulty of prostate segmentation (<xref ref-type="bibr" rid="B4">4</xref>, <xref ref-type="bibr" rid="B5">5</xref>).</p>
<p>Automatic segmentation of prostate regions has long been a research topic. In recent years, with the improvement of hardware performance and the continuous development of deep learning, methods based on convolutional neural networks (CNNs) have gradually replaced traditional methods. Because deep learning methods can learn complex features and classify pixels accurately (<xref ref-type="bibr" rid="B6">6</xref>), their segmentation results are generally better than those of traditional methods. Several deep learning-based methods have been proposed for prostate segmentation, such as the classical U-Net (<xref ref-type="bibr" rid="B7">7</xref>) model, which forms the basis of much recent research, as well as the MultiResU-Net (<xref ref-type="bibr" rid="B8">8</xref>), density-UNet (<xref ref-type="bibr" rid="B9">9</xref>), and Attention UNet (<xref ref-type="bibr" rid="B10">10</xref>) models. Though these models have achieved decent results in prostate segmentation, there is still room for further improvement.</p>
<p>In view of the above problems, we proposed a U-shaped structure network for prostate region segmentation. Our main contributions in this study are as follows:</p>
<list list-type="order">
<list-item><p>Based on Attention U-Net, this study adds channel attention to the network to further clarify the relative importance of channels, so that the network ignores secondary information, focuses more on important channels, and extracts features better.</p></list-item>
<list-item><p>This study introduces an ASPP structure at the end of the encoder in the U-shaped structure network.</p></list-item>
<list-item><p>In order to reduce the hardware requirements of model training and allow the model to achieve better performance than previous models when the batch size is small, this study uses GN to replace the commonly used BN.</p></list-item>
<list-item><p>In order to prevent overfitting, this study introduces dropout in the last downsampling stage of the encoder part of the network. Experiments show that it can effectively alleviate overfitting and further improve segmentation performance.</p></list-item>
<list-item><p>Through comparative experiments with UNet, Attention U-Net, UNet&#x0002B;&#x0002B;, R2Attention U-Net, and Res-UNet, this study shows that the proposed model performs better than these traditional models in prostate segmentation.</p></list-item>
</list></sec>
<sec id="s2">
<title>2. Related work</title>
<p>FCN (<xref ref-type="bibr" rid="B11">11</xref>) is a pioneering model for image segmentation, which makes full use of convolutions to extract features from images. On the basis of FCN, the classical encoder&#x02013;decoder model U-Net (<xref ref-type="bibr" rid="B7">7</xref>) was proposed for medical image segmentation and has achieved decent results on various segmentation tasks.</p>
<p>Most models for medical image segmentation are improved based on U-Net, such as UNet&#x0002B;&#x0002B; (<xref ref-type="bibr" rid="B12">12</xref>), Attention-UNet (<xref ref-type="bibr" rid="B10">10</xref>), Res-UNet (<xref ref-type="bibr" rid="B13">13</xref>), Dense-UNet (<xref ref-type="bibr" rid="B14">14</xref>), SA-Net (<xref ref-type="bibr" rid="B15">15</xref>), Bio-Net (<xref ref-type="bibr" rid="B16">16</xref>), and MRF-UNet (<xref ref-type="bibr" rid="B17">17</xref>). UNet&#x0002B;&#x0002B; replaces the cropping and concatenation operations of the U-Net skip connections with convolution operations, obtaining better feature information and making up for the information loss caused by sampling. Attention-UNet uses attention gates to give more weight to the key regions of the feature map and make the network focus more on the target. In order to further reduce the loss of information and improve performance, Res-UNet and Dense-UNet use the residual block of ResNet (<xref ref-type="bibr" rid="B18">18</xref>) and the dense block of DenseNet (<xref ref-type="bibr" rid="B19">19</xref>), respectively, instead of ordinary convolutions. SA-Net is a lightweight network in which a spatial attention module is applied at the end of the encoder. Bio-Net adds backward skip connections to the network so that feature information from the decoder can be transmitted back to the encoder and aggregated with the feature information in the encoder. MRF-UNet combines UNet with Markov random fields and achieves better performance on out-of-distribution data than the original UNet.</p>
<p>The attention mechanism is widely used for natural image analysis, knowledge graphs, natural language processing, automatic image annotation (<xref ref-type="bibr" rid="B20">20</xref>), machine translation (<xref ref-type="bibr" rid="B21">21</xref>), and classification tasks (<xref ref-type="bibr" rid="B22">22</xref>). Trainable attention mechanisms are divided into hard attention and soft attention. The hard attention mechanism is usually non-differentiable and relies on reinforcement learning to update parameters, which makes model training more difficult. Ypsilantis and Montana (<xref ref-type="bibr" rid="B23">23</xref>) used recursive hard attention to detect abnormalities in chest X-ray scans. In contrast, the soft attention mechanism can be trained with standard backpropagation. For example, additive soft attention has been used for sentence translation (<xref ref-type="bibr" rid="B24">24</xref>) and image classification (<xref ref-type="bibr" rid="B22">22</xref>). Hu et al. (<xref ref-type="bibr" rid="B25">25</xref>) used channel attention to highlight important feature dimensions, which achieved the best performance in the ILSVRC 2017 image classification challenge. In addition, self-attention techniques have been proposed to remove the dependence on external gating information. For example, Wang et al. (<xref ref-type="bibr" rid="B26">26</xref>) used a non-local self-attention mechanism to capture long-range dependencies. Jetley et al. (<xref ref-type="bibr" rid="B22">22</xref>) used self-attention to perform class-specific pooling, obtaining more accurate and robust image classification performance.</p>
<p>Traditional DCNNs suffer from a series of problems related to upsampling and downsampling. On the one hand, internal data structure and spatial hierarchical information are lost due to pooling; on the other hand, information about small objects can be lost after downsampling and cannot be reconstructed. These problems are particularly significant in semantic segmentation, and dilated convolution was proposed to address them. Dilated convolution can enlarge the receptive field arbitrarily without introducing additional parameters, and a larger receptive field improves the recognition and segmentation of small objects in object detection and semantic segmentation tasks. The ASPP (<xref ref-type="bibr" rid="B27">27</xref>) module uses multiple parallel atrous (dilated) convolution layers with different dilation rates, which achieve decent results in many segmentation tasks.</p></sec>
<sec id="s3">
<title>3. Preliminaries</title>
<sec>
<title>3.1. UNet</title>
<p>The UNet network consists of an encoder and a decoder. The encoder part follows the classical structure of convolutional networks. Its convolution block consists of two repeated 3 &#x000D7; 3 convolutions, each followed by a ReLU activation function; a 2 &#x000D7; 2 max pooling operation with a stride of 2 is then used for downsampling. In each downsampling step, the number of feature channels is doubled. Each step in the decoder upsamples the feature maps by a factor of 2, and the number of feature map channels is halved using a 2 &#x000D7; 2 up-convolution (deconvolution). Feature maps from the encoder are passed directly to the decoder through skip connections. After concatenation, a convolution block reduces the number of channels. At the end of the decoder, a 1 &#x000D7; 1 convolution layer produces the output.</p>
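<p>As a minimal PyTorch sketch of one such encoder and decoder step (not the exact implementation used in this study; the padding choice and channel counts are assumptions):</p>
<preformat>import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 convolutions, each followed by ReLU (the classic UNet block)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# one encoder step: double convolution, then 2x2 max pooling with stride 2
down = nn.Sequential(DoubleConv(64, 128), nn.MaxPool2d(kernel_size=2, stride=2))
# one decoder step: a 2x2 up-convolution doubles H and W and halves the channel count
up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
</preformat>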
<p>UNet obtains its energy function by combining the pixel-level softmax function calculated for the last layer of the feature map with the cross-entropy loss function. The definition of the softmax function is as follows:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mi>exp</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>/</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mi>&#x003A3;</mml:mi><mml:mrow><mml:msup><mml:mi>k</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:msubsup><mml:mi>exp</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:msup><mml:mi>k</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
<p>where <italic>a</italic><sub><italic>k</italic></sub>(<italic>x</italic>) denotes the activation in feature channel <italic>k</italic> at the pixel position <italic>x</italic> &#x02208; &#x003A9;, &#x003A9; &#x02282; &#x02124;<sup>2</sup>. <italic>K</italic> is the number of classes and <italic>p</italic><sub><italic>k</italic></sub>(<italic>x</italic>) is the approximated maximum-function. <italic>p</italic><sub><italic>k</italic></sub>(<italic>x</italic>) &#x02248; 1 for the <italic>k</italic> that has the maximum activation <italic>a</italic><sub><italic>k</italic></sub>(<italic>x</italic>), and <italic>p</italic><sub><italic>k</italic></sub>(<italic>x</italic>) &#x02248; 0 for all other <italic>k</italic>. The cross entropy then penalizes at each position the deviation of <italic>p</italic><sub><italic>l</italic>(<italic>x</italic>)</sub>(<italic>x</italic>) from 1 using:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder><mml:mrow><mml:mo>&#x003A3;</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>&#x003A9;</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mi>w</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo class="qopname">log</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>l</italic> : &#x003A9; &#x02192; {1, ..., <italic>K</italic>} is the true label of each pixel, and <italic>w</italic> : &#x003A9; &#x02192; &#x0211D; is a weight map which can make some pixels more important than the others while training (<xref ref-type="bibr" rid="B7">7</xref>).</p></sec>
<sec>
<title>3.2. Attention gate</title>
<p>Attention gate is a mechanism that can be merged into any existing CNN architecture. Let <inline-formula><mml:math id="M3"><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> be the activation map of the chosen layer <italic>l</italic> &#x02208; {1, ..., <italic>L</italic>}, where each <inline-formula><mml:math id="M4"><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> represents a pixel-by-pixel feature vector of length <italic>F</italic><sub><italic>l</italic></sub> (i.e., the number of feature-maps in layer l). For every <inline-formula><mml:math id="M5"><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, AG will calculate the coefficient <inline-formula><mml:math id="M6"><mml:msup><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, <inline-formula><mml:math id="M7"><mml:msubsup><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, in order to identify the key region of the feature map and only reserve the parts that are related to specific tasks. The output of the attention gate is:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M8"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>in which each vector is scaled by the corresponding attention coefficient (<xref ref-type="bibr" rid="B10">10</xref>).</p></sec></sec>
<sec id="s4">
<title>4. Materials</title>
<p>The prostate dataset used in this study is the Task05 prostate dataset of the MSD competition, including 48 sets of multimodal MRI data provided by Radboud University (Netherlands). Each set of data includes two modalities: a transverse T2-weighted scan (resolution 0.6 &#x000D7; 0.6 &#x000D7; 4 mm) and an apparent diffusion coefficient (ADC) map (2 &#x000D7; 2 &#x000D7; 4 mm). A total of 80% of the data have manual segmentation labels covering two prostate regions: the transition zone (TZ) and the peripheral zone (PZ).</p></sec>
<sec id="s5">
<title>5. Methods</title>
<sec>
<title>5.1. Architecture</title>
<p>In this study, inspired by the UNet framework, a new prostate segmentation network is proposed. The architecture of the whole network is shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. The first half of the network (i.e., the encoder) extracts features from 2D slices of the 3D MRI volumes. The second half (i.e., the decoder) then generates the predicted segmentation results, where each type of label is output in a different channel.</p>
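<p>As a brief sketch of this per-channel output (the channel order and slice size are assumptions, since they are not stated here):</p>
<preformat>import torch

# The final decoder convolution produces one output channel per region:
# here channel 0 is taken as the peripheral zone (PZ) and channel 1 as the
# transition zone (TZ); each channel is thresholded independently.
logits = torch.randn(1, 2, 320, 320)          # (batch, regions, H, W), hypothetical size
masks = torch.sigmoid(logits).gt(0.5)
pz_mask, tz_mask = masks[:, 0], masks[:, 1]
</preformat>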
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Structure of the proposed network.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmed-10-1190659-g0001.tif"/>
</fig>
<p>The network proposed in this article consists of two parts, an encoder and a decoder. The encoder consists of five convolution blocks, four max pooling blocks, and an atrous spatial pyramid pooling (ASPP) module. In the first four convolution blocks, each block consists of two 2D convolution layers, each followed by group normalization (GN) and a ReLU activation function. The fifth convolution block adds a dropout layer on top of this structure to prevent overfitting. An ASPP module is added at the bottom of the encoder to further extract features. Each max pooling block performs max pooling to downsample the feature map by a factor of 2. The decoder consists of four upsampling modules, four attention gate (AG) modules, three convolution blocks, and a 2D convolution layer for output. Each upsampling module uses nearest neighbor interpolation to upsample the feature map. The final 2D convolution layer outputs the segmentation results.</p>
<sec>
<title>5.2. Convolution blocks</title>
<p>Batch normalization (BN) suffers a decline in performance when the batch size is too small; on the other hand, a large batch size consumes a lot of memory, especially when large images are fed into the network. In order to obtain better segmentation results with a small batch size, we decided to use group normalization (GN) instead of BN.</p>
<p>The structure of the first four convolution blocks and the convolution blocks used in the decoder part is shown in <xref ref-type="fig" rid="F2">Figure 2</xref>.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Conv block.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmed-10-1190659-g0002.tif"/>
</fig>
<p>Group normalization alleviates the internal covariate shift problem, and its results are much better than batch normalization when the batch size is small (in our experiments, the batch size is set to 4). Group normalization is defined as:</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M9"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B5;</mml:mi></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac><mml:mo>*</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B2;</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The input feature map x is divided into several groups along the channel dimension, and the mean and standard deviation are calculated separately for each group. &#x003B3; and &#x003B2; are learnable parameters.</p>
<p>In the encoder part, the input image passes through four such convolution blocks to extract features. After each of these four convolution blocks, the max pooling module downsamples the feature map by a factor of 2: the number of channels remains unchanged while the height and width are halved, which further extracts features and reduces the number of parameters.</p>
<p>After four rounds of downsampling, the feature map reaches the fifth convolution block. The structure of the last convolution block in the encoder is shown in <xref ref-type="fig" rid="F3">Figure 3</xref>. Compared with the first four convolution blocks, this block has an additional dropout layer (with the <italic>p</italic>-value set to 0.5). Experimental results show that adding dropout effectively alleviates overfitting and improves the final segmentation results.</p>
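<p>As a sketch of the convolution blocks shown in Figures 2, 3 (the number of GN groups and the channel counts are assumptions, since they are not stated in this article):</p>
<preformat>import torch.nn as nn

def conv_block(in_ch, out_ch, groups=8, dropout=0.0):
    """Two (Conv2d, GroupNorm, ReLU) stages; an optional dropout layer
    reproduces the fifth encoder block (dropout=0.5). The group count of 8
    is an assumed value."""
    layers = [
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.GroupNorm(groups, out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.GroupNorm(groups, out_ch),
        nn.ReLU(inplace=True),
    ]
    if dropout:
        layers.append(nn.Dropout(p=dropout))
    return nn.Sequential(*layers)

first_block = conv_block(1, 64)                      # one of the first four blocks
fifth_block = conv_block(512, 1024, dropout=0.5)     # last encoder block with dropout
</preformat>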
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Conv block with dropout.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmed-10-1190659-g0003.tif"/>
</fig></sec>
<sec>
<title>5.3. ASPP module</title>
<p>The final part of the encoder is an ASPP module, which is added to extract further features; its structure is shown in <xref ref-type="fig" rid="F4">Figure 4</xref>. For the input feature map, ASPP applies dilated convolutions with different dilation rates (in this article, the dilation rates are set to 1, 6, 12, and 18), concatenates the resulting feature maps, which expands the number of channels, and finally reduces the number of channels to the desired value through a 1 &#x000D7; 1 convolution layer.</p>
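<p>A minimal PyTorch sketch of this module follows (a simplified reading of the description above and of Algorithm 1; any normalization and activation layers inside the branches are omitted):</p>
<preformat>import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel dilated 3x3 convolutions with rates 1, 6, 12, 18, concatenation,
    and a 1x1 convolution to bring the channel count back to the desired value."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.project(feats)
</preformat>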
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>ASPP block.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmed-10-1190659-g0004.tif"/>
</fig>
<p>The algorithm of the ASPP module is as follows:</p>
<table-wrap position="float" id="T5">
<label>Algorithm 1</label>
<caption><p>ASPP Block.</p></caption>
<graphic xlink:href="fmed-10-1190659-i0001.tif"/>
</table-wrap></sec>
<sec>
<title>5.4. Upsampling</title>
<p>The feature map obtained by the encoder will be sent to the decoder. The structure of the upsampling module is shown in <xref ref-type="fig" rid="F5">Figure 5</xref>. We use the nearest neighbor interpolation for upsampling, which is defined as:</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M14"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:mfrac><mml:mo>*</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mfrac><mml:mrow><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:mfrac><mml:mo>*</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The size of the original feature map is W &#x000D7; H, and the size of the upsampled feature map is w &#x000D7; h. The pixel value at position (x, y) of the upsampled feature map equals the pixel value at position (W/w &#x000D7; x, H/h &#x000D7; y) of the original feature map.</p>
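<p>In PyTorch terms, this corresponds to nearest neighbor upsampling with a scale factor of 2 (a brief illustration with an arbitrary feature map size):</p>
<preformat>import torch
import torch.nn as nn

# Each output pixel copies the value of its nearest input pixel, as in Eq. (5);
# no new values are interpolated.
upsample = nn.Upsample(scale_factor=2, mode="nearest")
x = torch.randn(1, 256, 20, 20)
print(upsample(x).shape)   # torch.Size([1, 256, 40, 40])
</preformat>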
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Up conv block.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmed-10-1190659-g0005.tif"/>
</fig></sec>
<sec>
<title>5.5. Attention gate</title>
<p>In order to improve the ability to capture key regions and channels, we added a channel attention mechanism to our attention gate. We calculate the attention coefficient using:</p>
<p>Spatial attention:</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M15"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:msup><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>g</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mi>g</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E7"><label>(7)</label><mml:math id="M16"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>g</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x00398;</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003C3;<sub>1</sub>(<italic>x</italic><sub><italic>i</italic></sub>) &#x0003D; max(0, <italic>x</italic><sub><italic>i</italic></sub>) represents ReLu function, <inline-formula><mml:math id="M17"><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x0002B;</mml:mo><mml:mo class="qopname">exp</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></inline-formula> denotes sigmoid function, and AG is represented by a set of parameters &#x00398;<sub><italic>atts</italic></sub>, including the linear transformations <inline-formula><mml:math id="M18"><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula>, <inline-formula><mml:math id="M19"><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula>, and <inline-formula><mml:math id="M20"><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula>. The linear transformations are realized by 1 &#x000D7; 1 convolution.</p>
<p>Channel attention:</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M21"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:msup><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi><mml:mi>P</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi><mml:mi>P</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>g</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E9"><label>(9)</label><mml:math id="M22"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>g</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x00398;</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Different from spatial attention, in order to obtain the weight of each channel of the input feature map, the channel attention mechanism uses adaptive average pooling (the AvgPool part). In our experiments, adaptive average pooling worked better than adaptive max pooling or a combination of the two in channel attention. The remaining linear transformations include two 1 &#x000D7; 1 convolutions: <inline-formula><mml:math id="M23"><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>g</mml:mi><mml:mo>/</mml:mo><mml:mn>16</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="M24"><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>g</mml:mi><mml:mo>/</mml:mo><mml:mn>16</mml:mn></mml:mrow></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>g</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula>.</p>
<p>The output of the attention gate is:</p>
<disp-formula id="E10"><label>(10)</label><mml:math id="M25"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:msubsup><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The overall structure of our attention gate is shown in <xref ref-type="fig" rid="F6">Figure 6</xref>:</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Attention gate.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmed-10-1190659-g0006.tif"/>
</fig>
<p>The algorithm of the attention gate is as follows:</p>
<table-wrap position="float" id="T6">
<label>Algorithm 2</label>
<caption><p>Attention gate.</p></caption>
<graphic xlink:href="fmed-10-1190659-i0002.tif"/>
</table-wrap>
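<p>A hedged PyTorch sketch of Equations (6)&#x02013;(10) is given below. It assumes that the skip features x and the gating features g have the same number of channels, as implied by the additions in Equations (6) and (8), and the exact layer ordering is an interpretation of the equations rather than the original implementation:</p>
<preformat>import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Sketch of the attention gate in Figure 6: spatial attention (Eqs. 6-7)
    and channel attention (Eqs. 8-9) jointly rescale the skip features (Eq. 10)."""
    def __init__(self, channels, inter_channels):
        super().__init__()
        # spatial branch: 1x1 convolutions W_x, W_g, W_2
        self.w_x = nn.Conv2d(channels, inter_channels, kernel_size=1)
        self.w_g = nn.Conv2d(channels, inter_channels, kernel_size=1)
        self.w_2 = nn.Conv2d(inter_channels, 1, kernel_size=1)
        # channel branch: adaptive average pooling and two 1x1 convolutions W_0, W_1
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.w_0 = nn.Conv2d(channels, channels // 16, kernel_size=1)
        self.w_1 = nn.Conv2d(channels // 16, channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x, g):
        # Eqs. (6)-(7): spatial coefficients alpha, one value per pixel
        alpha = self.sigmoid(self.w_2(self.relu(self.w_x(x) + self.w_g(g))))
        # Eqs. (8)-(9): channel coefficients beta, one value per channel
        beta = self.sigmoid(self.w_1(self.w_0(self.relu(self.pool(x) + self.pool(g)))))
        # Eq. (10): rescale the skip features by both coefficients
        return alpha * beta * x
</preformat>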
</sec></sec>
<sec id="s6">
<title>6. Experiment</title>
<sec>
<title>6.1. Metrics</title>
<p>For the multi-class segmentation task, we used the dice coefficient of each class as the main metric to measure the segmentation effect. Dice is the most frequently used metric in medical image segmentation challenges. It is a set similarity metric, usually used to calculate the similarity of two samples, and its value lies in [0, 1]. It is often used for segmentation of medical images: the best segmentation result is 1 and the worst is 0. The dice coefficient is calculated as follows:</p>
<disp-formula id="E11"><label>(11)</label><mml:math id="M31"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mo>*</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi><mml:mo>&#x02229;</mml:mo><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>e</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi><mml:mo>&#x0222A;</mml:mo><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>pred</italic> is the set of predicted values, <italic>true</italic> is the set of true values, the numerator is the intersection of <italic>pred</italic> and <italic>true</italic>, and the denominator is the union of <italic>pred</italic> and <italic>true</italic>.</p>
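<p>A small sketch of how the dice coefficient of Equation (11) can be computed for a binary mask (the denominator is taken as the sum of the two masks, the usual reading of the union term):</p>
<preformat>import numpy as np

def dice(pred, true, eps=1e-7):
    """Eq. (11): overlap between a binary prediction mask and the ground truth."""
    inter = np.logical_and(pred, true).sum()
    return 2.0 * inter / (pred.sum() + true.sum() + eps)

pred = np.array([[1, 1, 0], [0, 1, 0]])
true = np.array([[1, 0, 0], [0, 1, 1]])
print(dice(pred, true))   # 2 * 2 / (3 + 3) = 0.667
</preformat>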
<p>In addition to the dice coefficient, we also applied the PPV and sensitivity metrics in our experiments to further assess the segmentation results.</p>
<p>The definitions of PPV and sensitivity are as follows:</p>
<disp-formula id="E12"><label>(12)</label><mml:math id="M32"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mi>P</mml:mi><mml:mi>V</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>p</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi><mml:mo>&#x02229;</mml:mo><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E13"><label>(13)</label><mml:math id="M33"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>S</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>v</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>p</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi><mml:mo>&#x02229;</mml:mo><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></sec>
<sec>
<title>6.2. Experiment result</title>
<p>The experiment used the Task05 prostate dataset of the Medical Segmentation Decathlon competition. Due to the small number of samples, we used offline data augmentation to enrich the training set. Specifically, we first performed data augmentation on the prostate MRI images and the corresponding labels, including horizontal/vertical flipping, rotation, adding Gaussian noise, and adjusting brightness and contrast, so that the amount of data in the training set was expanded to three times the original, which effectively alleviates the problems of the small dataset and insufficient training data. Several operations were then used for data preprocessing, including resizing the images to a uniform size, normalization, and slicing.</p>
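<p>A hypothetical sketch of one such offline augmentation step is shown below; the exact transforms, parameter ranges, and rotation angles used in this study are not specified, so the values here are illustrative:</p>
<preformat>import numpy as np

def augment(image, label, rng=np.random.default_rng(0)):
    """Sketch of the offline augmentation described above: flips, rotation,
    Gaussian noise, and brightness/contrast jitter. Geometric transforms are
    applied to both image and label; intensity changes to the image only."""
    if rng.integers(0, 2):                      # horizontal flip
        image, label = np.flip(image, axis=1), np.flip(label, axis=1)
    if rng.integers(0, 2):                      # vertical flip
        image, label = np.flip(image, axis=0), np.flip(label, axis=0)
    k = int(rng.integers(0, 4))                 # rotation by a multiple of 90 degrees
    image, label = np.rot90(image, k), np.rot90(label, k)
    image = image + rng.normal(0.0, 0.01, image.shape)    # Gaussian noise
    image = image * (1.0 + 0.2 * (rng.random() - 0.5))    # contrast jitter
    image = image + 0.1 * (rng.random() - 0.5)            # brightness shift
    return image, label
</preformat>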
<p>Among them, 80% of the original dataset together with the pseudo-images obtained by data augmentation are used as the training set, and the remaining 20% of the original dataset is used as the validation set. In order to compare the performance of different networks, we trained the network proposed in this study and five other UNet-based networks on the same dataset with the same hyperparameters. All experiments were implemented in Python with the PyTorch framework and carried out on a server equipped with an RTX308 GPU running Windows 11. The parameters of our experiment are shown in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Parameters of the experiment.</p></caption>
<table frame="box" rules="all">
<tbody>
<tr>
<td valign="top" align="left">Loss</td>
<td valign="top" align="center">BCE dice loss</td>
</tr>
<tr>
<td valign="top" align="left">Epochs</td>
<td valign="top" align="center">1,000</td>
</tr> <tr>
<td valign="top" align="left">Early stop</td>
<td valign="top" align="center">20</td>
</tr> <tr>
<td valign="top" align="left">Batch size</td>
<td valign="top" align="center">4</td>
</tr> <tr>
<td valign="top" align="left">Optimizer</td>
<td valign="top" align="center">Adam</td>
</tr> <tr>
<td valign="top" align="left">Learning rate</td>
<td valign="top" align="center">0.0003</td>
</tr> <tr>
<td valign="top" align="left">Momentum</td>
<td valign="top" align="center">0.9</td>
</tr> <tr>
<td valign="top" align="left">Weight decay</td>
<td valign="top" align="center">0.0001</td>
</tr></tbody>
</table>
</table-wrap>
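<p>A hedged sketch of the loss and optimizer settings in Table 1 follows; the equal weighting of the BCE and dice terms, and mapping the momentum value in Table 1 to the first beta parameter of Adam, are assumptions:</p>
<preformat>import torch
import torch.nn as nn
import torch.nn.functional as F

def bce_dice_loss(logits, target, eps=1e-7):
    """BCE dice loss from Table 1; the two terms are summed with equal weight here."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    prob = torch.sigmoid(logits)
    dice = 1.0 - 2.0 * (prob * target).sum() / (prob.sum() + target.sum() + eps)
    return bce + dice

model = nn.Conv2d(1, 2, kernel_size=1)   # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=0.0003,
                             betas=(0.9, 0.999), weight_decay=0.0001)
</preformat>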
<p>The experimental results of all networks are shown in <xref ref-type="table" rid="T2">Tables 2</xref>&#x02013;<xref ref-type="table" rid="T4">4</xref>.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Dice coefficient on the MSD prostate dataset.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left" rowspan="2"><bold>Network</bold></th>
<th valign="top" align="center" colspan="2"><bold>Dice</bold></th>
</tr>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="center"><bold>PZ</bold></th>
<th valign="top" align="center"><bold>TZ</bold></th>
</tr>
</thead>
<tbody>
 <tr>
<td valign="top" align="left">Unet</td>
<td valign="top" align="center">0.7061</td>
<td valign="top" align="center">0.863</td>
</tr> <tr>
<td valign="top" align="left">Attention Unet</td>
<td valign="top" align="center">0.7785</td>
<td valign="top" align="center">0.8813</td>
</tr> <tr>
<td valign="top" align="left">Res Unet</td>
<td valign="top" align="center">0.6623</td>
<td valign="top" align="center">0.8481</td>
</tr> <tr>
<td valign="top" align="left">UNet&#x0002B;&#x0002B;</td>
<td valign="top" align="center">0.6898</td>
<td valign="top" align="center">0.8837</td>
</tr> <tr>
<td valign="top" align="left">R2AttentionUNet</td>
<td valign="top" align="center">0.4805</td>
<td valign="top" align="center">0.752</td>
</tr> <tr>
<td valign="top" align="left">Proposed</td>
<td valign="top" align="center"><bold>0.8070</bold></td>
<td valign="top" align="center"><bold>0.9070</bold></td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>The bold values indicate the best results across different networks.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>PPV on the MSD prostate dataset.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:&#x00023;919498;color:&#x00023;ffffff">
<th valign="top" align="left" rowspan="2"><bold>Network</bold></th>
<th valign="top" align="center" colspan="2"><bold>PPV</bold></th>
</tr>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="center"><bold>PZ</bold></th>
<th valign="top" align="center"><bold>TZ</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Unet</td>
<td valign="top" align="center">0.7884</td>
<td valign="top" align="center"><bold>0.9071</bold></td>
</tr> <tr>
<td valign="top" align="left">Attention Unet</td>
<td valign="top" align="center">0.8648</td>
<td valign="top" align="center">0.8976</td>
</tr> <tr>
<td valign="top" align="left">Res Unet</td>
<td valign="top" align="center">0.7971</td>
<td valign="top" align="center">0.8530</td>
</tr> <tr>
<td valign="top" align="left">Unet&#x0002B;&#x0002B;</td>
<td valign="top" align="center">0.7607</td>
<td valign="top" align="center">0.9043</td>
</tr> <tr>
<td valign="top" align="left">R2AttentionUnet</td>
<td valign="top" align="center">0.6216</td>
<td valign="top" align="center">0.8282</td>
</tr> <tr>
<td valign="top" align="left">Proposed</td>
<td valign="top" align="center"><bold>0.8784</bold></td>
<td valign="top" align="center">0.9058</td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>The bold values indicate the best results across different networks.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Sensitivity on the MSD prostate dataset.</p></caption> 
<table frame="box" rules="all">
<thead>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="left" rowspan="2"><bold>Network</bold></th>
<th valign="top" align="center" colspan="2"><bold>Sensitivity</bold></th>
</tr>
<tr style="background-color:#919497;color:#ffffff">
<th valign="top" align="center"><bold>PZ</bold></th>
<th valign="top" align="center"><bold>TZ</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Unet</td>
<td valign="top" align="center">0.7887</td>
<td valign="top" align="center">0.8777</td>
</tr> <tr>
<td valign="top" align="left">Attention Unet</td>
<td valign="top" align="center">0.7923</td>
<td valign="top" align="center">0.9074</td>
</tr> <tr>
<td valign="top" align="left">Res Unet</td>
<td valign="top" align="center">0.7287</td>
<td valign="top" align="center">0.8992</td>
</tr> <tr>
<td valign="top" align="left">Unet&#x0002B;&#x0002B;</td>
<td valign="top" align="center">0.7831</td>
<td valign="top" align="center">0.9018</td>
</tr> <tr>
<td valign="top" align="left">R2AttentionUnet</td>
<td valign="top" align="center">0.6191</td>
<td valign="top" align="center">0.7619</td>
</tr> <tr>
<td valign="top" align="left">Proposed</td>
<td valign="top" align="center"><bold>0.8254</bold></td>
<td valign="top" align="center"><bold>0.9219</bold></td>
</tr></tbody>
</table>
<table-wrap-foot>
<p>The bold values indicate the best results across different networks.</p>
</table-wrap-foot>
</table-wrap>
<p>From the tables above, it can be seen that the network proposed in this study achieves dice scores of 0.807 and 0.907 in the peripheral zone and the transition zone of the prostate, respectively. It also achieves the best sensitivity in both zones and the best PPV in the peripheral zone. Compared with the other five UNet-based networks, the proposed method performs better on the prostate segmentation task.</p>
<p>The segmentation maps are shown in <xref ref-type="fig" rid="F7">Figure 7</xref>, in which the green area represents the peripheral zone (PZ) and the red area represents the transition zone (TZ).</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Segmentation maps of different networks. Left to right: <bold>(A)</bold> MRI image. <bold>(B)</bold> Ground truth. <bold>(C)</bold> MAS-UNet. <bold>(D)</bold> UNet. <bold>(E)</bold> Res UNet. <bold>(F)</bold> Attention UNet. <bold>(G)</bold> UNet&#x0002B;&#x0002B;. <bold>(H)</bold> R2-Attention UNet.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fmed-10-1190659-g0007.tif"/>
</fig></sec></sec>
<sec id="s7">
<title>7. Discussion</title>
<p>Before determining the final network structure, the authors conducted a large number of comparative experiments to verify the effect of various modules on the dataset used in this study. The results show that adding recurrent convolutions and residual connections to the network brings no benefit here. Comparative experiments were also carried out to determine the pooling method for channel attention; the results show that adaptive average pooling is better than adaptive max pooling or a combination of the two.</p>
<p>This study proposes a new prostate segmentation network based on the UNet framework. The network uses GN, ASPP, and channel attention to improve Attention UNet, and uses different channels to output the segmentation results of different labels.</p>
<p>We used UNet, Attention UNet, Res UNet, UNet&#x0002B;&#x0002B;, and R2Attention UNet as the five U-shaped networks for comparative experiments, with the dice coefficient as the metric to compare the models. The results show that the proposed model achieves scores of 0.807 and 0.907 in the peripheral zone and the transition zone, respectively, and its segmentation effect is better than that of the other classical U-shaped networks. MAS-UNet provides a new method for automatic prostate segmentation with higher accuracy, which would help to relieve the burden on radiologists.</p>
<p>While improving the segmentation effect, the network proposed in this study still has some shortcomings: compared with the original UNet and Attention UNet, the introduction of new modules increases the amount of computation and the number of parameters, which places higher requirements on hardware. Therefore, making the model as lightweight as possible while maintaining the current segmentation accuracy is a possible direction for future improvement.</p></sec>
<sec sec-type="data-availability" id="s8">
<title>Data availability statement</title>
<p>Publicly available datasets were analyzed in this study. This data can be found here: Medical Segmentation Decathlon (<ext-link ext-link-type="uri" xlink:href="https://medicaldecathlon.com">medicaldecathlon.com</ext-link>).</p></sec>
<sec sec-type="author-contributions" id="s9">
<title>Author contributions</title>
<p>YH performed the experiments and wrote the manuscript. ZQ offered guidance and corrected the writing of the manuscript. HC approved the final version. BZ assisted in the medical area and performed the literature research. HL assisted with the experiments. All authors contributed to the article and approved the submitted version.</p></sec>
</body>
<back>
<sec sec-type="funding-information" id="s10">
<title>Funding</title>
<p>This study was supported by the Education Department of Hainan Province, Project No. Hnjg2021ZD-10, the Hainan Province Science and Technology Special Fund (No. ZDYF2020018), and the Hainan Provincial Natural Science Foundation of China (No. 2019RC100).</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="web"><person-group person-group-type="author"><collab><italic>MICCAI Grand Challenge: Prostate MR Image Segmentation 2012</italic></collab></person-group>. (<year>2012</year>). Available online at: <ext-link ext-link-type="uri" xlink:href="https://promise12.grand-challenge.org/Home/">https://promise12.grand-challenge.org/Home/</ext-link> (accessed November 25, 2022).</citation>
</ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>PI</given-names></name> <name><surname>Chong</surname> <given-names>ST</given-names></name> <name><surname>Kielar</surname> <given-names>AZ</given-names></name> <name><surname>Kelly</surname> <given-names>AM</given-names></name> <name><surname>Knoepp</surname> <given-names>UD</given-names></name> <name><surname>Mazza</surname> <given-names>MB</given-names></name> <etal/></person-group>. <article-title>Imaging of pregnant and lactating patients: part 1, evidence-based review and recommendations</article-title>. <source>Am J Roentgenol.</source> (<year>2012</year>) <volume>198</volume>:<fpage>778</fpage>&#x02013;<lpage>4</lpage>. <pub-id pub-id-type="doi">10.2214/AJR.11.7405</pub-id><pub-id pub-id-type="pmid">22451541</pub-id></citation></ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leake</surname> <given-names>JL</given-names></name> <name><surname>Hardman</surname> <given-names>R</given-names></name> <name><surname>Ojili</surname> <given-names>V</given-names></name> <name><surname>Thompson</surname> <given-names>I</given-names></name> <name><surname>Shanbhogue</surname> <given-names>A</given-names></name> <name><surname>Hernandez</surname> <given-names>J</given-names></name> <etal/></person-group>. <article-title>Prostate MRI: access to and current practice of prostate MRI in the United States</article-title>. <source>J Am Coll Radiol.</source> (<year>2014</year>) <volume>11</volume>:<fpage>156</fpage>&#x02013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1016/j.jacr.2013.05.006</pub-id><pub-id pub-id-type="pmid">24389134</pub-id></citation></ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jia</surname> <given-names>HZ</given-names></name> <name><surname>Xia</surname> <given-names>Y</given-names></name> <name><surname>Song</surname> <given-names>Y</given-names></name> <name><surname>Cai</surname> <given-names>WD</given-names></name> <name><surname>Fulham</surname> <given-names>M</given-names></name> <name><surname>Feng</surname> <given-names>DD</given-names></name></person-group>. <article-title>Atlas registration and ensemble deep convolutional neural network-based prostate segmentation using magnetic resonance imaging</article-title>. <source>Neurocomputing.</source> (<year>2017</year>) <volume>275</volume>:<fpage>1358</fpage>&#x02013;<lpage>69</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2017.09.084</pub-id></citation>
</ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sharma</surname> <given-names>N</given-names></name> <name><surname>Aggarwal</surname> <given-names>LM</given-names></name></person-group>. <article-title>Automated medical image segmentation techniques</article-title>. <source>J Med Phys</source>. (<year>2010</year>) <volume>35</volume>:<fpage>3</fpage>&#x02013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.4103/0971-6203.58777</pub-id><pub-id pub-id-type="pmid">20177565</pub-id></citation></ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Elguindi</surname> <given-names>S</given-names></name> <name><surname>Zelefsky</surname> <given-names>MJ</given-names></name> <name><surname>Jiang</surname> <given-names>J</given-names></name> <name><surname>Veeraraghavan</surname> <given-names>H</given-names></name> <name><surname>Deasy</surname> <given-names>JO</given-names></name> <name><surname>Hunt</surname> <given-names>MA</given-names></name> <etal/></person-group>. <article-title>Deep learning-based auto-segmentation of targets and organs-at-risk for magnetic resonance imaging only planning of prostate radiotherapy</article-title>. <source>Phys Imag Radiat Oncol.</source> (<year>2019</year>) <volume>12</volume>:<fpage>80</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1016/j.phro.2019.11.006</pub-id><pub-id pub-id-type="pmid">32355894</pub-id></citation></ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ronneberger</surname> <given-names>O</given-names></name> <name><surname>Fischer</surname> <given-names>P</given-names></name> <name><surname>Brox</surname> <given-names>T</given-names></name></person-group>. <article-title>U-Net: convolutional networks for biomedical image segmentation</article-title>. <source>ArXiv.</source> (<year>2015</year>). <pub-id pub-id-type="doi">10.1007/978-3-319-24574-4_28</pub-id></citation>
</ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ibtehaz</surname> <given-names>N</given-names></name> <name><surname>Rahman</surname> <given-names>MS</given-names></name></person-group>. <article-title>MultiResUNet: rethinking the U-Net architecture for multimodal biomedical image segmentation</article-title>. <source>Neural Networks</source>. (<year>2019</year>) <volume>121</volume>:<fpage>74</fpage>&#x02013;<lpage>87</lpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2019.08.025</pub-id><pub-id pub-id-type="pmid">31536901</pub-id></citation></ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>Y</given-names></name> <name><surname>Wu</surname> <given-names>J</given-names></name> <name><surname>Jin</surname> <given-names>S</given-names></name> <name><surname>Cao</surname> <given-names>L</given-names></name> <name><surname>Jin</surname> <given-names>G</given-names></name></person-group>. <article-title>Dense-U-net: dense encoderJin S, Cao L, Jin G. Dense-U-netNeural Netwo3D particle fields</article-title>. <source>Opt Commun.</source> (<year>2021</year>) <volume>493</volume>:<fpage>126970</fpage>. <pub-id pub-id-type="doi">10.1016/j.optcom.2021.126970</pub-id></citation>
</ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Oktay</surname> <given-names>O</given-names></name> <name><surname>Schlemper</surname> <given-names>J</given-names></name> <name><surname>Folgoc</surname> <given-names>LL</given-names></name> <name><surname>Lee</surname> <given-names>MJ</given-names></name> <name><surname>Heinrich</surname> <given-names>MP</given-names></name> <name><surname>Misawa</surname> <given-names>K</given-names></name> <etal/></person-group>. <article-title>Attention U-Net: learning where to look for the pancreas</article-title>. <source>ArXiv.</source> (<year>2018</year>). <pub-id pub-id-type="doi">10.48550/arXiv.1804.03999</pub-id><pub-id pub-id-type="pmid">35474556</pub-id></citation></ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shelhamer</surname> <given-names>E</given-names></name> <name><surname>Long</surname> <given-names>J</given-names></name> <name><surname>Darrell</surname> <given-names>T</given-names></name></person-group>. <article-title>Fully convolutional networks for semantic segmentation</article-title>. <source>IEEE Trans Pattern Anal Mach Intellig</source>. (<year>2017</year>) <volume>39</volume>:<fpage>640</fpage>&#x02013;<lpage>51</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2016.2572683</pub-id><pub-id pub-id-type="pmid">27244717</pub-id></citation></ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>ZW</given-names></name> <name><surname>Siddiquee</surname> <given-names>MMR</given-names></name> <name><surname>Tajbakhsh</surname> <given-names>N</given-names></name> <name><surname>Liang</surname> <given-names>JM</given-names></name></person-group>. <article-title>UNet&#x0002B;&#x0002B;: a nested U-net architecture for medical image segmentation</article-title>. In: <source>Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, held in conjunction with MICCAI 2018</source>. Granada: 11045 (<year>2018</year>). p. <fpage>3</fpage>&#x02013;<lpage>11</lpage>.<pub-id pub-id-type="pmid">32613207</pub-id></citation></ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Khanna</surname> <given-names>A</given-names></name> <name><surname>Londhe N</surname> <given-names>D</given-names></name> <name><surname>Gupta</surname> <given-names>S</given-names></name> <name><surname>Semwal</surname> <given-names>A</given-names></name></person-group>. <article-title>A deep Residual U-Net convolutional neural network for automated lung segmentation in computed tomography images</article-title>. <source>Biocybernet Biomed Eng.</source> (<year>2020</year>) <volume>40</volume>:<fpage>1314</fpage>&#x02013;<lpage>27</lpage>. <pub-id pub-id-type="doi">10.1016/j.bbe.2020.07.007</pub-id></citation>
</ref>
<ref id="B14">
<label>14.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cai</surname> <given-names>S</given-names></name> <name><surname>Tian</surname> <given-names>Y</given-names></name> <name><surname>Lui</surname> <given-names>H</given-names></name> <name><surname>Zeng</surname> <given-names>H</given-names></name> <name><surname>Wu</surname> <given-names>Y</given-names></name> <name><surname>Chen</surname> <given-names>G</given-names></name></person-group>. <article-title>Dense-UNet: a novel multiphoton <italic>in vivo</italic> cellular image segmentation model based on a convolutional neural network</article-title>. <source>Quantit. Imag. Med Surg.</source> (<year>2020</year>) <volume>10</volume>:<fpage>1275</fpage>&#x02013;<lpage>85</lpage>. <pub-id pub-id-type="doi">10.21037/qims-19-1090</pub-id><pub-id pub-id-type="pmid">32550136</pub-id></citation></ref>
<ref id="B15">
<label>15.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Guo</surname> <given-names>C</given-names></name> <name><surname>Szemenyei</surname> <given-names>M</given-names></name> <name><surname>Yi</surname> <given-names>Y</given-names></name> <name><surname>Wang</surname> <given-names>W</given-names></name> <name><surname>Chen</surname> <given-names>B</given-names></name> <name><surname>Fan</surname> <given-names>C</given-names></name></person-group>. <article-title>SA-UNet: spatial attention U-net for retinal vessel segmentation</article-title>. In: <source>2020 25th International Conference on Pattern Recognition (ICPR)</source>. <publisher-loc>Milan</publisher-loc>: <publisher-name>IEEE</publisher-name> (<year>2020</year>). p. <fpage>1236</fpage>&#x02013;<lpage>42</lpage>.</citation>
</ref>
<ref id="B16">
<label>16.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xiang</surname> <given-names>T</given-names></name> <name><surname>Zhang</surname> <given-names>C</given-names></name> <name><surname>Liu</surname> <given-names>D</given-names></name> <name><surname>Song</surname> <given-names>Y</given-names></name> <name><surname>Huang</surname> <given-names>H</given-names></name> <name><surname>Cai</surname> <given-names>W</given-names></name></person-group>. <article-title>BiO-Net: learning recurrent bi-directional connections for encoder-decoder architecture</article-title>. <source>ArXiv.</source> (<year>2020</year>). <pub-id pub-id-type="doi">10.1007/978-3-030-59710-8_8</pub-id></citation>
</ref>
<ref id="B17">
<label>17.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Z</given-names></name> <name><surname>Blaschko</surname> <given-names>MB</given-names></name></person-group>. <article-title>MRF-UNets: searching UNet with Markov random fields</article-title>. <source>ArXiv.</source> (<year>2022</year>). <pub-id pub-id-type="doi">10.1007/978-3-031-26409-2_36</pub-id></citation>
</ref>
<ref id="B18">
<label>18.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K</given-names></name> <name><surname>Zhang</surname> <given-names>X</given-names></name> <name><surname>Ren</surname> <given-names>S</given-names></name> <name><surname>Sun</surname> <given-names>J</given-names></name></person-group>. <article-title>Deep residual learning for image recognition</article-title>. In: <source>2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>. <publisher-loc>Las Vegas, NV</publisher-loc>: <publisher-name>IEEE</publisher-name> (<year>2016</year>). p. <fpage>770</fpage>&#x02013;<lpage>8</lpage>.<pub-id pub-id-type="pmid">32166560</pub-id></citation></ref>
<ref id="B19">
<label>19.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>G</given-names></name> <name><surname>Liu</surname> <given-names>Z</given-names></name> <name><surname>Van Der Maaten</surname> <given-names>L</given-names></name> <name><surname>Weinberger</surname> <given-names>KQ</given-names></name></person-group>. <article-title>Densely connected convolutional networks</article-title>. In: <source>2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>. <publisher-loc>Honolulu, HI</publisher-loc>: <publisher-name>IEEE</publisher-name> (<year>2017</year>). p. <fpage>2261</fpage>&#x02013;<lpage>69</lpage>.</citation>
</ref>
<ref id="B20">
<label>20.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anderson</surname></name> <name><surname>P</surname></name> <name><surname>He</surname> <given-names>X</given-names></name> <name><surname>Buehler</surname> <given-names>C</given-names></name> <name><surname>Teney</surname> <given-names>D</given-names></name> <name><surname>Johnson</surname> <given-names>M</given-names></name> <etal/></person-group>. <article-title>Bottom-up and top-down attention for image captioning and VQA</article-title>. <source>ArXiv.</source> (<year>2017</year>). <pub-id pub-id-type="doi">10.1109/CVPR.2018.00636</pub-id></citation>
</ref>
<ref id="B21">
<label>21.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vaswani</surname> <given-names>A</given-names></name> <name><surname>Shazeer</surname> <given-names>NM</given-names></name> <name><surname>Parmar</surname> <given-names>N</given-names></name> <name><surname>Uszkoreit</surname> <given-names>J</given-names></name> <name><surname>Jones</surname> <given-names>L</given-names></name> <name><surname>Gomez</surname> <given-names>AN</given-names></name> <etal/></person-group>. <article-title>Attention is all you need</article-title>. <source>ArXiv.</source> (<year>2017</year>). <pub-id pub-id-type="doi">10.48550/arXiv.1706.03762</pub-id></citation>
</ref>
<ref id="B22">
<label>22.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jetley</surname> <given-names>S</given-names></name> <name><surname>Lord</surname> <given-names>NA</given-names></name> <name><surname>Lee</surname> <given-names>N</given-names></name> <name><surname>Torr</surname> <given-names>PH</given-names></name></person-group>. <article-title>Learn to pay attention</article-title>. <source>ArXiv.</source> (<year>2018</year>). <pub-id pub-id-type="doi">10.48550/arXiv.1804.02391</pub-id></citation>
</ref>
<ref id="B23">
<label>23.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ypsilantis</surname> <given-names>P</given-names></name> <name><surname>Montana</surname> <given-names>G</given-names></name></person-group>. <article-title>Learning what to look in chest X-rays with a recurrent visual attention model</article-title>. <source>ArXiv.</source> (<year>2017</year>). <pub-id pub-id-type="doi">10.48550/arXiv.1701.06452</pub-id></citation>
</ref>
<ref id="B24">
<label>24.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Shen</surname> <given-names>T</given-names></name> <name><surname>Zhou</surname> <given-names>T</given-names></name> <name><surname>Long</surname> <given-names>G</given-names></name> <name><surname>Jiang</surname> <given-names>J</given-names></name> <name><surname>Pan</surname> <given-names>S</given-names></name> <name><surname>Zhang</surname> <given-names>C</given-names></name></person-group>. <article-title>DiSAN: directional self-attention network for RNN/CNN-free language understanding</article-title>. In: <source>AAAI Conference on Artificial Intelligence</source>. <publisher-loc>Palo Alto, CA</publisher-loc>: <publisher-name>AAAI</publisher-name> (<year>2017</year>). p. <fpage>2374</fpage>&#x02013;<lpage>3468</lpage>.</citation>
</ref>
<ref id="B25">
<label>25.</label>
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>J</given-names></name> <name><surname>Li</surname> <given-names>S</given-names></name> <name><surname>Sun</surname> <given-names>G</given-names></name></person-group>. <article-title>Squeeze-and-excitation networks</article-title>. In: <source>2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>. <publisher-loc>Salt Lake City, UT</publisher-loc>: <publisher-name>IEEE</publisher-name> (<year>2017</year>). p. <fpage>7132</fpage>&#x02013;<lpage>41</lpage>.</citation>
</ref>
<ref id="B26">
<label>26.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X</given-names></name> <name><surname>Girshick</surname> <given-names>RB</given-names></name> <name><surname>Gupta</surname> <given-names>AK</given-names></name> <name><surname>He</surname> <given-names>KM</given-names></name></person-group>. <article-title>Non-local neural networks</article-title>. <source>ArXiv.</source> (<year>2017</year>). <pub-id pub-id-type="doi">10.1109/CVPR.2018.00813</pub-id></citation>
</ref>
<ref id="B27">
<label>27.</label>
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>LC</given-names></name> <name><surname>Papandreou</surname> <given-names>G</given-names></name> <name><surname>Kokkinos</surname> <given-names>I</given-names></name> <name><surname>Murphy</surname> <given-names>K</given-names></name> <name><surname>Yuille</surname> <given-names>AL</given-names></name></person-group>. <article-title>DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs</article-title>. <source>ArXiv.</source> (<year>2016</year>). <pub-id pub-id-type="doi">10.48550/arXiv.1606.00915</pub-id><pub-id pub-id-type="pmid">28463186</pub-id></citation></ref>
</ref-list> 
</back>
</article> 

