<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurosci.</journal-id>
<journal-title>Frontiers in Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-453X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnins.2021.782968</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Deep Convolutional Neural Network With a Multi-Scale Attention Feature Fusion Module for Segmentation of Multimodal Brain Tumor</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>He</surname> <given-names>Xueqin</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1492070/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Xu</surname> <given-names>Wenjie</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1514097/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Yang</surname> <given-names>Jane</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Mao</surname> <given-names>Jianyao</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Chen</surname> <given-names>Sifang</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="corresp" rid="c002"><sup>&#x002A;</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Zhanxiang</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<xref ref-type="aff" rid="aff5"><sup>5</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1337286/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>School of Informatics, Xiamen University</institution>, <addr-line>Xiamen</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Cognitive Science, University of California, San Diego</institution>, <addr-line>San Diego, CA</addr-line>, <country>United States</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Neurosurgery, The First Affiliated Hospital of Xiamen University</institution>, <addr-line>Xiamen</addr-line>, <country>China</country></aff>
<aff id="aff4"><sup>4</sup><institution>Xiamen Key Laboratory of Brain Center, Department of Neurosurgery, The First Affiliated Hospital of Xiamen University</institution>, <addr-line>Xiamen</addr-line>, <country>China</country></aff>
<aff id="aff5"><sup>5</sup><institution>Department of Neuroscience, School of Medicine, Institute of Neurosurgery, Xiamen University</institution>, <addr-line>Xiamen</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Yizhang Jiang, Jiangnan University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Yugen Yi, Jiangxi Normal University, China; Jian Su, Nanjing University of Information Science and Technology, China</p></fn>
<corresp id="c001">&#x002A;Correspondence: Jane Yang, <email>j7yang@ucsd.edu</email></corresp>
<corresp id="c002">Sifang Chen, <email>csfsong143@aliyun.com</email></corresp>
<fn fn-type="other" id="fn004"><p>This article was submitted to Brain Imaging Methods, a section of the journal Frontiers in Neuroscience</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>11</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>15</volume>
<elocation-id>782968</elocation-id>
<history>
<date date-type="received">
<day>25</day>
<month>09</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>02</day>
<month>11</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2021 He, Xu, Yang, Mao, Chen and Wang.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>He, Xu, Yang, Mao, Chen and Wang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>As a non-invasive, low-cost medical imaging technology, magnetic resonance imaging (MRI) has become an important tool for brain tumor diagnosis. Many scholars have carried out research on MRI brain tumor segmentation based on deep convolutional neural networks and have achieved good performance. However, owing to the large spatial and structural variability of brain tumors and the low contrast of the images, the segmentation of MRI brain tumors remains challenging. Deep convolutional neural networks often lose low-level details as the network structure deepens, and they cannot effectively utilize multi-scale feature information. Therefore, a deep convolutional neural network with a multi-scale attention feature fusion module (MAFF-ResUNet) is proposed to address these problems. The MAFF-ResUNet consists of a U-Net with residual connections and a MAFF module. The combination of residual connections and skip connections fully retains low-level detailed information and improves the global feature extraction capability of the encoding blocks. In addition, the MAFF module selectively extracts useful information from the multi-scale hybrid feature map based on the attention mechanism to optimize the features of each layer and make full use of the complementary feature information of different scales. The experimental results on the BraTS 2019 MRI dataset show that the MAFF-ResUNet learns the edge structure of brain tumors better and achieves high accuracy.</p>
</abstract>
<kwd-group>
<kwd>magnetic resonance imaging (MRI)</kwd>
<kwd>semantic segmentation</kwd>
<kwd>convolutional neural network</kwd>
<kwd>residual network</kwd>
<kwd>attention mechanism</kwd>
<kwd>brain tumor</kwd>
</kwd-group>
<counts>
<fig-count count="5"/>
<table-count count="1"/>
<equation-count count="14"/>
<ref-count count="30"/>
<page-count count="9"/>
<word-count count="6810"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="S1">
<title>Introduction</title>
<p>In daily life, the human brain controls all behaviors and issues the instructions for activity. As the main part of the human brain, the cerebrum is the highest part of the central nervous system, so brain health has an important impact on the human body. Brain tumors are among the most common brain diseases and can occur at any age; their prevention is therefore a significant part of daily health management. Brain tumors can be divided into glioma, meningioma, pituitary adenoma, schwannoma, congenital tumor, and so on, among which glioma accounts for the largest proportion. More than half of gliomas are malignant, and glioblastoma is the most common malignant tumor of the brain and central nervous system, accounting for about 14.5% of all tumors (<xref ref-type="bibr" rid="B24">Ostrom et al., 2020</xref>). According to the World Health Organization (WHO) criteria, gliomas are classified into four grades; the higher the grade, the more likely the tumor is to be malignant. Grade I and II gliomas are low-grade gliomas (LGG), while grade III and IV gliomas are high-grade gliomas (HGG; <xref ref-type="bibr" rid="B20">Louis et al., 2007</xref>), which are malignant. Brain tumors can cause serious damage not only to the brain but also to other parts of the body, including vision loss, motor problems, sensory problems, and even shock in severe cases. Therefore, early detection of and early intervention in brain tumors are the key to minimizing their impact.</p>
<p>In clinical medicine, brain tumor screening and diagnosis mainly involve physical examination, imaging examination, and pathological examination. A physical examination is a preliminary assessment of the patient&#x2019;s condition through a comprehensive examination by the doctor; its results are somewhat incidental and cannot accurately characterize the condition. A pathological examination requires an operation under anesthesia to collect samples from the patient, which is complicated, costly, and causes some harm to the patient&#x2019;s body. Compared with these two methods, medical imaging is objective, accurate, convenient, and inexpensive: it overcomes the inaccuracy and subjectivity of the physical examination and avoids the cumbersome collection of biopsy samples required by the pathological examination, making it one of the main methods of auxiliary diagnosis for patients with brain tumors. The medical imaging techniques used to diagnose brain tumors mainly include magnetic resonance imaging (MRI) and computed tomography (CT). MRI images are clearer than CT images; with CT, small tumors in particular are prone to missed diagnosis, and for soft tissues the resolution of CT is much lower than that of MRI, so auxiliary diagnosis and treatment based on MRI is more accurate. In addition, CT imaging exposes the patient to ionizing radiation, which can affect the human body to a certain extent. As a non-invasive and low-cost medical imaging technology, MRI has therefore become the first choice for brain tumor diagnosis. In this article, MRI images are utilized as the data carrier to study the segmentation of glioma, the brain tumor with the greatest risk of malignancy.</p>
<p>The number of brain tumor patients is increasing with the development of society, the accelerated pace of life, and growing work pressure. Faster and more accurate intervention is the key to reducing the mortality rate of brain tumor-related diseases. In the analysis of brain images, accurate identification of the tumor area is the premise of any subsequent qualitative diagnosis. However, large spatial and structural variability and low image contrast are the main difficulties in brain tumor segmentation.</p>
<p>Traditional brain imaging diagnosis relies mainly on manual analysis by professional doctors, which requires a great deal of time and cost. With the huge and growing amount of medical image data, the speed of manual analysis lags far behind the speed of data generation. At the same time, given the expert knowledge required for manual brain tumor segmentation, as well as the variability and workload of manual results, machine-assisted semi-automatic or fully automatic brain tumor segmentation shows obvious advantages (<xref ref-type="bibr" rid="B9">Gordillo et al., 2013</xref>). Early studies mainly targeted semi-automatic segmentation of brain tumors (<xref ref-type="bibr" rid="B9">Gordillo et al., 2013</xref>). The goal of semi-automatic segmentation is to minimize human intervention while machines and humans work together to achieve the desired segmentation effect, but it is still affected by differences in human subjective judgment. Automatic methods exploit models and prior knowledge to achieve independent segmentation.</p>
<p>Segmentation methods for brain tumors can be divided into four categories: threshold-based, region-based, classification-based, and model-based methods (<xref ref-type="bibr" rid="B9">Gordillo et al., 2013</xref>). <xref ref-type="bibr" rid="B6">Dawngliana et al. (2015)</xref> combined an initial multi-level threshold segmentation with the morphological operations of the level set to extract fine images. Since brain tumors are relatively easy to identify compared with other brain tissues, the characteristics of tumor regions can be extracted during the preprocessing stage, so that brain tumors can be segmented using region-based methods. <xref ref-type="bibr" rid="B10">Harati et al. (2011)</xref> proposed an improved scale-based fuzzy connectedness algorithm that automatically selects seed points on the scale; the method performed well in low-contrast tumor areas. Region-based methods are greatly affected by image pixel coherence, and noise or intensity changes may lead to holes or over-segmentation (<xref ref-type="bibr" rid="B9">Gordillo et al., 2013</xref>). A related idea, based on the prominent appearance of brain tumors in medical images, is to segment brain tumors along the tumor contour by extracting features of the tumor boundary (<xref ref-type="bibr" rid="B5">Bauer et al., 2013</xref>). <xref ref-type="bibr" rid="B8">Essadike et al. (2018)</xref> determined an initial contour using a tumor filter in analog optics and used this initial contour to initialize an active contour model that locates the tumor boundary. <xref ref-type="bibr" rid="B21">Ma et al. (2018)</xref> combined random forests and active contour models to automatically infer the glioma structure from multimodal volumetric MR images and proposed a new multiscale patch-driven active contour model that refines the results using sparse representation techniques. 
In addition, because different brain tumors have different formation mechanisms and surface features, many researchers have studied the texture features of different brain tumors and achieved tumor segmentation through voxel classification or clustering. Among the classification-based segmentation methods, Fuzzy C-means (FCM) is one of the mainstream approaches because of its advantages in preserving the original image information. Early on, <xref ref-type="bibr" rid="B26">Pham et al. (1997)</xref> and <xref ref-type="bibr" rid="B29">Xu et al. (1997)</xref> applied the FCM method to MRI segmentation (<xref ref-type="bibr" rid="B16">Latif G. et al., 2021</xref>). Subsequently, many variants of standard FCM emerged, such as bias-corrected FCM (BCFCM), enhanced FCM (EFCM), kernelized FCM (KFCM), and spatially constrained KFCM (SKFCM) (<xref ref-type="bibr" rid="B16">Latif G. et al., 2021</xref>). However, the FCM method is easily disturbed by noise and has a high computational cost. Model-based methods include parametric deformable models, geometric deformable models or level sets, and so on. The active contour models mentioned above belong to the parametric deformable models. Because parametric deformable models have difficulty handling topological changes such as contour splitting and merging naturally, geometric deformable models or level sets were introduced (<xref ref-type="bibr" rid="B9">Gordillo et al., 2013</xref>). <xref ref-type="bibr" rid="B18">Lee et al. (2012)</xref> exploited the surface evolution principle of the geometric deformable model and level set to achieve medical volume image segmentation and tested it on tumor tissues, but the computational efficiency of this method was low.</p>
<p>With the rise of deep learning, researchers began to apply deep networks to the automatic segmentation of brain tumors. <xref ref-type="bibr" rid="B11">Havaei et al. (2017)</xref> proposed a brain tumor segmentation model based on a deep neural network, which utilized both local features and more global contextual features to learn features specific to brain tumor segmentation. For images, convolutional neural networks show obvious superiority. <xref ref-type="bibr" rid="B25">Pereira et al. (2016)</xref> designed a deeper convolutional neural network for glioma segmentation using small kernels; through intensity normalization and data augmentation, the segmentation performance can be improved and over-fitting avoided while the number of network parameters is kept small. U-Net (<xref ref-type="bibr" rid="B27">Ronneberger et al., 2015</xref>), a classical convolutional neural network model, performs outstandingly on medical images, so it is widely used in MRI brain tumor segmentation, and many modifications are based on it. <xref ref-type="bibr" rid="B17">Latif U. et al. (2021)</xref> improved the automatic segmentation of brain tumors by introducing size variability into the convolutional neural network and proposed a multi-inception-UNET model to improve the scalability of U-Net. <xref ref-type="bibr" rid="B30">Zhang et al. (2020)</xref> proposed a new densely connected inception convolutional neural network based on the U-Net architecture, applied it to medical images, and conducted experiments on tumor segmentation in brain MRI. They added an Inception-Res module and a densely connected convolutional module to increase the width and depth of the network, but this also increased the number of parameters, which slows down model training (<xref ref-type="bibr" rid="B1">Angulakshmi and Deepa, 2021</xref>).</p>
<p>In this article, a deep convolutional neural network composed of a U-Net and a multi-scale attention feature fusion (MAFF) module is proposed to achieve automatic segmentation of gliomas in 3D brain MRI images. By using multi-modal MRI data, high-precision segmentation of the three tumor subregions is realized. The main contributions of this work are as follows:</p>
<list list-type="simple">
<list-item>
<label>(1)</label>
<p>We introduce five residual connections into U-Net, which enhance the feature extraction ability of the encoder blocks, accelerate network convergence, and alleviate the vanishing-gradient problem caused by the deep network structure.</p>
</list-item>
<list-item>
<label>(2)</label>
<p>The proposed MAFF module exploits the attention mechanism to selectively extract feature information at each scale, giving the network a global contextual view. The fusion of useful multi-scale features further improves the accuracy of brain tumor segmentation.</p>
</list-item>
<list-item>
<label>(3)</label>
<p>MAFF-ResUNet performs well on the public BraTS 2019 MRI dataset and is competitive in the field of brain tumor segmentation.</p>
</list-item>
</list>
</sec>
<sec id="S2" sec-type="materials|methods">
<title>Materials and Methods</title>
<sec id="S2.SS1">
<title>Dataset</title>
<p>We performed our experiments on the MICCAI BraTS 2019 MRI dataset (<xref ref-type="bibr" rid="B22">Menze et al., 2015</xref>; <xref ref-type="bibr" rid="B2">Bakas et al., 2017a</xref>,<xref ref-type="bibr" rid="B3">b</xref>, <xref ref-type="bibr" rid="B4">2018</xref>). The BraTS 2019 dataset is a collection of MRI data from glioma patients. It contains two types of brain tumors, high-grade glioma (HGG) and low-grade glioma (LGG), comprising 259 HGG cases and 76 LGG cases. Each case includes four 3D MRI modalities (T1, T1ce, T2, and Flair), as shown in <xref ref-type="fig" rid="F1">Figure 1</xref>, and each 3D MRI image has a size of 155 &#x00D7; 240 &#x00D7; 240. The ground truth of each image is labeled manually by experts. There are four types of labels: background (label 0), necrosis and non-enhancing tumor (label 1), edema (label 2), and enhancing tumor (label 4). The task is to segment three nested subregions generated from the three tumor labels (1, 2, and 4): the enhancing tumor (ET, the region of label 4), the whole tumor (WT, the region consisting of labels 1, 2, and 4), and the tumor core (TC, the region of labels 1 and 4).</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Samples of MRI images in four modalities and their ground truth. In the ground truth image, red, green, and yellow stand for tumor core (TC), whole tumor (WT), and enhancing tumor (ET), respectively.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-15-782968-g001.tif"/>
</fig>
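<p>The derivation of the three nested subregions from the raw labels can be sketched in NumPy as follows; this is a minimal illustration, and the function name and use of boolean masks are our own, not part of the original work:</p>

```python
import numpy as np

def brats_subregions(seg):
    # seg: integer label map with 0 (background), 1 (necrosis and
    # non-enhancing tumor), 2 (edema), and 4 (enhancing tumor).
    et = seg == 4                 # enhancing tumor: label 4 only
    wt = np.isin(seg, (1, 2, 4))  # whole tumor: labels 1, 2, and 4
    tc = np.isin(seg, (1, 4))     # tumor core: labels 1 and 4
    return et, wt, tc
```

<p>Evaluation metrics such as the Dice score are then computed between these masks and the corresponding masks derived from the ground truth.</p>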
</sec>
<sec id="S2.SS2">
<title>Preprocessing</title>
<p>In this work, the 3D MRI images of the 335 cases in the BraTS 2019 dataset are sliced into multiple 2D images, and slices without tumors are excluded. We use 80% of the generated slices for training, 10% for validation, and 10% for testing. Compared with single-modality data, multi-modal data provide more characteristic information for tumor segmentation. To make effective use of multi-modal image information, we concatenate the 2D MRI images of the four modalities along the same dimension as the model input.</p>
<p>It is necessary to preprocess the input images before training. First, we remove the top 1% and bottom 1% of intensities, as Havaei did (<xref ref-type="bibr" rid="B11">Havaei et al., 2017</xref>). Then, since images of different modalities differ in contrast and related properties, we normalize each modality image before slicing. In this work, z-score normalization is adopted; that is, the mean value and standard deviation are used to standardize each image. The formula is as follows:</p>
<disp-formula id="S2.E1"><label>(1)</label><mml:math id="M1" display="block"><mml:mrow><mml:mpadded width="+5.6pt"><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mpadded><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mi mathvariant="normal">&#x03BC;</mml:mi></mml:mrow><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:mfrac></mml:mrow></mml:math></disp-formula>
<p>where <italic>x</italic><sub>i</sub> is the input image and <italic>z</italic><sub>i</sub> is the normalized image; &#x03BC; represents the mean value of the input image, while &#x03C3; denotes its standard deviation. Finally, we crop each training image to a size of 160 &#x00D7; 160 to reduce the black background, which retains the effective pixels and reduces the amount of computation to some extent.</p>
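<p>The preprocessing steps above (1% intensity removal, z-score normalization as in Eq. 1, and the 160 &#x00D7; 160 crop) can be sketched in NumPy; reading the 1% removal as percentile clipping and assuming a centered crop are our own simplifications:</p>

```python
import numpy as np

def clip_intensities(image, p=1.0):
    # Remove the top and bottom 1% of intensities by clipping to percentiles
    # (one plausible reading of the description; the exact rule is not given).
    lo, hi = np.percentile(image, (p, 100.0 - p))
    return np.clip(image, lo, hi)

def zscore(image):
    # Eq. (1): z_i = (x_i - mu) / sigma, with mu and sigma from the image.
    return (image - image.mean()) / image.std()

def crop(image, size=160):
    # Crop a 240 x 240 slice to size x size; a centered crop is assumed.
    h, w = image.shape
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]
```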
</sec>
<sec id="S2.SS3">
<title>Proposed Method</title>
<sec id="S2.SS3.SSS1">
<title>Architecture of MAFF-ResUNet</title>
<p>Inspired by U-Net (<xref ref-type="bibr" rid="B27">Ronneberger et al., 2015</xref>), ResNet (<xref ref-type="bibr" rid="B13">He et al., 2016</xref>), and DAF (<xref ref-type="bibr" rid="B28">Wang et al., 2018</xref>), we propose a deep convolutional neural network with a multi-scale attention feature fusion module based on the attention mechanism for brain tumor segmentation. U-Net has recently achieved excellent performance in medical image segmentation and has the advantage of accepting input images of any size. <xref ref-type="fig" rid="F2">Figure 2</xref> shows the proposed MAFF-ResUNet, which adopts U-Net as the basic network architecture. The MAFF-ResUNet consists of four encoder blocks, four decoder blocks, an intermediate layer, and a MAFF module. In the down-sampling path, we utilize convolution layers to extract low-level features of brain tumors, pooling layers to expand the receptive field, and residual connections to enhance the expressive ability of the encoder blocks. In the up-sampling path, up-sampling layers and convolutions are used to restore the image resolution, and skip connections combine low-level information with high-level information to reduce the loss of detailed information.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Architecture of the proposed MAFF-ResUNet. In the ground truth image, red, green, and yellow stand for tumor core (TC), whole tumor (WT), and enhancing tumor (ET), respectively.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-15-782968-g002.tif"/>
</fig>
<p>To further refine the boundaries of the brain tumor, we employ bilinear interpolation to up-sample the feature maps of different resolutions from the four decoder blocks to the same size as the input image and then feed them into the MAFF module. The MAFF module extracts attention features at different scales and fuses them to improve segmentation accuracy and produce the final brain tumor segmentation result.</p>
</sec>
<sec id="S2.SS3.SSS2">
<title>Encoder and Decoder Block</title>
<p>The residual U-Net is built by incorporating residual shortcuts into U-Net. Inspired by ResNet (<xref ref-type="bibr" rid="B13">He et al., 2016</xref>), we utilize five residual connections in the down-sampling branch, which includes four encoder blocks and an intermediate layer. The main function of the encoder blocks is to extract low-level features, and the introduction of short skip connections helps obtain better feature representations and accelerates model convergence. As shown in <xref ref-type="fig" rid="F3">Figure 3A</xref>, each encoder block contains two 3 &#x00D7; 3 convolutions, batch normalization (BN; <xref ref-type="bibr" rid="B14">Ioffe and Szegedy, 2015</xref>), the Rectified Linear Unit (ReLU) activation function, and a 2 &#x00D7; 2 max-pooling layer. Each max-pooling layer reduces the size of the input feature map to half of the original. The intermediate layer connects the down-sampling and up-sampling paths; structurally, it is similar to an encoder block but without the pooling layer.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Encoder and decoder block in MAFF-ResUNet <bold>(A)</bold> encoder block <bold>(B)</bold> decoder block.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-15-782968-g003.tif"/>
</fig>
<p>In the decoding stage, we use four decoder blocks, each of which contains an up-sampling layer and two 3 &#x00D7; 3 convolutions (see <xref ref-type="fig" rid="F3">Figure 3B</xref>). Similarly, each convolution is followed by a BN layer and a ReLU activation layer. The up-sampling layer restores the size of the feature map using bilinear interpolation. The input of each decoder block consists of two parts: the output of the previous decoder block and the output feature map of the encoder block at the same level, which makes up for the low-level details lost in the high-level semantic space.</p>
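<p>The residual encoder block described above can be sketched in PyTorch. The exact placement of the residual addition and the use of a 1 &#x00D7; 1 projection shortcut when the channel counts differ are our assumptions, since these details are not specified in the text:</p>

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Sketch of one encoder block: two 3x3 conv + BN + ReLU, a residual
    shortcut (1x1 projection when channels differ -- an assumption), and
    2x2 max-pooling that halves the spatial resolution."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        self.shortcut = (nn.Identity() if in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, 1))
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skip = self.relu(self.body(x) + self.shortcut(x))  # residual addition
        # Return both the pooled output (next encoder stage) and the
        # pre-pooling tensor used by the same-level skip connection.
        return self.pool(skip), skip
```

<p>For a 160 &#x00D7; 160 four-channel input, the block yields an 80 &#x00D7; 80 pooled output plus the full-resolution skip tensor passed to the matching decoder block.</p>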
</sec>
<sec id="S2.SS3.SSS3">
<title>Multi-Scale Attention Feature Fusion Module</title>
<p>Inspired by DAF (<xref ref-type="bibr" rid="B28">Wang et al., 2018</xref>), we propose a MAFF module to fuse different scale features and improve the accuracy of brain tumor segmentation. As shown in <xref ref-type="fig" rid="F4">Figure 4</xref>, the MAFF module accepts feature maps from four different scales, expressed as <italic>F</italic><sub>i</sub> &#x2208; <italic>R</italic><sup><italic>C</italic>&#x00D7;<italic>H</italic>&#x00D7;<italic>W</italic></sup>(<italic>i</italic> = 1,2,3,4), where <italic>i</italic> indicates the feature map of the <italic>i</italic>-th level, <italic>C</italic> is the number of channels, <italic>H</italic> and <italic>W</italic> represent the height and width of <italic>F</italic><sub>i</sub>, respectively. These four feature maps are concatenated in the channel dimension and named <italic>F</italic><sub>m</sub> &#x2208; <italic>R</italic><sup>4<italic>C</italic>&#x00D7;<italic>H</italic>&#x00D7;<italic>W</italic></sup>. The low-level feature map contains abundant boundary information of brain tumors, while the high-level feature map contains advanced semantic information of brain tumors.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Multi-scale attention feature fusion (MAFF) module.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-15-782968-g004.tif"/>
</fig>
<p>Directly concatenating feature maps of different scales inevitably introduces some noise, and segmentation results obtained directly from <italic>F</italic><sub>m</sub> cannot effectively utilize the complementary information of features at different levels. Therefore, we use the attention feature block (AFB) to obtain the attention map <italic>A</italic><sub>i</sub> of each level and then multiply it with the mixed feature map <italic>F</italic><sub>m</sub> to obtain the refined feature map <italic>F</italic><sub>ri</sub> &#x2208; <italic>R</italic><sup><italic>C</italic>&#x00D7;<italic>H</italic>&#x00D7;<italic>W</italic></sup> of each scale. Specifically, we compute the following for <italic>F</italic><sub>i</sub>:</p>
<disp-formula id="S2.E2"><label>(2)</label><mml:math id="M2" display="block"><mml:mrow><mml:mpadded width="+5.6pt"><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mpadded><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>g</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>a</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2062;</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mo rspace="5.3pt">)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>f</italic><sub><italic>1&#x00D7;1</italic></sub> is a 1 &#x00D7; 1 convolution followed by BN and the ReLU function, and <italic>f</italic><sub>cat</sub>, <italic>f</italic><sub>up</sub>, and <italic>f</italic><sub>s</sub> denote the operations of concatenation, up-sampling, and the softmax function, respectively. <italic>g</italic>(<italic>x</italic>) is an attention feature module composed of convolution and average pooling, which can be formulated as:</p>
<disp-formula id="S2.E3"><label>(3)</label><mml:math id="M3" display="block"><mml:mrow><mml:mrow><mml:mi>g</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo rspace="8.1pt" stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mpadded width="+5.6pt"><mml:mn>1</mml:mn></mml:mpadded><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mpadded width="+5.6pt"><mml:mn>3</mml:mn></mml:mpadded><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mpadded width="+5.6pt"><mml:mn>3</mml:mn></mml:mpadded><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>f</italic><sub><italic>3&#x00D7;3</italic></sub> is a convolution layer with a filter size of 3 &#x00D7; 3, and <italic>f</italic><sub>p</sub> represents the average pooling operation. In addition, BN and the parametric rectified linear unit (PReLU; <xref ref-type="bibr" rid="B12">He et al., 2015</xref>) activation function are applied after each convolution layer in the attention feature block. The PReLU is defined as:</p>
<disp-formula id="S2.E4"><label>(4)</label><mml:math id="M4" display="block"><mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>R</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>e</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>L</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>U</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo rspace="8.1pt">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mo movablelimits="false">max</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi>a</mml:mi><mml:mo>&#x002A;</mml:mo><mml:mrow><mml:mo movablelimits="false">min</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>a</italic> is a learnable parameter.</p>
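<p>As a plain-Python illustration (a sketch for exposition, not the paper's PyTorch implementation, where the slope <italic>a</italic> is a learnable parameter), the PReLU of Eq. (4) can be written as:</p>

```python
def prelu(x, a=0.25):
    """PReLU(x) = max(0, x) + a * min(0, x) (Eq. 4).

    In training frameworks the slope `a` is learned; here it is a
    fixed argument purely for illustration.
    """
    return max(0.0, x) + a * min(0.0, x)
```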
<p>Then, the output of <italic>g</italic>(<italic>x</italic>) is up-sampled and fed to a softmax layer to obtain the attention map. The mathematical expression of the softmax function is given as:</p>
<disp-formula id="S2.E5"><label>(5)</label><mml:math id="M5" display="block"><mml:mrow><mml:mrow><mml:mrow><mml:mi>S</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>o</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>f</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>m</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>a</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo rspace="8.1pt">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:msup><mml:mi>e</mml:mi><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:msup><mml:mrow><mml:msubsup><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mpadded width="+5.6pt"><mml:mi>n</mml:mi></mml:mpadded><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:msup><mml:mi>e</mml:mi><mml:msub><mml:mi>x</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:msup></mml:mrow></mml:mfrac></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mpadded width="+5.6pt"><mml:mi>j</mml:mi></mml:mpadded><mml:mo>=</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">&#x2026;</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
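<p>A minimal plain-Python sketch of the softmax of Eq. (5); the max-shift for numerical stability is a standard addition not stated in the text:</p>

```python
import math

def softmax(xs):
    """Softmax over a list of scores (Eq. 5).

    Subtracting max(xs) before exponentiating does not change the
    result but avoids overflow for large scores.
    """
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]
```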
<p>Based on the attention map <italic>A</italic><sub>i</sub>, we can selectively extract brain tumor-related feature information from the original feature map by performing a matrix multiplication between <italic>A</italic><sub>i</sub> and <italic>F</italic><sub>m</sub>. The output maps are, respectively, concatenated with <italic>F</italic><sub>i</sub>, and a convolution operation is performed to obtain the refined feature maps.</p>
<p>Finally, the refined feature maps of each layer, containing both low-level and high-level information, are fused, averaged, and fed to a sigmoid function to obtain the final segmentation result.</p>
</sec>
<sec id="S2.SS3.SSS4">
<title>Loss Function</title>
<p>In a specific task, the choice of a suitable loss function has a significant influence on the experimental results. The loss function expresses the degree of difference between the predicted value and the label value; during training, the model continuously fine-tunes the weights and biases of the network to minimize the loss value and improve its performance. In this article, the loss function for the brain tumor segmentation task combines the binary cross-entropy (BCE) loss and the Dice loss. The Dice loss is based on the Dice coefficient, a similarity measure: it supervises the prediction of the brain tumor globally, while the BCE loss is responsible for the classification of each pixel. They can be expressed as:</p>
<disp-formula id="S2.E6"><label>(6)</label><mml:math id="M6" display="block"><mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>o</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>s</mml:mi><mml:mo>&#x2062;</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>B</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>C</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>E</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo rspace="8.1pt">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:munder><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mi>k</mml:mi></mml:munder><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mpadded width="+3.3pt"><mml:msub><mml:mi>g</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mpadded><mml:mo>&#x002A;</mml:mo><mml:mrow><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo rspace="5.8pt">)</mml:mo></mml:mrow><mml:mo rspace="5.8pt">&#x002A;</mml:mo><mml:mtext>log</mml:mtext></mml:mrow><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<disp-formula id="S2.E7"><label>(7)</label><mml:math id="M7" display="block"><mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>o</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>s</mml:mi><mml:mo>&#x2062;</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>c</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo rspace="8.1pt">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mrow><mml:mi>D</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>c</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>e</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo rspace="8.1pt" stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo rspace="5.3pt">-</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>&#x2229;</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mi mathvariant="normal">&#x03B5;</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>|</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>|</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi 
mathvariant="normal">&#x03B5;</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>n</italic> is the number of samples, <italic>p</italic><sub>k</sub> and <italic>g</italic><sub>k</sub> denote the prediction of the proposed model and the ground truth, respectively; |<italic>p</italic><sub>k</sub>&#x2229;<italic>g</italic><sub>k</sub>| represents the intersection between <italic>p</italic><sub>k</sub> and <italic>g</italic><sub>k</sub>; |<italic>p</italic><sub>k</sub>| and |<italic>g</italic><sub>k</sub>| are the number of pixels in <italic>p</italic><sub>k</sub> and <italic>g</italic><sub>k</sub>, respectively. &#x03B5; stands for the smoothing coefficient, and the value is set to 1.0&#x00D7;10<sup>&#x2212;5</sup>.</p>
<p>The total loss is described as:</p>
<disp-formula id="S2.E8"><label>(8)</label><mml:math id="M8" display="block"><mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>o</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>s</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mpadded width="+5.6pt"><mml:mi>s</mml:mi></mml:mpadded></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B1;</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>L</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>o</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>s</mml:mi><mml:mo>&#x2062;</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>B</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>C</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>E</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo rspace="5.3pt">+</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B2;</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>L</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>o</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>s</mml:mi><mml:mo>&#x2062;</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>c</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where &#x03B1; and &#x03B2; represent the weights. We empirically set &#x03B1; to 0.5 and &#x03B2; to 1.</p>
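<p>The loss of Eqs. (6)&#x2013;(8) can be sketched in plain Python over flattened per-pixel probabilities and binary labels; this is an illustrative sketch (the probability clipping constant is an assumption added to avoid log(0)), not the paper's implementation:</p>

```python
import math

def bce_loss(p, g, eps=1e-7):
    """Binary cross-entropy over flattened pixel probabilities (Eq. 6)."""
    total = 0.0
    for pk, gk in zip(p, g):
        pk = min(max(pk, eps), 1.0 - eps)  # clip to avoid log(0)
        total += gk * math.log(pk) + (1.0 - gk) * math.log(1.0 - pk)
    return -total / len(p)

def dice_loss(p, g, eps=1e-5):
    """Soft Dice loss (Eq. 7); eps is the smoothing coefficient from the text."""
    inter = sum(pk * gk for pk, gk in zip(p, g))
    return 1.0 - (2.0 * inter + eps) / (sum(p) + sum(g) + eps)

def total_loss(p, g, alpha=0.5, beta=1.0):
    """Weighted combination of Eq. (8) with the weights reported in the text."""
    return alpha * bce_loss(p, g) + beta * dice_loss(p, g)
```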
</sec>
</sec>
</sec>
<sec id="S3">
<title>Experiments and Results</title>
<sec id="S3.SS1">
<title>Training Details</title>
<p>The proposed MAFF-ResUNet is implemented in the PyTorch framework on an NVIDIA GeForce RTX 3090. In this experiment, we use adaptive moment estimation (Adam) (<xref ref-type="bibr" rid="B15">Kingma and Ba, 2014</xref>) as the optimizer. The initial learning rate is 0.0003, the momentum is 0.90, and the weight decay is set to 0.0001. We utilize the poly policy to decay the learning rate over the course of training, as employed by <xref ref-type="bibr" rid="B23">Mou et al. (2019)</xref> and <xref ref-type="bibr" rid="B7">Elhassan et al. (2021)</xref>. It is defined as in Eq. (9), where <italic>iter</italic> represents the number of iterations, <italic>max_iter</italic> denotes the maximum number of iterations, and <italic>power</italic> is set to 0.9. During training, the batch size is 16.</p>
<disp-formula id="S3.E9"><label>(9)</label><mml:math id="M9" display="block"><mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mpadded width="+5.6pt"><mml:mi>r</mml:mi></mml:mpadded></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mpadded width="+5.6pt"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>n</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mpadded></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>e</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>a</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi mathvariant="normal">_</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>e</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>r</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>o</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>w</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>e</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>r</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:math></disp-formula>
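<p>A minimal sketch of the poly learning-rate schedule of Eq. (9) in plain Python (in practice this would be wired into a PyTorch scheduler; the function name is illustrative):</p>

```python
def poly_lr(lr_init, iteration, max_iter, power=0.9):
    """Poly decay (Eq. 9): lr = lr_init * (1 - iter / max_iter) ** power."""
    return lr_init * (1.0 - iteration / max_iter) ** power
```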
</sec>
<sec id="S3.SS2">
<title>Evaluation Metrics</title>
<p>To effectively evaluate the performance of the proposed model, we adopt intersection-over-union (IoU), sensitivity, and positive predictive value (PPV), which are commonly used metrics for image segmentation. The IoU can be calculated using Eq. (10), where (<italic>P</italic>&#x2229;<italic>G</italic>) is the number of positive pixels whose values are the same in both <italic>P</italic> and <italic>G</italic>, and (<italic>P</italic>&#x222A;<italic>G</italic>) stands for the union of <italic>P</italic> and <italic>G</italic>. Sensitivity is defined as the ratio of correctly classified positive samples to the total positive samples in the ground truth <italic>G</italic>, as shown in Eq. (11); it measures how sensitive the model is to the segmentation targets. PPV represents the proportion of correctly classified positive samples among all positive samples in the prediction <italic>P</italic>, which can be formulated as in Eq. (12). |<italic>P</italic>| is the number of positive pixels in <italic>P</italic>, while |<italic>G</italic>| is the number of positive pixels in <italic>G</italic>. The IoU, sensitivity, and PPV all range from 0 to 1; the closer they are to 1, the better the segmentation result:</p>
<disp-formula id="S3.E10"><label>(10)</label><mml:math id="M10" display="block"><mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>o</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>U</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>P</mml:mi><mml:mo>,</mml:mo><mml:mi>G</mml:mi><mml:mo rspace="8.1pt" stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>&#x2229;</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>&#x222A;</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<disp-formula id="S3.E11"><label>(11)</label><mml:math id="M11" display="block"><mml:mrow><mml:mrow><mml:mi>S</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>E</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>N</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>P</mml:mi><mml:mo>,</mml:mo><mml:mi>G</mml:mi><mml:mo rspace="8.1pt" stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>&#x2229;</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mi>G</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<disp-formula id="S3.E12"><label>(12)</label><mml:math id="M12" display="block"><mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>P</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>V</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>P</mml:mi><mml:mo>,</mml:mo><mml:mi>G</mml:mi><mml:mo rspace="8.1pt" stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>&#x2229;</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mi>P</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
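<p>Eqs. (10)&#x2013;(12) can be sketched in plain Python over flattened binary masks; this is an illustrative sketch (function names are our own), and it assumes non-empty masks so the denominators are nonzero:</p>

```python
def iou(p, g):
    """Intersection over union of binary masks (Eq. 10)."""
    inter = sum(1 for pk, gk in zip(p, g) if pk == 1 and gk == 1)
    union = sum(1 for pk, gk in zip(p, g) if pk == 1 or gk == 1)
    return inter / union

def sensitivity(p, g):
    """True positives over all positives in the ground truth G (Eq. 11)."""
    inter = sum(1 for pk, gk in zip(p, g) if pk == 1 and gk == 1)
    return inter / sum(g)

def ppv(p, g):
    """True positives over all positives in the prediction P (Eq. 12)."""
    inter = sum(1 for pk, gk in zip(p, g) if pk == 1 and gk == 1)
    return inter / sum(p)
```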
<p>Furthermore, the Dice similarity coefficient (DSC) and Hausdorff distance (HD), two commonly used metrics for brain tumor segmentation, are also applied for the quantitative analysis in this experiment. The DSC measures how similar two samples are, and can be given as:</p>
<disp-formula id="S3.E13"><label>(13)</label><mml:math id="M13" display="block"><mml:mrow><mml:mrow><mml:mi>D</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>S</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>C</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>P</mml:mi><mml:mo>,</mml:mo><mml:mi>G</mml:mi><mml:mo rspace="8.1pt" stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo rspace="5.3pt">=</mml:mo><mml:mfrac><mml:mrow><mml:mpadded width="+5.6pt"><mml:mn>2</mml:mn></mml:mpadded><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>&#x2229;</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo><mml:mi>G</mml:mi><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<p>The DSC is sensitive to the interior filling of the mask, whereas the HD is more sensitive to the segmented boundary. It represents the maximum distance between the labeled boundary and the predicted boundary, defined as:</p>
<disp-formula id="S3.Ex1"><mml:math id="M14a" display="block"><mml:mrow><mml:mrow><mml:mi>H</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>D</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>P</mml:mi><mml:mo>,</mml:mo><mml:mi>G</mml:mi><mml:mo rspace="8.1pt">)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo movablelimits="false">max</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>G</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>P</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<disp-formula id="S3.E14"><label>(14)</label><mml:math id="M14" display="block"><mml:mrow><mml:mi/><mml:mo>=</mml:mo><mml:mrow><mml:mi>m</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>a</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:munder><mml:mo movablelimits="false">max</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>P</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:munder><mml:mo movablelimits="false">min</mml:mo><mml:mrow><mml:mi>g</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mo>(</mml:mo><mml:mi>p</mml:mi><mml:mo>,</mml:mo><mml:mi>g</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:munder><mml:mo movablelimits="false">max</mml:mo><mml:mrow><mml:mi>g</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:munder><mml:mo movablelimits="false">min</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>P</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>g</mml:mi><mml:mo>,</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>
<p>where <italic>p</italic> and <italic>g</italic> represent the points in the predicted region and the ground truth region, respectively.</p>
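<p>Eqs. (13) and (14) can be sketched in plain Python; the DSC operates on flattened binary masks and the HD on boundary point sets with Euclidean distance (a pure-Python sketch with illustrative names, not an efficient implementation):</p>

```python
import math

def dsc(p, g):
    """Dice similarity coefficient for binary masks (Eq. 13)."""
    inter = sum(1 for pk, gk in zip(p, g) if pk == 1 and gk == 1)
    return 2.0 * inter / (sum(p) + sum(g))

def hausdorff(P, G):
    """Symmetric Hausdorff distance between two point sets (Eq. 14)."""
    def directed(A, B):
        # max over points of A of the distance to the closest point of B
        return max(min(math.dist(a, b) for b in B) for a in A)
    return max(directed(P, G), directed(G, P))
```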
</sec>
<sec id="S3.SS3">
<title>Performance Comparison</title>
<p>We compare the proposed MAFF-ResUNet with different networks, including FCN (<xref ref-type="bibr" rid="B19">Long et al., 2015</xref>) and U-Net (<xref ref-type="bibr" rid="B27">Ronneberger et al., 2015</xref>). For FCN, we use three models with different network depths: FCN8s, FCN16s, and FCN32s. As shown in <xref ref-type="table" rid="T1">Table 1</xref>, the metrics of the FCN models in the brain tumor segmentation task are lower than those of the other models. Compared with FCN, U-Net with its encoder-decoder structure has stronger feature extraction capabilities, and its performance is significantly improved. The proposed MAFF-ResUNet improves on U-Net in every metric except the PPV of the TC, which is slightly lower.</p>
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Comparison of segmentation results (mean &#x00B1; SD) between the proposed MAFF-ResUNet and existing deep convolutional neural networks.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left" colspan="2"></td>
<td valign="top" align="center"><bold>FCN32s</bold></td>
<td valign="top" align="center"><bold>FCN16s</bold></td>
<td valign="top" align="center"><bold>FCN8s</bold></td>
<td valign="top" align="center"><bold>U-Net</bold></td>
<td valign="top" align="center"><bold>MAFF-ResUNet</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">IoU (%)</td>
<td valign="top" align="center">WT</td>
<td valign="top" align="center">72.9 &#x00B1; 0.3</td>
<td valign="top" align="center">77.8 &#x00B1; 0.4</td>
<td valign="top" align="center">80.8 &#x00B1; 0.2</td>
<td valign="top" align="center">85.6 &#x00B1; 0.4</td>
<td valign="top" align="center"><bold>86.5 &#x00B1; 0.08</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="center">TC</td>
<td valign="top" align="center">80.1 &#x00B1; 0.07</td>
<td valign="top" align="center">82.1 &#x00B1; 0.3</td>
<td valign="top" align="center">84.4 &#x00B1; 0.07</td>
<td valign="top" align="center">88.4 &#x00B1; 0.4</td>
<td valign="top" align="center"><bold>88.5 &#x00B1; 0.3</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="center">ET</td>
<td valign="top" align="center">66.7 &#x00B1; 0.6</td>
<td valign="top" align="center">73.1 &#x00B1; 0.2</td>
<td valign="top" align="center">77.7 &#x00B1; 0.05</td>
<td valign="top" align="center">85.8 &#x00B1; 0.4</td>
<td valign="top" align="center"><bold>86.4 &#x00B1; 0.3</bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="7"><hr/></td>
</tr>
<tr>
<td valign="top" align="left">SEN (%)</td>
<td valign="top" align="center">WT</td>
<td valign="top" align="center">85.0 &#x00B1; 1.0</td>
<td valign="top" align="center">87.8 &#x00B1; 0.8</td>
<td valign="top" align="center">88.6 &#x00B1; 0.4</td>
<td valign="top" align="center">91.1 &#x00B1; 0.4</td>
<td valign="top" align="center"><bold>91.93 &#x00B1; 0.1</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="center">TC</td>
<td valign="top" align="center">87.6 &#x00B1; 0.4</td>
<td valign="top" align="center">89.7 &#x00B1; 0.3</td>
<td valign="top" align="center">90.5 &#x00B1; 0.05</td>
<td valign="top" align="center">92.6 &#x00B1; 0.3</td>
<td valign="top" align="center"><bold>93.4 &#x00B1; 0.1</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="center">ET</td>
<td valign="top" align="center">75.5 &#x00B1; 0.5</td>
<td valign="top" align="center">82.3 &#x00B1; 0.4</td>
<td valign="top" align="center">85.0 &#x00B1; 0.2</td>
<td valign="top" align="center">91.8 &#x00B1; 0.3</td>
<td valign="top" align="center"><bold>92.5 &#x00B1; 0.1</bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="7"><hr/></td>
</tr>
<tr>
<td valign="top" align="left">PPV (%)</td>
<td valign="top" align="center">WT</td>
<td valign="top" align="center">81.7 &#x00B1; 0.6</td>
<td valign="top" align="center">85.3 &#x00B1; 0.3</td>
<td valign="top" align="center">88.3 &#x00B1; 0.08</td>
<td valign="top" align="center">92.0 &#x00B1; 0.09</td>
<td valign="top" align="center"><bold>92.5 &#x00B1; 0.1</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="center">TC</td>
<td valign="top" align="center">89.9 &#x00B1; 0.6</td>
<td valign="top" align="center">90.1 &#x00B1; 0.4</td>
<td valign="top" align="center">92.1 &#x00B1; 0.2</td>
<td valign="top" align="center"><bold>94.4 &#x00B1; 0.06</bold></td>
<td valign="top" align="center">93.9 &#x00B1; 0.3</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">ET</td>
<td valign="top" align="center">82.4 &#x00B1; 0.3</td>
<td valign="top" align="center">85.0 &#x00B1; 0.3</td>
<td valign="top" align="center">88.7 &#x00B1; 0.06</td>
<td valign="top" align="center">92.27 &#x00B1; 0.7</td>
<td valign="top" align="center"><bold>92.34 &#x00B1; 0.4</bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="7"><hr/></td>
</tr>
<tr>
<td valign="top" align="left">DSC (%)</td>
<td valign="top" align="center">WT</td>
<td valign="top" align="center">81.9 &#x00B1; 0.3</td>
<td valign="top" align="center">85.2 &#x00B1; 0.3</td>
<td valign="top" align="center">87.3 &#x00B1; 0.2</td>
<td valign="top" align="center">90.5 &#x00B1; 0.3</td>
<td valign="top" align="center"><bold>91.2 &#x00B1; 0.06</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="center">TC</td>
<td valign="top" align="center">85.4 &#x00B1; 0.1</td>
<td valign="top" align="center">86.8 &#x00B1; 0.2</td>
<td valign="top" align="center">88.6 &#x00B1; 0.1</td>
<td valign="top" align="center">91.7 &#x00B1; 0.3</td>
<td valign="top" align="center"><bold>91.8 &#x00B1; 0.3</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="center">ET</td>
<td valign="top" align="center">73.0 &#x00B1; 0.6</td>
<td valign="top" align="center">79.4 &#x00B1; 0.2</td>
<td valign="top" align="center">83.4 &#x00B1; 0.09</td>
<td valign="top" align="center">89.8 &#x00B1; 0.4</td>
<td valign="top" align="center"><bold>90.2 &#x00B1; 0.3</bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="7"><hr/></td>
</tr>
<tr>
<td valign="top" align="left">HD (mm)</td>
<td valign="top" align="center">WT</td>
<td valign="top" align="center">2.92 &#x00B1; 0.005</td>
<td valign="top" align="center">2.68 &#x00B1; 0.01</td>
<td valign="top" align="center">2.49 &#x00B1; 0.01</td>
<td valign="top" align="center">2.20 &#x00B1; 0.001</td>
<td valign="top" align="center"><bold>2.16 &#x00B1; 0.005</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="center">TC</td>
<td valign="top" align="center">1.73 &#x00B1; 0.005</td>
<td valign="top" align="center">1.64 &#x00B1; 0.006</td>
<td valign="top" align="center">1.56 &#x00B1; 0.003</td>
<td valign="top" align="center">1.41 &#x00B1; 0.01</td>
<td valign="top" align="center"><bold>1.39 &#x00B1; 0.006</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="center">ET</td>
<td valign="top" align="center">1.79 &#x00B1; 0.01</td>
<td valign="top" align="center">1.64 &#x00B1; 0.004</td>
<td valign="top" align="center">1.49 &#x00B1; 0.004</td>
<td valign="top" align="center">1.23 &#x00B1; 0.02</td>
<td valign="top" align="center"><bold>1.20 &#x00B1; 0.005</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p><italic>Bold indicates the maximum value of IoU, SEN, PPV, DSC, and the minimum value of HD among these methods.</italic></p></fn>
</table-wrap-foot>
</table-wrap>
<p>Moreover, we evaluate the performance of the proposed model from a more intuitive perspective. <xref ref-type="fig" rid="F5">Figure 5</xref> compares the predictions of the proposed approach with those of the other methods. The figure contains four different cases and shows the original FLAIR-modality MRI images, the prediction results of each model, and the ground truth images. Since FCN8s performs best among the three FCN networks, only its predictions are shown for comparison. The proposed method is significantly better than U-Net and has obvious advantages in segmenting brain tumor contours and edge details, which shows that the introduction of the MAFF module yields segmentation results with richer edge information. In addition, fewer pixels are misclassified by the MAFF-ResUNet, and its predictions are closer to the manually annotated images.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>Visualized predicted images of different models. In the ground truth image, red, green, and yellow represent the tumor core (TC), whole tumor (WT), and enhancing tumor (ET), respectively.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-15-782968-g005.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="conclusion" id="S4">
<title>Conclusion</title>
<p>In this article, we propose a deep convolutional neural network for MRI brain tumor segmentation, named MAFF-ResUNet. This network takes advantage of the encoder-decoder structure. The introduction of residual shortcuts in the encoder blocks, combined with skip connections, enhances the global feature extraction capability of the network. In addition, for the output feature maps of the decoder blocks at different levels, an attention mechanism is utilized to selectively extract the important feature information of each level. The multi-scale feature maps are then fused to obtain the segmentation. The proposed method is verified on the public BraTS 2019 MRI dataset. Experimental results show that the MAFF-ResUNet outperforms existing deep convolutional neural networks. Judging from the predicted images, the proposed method can effectively exploit multi-scale feature information and preserve most of the edge detail information. Therefore, the MAFF-ResUNet can achieve high-precision automatic segmentation of brain tumors and can serve as an auxiliary tool for clinicians in the early screening, diagnosis, and treatment of brain tumors.</p>
</sec>
<sec sec-type="data-availability" id="S5">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. This data can be found here: <ext-link ext-link-type="uri" xlink:href="https://www.med.upenn.edu/cbica/brats-2019/">https://www.med.upenn.edu/cbica/brats-2019/</ext-link>.</p>
</sec>
<sec id="S6">
<title>Ethics Statement</title>
<p>Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. The patients/participants provided their written informed consent to participate in this study. Ethical review and approval was not required for the animal study because the data sets used in this manuscript are from public data sets. The data link is as follows: <ext-link ext-link-type="uri" xlink:href="https://www.med.upenn.edu/cbica/brats-2019/">https://www.med.upenn.edu/cbica/brats-2019/</ext-link>. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.</p>
</sec>
<sec id="S7">
<title>Author Contributions</title>
<p>XH wrote the main manuscript and conducted the experiments. WX and JY participated in the writing of the manuscript and corrected the English grammar of the article. JY and SC performed the experiments. JM and ZW analyzed the results. All authors reviewed the manuscript.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="pudiscl1">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<sec sec-type="funding-information" id="S8">
<title>Funding</title>
<p>This work was supported by grants from the Natural Science Foundation of Fujian Province (2020J02063), the National Natural Science Foundation of China (82072777), and the Xiamen Science and Technology Bureau Foundation of Science and Technology Project for Medical and Health (3502Z20209005 and 3502Z20214ZD1013).</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Angulakshmi</surname> <given-names>M.</given-names></name> <name><surname>Deepa</surname> <given-names>M.</given-names></name></person-group> (<year>2021</year>). <article-title>A review on deep learning architecture and methods for MRI brain tumour segmentation.</article-title> <source><italic>Curr. Med. Imaging</italic></source> <volume>17</volume> <fpage>695</fpage>&#x2013;<lpage>706</lpage>. <pub-id pub-id-type="doi">10.2174/1573405616666210108122048</pub-id> <pub-id pub-id-type="pmid">33423651</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bakas</surname> <given-names>S.</given-names></name> <name><surname>Akbari</surname> <given-names>H.</given-names></name> <name><surname>Sotiras</surname> <given-names>A.</given-names></name> <name><surname>Bilello</surname> <given-names>M.</given-names></name> <name><surname>Rozycki</surname> <given-names>M.</given-names></name> <name><surname>Kirby</surname> <given-names>J. S.</given-names></name><etal/></person-group> (<year>2017a</year>). <article-title>Data descriptor: advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features.</article-title> <source><italic>Sci. Data.</italic></source> <volume>4</volume>:<fpage>13</fpage>. <pub-id pub-id-type="doi">10.1038/sdata.2017.117</pub-id> <pub-id pub-id-type="pmid">28872634</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bakas</surname> <given-names>S.</given-names></name> <name><surname>Akbari</surname> <given-names>H.</given-names></name> <name><surname>Sotiras</surname> <given-names>A.</given-names></name> <name><surname>Bilello</surname> <given-names>M.</given-names></name> <name><surname>Rozycki</surname> <given-names>M.</given-names></name> <name><surname>Kirby</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2017b</year>). <article-title>Segmentation labels and radiomic features for the pre-operative scans of the TCGA-GBM collection.</article-title> <source><italic>Cancer Imaging Arch.</italic></source> <pub-id pub-id-type="doi">10.7937/K9/TCIA.2017.KLXWJJ1Q</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bakas</surname> <given-names>S.</given-names></name> <name><surname>Reyes</surname> <given-names>M.</given-names></name> <name><surname>Jakab</surname> <given-names>A.</given-names></name> <name><surname>Bauer</surname> <given-names>S.</given-names></name> <name><surname>Rempfler</surname> <given-names>M.</given-names></name> <name><surname>Crimi</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS Challenge.</article-title> <source><italic>arXiv</italic> [Preprint]</source> arXiv: 1811.02629.</citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bauer</surname> <given-names>S.</given-names></name> <name><surname>Wiest</surname> <given-names>R.</given-names></name> <name><surname>Nolte</surname> <given-names>L. P.</given-names></name> <name><surname>Reyes</surname> <given-names>M.</given-names></name></person-group> (<year>2013</year>). <article-title>A survey of MRI-based medical image analysis for brain tumor studies.</article-title> <source><italic>Phys. Med. Biol.</italic></source> <volume>58</volume> <fpage>R97</fpage>&#x2013;<lpage>R129</lpage>. <pub-id pub-id-type="doi">10.1088/0031-9155/58/13/R97</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dawngliana</surname> <given-names>M.</given-names></name> <name><surname>Deb</surname> <given-names>D.</given-names></name> <name><surname>Handique</surname> <given-names>M.</given-names></name> <name><surname>Roy</surname> <given-names>S.</given-names></name></person-group> (<year>2015</year>). &#x201C;<article-title>Automatic brain tumor segmentation in mri: hybridized multilevel thresholding and level set</article-title>,&#x201D; in <source><italic>Proceedings of the 2015 International Symposium on Advanced Computing and Communication (ISACC)</italic></source>, (<publisher-loc>Silchar</publisher-loc>: <publisher-name>Institute of Electrical and Electronics Engineers</publisher-name>), <fpage>219</fpage>&#x2013;<lpage>223</lpage>.</citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Elhassan</surname> <given-names>M. A. M.</given-names></name> <name><surname>Huang</surname> <given-names>C. X.</given-names></name> <name><surname>Yang</surname> <given-names>C. H.</given-names></name> <name><surname>Munea</surname> <given-names>T. L.</given-names></name></person-group> (<year>2021</year>). <article-title>DSANet: dilated spatial attention for real-time semantic segmentation in urban street scenes.</article-title> <source><italic>Exp. Syst. Appl.</italic></source> <volume>183</volume>:<fpage>12</fpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2021.115090</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Essadike</surname> <given-names>A.</given-names></name> <name><surname>Ouabida</surname> <given-names>E.</given-names></name> <name><surname>Bouzid</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>Brain tumor segmentation with vander lugt correlator based active contour.</article-title> <source><italic>Comp. Methods Prog. Biomed.</italic></source> <volume>160</volume> <fpage>103</fpage>&#x2013;<lpage>117</lpage>. <pub-id pub-id-type="doi">10.1016/j.cmpb.2018.04.004</pub-id> <pub-id pub-id-type="pmid">29728237</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gordillo</surname> <given-names>N.</given-names></name> <name><surname>Montseny</surname> <given-names>E.</given-names></name> <name><surname>Sobrevilla</surname> <given-names>P.</given-names></name></person-group> (<year>2013</year>). <article-title>State of the art survey on MRI brain tumor segmentation.</article-title> <source><italic>Magn. Reson. Imaging</italic></source> <volume>31</volume> <fpage>1426</fpage>&#x2013;<lpage>1438</lpage>. <pub-id pub-id-type="doi">10.1016/j.mri.2013.05.002</pub-id> <pub-id pub-id-type="pmid">23790354</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harati</surname> <given-names>V.</given-names></name> <name><surname>Khayati</surname> <given-names>R.</given-names></name> <name><surname>Farzan</surname> <given-names>A.</given-names></name></person-group> (<year>2011</year>). <article-title>Fully automated tumor segmentation based on improved fuzzy connectedness algorithm in brain MR images.</article-title> <source><italic>Comput. Biol. Med.</italic></source> <volume>41</volume> <fpage>483</fpage>&#x2013;<lpage>492</lpage>. <pub-id pub-id-type="doi">10.1016/j.compbiomed.2011.04.010</pub-id> <pub-id pub-id-type="pmid">21601840</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Havaei</surname> <given-names>M.</given-names></name> <name><surname>Davy</surname> <given-names>A.</given-names></name> <name><surname>Warde-Farley</surname> <given-names>D.</given-names></name> <name><surname>Biard</surname> <given-names>A.</given-names></name> <name><surname>Courville</surname> <given-names>A.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Brain tumor segmentation with deep neural networks.</article-title> <source><italic>Med. Image Anal.</italic></source> <volume>35</volume> <fpage>18</fpage>&#x2013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1016/j.media.2016.05.004</pub-id> <pub-id pub-id-type="pmid">27310171</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K. M.</given-names></name> <name><surname>Zhang</surname> <given-names>X. Y.</given-names></name> <name><surname>Ren</surname> <given-names>S. Q.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). &#x201C;<article-title>Delving deep into rectifiers: surpassing human-level performance on imagenet classification</article-title>,&#x201D; in <source><italic>Proceedings of the IEEE International Conference on Computer Vision</italic></source>, (<publisher-loc>Santiago</publisher-loc>: <publisher-name>Institute of Electrical and Electronics Engineers</publisher-name>), <fpage>1026</fpage>&#x2013;<lpage>1034</lpage>.</citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K. M.</given-names></name> <name><surname>Zhang</surname> <given-names>X. Y.</given-names></name> <name><surname>Ren</surname> <given-names>S. Q.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). &#x201C;<article-title>Deep residual learning for image recognition</article-title>,&#x201D; in <source><italic>Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</italic></source>, (<publisher-loc>Las Vegas, NV</publisher-loc>: <publisher-name>Institute of Electrical and Electronics Engineers</publisher-name>), <fpage>770</fpage>&#x2013;<lpage>778</lpage>.</citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ioffe</surname> <given-names>S.</given-names></name> <name><surname>Szegedy</surname> <given-names>C.</given-names></name></person-group> (<year>2015</year>). <article-title>Batch normalization: accelerating deep network training by reducing internal covariate shift.</article-title> <source><italic>Proc. Mach. Learn.</italic></source> <volume>37</volume> <fpage>448</fpage>&#x2013;<lpage>456</lpage>.</citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kingma</surname> <given-names>D.</given-names></name> <name><surname>Ba</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Adam: a method for stochastic optimization.</article-title> <source><italic>arXiv</italic> [Preprint]</source> arXiv:1412.6980.</citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Latif</surname> <given-names>G.</given-names></name> <name><surname>Alghazo</surname> <given-names>J.</given-names></name> <name><surname>Sibai</surname> <given-names>F. N.</given-names></name> <name><surname>Iskandar</surname> <given-names>D.</given-names></name> <name><surname>Khan</surname> <given-names>A. H.</given-names></name></person-group> (<year>2021</year>). <article-title>Recent advancements in fuzzy c-means based techniques for brain MRI segmentation.</article-title> <source><italic>Curr. Med. Imaging</italic></source> <volume>17</volume> <fpage>917</fpage>&#x2013;<lpage>930</lpage>. <pub-id pub-id-type="doi">10.2174/1573405616666210104111218</pub-id> <pub-id pub-id-type="pmid">33397241</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Latif</surname> <given-names>U.</given-names></name> <name><surname>Shahid</surname> <given-names>A. R.</given-names></name> <name><surname>Raza</surname> <given-names>B.</given-names></name> <name><surname>Ziauddin</surname> <given-names>S.</given-names></name> <name><surname>Khan</surname> <given-names>M. A.</given-names></name></person-group> (<year>2021</year>). <article-title>An end-to-end brain tumor segmentation system using multi-inception-UNET.</article-title> <source><italic>Int. J. Imaging Syst. Technol.</italic></source> <volume>31</volume> <fpage>1803</fpage>&#x2013;<lpage>1816</lpage>. <pub-id pub-id-type="doi">10.1002/ima.22585</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>M.</given-names></name> <name><surname>Cho</surname> <given-names>W.</given-names></name> <name><surname>Kim</surname> <given-names>S.</given-names></name> <name><surname>Park</surname> <given-names>S.</given-names></name> <name><surname>Kim</surname> <given-names>J. H.</given-names></name></person-group> (<year>2012</year>). <article-title>Segmentation of interest region in medical volume images using geometric deformable model.</article-title> <source><italic>Comput. Biol. Med.</italic></source> <volume>42</volume> <fpage>523</fpage>&#x2013;<lpage>537</lpage>. <pub-id pub-id-type="doi">10.1016/j.compbiomed.2012.01.005</pub-id> <pub-id pub-id-type="pmid">22402196</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Long</surname> <given-names>J.</given-names></name> <name><surname>Shelhamer</surname> <given-names>E.</given-names></name> <name><surname>Darrell</surname> <given-names>T.</given-names></name></person-group> (<year>2015</year>). <article-title>Fully convolutional networks for semantic segmentation.</article-title> <source><italic>IEEE Trans. Pattern Anal. Mach. Intell.</italic></source> <volume>39</volume> <fpage>640</fpage>&#x2013;<lpage>651</lpage>.</citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Louis</surname> <given-names>D. N.</given-names></name> <name><surname>Ohgaki</surname> <given-names>H.</given-names></name> <name><surname>Wiestler</surname> <given-names>O. D.</given-names></name> <name><surname>Cavenee</surname> <given-names>W. K.</given-names></name> <name><surname>Burger</surname> <given-names>P. C.</given-names></name> <name><surname>Jouvet</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2007</year>). <article-title>The 2007 WHO classification of tumours of the central nervous system.</article-title> <source><italic>Acta Neuropathol.</italic></source> <volume>114</volume> <fpage>97</fpage>&#x2013;<lpage>109</lpage>. <pub-id pub-id-type="doi">10.1007/s00401-007-0243-4</pub-id> <pub-id pub-id-type="pmid">17618441</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>C.</given-names></name> <name><surname>Luo</surname> <given-names>G. N.</given-names></name> <name><surname>Wang</surname> <given-names>K. Q.</given-names></name></person-group> (<year>2018</year>). <article-title>Concatenated and connected random forests with multiscale patch driven active contour model for automated brain tumor segmentation of MR images.</article-title> <source><italic>IEEE Trans. Med. Imag.</italic></source> <volume>37</volume> <fpage>1943</fpage>&#x2013;<lpage>1954</lpage>. <pub-id pub-id-type="doi">10.1109/TMI.2018.2805821</pub-id> <pub-id pub-id-type="pmid">29994627</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Menze</surname> <given-names>B. H.</given-names></name> <name><surname>Jakab</surname> <given-names>A.</given-names></name> <name><surname>Bauer</surname> <given-names>S.</given-names></name> <name><surname>Kalpathy-Cramer</surname> <given-names>J.</given-names></name> <name><surname>Farahani</surname> <given-names>K.</given-names></name> <name><surname>Kirby</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>The multimodal brain tumor image segmentation benchmark (BRATS).</article-title> <source><italic>IEEE Trans. Med. Imaging</italic></source> <volume>34</volume> <fpage>1993</fpage>&#x2013;<lpage>2024</lpage>. <pub-id pub-id-type="doi">10.1109/TMI.2014.2377694</pub-id> <pub-id pub-id-type="pmid">25494501</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mou</surname> <given-names>L.</given-names></name> <name><surname>Zhao</surname> <given-names>Y. T.</given-names></name> <name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Cheng</surname> <given-names>J.</given-names></name> <name><surname>Gu</surname> <given-names>Z. W.</given-names></name> <name><surname>Hao</surname> <given-names>H. Y.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>CS-Net: channel and spatial attention network for curvilinear structure segmentation.</article-title> <source><italic>Med. Image Comput. Comput. Assist. Interv.</italic></source> <volume>11764</volume> <fpage>721</fpage>&#x2013;<lpage>730</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-32239-7_80</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ostrom</surname> <given-names>Q. T.</given-names></name> <name><surname>Patil</surname> <given-names>N.</given-names></name> <name><surname>Cioffi</surname> <given-names>G.</given-names></name> <name><surname>Waite</surname> <given-names>K.</given-names></name> <name><surname>Kruchko</surname> <given-names>C.</given-names></name> <name><surname>Barnholtz-Sloan</surname> <given-names>J. S.</given-names></name></person-group> (<year>2020</year>). <article-title>CBTRUS statistical report: primary brain and other central nervous system tumors diagnosed in the United States in 2013&#x2013;2017.</article-title> <source><italic>Neuro Oncol.</italic></source> <volume>22</volume> <fpage>iv1-iv96.</fpage> <pub-id pub-id-type="doi">10.1093/neuonc/noaa200</pub-id> <pub-id pub-id-type="pmid">33123732</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pereira</surname> <given-names>S.</given-names></name> <name><surname>Pinto</surname> <given-names>A.</given-names></name> <name><surname>Alves</surname> <given-names>V.</given-names></name> <name><surname>Silva</surname> <given-names>C. A.</given-names></name></person-group> (<year>2016</year>). <article-title>Brain tumor segmentation using convolutional neural networks in MRI images.</article-title> <source><italic>IEEE Trans. Med. Imaging</italic></source> <volume>35</volume> <fpage>1240</fpage>&#x2013;<lpage>1251</lpage>. <pub-id pub-id-type="doi">10.1109/TMI.2016.2538465</pub-id> <pub-id pub-id-type="pmid">26960222</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pham</surname> <given-names>D.</given-names></name> <name><surname>Prince</surname> <given-names>J. L.</given-names></name> <name><surname>Xu</surname> <given-names>C. Y.</given-names></name> <name><surname>Dagher</surname> <given-names>A. P.</given-names></name></person-group> (<year>1997</year>). <article-title>An automated technique for statistical characterization of brain tissues in magnetic resonance imaging.</article-title> <source><italic>Intern. J. Pattern Recognit. Artif. Intell.</italic></source> <volume>11</volume> <fpage>1189</fpage>&#x2013;<lpage>1211</lpage>. <pub-id pub-id-type="doi">10.1142/S021800149700055X</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ronneberger</surname> <given-names>O.</given-names></name> <name><surname>Fischer</surname> <given-names>P.</given-names></name> <name><surname>Brox</surname> <given-names>T.</given-names></name></person-group> (<year>2015</year>). <article-title>U-Net: convolutional networks for biomedical image segmentation.</article-title> <source><italic>Intern. Conf. Med. Image Comput. Comput. Assist. Interv.</italic></source> <volume>9351</volume> <fpage>234</fpage>&#x2013;<lpage>241</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-24574-4_28</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Deng</surname> <given-names>Z. J.</given-names></name> <name><surname>Hu</surname> <given-names>X. W.</given-names></name> <name><surname>Zhu</surname> <given-names>L.</given-names></name> <name><surname>Yang</surname> <given-names>X.</given-names></name> <name><surname>Xu</surname> <given-names>X. M.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Deep attentional features for prostate segmentation in ultrasound.</article-title> <source><italic>Med. Image Comput. Comput. Assist. Interv.</italic></source> <volume>11073</volume> <fpage>523</fpage>&#x2013;<lpage>530</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-00937-3_60</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>C. Y.</given-names></name> <name><surname>Pham</surname> <given-names>D. L.</given-names></name> <name><surname>Prince</surname> <given-names>J. L.</given-names></name></person-group> (<year>1997</year>). <article-title>Finding the brain cortex using fuzzy segmentation, isosurfaces, and deformable surface models.</article-title> <source><italic>Intern. Conf. Inf. Process. Med. Imaging</italic></source> <volume>1230</volume> <fpage>399</fpage>&#x2013;<lpage>404</lpage>. <pub-id pub-id-type="doi">10.1007/3-540-63046-5_33</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Wu</surname> <given-names>C. D.</given-names></name> <name><surname>Coleman</surname> <given-names>S.</given-names></name> <name><surname>Kerr</surname> <given-names>D.</given-names></name></person-group> (<year>2020</year>). <article-title>DENSE-INception U-net for medical image segmentation.</article-title> <source><italic>Comput. Methods Prog. Biomed.</italic></source> <volume>192</volume>:<fpage>105395</fpage>. <pub-id pub-id-type="doi">10.1016/j.cmpb.2020.105395</pub-id> <pub-id pub-id-type="pmid">32163817</pub-id></citation></ref>
</ref-list>
</back>
</article>