<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurosci.</journal-id>
<journal-title>Frontiers in Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-453X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnins.2022.1004050</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Online hard example mining vs. fixed oversampling strategy for segmentation of new multiple sclerosis lesions from longitudinal FLAIR MRI</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Schmidt-Mengin</surname> <given-names>Marius</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Soulier</surname> <given-names>Th&#x000E9;odore</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1931930/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Hamzaoui</surname> <given-names>Mariem</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1975844/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Yazdan-Panah</surname> <given-names>Arya</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1579503/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Bodini</surname> <given-names>Benedetta</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Ayache</surname> <given-names>Nicholas</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/422935/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Stankoff</surname> <given-names>Bruno</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Colliot</surname> <given-names>Olivier</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/10655/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Institut du Cerveau-Paris Brain Institute, Centre National de la Recherche Scientifique, Inria, Inserm, Assistance Publique-H&#x000F4;pitaux de Paris, H&#x000F4;pital de la Piti&#x000E9; Salp&#x000EA;tri&#x000E8;re, Sorbonne Universit&#x000E9;</institution>, <addr-line>Paris</addr-line>, <country>France</country></aff>
<aff id="aff2"><sup>2</sup><institution>Institut du Cerveau-Paris Brain Institute, Centre National de la Recherche Scientifique, Inserm, Assistance Publique-H&#x000F4;pitaux de Paris, H&#x000F4;pital de la Piti&#x000E9; Salp&#x000EA;tri&#x000E8;re, Sorbonne Universit&#x000E9;</institution>, <addr-line>Paris</addr-line>, <country>France</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Neurology, Assistance Publique-H&#x000F4;pitaux de Paris, H&#x000F4;pital Saint-Antoine</institution>, <addr-line>Paris</addr-line>, <country>France</country></aff>
<aff id="aff4"><sup>4</sup><institution>Inria, Epione Project-Team</institution>, <addr-line>Sophia-Antipolis</addr-line>, <country>France</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Michel Dojat, Institut National de la Sant&#x000E9; et de la Recherche M&#x000E9;dicale (INSERM), France</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Richard McKinley, Bern University Hospital, Switzerland; Govind Nair, National Institutes of Health (NIH), United States; Robert Fekete, New York Medical College, United States; Vijay Venkatraman, The University of Melbourne, Australia</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Olivier Colliot <email>olivier.colliot&#x00040;cnrs.fr</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Brain Imaging Methods, a section of the journal Frontiers in Neuroscience</p></fn>
<fn fn-type="equal" id="fn002"><p>&#x02020;These authors have contributed equally to this work</p></fn></author-notes>
<pub-date pub-type="epub">
<day>04</day>
<month>11</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>16</volume>
<elocation-id>1004050</elocation-id>
<history>
<date date-type="received">
<day>26</day>
<month>07</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>10</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Schmidt-Mengin, Soulier, Hamzaoui, Yazdan-Panah, Bodini, Ayache, Stankoff and Colliot.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Schmidt-Mengin, Soulier, Hamzaoui, Yazdan-Panah, Bodini, Ayache, Stankoff and Colliot</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license></permissions>
<abstract>
<p>Detecting new lesions is a key aspect of the radiological follow-up of patients with Multiple Sclerosis (MS) and may lead to changes in their treatment. This paper presents our contribution to the MSSEG-2 MICCAI 2021 challenge. The challenge focuses on the segmentation of new MS lesions using two consecutive Fluid Attenuated Inversion Recovery (FLAIR) Magnetic Resonance Imaging (MRI) scans. In other words, given longitudinal data composed of two time points as input, the aim is to segment the lesional areas that are present only in the follow-up scan and not in the baseline. The backbone of our segmentation method is a 3D UNet applied patch-wise to the images; to take both time points into account, we simply concatenate the baseline and follow-up images along the channel axis before passing them to the 3D UNet. Our key methodological contribution is the use of online hard example mining (OHEM) to address the challenge of class imbalance: very few voxels belong to new lesions, which makes training deep-learning models difficult. Instead of using handcrafted priors such as brain masks or multi-stage methods, we experiment with a novel modification of OHEM, in which we use an exponential moving average of the 3D UNet (i.e., its weights are updated with momentum) to mine hard examples. Using a moving average instead of the raw model should smooth its predictions and provide more consistent feedback for OHEM.</p></abstract>
<kwd-group>
<kwd>segmentation</kwd>
<kwd>deep learning</kwd>
<kwd>hard example mining</kwd>
<kwd>multiple sclerosis</kwd>
<kwd>MRI</kwd>
</kwd-group>
<counts>
<fig-count count="3"/>
<table-count count="2"/>
<equation-count count="1"/>
<ref-count count="23"/>
<page-count count="10"/>
<word-count count="5854"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Multiple Sclerosis (MS) is a chronic autoimmune demyelinating inflammatory disease of the central nervous system and the leading cause of non-traumatic motor disability in young people in Europe and North America (Howard et al., <xref ref-type="bibr" rid="B12">2016</xref>). MS lesions, consisting of focal areas of demyelination, edema, and auto-immune inflammation, are visible on Magnetic Resonance Imaging (MRI), especially on Fluid Attenuated Inversion Recovery (FLAIR) images, as contiguous areas of hyperintensity (Filippi et al., <xref ref-type="bibr" rid="B8">2019</xref>). The decrease or absence of new FLAIR lesion formation over time is a key radiological endpoint in clinical trials assessing disease-modifying therapies in MS, and the absence of such radiological activity contributes to the &#x0201C;No Evidence of Disease Activity&#x0201D; score, used to monitor a patient&#x00027;s disease control and to discuss potential therapeutic changes at the individual level (Hegen et al., <xref ref-type="bibr" rid="B11">2018</xref>). New lesion identification and segmentation is usually performed manually, or with semi-automated procedures, by radiologists or neurologists; it is time-consuming and subject to intra- and inter-rater variability (Altay et al., <xref ref-type="bibr" rid="B1">2013</xref>). The aim of the MICCAI MSSEG-2 challenge was to benchmark new automatic methods for segmenting new lesions based on two FLAIR MRIs from two longitudinal visits (baseline and follow-up) of the same patient. 
Already published methods for this task consist mostly of either non-deep-learning methods (Cabezas et al., <xref ref-type="bibr" rid="B4">2016</xref>) or deep learning methods using multiple MRI sequences (McKinley et al., <xref ref-type="bibr" rid="B15">2020</xref>; Salem et al., <xref ref-type="bibr" rid="B18">2020</xref>); there are very few deep learning methods for this precise task based solely on FLAIR sequences (Gessert et al., <xref ref-type="bibr" rid="B9">2020</xref>). The present paper describes our contribution to the challenge. The backbone of our approach is a patch-wise 3D UNet (&#x000C7;i&#x000E7;ek et al., <xref ref-type="bibr" rid="B5">2022</xref>). Our key methodological contribution is to introduce online hard example mining (OHEM) (Shrivastava et al., <xref ref-type="bibr" rid="B19">2016</xref>) to tackle class imbalance. Indeed, an important characteristic of the dataset is that far fewer voxels belong to a new lesion (positive) than do not (negative): images comprise on average approximately 0.005% positive voxels. Notably, we use a moving average of our 3D UNet to perform inference for hard example mining. Our goal is that, similar to He et al. (<xref ref-type="bibr" rid="B10">2020</xref>), doing so will provide more stable predictions as training progresses. The present paper extends the one published in the proceedings of the MICCAI MSSEG-2 2021 workshop (Commowick et al., <xref ref-type="bibr" rid="B6">2021</xref>) by providing a more extensive description of the methodology as well as more detailed experimental results, including the testing of the algorithm on another cohort (Bodini et al., <xref ref-type="bibr" rid="B3">2016</xref>) distinct from the MICCAI MSSEG-2 testing dataset.</p></sec>
<sec sec-type="methods" id="s2">
<title>Methods</title>
<sec>
<title>Preprocessing</title>
<p>We resampled each FLAIR image to a voxel size of 0.5 mm, the highest resolution in the training dataset, and applied a <italic>z</italic>-score normalization to each FLAIR individually. As the two consecutive FLAIR images (baseline and follow-up) of a patient were aligned in the halfway space using a rigid transformation by the challenge providers, our method starts by concatenating them along the channel dimension, resulting in a tensor of shape 2 &#x000D7; D &#x000D7; H &#x000D7; W, where D, H, and W are, respectively, the depth, height, and width of the resampled FLAIR image. This tensor is then subdivided into patches of shape 2 &#x000D7; 32 &#x000D7; 32 &#x000D7; 32, which are passed through a 3D UNet to obtain the segmentation.</p></sec>
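<p>The preprocessing steps above (per-image <italic>z</italic>-score normalization, channel-wise concatenation of the two time points, and patch extraction) can be sketched as follows. This is a minimal NumPy illustration, not the authors&#x00027; actual code; the function names and the non-overlapping patch grid are our own choices for illustration.</p>

```python
import numpy as np

def zscore(img):
    # Normalize one FLAIR volume to zero mean and unit variance.
    return (img - img.mean()) / (img.std() + 1e-8)

def stack_timepoints(baseline, followup):
    # Concatenate the two co-registered FLAIR volumes along a new
    # channel axis, giving a tensor of shape 2 x D x H x W.
    return np.stack([zscore(baseline), zscore(followup)], axis=0)

def extract_patches(volume, size=32, stride=32):
    # Subdivide a (2, D, H, W) tensor into (2, size, size, size) patches.
    _, D, H, W = volume.shape
    patches = []
    for z in range(0, D - size + 1, stride):
        for y in range(0, H - size + 1, stride):
            for x in range(0, W - size + 1, stride):
                patches.append(volume[:, z:z + size, y:y + size, x:x + size])
    return patches
```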
<sec>
<title>Model</title>
<p>Our backbone model is a standard 3D UNet, which can be described by the following equations:</p>
<disp-formula id="E1"><mml:math id="M1"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:mtext>B</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mtext>n</mml:mtext><mml:mo>)</mml:mo></mml:mrow><mml:mtext>&#x000A0;</mml:mtext><mml:mo>:</mml:mo><mml:mtext>=2x&#x000A0;</mml:mtext><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtext>3DConvolution</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mtext>n</mml:mtext><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo><mml:mtext>Group&#x000A0;Normalization</mml:mtext></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mrow><mml:mo>&#x02192;</mml:mo><mml:mtext>ReLU</mml:mtext></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>3D&#x000A0;UNet&#x000A0;</mml:mtext><mml:mo>:</mml:mo><mml:mtext>=B</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>16</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02193;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x02192;</mml:mo><mml:mtext>B</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>32</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02193;</mml:mo><mml:mo>&#x02192;</mml:mo><mml:mtext>B</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>64</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x02191;</mml:mo><mml:mtext>B</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>32</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo><mml:mo>&#x02191;</mml:mo><mml:mtext>B</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>16</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x02192;</mml:mo><mml:mtext>Conv</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mtext>1</mml:mtext><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where the numbers in parentheses are the numbers of filters, &#x02193; indicates max pooling, and &#x02191; indicates trilinear upsampling. The model is trained on patches of size 32. For inference, we split the image into a grid of patches of size 32 with a stride of 24, so that neighboring patches overlap by 8 voxels. In these overlapping regions, we averaged all predictions and binarized the final output with a threshold of 0.5.</p></sec>
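<p>The overlapping-patch inference described above can be illustrated as follows: predictions are accumulated over a dense grid with stride 24, averaged where patches overlap, and thresholded at 0.5. This is a simplified sketch assuming the volume dimensions are covered by the grid; the predict argument stands in for the trained 3D UNet.</p>

```python
import numpy as np

def sliding_window_inference(volume, predict, size=32, stride=24, thr=0.5):
    # volume: (2, D, H, W) tensor; predict maps a (2, size, size, size)
    # patch to a (size, size, size) array of probabilities.
    _, D, H, W = volume.shape
    acc = np.zeros((D, H, W))  # sum of probabilities per voxel
    cnt = np.zeros((D, H, W))  # number of patches covering each voxel
    for z in range(0, max(D - size, 0) + 1, stride):
        for y in range(0, max(H - size, 0) + 1, stride):
            for x in range(0, max(W - size, 0) + 1, stride):
                p = predict(volume[:, z:z + size, y:y + size, x:x + size])
                acc[z:z + size, y:y + size, x:x + size] += p
                cnt[z:z + size, y:y + size, x:x + size] += 1
    # Average overlapping predictions, then binarize at the threshold.
    return (acc / np.maximum(cnt, 1)) >= thr
```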
<sec>
<title>Dataset</title>
<p>We used the MICCAI MSSEG-2 datasets (Commowick et al., <xref ref-type="bibr" rid="B6">2021</xref>) for training, validation, and the first testing set (see <xref ref-type="sec" rid="A1">Appendix</xref>). We also used a second testing set consisting of a previously published MS cohort from our laboratory (Bodini et al., <xref ref-type="bibr" rid="B3">2016</xref>). This cohort consisted of 19 patients with active relapsing-remitting MS (13 women, mean age 32.3 years, SD 5.6) who underwent two FLAIR MRIs between 31 and 120 days apart. Of those 19 patients, only 18 had available FLAIR MRIs for both visits. As only one of those 18 remaining patients had no new lesions at the second visit, we focused the second testing dataset on the 17 patients who presented new lesions at the second visit. For these 17 patients, the new lesions at the second visit were manually contoured in native space and verified by a senior neurologist. After rigid co-registration to halfway space (FLIRT, <ext-link ext-link-type="uri" xlink:href="http://fsl.fmrib.ox.ac.uk/">http://fsl.fmrib.ox.ac.uk/</ext-link>) (Jenkinson and Smith, <xref ref-type="bibr" rid="B13">2001</xref>), we gave the baseline and follow-up FLAIR as input to our algorithm and used the manually contoured lesion mask as ground truth to evaluate its performance. Acquisitions for this testing cohort were performed on a 3 Tesla Siemens scanner with a 32-channel head coil (Repetition Time: 8.88 ms; Echo Time: 129 ms; Inversion Time: 2.5 ms; Flip Angle: 120&#x000B0;; Voxel size: 0.9 &#x000D7; 0.9 &#x000D7; 3 mm).</p></sec>
<sec>
<title>Training</title>
<p>As the images contain very few positive voxels, we do not sample the patches uniformly during training. One common strategy is to over-sample patches containing positive regions with a constant ratio. However, this ratio must be fine-tuned by hand, and if it is too high, it can result in many false positives. Instead, our method uses a 3D UNet with momentum weight updates to perform hard example mining. A training iteration consists of three steps, illustrated in <xref ref-type="fig" rid="F1">Figure 1</xref> and described by <xref ref-type="table" rid="T2">Algorithm 1</xref>. In the first step, we select a batch of <inline-formula><mml:math id="M5"><mml:mover accent="false" class="mml-overline"><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mo accent="true">&#x000AF;</mml:mo></mml:mover></mml:math></inline-formula> = 128 patches, containing 30% positive patches and 70% uniformly sampled patches (i.e., mostly negatives due to the class imbalance). We then pass this batch through a first 3D UNet, denoted by <inline-formula><mml:math id="M6"><mml:mover accent="false" class="mml-overline"><mml:mrow><mml:mi>U</mml:mi><mml:mi>N</mml:mi><mml:mi>e</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mo accent="true">&#x000AF;</mml:mo></mml:mover></mml:math></inline-formula>, to obtain a prediction for each element of the batch and compute the segmentation errors with respect to the ground truth. In the second step, we select the <italic>B</italic> = 32 patches with the highest error and perform a training step on them with a second 3D UNet, denoted <italic>Unet</italic>. In the last step, we perform a momentum update of the weights of the first 3D <inline-formula><mml:math id="M7"><mml:mover accent="false" class="mml-overline"><mml:mrow><mml:mi>U</mml:mi><mml:mi>N</mml:mi><mml:mi>e</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mo accent="true">&#x000AF;</mml:mo></mml:mover></mml:math></inline-formula> using the weights of the second 3D <italic>Unet</italic>. 
The use of momentum ensures that the predictions given by the first 3D <inline-formula><mml:math id="M8"><mml:mover accent="false" class="mml-overline"><mml:mrow><mml:mi>U</mml:mi><mml:mi>N</mml:mi><mml:mi>e</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mo accent="true">&#x000AF;</mml:mo></mml:mover></mml:math></inline-formula> do not fluctuate too much during training, providing reliable samples for online hard example mining.</p>
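<p>The two key operations of this training iteration, selecting the hardest patches from the scored candidate batch and momentum-updating the mining network, can be sketched as follows. This is a schematic NumPy illustration under our own simplified representation of the weights as named arrays; it is not the actual implementation.</p>

```python
import numpy as np

def ema_update(teacher, student, mu=0.9):
    # Momentum update of the mining network's weights:
    # teacher <- mu * teacher + (1 - mu) * student, per parameter tensor.
    for name in teacher:
        teacher[name] = mu * teacher[name] + (1.0 - mu) * student[name]
    return teacher

def select_hard_examples(errors, B=32):
    # Rank candidate patches by their segmentation error (computed with
    # the momentum-averaged network) and keep the B hardest ones.
    order = np.argsort(errors)[::-1]
    return order[:B]
```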
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Illustration of our training strategy. <inline-formula><mml:math id="M2"><mml:mover accent="false" class="mml-overline"><mml:mrow><mml:mi>B</mml:mi></mml:mrow><mml:mo accent="true">&#x000AF;</mml:mo></mml:mover></mml:math></inline-formula> patches are fed to a first 3D <inline-formula><mml:math id="M3"><mml:mover accent="false" class="mml-overline"><mml:mrow><mml:mi>U</mml:mi><mml:mi>N</mml:mi><mml:mi>e</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mo accent="true">&#x000AF;</mml:mo></mml:mover></mml:math></inline-formula> and the segmentation errors are computed for each patch. The patches are ranked according to their errors, and the top <italic>B</italic> patches are selected to perform a training step with a second 3D <italic>Unet</italic>. The weights of the first 3D <inline-formula><mml:math id="M4"><mml:mover accent="false" class="mml-overline"><mml:mrow><mml:mi>U</mml:mi><mml:mi>N</mml:mi><mml:mi>e</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mo accent="true">&#x000AF;</mml:mo></mml:mover></mml:math></inline-formula> are momentum-updated with the weights of the second 3D <italic>Unet</italic>.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-1004050-g0001.tif"/>
</fig>
<table-wrap position="float" id="T2">
<label>Algorithm 1</label>
<caption><p>The algorithm used for the training with OHEM and momentum update.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-1004050-i0001.tif"/>
</table-wrap></sec>
<sec>
<title>Training&#x02014;OHEM vs. oversampling comparison</title>
<p>We optimized each network for 3 h on one NVIDIA Tesla P100 graphics card using Adam (Kingma and Ba, <xref ref-type="bibr" rid="B14">2022</xref>). Note that with OHEM, one iteration takes roughly twice as long. In the end, 3 h of training corresponds to about 30k iterations with OHEM and 64k without. The initial learning rate was set to 10<sup>&#x02212;3</sup> and decayed to 10<sup>&#x02212;4</sup> and 10<sup>&#x02212;5</sup> after, respectively, 50 and 80% of the training time. We split the dataset into 30 patients for training and 10 for validation.</p>
<p>We compared the learning curves using the Dice score on the validation set for six training procedures: three with OHEM with a momentum of, respectively, 0, 0.9, and 0.99, and three without OHEM but with oversampling with a probability p of, respectively, 0 (uniform), 0.1, and 0.5. This oversampling probability meant that we sampled positive patches (i.e., with a new lesion at a second time point) with a probability p and other patches (that could be randomly positive or negative) with a probability 1-p for the training.</p></sec>
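<p>The fixed-oversampling baseline can be sketched as a simple patch sampler: with probability p a patch known to contain a new lesion is drawn, and with probability 1-p a patch is drawn uniformly from the whole set (and may therefore still be positive by chance). This is a hypothetical illustration; the function and argument names are our own.</p>

```python
import random

def sample_patch(positive_patches, all_patches, p=0.1, rng=random):
    # With probability p, oversample a patch containing a new lesion;
    # otherwise draw uniformly from all patches (which can still be
    # positive by chance). p = 0 corresponds to uniform sampling.
    if rng.random() < p and positive_patches:
        return rng.choice(positive_patches)
    return rng.choice(all_patches)
```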
<sec>
<title>Training&#x02014;Final approach provided for the MSSEG-2 challenge</title>
<p>We used the model described above, with OHEM and a momentum of 0.9, and trained it on the whole MICCAI MSSEG-2 training dataset for 30k iterations. As above, the initial learning rate was set to 10<sup>&#x02212;3</sup> and decayed to 10<sup>&#x02212;4</sup> and 10<sup>&#x02212;5</sup> after, respectively, 50 and 80% of the training time.</p></sec>
<sec>
<title>Evaluation metrics for the testing dataset</title>
<p>The evaluation procedure was defined by the MICCAI MSSEG-2 committee (Commowick et al., <xref ref-type="bibr" rid="B6">2021</xref>). We briefly recall it in the following. The MICCAI MSSEG-2 testing dataset of 60 patients was divided into two subsets according to the presence or absence of new lesions: 28 patients without new lesions and 32 patients with new lesions. These two subsets were evaluated differently.</p>
<p>All new lesions from the ground truth and our algorithm prediction were individualized by computing the connected components, and all lesions smaller than 3 mm<sup>3</sup> were removed (Commowick et al., <xref ref-type="bibr" rid="B7">2018</xref>). The detection was defined at the lesion level using the algorithm described by Commowick et al. (<xref ref-type="bibr" rid="B7">2018</xref>) with the parameters &#x003B1; = 10%, &#x003B2; = 65%, and &#x003B3; = 70%, which were set by the MICCAI MSSEG-2 committee.</p>
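<p>The lesion individualization step above (computing connected components, then removing lesions smaller than 3 mm<sup>3</sup>) can be sketched as follows. This is a simplified 6-connected NumPy implementation for illustration only; the challenge pipeline may use different connectivity and tooling.</p>

```python
import numpy as np

def connected_components(mask):
    # Individualize lesions as 6-connected 3D components via depth-first
    # search; returns a list of components, each a list of voxel indices.
    visited = np.zeros_like(mask, dtype=bool)
    D, H, W = mask.shape
    comps = []
    for seed in zip(*np.nonzero(mask)):
        if visited[seed]:
            continue
        stack, comp = [seed], []
        visited[seed] = True
        while stack:
            z, y, x = stack.pop()
            comp.append((z, y, x))
            for dz, dy, dx in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                               (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                n = (z + dz, y + dy, x + dx)
                if (0 <= n[0] < D and 0 <= n[1] < H and 0 <= n[2] < W
                        and mask[n] and not visited[n]):
                    visited[n] = True
                    stack.append(n)
        comps.append(comp)
    return comps

def remove_small_lesions(mask, voxel_volume_mm3, min_mm3=3.0):
    # Zero out any component whose total volume is below min_mm3.
    out = mask.copy()
    for comp in connected_components(mask):
        if len(comp) * voxel_volume_mm3 < min_mm3:
            for idx in comp:
                out[idx] = False
    return out
```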
<p>For the 28 patients without new lesions, the following metrics are reported: the lesion volume prediction per patient in mm<sup>3</sup>, and the new lesion detection rate per patient.</p>
<p>For the 32 patients with new lesions, the evaluation aimed at assessing both the quality of the detection and of the segmentation. For evaluating the segmentation, the (voxel-level) Dice score per patient was reported. For evaluating the detection, the following metrics were used: the mean sensitivity <italic>Sens</italic> (i.e., recall) at the lesion level per patient for detecting new lesions, the mean positive predictive value <italic>PPV</italic> (i.e., precision) at the lesion level per patient, and the F<sub>1</sub> score at the lesion level per patient, which combines lesion-level <italic>Sens</italic> and <italic>PPV</italic> (Commowick et al., <xref ref-type="bibr" rid="B7">2018</xref>).</p>
<p>The calculation of those metrics is described below. True positives with respect to the ground truth TP<sub>gt</sub> were defined as the number of new lesions from the ground truth that were correctly detected by our algorithm. True positives with respect to our prediction TP<sub>pred</sub> correspond to the number of new lesions predicted by our algorithm that match a new lesion in the ground truth.</p>
<list list-type="bullet">
<list-item><p><inline-formula><mml:math id="M9"><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mtext>&#x000A0;</mml:mtext><mml:mo>|</mml:mo><mml:mi>P</mml:mi><mml:mi>R</mml:mi><mml:mi>E</mml:mi><mml:mi>D</mml:mi><mml:mo>&#x02229;</mml:mo><mml:mi>G</mml:mi><mml:mi>T</mml:mi><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mi>P</mml:mi><mml:mi>R</mml:mi><mml:mi>E</mml:mi><mml:mi>D</mml:mi><mml:mo>|</mml:mo><mml:mo>&#x0002B;</mml:mo><mml:mo>|</mml:mo><mml:mi>G</mml:mi><mml:mi>T</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:mfrac></mml:math></inline-formula>, where PRED is the network prediction and GT the ground truth segmentation, |<italic>PRED</italic>&#x02229;<italic>GT</italic>| is the number of overlapping voxels between the prediction and the ground truth, |<italic>PRED</italic>| is the number of voxels in the prediction and |<italic>GT</italic>| the number of voxels in the ground truth.</p></list-item>
<list-item><p><inline-formula><mml:math id="M10"><mml:mi>S</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>g</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>w</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>g</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></inline-formula> where TP<sub>gt</sub> and n<sub>new lesions_gt</sub> are, respectively, the true positives with respect to the ground truth and the number of new lesions in the ground truth.</p></list-item>
<list-item><p><inline-formula><mml:math id="M11"><mml:mi>P</mml:mi><mml:mi>P</mml:mi><mml:mi>V</mml:mi><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>w</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></inline-formula> where TP<sub>pred</sub> and n<sub>new lesions_pred</sub> are, respectively, the true positives with respect to our prediction and the number of new lesions in our prediction.</p></list-item>
<list-item><p><inline-formula><mml:math id="M12"><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup><mml:mi>S</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:msup><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x0002A;</mml:mo></mml:mrow></mml:msup><mml:mi>P</mml:mi><mml:mi>P</mml:mi><mml:mi>V</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>P</mml:mi><mml:mi>P</mml:mi><mml:mi>V</mml:mi></mml:mrow></mml:mfrac></mml:math></inline-formula> where <italic>Sens</italic> and <italic>PPV</italic> are, respectively, the previously defined sensitivity and Positive Predictive Value.</p></list-item>
</list>
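<p>The four metrics defined above can be computed directly from the voxel masks and the lesion-level matching counts. The sketch below assumes flattened binary masks and precomputed TP<sub>gt</sub> and TP<sub>pred</sub> counts; it mirrors the formulas in the text rather than the challenge&#x00027;s evaluation code.</p>

```python
def dice(pred, gt):
    # Voxel-level Dice: 2 |PRED inter GT| / (|PRED| + |GT|), computed on
    # flattened binary masks given as sequences of 0/1 values.
    inter = sum(p and g for p, g in zip(pred, gt))
    total = sum(pred) + sum(gt)
    return 2 * inter / total if total else 1.0

def lesion_f1(tp_gt, n_gt, tp_pred, n_pred):
    # Lesion-level sensitivity (recall), PPV (precision), and their
    # harmonic mean F1, from the matching counts defined in the text.
    sens = tp_gt / n_gt if n_gt else 0.0
    ppv = tp_pred / n_pred if n_pred else 0.0
    f1 = 2 * sens * ppv / (sens + ppv) if (sens + ppv) else 0.0
    return sens, ppv, f1
```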
<p>All of those metrics were compared to zero for patients without new lesions, and to the ground-truth segmentation for patients with new lesions, which is the consensus segmentation from four expert annotators (Commowick et al., <xref ref-type="bibr" rid="B6">2021</xref>). All results are presented as mean, Standard Error of the Mean (SEM), and rank among the other challenge pipelines when available.</p>
<p>For the second testing dataset, consisting of the 17 patients with new lesions in our cohort, we used exactly the same evaluation procedure as described above for the patients with new lesions from the MICCAI MSSEG-2 testing dataset.</p></sec>
<sec>
<title>Implementation details</title>
<p>Our algorithms were implemented in PyTorch (Paszke et al., <xref ref-type="bibr" rid="B16">2017</xref>) using the TorchIO library (P&#x000E9;rez-Garc&#x000ED;a et al., <xref ref-type="bibr" rid="B17">2021</xref>). The implementation was based on that of Wolny et al. (<xref ref-type="bibr" rid="B22">2020</xref>). Training was performed on an NVIDIA Tesla P100 graphics card.</p></sec></sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec>
<title>Results on the validation set: Impact of the OHEM procedure</title>
<p>The comparison of the learning curves for the proposed OHEM procedure and the forced oversampling procedure is shown in <xref ref-type="fig" rid="F2">Figure 2</xref>. One can observe that, on this task, the OHEM procedure did not give better results than oversampling, neither in training speed nor in the plateau reached by the Dice score, even when increasing the momentum to 0.99. However, when using OHEM, a positive momentum helped reach a higher plateau of the Dice score on the validation set compared to a null one.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Evolution of the Dice score as a function of training time for the OHEM and the forced oversampling procedure (denoted as &#x0201C;Uniform&#x0201D;). For OHEM, &#x003BC; is the momentum. For &#x0201C;Uniform&#x0201D;, patches were sampled with respective probabilities, p for those with new lesions, and 1-p for the rest (not necessarily without new lesions). One can observe that the &#x0201C;Uniform&#x0201D; procedure with <italic>p</italic> &#x0003E; 0 ended up performing best and that, when using OHEM, choosing &#x003BC; &#x0003E; 0 seems to be beneficial.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-1004050-g0002.tif"/>
</fig></sec>
<sec>
<title>Results on the testing set</title>
<p>Results on the first testing set from MICCAI MSSEG-2 are shown in <xref ref-type="table" rid="T1">Table 1</xref>. In the 32 patients with new lesions, our network achieved a mean lesion-level F<sub>1</sub> score per patient of 0.446 (SEM 0.057), ranking 13th out of 29 approaches for this metric. The mean Dice per patient was 0.400 (SEM 0.051), which ranked 18/29. Our mean sensitivity at the lesion level per patient was 0.616 (SEM 0.069) and our mean positive predictive value at the lesion level per patient was 0.383 (SEM 0.054). Concerning the 28 patients without new lesions, for whom any prediction is a pure false positive, on average 0.75 (SEM 0.32) new lesions were predicted per patient (ranking our approach 15/29), with a mean predicted lesion volume per patient of 31.2 mm<sup>3</sup> (SEM 13.0), corresponding to a rank of 20/29.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Results using the MICCAI MSSEG-2 evaluation metrics on the testing sets: the 32 patients with new lesions (a) and the 28 patients without new lesions (b) from the MICCAI MSSEG-2 testing dataset, and the 17 patients with new lesions from our second testing dataset (c).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left" colspan="4"><bold>(a) MICCAI MSSEG-2 testing dataset: patients with new lesions (<italic>n =</italic> 32)</bold></th>
</tr>
</thead>
<tbody>
<tr style="border-bottom: thin solid #000000;">
<td valign="top" align="left"><bold>Lesion-level F</bold><sub>1</sub> <bold>score per patient</bold>,<break/> <bold>mean (SEM); rank</bold></td>
<td valign="top" align="left"><bold>Dice score per patient</bold>,<break/> <bold>mean (SEM); rank</bold></td>
<td valign="top" align="left"><italic><bold>Sens</bold></italic> <bold>at lesion level per patient</bold>,<break/> <bold>mean (SEM)</bold></td>
<td valign="top" align="left"><italic><bold>PPV</bold></italic> <bold>at lesion level per patient</bold>,<break/> <bold>mean (SEM)</bold></td>
</tr> <tr style="border-bottom: thin solid #000000;">
<td valign="top" align="left">0.446 (0.057); 13<sup>th</sup>/29</td>
<td valign="top" align="left">0.400 (0.051); 18<sup>th</sup>/29</td>
<td valign="top" align="left">0.616 (0.069)</td>
<td valign="top" align="left">0.383 (0.054)</td>
</tr> <tr style="border-bottom: thin solid #000000;">
<td valign="top" align="left" colspan="4"><bold>(b) MICCAI MSSEG-2 testing dataset: patients without new lesion (</bold><italic><bold>n</bold> =</italic> <bold>28)</bold></td>
</tr> <tr>
<td valign="top" align="left" colspan="2"><bold>Number of new lesions predicted per patient</bold>,</td>
<td valign="top" align="left" colspan="2"><bold>Lesion volume predicted in mm</bold><sup>3</sup> <bold>per patient</bold>,</td>
</tr>
<tr style="border-bottom: thin solid #000000;">
<td valign="top" align="left" colspan="2"><bold>mean (SEM); rank</bold></td>
<td valign="top" align="left" colspan="2"><bold>mean (SEM); rank</bold></td>
</tr> <tr style="border-bottom: thin solid #000000;">
<td valign="top" align="left" colspan="2">0.750 (0.320); 15<sup>th</sup>/29</td>
<td valign="top" align="left" colspan="2">31.2 (13.0); 20<sup>th</sup>/29</td>
</tr> <tr style="border-bottom: thin solid #000000;">
<td valign="top" align="left" colspan="4"><bold>(c) Second testing dataset: patients with new lesions (</bold><italic><bold>n</bold> =</italic> <bold>17</bold>)</td>
</tr> <tr style="border-bottom: thin solid #000000;">
<td valign="top" align="left"><bold>Lesion-level F</bold><sub>1</sub> <bold>score per patient</bold>,<break/> <bold>mean (SEM)</bold></td>
<td valign="top" align="left"><bold>Dice score per patient, mean (SEM)</bold></td>
<td valign="top" align="left"><italic><bold>Sens</bold></italic> <bold>at lesion level per patient</bold>,<break/> <bold>mean (SEM)</bold></td>
<td valign="top" align="left"><italic><bold>PPV</bold></italic> <bold>at lesion level per patient, mean (SEM)</bold></td>
</tr> <tr>
<td valign="top" align="left">0.365 (0.038)</td>
<td valign="top" align="left">0.465 (0.046)</td>
<td valign="top" align="left">0.901 (0.043)</td>
<td valign="top" align="left">0.239 (0.030)</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>On the second testing set from our laboratory, for the 17 patients with new lesions, the mean Dice score per patient was 0.465 (SEM 0.046). At the lesion level, our network achieved a mean sensitivity per patient of 0.901 (SEM 0.043) and a mean positive predictive value per patient of 0.239 (SEM 0.030), resulting in a mean lesion-level F<sub>1</sub> score per patient of 0.365 (SEM 0.038).</p>
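All results are reported as mean (SEM) over patients, where the standard error of the mean is the standard deviation of the per-patient scores divided by the square root of the number of patients. A minimal sketch follows; the use of the sample standard deviation (ddof = 1) is our assumption for illustration.

```python
import math

def mean_sem(values):
    """Mean and standard error of the mean over per-patient scores.

    SEM = sd / sqrt(n), using the sample standard deviation (ddof = 1);
    this normalization is an assumption made for illustration.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)
```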
<p><xref ref-type="fig" rid="F3">Figure 3</xref> shows an example of inference on a follow-up MRI from this second testing set from our laboratory.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Example of prediction on one patient from our second testing dataset (Bodini et al., <xref ref-type="bibr" rid="B3">2016</xref>).</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-1004050-g0003.tif"/>
</fig></sec></sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>The main contribution of this work was the introduction of online hard example mining (OHEM) to deal with class imbalance. The rest of the approach consists of a standard 3D UNet. We first showed that the use of a positive momentum helped the training procedure. However, overall, OHEM did not perform better than a predefined fixed oversampling and, in particular, performed worse than fixed oversampling with an oversampling probability of <italic>p</italic> = 0.1.</p>
<p>On the MICCAI MSSEG-2 testing set, our approach ranked in the middle of the challenge (Dice score of 0.400, rank 18/29; lesion-level F<sub>1</sub> score of 0.446, rank 13/29). Interestingly, compared to the other pipelines of the challenge, our worst performances were on the subset of patients without new lesions, where any prediction is a false positive. Together with the relatively high sensitivity but relatively low PPV, this could be explained by a bias of the OHEM training toward a high detection rate, resulting in a greater false positive rate. This trend was even stronger when we evaluated the algorithm's performance on our second testing dataset, with a higher Dice score of 0.465, a higher sensitivity of 0.901, but a lower PPV of 0.239.</p>
<p>Among the other pipelines of the challenge, the best pipeline on the subset of patients without new lesions, consisting of a 3D UNet with pre-activation blocks, also used an oversampling strategy for regions of interest with new lesions, but ranked in the middle of the challenge for the Dice score on the patients with new lesions (Dice score of 0.409). Conversely, the most accurate pipeline in terms of Dice score (better even than several annotators), which did not use any oversampling strategy, ranked in the middle of the challenge for the new lesion detection rate on the subset of patients without new lesions. This is consistent with the idea that the oversampling of positive examples is a key factor in the balance between false positive and false negative predictions in this new lesion segmentation task. Given the medical utility of this task at the individual level for patient follow-up, we believe that a compromise between sensitivity and PPV favoring sensitivity is clinically relevant if the algorithm is used as an auxiliary tool by the neurologist or radiologist. Indeed, the interrater variability in manual new lesion detection is mainly explained by the false negative rate (Altay et al., <xref ref-type="bibr" rid="B1">2013</xref>), i.e., new lesions that were not detected by the rater. We believe that sensitive algorithms could help neurologists or radiologists detect those overlooked new lesions; the clinicians could then easily discard the false positive predictions of the algorithm after visual checking. However, there is still a long way to go before clinical application of algorithms for new lesion segmentation, which will require not only algorithmic improvements but also prospective validation studies on larger and more diverse datasets.</p>
<p>Only one pipeline in the challenge did not use deep learning. Although it outperformed four deep learning teams on average, its ranking was low on the MICCAI MSSEG-2 testing dataset, with a mean Dice score of 0.309 for patients with new lesions and a mean volume of detected new lesions of 177.9 mm<sup>3</sup> for patients without new lesions. This does not mean that non-deep-learning methods are not potentially useful for this task, but establishing this would require additional comparisons that are outside the scope of the present work. To our knowledge, most previously published deep learning algorithms (McKinley et al., <xref ref-type="bibr" rid="B15">2020</xref>; Salem et al., <xref ref-type="bibr" rid="B18">2020</xref>), as well as a recent non-deep-learning method based on deformation fields (Cabezas et al., <xref ref-type="bibr" rid="B4">2016</xref>), segment new lesions on MS MRI from multiple MRI sequences rather than a single one. The same holds for previously published deep learning algorithms used to segment the lesion load cross-sectionally (Valverde et al., <xref ref-type="bibr" rid="B20">2019</xref>; Zeng et al., <xref ref-type="bibr" rid="B23">2020</xref>). So, even if clinically relevant (Hegen et al., <xref ref-type="bibr" rid="B11">2018</xref>), the challenge task gives the neural network less information for prediction than most state-of-the-art methods, which can partly explain the difficulty of the task. The previous work of Gessert et al. (<xref ref-type="bibr" rid="B9">2020</xref>), based on attention-guided two-path convolutional neural networks, was to our knowledge the most relevant published deep learning work on segmenting new MS lesions from only two successive FLAIR sequences. It did not require an oversampling procedure to deal with class imbalance and achieved very good lesion-wise true positive and false positive rates. However, we could not compare the two methods since their evaluation metrics differed from those provided by MICCAI MSSEG-2 (Commowick et al., <xref ref-type="bibr" rid="B7">2018</xref>).</p>
<p>This work has several limitations. First, the OHEM training methodology (Shrivastava et al., <xref ref-type="bibr" rid="B19">2016</xref>) did not improve the training procedure on this task and did not significantly outperform the other competing 3D UNets of the challenge. Although it is an interesting methodology to deal with class imbalance, we have to keep in mind that it was developed for detection in 2D natural images (Shrivastava et al., <xref ref-type="bibr" rid="B19">2016</xref>) using fast R-CNN (Wang et al., <xref ref-type="bibr" rid="B21">2016</xref>). Even though it has shown promising results on heart MRI in the work of Bian et al. (<xref ref-type="bibr" rid="B2">2022</xref>), unveiling its full potential for 3D medical image segmentation may require further adaptations and developments. Second, we chose to compare OHEM and fixed oversampling as a function of training time rather than as a function of epochs. Training time can be influenced by many factors, such as machine load and GPU availability. However, we believe it was the fairest way to compare the methods: the unit cost of an epoch (or iteration) has no reason to be the same for the different techniques and, worse, it can vary across epochs due to the nature of the OHEM method. Another limitation is that we used a single split into training and validation sets rather than a cross-validation strategy. Thus, we did not use all samples for validation and did not assess the variability of the results when varying the training set. We made this choice because we had to provide a single result for the challenge. Finally, we did not use data augmentation in our training strategy, so as to compare the different oversampling strategies and momentum values on an equal footing, but the behavior of OHEM with data augmentation should be explored in future work.
Due to the short interval between the baseline and follow-up MRIs in the MICCAI MSSEG-2 dataset (from 1 to 3 years), as well as in our second testing dataset (maximum 120 days), we could not explore the influence of severe atrophy on this task. An adjacent and clinically useful task for the longitudinal follow-up of MS patients, which we could not assess here because the challenge focused on new lesions, is the detection of shrinking and enlarging lesions. Furthermore, it is likely that the use of multicontrast MRI could improve the results over FLAIR alone. However, the aim of the MICCAI 2021 MSSEG-2 challenge was to develop an algorithm based only on two longitudinal FLAIRs; thus, our present work uses only FLAIR as input, and a comparison with multicontrast input is left for future work. Another important aspect that remains to be studied is generalizability to other acquisition settings. The MICCAI MSSEG-2 challenge included quite a variety of MRI scanners; notably, the General Electric scanners present in the MICCAI MSSEG-2 testing dataset were not present in the training dataset. However, further experiments, which could not be performed within the challenge setting, would be required to demonstrate generalizability across acquisition settings. Future work will also go further into dealing with class imbalance during training with a fixed oversampling strategy, as it gave interesting results on the validation set and in other pipelines of the challenge. The difficulty with a fixed oversampling strategy is the arbitrary choice of the oversampling factor.
A promising idea could be to incorporate neuroanatomical priors to guide the oversampling factors and adapt them to the anatomical region: this would make it possible to account for the complexity of prediction in some brain areas and for the regional variability of the lesion load in MS, by locally tuning the probability that patches from those regions are oversampled.</p></sec>
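The region-adaptive oversampling direction outlined at the end of the discussion could be sketched as follows. This is entirely hypothetical: the region names and priors are ours, and no such mechanism was implemented in this work.

```python
import random

def sample_patch_by_region(patches_by_region, region_priors):
    """Draw a training patch with a region-dependent oversampling weight.

    patches_by_region: dict mapping a brain region name to its patches.
    region_priors: dict mapping the same names to sampling weights, e.g.,
    reflecting the regional prevalence of MS lesions (hypothetical values).
    """
    regions = list(patches_by_region)
    weights = [region_priors[r] for r in regions]
    region = random.choices(regions, weights=weights, k=1)[0]
    return random.choice(patches_by_region[region])
```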
<sec sec-type="conclusions" id="s5">
<title>Conclusion</title>
<p>In this paper, we described our contribution to the MICCAI MSSEG-2 challenge (Commowick et al., <xref ref-type="bibr" rid="B6">2021</xref>). The main new methodological component was the use of online hard example mining (OHEM) for handling class imbalance. Overall, on the challenge testing set, our pipeline ranked in the middle of the challenge, with an average Dice score of 0.400 and an average F<sub>1</sub> score of 0.446. For this specific application, on the validation set, OHEM did not provide any improvement over a standard fixed oversampling strategy. Nevertheless, such a strategy may deserve further investigation for medical imaging problems with class imbalance.</p>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: <ext-link ext-link-type="uri" xlink:href="https://portal.fli-iam.irisa.fr/msseg-2/data/">https://portal.fli-iam.irisa.fr/msseg-2/data/</ext-link>.</p></sec>
<sec id="s7">
<title>Ethics statement</title>
<p>The studies involving human participants were reviewed and approved by OFSEP: <ext-link ext-link-type="uri" xlink:href="https://www.ofsep.org/en">https://www.ofsep.org/en</ext-link>. The patients/participants provided their written informed consent to participate in this study.</p></sec>
<sec id="s8">
<title>Author contributions</title>
<p>MS-M and TS contributed to the pipeline implementation and the drafting of the manuscript. MH and AY-P contributed to algorithm training, submission, and the drafting of the manuscript. BB and BS contributed with clinical advice and revision. NA, BS, and OC contributed with implementation advice, work supervision, and manuscript revision. All authors contributed to the article and approved the submitted version.</p>
<sec sec-type="funding-information" id="s9">
<title>Funding</title>
<p>The research leading to these results has received funding from the French government under management of Agence Nationale de la Recherche as part of the Investissements d&#x00027;avenir program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute), reference ANR-10-IAIHU-06 (Agence Nationale de la Recherche-10-IA Institut Hospitalo-Universitaire-6), and reference number ANR-19-P3IA-0002 (3IA C&#x000F4;te d&#x00027;Azur) and from ICM under the Big Brain Theory program (project IMAGIN-DEAL in MS-M). This work was supported by the Fondation pour la Recherche M&#x000E9;dicale, Grant No. FDM202006011247 to TS and by the Fondation Sorbonne Universit&#x000E9; to MH.</p></sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p></sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Altay</surname> <given-names>E. E.</given-names></name> <name><surname>Fisher</surname> <given-names>E.</given-names></name> <name><surname>Jones</surname> <given-names>S. E.</given-names></name> <name><surname>Hara-Cleaver</surname> <given-names>C.</given-names></name> <name><surname>Lee</surname> <given-names>J. C.</given-names></name> <name><surname>Rudick</surname> <given-names>R. A.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Reliability of classifying multiple sclerosis disease activity using magnetic resonance imaging in a multiple sclerosis clinic</article-title>. <source>JAMA Neurol</source>. <volume>70</volume>, <fpage>338</fpage>. <pub-id pub-id-type="doi">10.1001/2013.jamaneurol.211</pub-id><pub-id pub-id-type="pmid">23599930</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bian</surname> <given-names>C.</given-names></name> <name><surname>Yang</surname> <given-names>X.</given-names></name> <name><surname>Ma</surname> <given-names>J.</given-names></name> <name><surname>Zheng</surname> <given-names>S.</given-names></name> <name><surname>Liu</surname> <given-names>Y. A.</given-names></name> <name><surname>Nezafat</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>&#x0201C;Pyramid network with online hard example mining for accurate left atrium segmentation,&#x0201D;</article-title> in <source>International Workshop on Statistical Atlases and Computational Models of the Heart</source>, (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>237</fpage>&#x02013;<lpage>245</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-12029-0_26</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bodini</surname> <given-names>B.</given-names></name> <name><surname>Veronese</surname> <given-names>M.</given-names></name> <name><surname>Garc&#x000ED;a-Lorenzo</surname> <given-names>D.</given-names></name> <name><surname>Battaglini</surname> <given-names>M.</given-names></name> <name><surname>Poirion</surname> <given-names>E.</given-names></name> <name><surname>Chardain</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Dynamic Imaging of Individual Remyelination Profiles in Multiple Sclerosis</article-title>. <source>Ann. Neurol</source>. <volume>79</volume>, <fpage>726</fpage>&#x02013;<lpage>738</lpage>. <pub-id pub-id-type="doi">10.1002/ana.24620</pub-id><pub-id pub-id-type="pmid">26891452</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cabezas</surname> <given-names>M.</given-names></name> <name><surname>Corral</surname> <given-names>J. F.</given-names></name> <name><surname>Oliver</surname> <given-names>A.</given-names></name> <name><surname>D&#x000ED;ez</surname> <given-names>Y.</given-names></name> <name><surname>Tintor&#x000E9;</surname> <given-names>M.</given-names></name> <name><surname>Auger</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Improved automatic detection of new t2 lesions in multiple sclerosis using deformation fields</article-title>. <source>Am. J. Neuroradiol</source>. <volume>37</volume>, <fpage>1816</fpage>&#x02013;<lpage>1823</lpage>. <pub-id pub-id-type="doi">10.3174/ajnr.A4829</pub-id><pub-id pub-id-type="pmid">27282863</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>&#x000C7;i&#x000E7;ek</surname> <given-names>&#x000D6;.</given-names></name> <name><surname>Abdulkadir</surname> <given-names>A.</given-names></name> <name><surname>Lienkamp</surname> <given-names>S. S.</given-names></name> <name><surname>Brox</surname> <given-names>T.</given-names></name> <name><surname>Ronneberger</surname> <given-names>O.</given-names></name></person-group> (<year>2022</year>). <article-title>3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. Published online June 21, 2016</article-title>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1606.06650">http://arxiv.org/abs/1606.06650</ext-link> (accessed June 28, 2022).</citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Commowick</surname> <given-names>O.</given-names></name> <name><surname>Cervenansky</surname> <given-names>F.</given-names></name> <name><surname>Cotton</surname> <given-names>F.</given-names></name> <name><surname>Dojat</surname> <given-names>M.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;MSSEG-2 challenge proceedings: Multiple sclerosis new lesions segmentation challenge using a data management and processing infrastructure,&#x0201D;</article-title> in <source>MICCAI 2021-24th International Conference on Medical Image Computing and Computer Assisted Intervention</source>, <fpage>1</fpage>&#x02013;<lpage>118</lpage>.</citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Commowick</surname> <given-names>O.</given-names></name> <name><surname>Istace</surname> <given-names>A.</given-names></name> <name><surname>Kain</surname> <given-names>M.</given-names></name> <name><surname>Laurent</surname> <given-names>B.</given-names></name> <name><surname>Leray</surname> <given-names>F.</given-names></name> <name><surname>Simon</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Objective evaluation of multiple sclerosis lesion segmentation using a data management and processing infrastructure</article-title>. <source>Sci. Rep</source>. <volume>8</volume>, <fpage>13650</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-018-31911-7</pub-id><pub-id pub-id-type="pmid">30209345</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Filippi</surname> <given-names>M.</given-names></name> <name><surname>Br&#x000FC;ck</surname> <given-names>W.</given-names></name> <name><surname>Chard</surname> <given-names>D.</given-names></name> <name><surname>Fazekas</surname> <given-names>F.</given-names></name> <name><surname>Geurts</surname> <given-names>J. J.</given-names></name> <name><surname>Enzinger</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Association between pathological and MRI findings in multiple sclerosis</article-title>. <source>Lancet Neurol.</source> <volume>18</volume>, <fpage>198</fpage>&#x02013;<lpage>210</lpage>. <pub-id pub-id-type="doi">10.1016/S1474-4422(18)30451-4</pub-id><pub-id pub-id-type="pmid">30663609</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gessert</surname> <given-names>N.</given-names></name> <name><surname>Kr&#x000FC;ger</surname> <given-names>J.</given-names></name> <name><surname>Opfer</surname> <given-names>R.</given-names></name> <name><surname>Ostwaldt</surname> <given-names>A. C.</given-names></name> <name><surname>Manogaran</surname> <given-names>P.</given-names></name> <name><surname>Kitzler</surname> <given-names>H. H.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Multiple sclerosis lesion activity segmentation with attention-guided two-path CNNs</article-title>. <source>Comput. Med. Imaging Graph</source>. <volume>84</volume>, <fpage>101772</fpage>. <pub-id pub-id-type="doi">10.1016/j.compmedimag.2020.101772</pub-id><pub-id pub-id-type="pmid">32795845</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Fan</surname> <given-names>H.</given-names></name> <name><surname>Wu</surname> <given-names>Y.</given-names></name> <name><surname>Xie</surname> <given-names>S.</given-names></name> <name><surname>Girshick</surname> <given-names>R.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Momentum contrast for unsupervised visual representation learning,&#x0201D;</article-title> in <source>2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source> (<publisher-loc>IEEE</publisher-loc>), <fpage>9726</fpage>&#x02013;<lpage>9735</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR42600.2020.00975</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hegen</surname> <given-names>H.</given-names></name> <name><surname>Bsteh</surname> <given-names>G.</given-names></name> <name><surname>Berger</surname> <given-names>T.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x02018;No evidence of disease activity&#x00027; - is it an appropriate surrogate in multiple sclerosis?</article-title> <source>Eur. J. Neurol</source>. <volume>25</volume>, <fpage>1107</fpage>&#x02013;<lpage>e101</lpage>. <pub-id pub-id-type="doi">10.1111/ene.13669</pub-id><pub-id pub-id-type="pmid">29687559</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Howard</surname> <given-names>J.</given-names></name> <name><surname>Trevick</surname> <given-names>S.</given-names></name> <name><surname>Younger</surname> <given-names>D. S.</given-names></name></person-group> (<year>2016</year>). <article-title>Epidemiology of multiple sclerosis</article-title>. <source>Neurol. Clin</source>. <volume>34</volume>, <fpage>919</fpage>&#x02013;<lpage>939</lpage>. <pub-id pub-id-type="doi">10.1016/j.ncl.2016.06.016</pub-id><pub-id pub-id-type="pmid">27720001</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jenkinson</surname> <given-names>M.</given-names></name> <name><surname>Smith</surname> <given-names>S.</given-names></name></person-group> (<year>2001</year>). <article-title>A global optimisation method for robust affine registration of brain images</article-title>. <source>Med. Image Anal</source>. <volume>5</volume>, <fpage>143</fpage>&#x02013;<lpage>156</lpage>. <pub-id pub-id-type="doi">10.1016/S1361-8415(01)00036-6</pub-id><pub-id pub-id-type="pmid">11516708</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Kingma</surname> <given-names>D. P.</given-names></name> <name><surname>Ba</surname> <given-names>J.</given-names></name></person-group> (<year>2022</year>). <source>Adam: A Method for Stochastic Optimization. Published online January 29, 2017</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1412.6980">http://arxiv.org/abs/1412.6980</ext-link> (accessed June 28, 2022).</citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McKinley</surname> <given-names>R.</given-names></name> <name><surname>Wepfer</surname> <given-names>R.</given-names></name> <name><surname>Grunder</surname> <given-names>L.</given-names></name> <name><surname>Aschwanden</surname> <given-names>F.</given-names></name> <name><surname>Fischer</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Automatic detection of lesion load change in Multiple Sclerosis using convolutional neural networks with segmentation confidence</article-title>. <source>NeuroImage Clin</source>. <volume>25</volume>, <fpage>102104</fpage>. <pub-id pub-id-type="doi">10.1016/j.nicl.2019.102104</pub-id><pub-id pub-id-type="pmid">31927500</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paszke</surname> <given-names>A.</given-names></name> <name><surname>Gross</surname> <given-names>S.</given-names></name> <name><surname>Chintala</surname> <given-names>S.</given-names></name> <name><surname>Chanan</surname> <given-names>G.</given-names></name> <name><surname>Yang</surname> <given-names>E.</given-names></name> <name><surname>DeVito</surname> <given-names>Z.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>&#x0201C;Automatic differentiation in PyTorch,&#x0201D;</article-title> in <source>NIPS 2017 Workshop Autodiff Submission</source>.</citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>P&#x000E9;rez-Garc&#x000ED;a</surname> <given-names>F.</given-names></name> <name><surname>Sparks</surname> <given-names>R.</given-names></name> <name><surname>Ourselin</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <article-title>TorchIO: A Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning</article-title>. <source>Comput. Methods Progr. Biomed</source>. <volume>208</volume>, <fpage>106236</fpage>. <pub-id pub-id-type="doi">10.1016/j.cmpb.2021.106236</pub-id><pub-id pub-id-type="pmid">34311413</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Salem</surname> <given-names>M.</given-names></name> <name><surname>Valverde</surname> <given-names>S.</given-names></name> <name><surname>Cabezas</surname> <given-names>M.</given-names></name> <name><surname>Pareto</surname> <given-names>D.</given-names></name> <name><surname>Oliver</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>A fully convolutional neural network for new T2-w lesion detection in multiple sclerosis</article-title>. <source>NeuroImage Clin</source>. <volume>25</volume>, <fpage>102149</fpage>. <pub-id pub-id-type="doi">10.1016/j.nicl.2019.102149</pub-id><pub-id pub-id-type="pmid">31918065</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shrivastava</surname> <given-names>A.</given-names></name> <name><surname>Gupta</surname> <given-names>A.</given-names></name> <name><surname>Girshick</surname> <given-names>R.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Training region-based object detectors with online hard example mining,&#x0201D;</article-title> in <source>2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source> (<publisher-loc>IEEE</publisher-loc>), <fpage>761</fpage>&#x02013;<lpage>769</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2016.89</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Valverde</surname> <given-names>S.</given-names></name> <name><surname>Salem</surname> <given-names>M.</given-names></name> <name><surname>Cabezas</surname> <given-names>M.</given-names></name> <name><surname>Pareto</surname> <given-names>D.</given-names></name> <name><surname>Vilanova</surname> <given-names>J. C.</given-names></name> <name><surname>Rami&#x000F3;-Torrent&#x000E0;</surname> <given-names>L.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>One-shot domain adaptation in multiple sclerosis lesion segmentation using convolutional neural networks</article-title>. <source>NeuroImage Clin</source>. <volume>21</volume>, <fpage>101638</fpage>. <pub-id pub-id-type="doi">10.1016/j.nicl.2018.101638</pub-id><pub-id pub-id-type="pmid">30555005</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Ma</surname> <given-names>H.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Salient object detection via fast R-CNN and low-level cues,&#x0201D;</article-title> in <source>2016 IEEE International Conference on Image Processing (ICIP)</source> (<publisher-loc>IEEE</publisher-loc>), <fpage>1042</fpage>&#x02013;<lpage>1046</lpage>. <pub-id pub-id-type="doi">10.1109/ICIP.2016.7532516</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wolny</surname> <given-names>A.</given-names></name> <name><surname>Cerrone</surname> <given-names>L.</given-names></name> <name><surname>Vijayan</surname> <given-names>A.</given-names></name> <name><surname>Tofanelli</surname> <given-names>R.</given-names></name> <name><surname>Barro</surname> <given-names>A. V.</given-names></name> <name><surname>Louveaux</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Accurate and versatile 3D segmentation of plant tissues at cellular resolution</article-title>. <source>eLife</source>. <volume>9</volume>, <fpage>e57613</fpage>. <pub-id pub-id-type="doi">10.7554/eLife.57613</pub-id><pub-id pub-id-type="pmid">32723478</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zeng</surname> <given-names>C.</given-names></name> <name><surname>Gu</surname> <given-names>L.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Zhao</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>Review of deep learning approaches for the segmentation of multiple sclerosis lesions on brain MRI</article-title>. <source>Front. Neuroinform</source>. <volume>14</volume>, <fpage>610967</fpage>. <pub-id pub-id-type="doi">10.3389/fninf.2020.610967</pub-id><pub-id pub-id-type="pmid">33328949</pub-id></citation></ref>
</ref-list>
<app-group>
<app id="A1">
<title>Appendix</title>
<p>We used the MICCAI MSSEG-2 dataset (Commowick et al., <xref ref-type="bibr" rid="B6">2021</xref>), consisting of 100 MS patients, each with two longitudinal FLAIR MRI scans acquired 1 to 3 years apart on 6 Philips scanners (1 Ingenia 1.5T, 2 Ingenia 3T, 1 Achieva dStream 3T, 1 Achieva 1.5T, 1 Achieva 3T), 6 Siemens scanners (1 Aera 1.5T, 1 Skyra 3T, 1 Verio 3T, 1 Prisma 3T, 2 Avanto 1.5T), and 3 General Electric (GE) scanners (1 Optima MR450w 1.5T, 1 SIGNA HDx 3T, 1 SIGNA HDxt 1.5T), with different voxel sizes (from 0.5 to 1.2 mm<sup>3</sup>). The ground truth, consisting of the new lesions visible at the second time point, was delineated manually by 4 neuroradiologists from different centers using ITK-SNAP (<ext-link ext-link-type="uri" xlink:href="http://www.itksnap.org/pmwiki/pmwiki.php">http://www.itksnap.org/pmwiki/pmwiki.php</ext-link>), and a consensus was obtained by majority voting for each voxel. The whole dataset was divided by the MSSEG-2 organizing committee into 40 patients available to the challengers for training and validation, and 60 patients, not available to the challengers, for testing. All MRIs acquired with GE scanners appeared only in the testing dataset.</p>
</app>
</app-group>
</back>
</article>