<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="methods-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Plant Sci.</journal-id>
<journal-title>Frontiers in Plant Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Plant Sci.</abbrev-journal-title>
<issn pub-type="epub">1664-462X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpls.2022.1010981</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Plant Science</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Transfer learning for versatile plant disease recognition with&#xa0;limited data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Xu</surname>
<given-names>Mingle</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1472694"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Yoon</surname>
<given-names>Sook</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<xref ref-type="author-notes" rid="fn001">
<sup>*</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/595546"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Jeong</surname>
<given-names>Yongchae</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2093017"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Park</surname>
<given-names>Dong Sun</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="author-notes" rid="fn001">
<sup>*</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/567101"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Department of Electronics Engineering, Jeonbuk National University</institution>, <addr-line>Jeonbuk</addr-line>, <country>South Korea</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Core Research Institute of Intelligent Robots, Jeonbuk National University</institution>, <addr-line>Jeonbuk</addr-line>, <country>South Korea</country>
</aff>
<aff id="aff3">
<sup>3</sup>
<institution>Department of Computer Engineering, Mokpo National University</institution>, <addr-line>Jeonnam</addr-line>, <country>South Korea</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Peng Chen, Anhui University, China</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Shichao Jin, Nanjing Agricultural University, China; Muhammad Shoaib Farooq, University of Management and Technology, Pakistan</p>
</fn>
<fn fn-type="corresp" id="fn001">
<p>*Correspondence: Sook Yoon, <email xlink:href="mailto:syoon@mokpo.ac.kr">syoon@mokpo.ac.kr</email>; Dong Sun Park, <email xlink:href="mailto:dspark@jbnu.ac.kr">dspark@jbnu.ac.kr</email>
</p>
</fn>
<fn fn-type="other" id="fn002">
<p>This article was submitted to Sustainable and Intelligent Phytoprotection, a section of the journal Frontiers in Plant Science</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>23</day>
<month>11</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>13</volume>
<elocation-id>1010981</elocation-id>
<history>
<date date-type="received">
<day>03</day>
<month>08</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>10</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2022 Xu, Yoon, Jeong and Park</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Xu, Yoon, Jeong and Park</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Deep learning has brought significant improvements in recent years to the recognition of plant diseases from their images. To achieve decent performance, current deep learning models tend to require a large-scale dataset. However, collecting such a dataset is expensive and time-consuming. Hence, limited data is one of the main challenges to achieving the desired recognition accuracy. Although transfer learning is heavily discussed and verified as an effective and efficient method to mitigate this challenge, most proposed methods focus on one or two specific datasets. In this paper, we propose a novel transfer learning strategy to achieve high performance for <italic>versatile plant disease recognition</italic> across multiple plant disease datasets. Our transfer learning strategy differs from the currently popular one in the following respects. First, PlantCLEF2022, a large-scale plant-related dataset with 2,885,052 images and 80,000 classes, is utilized to pre-train a model. Second, we adopt a vision transformer (ViT) model instead of a convolutional neural network. Third, the ViT model undergoes transfer learning twice to save computation. Fourth, the model is first pre-trained on ImageNet with a self-supervised loss function and then on PlantCLEF2022 with a supervised loss function. We apply our method to 12 plant disease datasets, and the experimental results suggest that our method surpasses the popular one by a clear margin across different dataset settings. Specifically, our proposed method achieves a mean testing accuracy of 86.29% over the 12 datasets in a 20-shot case, 12.76% higher than the current state-of-the-art method&#x2019;s accuracy of 73.53%. Furthermore, our method outperforms other methods on one plant growth stage prediction dataset and one weed recognition dataset. To encourage the community and related applications, we have made our codes and pre-trained model public<sup>
<xref ref-type="fn" rid="fn1">
<sup>1</sup>
</xref>
</sup>.</p>
</abstract>
<kwd-group>
<kwd>plant disease recognition</kwd>
<kwd>transfer learning</kwd>
<kwd>vision transformer</kwd>
<kwd>self-supervised learning</kwd>
<kwd>few-shot learning</kwd>
<kwd>PlantCLEF2022</kwd>
</kwd-group>
<counts>
<fig-count count="7"/>
<table-count count="5"/>
<equation-count count="1"/>
<ref-count count="47"/>
<page-count count="14"/>
<word-count count="6762"/>
</counts>
</article-meta>
</front>
<body>
<sec id="s1" sec-type="intro">
<title>1 Introduction</title>
<p>Keeping plants healthy is one of the essential challenges to obtaining an expected, high yield. Traditionally, experts have to visit farms to check whether plants are infected with diseases, but deep learning enables this check to take place automatically from images. Because of the decent performance of deep learning, plant disease recognition has witnessed significant improvement in recent years (<xref ref-type="bibr" rid="B1">Abade et&#xa0;al., 2021</xref>; <xref ref-type="bibr" rid="B21">Liu et&#xa0;al., 2021</xref>; <xref ref-type="bibr" rid="B24">Ngugi et&#xa0;al., 2021</xref>). To obtain comparable recognition performance, a large-scale dataset is required to train a deep learning-based model. However, collecting images of plant diseases is expensive and time-consuming. Besides, few images are normally available at the beginning of a plant disease recognition project, when a sanity check should be executed before devoting more resources. Therefore, the <italic>limited dataset</italic>, a situation where only a few labeled images are accessible for some classes during training, is one of the main issues in the literature (<xref ref-type="bibr" rid="B12">Fan et&#xa0;al., 2022</xref>). To mitigate this issue, many algorithms and strategies have been proposed, such as data augmentation (<xref ref-type="bibr" rid="B23">Mohanty et&#xa0;al., 2016</xref>; <xref ref-type="bibr" rid="B41">Xu et&#xa0;al., 2022b</xref>; <xref ref-type="bibr" rid="B25">Olaniyi et&#xa0;al., 2022</xref>), transfer learning (<xref ref-type="bibr" rid="B23">Mohanty et&#xa0;al., 2016</xref>; <xref ref-type="bibr" rid="B35">Too et&#xa0;al., 2019</xref>; <xref ref-type="bibr" rid="B6">Chen J. et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B39">Xing and Lee, 2022</xref>; <xref ref-type="bibr" rid="B47">Zhao et&#xa0;al., 2022</xref>), few-shot learning (<xref ref-type="bibr" rid="B3">Afifi et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B11">Egusquiza et&#xa0;al., 2022</xref>), and semi-supervised learning (<xref ref-type="bibr" rid="B20">Li and Chao, 2021</xref>).</p>
<p>Although the challenge of a limited dataset is considered in many works, most of them merely focus on one or a few specific datasets, such as the PlantVillage dataset (<xref ref-type="bibr" rid="B23">Mohanty et&#xa0;al., 2016</xref>; <xref ref-type="bibr" rid="B35">Too et&#xa0;al., 2019</xref>; <xref ref-type="bibr" rid="B20">Li and Chao, 2021</xref>), AI Challenger dataset (<xref ref-type="bibr" rid="B47">Zhao et&#xa0;al., 2022</xref>), tomato dataset (<xref ref-type="bibr" rid="B41">Xu et&#xa0;al., 2022b</xref>), wheat and rice datasets (<xref ref-type="bibr" rid="B31">Sethy et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B28">Rahman et&#xa0;al., 2020</xref>), cucumber dataset (<xref ref-type="bibr" rid="B37">Wang et&#xa0;al., 2022</xref>), and apple leaf disease dataset (<xref ref-type="bibr" rid="B12">Fan et&#xa0;al., 2022</xref>). A basic question in this situation is whether a method useful for one dataset is also helpful for other datasets. Further, there is a fundamental desire for a method that is robust across most plant disease recognition applications. At the same time, improving application performance with a limited dataset is also desired. For example, can we obtain a comparable result with only 20 training images for each class (20-shot)? To address these two issues, we propose a novel transfer learning strategy that achieves high performance across different limited datasets and various types of plants and diseases.</p>
<p>By learning a good feature space, transfer learning aims to carry over something beneficial for a target task with a target dataset from a source task with a source dataset (<xref ref-type="bibr" rid="B26">Pan and Yang, 2009</xref>). In plant disease recognition, a deep learning-based model is generally pre-trained on the source dataset and then fine-tuned on the labeled target dataset. As shown in <xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1</bold>
</xref>, it is understood that three key factors essentially lead to positive transfer learning performance: a <italic>desired source dataset</italic>, a <italic>powerful model</italic>, and a suitable <italic>loss function</italic> to pre-train the model (<xref ref-type="bibr" rid="B38">Wu et&#xa0;al., 2018</xref>; <xref ref-type="bibr" rid="B18">Kornblith et&#xa0;al., 2019</xref>; <xref ref-type="bibr" rid="B17">Kolesnikov et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B36">Tripuraneni et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B13">He et&#xa0;al., 2022</xref>). However, these three factors remain underdeveloped in plant disease recognition.</p>
<fig id="f1" position="float">
<label>Figure&#xa0;1</label>
<caption>
<p>Training from scratch <bold>(A)</bold> and transfer learning <bold>(B)</bold>. Three key factors in transfer learning are the source dataset, the model, and the loss function to pre-train the model. These have all been undeveloped in plant disease recognition.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-1010981-g001.tif"/>
</fig>
<p>First, it is beneficial to have a <italic>plant-related</italic> source dataset with a high number of images and classes (<italic>large scale</italic>), as well as <italic>wide image variation</italic>. For example, a plant-related source dataset could be better than the widely used ImageNet (<xref ref-type="bibr" rid="B9">Deng et&#xa0;al., 2009</xref>) for plant disease recognition, as has been verified (<xref ref-type="bibr" rid="B16">Kim et&#xa0;al., 2021</xref>; <xref ref-type="bibr" rid="B47">Zhao et&#xa0;al., 2022</xref>). Hence, finding a suitable source dataset is essential for plant disease recognition. Following this idea, PlantCLEF2022, a plant-related dataset with 2,885,052 images and 80,000 classes, is adopted in this paper.</p>
<p>Second, a model with higher performance on ImageNet or a source dataset may also perform better on the target dataset under a transfer learning strategy (<xref ref-type="bibr" rid="B18">Kornblith et&#xa0;al., 2019</xref>). Convolutional neural networks (CNNs) (<xref ref-type="bibr" rid="B19">Krizhevsky et&#xa0;al., 2012</xref>; <xref ref-type="bibr" rid="B14">He et&#xa0;al., 2016</xref>) long achieved the best accuracy on the ImageNet validation dataset. Simultaneously, the attention mechanism has been leveraged to boost the performance of plant disease recognition (<xref ref-type="bibr" rid="B44">Yang et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B27">Qian et&#xa0;al., 2022</xref>; <xref ref-type="bibr" rid="B47">Zhao et&#xa0;al., 2022</xref>). In recent years, the Vision Transformer (ViT) (<xref ref-type="bibr" rid="B10">Dosovitskiy et&#xa0;al., 2020</xref>), a general model built on the attention mechanism, has become a hot topic in the computer vision community and outperforms CNN-based models. For example, MAE (<xref ref-type="bibr" rid="B13">He et&#xa0;al., 2022</xref>) scores 85.9% accuracy with the ViT-L model, higher than ResNet50 and ResNet152 with scores of 79.26% and 80.62%, respectively. Therefore, for plant disease recognition, ViT-based models with a transfer learning strategy are promising but still underdeveloped (<xref ref-type="bibr" rid="B37">Wang et&#xa0;al., 2022</xref>).</p>
<p>Third, a supervised loss function inevitably pushes the model to learn source task-related features that may not be helpful for the target task (<xref ref-type="bibr" rid="B38">Wu et&#xa0;al., 2018</xref>). In contrast, a self-supervised loss function eases this issue by introducing a pretext task, such as a contrastive loss (<xref ref-type="bibr" rid="B38">Wu et&#xa0;al., 2018</xref>) or a reconstruction loss (<xref ref-type="bibr" rid="B13">He et&#xa0;al., 2022</xref>). Thus, a ViT model pre-trained on the PlantCLEF2022 dataset with a self-supervised loss function is assumed to be better than the currently popular transfer learning strategy, in which a CNN-based model is pre-trained on the ImageNet dataset with a supervised loss function (<xref ref-type="bibr" rid="B23">Mohanty et&#xa0;al., 2016</xref>; <xref ref-type="bibr" rid="B44">Yang et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B2">Abbas et&#xa0;al., 2021</xref>; <xref ref-type="bibr" rid="B12">Fan et&#xa0;al., 2022</xref>; <xref ref-type="bibr" rid="B43">Yadav et&#xa0;al., 2022</xref>).</p>
<p>Besides, this transfer learning strategy becomes problematic when considering available computing devices and the large-scale PlantCLEF2022 dataset. To be more specific, training a ViT model for 800 epochs on PlantCLEF2022, as in MAE (<xref ref-type="bibr" rid="B13">He et&#xa0;al., 2022</xref>), would require more than five months with four RTX 3090 GPUs. To reduce the computing cost, we utilize a dual transfer learning strategy, in which a public ViT model pre-trained on ImageNet with a self-supervised loss function is subsequently trained on the PlantCLEF2022 dataset with a supervised loss function. In this way, we only spent about 15 days training the model on PlantCLEF2022. We emphasize that our dual transfer learning differs from (<xref ref-type="bibr" rid="B4">Azizi et&#xa0;al., 2021</xref>; <xref ref-type="bibr" rid="B47">Zhao et&#xa0;al., 2022</xref>) in the following respects: it aims to reduce the cost of pre-training a model, uses the large-scale PlantCLEF2022 dataset, and employs a ViT-based model.</p>
<p>To summarize, our paper will make the following contributions:</p>
<list list-type="bullet">
<list-item>
<p>We propose a novel transfer learning strategy to achieve versatile plant disease recognition, with the plant-related source dataset PlantCLEF2022, a ViT model, and self-supervised learning to pre-train the model.</p>
</list-item>
<list-item>
<p>We utilize dual transfer learning to save computation costs, considering the large-scale PlantCLEF2022 dataset.</p>
</list-item>
<list-item>
<p>We validate our method on 12 plant disease datasets, and it surpasses the currently widely used strategy by a large margin. Specifically, we score an average testing accuracy of 86.29% in a 20-shot case, 12.76% higher than the widely used strategy.</p>
</list-item>
<list-item>
<p>Our transfer learning strategy also outperforms other methods on one plant growth stage prediction dataset and one weed recognition dataset, which suggests that our strategy contributes beyond plant disease recognition.</p>
</list-item>
</list>
</sec>
<sec id="s2" sec-type="materials|methods">
<label>2</label>
<title>Material and method</title>
<sec id="s2_1">
<label>2.1</label>
<title>Plant disease datasets</title>
<p>To validate the generalization of transfer learning and deep learning, we evaluated our method on fourteen public datasets, thirteen of which are related to plant disease recognition. To be more specific, we used PlantVillage (<xref ref-type="bibr" rid="B15">Hughes et&#xa0;al., 2015</xref>), PlantDocCls (<xref ref-type="bibr" rid="B32">Singh et&#xa0;al., 2020</xref>), Cassava (<xref ref-type="bibr" rid="B29">Ramcharan et&#xa0;al., 2017</xref>), Apple2020 (<xref ref-type="bibr" rid="B34">Thapa et&#xa0;al., 2020</xref>), Apple2021 (<xref ref-type="bibr" rid="B33">Thapa et&#xa0;al., 2021</xref>), Rice1426 (<xref ref-type="bibr" rid="B28">Rahman et&#xa0;al., 2020</xref>), Rice5932 (<xref ref-type="bibr" rid="B31">Sethy et&#xa0;al., 2020</xref>), TaiwanTomato<xref ref-type="fn" rid="fn2">
<sup>2</sup>
</xref>, IVADLTomato and IVADLRose<xref ref-type="fn" rid="fn3">
<sup>3</sup>
</xref>, CitrusLeaf (<xref ref-type="bibr" rid="B30">Rauf et&#xa0;al., 2019</xref>), CGIARWheat<xref ref-type="fn" rid="fn4">
<sup>4</sup>
</xref>, and PDD271* (<xref ref-type="bibr" rid="B21">Liu et&#xa0;al., 2021</xref>). More details of the datasets are shown in <xref ref-type="table" rid="T1">
<bold>Table&#xa0;1</bold>
</xref> while three random images for each class are displayed here<sup>
<xref ref-type="fn" rid="fn5">
<sup>5</sup>
</xref>
</sup>.</p>
<table-wrap id="T1" position="float">
<label>Table&#xa0;1</label>
<caption>
<p>Information of the used plant disease recognition datasets.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="left">Dataset</th>
<th valign="top" align="center">Images</th>
<th valign="top" align="center">Classes</th>
<th valign="top" align="center">Highlights</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">PlantVillage</td>
<td valign="top" align="center">54,305</td>
<td valign="top" align="center">38</td>
<td valign="top" align="left">Covers 14 types of plants. Each image is taken in controlled conditions and includes only one leaf in the center. Some diseases are split into two classes according to their severity, early and late. Each class has more than 273 images. All images have the same height and width, 256*256.</td>
</tr>
<tr>
<td valign="top" align="left">PlantDocCls</td>
<td valign="top" align="center">2,576</td>
<td valign="top" align="center">27</td>
<td valign="top" align="left">Includes 13 plants. The images are collected from the Internet with diverse heights and widths and most of the images are taken in real field conditions. The original training and testing dataset include 2,340 and 236 images, respectively.</td>
</tr>
<tr>
<td valign="top" align="left">Cassava</td>
<td valign="top" align="center">21,397</td>
<td valign="top" align="center">5</td>
<td valign="top" align="left">The images are taken in real field conditions and thus have wide variations, such as background, illumination, and leaf scales. All images have the same height and width, 800*600.</td>
</tr>
<tr>
<td valign="top" align="left">Apple2020</td>
<td valign="top" align="center">3,642</td>
<td valign="top" align="center">4</td>
<td valign="top" align="left">Taken in real field conditions. One leaf may include more than one type of disease and those images are labeled as one class. All images are the same size, 2048*1365.</td>
</tr>
<tr>
<td valign="top" align="left">Apple2021</td>
<td valign="top" align="center">18,632</td>
<td valign="top" align="center">6</td>
<td valign="top" align="left">An updated version of Apple2020 but with 2 more classes. All images are the same size, 4000*2672.</td>
</tr>
<tr>
<td valign="top" align="left">Rice1426</td>
<td valign="top" align="center">1,426</td>
<td valign="top" align="center">9</td>
<td valign="top" align="left">Images are taken in both real field and controlled conditions. The images are related not only to leaves but also to other organs such as stems and grains. Images are in 224*224 resolution.</td>
</tr>
<tr>
<td valign="top" align="left">Rice5932</td>
<td valign="top" align="center">5,932</td>
<td valign="top" align="center">4</td>
<td valign="top" align="left">Only includes rice leaf images with different scales. All images are resized to 300*300.</td>
</tr>
<tr>
<td valign="top" align="left">TaiwanTomato</td>
<td valign="top" align="center">622</td>
<td valign="top" align="center">5</td>
<td valign="top" align="left">One image may include one or multiple leaves taken in either controlled conditions or real field conditions. There are 495 and 127 images in the original training and testing dataset, respectively. All images are resized to 227*227.</td>
</tr>
<tr>
<td valign="top" align="left">IVADLTomato</td>
<td valign="top" align="center">3,021</td>
<td valign="top" align="center">9</td>
<td valign="top" align="left">The original dataset includes more images in an unbalanced way. We limited the number for each class to less than 520. The original images have a large height and width, and we resized the images to 520*520 to save disk space.</td>
</tr>
<tr>
<td valign="top" align="left">IVADLRose</td>
<td valign="top" align="center">3,132</td>
<td valign="top" align="center">6</td>
<td valign="top" align="left">Similar to IVADLTomato, we limited the number for each class and resized the images.</td>
</tr>
<tr>
<td valign="top" align="left">CitrusLeaf</td>
<td valign="top" align="center">609</td>
<td valign="top" align="center">5</td>
<td valign="top" align="left">Images are taken in controlled conditions and resized to 256*256. We only used the leaf parts from the original Citrus dataset.</td>
</tr>
<tr>
<td valign="top" align="left">CGIARWheat</td>
<td valign="top" align="center">876</td>
<td valign="top" align="center">3</td>
<td valign="top" align="left">Includes leaves, stems, and whole plants. Images are taken from different viewpoints with diverse distances and different image sizes.</td>
</tr>
<tr>
<td valign="top" align="left">PDD271*</td>
<td valign="top" align="center">2,710</td>
<td valign="top" align="center">271</td>
<td valign="top" align="left">Covers fruit trees, vegetables, and field crops, with huge image variations. Ten images for each class are available as samples.</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The datasets are considered from several viewpoints. <xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2</bold>
</xref> gives a glance at some images in the datasets. The first consideration is the <italic>number of images and the number of classes</italic>. Generally, the more classes and the fewer images, the more difficult the recognition task. PDD271 covers 271 classes, including fruit trees, vegetables, and field crops, but unfortunately, it is not public. Only ten samples for each class are available, and therefore we adopted it as a few-shot learning task. In contrast, most of the public datasets only involve one type of plant, such as rice (<xref ref-type="bibr" rid="B28">Rahman et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B31">Sethy et&#xa0;al., 2020</xref>) or apple (<xref ref-type="bibr" rid="B34">Thapa et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B33">Thapa et&#xa0;al., 2021</xref>). Besides, the number distribution over classes may cause class-imbalance problems, in which the trained model may achieve higher performance for classes with a dominant number of images in the training stage. Second, <italic>the conditions under which the images were taken</italic> matter, since controlling the conditions reduces variation in the collected images, such as background and illumination. A previous work (<xref ref-type="bibr" rid="B5">Barbedo, 2019</xref>) showed that controlling the conditions or masking out the background can improve recognition performance. Third, the <italic>organs</italic> of plants in the images are also important. The main organs in the datasets are leaves, but fruits, stems, and whole plants are also included. Interestingly, different plants have leaves with heterogeneous shapes, which may result in varied performance with the same model. For example, the leaves of cassava are far different from their counterparts in apple and tomato plants. In particular, some images in PDD271 capture only part of a leaf, not the whole leaf as in PlantVillage. Fourth, the <italic>scale</italic> of the images is also essential to performance. The scale is related to the distance between the camera and the plant when taking pictures. For example, the leaves in PlantVillage and Apple2020 have a similar scale, while the images in Rice1426 are on different scales. Fifth, <italic>image size</italic>, i.e., height and width, may pose challenges for recognition tasks, as disease phenomena may not be clear enough in small images. To summarize, we emphasize that image variations (<xref ref-type="bibr" rid="B40">Xu et&#xa0;al., 2022a</xref>) in a dataset influence the trained models and their corresponding performance, and thus recognizing the image variations is significant for understanding the dataset.</p>
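<p>For the limited-data settings discussed above (e.g., the 20-shot case), a class-balanced training subset must be sampled from each dataset. The following is a minimal sketch of such N-shot sampling; the function name and the toy file names are hypothetical illustrations, not part of our released code.</p>

```python
import random

def make_n_shot_split(images_by_class, n_shot, seed=0):
    """Sample at most n_shot training images per class;
    the remaining images form the held-out pool."""
    rng = random.Random(seed)
    train, held_out = [], []
    for label, paths in images_by_class.items():
        paths = list(paths)
        rng.shuffle(paths)
        train += [(p, label) for p in paths[:n_shot]]
        held_out += [(p, label) for p in paths[n_shot:]]
    return train, held_out

# Toy example: two classes, a 2-shot split.
dataset = {
    "healthy": ["h1.jpg", "h2.jpg", "h3.jpg"],
    "blight": ["b1.jpg", "b2.jpg", "b3.jpg"],
}
train, held_out = make_n_shot_split(dataset, n_shot=2)
```

Fixing the random seed keeps the split reproducible across runs, which matters when comparing methods on the same few-shot subsets.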
<fig id="f2" position="float">
<label>Figure&#xa0;2</label>
<caption>
<p>Image examples from different datasets. We recognize that there are image variations (<xref ref-type="bibr" rid="B40">Xu et&#xa0;al., 2022a</xref>), such as background, the shape of leaves, illumination, and scale.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-1010981-g002.tif"/>
</fig>
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>PlantCLEF2022 dataset</title>
<p>PlantCLEF2022<xref ref-type="fn" rid="fn6">
<sup>6</sup>
</xref> was originally a challenge to identify plant species from their images. Its trusted training dataset, annotated by human experts and comprising 2,885,052 images of 80,000 classes, is leveraged here and referred to as the PlantCLEF2022 dataset in this paper. Each class in the dataset is limited to no more than 100 images and has 36.1 images on average. As shown in <xref ref-type="fig" rid="f3">
<bold>Figure&#xa0;3</bold>
</xref>, the images cover plant habitat (environment or background) and organs such as the leaf, fruit, bark, or stem. Essentially, plants can be recognized based on multiple pieces of visual evidence, instead of only one piece of evidence (<xref ref-type="bibr" rid="B42">Xu et&#xa0;al., 2022c</xref>). Besides, the images belonging to one class embrace huge variations. As displayed in <xref ref-type="fig" rid="f4">
<bold>Figure&#xa0;4</bold>
</xref>, the variations include background, illumination, color, scale, and image size.</p>
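<p>As a quick sanity check, the per-class average quoted above follows directly from the dataset totals:</p>

```python
total_images = 2_885_052
num_classes = 80_000
avg_per_class = total_images / num_classes
print(round(avg_per_class, 1))  # 36.1 images per class on average
```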
<fig id="f3" position="float">
<label>Figure&#xa0;3</label>
<caption>
<p>Different subjects or organs in the PlantCLEF2022 testing dataset.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-1010981-g003.tif"/>
</fig>
<fig id="f4" position="float">
<label>Figure&#xa0;4</label>
<caption>
<p>Images of the <italic>Aralia nudicaulis</italic> L. species from the PlantCLEF2022 dataset. The images from the same plant species are heterogeneous in background, illumination, color, scale, etc.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-1010981-g004.tif"/>
</fig>
<p>
<bold>Why PlantCLEF2022?</bold> We recognize that three characteristics make PlantCLEF2022 beneficial to plant disease recognition with a transfer learning strategy: it is <italic>plant-related, large-scale, and of wide variation</italic>. First, it is accepted that a large-scale related source dataset contributes to the target task. As the PlantCLEF2022 dataset is plant-related and large-scale, even when compared to ImageNet (<xref ref-type="bibr" rid="B9">Deng et&#xa0;al., 2009</xref>), it can benefit plant disease recognition and related tasks, such as growth stage prediction. Second, the PlantCLEF2022 dataset has wide variations as mentioned before, from which a better feature space can be learned when using it to pre-train a model. Arguably, the variations in PlantCLEF2022 are much stronger than in any of the plant disease datasets introduced in Section 2.1. We have noticed that finding this kind of dataset for plant disease recognition tasks has been one of the main interests in recent years. In the beginning, ImageNet made a significant contribution as a source dataset. Recently, the AI Challenger dataset, slightly larger than PlantVillage but with small variations since most of its images are taken in controlled conditions, has been considered as a source dataset (<xref ref-type="bibr" rid="B47">Zhao et&#xa0;al., 2022</xref>). Although it is plant-related, the AI Challenger dataset falls far behind PlantCLEF2022 in its numbers of images and classes and in its image variations.</p>
</sec>
<sec id="s2_3">
<label>2.3</label>
<title>Dual transfer learning</title>
<p>To achieve versatile plant disease recognition with a limited dataset, we believe that, under the transfer learning paradigm, a large-scale related dataset, PlantCLEF2022, and a powerful model are beneficial. Hence, we designed a dual transfer learning strategy, taking the computational load and available devices into consideration. As shown in <xref ref-type="fig" rid="f5">
<bold>Figures&#xa0;5A, C</bold>
</xref>, our transfer learning consists of three steps with transfer learning occurring twice.</p>
<fig id="f5" position="float">
<label>Figure&#xa0;5</label>
<caption>
<p>Transfer learning strategies for plant disease recognition. Our strategy differs from the current popular transfer learning strategy <bold>(A)</bold> in the source dataset, model, and loss function. Furthermore, we adopt dual transfer learning <bold>(C)</bold> to save computation time by utilizing the public pre-trained model, compared to <bold>(B)</bold>.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-1010981-g005.tif"/>
</fig>
<p>In the first step, a vision transformer (ViT) model is pre-trained on ImageNet (<xref ref-type="bibr" rid="B9">Deng et&#xa0;al., 2009</xref>) in a self-supervised manner with a reconstruction loss. We emphasize here that we directly adopted the pre-trained model from the masked autoencoder (MAE) (<xref ref-type="bibr" rid="B13">He et&#xa0;al., 2022</xref>) instead of training the model ourselves. At the same time, we argue that superior pre-trained models are essential for better plant disease recognition, even when the models share the same architecture. The experiments in the following section show that the original pre-trained ViT model (<xref ref-type="bibr" rid="B10">Dosovitskiy et&#xa0;al., 2020</xref>) performs worse than MAE (<xref ref-type="bibr" rid="B13">He et&#xa0;al., 2022</xref>). As shown in <xref ref-type="fig" rid="f6">
<bold>Figure&#xa0;6</bold>
</xref>, MAE is a composite of an encoder and a decoder that are optimized by a reconstruction loss, <italic>&#x2112;</italic>
<sub>
<italic>recon</italic>
</sub>=||<italic>input</italic>&#x2212;<italic>target</italic>||<sub>2</sub>, where <italic>input</italic> is the original image and <italic>target</italic> denotes the reconstructed image. During training, the original image is split into several patches, some of which are randomly blocked. The encoder extracts the necessary information from the unblocked patches and the decoder fills in the blocked ones. As the optimization requires no labels, it falls under self-supervised learning.</p>
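The masking-and-reconstruction objective above can be sketched in a few lines of plain Python. This is an illustrative toy, not the authors' implementation: `patchify`, the 8&#xd7;8 image, the patch size, and the dummy reconstruction are hypothetical, and the real MAE operates on normalized pixel tensors with a ViT encoder and decoder.

```python
import random

def patchify(img, p):
    """Split an H x W image (list of rows) into non-overlapping p x p patches."""
    H, W = len(img), len(img[0])
    patches = []
    for i in range(0, H, p):
        for j in range(0, W, p):
            patches.append([img[r][c] for r in range(i, i + p) for c in range(j, j + p)])
    return patches

def masked_recon_loss(target_patches, recon_patches, masked_idx):
    """Mean squared error over the blocked patches only, mirroring
    L_recon = ||input - target||_2 restricted to the hidden patches."""
    total, n = 0.0, 0
    for i in masked_idx:
        for t, r in zip(target_patches[i], recon_patches[i]):
            total += (t - r) ** 2
            n += 1
    return total / n

random.seed(0)
img = [[float(r * 8 + c) for c in range(8)] for r in range(8)]  # toy 8x8 "image"
patches = patchify(img, 4)                                      # four 4x4 patches
masked = random.sample(range(len(patches)), k=3)                # block 75% of them
recon = [[v + 1.0 for v in p] for p in patches]                 # dummy "decoder" output
loss = masked_recon_loss(patches, recon, masked)
```

Because the loss is averaged only over the hidden patches, the encoder cannot solve the task by copying visible pixels, which is what makes the pretext task informative.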
<fig id="f6" position="float">
<label>Figure&#xa0;6</label>
<caption>
<p>The high-level architecture of MAE [13]. With MAE, an image is split into patches that are then randomly blocked. The unblocked patches are fed to an encoder, followed by a decoder to reconstruct the whole input image. After the unsupervised pre-training, the decoder is discarded and only the encoder is utilized in the downstream task. The input is not blocked and a specific classifier is added after the encoder when fine-tuning the model in a target task.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-1010981-g006.tif"/>
</fig>
<p>In the second step, the decoder of MAE is discarded and the encoder is reused, followed by a linear layer and a softmax operation for classification. The encoder and the added linear layer are fine-tuned on the PlantCLEF2022 dataset, optimized by the cross-entropy loss, <italic>&#x2112;</italic>
<sub>
<italic>ce</italic>
</sub>=&#x2212;<italic>log</italic>(<italic>p</italic>(<italic>y</italic>
<sub>
<italic>j</italic>
</sub>)), where <italic>j</italic> is the ground-truth index and <italic>p</italic>(<italic>y</italic>) is the output of the softmax operation. Unlike in the first step, the input is not blocked. The main characteristic of the second step is the PlantCLEF2022 dataset, which is closely related to the plant disease recognition datasets. We highlight that the second step was designed and trained in our previous paper (<xref ref-type="bibr" rid="B42">Xu et&#xa0;al., 2022c</xref>) for the PlantCLEF2022 challenge and is therefore not repeated in this paper.</p>
<p>In the third step, the linear layer added in the second step is replaced by a new one. The encoder and this new linear layer are then fine-tuned on a specific plant disease recognition dataset, with the cross-entropy loss again used to optimize the whole network. As mentioned before, the first and second steps were executed in other papers, so only the third step is performed in this paper. We term our strategy dual transfer learning since the model is trained on two other datasets and transferred twice.</p>
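The head swapping between steps 2 and 3 and the cross-entropy objective can be sketched with toy dimensions; the helper names, the 8-d feature vector, and the class counts are illustrative assumptions, not the paper's code (the real encoder is the 1024-d ViT-L and the step-2 head has 80,000 outputs).

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, j):
    """The fine-tuning objective: L_ce = -log p(y_j) for ground-truth index j."""
    return -math.log(softmax(logits)[j])

def linear(features, weights):
    """One logit per class: a plain linear head on top of the feature vector."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

random.seed(0)
feat_dim = 8                              # toy stand-in for the ViT-L's 1024-d features
encoder_output = [random.gauss(0, 1) for _ in range(feat_dim)]

# Step 2: a head with one row per PlantCLEF2022 class (toy: 5 classes).
head_plantclef = [[random.gauss(0, 0.1) for _ in range(feat_dim)] for _ in range(5)]
# Step 3: discard it and attach a freshly initialized head for the disease labels (toy: 3 classes).
head_disease = [[random.gauss(0, 0.1) for _ in range(feat_dim)] for _ in range(3)]

loss = cross_entropy(linear(encoder_output, head_disease), j=1)
```

Only the final linear layer changes between the two fine-tuning stages; the encoder weights carry over, which is what makes the second transfer cheap.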
<p>We believe the first step is not mandatory for good performance in versatile plant disease recognition, but it reduces the training time of the whole system. As shown in <xref ref-type="fig" rid="f5">
<bold>Figure&#xa0;5B</bold>
</xref>, one could instead pre-train a model on the PlantCLEF2022 dataset and then fine-tune it on the plant disease dataset. Unfortunately, that setting may require many training epochs on PlantCLEF2022 to perform well, such as the 800 epochs used by MAE (<xref ref-type="bibr" rid="B13">He et&#xa0;al., 2022</xref>). In contrast, we train for only 100 epochs in the second step and hence save time. Besides, training an MAE model in a self-supervised way also optimizes a decoder, which makes each epoch slower. Therefore, our dual transfer learning reduces training time by utilizing the public model from MAE (<xref ref-type="bibr" rid="B13">He et&#xa0;al., 2022</xref>).</p>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Experiment</title>
<sec id="s3_1">
<label>3.1</label>
<title>Experimental settings</title>
<p>
<bold>Dataset.</bold> For each original dataset in <xref ref-type="table" rid="T1">
<bold>Table&#xa0;1</bold>
</xref>, we split it into training, validation, and testing sets. The training set is used to train the models, while the validation set is used only to choose the best-trained model among epochs; the best model is then evaluated on the testing set. If the original dataset provides an annotated testing set, we use it directly; otherwise, the whole dataset is split into training, validation, and testing sets by percentage or by an exact number of images. Specifically, the original testing sets of PlantDocCls and TaiwanTomato are used directly, while new testing sets are made for the other datasets.</p>
<p>For each plant disease dataset, we consider two training cases: generic and few-shot. In the generic case, different percentages of the training set are used, such as 20% and 40%, while in the few-shot case only a few images per class are taken to train the model. In total, we set eight dataset modes, as shown in <xref ref-type="table" rid="T2">
<bold>Table&#xa0;2</bold>
</xref>: four training percentages for the generic case and four few-shot settings. Except for Ratio80, 20% of the data is reserved for each of the validation and testing sets in all experiments. The validation and testing sets are shared between the generic and few-shot cases. Furthermore, the dataset splitting was randomly executed only once, so the images of each dataset mode are fixed for all compared models and strategies. Although the validation and testing percentages are the same for most dataset modes, the images differ because of different random processes.</p>
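The splitting protocol above can be sketched as follows. The function names and the single fixed seed are our illustrative assumptions, but the sketch reproduces the two key properties: one fixed random split shared by all compared models, and exactly k training images per class in the k-shot modes.

```python
import random

def split_indices(n, seed=0, train=0.6, val=0.2):
    """One fixed random split (here the Ratio60 mode): the same seed gives the
    same split for every compared model or transfer learning strategy."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    a, b = int(n * train), int(n * (train + val))
    return idx[:a], idx[a:b], idx[b:]

def few_shot_train(labels, k, seed=0):
    """k-shot mode: keep exactly k training images per class."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    return [i for members in by_class.values() for i in rng.sample(members, k)]

train, val, test = split_indices(100)                     # 60 / 20 / 20 images
shots = few_shot_train([i % 4 for i in range(100)], k=5)  # 4 classes x 5 shots
```

Seeding a dedicated `random.Random` instance, rather than the global generator, keeps the split reproducible even if other code consumes random numbers.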
<table-wrap id="T2" position="float">
<label>Table&#xa0;2</label>
<caption>
<p>The settings of the different dataset modes for original datasets without a labeled testing set.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="left">Dataset case</th>
<th valign="top" align="center">Dataset mode</th>
<th valign="top" align="center">Training</th>
<th valign="top" align="center">Validation</th>
<th valign="top" align="center">Testing</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">
<bold>Generic case</bold>
</td>
<td valign="top" align="center">Ratio20 <break/>Ratio40 <break/>Ratio60<break/>Ratio80</td>
<td valign="top" align="center">20% <break/>40% <break/>60% <break/>80% </td>
<td valign="top" align="center">20% <break/>20% <break/>20% <break/>10% </td>
<td valign="top" align="center">20% <break/>20% <break/>20% <break/>10% </td>
</tr>
<tr>
<td valign="top" align="left">
<bold>Few-shot case</bold>
</td>
<td valign="top" align="center">1-shot <break/>5-shot <break/>10-shot <break/>20-shot </td>
<td valign="top" align="center">1 <break/>5 <break/>10 <break/>20 </td>
<td valign="top" align="center">20% <break/>20% <break/>20% <break/>20% </td>
<td valign="top" align="center">20% <break/>20% <break/>20% <break/>20% </td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The splitting was performed randomly only once, so the images of each dataset mode are fixed for all compared models and transfer learning strategies. Although the validation and testing percentages are the same for most dataset modes, the images differ because of different random processes.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>
<bold>Comparison methods</bold>. To validate our method, we designed comparisons with different strategies and models, chosen along the following axes: with or without transfer learning, CNN-based or ViT-based, supervised or self-supervised, and trained with PlantCLEF2022 or not. At the same time, we did not want to pre-train the models ourselves because of our limited GPUs and the almost 3 million images in PlantCLEF2022. Based on these two considerations, the compared methods are described below, and further candidate methods are listed in <xref ref-type="table" rid="T3">
<bold>Table&#xa0;3</bold>
</xref> with their corresponding characteristics.</p>
<list list-type="bullet">
<list-item>
<p>RN50. A ResNet50 model is trained from scratch with the target datasets shown in <xref ref-type="table" rid="T1">
<bold>Table&#xa0;1</bold>
</xref>.</p>
</list-item>
<list-item>
<p>RN50-IN. A ResNet50 model is pre-trained with the ImageNet (IN) dataset in a supervised way and then fine-tuned in the target datasets.</p>
</list-item>
<list-item>
<p>MoCo-v2. A MoCo-v2 model is pre-trained with the ImageNet dataset in a self-supervised way and then fine-tuned in the target datasets.</p>
</list-item>
<list-item>
<p>ViT. A ViT-large (<xref ref-type="bibr" rid="B10">Dosovitskiy et&#xa0;al., 2020</xref>) model is trained from scratch with the target datasets.</p>
</list-item>
<list-item>
<p>ViT-IN. A ViT-large model is pre-trained with the Imagenet dataset in a supervised way and then fine-tuned in the target datasets.</p>
</list-item>
<list-item>
<p>MAE. A ViT-large model is pre-trained with the ImageNet dataset in a self-supervised way. Specifically, MAE (<xref ref-type="bibr" rid="B13">He et&#xa0;al., 2022</xref>) uses reconstruction loss to learn better performance with a high occlusion.</p>
</list-item>
<list-item>
<p>Our model. We fine-tuned a ViT model from MAE with the PlantCLEF2022 dataset and then fine-tuned it again with the target datasets.</p>
</list-item>
</list>
<table-wrap id="T3" position="float">
<label>Table&#xa0;3</label>
<caption>
<p>The characteristics of the compared methods.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="center">Case</th>
<th valign="top" align="center">Name</th>
<th valign="top" align="center">Model</th>
<th valign="top" align="center">ImageNet</th>
<th valign="top" align="center">PlantCLEF2022</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1<break/>2<break/>3<break/>4<break/>5<break/>6<break/>7<break/>8<break/>9<break/>10</td>
<td valign="top" align="center">RN50<break/>RN50-IN<break/>-<break/>MoCo-v2<break/>-<break/>ViT<break/>ViT-IN<break/>-<break/>MAE<break/>Ours</td>
<td valign="top" align="center">CNN<break/>CNN<break/>CNN<break/>CNN<break/>CNN<break/>ViT<break/>ViT<break/>ViT<break/>ViT<break/>ViT</td>
<td valign="top" align="center">N/A<break/>Supervised<break/>N/A<break/>Self-supervised<break/>Self-supervised<break/>N/A<break/>Supervised<break/>N/A<break/>Self-supervised<break/>Self-supervised</td>
<td valign="top" align="center">N/A<break/>N/A<break/>Self-supervised<break/>N/A<break/>Supervised<break/>N/A<break/>N/A<break/>Self-supervised<break/>N/A<break/>Supervised</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>N/A denotes not applicable or not used. The compared methods were chosen from these viewpoints: no pre-training process (because of our limited GPUs) and showing the impact of the base model (CNN or ViT), supervised versus self-supervised learning, a plant-related dataset (ImageNet versus PlantCLEF2022), and the dual transfer learning strategy. The named methods are compared in this paper, while the remaining cases are encouraged and left for future studies, subject to GPU availability.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>We note that several other strategies are possible. For instance, it would be interesting to pre-train a ViT model with only the PlantCLEF2022 dataset in a self-supervised manner, without ImageNet, shown as Case 8 in <xref ref-type="table" rid="T3">
<bold>Table&#xa0;3</bold>
</xref>. Further, pre-training an RN50 model on the PlantCLEF2022 dataset in a self-supervised manner would help distinguish the impact of convolutional neural networks (CNNs) from that of vision transformers (ViTs), shown as Case 3 in <xref ref-type="table" rid="T3">
<bold>Table&#xa0;3</bold>
</xref>. Similarly, fine-tuning a MoCo-v2 model on the PlantCLEF2022 dataset would also reveal the difference between CNNs and ViTs, shown as Case 5 in <xref ref-type="table" rid="T3">
<bold>Table&#xa0;3</bold>
</xref>, even though we expect lower performance because MoCo-v2 has lower ImageNet accuracy than MAE. However, training these models is too expensive: we estimate that pre-training a ViT-large model as in MAE would cost more than <italic>five</italic> months on our current computation devices, four RTX 3090 GPUs. These possible strategies are therefore left for future studies.</p>
<p>
<bold>Implementation details.</bold> As mentioned in Section 2.3, we used the pre-trained ViT-L model from our previous paper (<xref ref-type="bibr" rid="B42">Xu et&#xa0;al., 2022c</xref>). Hence, this paper focuses only on the last fine-tuning process, i.e., fine-tuning the ViT-L model on the plant disease recognition dataset. The ViT-L model has 24 transformer blocks with a hidden size of 1024, an MLP size of 4096, and 16 heads in each multi-head attention layer, for approximately 307 million trainable parameters in total.</p>
<p>For a fair comparison, all models or transfer learning strategies were executed with the same settings with most of them following the fine-tuning schemes in MAE (<xref ref-type="bibr" rid="B13">He et&#xa0;al., 2022</xref>). In detail, the basic learning rate <italic>lr<sub>b</sub>
</italic> was 0.001, and the actual learning rate <italic>lr<sub>a</sub>
</italic> = <italic>lr<sub>b</sub>
</italic> * <italic>batch</italic>/256, where <italic>batch</italic> was the batch size for the given training dataset mode. The model was warmed up over 5 epochs, with the learning rate increasing linearly from the first epoch to the set learning rate. Furthermore, a weight decay of 0.05 and a layer decay of 0.65 were utilized. Mixup (<xref ref-type="bibr" rid="B46">Zhang et&#xa0;al., 2017</xref>) and CutMix (<xref ref-type="bibr" rid="B45">Yun et&#xa0;al., 2019</xref>) were adopted as data augmentation methods.</p>
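The learning-rate recipe can be written out directly. This is a sketch of the stated rule only, not the authors' training code; in particular, any decay after warmup is omitted here.

```python
def actual_lr(base_lr, batch_size):
    """Linear scaling rule from the paper: lr_a = lr_b * batch / 256."""
    return base_lr * batch_size / 256

def warmup_lr(epoch, target_lr, warmup_epochs=5):
    """Linear warmup over the first 5 epochs, then the target rate
    (a simple sketch; the real schedule may also decay afterwards)."""
    if epoch < warmup_epochs:
        return target_lr * (epoch + 1) / warmup_epochs
    return target_lr

lr = actual_lr(0.001, 128)                        # batch size 128 -> 0.0005
schedule = [warmup_lr(e, lr) for e in range(8)]   # ramps up, then flat
```

With a base rate of 0.001 and batch size 128, the actual rate is 0.0005, reached at the end of the 5-epoch warmup.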
<p>The main change from the MAE experimental settings was the batch size. Considering the number of images in each dataset, in the generic case the batch size was 64 for CGIARWheat, Strawberry2021, CitrusLeaf, and TaiwanTomato, and 128 for the other datasets. In the few-shot case, the number of classes was one constraint, as the batch size should not exceed the number of classes in the 1-shot setting. Specifically, the batch size was 4 for most datasets, except 2 for CGIARWheat, 8 for IVADLTomato, 16 for PlantDocCls, 32 for PlantVillage, and 8 for Rice1426. Besides, the generic case was trained with four GPUs while the few-shot cases used only one. During training, the models were trained for 50 epochs and validated on the validation dataset after every 5 epochs, including the first epoch. The best models were then tested on the testing datasets.</p>
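The model-selection loop above (50 epochs, validation at the first epoch and then every 5th, best checkpoint kept for testing) can be sketched as follows; the exact set of validation epochs is our reading of the text and the function names are hypothetical.

```python
def validation_epochs(total_epochs=50, every=5):
    """Epochs at which validation runs: the first epoch, then every 5th."""
    return [1] + list(range(every, total_epochs + 1, every))

def select_best(val_accuracy_by_epoch):
    """Keep the checkpoint with the highest validation accuracy for final testing."""
    return max(val_accuracy_by_epoch, key=val_accuracy_by_epoch.get)

epochs = validation_epochs()        # validate at epochs 1, 5, 10, ..., 50
best = select_best({1: 0.42, 5: 0.61, 10: 0.66, 15: 0.64})
```

Selecting on validation accuracy and reporting on the held-out testing set keeps the final numbers independent of the checkpoint choice.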
<p>
<bold>Evaluation metric.</bold> <italic>Accuracy</italic>, a common evaluation metric for image classification (<xref ref-type="bibr" rid="B10">Dosovitskiy et&#xa0;al., 2020</xref>; <xref ref-type="bibr" rid="B41">Xu et&#xa0;al., 2022b</xref>; <xref ref-type="bibr" rid="B13">He et&#xa0;al., 2022</xref>), was used to assess the different methods on each dataset. Since we aim to achieve versatile plant disease recognition, the <italic>mean accuracy</italic>, <italic>mAcc</italic>, over all datasets was also computed as follows:</p>
<disp-formula>
<label>(1)</label>
<mml:math display="block" id="M1">
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>c</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi>M</mml:mi>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>M</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>c</mml:mi>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <italic>Acc<sub>i</sub>
</italic> is the testing accuracy on the <italic>i</italic>-th dataset and <italic>M</italic> is the total number of datasets. To assess generality, we report testing accuracy and mean testing accuracy instead of the validation accuracy used in MAE (<xref ref-type="bibr" rid="B13">He et&#xa0;al., 2022</xref>). In general, higher testing accuracy and mean testing accuracy are desired.</p>
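Equation (1) amounts to a plain average over the per-dataset accuracies; the three values below are hypothetical, for illustration only.

```python
def mean_accuracy(per_dataset_acc):
    """Equation (1): mAcc = (1/M) * sum of Acc_i over the M target datasets."""
    return sum(per_dataset_acc) / len(per_dataset_acc)

mAcc = mean_accuracy([80.0, 90.0, 100.0])  # three hypothetical dataset accuracies
```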
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Experimental results</title>
<sec id="s3_2_1">
<label>3.2.1</label>
<title>Main result</title>
<p>As our main objective is to achieve versatile plant disease recognition with a limited dataset, we first compared our method with other strategies. <xref ref-type="table" rid="T4">
<bold>Table&#xa0;4</bold>
</xref> displays the mean testing accuracy of different methods over the 12 plant disease datasets mentioned in <xref ref-type="table" rid="T1">
<bold>Table&#xa0;1</bold>
</xref> and <xref ref-type="fig" rid="f7">
<bold>Figure&#xa0;7</bold>
</xref> illustrates the trends of the mean testing accuracy of the various methods in the few-shot and generic cases, respectively. The testing accuracy, validation loss curves, and per-dataset accuracy can be found in the <xref ref-type="supplementary-material" rid="SM1">
<bold>Supplementary Material</bold>
</xref>. As shown in <xref ref-type="table" rid="T4">
<bold>Table&#xa0;4</bold>
</xref>, the experimental results show that our method surpasses the other methods by a clear margin across all dataset modes. Specifically, our method achieves 86.29 <italic>mAcc</italic> in the 20-shot case, where only 20 images per class are used to train the models, 12.76 higher than the second-best method, RN50-IN. We also observed that the gap between our method and the others narrows as the number of training images increases. For example, the gap between our method and the second-best method, RN50-IN, is 14.02 in Ratio20 but only 2.37 in Ratio80, which suggests that a limited training dataset is one main obstacle for current methods.</p>
<fig id="f7" position="float">
<label>Figure&#xa0;7</label>
<caption>
<p>Curves of average testing accuracy <italic>mAcc</italic> of different methods in various training dataset modes over the 12 plant disease datasets.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-1010981-g007.tif"/>
</fig>
<table-wrap id="T4" position="float">
<label>Table&#xa0;4</label>
<caption>
<p>The mean testing accuracy <italic>mAcc</italic> of different training methods over the 12 datasets for plant disease recognition detailed in <xref ref-type="table" rid="T1">
<bold>Table&#xa0;1</bold>
</xref>.</p>
</caption>
<table frame="hsides">
<tbody>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">1-shot</td>
<td valign="top" align="center">5-shot</td>
<td valign="top" align="center">10-shot</td>
<td valign="top" align="center">20-shot</td>
<td valign="top" align="center">Ratio20</td>
<td valign="top" align="center">Ratio40</td>
<td valign="top" align="center">Ratio60</td>
<td valign="top" align="center">Ratio80</td>
</tr>
<tr>
<td valign="top" align="left">RN50<break/>RN50-IN<break/>MoCo-v2<break/>ViT<break/>ViT-IN<break/>MAE<break/>Ours</td>
<td valign="top" align="center">26.33<break/>23.46<break/>23.28<break/>27.56<break/>23.02<break/>27.81<break/>
<bold>44.28</bold>
</td>
<td valign="top" align="center">27.38<break/>52.03<break/>47.27<break/>36.96<break/>30.87<break/>34.11<break/>
<bold>69.83</bold>
</td>
<td valign="top" align="center">31.75<break/>64.28<break/>60.93<break/>40.01<break/>35.94<break/>44.08<break/>
<bold>80.73</bold>
</td>
<td valign="top" align="center">38.13<break/>73.53<break/>72.38<break/>45.14<break/>40.83<break/>49.26<break/>
<bold>86.29</bold>
</td>
<td valign="top" align="center">53.71<break/>76.77<break/>66.58<break/>51.93<break/>51.64<break/>64.90<break/>
<bold>90.79</bold>
</td>
<td valign="top" align="center">65.19<break/>88.78<break/>81.68<break/>59.40<break/>59.42<break/>83.23<break/>
<bold>92.55</bold>
</td>
<td valign="top" align="center">67.91<break/>89.58<break/>83.84<break/>60.71<break/>62.67<break/>86.65<break/>
<bold>93.23</bold>
</td>
<td valign="top" align="center">71.07<break/>90.97<break/>85.28<break/>64.46<break/>65.53<break/>88.76<break/>
<bold>93.34</bold>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The best average accuracy for each dataset mode is in boldface.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>In terms of the impact of transfer learning, the CNN-based method RN50-IN has the second-best mean testing accuracy, much higher than its counterpart RN50 trained from scratch on the target datasets. However, ViT-IN is inferior with a limited training dataset, and more training images lead to only a minor increase. We postulate that ViT models are harder to train, as suggested in the original ViT paper (<xref ref-type="bibr" rid="B10">Dosovitskiy et&#xa0;al., 2020</xref>). In contrast, CNNs have been steadily developed over the last decade, so their optimization problems have been largely mitigated. A similar phenomenon appears in the loss function used to train the models. For example, MoCo-v2 (<xref ref-type="bibr" rid="B7">Chen X. et&#xa0;al., 2020</xref>) scores 71.1 top-1 accuracy on ImageNet and RN50 (<xref ref-type="bibr" rid="B14">He et&#xa0;al., 2016</xref>) obtains 77.15, whereas MAE (<xref ref-type="bibr" rid="B13">He et&#xa0;al., 2022</xref>) achieves 85.9. The comparison among ViT, ViT-IN, and MAE suggests that the self-supervised loss function contributes to the improvement of the ViT-based model in all training dataset modes.</p>
<p>Our method is based on MAE and is pre-trained one more time on the PlantCLEF2022 dataset. Excitingly, our method obtains 35.72, 36.65, and 37.03 higher accuracy than MAE in the 5-shot, 10-shot, and 20-shot cases, respectively. This jump in mean testing accuracy over MAE shows that PlantCLEF2022 is essentially beneficial for achieving versatile plant disease recognition with a limited dataset. Our method not only achieves the best performance but also converges faster than the other methods; for example, its validation loss dropped to a low value within 5 epochs in the Ratio40 case. Please refer to <xref ref-type="supplementary-material" rid="SM1">
<bold>Figures S1</bold>
</xref> and <xref ref-type="supplementary-material" rid="SM1">
<bold>S2 in the Supplementary Material</bold>
</xref>.</p>
<p>Finally, 10 images per class are available in PDD271* (<xref ref-type="bibr" rid="B21">Liu et&#xa0;al., 2021</xref>), and we used them as a few-shot learning task. Our method achieved a testing accuracy of 81.9 with only 1,355 images in total for training and testing, compared to the original accuracy of 85.4 obtained with 154,701 training and 21,889 testing images (<xref ref-type="bibr" rid="B21">Liu et&#xa0;al., 2021</xref>).</p>
</sec>
<sec id="s3_2_2">
<label>3.2.2</label>
<title>Beyond plant disease</title>
<p>Beyond achieving versatile plant disease recognition, we believe that our transfer learning strategy is also beneficial for other types of plant-related work. We performed two types of experiments over two datasets. The Strawberry2021<xref ref-type="fn" rid="fn7">
<sup>7</sup>
</xref> dataset, designed to predict plant growth stages, such as the young leaves and flowering stages, includes 557 images and 4 classes. The CottonWeedID15 (<xref ref-type="bibr" rid="B8">Chen et&#xa0;al., 2022</xref>) dataset requires the model to distinguish 15 types of weed in a cotton field, with 5,187 images in total.</p>
<p>The mean testing accuracy is displayed in <xref ref-type="table" rid="T5">
<bold>Table&#xa0;5</bold>
</xref> while the details can be found in the <xref ref-type="supplementary-material" rid="SM1">
<bold>Supplementary Material</bold>
</xref>. Notably, our method scores a mean testing accuracy of 97.60 in the 5-shot case, where only 5 images per class are used to train the network. The current popular strategy obtains similar results only in the Ratio40 case, with approximately 121 images per class. These results suggest that our method can also contribute to plant-related applications beyond plant disease recognition with few training samples.</p>
<table-wrap id="T5" position="float">
<label>Table&#xa0;5</label>
<caption>
<p>The mean testing accuracy of different training methods over Strawberry2021 and CottonWeedID15.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="left"/>
<th valign="top" align="center">1-shot</th>
<th valign="top" align="center">5-shot</th>
<th valign="top" align="center">10-shot</th>
<th valign="top" align="center">20-shot</th>
<th valign="top" align="center">Ratio20</th>
<th valign="top" align="center">Ratio40</th>
<th valign="top" align="center">Ratio60</th>
<th valign="top" align="center">Ratio80</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">RN50</td>
<td valign="top" align="left">20.50</td>
<td valign="top" align="left">21.75</td>
<td valign="top" align="left">26.45</td>
<td valign="top" align="left">35.95</td>
<td valign="top" align="left">39.90</td>
<td valign="top" align="left">68.90</td>
<td valign="top" align="left">66.90</td>
<td valign="top" align="left">78.25</td>
</tr>
<tr>
<td valign="top" align="left">RN50-IN</td>
<td valign="top" align="left">45.55</td>
<td valign="top" align="left">75.95</td>
<td valign="top" align="left">87.90</td>
<td valign="top" align="left">87.15</td>
<td valign="top" align="left">60.85</td>
<td valign="top" align="left">98.00</td>
<td valign="top" align="left">98.35</td>
<td valign="top" align="left">98.55</td>
</tr>
<tr>
<td valign="top" align="left">MoCo-v2</td>
<td valign="top" align="left">45.65</td>
<td valign="top" align="left">70.25</td>
<td valign="top" align="left">84.65</td>
<td valign="top" align="left">86.05</td>
<td valign="top" align="left">66.90</td>
<td valign="top" align="left">96.45</td>
<td valign="top" align="left">96.20</td>
<td valign="top" align="left">97.50</td>
</tr>
<tr>
<td valign="top" align="left">ViT</td>
<td valign="top" align="left">32.70</td>
<td valign="top" align="left">39.90</td>
<td valign="top" align="left">44.30</td>
<td valign="top" align="left">51.45</td>
<td valign="top" align="left">56.25</td>
<td valign="top" align="left">65.65</td>
<td valign="top" align="left">75.40</td>
<td valign="top" align="left">80.90</td>
</tr>
<tr>
<td valign="top" align="left">ViT-IN</td>
<td valign="top" align="left">27.20</td>
<td valign="top" align="left">33.35</td>
<td valign="top" align="left">43.10</td>
<td valign="top" align="left">45.25</td>
<td valign="top" align="left">55.05</td>
<td valign="top" align="left">68.30</td>
<td valign="top" align="left">75.50</td>
<td valign="top" align="left">82.35</td>
</tr>
<tr>
<td valign="top" align="left">MAE</td>
<td valign="top" align="left">17.45</td>
<td valign="top" align="left">41.45</td>
<td valign="top" align="left">59.50</td>
<td valign="top" align="left">59.20</td>
<td valign="top" align="left">85.20</td>
<td valign="top" align="left">97.80</td>
<td valign="top" align="left">98.35</td>
<td valign="top" align="left">98.75</td>
</tr>
<tr>
<td valign="top" align="left">Ours</td>
<td valign="top" align="left">
<bold>73.90</bold>
</td>
<td valign="top" align="left">
<bold>97.60</bold>
</td>
<td valign="top" align="left">
<bold>97.55</bold>
</td>
<td valign="top" align="left">
<bold>97.85</bold>
</td>
<td valign="top" align="left">
<bold>99.80</bold>
</td>
<td valign="top" align="left">
<bold>99.35</bold>
</td>
<td valign="top" align="left">
<bold>98.80</bold>
</td>
<td valign="top" align="left">
<bold>99.70</bold>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The best average accuracy for each dataset mode is shown in boldface.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s3_2_3">
<label>3.2.3</label>
<title>Discussion</title>
<p>
<bold>Limited data</bold> is one main challenge to achieving high performance in computer vision (<xref ref-type="bibr" rid="B40">Xu et&#xa0;al., 2022a</xref>) and plant disease recognition (<xref ref-type="bibr" rid="B22">Lu et&#xa0;al., 2022</xref>; <xref ref-type="bibr" rid="B41">Xu et&#xa0;al., 2022b</xref>). Our experimental results suggest that the required amount of training data depends partly on the model and its pre-training. As shown in <xref ref-type="table" rid="T4">
<bold>Table&#xa0;4</bold>
</xref>, the mean testing accuracy of RN50-IN rises from 76.77 in the Ratio20 case to 88.78 in the Ratio40 case, a gain of 12.01, while our method gains only 1.76. From this analysis, we believe that our method mitigates the requirement of a large dataset for plant disease recognition.</p>
<p>Furthermore, we emphasize that more training data tends to yield higher performance, but the gains shrink once a decent performance is reached. For example, with our strategy, 20 percent more data results in an increase of only 0.11 in mean testing accuracy from the Ratio60 to the Ratio80 case. Recognizing this diminishing return is essential for practical applications: sometimes, alternatives other than simply enlarging the training dataset are needed to obtain higher performance.</p>
<p>
<bold>Future work.</bold> First, we emphasize that we do not aim for the best possible performance in this paper; rather, we propose a versatile plant disease recognition method for limited training data. We therefore encourage using our method as a baseline for future work, although it did obtain superior performance in plant disease recognition. For example, is the PlantCLEF2022 dataset beneficial for a CNN-based network? To answer this, one could pre-train an RN50 model on it and then fine-tune on the target dataset. Moreover, it is interesting to analyze why the same model and strategy behave differently on different datasets. For example, our method achieved a testing accuracy of 97.4 in the 20-shot case on the PlantVillage dataset, as shown in <xref ref-type="supplementary-material" rid="SM1">
<bold>Table S1</bold>
</xref> while scoring only 63.8 in the IVADLTomato dataset as shown in <xref ref-type="supplementary-material" rid="SM1">
<bold>Table S9</bold>
</xref>. Furthermore, we validated our method only on plant disease recognition and encourage deploying it for object detection and segmentation (<xref ref-type="bibr" rid="B41">Xu et&#xa0;al., 2022b</xref>). We also highlight combining our transfer learning with unsupervised or self-supervised learning in the future, for instance, using a few labeled images to train a model and then leveraging it to generate pseudo labels for unlabeled images (<xref ref-type="bibr" rid="B20">Li and Chao, 2021</xref>), reducing annotation cost. Our preliminary results on Strawberry2021 and CottonWeedID15 suggest that our transfer learning strategy is promising not just for plant disease but also for plant growth stage recognition and weed identification. We encourage more plant-related applications to adopt our method as a baseline.</p>
</sec>
</sec>
</sec>
<sec id="s4" sec-type="conclusion">
<title>4 Conclusion</title>
<p>We proposed a simple but nontrivial transfer learning strategy to achieve versatile plant disease recognition with limited data. Our method clearly outperforms current strategies, not only on 12 plant disease recognition datasets but also on one plant growth stage prediction dataset and one weed recognition dataset. One main characteristic of our method is the use of PlantCLEF2022, a plant-related dataset of 2,885,052 images and 80,000 classes with huge image variations, which makes our transfer learning beneficial for versatile plant disease recognition tasks. Given the scale of this dataset, our method employs a vision transformer (ViT) because of its higher performance than the widely used convolutional neural networks. To reduce the computation cost, dual transfer learning is leveraged: the ViT model is first pre-trained on ImageNet in a self-supervised manner, using a public checkpoint, and then fine-tuned on PlantCLEF2022 in a supervised manner. We believe our transfer learning strategy contributes to the field and, to fuel the community, our code and the pre-trained model are publicly available.</p>
</sec>
<sec id="s5" sec-type="data-availability">
<title>Data availability statement</title>
<p>Publicly available datasets were analyzed in this study. Their download links can be found here: <uri xlink:href="https://github.com/xml94/MAE_plant_disease">https://github.com/xml94/MAE_plant_disease</uri>.</p>
</sec>
<sec id="s6" sec-type="author-contributions">
<title>Author contributions</title>
<p>MX: conceptualization, methodology, software, writing - original draft, writing - review and editing. SY: supervision and writing - review and editing. YJ: writing - review and editing. DP: supervision, project administration, funding acquisition, writing - review and editing. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="s7" sec-type="funding-information">
<title>Funding</title>
<p>This research was partly supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. 2019R1A6A1A09031717); by an NRF grant funded by the Korean government (MSIT) (NRF-2021R1A2C1012174); and by the Korea Institute of Planning and Evaluation for Technology in Food, Agriculture, and Forestry (IPET) and the Korea Smart Farm R&amp;D Foundation (KosFarm) through the Smart Farm Innovation Technology Development Program, funded by the Ministry of Agriculture, Food and Rural Affairs (MAFRA), the Ministry of Science and ICT (MSIT), and the Rural Development Administration (RDA) (No. 421005-04).</p>
</sec>
<sec id="s8" sec-type="acknowledgment">
<title>Acknowledgments</title>
<p>We appreciate the reviewers&#x2019; valuable suggestions, which made the paper clearer and easier to follow.</p>
</sec>
<sec id="s9" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="s10" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<sec id="s11" sec-type="supplementary-material">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fpls.2022.1010981/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fpls.2022.1010981/full#supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet_1.pdf" id="SM1" mimetype="application/pdf"/>
</sec>
<fn-group>
<fn id="fn1">
<label>1</label>
<p>
<uri xlink:href="https://github.com/xml94/MAE_plant_disease">https://github.com/xml94/MAE_plant_disease</uri>
</p>
</fn>
<fn id="fn2">
<label>2</label>
<p>
<uri xlink:href="https://data.mendeley.com/datasets/ngdgg79rzb/1">https://data.mendeley.com/datasets/ngdgg79rzb/1</uri>
</p>
</fn>
<fn id="fn3">
<label>3</label>
<p>
<uri xlink:href="https://github.com/IVADL/tomato-disease-detector">https://github.com/IVADL/tomato-disease-detector</uri>
</p>
</fn>
<fn id="fn4">
<label>4</label>
<p>
<uri xlink:href="https://www.kaggle.com/datasets/shadabhussain/cgiar-computer-vision-for-crop-disease?resource=download">https://www.kaggle.com/datasets/shadabhussain/cgiar-computer-vision-for-crop-disease?resource=download</uri>
</p>
</fn>
<fn id="fn5">
<label>5</label>
<p>
<uri xlink:href="https://github.com/xml94/MAE_plant_disease/blob/main/visualize_dataset/dataset.md">https://github.com/xml94/MAE_plant_disease/blob/main/visualize_dataset/dataset.md</uri>
</p>
</fn>
<fn id="fn6">
<label>6</label>
<p>
<uri xlink:href="https://www.aicrowd.com/challenges/lifeclef-2022-plant">https://www.aicrowd.com/challenges/lifeclef-2022-plant</uri>
</p>
</fn>
<fn id="fn7">
<label>7</label>
<p>
<uri xlink:href="https://aistudio.baidu.com/aistudio/datasetdetail/98233">https://aistudio.baidu.com/aistudio/datasetdetail/98233</uri>
</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abade</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Ferreira</surname> <given-names>P. A.</given-names>
</name>
<name>
<surname>de Barros Vidal</surname> <given-names>F.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Plant diseases recognition on images using convolutional neural networks: A systematic review</article-title>. <source>Comput. Electron. Agric.</source> <volume>185</volume>, <fpage>106125</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.compag.2021.106125</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abbas</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Jain</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Gour</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Vankudothu</surname> <given-names>S.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Tomato plant disease detection using transfer learning with c-gan synthetic images</article-title>. <source>Comput. Electron. Agric.</source> <volume>187</volume>, <fpage>106279</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.compag.2021.106279</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Afifi</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Alhumam</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Abdelwahab</surname> <given-names>A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Convolutional neural network for automatic identification of plant diseases with limited data</article-title>. <source>Plants</source> <volume>10</volume>, <fpage>28</fpage>. doi: <pub-id pub-id-type="doi">10.3390/plants10010028</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Azizi</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Mustafa</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Ryan</surname> <given-names>F.</given-names>
</name>
<name>
<surname>Beaver</surname> <given-names>Z.</given-names>
</name>
<name>
<surname>Freyberg</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Deaton</surname> <given-names>J.</given-names>
</name>
<etal/>
</person-group>. (<year>2021</year>). &#x201c;<article-title>Big self-supervised models advance medical image classification</article-title>,&#x201d; in <source>Proceedings of the IEEE/CVF international conference on computer vision</source>, <publisher-loc>Montreal</publisher-loc>: <publisher-name>IEEE</publisher-name>. <fpage>3478</fpage>&#x2013;<lpage>3488</lpage>.</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barbedo</surname> <given-names>J. G. A.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Plant disease identification from individual lesions and spots using deep learning</article-title>. <source>Biosyst. Eng.</source> <volume>180</volume>, <fpage>96</fpage>&#x2013;<lpage>107</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.biosystemseng.2019.02.002</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Sun</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Nanehkaran</surname> <given-names>Y. A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Using deep transfer learning for image-based plant disease identification</article-title>. <source>Comput. Electron. Agric.</source> <volume>173</volume>, <fpage>105393</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.compag.2020.105393</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Fan</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Girshick</surname> <given-names>R.</given-names>
</name>
<name>
<surname>He</surname> <given-names>K.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Improved baselines with momentum contrastive learning</article-title>. <source>arXiv. preprint. arXiv:2003.04297</source>.</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Lu</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>Z.</given-names>
</name>
<name>
<surname>Young</surname> <given-names>S.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Performance evaluation of deep transfer learning on multi-class identification of common weed species in cotton production systems</article-title>. <source>Comput. Electron. Agric.</source> <volume>198</volume>, <fpage>107091</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.compag.2022.107091</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Deng</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Dong</surname> <given-names>W.</given-names>
</name>
<name>
<surname>Socher</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>L.-J.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>K.</given-names>
</name>
<name>
<surname>Fei-Fei</surname> <given-names>L.</given-names>
</name>
</person-group> (<year>2009</year>). &#x201c;<article-title>Imagenet: A large-scale hierarchical image database</article-title>,&#x201d; in <source>2009 IEEE conference on computer vision and pattern recognition</source> (<publisher-loc>Miami Beach</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>248</fpage>&#x2013;<lpage>255</lpage>.</citation>
</ref>
<ref id="B10">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Dosovitskiy</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Beyer</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Kolesnikov</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Weissenborn</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Zhai</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Unterthiner</surname> <given-names>T.</given-names>
</name>
<etal/>
</person-group>. (<year>2020</year>). &#x201c;<article-title>An image is worth 16x16 words: Transformers for image recognition at scale</article-title>,&#x201d; in <source>International conference on learning representations</source>.</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Egusquiza</surname> <given-names>I.</given-names>
</name>
<name>
<surname>Picon</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Irusta</surname> <given-names>U.</given-names>
</name>
<name>
<surname>Bereciartua-Perez</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Eggers</surname> <given-names>T.</given-names>
</name>
<name>
<surname>Klukas</surname> <given-names>C.</given-names>
</name>
<etal/>
</person-group>. (<year>2022</year>). <article-title>Analysis of few-shot techniques for fungal plant disease classification and evaluation of clustering capabilities over real datasets</article-title>. <source>Front. Plant Sci.</source> <volume>295</volume>. doi: <pub-id pub-id-type="doi">10.3389/fpls.2022.813237</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fan</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Luo</surname> <given-names>P.</given-names>
</name>
<name>
<surname>Mu</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Zhou</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Tjahjadi</surname> <given-names>T.</given-names>
</name>
<name>
<surname>Ren</surname> <given-names>Y.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Leaf image based plant disease identification using transfer learning and feature fusion</article-title>. <source>Comput. Electron. Agric.</source> <volume>196</volume>, <fpage>106892</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.compag.2022.106892</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>He</surname> <given-names>K.</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Xie</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Doll&#xe1;r</surname> <given-names>P.</given-names>
</name>
<name>
<surname>Girshick</surname> <given-names>R.</given-names>
</name>
</person-group> (<year>2022</year>). &#x201c;<article-title>Masked autoencoders are scalable vision learners</article-title>,&#x201d; in <source>Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</source>, <publisher-loc>New Orleans</publisher-loc>: <publisher-name>IEEE</publisher-name>. <fpage>16000</fpage>&#x2013;<lpage>16009</lpage>.</citation>
</ref>
<ref id="B14">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>He</surname> <given-names>K.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Ren</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Sun</surname> <given-names>J.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Deep residual learning for image recognition</article-title>,&#x201d; in <source>Proceedings of the IEEE conference on computer vision and pattern recognition</source>, <publisher-loc>Caesars Palace</publisher-loc>: <publisher-name>IEEE</publisher-name>. <fpage>770</fpage>&#x2013;<lpage>778</lpage>.</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hughes</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Salath&#xe9;</surname> <given-names>M</given-names>
</name>
</person-group>. (<year>2015</year>). <article-title>An open access repository of images on plant health to enable the development of mobile disease diagnostics</article-title>. <source>arXiv. preprint. arXiv:1511.08060</source>.</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Han</surname> <given-names>Y.-K.</given-names>
</name>
<name>
<surname>Park</surname> <given-names>J.-H.</given-names>
</name>
<name>
<surname>Lee</surname> <given-names>J.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Improved vision-based detection of strawberry diseases using a deep neural network</article-title>. <source>Front. Plant Sci.</source> <volume>11</volume>, <elocation-id>559172</elocation-id>. doi: <pub-id pub-id-type="doi">10.3389/fpls.2020.559172</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kolesnikov</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Beyer</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Zhai</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Puigcerver</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Yung</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Gelly</surname> <given-names>S.</given-names>
</name>
<etal/>
</person-group>. (<year>2020</year>). &#x201c;<article-title>Big transfer (bit): General visual representation learning</article-title>,&#x201d; in <source>European Conference on computer vision</source> (<publisher-name>Springer</publisher-name>), <fpage>491</fpage>&#x2013;<lpage>507</lpage>.</citation>
</ref>
<ref id="B18">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kornblith</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Shlens</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Le</surname> <given-names>Q. V.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>Do better imagenet models transfer better</article-title>?,&#x201d; in <source>Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</source>, <publisher-loc>Long Beach</publisher-loc>: <publisher-name>IEEE</publisher-name>. <fpage>2661</fpage>&#x2013;<lpage>2671</lpage>.</citation>
</ref>
<ref id="B19">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Krizhevsky</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Sutskever</surname> <given-names>I.</given-names>
</name>
<name>
<surname>Hinton</surname> <given-names>G. E.</given-names>
</name>
</person-group> (<year>2012</year>). &#x201c;<article-title>Imagenet classification with deep convolutional neural networks</article-title>,&#x201d; in <source>Advances in neural information processing systems</source>, <publisher-loc>Lake Tahoe</publisher-loc> <volume>vol. 25</volume>. Eds. <person-group person-group-type="editor">
<name>
<surname>Pereira</surname> <given-names>F.</given-names>
</name>
<name>
<surname>Burges</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Bottou</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Weinberger</surname> <given-names>K.</given-names>
</name>
</person-group> (<publisher-name>Curran Associates, Inc</publisher-name>).</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Chao</surname> <given-names>X.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Semi-supervised few-shot learning approach for plant diseases recognition</article-title>. <source>Plant Methods</source> <volume>17</volume>, <fpage>1</fpage>&#x2013;<lpage>10</lpage>. doi: <pub-id pub-id-type="doi">10.1186/s13007-021-00770-1</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Min</surname> <given-names>W.</given-names>
</name>
<name>
<surname>Mei</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Jiang</surname> <given-names>S.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Plant disease recognition: A large-scale benchmark dataset and a visual region and loss reweighting approach</article-title>. <source>IEEE Trans. Image. Process.</source> <volume>30</volume>, <fpage>2003</fpage>&#x2013;<lpage>2015</lpage>. doi: <pub-id pub-id-type="doi">10.1109/TIP.2021.3049334</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lu</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Olaniyi</surname> <given-names>E.</given-names>
</name>
<name>
<surname>Huang</surname> <given-names>Y.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Generative adversarial networks (gans) for image augmentation in agriculture: A systematic review</article-title>. <source>Comput. Electron. Agric.</source> <volume>200</volume>, <fpage>107208</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.compag.2022.107208</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mohanty</surname> <given-names>S. P.</given-names>
</name>
<name>
<surname>Hughes</surname> <given-names>D. P.</given-names>
</name>
<name>
<surname>Salath&#xe9;</surname> <given-names>M.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Using deep learning for image-based plant disease detection</article-title>. <source>Front. Plant Sci.</source> <volume>7</volume>, <elocation-id>1419</elocation-id>. doi: <pub-id pub-id-type="doi">10.3389/fpls.2016.01419</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ngugi</surname> <given-names>L. C.</given-names>
</name>
<name>
<surname>Abelwahab</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Abo-Zahhad</surname> <given-names>M.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Recent advances in image processing techniques for automated leaf pest and disease recognition&#x2013;a review</article-title>. <source>Inf. Process. Agric.</source> <volume>8</volume>, <fpage>27</fpage>&#x2013;<lpage>51</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.inpa.2020.04.004</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Olaniyi</surname> <given-names>E.</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Lu</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Huang</surname> <given-names>Y.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Generative adversarial networks for image augmentation in agriculture: a systematic review</article-title>. <source>arXiv. preprint. arXiv:2204.04707</source>.</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pan</surname> <given-names>S. J.</given-names>
</name>
<name>
<surname>Yang</surname> <given-names>Q.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>A survey on transfer learning</article-title>. <source>IEEE Trans. Knowledge. Data Eng.</source> <volume>22</volume>, <fpage>1345</fpage>&#x2013;<lpage>1359</lpage>. doi: <pub-id pub-id-type="doi">10.1109/TKDE.2009.191</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qian</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>K.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Deep learning-based identification of maize leaf diseases is improved by an attention mechanism: Self-attention</article-title>. <source>Front. Plant Sci.</source> <volume>1154</volume>. doi: <pub-id pub-id-type="doi">10.3389/fpls.2022.864486</pub-id>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rahman</surname> <given-names>C. R.</given-names>
</name>
<name>
<surname>Arko</surname> <given-names>P. S.</given-names>
</name>
<name>
<surname>Ali</surname> <given-names>M. E.</given-names>
</name>
<name>
<surname>Khan</surname> <given-names>M. A. I.</given-names>
</name>
<name>
<surname>Apon</surname> <given-names>S. H.</given-names>
</name>
<name>
<surname>Nowrin</surname> <given-names>F.</given-names>
</name>
<etal/>
</person-group>. (<year>2020</year>). <article-title>Identification and recognition of rice diseases and pests using convolutional neural networks</article-title>. <source>Biosyst. Eng.</source> <volume>194</volume>, <fpage>112</fpage>&#x2013;<lpage>120</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.biosystemseng.2020.03.020</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ramcharan</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Baranowski</surname> <given-names>K.</given-names>
</name>
<name>
<surname>McCloskey</surname> <given-names>P.</given-names>
</name>
<name>
<surname>Ahmed</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Legg</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Hughes</surname> <given-names>D. P.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Deep learning for image-based cassava disease detection</article-title>. <source>Front. Plant Sci.</source> <volume>8</volume>, <elocation-id>1852</elocation-id>. doi: <pub-id pub-id-type="doi">10.3389/fpls.2017.01852</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rauf</surname> <given-names>H. T.</given-names>
</name>
<name>
<surname>Saleem</surname> <given-names>B. A.</given-names>
</name>
<name>
<surname>Lali</surname> <given-names>M. I. U.</given-names>
</name>
<name>
<surname>Khan</surname> <given-names>M. A.</given-names>
</name>
<name>
<surname>Sharif</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Bukhari</surname> <given-names>S. A. C.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A citrus fruits and leaves dataset for detection and classification of citrus diseases through machine learning</article-title>. <source>Data Brief</source> <volume>26</volume>, <fpage>104340</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.dib.2019.104340</pub-id>
</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sethy</surname> <given-names>P. K.</given-names>
</name>
<name>
<surname>Barpanda</surname> <given-names>N. K.</given-names>
</name>
<name>
<surname>Rath</surname> <given-names>A. K.</given-names>
</name>
<name>
<surname>Behera</surname> <given-names>S. K.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Deep feature based rice leaf disease identification using support vector machine</article-title>. <source>Comput. Electron. Agric.</source> <volume>175</volume>, <fpage>105527</fpage>. doi: <pub-id pub-id-type="doi">10.1016/j.compag.2020.105527</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Singh</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Jain</surname> <given-names>N.</given-names>
</name>
<name>
<surname>Jain</surname> <given-names>P.</given-names>
</name>
<name>
<surname>Kayal</surname> <given-names>P.</given-names>
</name>
<name>
<surname>Kumawat</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Batra</surname> <given-names>N.</given-names>
</name>
</person-group> (<year>2020</year>). &#x201c;<article-title>Plantdoc: a dataset for visual plant disease detection</article-title>,&#x201d; in <source>Proceedings of the 7th ACM IKDD CoDS and 25th COMAD</source>, <publisher-loc>Hyderabad</publisher-loc>: <publisher-name>ACM (Association for Computing Machinery)</publisher-name>. <fpage>249</fpage>&#x2013;<lpage>253</lpage>.</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Thapa</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>Q.</given-names>
</name>
<name>
<surname>Snavely</surname> <given-names>N.</given-names>
</name>
<name>
<surname>Belongie</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Khan</surname> <given-names>A.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>The plant pathology 2021 challenge dataset to classify foliar disease of apples</article-title>. doi: <pub-id pub-id-type="doi">10.1002/aps3.11390</pub-id>
</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Thapa</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>K.</given-names>
</name>
<name>
<surname>Snavely</surname> <given-names>N.</given-names>
</name>
<name>
<surname>Belongie</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Khan</surname> <given-names>A.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>The plant pathology challenge 2020 data set to classify foliar disease of apples</article-title>. <source>Appl. Plant Sci.</source> <volume>8</volume>, <elocation-id>e11390</elocation-id>. doi: <pub-id pub-id-type="doi">10.1002/aps3.11390</pub-id>
</citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Too</surname> <given-names>E. C.</given-names>
</name>
<name>
<surname>Yujian</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Njuki</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Yingchun</surname> <given-names>L.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>A comparative study of fine-tuning deep learning models for plant disease identification</article-title>. <source>Comput. Electron. Agric.</source> <volume>161</volume>, <fpage>272</fpage>&#x2013;<lpage>279</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.compag.2018.03.032</pub-id>
</citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tripuraneni</surname> <given-names>N.</given-names>
</name>
<name>
<surname>Jordan</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Jin</surname> <given-names>C.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>On the theory of transfer learning: The importance of task diversity</article-title>. <source>Adv. Neural Inf. Process. Syst.</source> <volume>33</volume>, <fpage>7852</fpage>&#x2013;<lpage>7862</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.5555/3495724.3496382</pub-id>
</citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname> <given-names>F.</given-names>
</name>
<name>
<surname>Rao</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Luo</surname> <given-names>Q.</given-names>
</name>
<name>
<surname>Jin</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Jiang</surname> <given-names>Z.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>W.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Practical cucumber leaf disease recognition using improved Swin Transformer and small sample size</article-title>. <source>Comput. Electron. Agric.</source> <volume>199</volume>, <elocation-id>107163</elocation-id>. doi: <pub-id pub-id-type="doi">10.1016/j.compag.2022.107163</pub-id>
</citation>
</ref>
<ref id="B38">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Wu</surname> <given-names>Z.</given-names>
</name>
<name>
<surname>Xiong</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Yu</surname> <given-names>S. X.</given-names>
</name>
<name>
<surname>Lin</surname> <given-names>D.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Unsupervised feature learning via non-parametric instance discrimination</article-title>,&#x201d; in <source>Proceedings of the IEEE conference on computer vision and pattern recognition</source>, <publisher-loc>Salt Lake City</publisher-loc>: <publisher-name>IEEE</publisher-name>. <fpage>3733</fpage>&#x2013;<lpage>3742</lpage>.</citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xing</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Lee</surname> <given-names>H. J.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Crop pests and diseases recognition using DANet with TLDP</article-title>. <source>Comput. Electron. Agric.</source> <volume>199</volume>, <elocation-id>107144</elocation-id>. doi: <pub-id pub-id-type="doi">10.1016/j.compag.2022.107144</pub-id>
</citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Yoon</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Fuentes</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Park</surname> <given-names>D. S.</given-names>
</name>
</person-group> (<year>2022</year>a). <article-title>A comprehensive survey of image augmentation techniques for deep learning</article-title>. <source>arXiv preprint arXiv:2205.01491</source>.
</citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Yoon</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Fuentes</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Yang</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Park</surname> <given-names>D. S.</given-names>
</name>
</person-group> (<year>2022</year>b). <article-title>Style-consistent image translation: A novel data augmentation paradigm to improve plant disease recognition</article-title>. <source>Front. Plant Sci.</source> <volume>12</volume>, <elocation-id>773142</elocation-id>. doi: <pub-id pub-id-type="doi">10.3389/fpls.2021.773142</pub-id>
</citation>
</ref>
<ref id="B42">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Xu</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Yoon</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Jeong</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Lee</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Park</surname> <given-names>D. S.</given-names>
</name>
</person-group> (<year>2022</year>c). &#x201c;<article-title>Transfer learning with self-supervised vision transformer for large-scale plant identification</article-title>,&#x201d; in <source>International conference of the cross-language evaluation forum for European languages</source> (<publisher-name>Springer</publisher-name>), <fpage>2253</fpage>&#x2013;<lpage>2261</lpage>.</citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yadav</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Thakur</surname> <given-names>U.</given-names>
</name>
<name>
<surname>Saxena</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Pal</surname> <given-names>V.</given-names>
</name>
<name>
<surname>Bhateja</surname> <given-names>V.</given-names>
</name>
<name>
<surname>Lin</surname> <given-names>J. C.-W.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>AFD-Net: Apple foliar disease multi-classification using deep learning on plant pathology dataset</article-title>. <source>Plant Soil</source> <volume>477</volume>, <fpage>1</fpage>&#x2013;<lpage>17</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s11104-022-05407-3</pub-id>
</citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname> <given-names>G.</given-names>
</name>
<name>
<surname>He</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Yang</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Xu</surname> <given-names>B.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Fine-grained image classification for crop disease based on attention mechanism</article-title>. <source>Front. Plant Sci.</source> <volume>11</volume>, <elocation-id>600854</elocation-id>. doi: <pub-id pub-id-type="doi">10.3389/fpls.2020.600854</pub-id>
</citation>
</ref>
<ref id="B45">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Yun</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Han</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Oh</surname> <given-names>S. J.</given-names>
</name>
<name>
<surname>Chun</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Choe</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Yoo</surname> <given-names>Y.</given-names>
</name>
</person-group> (<year>2019</year>). &#x201c;<article-title>CutMix: Regularization strategy to train strong classifiers with localizable features</article-title>,&#x201d; in <source>Proceedings of the IEEE/CVF international conference on computer vision</source>, <publisher-loc>Seoul</publisher-loc>: <publisher-name>IEEE</publisher-name>. <fpage>6023</fpage>&#x2013;<lpage>6032</lpage>.</citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Cisse</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Dauphin</surname> <given-names>Y. N.</given-names>
</name>
<name>
<surname>Lopez-Paz</surname> <given-names>D.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Mixup: Beyond empirical risk minimization</article-title>. <source>arXiv preprint arXiv:1710.09412</source>.</citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>K.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Ma</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>L.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Identification method of vegetable diseases based on transfer learning and attention mechanism</article-title>. <source>Comput. Electron. Agric.</source> <volume>193</volume>, <elocation-id>106703</elocation-id>. doi: <pub-id pub-id-type="doi">10.1016/j.compag.2022.106703</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>