<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Plant Sci.</journal-id>
<journal-title>Frontiers in Plant Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Plant Sci.</abbrev-journal-title>
<issn pub-type="epub">1664-462X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpls.2022.907916</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Plant Science</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Few-shot learning approach with multi-scale feature fusion and attention for plant disease recognition</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Lin</surname> <given-names>Hong</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1724850/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Tse</surname> <given-names>Rita</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Tang</surname> <given-names>Su-Kit</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1852027/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Qiang</surname> <given-names>Zhen-ping</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="corresp" rid="c002"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1671376/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Pau</surname> <given-names>Giovanni</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
<xref ref-type="aff" rid="aff5"><sup>5</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Faculty of Applied Sciences, Macao Polytechnic University</institution>, <addr-line>Macau, Macao SAR</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>Engineering Research Centre of Applied Technology on Machine Translation and Artificial Intelligence of Ministry of Education, Macao Polytechnic University</institution>, <addr-line>Macau, Macao SAR</addr-line>, <country>China</country></aff>
<aff id="aff3"><sup>3</sup><institution>College of Big Data and Intelligent Engineering, Southwest Forestry University</institution>, <addr-line>Kunming</addr-line>, <country>China</country></aff>
<aff id="aff4"><sup>4</sup><institution>Department of Computer Science and Engineering, University of Bologna</institution>, <addr-line>Bologna</addr-line>, <country>Italy</country></aff>
<aff id="aff5"><sup>5</sup><institution>Samueli Computer Science Department, University of California, Los Angeles</institution>, <addr-line>Los Angeles, CA</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Wei Qiu, Nanjing Agricultural University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Yang Li, Shihezi University, China; Fiaz Ahmad, Bahauddin Zakariya University, Pakistan</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Giovanni Pau <email>giovanni.pau&#x00040;unibo.it</email></corresp>
<corresp id="c002">Zhen-ping Qiang <email>qzp&#x00040;swfu.edu.cn</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Sustainable and Intelligent Phytoprotection, a section of the journal Frontiers in Plant Science</p></fn></author-notes>
<pub-date pub-type="epub">
<day>16</day>
<month>09</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>13</volume>
<elocation-id>907916</elocation-id>
<history>
<date date-type="received">
<day>30</day>
<month>03</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>07</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Lin, Tse, Tang, Qiang and Pau.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Lin, Tse, Tang, Qiang and Pau</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license></permissions>
<abstract>
<p>Image-based deep learning methods for plant disease diagnosis are promising but rely on large-scale datasets. Currently, the shortage of data has become an obstacle to leveraging deep learning methods. Few-shot learning can generalize to new categories with the support of only a few samples, which is very helpful for plant disease categories in which only a few samples are available. However, two challenging problems exist in few-shot learning: (1) the features extracted from a few shots are very limited; (2) generalizing to new categories, and especially to another domain, is very difficult. In response to these two issues, we propose a network based on the Meta-Baseline few-shot learning method that combines cascaded multi-scale features and channel attention. The network takes advantage of multi-scale features to enrich the feature representation and uses channel attention as a compensation module to learn efficiently from the most significant channels of the fused features. Meanwhile, we propose a group of training strategies from a data-configuration perspective to match various generalization requirements. Through extensive experiments, we verify that the combination of multi-scale feature fusion and channel attention can alleviate the problem of limited features caused by few shots. To imitate different generalization scenarios, we define different data settings and suggest the optimal training strategies for the intra-domain and cross-domain cases, respectively. The effects of important factors in the few-shot learning paradigm are analyzed. With the optimal configuration, the accuracy of the 1-shot task and the 5-shot task reaches 61.24% and 77.43%, respectively, in the task targeting a single plant, and 82.52% and 92.83% in the task targeting multiple plants. Our results outperform the existing related works, demonstrating that few-shot learning is a feasible and promising solution for plant disease recognition in future applications.</p></abstract>
<kwd-group>
<kwd>few-shot learning</kwd>
<kwd>meta-learning</kwd>
<kwd>multi-scale feature fusion</kwd>
<kwd>attention</kwd>
<kwd>plant disease recognition</kwd>
<kwd>cross-domain</kwd>
<kwd>training strategy</kwd>
<kwd>sub-class classification</kwd>
</kwd-group>
<contract-num rid="cn001">12163004</contract-num>
<contract-num rid="cn002">202101BD070001-053</contract-num>
<contract-num rid="cn003">2022J0496</contract-num>
<contract-sponsor id="cn001">National Natural Science Foundation of China<named-content content-type="fundref-id">10.13039/501100001809</named-content></contract-sponsor>
<contract-sponsor id="cn002">Natural Science Foundation of Yunnan Province<named-content content-type="fundref-id">10.13039/501100005273</named-content></contract-sponsor>
<contract-sponsor id="cn003">Yunnan Provincial Department of Education<named-content content-type="fundref-id">10.13039/501100007846</named-content></contract-sponsor>
<counts>
<fig-count count="7"/>
<table-count count="9"/>
<equation-count count="7"/>
<ref-count count="60"/>
<page-count count="17"/>
<word-count count="10620"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Plant disease has long been a significant concern in agriculture since it reduces crop quality and yield (Campbell and Madden, <xref ref-type="bibr" rid="B6">1990</xref>; Oerke and Dehne, <xref ref-type="bibr" rid="B43">2004</xref>; Strange and Scott, <xref ref-type="bibr" rid="B51">2005</xref>). Image-based automatic diagnosis is accessible and economical for farmers, and it is especially helpful for farmers in remote areas or operating on a small scale. In recent years, deep learning methods have been widely used in image-based recognition (Lin et al., <xref ref-type="bibr" rid="B38">2021</xref>). Many networks have achieved excellent performance when trained with relevant large-scale datasets. As we know, the performance of a deep learning network relies on data. As the network gets deeper, the number of trainable parameters becomes larger and the demand for data increases. Insufficient data can easily lead to overfitting (Simonyan and Zisserman, <xref ref-type="bibr" rid="B48">2014</xref>; Dong et al., <xref ref-type="bibr" rid="B12">2021</xref>). In plant disease recognition, the existing data resources are limited. Meanwhile, creating a large-scale plant disease dataset is difficult because: (1) the number of species and diseases is very large; (2) disease identification and annotation require expert involvement; (3) some diseases are too rare to collect sufficient samples. The long-tailed distribution of data is common in nature and makes it difficult to train a balanced model. In brief, creating a large-scale plant disease dataset is time-consuming and exhausting work (Deng et al., <xref ref-type="bibr" rid="B9">2009</xref>; Singh et al., <xref ref-type="bibr" rid="B49">2020</xref>). A severe shortage of data has become a barrier to taking advantage of deep learning methods.</p>
<p>Generally, there are three ways to alleviate the problems caused by data shortages. Data augmentation, the most common solution, augments instances by image scaling, rotation, affine transformation, etc. Transfer learning delivers prior knowledge from a source domain to a target domain and adapts to the target domain with a small amount of data. However, neither of these two solutions can generalize to new categories at test time, which means that the classes seen in testing must have been learned during training. In addition to these two solutions, meta-learning, an approach that mimics human learning mechanisms, has been proposed in recent years. The objective of this approach is not to learn knowledge, but to learn to learn. Different from conventional classification methods, few-shot learning (FSL) is a kind of meta-learning method that can quickly generalize to unseen categories with the support of only a few samples.</p>
<p>One branch of FSL is the metric-based method (Wang et al., <xref ref-type="bibr" rid="B55">2020</xref>). The principle is that the features of samples belonging to the same category are close to each other, while the features of samples belonging to different categories are far from each other. The earliest representative work is the Siamese Network, which is trained with positive or negative sample pairs (Koch et al., <xref ref-type="bibr" rid="B25">2015</xref>). Vinyals et al. (<xref ref-type="bibr" rid="B53">2016</xref>) proposed Matching Networks, borrowing the concept of &#x0201C;seq2seq&#x0002B;attention&#x0201D; to train an end-to-end nearest-neighbor classifier. Snell et al. (<xref ref-type="bibr" rid="B50">2017</xref>) proposed the Prototypical Network, which learns to match the prototype center of a class in semantic space through a few samples. Sung et al. (<xref ref-type="bibr" rid="B52">2018</xref>) proposed the Relation Network, which concatenates the feature vectors of the support samples and the query samples to discover the relationship between classes. Li et al. (<xref ref-type="bibr" rid="B29">2019</xref>) proposed CoveMNet based on covariance representation and a covariance metric of the consistency of distribution. The network extracts second-order statistical information of each category through an embedded local covariance to measure the consistency of the query samples with the novel classes. Chen et al. (<xref ref-type="bibr" rid="B8">2020</xref>) proposed the Meta-Baseline method, which achieves good performance on several FSL benchmarks. The accuracy reaches 83.74% on the <italic>5-way, 5-shot</italic> task of Tiered-ImageNet, and 90.95% on the <italic>1-way, 5-shot</italic> task of Mini-ImageNet.</p>
<p>Recently, FSL has started to be used in research on plant disease identification. Arg&#x000FC;eso et al. (<xref ref-type="bibr" rid="B3">2020</xref>) used the Siamese Network on the PlantVillage (PV) dataset. Jadon (<xref ref-type="bibr" rid="B24">2020</xref>) proposed SSM-Net, which uses the Siamese framework and combines two features from a Conv backbone and a VGG16. Zhong et al. (<xref ref-type="bibr" rid="B60">2020</xref>) proposed a novel generative model for zero-shot and few-shot recognition of citrus aurantium L. diseases by using conditional adversarial auto-encoders. Afifi et al. (<xref ref-type="bibr" rid="B2">2021</xref>) compared the Triplet network, Baseline, Baseline&#x0002B;&#x0002B;, and DAML on PV and coffee leaf datasets, and their results show that the Baseline has the best performance. Li and Chao (<xref ref-type="bibr" rid="B32">2021b</xref>) proposed a semi-supervised FSL method and tested it with PV. Nuthalapati and Tunga (<xref ref-type="bibr" rid="B42">2021</xref>) introduced the transformer into plant disease recognition. Chen et al. (<xref ref-type="bibr" rid="B7">2021</xref>) used meta-learning on the Mini-plant-disease dataset and PV. Li and Yang (<xref ref-type="bibr" rid="B35">2021</xref>) used the Matching Network and tested cross-domain performance by mixing in pest data. These methods have explored various perspectives and made important progress. Nevertheless, FSL still has two common challenging issues: (1) the limited features extracted from few samples are less representative of a class (Wang et al., <xref ref-type="bibr" rid="B55">2020</xref>); (2) the generalization requirements are demanding and varied. In this work, we tackle these two issues by using multi-scale feature fusion (MSFF) and improving training strategies.</p>
<p>CNNs are widely used in image-based deep learning methods. In a CNN architecture, local features with more detail and small receptive fields are extracted from low-level layers, while global features with rich semantic information and large receptive fields are extracted from high-level layers (Goodfellow et al., <xref ref-type="bibr" rid="B13">2016</xref>). MSFF is a technique that uses multi-scale features extracted from different layers of a CNN (Dogra et al., <xref ref-type="bibr" rid="B10">2017</xref>). In object detection and semantic segmentation, many excellent networks have been proposed using MSFF, such as the Feature Pyramid Network (Lin et al., <xref ref-type="bibr" rid="B39">2017</xref>), U-net (Ronneberger et al., <xref ref-type="bibr" rid="B46">2015</xref>), and the Fully Convolutional Network (Long et al., <xref ref-type="bibr" rid="B41">2015</xref>). MSFF is also used in image restoration, image dehazing, image super-resolution, etc. (Li et al., <xref ref-type="bibr" rid="B28">2018</xref>; Zhang and Patel, <xref ref-type="bibr" rid="B58">2018</xref>; Zhang et al., <xref ref-type="bibr" rid="B59">2018</xref>; Lan et al., <xref ref-type="bibr" rid="B27">2020</xref>). These methods fuse features by dense connection, feature concatenation, or weighted element-wise summation (Dong et al., <xref ref-type="bibr" rid="B11">2020</xref>). In common, the methods mentioned adopt an encoder-decoder framework: the multi-scale features extracted by the encoder are reused in the decoder to enhance feature representation. However, in conventional classification tasks, MSFF is seldom used because the network has no decoder. Generally, only the top semantic features are fed into the classifier, while features at other scales are discarded. In fact, high-level features and low-level features are not in a subordinate relationship. Local features, which include rich fine-grained information, can be an effective compensation to formulate a richer feature representation of a sample (Lim and Kang, <xref ref-type="bibr" rid="B37">2019</xref>). Under data-limited conditions, it is necessary to extract as many features as possible from a limited amount of data. Therefore, in this work, we propose to leverage MSFF to enhance feature representation. Multi-scale features can be fused in different ways; in our work, we use cascaded multi-scale feature fusion (CMSFF).</p>
<p>The number of channels of the feature maps increases after feature fusion, but this does not mean that all channels have the same significance. The contribution of each channel is different: some channels should be emphasized and some should be suppressed. Attention can help to focus on the meaningful channels. The attention mechanism plays an important role in human perception, selectively focusing on salient parts in order to better capture visual structure (Guo et al., <xref ref-type="bibr" rid="B14">2021</xref>). It has been introduced into several areas of machine learning, such as computer vision and natural language processing, and significantly improves performance (Hu, <xref ref-type="bibr" rid="B20">2019</xref>; Hafiz et al., <xref ref-type="bibr" rid="B16">2021</xref>). It not only tells where to focus, but also improves the representation of interests. Recently, some light-weight attention modules have been proposed. Wang et al. (<xref ref-type="bibr" rid="B54">2017</xref>) proposed the Residual Attention Network, which uses an encoder-decoder style attention module. Hu et al. (<xref ref-type="bibr" rid="B21">2018</xref>) introduced a compact module to exploit the inter-channel relationship, named the squeeze-and-excitation module. Woo et al. (<xref ref-type="bibr" rid="B56">2018</xref>) proposed the Convolutional Block Attention Module, which includes channel attention (CA) and spatial attention. These light-weight attention modules can easily be embedded into deep learning networks as plug-ins. In this work, we use CA to weight the accumulated channels obtained from CMSFF. CMSFF and CA form an effective combination to enhance the representation of a category under few-shot conditions.</p>
<p>By the definition of FSL, the model is asked to generalize to novel categories or novel domains. Generalizing to new categories within the same domain as training is defined as intra-domain classification, while generalizing to a novel domain is defined as cross-domain classification. A long-tailed distribution of data is common in plant disease datasets. To identify the categories with few samples, the model can be trained on the diseases that have more samples; this generalization happens within the same domain. Cross-domain generalization happens when a set of categories with few shots must be identified but does not belong to any existing dataset. Cross-domain adaptation happens between different datasets and is more difficult than intra-domain adaptation. However, researchers have found that it is a frequently encountered and unavoidable situation when pushing FSL toward practical application. Guo et al. (<xref ref-type="bibr" rid="B15">2020</xref>) established a new, broader cross-domain few-shot learning benchmark and pointed out that all meta-learning methods underperform relative to simple fine-tuning methods, which indicates the difficulty of the cross-domain issue. Adler et al. (<xref ref-type="bibr" rid="B1">2020</xref>) proposed a method of representation fusion by an ensemble of Hebbian learners acting on different layers of a deep neural network, approaching the problem from a feature-representation perspective. Li W.-H. et al. (<xref ref-type="bibr" rid="B30">2022</xref>) proposed task-specific adapters for the cross-domain problem from the perspective of network architecture. Qi et al. (<xref ref-type="bibr" rid="B44">2022</xref>) proposed a meta-based adversarial training framework for this problem, also from the perspective of network architecture. To our knowledge, no research has approached the problem from a training-strategy perspective. These efforts are general explorations using general benchmarks (e.g., ImageNet, CIFAR) and rarely discuss specific domains. In fact, each domain has its own characteristics and resources to exploit when crossing domains. Hence, in this work, we propose a set of training strategies to match various cases of generalization using the available data resources.</p>
<p>The contributions of this work are summarized as follows: (1) we propose a Meta-Baseline (MB) based FSL approach that merges CMSFF and CA for plant disease recognition; (2) we propose a group of training strategies to meet different generalization requirements; (3) through extensive comparative experiments and ablation experiments, we validate the superiority of our method and analyze various factors of FSL. Compared with the existing related works under the same data conditions, our method achieves the best accuracy.</p></sec>
<sec sec-type="materials and methods" id="s2">
<title>2. Materials and methods</title>
<sec>
<title>2.1. Materials</title>
<p>In this research, three public datasets are used in our experiments. Mini-ImageNet is a subset of ImageNet that includes 100 classes and 600 images per class; we select 64 classes in our experiments. The second is PV (Hughes and Salath&#x000E9;, <xref ref-type="bibr" rid="B23">2015</xref>), released in 2015 by Pennsylvania State University. It is the most frequently used and most comprehensive dataset in academic plant disease recognition research to date. In total, it includes 50,403 images spanning 14 crop species and 38 classes, as shown in <xref ref-type="table" rid="T1">Table 1</xref>. Because the number of samples per class in PV is unbalanced, we use the data after augmentation and select 1,000 images per class to keep the classes balanced. The third is the apple foliar disease (AFD) dataset, which was published in the FGVC8 Plant Pathology 2021 Competition. All images of AFD were taken in the wild with complicated backgrounds, as shown in <xref ref-type="fig" rid="F1">Figure 1A</xref>. We perform pre-processing to reduce the complexity of the surroundings by removing the background other than the leaves. YOLO-v3 (Redmon and Farhadi, <xref ref-type="bibr" rid="B45">2018</xref>) is adopted to detect leaves in the images, as shown in <xref ref-type="fig" rid="F1">Figure 1B</xref>. After segmentation and resizing, images containing a single leaf each are used in this work, as shown in <xref ref-type="fig" rid="F1">Figure 1C</xref>.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>The 14 species and 38 categories in PV.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Species</bold></th>
<th valign="top" align="left"><bold>Class number</bold></th>
<th valign="top" align="left"><bold>Class name</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Apple</td>
<td valign="top" align="left">4</td>
<td valign="top" align="left">Apple scab, black rot, cedar apple rust, healthy</td>
</tr>
<tr>
<td valign="top" align="left">Blueberry</td>
<td valign="top" align="left">1</td>
<td valign="top" align="left">Healthy</td>
</tr>
<tr>
<td valign="top" align="left">Cherry</td>
<td valign="top" align="left">2</td>
<td valign="top" align="left">Healthy, powdery mildew</td>
</tr>
<tr>
<td valign="top" align="left">Corn</td>
<td valign="top" align="left">4</td>
<td valign="top" align="left">Gray leaf spot, common rust, healthy, northern leaf blight</td>
</tr>
<tr>
<td valign="top" align="left">Grape</td>
<td valign="top" align="left">4</td>
<td valign="top" align="left">Black rot, black measles, healthy, leaf blight</td>
</tr>
<tr>
<td valign="top" align="left">Orange</td>
<td valign="top" align="left">1</td>
<td valign="top" align="left">Haunglongbing</td>
</tr>
<tr>
<td valign="top" align="left">Peach</td>
<td valign="top" align="left">2</td>
<td valign="top" align="left">Bacterial spot, healthy</td>
</tr>
<tr>
<td valign="top" align="left">Pepper</td>
<td valign="top" align="left">2</td>
<td valign="top" align="left">Bacterial spot, healthy</td>
</tr>
<tr>
<td valign="top" align="left">Potato</td>
<td valign="top" align="left">3</td>
<td valign="top" align="left">Early blight, healthy, late blight</td>
</tr>
<tr>
<td valign="top" align="left">Raspberry</td>
<td valign="top" align="left">1</td>
<td valign="top" align="left">Healthy</td>
</tr>
<tr>
<td valign="top" align="left">Soybean</td>
<td valign="top" align="left">1</td>
<td valign="top" align="left">Healthy</td>
</tr>
<tr>
<td valign="top" align="left">Squash</td>
<td valign="top" align="left">1</td>
<td valign="top" align="left">Powdery mildew</td>
</tr>
<tr>
<td valign="top" align="left">Strawberry</td>
<td valign="top" align="left">2</td>
<td valign="top" align="left">Healthy, leaf scorch</td>
</tr>
<tr>
<td valign="top" align="left">Tomato</td>
<td valign="top" align="left">10</td>
<td valign="top" align="left">Bacterial spot, early blight, healthy, late blight, leaf mold, septoria leaf spot, spider mites, target spot, mosaic virus, yellow leaf curl virus</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>(A)</bold> The original samples of AFD. <bold>(B)</bold> The leaf detection result by YOLO-v3. <bold>(C)</bold> The samples of 10 classes after segmentation and resizing.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-907916-g0001.tif"/>
</fig>
<p>The hardware configurations are: Graphics: Tesla V100-DGXS-32GB; Video Memory: 32<italic>G</italic>&#x000D7;4; Processor: Intel(R) Xeon(R) CPU E5-2698 v4 &#x00040; 2.20GHz; Operating System: Ubuntu 18.04.6 LTS.</p></sec>
<sec>
<title>2.2. Problem formulation</title>
<p>In the FSL paradigm, given two labeled sets with categories <italic>C</italic><sub><italic>train</italic></sub> and <italic>C</italic><sub><italic>novel</italic></sub>, <italic>C</italic><sub><italic>train</italic></sub> is used in training and <italic>C</italic><sub><italic>novel</italic></sub> is used in test. The two sets are exclusive, <italic>C</italic><sub><italic>train</italic></sub>&#x02229;<italic>C</italic><sub><italic>novel</italic></sub> &#x0003D; &#x02205;, which means that the categories used in test are not seen during training. The data are formulated into tasks, and each task <italic>T</italic> is made up of a support set <italic>S</italic> and a query set <italic>Q</italic>. A sample of <italic>S</italic> is denoted by (<italic>x</italic><sub><italic>s</italic></sub>, <italic>y</italic><sub><italic>s</italic></sub>), which is an (<italic>image, label</italic>) pair, and a sample of <italic>Q</italic> is denoted by (<italic>x</italic><sub><italic>q</italic></sub>, <italic>y</italic><sub><italic>q</italic></sub>). In training, the label <italic>y</italic><sub><italic>q</italic></sub> is used to calculate the loss, i.e., training is supervised.</p>
<p>An <italic>N-way, K-shot</italic> task indicates that <italic>S</italic> contains <italic>N</italic> categories with <italic>K</italic> samples in each category, and <italic>Q</italic> contains the same <italic>N</italic> categories with <italic>W</italic> samples in each category. The goal is to classify the <italic>N</italic>&#x000D7;<italic>W</italic> unlabeled samples of <italic>Q</italic> into the <italic>N</italic> categories. For evaluation, the average accuracy is computed over many tasks sampled from <italic>C</italic><sub><italic>novel</italic></sub>, where the <italic>N</italic> categories are drawn from <italic>C</italic><sub><italic>novel</italic></sub>.</p>
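<p>To make the task format concrete, the following minimal sketch shows how an <italic>N-way, K-shot</italic> task could be sampled. It is an illustration only; the function and variable names (e.g., sample_task, images_by_class) are hypothetical and not part of the published implementation.</p>
<preformat>
# Minimal sketch of N-way, K-shot episode sampling (illustrative names).
import random

def sample_task(images_by_class, n_way=5, k_shot=1, n_query=15):
    # images_by_class: dict mapping a class label to a list of image paths
    classes = random.sample(list(images_by_class.keys()), n_way)
    support, query = [], []
    for label, c in enumerate(classes):
        picked = random.sample(images_by_class[c], k_shot + n_query)
        support += [(path, label) for path in picked[:k_shot]]
        query += [(path, label) for path in picked[k_shot:]]
    return support, query
</preformat></sec>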
<sec>
<title>2.3. Architecture</title>
<sec>
<title>2.3.1. Meta-Baseline framework</title>
<p>Like a classical classification structure, our framework contains two components: an encoder and a classifier, as illustrated in <xref ref-type="fig" rid="F2">Figure 2A</xref>. The encoder, denoted <italic>f</italic><sub>&#x003B8;</sub>, is a CNN-based network combined with CMSFF and CA. It is trained in two stages: base-training and meta-learning.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>(A)</bold> The network architecture of our method. The training includes two stages: base-training stage and meta-learning stage. The CMSFF&#x0002B;CA Encoder is unfolded to CMSFF module and CA module. <bold>(B)</bold> The parallel multi-scale feature fusion and cascaded multi-scale feature fusion.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-907916-g0002.tif"/>
</fig>
<p>In base-training, the network contains <italic>f</italic><sub>&#x003B8;</sub> and the base-training classifier, and it is trained with image-wise data. The goal in this stage is to learn general features as prior knowledge. Large-scale general datasets with many classes and diverse data, such as ImageNet and Mini-ImageNet, are good choices for learning prior knowledge. The classifier can be a linear classifier, a fully connected layer, an SVM, or another classifier. The cross-entropy loss is calculated to update the parameters of <italic>f</italic><sub>&#x003B8;</sub> during backpropagation. After base-training is completed, the classifier is removed and the trained model is passed to the meta-learning stage.</p>
<p>In meta-learning, <italic>f</italic><sub>&#x003B8;</sub> is initialized with the trained model from base-training. Meta-learning embodies the concept of learning to learn, so the purpose is not to learn the knowledge of the training classes, but to learn how to differentiate between classes. To this end, the classifier in meta-learning is replaced by a distance measurement module, and the classification result is decided by the distances from the support samples to the query sample. Meta-learning is a task-driven paradigm in which training data are formulated as <italic>N-way, K-shot</italic> tasks. Based on the simple machine learning principle that test and training conditions must match (Vinyals et al., <xref ref-type="bibr" rid="B53">2016</xref>), the data of <italic>C</italic><sub><italic>novel</italic></sub> are also formatted into tasks at test time.</p>
<p>Given an <italic>N-way, K-shot</italic> task, the <italic>K</italic> samples of a category <italic>c</italic> in <italic>S</italic> are embedded into the feature space by <italic>f</italic><sub>&#x003B8;</sub> and become <italic>K</italic> feature vectors. The mean of the <italic>K</italic> vectors is calculated as the centroid of <italic>c</italic>, which is considered the representative of category <italic>c</italic>:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mrow><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where, <italic>S</italic><sub><italic>c</italic></sub> denotes the samples of class <italic>c</italic> in <italic>S</italic>, |<italic>S</italic><sub><italic>c</italic></sub>| &#x0003D; <italic>K</italic>, <italic>x</italic><sub><italic>s</italic></sub> denotes each sample of class <italic>c</italic>. The query sample <italic>x</italic><sub><italic>q</italic></sub> in an <italic>N-way, K-shot</italic> task is also embedded by <italic>f</italic><sub>&#x003B8;</sub>. The probability that sample <italic>x</italic><sub><italic>q</italic></sub> belongs to class <italic>c</italic> is calculated as:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mi>c</mml:mi><mml:mo stretchy='false'>&#x0007C;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>q</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo>.</mml:mo><mml:mo>&#x0003C;</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mi>&#x003B8;</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>q</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x003C9;</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>&#x0003E;</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mrow><mml:mstyle displaystyle='true'><mml:msub><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:msup><mml:mi>c</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup></mml:mrow></mml:msub><mml:mrow><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003B3;</mml:mi><mml:mo>.</mml:mo><mml:mo>&#x0003C;</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mi>&#x003B8;</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>q</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x003C9;</mml:mi><mml:mrow><mml:msup><mml:mi>c</mml:mi><mml:mo>&#x02032;</mml:mo></mml:msup></mml:mrow></mml:msub><mml:mo>&#x0003E;</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<p>where, &#x0003C; ., .&#x0003E; denotes the distance of two vectors, <italic>c</italic>&#x02032; denotes all the classes in <italic>S</italic>, <inline-formula><mml:math id="M3"><mml:msub><mml:mrow><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub></mml:math></inline-formula> denotes all the centroids of <italic>S</italic>, &#x003B3; is a learnable parameter to scale the distance. In training, we use cross-entropy loss to update the parameters of the network. The algorithm of meta-learning is shown in <xref ref-type="table" rid="T2">Table 2</xref>.</p>
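<p>The following minimal sketch illustrates Equations (1) and (2), i.e., computing class centroids from the support set and classifying a query by a scaled-similarity softmax. It assumes a PyTorch-style encoder and uses cosine similarity; the function and tensor names are illustrative rather than the published implementation.</p>
<preformat>
# Sketch of Equations (1) and (2): class centroids and scaled-similarity softmax.
import torch
import torch.nn.functional as F

def classify_query(f_theta, x_support, x_query, n_way, k_shot, gamma):
    # x_support: (n_way * k_shot, C, H, W), grouped by class; x_query: (Q, C, H, W)
    z_support = f_theta(x_support).view(n_way, k_shot, -1)
    centroids = z_support.mean(dim=1)              # Eq. (1): one centroid per class
    z_query = f_theta(x_query)
    # cosine similarity as the metric, scaled by the learnable parameter gamma
    sims = F.cosine_similarity(z_query.unsqueeze(1), centroids.unsqueeze(0), dim=-1)
    return F.softmax(gamma * sims, dim=-1)         # Eq. (2): class probabilities
</preformat>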
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>The algorithm of meta-learning.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Algorithm of meta-learning</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><italic><bold>Input</bold></italic><bold>:</bold> data_loader,n_way,n_shot,n_query,task_per_batch</td>
</tr>
<tr>
<td valign="top" align="left"><italic><bold>Output</bold></italic><bold>:</bold> avg_acc, avg_loss</td>
</tr>
<tr>
<td valign="top" align="left"><bold>for</bold> i in epoch:</td>
</tr>
<tr>
<td valign="top" align="left">&#x000A0;&#x000A0;<italic>train</italic>:</td>
</tr>
<tr>
<td valign="top" align="left">&#x000A0;&#x000A0;<bold>for</bold> j in batch:</td>
</tr>
<tr>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>task</italic> &#x0003D; <italic>task</italic>(<italic>data</italic>_<italic>loader, n</italic>_<italic>way, n</italic>_<italic>shot, n</italic>_<italic>query, task</italic>_<italic>per</italic>_<italic>batch</italic>)</td>
</tr>
<tr>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>x</italic><sub>0</sub>&#x000B7;&#x000B7;&#x000B7;<italic>x</italic><sub><italic>n</italic></sub> &#x0003D; <italic>f</italic><sub>&#x003B8;</sub>(<italic>task</italic>.<italic>x</italic>_<italic>shot</italic>)</td>
</tr>
<tr>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>x</italic> &#x0003D; <italic>mean</italic>(<italic>x</italic><sub>0</sub>&#x000B7;&#x000B7;&#x000B7;<italic>x</italic><sub><italic>n</italic></sub>)</td>
</tr>
<tr>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>y</italic> &#x0003D; <italic>f</italic><sub>&#x003B8;</sub>(<italic>task</italic>.<italic>x</italic>_<italic>query</italic>)</td>
</tr>
<tr>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>logits</italic> &#x0003D; <italic>classifier</italic>(<italic>distance</italic>(<italic>x, y</italic>))</td>
</tr>
<tr>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>loss</italic> &#x0003D; <italic>cross</italic>_<italic>entropy</italic>(<italic>logits, task</italic>.<italic>label</italic>)</td>
</tr>
<tr>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>acc</italic> &#x0003D; <italic>compute</italic>_<italic>acc</italic>(<italic>logits, task</italic>.<italic>label</italic>)</td>
</tr>
<tr>
<td valign="top" align="left">&#x000A0;&#x000A0;&#x000A0;&#x000A0;<italic>loss</italic>.<italic>backward propagation &#x00026; optimize</italic></td>
</tr>
<tr>
<td valign="top" align="left">&#x000A0;&#x000A0;<bold>end for</bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x000A0;&#x000A0;<italic>validation</italic>:<italic>val</italic></td>
</tr>
<tr>
<td valign="top" align="left">&#x000A0;&#x000A0;<italic>compute</italic>:<italic>avg</italic>_<italic>acc, avg</italic>_<italic>loss</italic></td>
</tr>
<tr>
<td valign="top" align="left"><bold>end for</bold></td>
</tr>
<tr>
<td valign="top" align="left"><italic><bold>return</bold></italic><bold>:</bold> avg_acc, avg_loss</td>
</tr>
</tbody>
</table>
</table-wrap></sec>
<sec>
<title>2.3.2. Distance measurement</title>
<p>After embedding, the 2D color image becomes a high-dimensional vector in semantic space. The distance from the query sample to the class centroid is calculated by a distance metric. A distance metric uses a distance function that quantifies the relationship between elements in the dataset. In many machine learning algorithms, a distance metric is used to capture the input data pattern in order to make data-driven decisions. The most commonly used measures for calculating the distance between two vectors are cosine similarity, dot product, and Euclidean distance.</p>
<p>Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space. It is measured by the cosine of the angle between the two vectors and determines whether they point in roughly the same direction; it is the same as the inner product after normalization (Han et al., <xref ref-type="bibr" rid="B17">2012</xref>). In Euclidean geometry, the dot product of the Cartesian coordinates of two vectors is widely used; it is often called the inner product or projection product of Euclidean space, and the length of the projection represents the distance between the two vectors. In mathematics, the Euclidean distance between two high-dimensional vectors is the square root of the sum of the squared differences in each dimension.</p>
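<p>The three measures can be written compactly as below. This is an illustrative PyTorch-style sketch with hypothetical function names, assuming the two inputs are embedding vectors of the same dimension.</p>
<preformat>
# The three distance/similarity measures discussed above (illustrative sketch).
import torch
import torch.nn.functional as F

def cosine_sim(a, b):
    # cosine of the angle between a and b; equals the dot product after L2 normalization
    return F.cosine_similarity(a, b, dim=-1)

def dot_product(a, b):
    # inner (projection) product of the two vectors
    return (a * b).sum(dim=-1)

def euclidean_dist(a, b):
    # square root of the summed squared per-dimension differences
    return torch.sqrt(((a - b) ** 2).sum(dim=-1))
</preformat></sec>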
<sec>
<title>2.3.3. MSFF</title>
<p>Basically, MSFF structures fall into two categories: parallel multi-scale feature fusion (PMSFF) and cascaded multi-scale feature fusion (CMSFF). The two fusion methods are illustrated in <xref ref-type="fig" rid="F2">Figure 2B</xref>. PMSFF concatenates the features from different layers of the CNN simultaneously; the feature maps of different resolutions are unified before concatenation. Comparatively, CMSFF fuses the feature maps of different resolutions step by step. Taking Resnet12 as the backbone network, four convolutional blocks are linked, and after each block a group of feature maps with twice the channels and half the resolution is generated. In the backward fusion, the smaller feature maps are upsampled by a factor of two and concatenated with the feature maps of the previous block. After a series of upsampling and concatenation operations, all channels are fused together into the full-scale fused feature, denoted <italic>F</italic>. CMSFF is used in this work.</p>
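<p>A minimal sketch of the cascaded fusion described above is given below. It assumes PyTorch-style tensors and that each deeper block halves the spatial resolution and doubles the channels; the function and variable names are illustrative, not the published implementation.</p>
<preformat>
# Sketch of cascaded multi-scale feature fusion (CMSFF) over four backbone blocks.
import torch
import torch.nn.functional as F

def cmsff(f1, f2, f3, f4):
    # f1..f4: feature maps from shallow to deep; each deeper map is assumed to have
    # half the spatial resolution and twice the channels of the previous one.
    fused = f4
    for skip in (f3, f2, f1):
        fused = F.interpolate(fused, scale_factor=2, mode="nearest")  # 2x upsample
        fused = torch.cat([fused, skip], dim=1)                       # concatenate channels
    return fused  # the fused full-scale feature F
</preformat></sec>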
<sec>
<title>2.3.4. CA</title>
<p>CA is used to exploit the inter-channel relationship of features by learning the weights of the channels (Woo et al., <xref ref-type="bibr" rid="B56">2018</xref>). The structure of the CA module is shown in <xref ref-type="fig" rid="F2">Figure 2B</xref>. Each channel of <italic>F</italic> is considered a feature detector. The spatial dimensions of the input feature map are aggregated by pooling operations. In this module, average-pooling and max-pooling are applied simultaneously, generating two spatial context descriptors, <italic>F</italic><sub><italic>avg</italic></sub> and <italic>F</italic><sub><italic>max</italic></sub>, respectively. They are then forwarded to a shared network composed of a multi-layer perceptron (MLP) with one hidden layer. The element-wise sum of the two MLP outputs passes through a sigmoid, producing the channel attention map <inline-formula><mml:math id="M4"><mml:msub><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>.</p>
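<p>A minimal sketch of this channel attention module is given below, following the description above (average- and max-pooled descriptors, a shared one-hidden-layer MLP, and a sigmoid). It assumes a PyTorch-style module; the reduction ratio and all names are illustrative assumptions, not the published implementation.</p>
<preformat>
# Sketch of the channel attention (CA) module applied to the fused feature F.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # shared MLP with one hidden layer
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))              # average-pooled descriptor F_avg
        mx = self.mlp(x.amax(dim=(2, 3)))               # max-pooled descriptor F_max
        m_c = torch.sigmoid(avg + mx).view(b, c, 1, 1)  # channel attention map M_c
        return x * m_c                                  # reweight the channels of F
</preformat></sec></sec></sec>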
<sec sec-type="results" id="s3">
<title>3. Results</title>
<p>We carried out 43 groups of comparison experiments and ablation experiments to illustrate our method, the training strategies, and the effects of various factors. The details of the experiments and results are presented and analyzed below. The bold values listed in the tables indicate the highest result in each group under the same conditions.</p>
<sec>
<title>3.1. Data settings</title>
<p>PV is separated into three parts for training, validation, and test, respectively. According to the requirement of FSL that the testing categories be novel, the classes of the three parts do not intersect, <italic>C</italic><sub><italic>train</italic></sub>&#x02229;<italic>C</italic><sub><italic>val</italic></sub>&#x02229;<italic>C</italic><sub><italic>test</italic></sub> &#x0003D; &#x02205;. In this work, PV is split into three settings, as shown in <xref ref-type="table" rid="T3">Table 3</xref>. PV-Setting-1 uses 22 classes for training, six classes for validation, and the 10 tomato classes for test. The samples, shown in <xref ref-type="fig" rid="F3">Figure 3A</xref>, are very similar to each other. PV-Setting-2 uses 22 classes for training, six classes for validation, and 10 classes belonging to nine different species for test. The samples of this setting are shown in <xref ref-type="fig" rid="F3">Figure 3B</xref>. PV-Setting-3 exchanges the training set and testing set of PV-Setting-2 and keeps the same validation set as PV-Setting-2, using 10 classes for training and 22 classes for test. The samples are shown in <xref ref-type="fig" rid="F3">Figure 3C</xref>. The three settings represent a &#x0201C;sub-class&#x0201D; task, a &#x0201C;train more, test less&#x0201D; task, and a &#x0201C;train less, test more&#x0201D; task, respectively. In addition, 10 classes of AFD with 200 samples per class are used in this work for cross-domain testing purposes. Since all classes belong to the same super-class, apple leaf, it is also a sub-class classification task.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Three data settings of PV used in our experiments.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>ID</bold></th>
<th valign="top" align="left"><bold>Training</bold></th>
<th valign="top" align="left"><bold>Validation</bold></th>
<th valign="top" align="left"><bold>Test</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">PV-Setting-1<break/> (22-6-10)</td>
<td valign="top" align="left">(PV-1-22): apple-3,blueberry-1,cherry-2,corn-3,grape-3,orange-1,peach-2,pepper-1,potato-2,raspberry-1,soybean-1,squash-1,strawberry-1</td>
<td valign="top" align="left">Apple-1,corn-1,grape-1,pepper-1,potato-1,strawberry-1</td>
<td valign="top" align="left">(PV-1-10T): tomato-10</td>
</tr>
<tr>
<td valign="top" align="left">PV-Setting-2<break/> (22-6-10)</td>
<td valign="top" align="left">(PV-2-22): apple-2,blueberry-1,cherry-1,corn-2,grape-2,orange-1,peach-1,pepper-1,potato-1,raspberry-1,soybean-1,squash-1,strawberry-1,tomato-6</td>
<td valign="top" align="left">Apple-1,corn-1,grape-1,potato-1,tomato-2</td>
<td valign="top" align="left">(PV-2-10): apple-1,cherry-1,corn-1,grape-1,peach-1,pepper-1,potato-1,strawberry-1,tomato-2</td>
</tr>
<tr>
<td valign="top" align="left">PV-Setting-3<break/> (10-6-22)</td>
<td valign="top" align="left">(PV-3-10): apple-1,cherry-1,corn-1,grape-1,peach-1,pepper-1,potato-1,strawberry-1,tomato-2</td>
<td valign="top" align="left">Apple-1,corn-1,grape-1,potato-1,tomato-2</td>
<td valign="top" align="left">(PV-3-22): apple-2,blueberry-1,cherry-1,corn-2,grape-2,orange-1,peach-1,pepper-1,potato-1,raspberry-1,soybean-1,squash-1,strawberry-1,tomato-6</td>
</tr>
</tbody>
</table><table-wrap-foot>
<p>The total 38 classes are separated into three parts for training, validation and test, respectively. &#x0201C;Apple-1&#x0201D; means a class of apple species.</p>
</table-wrap-foot>
</table-wrap>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>(A)</bold> The testing classes of PV-Setting-1. <bold>(B)</bold> The testing classes of PV-Setting-2. <bold>(C)</bold> The testing classes of PV-Setting-3.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-907916-g0003.tif"/>
</fig></sec>
<sec>
<title>3.2. Training strategy</title>
<p>The domain of training is denoted the source domain (SD), and the domain of test is denoted the target domain (TD). Data from different domains can be used in the three stages: base-training, meta-learning, and test. A particularity of our method is that there are two training stages, and the datasets used in the two stages can be different; we consider only the domain of the meta-learning stage as the SD. When the SD is the same as the TD, it is intra-domain adaptation; otherwise, it is cross-domain adaptation.</p>
<p>In order to mimic different adaptation situations, we design different data configurations. Five adaptation configurations using Mini-ImageNet, the three PV settings, and AFD are proposed. As shown in <xref ref-type="fig" rid="F4">Figure 4</xref>, <italic>S</italic>1 uses a general dataset (e.g., Mini-ImageNet) in base-training and meta-learning, then uses the target dataset (e.g., PV) in test, which is the adaptation from one domain to another, denoted in Formula 3. <italic>S</italic>2 uses a general dataset in base-training and the target dataset in meta-learning and test, which is denoted in Formula 4. <italic>S</italic>3 uses the target dataset in all three stages, which is denoted in Formula 5. <italic>S</italic>4 uses a general dataset in base-training, a similar-target dataset (e.g., PV) in meta-learning, and the target dataset (e.g., AFD) in test, which is denoted in Formula 6. When AFD is used in test, PV is considered a domain similar to the target domain, because both are associated with leaf diseases of plants. <italic>S</italic>5 uses the similar-target dataset in base-training and meta-learning, and the target-domain dataset in test, which is denoted in Formula 7. <italic>S</italic>1, <italic>S</italic>4, and <italic>S</italic>5 are cross-domain, and <italic>S</italic>2 and <italic>S</italic>3 are intra-domain.</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>S</mml:mi><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:mi>G</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>G</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>T</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E4"><label>(4)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>S</mml:mi><mml:mn>2</mml:mn><mml:mo>:</mml:mo><mml:mi>G</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>T</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>T</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E5"><label>(5)</label><mml:math id="M7"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>S</mml:mi><mml:mn>3</mml:mn><mml:mo>:</mml:mo><mml:mi>T</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>T</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>T</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E6"><label>(6)</label><mml:math id="M8"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>S</mml:mi><mml:mn>4</mml:mn><mml:mo>:</mml:mo><mml:mi>G</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>S</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>T</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E7"><label>(7)</label><mml:math id="M9"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>S</mml:mi><mml:mn>5</mml:mn><mml:mo>:</mml:mo><mml:mi>S</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>S</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>T</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>The data formats used in base-training, meta-learning, and test. The five training strategies.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-907916-g0004.tif"/>
</fig>
<p>where, <italic>G</italic> denotes the general domain, T denotes the target domain, <italic>S</italic> denotes the similar-target domain.</p>
<p>As shown in <xref ref-type="table" rid="T4">Table 4</xref>, e1, e2, and e3 are conducted with Mini-ImageNet and PV-Setting-1 using <italic>S</italic>1, <italic>S</italic>2, and <italic>S</italic>3. e4, e5, and e6 are conducted with Mini-ImageNet and PV-Setting-2 using <italic>S</italic>1, <italic>S</italic>2, and <italic>S</italic>3. e7, e8, and e9 are conducted with Mini-ImageNet and PV-Setting-3 using <italic>S</italic>1, <italic>S</italic>2, and <italic>S</italic>3. e10, e11, and e12 are conducted with Mini-ImageNet, PV-Setting-2, and AFD using <italic>S</italic>1, <italic>S</italic>4, and <italic>S</italic>5. For the 12 experiments, base-training runs for 100 epochs with a learning rate of 0.1, decayed to 0.01 after 90 epochs. Meta-learning runs for 50 epochs with a learning rate of 0.001. The validation task is 5-<italic>way</italic>, 1-<italic>shot</italic>, 15-<italic>query</italic>. The backbone network is Resnet12. The distance metric is cosine similarity.</p>
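<p>For readability, the shared experimental configuration can be summarized as follows; this is an illustrative summary of the values stated above, and the dictionary layout itself is only a hypothetical convention.</p>
<preformat>
# Shared configuration of experiments e1-e12 (values from the text above).
base_training = dict(epochs=100, lr=0.1, lr_after_decay=0.01, decay_at_epoch=90)
meta_learning = dict(epochs=50, lr=0.001)
validation_task = dict(n_way=5, n_shot=1, n_query=15)
backbone = "Resnet12"
distance_metric = "cosine similarity"
</preformat>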
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>The group of experiments with different training strategies and different data settings.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>ID</bold></th>
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="left"><bold>TS</bold></th>
<th valign="top" align="left"><bold>Base-training</bold></th>
<th valign="top" align="left"><bold>Meta-learning</bold></th>
<th valign="top" align="left"><bold>Test</bold></th>
<th valign="top" align="center"><bold>1-shot</bold></th>
<th valign="top" align="center"><bold>5-shot</bold></th>
<th valign="top" align="center"><bold>10-shot</bold></th>
<th valign="top" align="center"><bold>20-shot</bold></th>
<th valign="top" align="center"><bold>30-shot</bold></th>
<th valign="top" align="center"><bold>40-shot</bold></th>
<th valign="top" align="center"><bold>50-shot</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><bold>PV-Setting-1</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">e1</td>
<td valign="top" align="left">MB</td>
<td valign="top" align="left">S1</td>
<td valign="top" align="left">Mini</td>
<td valign="top" align="left">Mini</td>
<td valign="top" align="left">PV-1-10T</td>
<td valign="top" align="center">41.08</td>
<td valign="top" align="center">60.59</td>
<td valign="top" align="center">66.27</td>
<td valign="top" align="center">69.87</td>
<td valign="top" align="center">71.26</td>
<td valign="top" align="center">71.86</td>
<td valign="top" align="center">72.30</td>
</tr>
<tr>
<td valign="top" align="left">e2</td>
<td valign="top" align="left">MB</td>
<td valign="top" align="left">S2</td>
<td valign="top" align="left">Mini</td>
<td valign="top" align="left">PV-1-22</td>
<td valign="top" align="left">PV-1-10T</td>
<td valign="top" align="center">56.07</td>
<td valign="top" align="center">72.90</td>
<td valign="top" align="center">76.62</td>
<td valign="top" align="center">78.87</td>
<td valign="top" align="center">79.74</td>
<td valign="top" align="center">79.81</td>
<td valign="top" align="center">80.11</td>
</tr>
<tr>
<td valign="top" align="left">e3</td>
<td valign="top" align="left">MB</td>
<td valign="top" align="left">S3</td>
<td valign="top" align="left">PV-1-22</td>
<td valign="top" align="left">PV-1-22</td>
<td valign="top" align="left">PV-1-10T</td>
<td valign="top" align="center"><bold>57.85</bold></td>
<td valign="top" align="center"><bold>75.04</bold></td>
<td valign="top" align="center"><bold>79.08</bold></td>
<td valign="top" align="center"><bold>81.51</bold></td>
<td valign="top" align="center"><bold>82.47</bold></td>
<td valign="top" align="center"><bold>82.83</bold></td>
<td valign="top" align="center"><bold>83.08</bold></td>
</tr>
<tr>
<td valign="top" align="left"><bold>PV-Setting-2</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">e4</td>
<td valign="top" align="left">MB</td>
<td valign="top" align="left">S1</td>
<td valign="top" align="left">Mini</td>
<td valign="top" align="left">Mini</td>
<td valign="top" align="left">PV-2-10</td>
<td valign="top" align="center">60.23</td>
<td valign="top" align="center">83.08</td>
<td valign="top" align="center">87.02</td>
<td valign="top" align="center">88.97</td>
<td valign="top" align="center">89.61</td>
<td valign="top" align="center">89.76</td>
<td valign="top" align="center">90.12</td>
</tr>
<tr>
<td valign="top" align="left">e5</td>
<td valign="top" align="left">MB</td>
<td valign="top" align="left">S2</td>
<td valign="top" align="left">Mini</td>
<td valign="top" align="left">PV-2-22</td>
<td valign="top" align="left">PV-2-10</td>
<td valign="top" align="center">80.88</td>
<td valign="top" align="center"><bold>91.75</bold></td>
<td valign="top" align="center"><bold>93.44</bold></td>
<td valign="top" align="center"><bold>94.27</bold></td>
<td valign="top" align="center"><bold>94.53</bold></td>
<td valign="top" align="center"><bold>94.70</bold></td>
<td valign="top" align="center"><bold>94.84</bold></td>
</tr>
<tr>
<td valign="top" align="left">e6</td>
<td valign="top" align="left">MB</td>
<td valign="top" align="left">S3</td>
<td valign="top" align="left">PV-2-22</td>
<td valign="top" align="left">PV-2-22</td>
<td valign="top" align="left">PV-2-10</td>
<td valign="top" align="center"><bold>81.05</bold></td>
<td valign="top" align="center">91.47</td>
<td valign="top" align="center">93.14</td>
<td valign="top" align="center">94.00</td>
<td valign="top" align="center">94.29</td>
<td valign="top" align="center">94.41</td>
<td valign="top" align="center">94.53</td>
</tr>
<tr>
<td valign="top" align="left"><bold>PV-Setting-3</bold></td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">e7</td>
<td valign="top" align="left">MB</td>
<td valign="top" align="left">S1</td>
<td valign="top" align="left">Mini</td>
<td valign="top" align="left">Mini</td>
<td valign="top" align="left">PV-3-22</td>
<td valign="top" align="center">65.46</td>
<td valign="top" align="center">85.37</td>
<td valign="top" align="center">88.81</td>
<td valign="top" align="center">90.54</td>
<td valign="top" align="center">91.09</td>
<td valign="top" align="center">91.33</td>
<td valign="top" align="center">91.45</td>
</tr>
<tr>
<td valign="top" align="left">e8</td>
<td valign="top" align="left">MB</td>
<td valign="top" align="left">S2</td>
<td valign="top" align="left">Mini</td>
<td valign="top" align="left">PV-3-10</td>
<td valign="top" align="left">PV-3-22</td>
<td valign="top" align="center"><bold>78.74</bold></td>
<td valign="top" align="center"><bold>88.96</bold></td>
<td valign="top" align="center"><bold>90.58</bold></td>
<td valign="top" align="center"><bold>91.52</bold></td>
<td valign="top" align="center"><bold>91.97</bold></td>
<td valign="top" align="center"><bold>92.05</bold></td>
<td valign="top" align="center"><bold>92.17</bold></td>
</tr>
<tr>
<td valign="top" align="left">e9</td>
<td valign="top" align="left">MB</td>
<td valign="top" align="left">S3</td>
<td valign="top" align="left">PV-3-10</td>
<td valign="top" align="left">PV-3-10</td>
<td valign="top" align="left">PV-3-22</td>
<td valign="top" align="center">74.58</td>
<td valign="top" align="center">84.77</td>
<td valign="top" align="center">86.82</td>
<td valign="top" align="center">87.82</td>
<td valign="top" align="center">88.29</td>
<td valign="top" align="center">88.43</td>
<td valign="top" align="center">88.57</td>
</tr>
<tr>
<td valign="top" align="left" colspan="4"><italic><bold>AFD</bold></italic></td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">e10</td>
<td valign="top" align="left">MB</td>
<td valign="top" align="left">S1</td>
<td valign="top" align="left">Mini</td>
<td valign="top" align="left">Minit</td>
<td valign="top" align="left">AFD-10</td>
<td valign="top" align="center">28.26</td>
<td valign="top" align="center">39.12</td>
<td valign="top" align="center">44.20</td>
<td valign="top" align="center">47.83</td>
<td valign="top" align="center">49.02</td>
<td valign="top" align="center">50.31</td>
<td valign="top" align="center">51.32</td>
</tr>
<tr>
<td valign="top" align="left">e11</td>
<td valign="top" align="left">MB</td>
<td valign="top" align="left">S4</td>
<td valign="top" align="left">Mini</td>
<td valign="top" align="left">PV-2-22</td>
<td valign="top" align="left">AFD-10</td>
<td valign="top" align="center"><bold>38.41</bold></td>
<td valign="top" align="center"><bold>51.71</bold></td>
<td valign="top" align="center"><bold>55.58</bold></td>
<td valign="top" align="center"><bold>58.08</bold></td>
<td valign="top" align="center"><bold>58.84</bold></td>
<td valign="top" align="center"><bold>59.70</bold></td>
<td valign="top" align="center"><bold>60.09</bold></td>
</tr>
<tr>
<td valign="top" align="left">e12</td>
<td valign="top" align="left">MB</td>
<td valign="top" align="left">S5</td>
<td valign="top" align="left">PV-2-22</td>
<td valign="top" align="left">PV-2-22</td>
<td valign="top" align="left">AFD-10</td>
<td valign="top" align="center">36.19</td>
<td valign="top" align="center">49,16</td>
<td valign="top" align="center">54.05</td>
<td valign="top" align="center">57.13</td>
<td valign="top" align="center">58.47</td>
<td valign="top" align="center">59.25</td>
<td valign="top" align="center">59.46</td>
</tr>
</tbody>
</table><table-wrap-foot>
<p>(Task in meta-learning: 5-way, 1-shot, 15-query; backbone network: Resnet12; batchsize: 128; Lr: 0.1 in base-training, 0.001 in meta-learning; distance metric: cosine similarity; Mini, Mini-ImageNet; TS, training strategy).</p>
</table-wrap-foot>
</table-wrap>
<sec>
<title>3.2.1. Intra-domain</title>
<p>According to the definitions of SD and TD, e2, e3, e5, e6, e8, and e9 are intra-domain experiments, because the data used in meta-learning and test comes from the same dataset. The results are shown in <xref ref-type="table" rid="T4">Table 4</xref> and <xref ref-type="fig" rid="F5">Figure 5A</xref>. In PV-Setting-2, e5 is more accurate than e4 and e6; in PV-Setting-3, e8 is more accurate than e7 and e9. What these two settings have in common is that their disease classes belong to different plants. For such diverse-species cases, <italic>S</italic>2 outperforms <italic>S</italic>1 and <italic>S</italic>3, and its advantage becomes more obvious as the number of species grows: e6 gets close to e5, but e8 is clearly better than e9. This suggests that a general base-training dataset provides better support when the testing data is more diverse, because broad prior knowledge helps the model adapt to a diverse target. In PV-Setting-1, however, e3 with <italic>S</italic>3 is the best, because all the testing classes belong to the same plant: the features of the testing data are concentrated, the general data used in base-training is of little help, and data from the same dataset is easier to adapt to. In short, for intra-domain cases, <italic>S</italic>2 is the best strategy when the testing classes span different super-classes, and <italic>S</italic>3 is the best strategy when the testing classes are sub-classes of the same super-class.</p>
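<p>For clarity, the sketch below summarizes, as read directly off <xref ref-type="table" rid="T4">Table 4</xref>, which data each training strategy uses in the two training stages; the dictionary layout and wording are ours and are only an illustrative summary, not part of the original training code.</p>
<preformat>
# Illustrative summary of the training strategies compared in Table 4
# (our reading of the table, not the authors' released configuration).
# Each value is (base-training data, meta-learning data); the test data is
# always the held-out split of the current PV-Setting, or AFD.
TRAINING_STRATEGIES = {
    "S1": ("Mini-ImageNet", "Mini-ImageNet"),
    "S2": ("Mini-ImageNet", "PV split disjoint from the test split"),
    "S3": ("PV split disjoint from the test split", "same PV split as base-training"),
    "S4": ("Mini-ImageNet", "PV split from a domain similar to the AFD test data"),
    "S5": ("PV split from a similar domain", "same PV split as base-training"),
}
</preformat>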
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p><bold>(A)</bold> Intra-domain experiments with three data settings. <bold>(B)</bold> Cross-domain experiments with AFD. <bold>(C)</bold> The accuracy decreases as Way increases. <bold>(D)</bold> Distance metrics.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-907916-g0005.tif"/>
</fig></sec>
<sec>
<title>3.2.2. Cross-domain</title>
<p>Experiments e1, e4, e7, e10, e11, and e12 are cross-domain cases. Among them, e1, e4, e7, and e10, which use <italic>S</italic>1, give the worst results in their respective data settings, owing to the large gap between the general domain and the target domain.</p>
<p>Comparing e10, e11, and e12, e11 achieves the highest accuracy by using <italic>S</italic>4, as shown in <xref ref-type="table" rid="T4">Table 4</xref> and <xref ref-type="fig" rid="F5">Figure 5B</xref>. e12 is not as good as e11 because overly specialized features extracted from monotonous samples lead to weaker adaptation. <italic>S</italic>4 is therefore the best training strategy for cross-domain cases: it uses a general dataset in base-training to learn broad prior knowledge, and a dataset similar to the target in meta-learning so that the model adapts to the new domain smoothly.</p></sec></sec>
<sec>
<title>3.3. CMSFF and CA</title>
<p>Ablation experiments e13&#x02013;e22 are conducted to show the respective effects of the CMSFF module and the CA module. The results are listed in <xref ref-type="table" rid="T5">Table 5</xref>. The experiments cover four data configurations: PV-Setting-1, PV-Setting-2, PV-Setting-3, and AFD. The training settings are: Mini-ImageNet is used in base-training; the backbone network is Resnet12; the distance metric is cosine similarity; the training strategy is <italic>S</italic>2 for the PV settings and <italic>S</italic>4 for AFD. Taking e2, e5, e8, and e11 as the baselines, adding the CMSFF module gives e13, e16, e19, and e21, whose results show the improvement brought by CMSFF. e14, e18, e20, and e22 indicate that CA further improves the performance on the basis of CMSFF. e15 and e17 are used to compare the PMSFF module with the CMSFF module, and the results show that CMSFF outperforms PMSFF.</p>
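<p>For readers unfamiliar with channel attention, the sketch below shows a minimal squeeze-and-excitation-style block in PyTorch. It is included only to illustrate the general mechanism of re-weighting informative channels; the exact CA module evaluated in <xref ref-type="table" rid="T5">Table 5</xref> may differ in its details.</p>
<preformat>
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Minimal squeeze-and-excitation-style channel attention (illustrative only)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: global average over the spatial dims
        self.fc = nn.Sequential(                 # excitation: learn one weight per channel
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights                       # emphasize the informative channels

# usage sketch: out = ChannelAttention(640)(torch.randn(4, 640, 5, 5))
</preformat>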
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>The ablation experiment results of MB, MB&#x0002B;CMSFF, and MB&#x0002B;CMSFF&#x0002B;CA.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>ID</bold></th>
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="left"><bold>TS</bold></th>
<th valign="top" align="center"><bold>1-shot</bold></th>
<th valign="top" align="center"><bold>5-shot</bold></th>
<th valign="top" align="center"><bold>10-shot</bold></th>
<th valign="top" align="center"><bold>20-shot</bold></th>
<th valign="top" align="center"><bold>30-shot</bold></th>
<th valign="top" align="center"><bold>40-shot</bold></th>
<th valign="top" align="center"><bold>50-shot</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="10"><bold>PV-Setting-1</bold></td>
</tr>
<tr>
<td valign="top" align="left">e2</td>
<td valign="top" align="left">MB</td>
<td valign="top" align="left">S2</td>
<td valign="top" align="center">56.07</td>
<td valign="top" align="center">72.90</td>
<td valign="top" align="center">76.62</td>
<td valign="top" align="center">78.87</td>
<td valign="top" align="center">79.74</td>
<td valign="top" align="center">79.81</td>
<td valign="top" align="center">80.11</td>
</tr>
<tr>
<td valign="top" align="left">e13</td>
<td valign="top" align="left">MB&#x0002B;CMSFF</td>
<td valign="top" align="left">S2</td>
<td valign="top" align="center">61.20</td>
<td valign="top" align="center">77.09</td>
<td valign="top" align="center">80.92</td>
<td valign="top" align="center">83.03</td>
<td valign="top" align="center">84.05</td>
<td valign="top" align="center">84.34</td>
<td valign="top" align="center">84.56</td>
</tr>
<tr>
<td valign="top" align="left">e14</td>
<td valign="top" align="left">MB&#x0002B;CMSFF&#x0002B;CA</td>
<td valign="top" align="left">S2</td>
<td valign="top" align="center"><bold>61.24</bold></td>
<td valign="top" align="center"><bold>77.43</bold></td>
<td valign="top" align="center"><bold>81.28</bold></td>
<td valign="top" align="center"><bold>83.59</bold></td>
<td valign="top" align="center"><bold>84.46</bold></td>
<td valign="top" align="center"><bold>84.70</bold></td>
<td valign="top" align="center"><bold>84.86</bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="10"><bold>PV-Setting-2</bold></td>
</tr>
<tr>
<td valign="top" align="left">e5</td>
<td valign="top" align="left">MB</td>
<td valign="top" align="left">S2</td>
<td valign="top" align="center">81.05</td>
<td valign="top" align="center">91.47</td>
<td valign="top" align="center">93.14</td>
<td valign="top" align="center">94.00</td>
<td valign="top" align="center">94.29</td>
<td valign="top" align="center">94.41</td>
<td valign="top" align="center">94.53</td>
</tr>
<tr>
<td valign="top" align="left">e15</td>
<td valign="top" align="left">MB&#x0002B;PMSFF</td>
<td valign="top" align="left">S2</td>
<td valign="top" align="center">81.46</td>
<td valign="top" align="center">91.86</td>
<td valign="top" align="center">93.51</td>
<td valign="top" align="center">94.57</td>
<td valign="top" align="center">94.81</td>
<td valign="top" align="center">94.88</td>
<td valign="top" align="center">95.03</td>
</tr>
<tr>
<td valign="top" align="left">e16</td>
<td valign="top" align="left">MB&#x0002B;CMSFF</td>
<td valign="top" align="left">S2</td>
<td valign="top" align="center">82.21</td>
<td valign="top" align="center">92.32</td>
<td valign="top" align="center">93.87</td>
<td valign="top" align="center">94.71</td>
<td valign="top" align="center">95.03</td>
<td valign="top" align="center">95.15</td>
<td valign="top" align="center">95.31</td>
</tr>
<tr>
<td valign="top" align="left">e17</td>
<td valign="top" align="left">MB&#x0002B;PMSFF&#x0002B;CA</td>
<td valign="top" align="left">S2</td>
<td valign="top" align="center">81.87</td>
<td valign="top" align="center">92.39</td>
<td valign="top" align="center">93.93</td>
<td valign="top" align="center">94.86</td>
<td valign="top" align="center">95.29</td>
<td valign="top" align="center">95.31</td>
<td valign="top" align="center">95.50</td>
</tr>
<tr>
<td valign="top" align="left">e18</td>
<td valign="top" align="left">MB&#x0002B;CMSFF&#x0002B;CA</td>
<td valign="top" align="left">S2</td>
<td valign="top" align="center"><bold>82.52</bold></td>
<td valign="top" align="center"><bold>92.83</bold></td>
<td valign="top" align="center"><bold>94.39</bold></td>
<td valign="top" align="center"><bold>95.29</bold></td>
<td valign="top" align="center"><bold>95.65</bold></td>
<td valign="top" align="center"><bold>95.73</bold></td>
<td valign="top" align="center"><bold>95.74</bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="10"><bold>PV-Setting-3</bold></td>
</tr>
<tr>
<td valign="top" align="left">e8</td>
<td valign="top" align="left">MB</td>
<td valign="top" align="left">S2</td>
<td valign="top" align="center">74.58</td>
<td valign="top" align="center">84.77</td>
<td valign="top" align="center">86.82</td>
<td valign="top" align="center">87.82</td>
<td valign="top" align="center">88.29</td>
<td valign="top" align="center">88.43</td>
<td valign="top" align="center">88.57</td>
</tr>
<tr>
<td valign="top" align="left">e19</td>
<td valign="top" align="left">MB&#x0002B;CMSFF</td>
<td valign="top" align="left">S2</td>
<td valign="top" align="center">76.61</td>
<td valign="top" align="center">88.45</td>
<td valign="top" align="center">90.17</td>
<td valign="top" align="center">91.32</td>
<td valign="top" align="center">91.78</td>
<td valign="top" align="center">91.86</td>
<td valign="top" align="center">92.14</td>
</tr>
<tr>
<td valign="top" align="left">e20</td>
<td valign="top" align="left">MB&#x0002B;CMSFF&#x0002B;CA</td>
<td valign="top" align="left">S2</td>
<td valign="top" align="center"><bold>78.15</bold></td>
<td valign="top" align="center"><bold>89.57</bold></td>
<td valign="top" align="center"><bold>91.24</bold></td>
<td valign="top" align="center"><bold>92.46</bold></td>
<td valign="top" align="center"><bold>92.67</bold></td>
<td valign="top" align="center"><bold>93.02</bold></td>
<td valign="top" align="center"><bold>93.07</bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="10"><bold>AFD</bold></td>
</tr>
<tr>
<td valign="top" align="left">e11</td>
<td valign="top" align="left">MB</td>
<td valign="top" align="left">S4</td>
<td valign="top" align="center">38.41</td>
<td valign="top" align="center">51.71</td>
<td valign="top" align="center">55.58</td>
<td valign="top" align="center">58.08</td>
<td valign="top" align="center">58.84</td>
<td valign="top" align="center">59.70</td>
<td valign="top" align="center">60.09</td>
</tr>
<tr>
<td valign="top" align="left">e21</td>
<td valign="top" align="left">MB&#x0002B;CMSFF</td>
<td valign="top" align="left">S4</td>
<td valign="top" align="center">40.77</td>
<td valign="top" align="center">54.14</td>
<td valign="top" align="center">57.68</td>
<td valign="top" align="center">60.13</td>
<td valign="top" align="center">61.30</td>
<td valign="top" align="center">62.03</td>
<td valign="top" align="center">62.69</td>
</tr>
<tr>
<td valign="top" align="left">e22</td>
<td valign="top" align="left">MB&#x0002B;CMSFF&#x0002B;CA</td>
<td valign="top" align="left">S4</td>
<td valign="top" align="center"><bold>43.94</bold></td>
<td valign="top" align="center"><bold>56.93</bold></td>
<td valign="top" align="center"><bold>60.64</bold></td>
<td valign="top" align="center"><bold>63.66</bold></td>
<td valign="top" align="center"><bold>64.50</bold></td>
<td valign="top" align="center"><bold>65.55</bold></td>
<td valign="top" align="center"><bold>66.18</bold></td>
</tr>
</tbody>
</table><table-wrap-foot>
<p>(Base-training: Mini-ImageNet; backbone network: Resnet12; distance metric: cosine similarity; TS, training strategy).</p>
</table-wrap-foot>
</table-wrap></sec>
<sec>
<title>3.4. Sub-class classification</title>
<p>Sub-classes are defined as classes that belong to the same entry-level class. PV-Setting-1 and AFD are examples of sub-class classification. Sub-class classification is also known as fine-grained visual categorization, which aims to distinguish subordinate categories within entry-level categories. Because samples belonging to the same super-class are similar to each other, sub-class classification is a challenging problem.</p>
<p>In <xref ref-type="table" rid="T4">Table 4</xref>, PV-Setting-1 is the group with the lowest accuracy among the three PV settings, as its samples all belong to tomato and are hard to distinguish. The results of the AFD group are worse than those of PV-Setting-1, not only because of the sub-class problem but also because of the cross-domain and in-the-wild nature of the images. Even though the AFD images have been pre-processed, their backgrounds still differ from those of PV, and the illumination conditions, resolutions, and photography devices are all different. Intuitively, this feature gap from SD to TD causes the accuracy to decline.</p></sec>
<sec>
<title>3.5. Way and shot</title>
<p><italic>N-way</italic> and <italic>K-shot</italic> are the task configurations that determine the difficulty of a task. Given a fixed <italic>K</italic>, the accuracy decreases as <italic>N</italic> increases. The result of PV-Setting-1 with <italic>N-way, 10-shot</italic> tasks is shown in <xref ref-type="fig" rid="F5">Figure 5C</xref>: the accuracy drops from 85.39% to 64.35% as <italic>N</italic> increases from 3 to 10.</p>
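<p>As a concrete illustration of the <italic>N-way, K-shot</italic> configuration, the sketch below samples one episode from a pool of labeled images; the helper function and its names are ours and serve only as an example of how such tasks are formed (the 15-query setting follows the task definition in <xref ref-type="table" rid="T4">Table 4</xref>).</p>
<preformat>
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, n_query=15):
    """Sample one N-way, K-shot episode (illustrative sketch, not the authors' code).

    dataset is a list of (image, label) pairs; returns support and query lists
    whose labels are re-indexed to 0..n_way-1 for the episode.
    """
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)

    classes = random.sample(list(by_class), n_way)          # pick N classes for this task
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        images = random.sample(by_class[cls], k_shot + n_query)
        support += [(img, episode_label) for img in images[:k_shot]]
        query += [(img, episode_label) for img in images[k_shot:]]
    return support, query
</preformat>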
<p>All experimental results listed in <xref ref-type="table" rid="T4">Table 4</xref> are obtained with a fixed <italic>5-way</italic> configuration, and regardless of the data configuration they follow a common trend: accuracy increases with the number of shots. The accuracy rises sharply as the <italic>shot</italic> increases from <italic>1-shot</italic> to <italic>5-shot</italic>, tends to stabilize once the <italic>shot</italic> exceeds 10, and grows only marginally beyond 20 shots. From <italic>1-shot</italic> to <italic>50-shot</italic>, the accuracy gain ranges from at least 10% to a maximum of 32%.</p>
<p>The results show that the accuracy increases with the number of <italic>shots</italic> and decreases with the number of <italic>ways</italic>. More <italic>ways</italic> mean higher task complexity, and more <italic>shots</italic> mean more supporting information. In existing research, <italic>N</italic> is generally set to 5. In application scenarios, however, <italic>N</italic> is determined by the number of target categories and should not be limited to 5. For example, if a plant may suffer from more than five diseases, the number of <italic>ways</italic> should equal the number of diseases that may occur in that scenario. <italic>N-way</italic> and <italic>K-shot</italic> form a trade-off: when novel classes are added, the number of shots can be increased as compensation to maintain accuracy. For a new class to be identified, collecting 10 to 50 samples as its support set is acceptable. However, the relationship between shots and accuracy is not linear; the gain from increasing <italic>K</italic> has a ceiling, and when <italic>K</italic> is larger than 30 the accuracy still grows, but very slowly.</p></sec>
<sec>
<title>3.6. The diversity of meta-learning data</title>
<p>The number of classes used in meta-learning is denoted <italic>N</italic><sub><italic>train</italic></sub>, and the number used in test is denoted <italic>N</italic><sub><italic>test</italic></sub>. Comparing e5 with e8, both are trained with Mini-ImageNet in base-training. e5 uses 28 classes in meta-learning and 10 classes in test, which is the case <italic>N</italic><sub><italic>train</italic></sub>&#x0003E;<italic>N</italic><sub><italic>test</italic></sub>. In e8 the training and testing sets are exchanged, which is the case <italic>N</italic><sub><italic>train</italic></sub>&#x0003C;<italic>N</italic><sub><italic>test</italic></sub>.</p>
<p>The training and testing tasks are all formulated as <italic>5-way</italic>, meaning that five classes are sampled in each task, so the <italic>N-way</italic> of the tasks is the same in e5 and e8. Nevertheless, the accuracy of e5 is at least 2% higher than that of e8, which indicates that the amount of data used in meta-learning is a factor that affects performance. Using more classes in meta-learning leads to better results, as it provides more diverse features and improves the robustness of the model.</p></sec>
<sec>
<title>3.7. Distance metric</title>
<p>In this work, we compared three distance metrics: dot product, cosine similarity, and Euclidean distance. The same distance-measurement module is used in meta-learning and in test, because even though this module has no trainable parameters, the losses computed from the distance measurement still affect the parameter updates during the iterations.</p>
<p>An appropriate distance metric significantly helps to improve the performance of classification, clustering, and related processes. Cosine similarity achieves the best performance, as shown in <xref ref-type="table" rid="T6">Table 6</xref> and in <xref ref-type="fig" rid="F5">Figure 5D</xref>. The reason is that the vectors obtained from the encoder are high-dimensional, and cosine similarity is often used to counteract the problems of Euclidean distance in high-dimensional space; the normalization inherent in cosine similarity also has a positive effect.</p>
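<p>The three metrics can be made concrete with the following sketch, which scores a single query embedding against class prototypes; the prototype-based formulation and the variable names are illustrative assumptions rather than the exact implementation used here.</p>
<preformat>
import torch
import torch.nn.functional as F

def score_query(query, prototypes, metric="cosine"):
    """Score one query embedding against class prototypes (illustrative sketch).

    query has shape (d,) and prototypes has shape (n_way, d); a higher score
    means more similar under all three metrics.
    """
    if metric == "dot":
        return prototypes @ query                        # unnormalized dot product
    if metric == "cosine":
        return F.normalize(prototypes, dim=1) @ F.normalize(query, dim=0)
    if metric == "euclidean":
        return -torch.cdist(query[None], prototypes)[0]  # negative distance as a score
    raise ValueError(metric)

# usage sketch: probabilities = torch.softmax(score_query(q, protos, "cosine"), dim=0)
</preformat>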
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption><p>The results of different distance metrics.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>ID</bold></th>
<th valign="top" align="left"><bold>Metric</bold></th>
<th valign="top" align="center"><bold>1-shot</bold></th>
<th valign="top" align="center"><bold>5-shot</bold></th>
<th valign="top" align="center"><bold>10-shot</bold></th>
<th valign="top" align="center"><bold>20-shot</bold></th>
<th valign="top" align="center"><bold>30-shot</bold></th>
<th valign="top" align="center"><bold>40-shot</bold></th>
<th valign="top" align="center"><bold>50-shot</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">e23</td>
<td valign="top" align="left">Dot product</td>
<td valign="top" align="center">77.58</td>
<td valign="top" align="center">86.2</td>
<td valign="top" align="center">87.52</td>
<td valign="top" align="center">88.05</td>
<td valign="top" align="center">88.55</td>
<td valign="top" align="center">88.65</td>
<td valign="top" align="center">88.88</td>
</tr>
<tr>
<td valign="top" align="left">e5</td>
<td valign="top" align="left">Cosine similarity</td>
<td valign="top" align="center"><bold>80.88</bold></td>
<td valign="top" align="center"><bold>91.75</bold></td>
<td valign="top" align="center"><bold>93.44</bold></td>
<td valign="top" align="center"><bold>94.27</bold></td>
<td valign="top" align="center"><bold>94.53</bold></td>
<td valign="top" align="center"><bold>94.70</bold></td>
<td valign="top" align="center"><bold>94.84</bold></td>
</tr>
<tr>
<td valign="top" align="left">e24</td>
<td valign="top" align="left">Euclidean distance</td>
<td valign="top" align="center">75.96</td>
<td valign="top" align="center">89.17</td>
<td valign="top" align="center">91.52</td>
<td valign="top" align="center">92.64</td>
<td valign="top" align="center">93.17</td>
<td valign="top" align="center">93.23</td>
<td valign="top" align="center">93.42</td>
</tr>
</tbody>
</table><table-wrap-foot>
<p>(Method: MB; backbone network: Resnet12; batchsize: 128; Lr: 0.1 in base-training, 0.001 in meta-learning).</p>
</table-wrap-foot>
</table-wrap></sec>
<sec>
<title>3.8. Backbone networks</title>
<p>In this work, we compared different backbone networks: Convnet4 (Snell et al., <xref ref-type="bibr" rid="B50">2017</xref>), AlexNet (Krizhevsky et al., <xref ref-type="bibr" rid="B26">2012</xref>), Resnet12, Resnet18, Resnet50, Resnet101 (He et al., <xref ref-type="bibr" rid="B19">2016</xref>), DenseNet (Huang et al., <xref ref-type="bibr" rid="B22">2017</xref>), and MobileNet-V2 (Sandler et al., <xref ref-type="bibr" rid="B47">2018</xref>). Convnet4, the classical architecture used in FSL, stacks four convolutional blocks. The networks differ in their numbers of trainable parameters, and there are more trainable parameters in base-training than in meta-learning because the base-training classifier is removed in meta-learning. The sizes of the trainable parameters, learning rates (Lr), training times, and epochs of the two training stages are listed in <xref ref-type="table" rid="T7">Table 7</xref>. e25&#x02013;e31 are conducted with the following configuration: Mini-ImageNet is used in base-training (100 epochs) and PV-2-22 is used in meta-learning. The different numbers of epochs are due to the different convergence speeds in meta-learning. The performances of the backbone networks are listed in <xref ref-type="table" rid="T8">Table 8</xref>. Resnet12 and Resnet50 outperform the other networks, with Resnet12 being more efficient.</p>
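<p>The parameter sizes reported in <xref ref-type="table" rid="T7">Table 7</xref> correspond to a simple count of trainable parameters. The snippet below shows such a count; the torchvision models are used only as stand-ins, since the Resnet12 used here is a dedicated few-shot backbone that is not shipped with torchvision, and removing the classifier head mimics the smaller parameter count of the meta-learning stage.</p>
<preformat>
import torch.nn as nn
from torchvision import models

def count_trainable_params(model):
    """Number of parameters that the optimizer actually updates."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Stand-in backbones only (recent torchvision; older versions use pretrained=False).
backbones = {
    "Resnet18": models.resnet18(weights=None),
    "Resnet50": models.resnet50(weights=None),
    "MobileNet-V2": models.mobilenet_v2(weights=None),
}
for name, net in backbones.items():
    if hasattr(net, "fc"):              # drop the base-training classifier head,
        net.fc = nn.Identity()          # as is done before meta-learning
    elif hasattr(net, "classifier"):
        net.classifier = nn.Identity()
    print(name, count_trainable_params(net))
</preformat>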
<table-wrap position="float" id="T7">
<label>Table 7</label>
<caption><p>The experiment efficiencies of different backbone networks.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th/>
<th valign="top" align="center" style="border-bottom: thin solid #000000;" colspan="4"><bold>Base-training</bold></th>
<th/>
<th valign="top" align="center" style="border-bottom: thin solid #000000;" colspan="4"><bold>Meta-learning</bold></th>
</tr>
<tr>
<th valign="top" align="left"><bold>ID</bold></th>
<th valign="top" align="left"><bold>Backbone network</bold></th>
<th valign="top" align="center"><bold>Size</bold></th>
<th valign="top" align="center"><bold>Lr</bold></th>
<th valign="top" align="center"><bold>Training time</bold></th>
<th valign="top" align="center"><bold>Epoch</bold></th>
<th/>
<th valign="top" align="center"><bold>Size</bold></th>
<th valign="top" align="center"><bold>Lr</bold></th>
<th valign="top" align="center"><bold>Training time</bold></th>
<th valign="top" align="center"><bold>Epoch</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">e25</td>
<td valign="top" align="left">Convnet4</td>
<td valign="top" align="center">215.6 K</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">40 m</td>
<td valign="top" align="center">100</td>
<td/>
<td valign="top" align="center">113.1 K</td>
<td valign="top" align="center">0.001</td>
<td valign="top" align="center">31 m</td>
<td valign="top" align="center">50</td>
</tr>
<tr>
<td valign="top" align="left">e26</td>
<td valign="top" align="left">AlexNet</td>
<td valign="top" align="center">3.8 M</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">40 m</td>
<td valign="top" align="center">100</td>
<td/>
<td valign="top" align="center">3.7 M</td>
<td valign="top" align="center">0.001</td>
<td valign="top" align="center">17 m</td>
<td valign="top" align="center">50</td>
</tr>
<tr>
<td valign="top" align="left">e5</td>
<td valign="top" align="left">Resnet12</td>
<td valign="top" align="center">8.0 M</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">1.2 h</td>
<td valign="top" align="center">100</td>
<td/>
<td valign="top" align="center">8.0 M</td>
<td valign="top" align="center">0.001</td>
<td valign="top" align="center">18 m</td>
<td valign="top" align="center">20</td>
</tr>
<tr>
<td valign="top" align="left">e27</td>
<td valign="top" align="left">Resnet18</td>
<td valign="top" align="center">11.2 M</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">1.4 h</td>
<td valign="top" align="center">100</td>
<td/>
<td valign="top" align="center">11.2 M</td>
<td valign="top" align="center">0.001</td>
<td valign="top" align="center">40 m</td>
<td valign="top" align="center">50</td>
</tr>
<tr>
<td valign="top" align="left">e28</td>
<td valign="top" align="left">Resnet50</td>
<td valign="top" align="center">23.6 M</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">2.3 h</td>
<td valign="top" align="center">100</td>
<td/>
<td valign="top" align="center">23.5 M</td>
<td valign="top" align="center">0.001</td>
<td valign="top" align="center">38 m</td>
<td valign="top" align="center">30</td>
</tr>
<tr>
<td valign="top" align="left">e29</td>
<td valign="top" align="left">Resnet101</td>
<td valign="top" align="center">42.6 M</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">3.3 h</td>
<td valign="top" align="center">100</td>
<td/>
<td valign="top" align="center">42.5 M</td>
<td valign="top" align="center">0.001</td>
<td valign="top" align="center">35 m</td>
<td valign="top" align="center">20</td>
</tr>
<tr>
<td valign="top" align="left">e30</td>
<td valign="top" align="left">DenseNet</td>
<td valign="top" align="center">791.1 K</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">3.8 h</td>
<td valign="top" align="center">100</td>
<td/>
<td valign="top" align="center">769.2 K</td>
<td valign="top" align="center">0.001</td>
<td valign="top" align="center">1.9 h</td>
<td valign="top" align="center">50</td>
</tr>
<tr>
<td valign="top" align="left">e31</td>
<td valign="top" align="left">MobileNet-v2</td>
<td valign="top" align="center">3.6 M</td>
<td valign="top" align="center">0.1</td>
<td valign="top" align="center">2.2 h</td>
<td valign="top" align="center">100</td>
<td/>
<td valign="top" align="center">3.5 M</td>
<td valign="top" align="center">0.001</td>
<td valign="top" align="center">1.0 h</td>
<td valign="top" align="center">50</td>
</tr>
</tbody>
</table><table-wrap-foot>
<p>(Base-training: Mini-ImageNet; meta-learning: PV-2-22; distance metric: cosine similarity).</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T8">
<label>Table 8</label>
<caption><p>The results of different backbone networks.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>ID</bold></th>
<th valign="top" align="left"><bold>Backbone networks</bold></th>
<th valign="top" align="center"><bold>1-shot</bold></th>
<th valign="top" align="center"><bold>5-shot</bold></th>
<th valign="top" align="center"><bold>10-shot</bold></th>
<th valign="top" align="center"><bold>20-shot</bold></th>
<th valign="top" align="center"><bold>30-shot</bold></th>
<th valign="top" align="center"><bold>40-shot</bold></th>
<th valign="top" align="center"><bold>50-shot</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">e25</td>
<td valign="top" align="left">Convnet4</td>
<td valign="top" align="center">69.06</td>
<td valign="top" align="center">85.91</td>
<td valign="top" align="center">89.91</td>
<td valign="top" align="center">91.88</td>
<td valign="top" align="center">92.35</td>
<td valign="top" align="center">92.79</td>
<td valign="top" align="center">93.11</td>
</tr>
<tr>
<td valign="top" align="left">e26</td>
<td valign="top" align="left">AlexNet</td>
<td valign="top" align="center">68.35</td>
<td valign="top" align="center">83.12</td>
<td valign="top" align="center">85.73</td>
<td valign="top" align="center">87.00</td>
<td valign="top" align="center">87.27</td>
<td valign="top" align="center">87.44</td>
<td valign="top" align="center">87.92</td>
</tr>
<tr>
<td valign="top" align="left">e5</td>
<td valign="top" align="left">Resnet12</td>
<td valign="top" align="center">80.88</td>
<td valign="top" align="center"><bold>91.75</bold></td>
<td valign="top" align="center"><bold>93.44</bold></td>
<td valign="top" align="center"><bold>94.27</bold></td>
<td valign="top" align="center"><bold>94.53</bold></td>
<td valign="top" align="center"><bold>94.70</bold></td>
<td valign="top" align="center"><bold>94.84</bold></td>
</tr>
<tr>
<td valign="top" align="left">e27</td>
<td valign="top" align="left">Resnet18</td>
<td valign="top" align="center">78.58</td>
<td valign="top" align="center">89.16</td>
<td valign="top" align="center">91.36</td>
<td valign="top" align="center">91.96</td>
<td valign="top" align="center">92.26</td>
<td valign="top" align="center">92.44</td>
<td valign="top" align="center">92.78</td>
</tr>
<tr>
<td valign="top" align="left">e28</td>
<td valign="top" align="left">Resnet50</td>
<td valign="top" align="center"><bold>80.89</bold></td>
<td valign="top" align="center">90.91</td>
<td valign="top" align="center">92.56</td>
<td valign="top" align="center">93.86</td>
<td valign="top" align="center">94.08</td>
<td valign="top" align="center">94.15</td>
<td valign="top" align="center">94.33</td>
</tr>
<tr>
<td valign="top" align="left">e29</td>
<td valign="top" align="left">Resnet101</td>
<td valign="top" align="center">74.93</td>
<td valign="top" align="center">85.59</td>
<td valign="top" align="center">87.63</td>
<td valign="top" align="center">89.12</td>
<td valign="top" align="center">89.67</td>
<td valign="top" align="center">89.91</td>
<td valign="top" align="center">89.91</td>
</tr>
<tr>
<td valign="top" align="left">e30</td>
<td valign="top" align="left">DenseNet</td>
<td valign="top" align="center">79.39</td>
<td valign="top" align="center">89.21</td>
<td valign="top" align="center">90.82</td>
<td valign="top" align="center">91.84</td>
<td valign="top" align="center">92.21</td>
<td valign="top" align="center">92.10</td>
<td valign="top" align="center">92.50</td>
</tr>
<tr>
<td valign="top" align="left">e31</td>
<td valign="top" align="left">MobileNet-V2</td>
<td valign="top" align="center">78.17</td>
<td valign="top" align="center">89.21</td>
<td valign="top" align="center">91.48</td>
<td valign="top" align="center">92.42</td>
<td valign="top" align="center">92.83</td>
<td valign="top" align="center">93.02</td>
<td valign="top" align="center">93.41</td>
</tr>
</tbody>
</table><table-wrap-foot>
<p>(Method: MB; batchsize: 128; Lr: 0.1 in base-training, 0.001 in meta-learning; data: Mini-ImageNet in base-training, PV-Setting-2 in meta-learning and test).</p>
</table-wrap-foot>
</table-wrap>
<p>In base-training and meta-learning, we use the validation data to test the accuracy on <italic>5-way, 1-shot</italic> tasks, as shown in <xref ref-type="fig" rid="F6">Figure 6</xref>. The black numbers on the black lines are the best accuracies in base-training, and the black numbers on the red lines are the best accuracies in meta-learning; the accuracy gains obtained in meta-learning are marked in red. The figure shows that the model trained in the base-training stage already has some few-shot identification ability, even without being trained on tasks. However, once the model has converged on image-wise data in base-training, the task-testing accuracy no longer increases, although the model still has room to improve. On this basis, meta-learning with task-wise data further raises the accuracy by around 20% to 30%.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>The best validation accuracy (%) of &#x0201C;1-shot, 5-way&#x0201D; task in base-training and meta-learning. The red digits represent the accuracy lifting ranges (%) of meta-learning.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-907916-g0006.tif"/>
</fig>
<p>In recent years, network architectures have become deeper and deeper, and some researchers have asked whether such deep networks are really necessary. Our results show that a medium-sized network outperforms the other networks in this task. We summarize two reasons: (1) In CNNs, simpler and more basic features are learned in the shallower layers, while more abstract and complex features are learned in the deeper layers; from shallow to deep, the features transition from edges, lines, and colors, to textures and patterns, to complex shapes and even specific objects. For our task, even humans (e.g., plant experts) rely mainly on color, shape, and texture for disease identification, so very deep networks may not be critical. (2) FSL is a learning task with limited data. A deeper network has a larger number of parameters to update, and under data limitation its parameters may be insufficiently updated in backpropagation because of the long backpropagation path; shallower networks are more flexible in parameter updating, whereas deeper networks are bulky. In short, deeper networks do not always outperform shallower ones, and the size of the network should match the specific task and the available data resources.</p></sec>
<sec>
<title>3.9. Compare with related works</title>
<p>To show the superiority of our method, we conducted several experiments to compare with recent related research. Arg&#x000FC;eso et al. (<xref ref-type="bibr" rid="B3">2020</xref>) used a Siamese Network and a Triplet Network, with PV as their experimental material. They adopted a different data split: 32 classes are used for training and the remaining six classes (four apple classes, blueberry healthy, and cherry healthy) for testing. They reported results for three methods: transfer learning, Siamese Network, and Triplet Network, using Inception-V3 as the backbone network. To be comparable, we executed our experiments with the same data setting as their work: Mini-ImageNet is used in base-training, 32 classes of PV are used in meta-learning, and the remaining 6 classes are used in test. The results of e32&#x02013;e34 are shown in <xref ref-type="table" rid="T9">Table 9</xref>.</p>
<table-wrap position="float" id="T9">
<label>Table 9</label>
<caption><p>The results compared with related works.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>ID</bold></th>
<th valign="top" align="left"><bold>Method</bold></th>
<th valign="top" align="center"><bold>1-shot</bold></th>
<th valign="top" align="center"><bold>5-shot</bold></th>
<th valign="top" align="center"><bold>10-shot</bold></th>
<th valign="top" align="center"><bold>20-shot</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td/>
<td/>
<td valign="top" align="center" colspan="4"><bold>Data setting in Arg&#x000FC;eso et al. (</bold><xref ref-type="bibr" rid="B3"><bold>2020</bold></xref><bold>)</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Finetuning (Arg&#x000FC;eso et al., <xref ref-type="bibr" rid="B3">2020</xref>)</td>
<td valign="top" align="center">18.2</td>
<td valign="top" align="center">25.4</td>
<td valign="top" align="center">30.3</td>
<td valign="top" align="center">41.1</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Siamese contrastive (Arg&#x000FC;eso et al., <xref ref-type="bibr" rid="B3">2020</xref>)</td>
<td valign="top" align="center">50.2</td>
<td valign="top" align="center">64.2</td>
<td valign="top" align="center">70.2</td>
<td valign="top" align="center">74.1</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Siamese triplet (Arg&#x000FC;eso et al., <xref ref-type="bibr" rid="B3">2020</xref>)</td>
<td valign="top" align="center">65.2</td>
<td valign="top" align="center">72.3</td>
<td valign="top" align="center">76.8</td>
<td valign="top" align="center">81.8</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Single SS (Li and Chao, <xref ref-type="bibr" rid="B32">2021b</xref>)</td>
<td valign="top" align="center">74.5</td>
<td valign="top" align="center">89.7</td>
<td valign="top" align="center">92.6</td>
<td valign="top" align="center">93.9</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Iterative SS (Li and Chao, <xref ref-type="bibr" rid="B32">2021b</xref>)</td>
<td valign="top" align="center">75.1</td>
<td valign="top" align="center">90.0</td>
<td valign="top" align="center">92.7</td>
<td valign="top" align="center">93.9</td>
</tr>
<tr>
<td valign="top" align="left">e32</td>
<td valign="top" align="left"><bold>Ours MB</bold></td>
<td valign="top" align="center">76.4</td>
<td valign="top" align="center">91.0</td>
<td valign="top" align="center">93.2</td>
<td valign="top" align="center">94.2</td>
</tr>
<tr>
<td valign="top" align="left">e33</td>
<td valign="top" align="left"><bold>Ours MB&#x0002B;CMSFF</bold></td>
<td valign="top" align="center">80.0</td>
<td valign="top" align="center">91.9</td>
<td valign="top" align="center">93.7</td>
<td valign="top" align="center"><bold>94.3</bold></td>
</tr>
<tr>
<td valign="top" align="left">e34</td>
<td valign="top" align="left"><bold>Ours MB&#x0002B;CMSFF&#x0002B;CA</bold></td>
<td valign="top" align="center"><bold>80.4</bold></td>
<td valign="top" align="center"><bold>92.8</bold></td>
<td valign="top" align="center"><bold>94.1</bold></td>
<td valign="top" align="center"><bold>94.3</bold></td>
</tr>
<tr>
<td/>
<td/>
<td valign="top" align="center" colspan="4"><bold>Data Split-1 of Li and Chao (</bold><xref ref-type="bibr" rid="B32"><bold>2021b</bold></xref><bold>)</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Baseline (Li and Chao, <xref ref-type="bibr" rid="B32">2021b</xref>)</td>
<td valign="top" align="center">32.8</td>
<td valign="top" align="center">46.7</td>
<td valign="top" align="center">64</td>
<td valign="top" align="center">73.2</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Single SS (Li and Chao, <xref ref-type="bibr" rid="B32">2021b</xref>)</td>
<td valign="top" align="center">33.7</td>
<td valign="top" align="center">50.9</td>
<td valign="top" align="center">66.7</td>
<td valign="top" align="center">74.7</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Iterative SS (Li and Chao, <xref ref-type="bibr" rid="B32">2021b</xref>)</td>
<td valign="top" align="center">34</td>
<td valign="top" align="center">53.1</td>
<td valign="top" align="center">68.8</td>
<td valign="top" align="center">75.6</td>
</tr>
<tr>
<td valign="top" align="left">e35</td>
<td valign="top" align="left"><bold>Ours MB</bold></td>
<td valign="top" align="center">55.7</td>
<td valign="top" align="center">72.8</td>
<td valign="top" align="center">76.7</td>
<td valign="top" align="center">79.5</td>
</tr>
<tr>
<td valign="top" align="left">e36</td>
<td valign="top" align="left"><bold>Ours MB&#x0002B;CMSFF</bold></td>
<td valign="top" align="center">60.6</td>
<td valign="top" align="center"><bold>78.4</bold></td>
<td valign="top" align="center"><bold>82.4</bold></td>
<td valign="top" align="center">84.3</td>
</tr>
<tr>
<td valign="top" align="left">e37</td>
<td valign="top" align="left"><bold>Ours MB&#x0002B;CMSFF&#x0002B;CA</bold></td>
<td valign="top" align="center"><bold>60.7</bold></td>
<td valign="top" align="center">78.1</td>
<td valign="top" align="center">82.2</td>
<td valign="top" align="center"><bold>84.5</bold></td>
</tr>
<tr>
<td/>
<td/>
<td valign="top" align="center" colspan="4"><bold>Data Split-2 of Li and Chao (</bold><xref ref-type="bibr" rid="B32"><bold>2021b</bold></xref><bold>)</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Baseline (Li and Chao, <xref ref-type="bibr" rid="B32">2021b</xref>)</td>
<td valign="top" align="center">43.9</td>
<td valign="top" align="center">68.5</td>
<td valign="top" align="center">78.7</td>
<td valign="top" align="center">89.1</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Single SS (Li and Chao, <xref ref-type="bibr" rid="B32">2021b</xref>)</td>
<td valign="top" align="center">44.7</td>
<td valign="top" align="center">74.7</td>
<td valign="top" align="center">85.7</td>
<td valign="top" align="center">89.7</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Iterative SS (Li and Chao, <xref ref-type="bibr" rid="B32">2021b</xref>)</td>
<td valign="top" align="center">46.4</td>
<td valign="top" align="center">76.9</td>
<td valign="top" align="center">89.2</td>
<td valign="top" align="center">91.9</td>
</tr>
<tr>
<td valign="top" align="left">e38</td>
<td valign="top" align="left"><bold>Ours MB</bold></td>
<td valign="top" align="center">77.1</td>
<td valign="top" align="center">91.1</td>
<td valign="top" align="center">92.9</td>
<td valign="top" align="center">93.8</td>
</tr>
<tr>
<td valign="top" align="left">e39</td>
<td valign="top" align="left"><bold>Ours MB&#x0002B;CMSFF</bold></td>
<td valign="top" align="center">78.8</td>
<td valign="top" align="center">91.6</td>
<td valign="top" align="center">93.5</td>
<td valign="top" align="center">94.6</td>
</tr>
<tr>
<td valign="top" align="left">e40</td>
<td valign="top" align="left"><bold>Ours MB&#x0002B;CMSFF&#x0002B;CA</bold></td>
<td valign="top" align="center"><bold>79.1</bold></td>
<td valign="top" align="center"><bold>92.2</bold></td>
<td valign="top" align="center"><bold>94.0</bold></td>
<td valign="top" align="center"><bold>95.1</bold></td>
</tr>
<tr>
<td/>
<td/>
<td valign="top" align="center" colspan="4"><bold>Data Split-3 of Li and Chao (</bold><xref ref-type="bibr" rid="B32"><bold>2021b</bold></xref><bold>)</bold></td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Baseline (Li and Chao, <xref ref-type="bibr" rid="B32">2021b</xref>)</td>
<td valign="top" align="center">50.7</td>
<td valign="top" align="center">63.1</td>
<td valign="top" align="center">77.2</td>
<td valign="top" align="center">89.3</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Single SS (Li and Chao, <xref ref-type="bibr" rid="B32">2021b</xref>)</td>
<td valign="top" align="center">52.3</td>
<td valign="top" align="center">67.6</td>
<td valign="top" align="center">79.9</td>
<td valign="top" align="center">90.1</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Iterative SS (Li and Chao, <xref ref-type="bibr" rid="B32">2021b</xref>)</td>
<td valign="top" align="center">55.2</td>
<td valign="top" align="center">69.3</td>
<td valign="top" align="center">80.8</td>
<td valign="top" align="center">91.5</td>
</tr>
<tr>
<td valign="top" align="left">e41</td>
<td valign="top" align="left"><bold>Ours MB</bold></td>
<td valign="top" align="center">78.1</td>
<td valign="top" align="center">89.4</td>
<td valign="top" align="center">91.4</td>
<td valign="top" align="center">92.6</td>
</tr>
<tr>
<td valign="top" align="left">e42</td>
<td valign="top" align="left"><bold>Ours MB&#x0002B;CMSFF</bold></td>
<td valign="top" align="center">80.6</td>
<td valign="top" align="center">90.8</td>
<td valign="top" align="center">92.4</td>
<td valign="top" align="center">93.3</td>
</tr>
<tr>
<td valign="top" align="left">e43</td>
<td valign="top" align="left"><bold>Ours MB&#x0002B;CMSFF&#x0002B;CA</bold></td>
<td valign="top" align="center"><bold>81.5</bold></td>
<td valign="top" align="center"><bold>91.1</bold></td>
<td valign="top" align="center"><bold>92.8</bold></td>
<td valign="top" align="center"><bold>93.4</bold></td>
</tr>
</tbody>
</table><table-wrap-foot>
<p>(Ours: backbone network: Resnet12; distance metric: cosine similarity; base-training: Mini-ImageNet).</p>
</table-wrap-foot>
</table-wrap>
<p>We also compared with Li and Chao (<xref ref-type="bibr" rid="B32">2021b</xref>), who proposed a semi-supervised (SS) FSL approach. Their baseline is a typical fine-tuning model; Single SS adds a semi-supervised step on top of the baseline, and Iterative SS adds a further semi-supervised step on top of Single SS. PV was also used as their experimental material, divided into three splits, each with 28 classes for training and the remaining 10 classes for testing. They also compared with Arg&#x000FC;eso et al. (<xref ref-type="bibr" rid="B3">2020</xref>). We conducted experiments with our methods under the same data settings as Li and Chao (<xref ref-type="bibr" rid="B32">2021b</xref>). The results of e35&#x02013;e43 are shown in <xref ref-type="table" rid="T9">Table 9</xref>, and all the comparison results are shown in <xref ref-type="fig" rid="F7">Figure 7</xref>.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>The results compared with related works. <bold>(A)</bold> Our work compares with (Arg&#x000FC;eso et al., <xref ref-type="bibr" rid="B3">2020</xref>) and (Li and Chao, <xref ref-type="bibr" rid="B32">2021b</xref>). <bold>(B)</bold> Our work compares with Li and Chao (<xref ref-type="bibr" rid="B32">2021b</xref>) using the data split-1. <bold>(C)</bold> Our work compares with Li and Chao (<xref ref-type="bibr" rid="B32">2021b</xref>) using the data split-1. <bold>(D)</bold> Our work compares with Li and Chao (<xref ref-type="bibr" rid="B32">2021b</xref>) using the data split-1.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-907916-g0007.tif"/>
</fig>
<p>The data settings of the two references differ from our own. The results indicate that our method outperforms the existing works under all data settings, which shows that it is both superior and robust.</p></sec></sec>
<sec sec-type="discussion" id="s4">
<title>4. Discussion</title>
<sec>
<title>4.1. Motivation and contribution</title>
<p>Learning from few samples is very promising for plant disease recognition and has a wide range of potential application scenarios owing to the data-collection cost it saves. When the range of application is expanded, a well-established FSL model can generalize to novel species or diseases without retraining and without providing large-scale training data. However, some limitations of FSL itself and of the specific application area need to be considered. Our main contributions in this work are two-fold: (1) we propose to merge the CMSFF module into the backbone network to enhance the feature representation, combined with CA to focus on the informative channels; (2) we propose a group of training strategies to match different generalization scenarios.</p></sec>
<sec>
<title>4.2. Limitation and future work</title>
<p>Theoretical research on FSL is currently developing rapidly. Although FSL is very suitable for plant disease recognition, its applications in smart agriculture have just begun (Yang et al., <xref ref-type="bibr" rid="B57">2022</xref>), and there is still huge potential to explore in this research direction. Here, we discuss the limitations of this work and some future work.</p>
<p><bold>1. Multi-disease</bold>. The PV and AFD datasets used as target data in this work share a common characteristic: only a single disease appears in each image. In fact, once a plant is infected by a first disease, it is easily infected by others because its immune system is attacked and weakened (Barbedo, <xref ref-type="bibr" rid="B4">2016</xref>). Multiple diseases occurring on one plant are more common under real field conditions. However, from a classification perspective, the combinations of different diseases are too numerous to collect sufficient samples for each category (e.g., three diseases of a species generate 7 categories). Current research prefers to address this problem through semantic segmentation. We do not cover this challenging problem in this work owing to the limitations of the available data resources.</p>
<p><bold>2. Formulation of meta-learning data</bold>. The PV samples were taken under controlled (laboratory) conditions: a clean board serves as a uniform background, the illumination is controlled, each image contains a single leaf, and each leaf carries a single disease. These settings are simple and very different from in-the-wild conditions, which is why much research has already achieved high accuracy on PV using deep-learning CNNs (Hasan et al., <xref ref-type="bibr" rid="B18">2020</xref>). The AFD samples, in contrast, were taken under in-the-wild conditions with complex surroundings. When testing with AFD, we use PV in meta-learning, mainly because both datasets concern plant diseases. Since we did not find any other appropriate dataset, the degree of similarity between the data used in training and in test was not taken into account.</p>
<p>According to our hypothesis, the higher the similarity between the data used in meta-learning and in test, the easier the adaptation and the better the result. Our experiments demonstrate that the selection of meta-learning data is critical in this pipeline and should be determined by the target. When the application scenario cannot be predicted, how to formulate an appropriate meta-learning dataset is worth studying. Inspired by Nuthalapati and Tunga (<xref ref-type="bibr" rid="B42">2021</xref>) and Li and Yang (<xref ref-type="bibr" rid="B35">2021</xref>), we will consider the effectiveness of a mixed dataset for meta-learning.</p>
<p><bold>3. Sub-class classification</bold>. For plant disease recognition, it is more meaningful to distinguish diseases belonging to the same species: what farmers need most is a diagnostic assistant that can identify similar diseases on the same plant. Although sub-class classification is difficult (Liu and Wang, <xref ref-type="bibr" rid="B40">2021</xref>), it is unavoidable in plant disease recognition and its performance urgently needs to be improved. Fine-grained features of the lesions are the distinguishing features for solving this issue; lesion detection and segmentation as well as fine-grained visual classification are involved in this direction.</p>
<p><bold>4. The quality and quantity of training data</bold>. Most current FSL research deals with the configuration of the data used in test, but very little work has been concerned with the data used in training. The common assumption is that deep-learning networks rely on large-scale data. Recently, however, a new direction has been discussing the quality and quantity of training data (Li and Chao, <xref ref-type="bibr" rid="B31">2021a</xref>,<xref ref-type="bibr" rid="B33">c</xref>; Li et al., <xref ref-type="bibr" rid="B36">2021</xref>; Li Y. et al., <xref ref-type="bibr" rid="B34">2022</xref>). These works indicate that a subset of the data can achieve the same performance as the full data. Data quality can be assessed, which can guide the construction of a dataset with sufficient diversity and without redundant samples. Networks of appropriate depth trained on good data can achieve optimal results in many traditional CNN classification tasks.</p>
<p>In this work, we use large-scale data in base-training and meta-learning, and the quantity of data follows conventional settings for comparison purposes; data quality assessment is not involved. For plant disease in particular, data quality is very important: the symptom appearance differs greatly at different developmental stages of the plant and the disease. How to construct a comprehensive, non-redundant set of samples to represent a disease is valuable future work (Barbedo, <xref ref-type="bibr" rid="B5">2018</xref>).</p>
<p><bold>5. Cross-domain</bold>. The significance of the cross-domain setting has been introduced in the prior sections. We emphasize it again because it is common whenever the species, surroundings, and photographic conditions at test time cannot be predicted. In this work, we address it through training strategies; many other aspects remain to be explored in future work, such as network architecture and feature-distribution calibration.</p></sec></sec>
<sec sec-type="conclusions" id="s5">
<title>5. Conclusion</title>
<p>In response to the two problems that arise when using FSL for plant disease recognition, we propose a network based on the MB approach that merges CMSFF and CA to obtain a richer feature representation. The experiments show that CMSFF is effective in enriching the feature representation, especially under the few-shot condition, and that CA is an important complement to CMSFF that helps the model focus on the meaningful channels. Our method outperforms the existing related works, which indicates that it is highly robust, and CMSFF&#x0002B;CA is a combination that suits any algorithm that needs to enhance its feature representation. In addition, a group of training strategies is proposed to meet the requirements of different generalization situations. Many factors, such as backbone networks and distance metrics, are also discussed, together with the limitations of this work and some related new research directions.</p></sec>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>The dataset Mini-Imagenet for this study can be found on Kaggle, <ext-link ext-link-type="uri" xlink:href="https://www.kaggle.com/datasets/whitemoon/miniimagenet">https://www.kaggle.com/datasets/whitemoon/miniimagenet</ext-link>; the dataset PlantVillage for this study can be found on Kaggle, <ext-link ext-link-type="uri" xlink:href="https://www.kaggle.com/datasets/abdallahalidev/plantvillage-dataset">https://www.kaggle.com/datasets/abdallahalidev/plantvillage-dataset</ext-link>; the dataset Apple Foliar Diseases for this study can be found on Kaggle, <ext-link ext-link-type="uri" xlink:href="https://www.kaggle.com/c/plant-pathology-2021-fgvc8/data">https://www.kaggle.com/c/plant-pathology-2021-fgvc8/data</ext-link>.</p></sec>
<sec id="s7">
<title>Author contributions</title>
<p>HL, GP, RT, and S-KT: conceptualization. HL: methodology, experiment, and writing&#x02014;original draft and editing. GP, RT, and S-KT: writing&#x02014;review. Z-pQ: experiment and writing&#x02014;review. All authors contributed to the article and approved the submitted version.</p></sec>
<sec sec-type="funding-information" id="s8">
<title>Funding</title>
<p>This research was funded by the Natural Science Foundation of China (Grant No. 12163004), the Yunnan Fundamental Research Projects (Grant No. 202101BD070001-053), and the Fundamental Research Projects of Yunnan Provincial Department of Education (Grant No. 2022J0496). This work was also supported in part by the Macao Polytechnic University&#x02014;Edge Sensing and Computing: Enabling Human-centric (Sustainable) Smart Cities (RP/ESCA-01/2020).</p></sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p></sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Adler</surname> <given-names>T.</given-names></name> <name><surname>Brandstetter</surname> <given-names>J.</given-names></name> <name><surname>Widrich</surname> <given-names>M.</given-names></name> <name><surname>Mayr</surname> <given-names>A.</given-names></name> <name><surname>Kreil</surname> <given-names>D.</given-names></name> <name><surname>Kopp</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Cross-domain few-shot learning by representation fusion</article-title>. <source>arXiv preprint arXiv:2010.06498</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2010.06498</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Afifi</surname> <given-names>A.</given-names></name> <name><surname>Alhumam</surname> <given-names>A.</given-names></name> <name><surname>Abdelwahab</surname> <given-names>A.</given-names></name></person-group> (<year>2021</year>). <article-title>Convolutional neural network for automatic identification of plant diseases with limited data</article-title>. <source>Plants</source> <volume>10</volume>:<fpage>28</fpage>. <pub-id pub-id-type="doi">10.3390/plants10010028</pub-id><pub-id pub-id-type="pmid">33374398</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Arg&#x000FC;eso</surname> <given-names>D.</given-names></name> <name><surname>Picon</surname> <given-names>A.</given-names></name> <name><surname>Irusta</surname> <given-names>U.</given-names></name> <name><surname>Medela</surname> <given-names>A.</given-names></name> <name><surname>San-Emeterio</surname> <given-names>M. G.</given-names></name> <name><surname>Bereciartua</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Few-shot learning approach for plant disease classification using images taken in the field</article-title>. <source>Comput. Electron. Agric</source>. <volume>175</volume>:<fpage>105542</fpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2020.105542</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barbedo</surname> <given-names>J. G. A.</given-names></name></person-group> (<year>2016</year>). <article-title>A review on the main challenges in automatic plant disease identification based on visible range images</article-title>. <source>Biosyst. Eng</source>. <volume>144</volume>, <fpage>52</fpage>&#x02013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1016/j.biosystemseng.2016.01.017</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barbedo</surname> <given-names>J. G. A.</given-names></name></person-group> (<year>2018</year>). <article-title>Impact of dataset size and variety on the effectiveness of deep learning and transfer learning for plant disease classification</article-title>. <source>Comput. Electron. Agric</source>. <volume>153</volume>, <fpage>46</fpage>&#x02013;<lpage>53</lpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2018.08.013</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Campbell</surname> <given-names>C. L.</given-names></name> <name><surname>Madden</surname> <given-names>L. V.</given-names></name></person-group> (<year>1990</year>). <source>Introduction to Plant Disease Epidemiology</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>John Wiley &#x00026; Sons</publisher-name>.</citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Cui</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>W.</given-names></name></person-group> (<year>2021</year>). <article-title>Meta-learning for few-shot plant disease detection</article-title>. <source>Foods</source> <volume>10</volume>:<fpage>2441</fpage>. <pub-id pub-id-type="doi">10.3390/foods10102441</pub-id><pub-id pub-id-type="pmid">34681490</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Xu</surname> <given-names>H.</given-names></name> <name><surname>Darrell</surname> <given-names>T.</given-names></name></person-group> (<year>2020</year>). <article-title>A new meta-baseline for few-shot learning</article-title>. <source>arXiv preprint arXiv:2003.04390</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2003.04390</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Deng</surname> <given-names>J.</given-names></name> <name><surname>Dong</surname> <given-names>W.</given-names></name> <name><surname>Socher</surname> <given-names>R.</given-names></name> <name><surname>Li</surname> <given-names>L.-J.</given-names></name> <name><surname>Li</surname> <given-names>K.</given-names></name> <name><surname>Fei-Fei</surname> <given-names>L.</given-names></name></person-group> (<year>2009</year>). <article-title>&#x0201C;Imagenet: a large-scale hierarchical image database,&#x0201D;</article-title> in <source>2009 IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Miami, FL</publisher-loc>), <fpage>248</fpage>&#x02013;<lpage>255</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2009.5206848</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dogra</surname> <given-names>A.</given-names></name> <name><surname>Goyal</surname> <given-names>B.</given-names></name> <name><surname>Agrawal</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>From multi-scale decomposition to non-multi-scale decomposition methods: a comprehensive survey of image fusion techniques and its applications</article-title>. <source>IEEE Access</source> <volume>5</volume>, <fpage>16040</fpage>&#x02013;<lpage>16067</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2017.2735865</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Dong</surname> <given-names>H.</given-names></name> <name><surname>Pan</surname> <given-names>J.</given-names></name> <name><surname>Xiang</surname> <given-names>L.</given-names></name> <name><surname>Hu</surname> <given-names>Z.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Wang</surname> <given-names>F.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>&#x0201C;Multi-scale boosted dehazing network with dense feature fusion,&#x0201D;</article-title> in <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Seattle, WA</publisher-loc>), <fpage>2157</fpage>&#x02013;<lpage>2167</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR42600.2020.00223</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dong</surname> <given-names>S.</given-names></name> <name><surname>Wang</surname> <given-names>P.</given-names></name> <name><surname>Abbas</surname> <given-names>K.</given-names></name></person-group> (<year>2021</year>). <article-title>A survey on deep learning and its applications</article-title>. <source>Comput. Sci. Rev</source>. <volume>40</volume>:<fpage>100379</fpage>. <pub-id pub-id-type="doi">10.1016/j.cosrev.2021.100379</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Goodfellow</surname> <given-names>I.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>Courville</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <source>Deep Learning</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation>
</ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Guo</surname> <given-names>M.-H.</given-names></name> <name><surname>Xu</surname> <given-names>T.-X.</given-names></name> <name><surname>Liu</surname> <given-names>J.-J.</given-names></name> <name><surname>Liu</surname> <given-names>Z.-N.</given-names></name> <name><surname>Jiang</surname> <given-names>P.-T.</given-names></name> <name><surname>Mu</surname> <given-names>T.-J.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Attention mechanisms in computer vision: a survey</article-title>. <source>arXiv preprint arXiv:2111.07624</source>. <pub-id pub-id-type="doi">10.1007/s41095-022-0271-y</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Guo</surname> <given-names>Y.</given-names></name> <name><surname>Codella</surname> <given-names>N. C.</given-names></name> <name><surname>Karlinsky</surname> <given-names>L.</given-names></name> <name><surname>Codella</surname> <given-names>J. V.</given-names></name> <name><surname>Smith</surname> <given-names>J. R.</given-names></name> <name><surname>Saenko</surname> <given-names>K.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>&#x0201C;A broader study of cross-domain few-shot learning,&#x0201D;</article-title> in <source>European Conference on Computer Vision</source> (<publisher-loc>Glasgow</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>124</fpage>&#x02013;<lpage>141</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-58583-9_8</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hafiz</surname> <given-names>A. M.</given-names></name> <name><surname>Parah</surname> <given-names>S. A.</given-names></name> <name><surname>Bhat</surname> <given-names>R. U. A.</given-names></name></person-group> (<year>2021</year>). <article-title>Attention mechanisms and deep learning for machine vision: a survey of the state of the art</article-title>. <source>arXiv preprint arXiv:2106.07550</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2106.07550</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Han</surname> <given-names>J.</given-names></name> <name><surname>Kamber</surname> <given-names>M.</given-names></name> <name><surname>Pei</surname> <given-names>J.</given-names></name></person-group> (<year>2012</year>). <article-title>&#x0201C;Chapter 2: Getting to know your data,&#x0201D;</article-title> in <source>Data Mining, 3rd Edn, The Morgan Kaufmann Series in Data Management Systems</source>, eds D. Cerra, and H. Severson (Boston, MA: Morgan Kaufmann), <fpage>39</fpage>&#x02013;<lpage>82</lpage>. <pub-id pub-id-type="doi">10.1016/B978-0-12-381479-1.00002-2</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hasan</surname> <given-names>R. I.</given-names></name> <name><surname>Yusuf</surname> <given-names>S. M.</given-names></name> <name><surname>Alzubaidi</surname> <given-names>L.</given-names></name></person-group> (<year>2020</year>). <article-title>Review of the state of the art of deep learning for plant diseases: a broad analysis and discussion</article-title>. <source>Plants</source> <volume>9</volume>:<fpage>1302</fpage>. <pub-id pub-id-type="doi">10.3390/plants9101302</pub-id><pub-id pub-id-type="pmid">33019765</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Ren</surname> <given-names>S.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Deep residual learning for image recognition,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Las Vegas, NV</publisher-loc>), <fpage>770</fpage>&#x02013;<lpage>778</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2016.90</pub-id><pub-id pub-id-type="pmid">32166560</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;An introductory survey on attention mechanisms in NLP problems,&#x0201D;</article-title> in <source>Proceedings of SAI Intelligent Systems Conference</source> (<publisher-loc>London, UK</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>432</fpage>&#x02013;<lpage>448</lpage>.</citation>
</ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>J.</given-names></name> <name><surname>Shen</surname> <given-names>L.</given-names></name> <name><surname>Sun</surname> <given-names>G.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Squeeze-and-excitation networks,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Salt Lake City, UT</publisher-loc>), <fpage>7132</fpage>&#x02013;<lpage>7141</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2018.00745</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>G.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Van Der Maaten</surname> <given-names>L.</given-names></name> <name><surname>Weinberger</surname> <given-names>K. Q.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Densely connected convolutional networks,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Honolulu, HI</publisher-loc>), <fpage>4700</fpage>&#x02013;<lpage>4708</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2017.243</pub-id></citation>
</ref>
<ref id="B23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hughes</surname> <given-names>D.</given-names></name> <name><surname>Salath&#x000E9;</surname> <given-names>M.</given-names></name></person-group> (<year>2015</year>). <article-title>An open access repository of images on plant health to enable the development of mobile disease diagnostics</article-title>. <source>arXiv preprint arXiv:1511.08060</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1511.08060</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Jadon</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;SSM-net for plants disease identification in low data regime,&#x0201D;</article-title> in <source>2020 IEEE/ITU International Conference on Artificial Intelligence for Good (AI4G)</source> (<publisher-loc>Geneva</publisher-loc>), <fpage>158</fpage>&#x02013;<lpage>163</lpage>. <pub-id pub-id-type="doi">10.1109/AI4G50087.2020.9311073</pub-id><pub-id pub-id-type="pmid">27295638</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Koch</surname> <given-names>G.</given-names></name> <name><surname>Zemel</surname> <given-names>R.</given-names></name> <name><surname>Salakhutdinov</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;Siamese neural networks for one-shot image recognition,&#x0201D;</article-title> in <source>ICML Deep Learning Workshop, Vol. 2</source> (<publisher-loc>Lille</publisher-loc>).</citation>
</ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Krizhevsky</surname> <given-names>A.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Hinton</surname> <given-names>G. E.</given-names></name></person-group> (<year>2012</year>). <article-title>&#x0201C;Imagenet classification with deep convolutional neural networks,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems</source> (<publisher-loc>Lake Tahoe, NV</publisher-loc>), <fpage>25</fpage>.</citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lan</surname> <given-names>R.</given-names></name> <name><surname>Sun</surname> <given-names>L.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Lu</surname> <given-names>H.</given-names></name> <name><surname>Pang</surname> <given-names>C.</given-names></name> <name><surname>Luo</surname> <given-names>X.</given-names></name></person-group> (<year>2020</year>). <article-title>MADNet: a fast and lightweight network for single-image super resolution</article-title>. <source>IEEE Trans. Cybern</source>. <volume>51</volume>, <fpage>1443</fpage>&#x02013;<lpage>1453</lpage>. <pub-id pub-id-type="doi">10.1109/TCYB.2020.2970104</pub-id><pub-id pub-id-type="pmid">32149667</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Fang</surname> <given-names>F.</given-names></name> <name><surname>Mei</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>G.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Multi-scale residual network for image super-resolution,&#x0201D;</article-title> in <source>Proceedings of the European Conference on Computer Vision (ECCV)</source> (<publisher-loc>Munich</publisher-loc>), <fpage>517</fpage>&#x02013;<lpage>532</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-01237-3_32</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>W.</given-names></name> <name><surname>Xu</surname> <given-names>J.</given-names></name> <name><surname>Huo</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>L.</given-names></name> <name><surname>Gao</surname> <given-names>Y.</given-names></name> <name><surname>Luo</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Distribution consistency based covariance metric networks for few-shot learning,&#x0201D;</article-title> in <source>Proceedings of the AAAI Conference on Artificial Intelligence</source> (<publisher-loc>Honolulu, HI</publisher-loc>), <volume>33</volume>, <fpage>8642</fpage>&#x02013;<lpage>8649</lpage>. <pub-id pub-id-type="doi">10.1609/aaai.v33i01.33018642</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>W.-H.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Bilen</surname> <given-names>H.</given-names></name></person-group> (<year>2022</year>). <article-title>&#x0201C;Cross-domain few-shot learning with task-specific adapters,&#x0201D;</article-title> in <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>, <fpage>7161</fpage>&#x02013;<lpage>7170</lpage>.</citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Chao</surname> <given-names>X.</given-names></name></person-group> (<year>2021a</year>). <article-title>Distance-entropy: an effective indicator for selecting informative data</article-title>. <source>Front. Plant Sci</source>. <volume>12</volume>:<fpage>818895</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2021.818895</pub-id><pub-id pub-id-type="pmid">35095987</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Chao</surname> <given-names>X.</given-names></name></person-group> (<year>2021b</year>). <article-title>Semi-supervised few-shot learning approach for plant diseases recognition</article-title>. <source>Plant Methods</source> <volume>17</volume>, <fpage>1</fpage>&#x02013;<lpage>10</lpage>.<pub-id pub-id-type="pmid">34176505</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Chao</surname> <given-names>X.</given-names></name></person-group> (<year>2021c</year>). <article-title>Toward sustainability: trade-off between data quality and quantity in crop pest recognition</article-title>. <source>Front. Plant Sci</source>. <volume>12</volume>:<fpage>811241</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2021.811241</pub-id><pub-id pub-id-type="pmid">35003196</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Chao</surname> <given-names>X.</given-names></name> <name><surname>Ercisli</surname> <given-names>S.</given-names></name></person-group> (<year>2022</year>). <article-title>Disturbed-entropy: a simple data quality assessment approach</article-title>. <source>ICT Express</source>. <pub-id pub-id-type="doi">10.1016/j.icte.2022.01.006</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Yang</surname> <given-names>J.</given-names></name></person-group> (<year>2021</year>). <article-title>Meta-learning baselines and database for few-shot classification in agriculture</article-title>. <source>Comput. Electron. Agric</source>. <volume>182</volume>:<fpage>106055</fpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2021.106055</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Yang</surname> <given-names>J.</given-names></name> <name><surname>Wen</surname> <given-names>J.</given-names></name></person-group> (<year>2021</year>). <article-title>Entropy-based redundancy analysis and information screening</article-title>. <source>Digit. Commun. Netw</source>. <pub-id pub-id-type="doi">10.1016/j.dcan.2021.12.001</pub-id><pub-id pub-id-type="pmid">15790388</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lim</surname> <given-names>Y.-C.</given-names></name> <name><surname>Kang</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Global and local multi-scale feature fusion for object detection and semantic segmentation,&#x0201D;</article-title> in <source>2019 IEEE Intelligent Vehicles Symposium (IV)</source> (<publisher-loc>Paris</publisher-loc>), <fpage>2557</fpage>&#x02013;<lpage>2562</lpage>. <pub-id pub-id-type="doi">10.1109/IVS.2019.8813786</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>H.</given-names></name> <name><surname>Tse</surname> <given-names>R.</given-names></name> <name><surname>Tang</surname> <given-names>S.-K.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Ke</surname> <given-names>W.</given-names></name> <name><surname>Pau</surname> <given-names>G.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Near-realtime face mask wearing recognition based on deep learning,&#x0201D;</article-title> in <source>2021 IEEE 18th Annual Consumer Communications</source> &#x00026; <italic>Networking Conference (CCNC)</italic> (Las Vegas, NV), <fpage>1</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1109/CCNC49032.2021.9369493</pub-id><pub-id pub-id-type="pmid">27295638</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>T.-Y.</given-names></name> <name><surname>Doll&#x000E1;r</surname> <given-names>P.</given-names></name> <name><surname>Girshick</surname> <given-names>R.</given-names></name> <name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Hariharan</surname> <given-names>B.</given-names></name> <name><surname>Belongie</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Feature pyramid networks for object detection,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Honolulu, HI</publisher-loc>), <fpage>2117</fpage>&#x02013;<lpage>2125</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2017.106</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name></person-group> (<year>2021</year>). <article-title>Plant diseases and pests detection based on deep learning: a review</article-title>. <source>Plant Methods</source> <volume>17</volume>, <fpage>1</fpage>&#x02013;<lpage>18</lpage>. <pub-id pub-id-type="doi">10.1186/s13007-021-00722-9</pub-id><pub-id pub-id-type="pmid">33627131</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Long</surname> <given-names>J.</given-names></name> <name><surname>Shelhamer</surname> <given-names>E.</given-names></name> <name><surname>Darrell</surname> <given-names>T.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;Fully convolutional networks for semantic segmentation,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Boston, MA</publisher-loc>), <fpage>3431</fpage>&#x02013;<lpage>3440</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2015.7298965</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Nuthalapati</surname> <given-names>S. V.</given-names></name> <name><surname>Tunga</surname> <given-names>A.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Multi-domain few-shot learning and dataset for agricultural applications,&#x0201D;</article-title> in <source>Proceedings of the IEEE/CVF International Conference on Computer Vision</source> (<publisher-loc>Montreal, BC</publisher-loc>), <fpage>1399</fpage>&#x02013;<lpage>1408</lpage>. <pub-id pub-id-type="doi">10.1109/ICCVW54120.2021.00161</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Oerke</surname> <given-names>E.-C.</given-names></name> <name><surname>Dehne</surname> <given-names>H.-W.</given-names></name></person-group> (<year>2004</year>). <article-title>Safeguarding production-losses in major crops and the role of crop protection</article-title>. <source>Crop Protect</source>. <volume>23</volume>, <fpage>275</fpage>&#x02013;<lpage>285</lpage>. <pub-id pub-id-type="doi">10.1016/j.cropro.2003.10.001</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Qi</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>R.</given-names></name> <name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Mao</surname> <given-names>Y.</given-names></name></person-group> (<year>2022</year>). <article-title>Cross domain few-shot learning via meta adversarial training</article-title>. <source>arXiv preprint arXiv:2202.05713</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2202.05713</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Redmon</surname> <given-names>J.</given-names></name> <name><surname>Farhadi</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>YOLOV3: an incremental improvement</article-title>. <source>arXiv preprint arXiv:1804.02767</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1804.02767</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ronneberger</surname> <given-names>O.</given-names></name> <name><surname>Fischer</surname> <given-names>P.</given-names></name> <name><surname>Brox</surname> <given-names>T.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;U-Net: Convolutional networks for biomedical image segmentation,&#x0201D;</article-title> in <source>International Conference on Medical Image Computing and Computer-Assisted Intervention</source> (<publisher-loc>Munich</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>234</fpage>&#x02013;<lpage>241</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-24574-4_28</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sandler</surname> <given-names>M.</given-names></name> <name><surname>Howard</surname> <given-names>A.</given-names></name> <name><surname>Zhu</surname> <given-names>M.</given-names></name> <name><surname>Zhmoginov</surname> <given-names>A.</given-names></name> <name><surname>Chen</surname> <given-names>L.-C.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Mobilenetv2: inverted residuals and linear bottlenecks,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Salt Lake City, UT</publisher-loc>), <fpage>4510</fpage>&#x02013;<lpage>4520</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2018.00474</pub-id></citation>
</ref>
<ref id="B48">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Simonyan</surname> <given-names>K.</given-names></name> <name><surname>Zisserman</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Very deep convolutional networks for large-scale image recognition</article-title>. <source>arXiv preprint arXiv:1409.1556</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1409.1556</pub-id></citation>
</ref>
<ref id="B49">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Singh</surname> <given-names>D.</given-names></name> <name><surname>Jain</surname> <given-names>N.</given-names></name> <name><surname>Jain</surname> <given-names>P.</given-names></name> <name><surname>Kayal</surname> <given-names>P.</given-names></name> <name><surname>Kumawat</surname> <given-names>S.</given-names></name> <name><surname>Batra</surname> <given-names>N.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Plantdoc: a dataset for visual plant disease detection,&#x0201D;</article-title> in <source>Proceedings of the 7th ACM IKDD CoDS and 25th COMAD</source> (<publisher-loc>Hyderabad</publisher-loc>), <fpage>249</fpage>&#x02013;<lpage>253</lpage>. <pub-id pub-id-type="doi">10.1145/3371158.3371196</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Snell</surname> <given-names>J.</given-names></name> <name><surname>Swersky</surname> <given-names>K.</given-names></name> <name><surname>Zemel</surname> <given-names>R. S.</given-names></name></person-group> (<year>2017</year>). <article-title>Prototypical networks for few-shot learning</article-title>. <source>arXiv preprint arXiv:1703.05175</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1703.05175</pub-id></citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Strange</surname> <given-names>R. N.</given-names></name> <name><surname>Scott</surname> <given-names>P. R.</given-names></name></person-group> (<year>2005</year>). <article-title>Plant disease: a threat to global food security</article-title>. <source>Annu. Rev. Phytopathol</source>. <volume>43</volume>, <fpage>83</fpage>&#x02013;<lpage>116</lpage>. <pub-id pub-id-type="doi">10.1146/annurev.phyto.43.113004.133839</pub-id><pub-id pub-id-type="pmid">34588314</pub-id></citation></ref>
<ref id="B52">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sung</surname> <given-names>F.</given-names></name> <name><surname>Yang</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Xiang</surname> <given-names>T.</given-names></name> <name><surname>Torr</surname> <given-names>P. H.</given-names></name> <name><surname>Hospedales</surname> <given-names>T. M.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Learning to compare: relation network for few-shot learning,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Salt Lake City, UT</publisher-loc>), <fpage>1199</fpage>&#x02013;<lpage>1208</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2018.00131</pub-id></citation>
</ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vinyals</surname> <given-names>O.</given-names></name> <name><surname>Blundell</surname> <given-names>C.</given-names></name> <name><surname>Lillicrap</surname> <given-names>T.</given-names></name> <name><surname>Wierstra</surname> <given-names>D.</given-names></name> <name><surname>Kavukcuoglu</surname> <given-names>K.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Matching networks for one shot learning,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems</source> (<publisher-loc>Barcelona</publisher-loc>), <volume>29</volume>, <fpage>3630</fpage>&#x02013;<lpage>3638</lpage>.</citation>
</ref>
<ref id="B54">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>F.</given-names></name> <name><surname>Jiang</surname> <given-names>M.</given-names></name> <name><surname>Qian</surname> <given-names>C.</given-names></name> <name><surname>Yang</surname> <given-names>S.</given-names></name> <name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>&#x0201C;Residual attention network for image classification,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Honolulu, HI</publisher-loc>), <fpage>3156</fpage>&#x02013;<lpage>3164</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2017.683</pub-id><pub-id pub-id-type="pmid">35931949</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Yao</surname> <given-names>Q.</given-names></name> <name><surname>Kwok</surname> <given-names>J. T.</given-names></name> <name><surname>Ni</surname> <given-names>L. M.</given-names></name></person-group> (<year>2020</year>). <article-title>Generalizing from a few examples: a survey on few-shot learning</article-title>. <source>ACM Comput. Surveys</source> <volume>53</volume>, <fpage>1</fpage>&#x02013;<lpage>34</lpage>. <pub-id pub-id-type="doi">10.48550/arXiv.1904.05046</pub-id></citation>
</ref>
<ref id="B56">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Woo</surname> <given-names>S.</given-names></name> <name><surname>Park</surname> <given-names>J.</given-names></name> <name><surname>Lee</surname> <given-names>J.-Y.</given-names></name> <name><surname>Kweon</surname> <given-names>I. S.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;CBAM: Convolutional block attention module,&#x0201D;</article-title> in <source>Proceedings of the European Conference on Computer Vision (ECCV)</source> (<publisher-loc>Munich</publisher-loc>), <fpage>3</fpage>&#x02013;<lpage>19</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-01234-2_1</pub-id></citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>J.</given-names></name> <name><surname>Guo</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Marinello</surname> <given-names>F.</given-names></name> <name><surname>Ercisli</surname> <given-names>S.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name></person-group> (<year>2022</year>). <article-title>A survey of few-shot learning in smart agriculture: developments, applications, and challenges</article-title>. <source>Plant Methods</source> <volume>18</volume>, <fpage>1</fpage>&#x02013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1186/s13007-022-00866-2</pub-id><pub-id pub-id-type="pmid">35248105</pub-id></citation></ref>
<ref id="B58">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Patel</surname> <given-names>V. M.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Densely connected pyramid Dehazing network,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Salt Lake City, UT</publisher-loc>), <fpage>3194</fpage>&#x02013;<lpage>3203</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2018.00337</pub-id></citation>
</ref>
<ref id="B59">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Sindagi</surname> <given-names>V.</given-names></name> <name><surname>Patel</surname> <given-names>V. M.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Multi-scale single image Dehazing using perceptual pyramid deep network,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops</source> (<publisher-loc>Salt Lake City, UT</publisher-loc>), <fpage>902</fpage>&#x02013;<lpage>911</lpage>. <pub-id pub-id-type="doi">10.1109/CVPRW.2018.00135</pub-id></citation>
</ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhong</surname> <given-names>F.</given-names></name> <name><surname>Chen</surname> <given-names>Z.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Xia</surname> <given-names>F.</given-names></name></person-group> (<year>2020</year>). <article-title>Zero-and few-shot learning for diseases recognition of citrus aurantium l. using conditional adversarial autoencoders</article-title>. <source>Comput. Electron. Agric</source>. <volume>179</volume>:<fpage>105828</fpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2020.105828</pub-id></citation>
</ref>
</ref-list>
</back>
</article>