<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Plant Sci.</journal-id>
<journal-title>Frontiers in Plant Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Plant Sci.</abbrev-journal-title>
<issn pub-type="epub">1664-462X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpls.2022.857104</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Plant Science</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A Dataset for Forestry Pest Identification</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Liu</surname> <given-names>Bing</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1833085/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Liu</surname> <given-names>Luyang</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1435610/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Zhuo</surname> <given-names>Ran</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Chen</surname> <given-names>Weidong</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Duan</surname> <given-names>Rui</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Wang</surname> <given-names>Guishen</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1843634/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>School of Computer Science and Engineering, Changchun University of Technology</institution>, <addr-line>Changchun</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>College of Computer Science and Technology, Jilin University</institution>, <addr-line>Changchun</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Hanno Scharr, Helmholtz Association of German Research Centres (HZ), Germany</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Gilbert Yong San Lim, SingHealth, Singapore; Sai Vikas Desai, Indian Institute of Technology Hyderabad, India</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Guishen Wang <email>wangguishen&#x00040;ccut.edu.cn</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Technical Advances in Plant Science, a section of the journal Frontiers in Plant Science</p></fn></author-notes>
<pub-date pub-type="epub">
<day>14</day>
<month>07</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>13</volume>
<elocation-id>857104</elocation-id>
<history>
<date date-type="received">
<day>18</day>
<month>01</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>13</day>
<month>06</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Liu, Liu, Zhuo, Chen, Duan and Wang.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Liu, Liu, Zhuo, Chen, Duan and Wang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract>
<p>The identification of forest pests is of great significance to forest pest prevention and control. However, existing datasets mainly focus on common objects, which limits the application of deep learning techniques in specific fields such as agriculture. In this paper, we collect images of forestry pests and construct a dataset for forestry pest identification, called the Forestry Pest Dataset. The Forestry Pest Dataset contains 31 categories of pests and their different forms. We conduct several mainstream object detection experiments on this dataset. The experimental results show that mainstream detection models achieve good performance on it. We hope that our Forestry Pest Dataset will help researchers in the fields of pest control and pest detection in the future.</p></abstract>
<kwd-group>
<kwd>forestry pest identification</kwd>
<kwd>deep learning</kwd>
<kwd>forestry pest dataset</kwd>
<kwd>object detection</kwd>
<kwd>transformer</kwd>
</kwd-group>
<counts>
<fig-count count="6"/>
<table-count count="8"/>
<equation-count count="4"/>
<ref-count count="38"/>
<page-count count="10"/>
<word-count count="6029"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>It is well known that the untimely control of pests causes serious damage to and loss of commercial crops (Estruch et al., <xref ref-type="bibr" rid="B7">1997</xref>). In recent years, the scope and extent of forestry pest events in China have increased dramatically, resulting in huge economic losses (Gandhi et al., <xref ref-type="bibr" rid="B10">2018</xref>; FAO, <xref ref-type="bibr" rid="B8">2020</xref>). The identification and detection of pests play a crucial role in agricultural pest control, providing a strong guarantee for crop yield growth and the agricultural economy (Fina et al., <xref ref-type="bibr" rid="B9">2013</xref>). Traditional forestry pest identification relies on a small number of forest protection workers and insect researchers (Al-Hiary et al., <xref ref-type="bibr" rid="B2">2011</xref>), who generally identify insects by manual and visual inspection of their appearance: wings, antennae, mouthparts, feet, and so on. However, because of the wide variety of pests and the small differences between species, this method has major defects in practice. With the development of machine learning and computer vision technology, automatic pest identification has received more and more attention.</p>
<p>Most of the early pest identification work used a machine learning framework consisting of two modules: (1) hand-crafted feature extractors based on GIST (Torralba et al., <xref ref-type="bibr" rid="B30">2003</xref>) and the Scale-Invariant Feature Transform (SIFT) (Lowe, <xref ref-type="bibr" rid="B25">2004</xref>), and (2) machine learning classifiers, including support vector machine (SVM) (Ahmed et al., <xref ref-type="bibr" rid="B1">2012</xref>) and k-nearest neighbor (KNN) (Li et al., <xref ref-type="bibr" rid="B18">2009</xref>) classifiers. The quality of the hand-designed features determines the accuracy of the model: if incomplete or incorrect features are extracted from pest images, subsequent classifiers may be unable to distinguish similar pest species.</p>
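As a concrete illustration of the framework's second module, a k-nearest-neighbor classifier over extracted feature vectors can be sketched in a few lines. This is a hypothetical minimal example, not code from any of the cited works; `knn_predict` and its inputs are purely illustrative.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify a feature vector by majority vote among its k nearest
    training vectors (Euclidean distance).

    `train` is a list of (feature_vector, label) pairs produced by some
    upstream feature extractor (e.g., GIST or SIFT descriptors).
    """
    # Sort training samples by distance from the query vector.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Majority vote over the labels of the k closest samples.
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]
```

As the surrounding text notes, such a classifier is only as good as the features it is given: two pest species with near-identical descriptors will land in the same neighborhood regardless of k.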
<p>With the continuous development of science and technology, deep learning has become a research hotspot in artificial intelligence. Image recognition based on deep learning improves the efficiency and accuracy of recognition, shortens recognition time, greatly reduces the workload of staff, and lowers costs. At present, pest identification methods based on deep learning are becoming more and more mature, and the scope of research includes crops, plants, and fruits (Li and Yang, <xref ref-type="bibr" rid="B19">2020</xref>; Liu and Wang, <xref ref-type="bibr" rid="B21">2020</xref>; Zhu J. et al., <xref ref-type="bibr" rid="B37">2021</xref>). However, the detection of forest pests faces many difficulties due to the lack of effective datasets. Some datasets are too small to meet detection needs. Furthermore, most pest datasets are collected through traps or controlled laboratory environments and lack consideration of the real environment (Sun et al., <xref ref-type="bibr" rid="B28">2018</xref>; Hong et al., <xref ref-type="bibr" rid="B14">2021</xref>). Different species of pests may have a similar appearance, and the same species may have different morphologies (nymphs, larvae, and adults) at different times (Wah et al., <xref ref-type="bibr" rid="B31">2011</xref>; Krause et al., <xref ref-type="bibr" rid="B15">2013</xref>; Maji et al., <xref ref-type="bibr" rid="B26">2013</xref>).</p>
<p>To solve the problems mentioned above, we propose a new forestry pest dataset for the forestry pest identification task. We collected pest data through the Google search engine and major forestry control websites. After filtering, we obtained 2,278 original pest images covering the adults, larvae, nymphs, and eggs of various pests. To alleviate the problem of category imbalance and improve generalization, we performed data augmentation, after which the total amount of data increased to 7,163 images. For our pest dataset, we invited three experts in the field to assist us in classifying the pests with the help of authoritative websites. With the categories known, we used the LabelImg annotation tool to annotate the images.</p>
<p>Our dataset covers 31 common forestry pests. We collected pest forms from different periods in real wild environments, meeting the basic requirements of forestry pest identification. <xref ref-type="fig" rid="F1">Figure 1</xref> shows some examples from the dataset. To explore the application value of our proposed dataset, we test it with popular object detection algorithms.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Sample images of Forestry Pest Dataset. <bold>(A)</bold> Drosicha contrahens; <bold>(B)</bold> Apriona germari; <bold>(C)</bold> Hyphantria cunea; <bold>(D)</bold> Micromelalopha troglodyta(Graeser); <bold>(E)</bold> Plagiodera versicolora(Laicharting); and <bold>(F)</bold> Hyphantria cunea larvae.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-857104-g0001.tif"/>
</fig>
<p>The contributions of this work are summarized as follows:</p>
<list list-type="simple">
<list-item><p>1) We construct a new forestry pest dataset for the object detection task.</p></list-item>
<list-item><p>2) We test our dataset on several popular object detection models. The results indicate that the dataset is challenging and creates new research opportunities. We hope this work will help advance future research on related fundamental issues as well as forestry pest identification tasks.</p></list-item>
</list>
</sec>
<sec id="s2">
<title>2. Related Works</title>
<p>In this section, we introduce related work on agricultural pest identification and review existing datasets.</p>
<sec>
<title>Pest Identification of Agriculture</title>
<p>Pest identification helps researchers improve the quality and yield of agricultural products. Earlier pest identification models were mainly based on machine learning techniques. For example, Le-Qing and Zhen (<xref ref-type="bibr" rid="B16">2012</xref>) utilized local average color features and an SVM to diagnose 10 insect pests on a dataset containing 579 samples. Fina et al. (<xref ref-type="bibr" rid="B9">2013</xref>) combined the k-means clustering algorithm with an adaptive filter for crop pest identification. Zhang et al. (<xref ref-type="bibr" rid="B36">2013</xref>) designed a field pest identification system whose dataset comprises approximately 270 training samples. Ebrahimi et al. (<xref ref-type="bibr" rid="B6">2017</xref>) used an SVM with a differential kernel function for classification and detection, but the evaluated dataset is small, containing just 100 samples. Wang et al. (<xref ref-type="bibr" rid="B34">2018</xref>) used digital morphological features and k-means to segment pest images. These traditional pest identification algorithms achieved good results, but all of them have limitations: their detection performance depends on the pre-designed manual feature extractor and the selected classifier.</p>
<p>Convolutional neural networks (CNNs) such as ResNet (He et al., <xref ref-type="bibr" rid="B11">2016</xref>) and GoogleNet (Szegedy et al., <xref ref-type="bibr" rid="B29">2015</xref>) have strong image feature learning capabilities. They can learn deep higher-order features from images and automatically capture shape, color, texture, and other multi-level features, overcoming the limitations and subjectivity of traditional manually designed feature extractors. They have clear advantages in the detection, segmentation, and classification of complex images.</p>
<p>Liu and Wang (<xref ref-type="bibr" rid="B21">2020</xref>) constructed a tomato diseases and pests dataset and improved the YOLOv3 algorithm to detect tomato pests. Wang et al. (<xref ref-type="bibr" rid="B32">2020</xref>) introduced an attention mechanism into residual networks to improve the recognition accuracy of small targets. Li et al. (<xref ref-type="bibr" rid="B17">2019</xref>) proposed a two-stage aphid detector named Coarse-to-Fine Network (CFN) to detect aphids with different distributions. Zhu J. et al. (<xref ref-type="bibr" rid="B37">2021</xref>) used super-resolution image enhancement and an improved YOLOv3 algorithm to detect black rot on grape leaves.</p>
<p>In general, CNN-based pest identification work avoids the limitations of traditional methods and improves identification performance. However, most object detection models rely on many hand-crafted components, whose parameters increase the workload to some extent. To eliminate the impact of manual components on the model, researchers have considered using the versatile and powerful relational modeling capability of the transformer to replace hand-crafted components. Carion et al. (<xref ref-type="bibr" rid="B4">2020</xref>) put forward end-to-end object detection with transformers (DETR) by combining a convolutional neural network with a transformer, building the first fully end-to-end object detection model and achieving highly competitive performance.</p>
</sec>
<sec>
<title>Related Datasets</title>
<p>At present, deep learning-based agricultural pest identification and classification is maturing. The research scope includes a variety of cash crops, vegetables, and fruits, and relevant datasets have also been constructed.</p>
<p>Wu et al. (<xref ref-type="bibr" rid="B35">2019</xref>) constructed the IP102 pest dataset, which covers more than 70,000 images of 102 common crop pests. Wang et al. (<xref ref-type="bibr" rid="B33">2021</xref>) constructed the AgriPest field pest dataset, which includes more than 49,700 images of pests in 14 categories. Hong et al. (<xref ref-type="bibr" rid="B13">2020</xref>) constructed a moth dataset from pheromone traps, labeled with four classes: three moth classes and an unknown class of non-target insects; data collection and labeling yielded a total of 1,142 images. Liu Z. et al. (<xref ref-type="bibr" rid="B24">2016</xref>) constructed a rice pest dataset collected from the image search databases of Google, Naver, and FreshEye, covering 12 typical species of paddy field pest insects with a total of over 5,000 images. He et al. (<xref ref-type="bibr" rid="B12">2019</xref>) designed an oilseed rape pest image database containing a total of 3,022 images of 12 typical oilseed rape pests. Lim et al. (<xref ref-type="bibr" rid="B20">2018</xref>) built an insect dataset from specimens and the Internet, consisting of about 29,000 images in 30 classes. Baidu constructed a forestry pest dataset that includes over 2,000 images in 7 classes, collected from specimens and traps. Chen et al. (<xref ref-type="bibr" rid="B5">2019</xref>) built a garden pest dataset consisting of about 9,070 images in 38 classes. Liu et al. (<xref ref-type="bibr" rid="B23">2022</xref>) constructed a representative forest pest classification dataset with 67 categories and 67,953 original images. However, so far, only the dataset of Liu et al. (<xref ref-type="bibr" rid="B23">2022</xref>) is available for the detection of forest pests.</p>
<p>In conclusion, research on crop diseases and insect pests based on deep learning covers a wide range, but in forestry, the detection and control of diseases and insect pests remains a challenge.</p>
</sec>
</sec>
<sec id="s3">
<title>3. Our Forestry Pest Dataset</title>
<sec>
<title>Data Collection and Annotation</title>
<p>We collect and annotate the dataset in the following five stages: 1) taxonomic system establishment, 2) image collection, 3) preliminary data filtering, 4) data augmentation, and 5) professional data annotation.</p>
<sec>
<title>Taxonomic System Establishment</title>
<p>We have established hierarchical classification criteria for the Forestry Pest Dataset. We asked three forestry experts to help us identify common forest pest species. In addition, to better meet the needs of forest pest control, we treat the larvae, eggs, and nymphs of each pest as subclasses; for example, <italic>Sericinus montela</italic> and <italic>Sericinus montela (larvae)</italic> are divided into two categories according to our standards. In total, 31 classes are obtained, and they form the hierarchical structure shown in <xref ref-type="fig" rid="F2">Figure 2</xref>.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>The classification structure of Forestry Pest Dataset.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-857104-g0002.tif"/>
</fig></sec>
<sec>
<title>Image Collection</title>
<p>We utilize the Internet and forestry pest databases as the main sources of dataset images. We searched common image search engines using the Chinese and scientific names of each pest and saved the results, also searching for images of the corresponding eggs, larvae, and other stages. Afterward, we collected corresponding images from specialized agricultural and forestry pest websites.</p>
</sec>
<sec>
<title>Preliminary Data Filtering</title>
<p>From the candidate images obtained from various websites and databases, four volunteers manually screened the images. With the assistance of forestry experts, the volunteers removed invalid and duplicate images that did not contain pests, repaired damaged images, and established the initial category information. Specifically, in the initial collection stage we grouped images into 15 categories; the purpose of this is to improve the balance of the data in the next step. Finally, we obtained 2,278 original images.</p>
</sec>
<sec>
<title>Data Augmentation</title>
<p>To ensure the effectiveness of the models and improve generalization, we use 7 image augmentation techniques, such as rotation, noise injection, and brightness transformation, to expand our dataset; for species with less data, we apply all 7 methods so as to balance the number of pest images in each category. <xref ref-type="fig" rid="F3">Figure 3</xref> shows some examples of data augmentation. At the same time, we extract subclasses such as eggs, larvae, and nymphs under each category to establish subclass information. Finally, we obtained a forestry pest dataset of 31 categories (including 16 sub-categories) with a total of 7,163 images. <xref ref-type="table" rid="T1">Table 1</xref> shows the specific data for each category.</p>
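Three of the augmentation operations named above (rotation, brightness transformation, additive noise) can be sketched as follows. This is a minimal illustration of the general technique, not the authors' actual pipeline; `augment` is a hypothetical helper operating on H×W×3 uint8 image arrays, and the rotation angle, brightness factor, and noise level are arbitrary choices.

```python
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator) -> list:
    """Return three augmented copies of an HxWx3 uint8 image:
    a 90-degree rotation, a brightness-scaled copy, and a noisy copy."""
    as_float = img.astype(np.float32)
    rotated = np.rot90(img)  # rotate in the image plane
    # Scale pixel intensities, clipping back into the valid uint8 range.
    brighter = np.clip(as_float * 1.3, 0, 255).astype(np.uint8)
    # Add zero-mean Gaussian noise, again clipping to [0, 255].
    noise = rng.normal(0.0, 10.0, img.shape)
    noisy = np.clip(as_float + noise, 0, 255).astype(np.uint8)
    return [rotated, brighter, noisy]
```

Each augmented image keeps the label of its source; for box annotations, geometric transforms such as rotation also require transforming the box coordinates accordingly.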
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Example of image data enhancement method. The first row is the original image, and the second row corresponds to the enhanced image. <bold>(A)</bold> Original image, <bold>(B)</bold> Original image, <bold>(C)</bold> Original image, <bold>(D)</bold> Noise, <bold>(E)</bold> Brightness transformation, and <bold>(F)</bold> Rotation.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-857104-g0003.tif"/>
</fig>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Details of Forestry Pest Dataset.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Class index</bold></th>
<th valign="top" align="left"><bold>Pest</bold></th>
<th valign="top" align="center"><bold>Sample size</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">0</td>
<td valign="top" align="left"><italic>Drosicha contrahens (female)</italic></td>
<td valign="top" align="center">218</td>
</tr>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="left"><italic>Drosicha contrahens (male)</italic></td>
<td valign="top" align="center">210</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="left"><italic>Chalcophora japonica</italic></td>
<td valign="top" align="center">158</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="left"><italic>Anoplophora chinensis</italic></td>
<td valign="top" align="center">426</td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="left"><italic>Psacothea hilaris(Pascoe)</italic></td>
<td valign="top" align="center">218</td>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="left"><italic>Apriona germari(Hope)</italic></td>
<td valign="top" align="center">342</td>
</tr>
<tr>
<td valign="top" align="left">6</td>
<td valign="top" align="left"><italic>Monochamus alternatus</italic></td>
<td valign="top" align="center">184</td>
</tr>
<tr>
<td valign="top" align="left">7</td>
<td valign="top" align="left"><italic>Plagiodera versicolora(Laicharting)</italic></td>
<td valign="top" align="center">306</td>
</tr>
<tr>
<td valign="top" align="left">8</td>
<td valign="top" align="left"><italic>Latoia consocia(Walker)</italic></td>
<td valign="top" align="center">290</td>
</tr>
<tr>
<td valign="top" align="left">9</td>
<td valign="top" align="left"><italic>Hyphantria cunea</italic></td>
<td valign="top" align="center">303</td>
</tr>
<tr>
<td valign="top" align="left">10</td>
<td valign="top" align="left"><italic>Cnidocampa flavescens(Walker)</italic></td>
<td valign="top" align="center">290</td>
</tr>
<tr>
<td valign="top" align="left">11</td>
<td valign="top" align="left"><italic>Cnidocampa flavescens(Walker) (pupa)</italic></td>
<td valign="top" align="center">176</td>
</tr>
<tr>
<td valign="top" align="left">12</td>
<td valign="top" align="left"><italic>Erthesina fullo</italic></td>
<td valign="top" align="center">280</td>
</tr>
<tr>
<td valign="top" align="left">13</td>
<td valign="top" align="left"><italic>Erthesina fullo (nymph)</italic></td>
<td valign="top" align="center">156</td>
</tr>
<tr>
<td valign="top" align="left">14</td>
<td valign="top" align="left"><italic>Erthesina fullo (nymph 2)</italic></td>
<td valign="top" align="center">192</td>
</tr>
<tr>
<td valign="top" align="left">15</td>
<td valign="top" align="left"><italic>Spilarctia subcarnea(Walker)</italic></td>
<td valign="top" align="center">188</td>
</tr>
<tr>
<td valign="top" align="left">16</td>
<td valign="top" align="left"><italic>Psilogramma menephron</italic></td>
<td valign="top" align="center">218</td>
</tr>
<tr>
<td valign="top" align="left">17</td>
<td valign="top" align="left"><italic>Sericinus montela</italic></td>
<td valign="top" align="center">364</td>
</tr>
<tr>
<td valign="top" align="left">18</td>
<td valign="top" align="left"><italic>Sericinus montela (larvae)</italic></td>
<td valign="top" align="center">200</td>
</tr>
<tr>
<td valign="top" align="left">19</td>
<td valign="top" align="left"><italic>Clostera anachoreta</italic></td>
<td valign="top" align="center">294</td>
</tr>
<tr>
<td valign="top" align="left">20</td>
<td valign="top" align="left"><italic>Micromelalopha troglodyta(Graeser)</italic></td>
<td valign="top" align="center">238</td>
</tr>
<tr>
<td valign="top" align="left">21</td>
<td valign="top" align="left"><italic>Latoia consocia(Walker) (larvae)</italic></td>
<td valign="top" align="center">204</td>
</tr>
<tr>
<td valign="top" align="left">22</td>
<td valign="top" align="left"><italic>Plagiodera versicolora(Laicharting) (larvae)</italic></td>
<td valign="top" align="center">196</td>
</tr>
<tr>
<td valign="top" align="left">23</td>
<td valign="top" align="left"><italic>Plagiodera versicolora(Laicharting) (ovum)</italic></td>
<td valign="top" align="center">134</td>
</tr>
<tr>
<td valign="top" align="left">24</td>
<td valign="top" align="left"><italic>Spilarctia subcarnea(Walker) (larvae)</italic></td>
<td valign="top" align="center">186</td>
</tr>
<tr>
<td valign="top" align="left">25</td>
<td valign="top" align="left"><italic>Spilarctia subcarnea(Walker) (larvae 2)</italic></td>
<td valign="top" align="center">164</td>
</tr>
<tr>
<td valign="top" align="left">26</td>
<td valign="top" align="left"><italic>Psilogramma menephron (larvae)</italic></td>
<td valign="top" align="center">208</td>
</tr>
<tr>
<td valign="top" align="left">27</td>
<td valign="top" align="left"><italic>Cerambycidae (larvae)</italic></td>
<td valign="top" align="center">196</td>
</tr>
<tr>
<td valign="top" align="left">28</td>
<td valign="top" align="left"><italic>Micromelalopha troglodyta(Graeser) (larvae)</italic></td>
<td valign="top" align="center">226</td>
</tr>
<tr>
<td valign="top" align="left">29</td>
<td valign="top" align="left"><italic>Hyphantria cunea (larvae)</italic></td>
<td valign="top" align="center">224</td>
</tr>
<tr>
<td valign="top" align="left">30</td>
<td valign="top" align="left"><italic>Hyphantria cunea (pupa)</italic></td>
<td valign="top" align="center">174</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>Professional Data Annotation</title>
<p>For object detection tasks, annotation information is very important because it determines the recognition accuracy of the model. The first step is to classify the collected pests. From the image collection stage, we already have initial classification information. On this basis, our three experts first independently determine whether each image conforms to its category; uncertain images are eliminated by the three experts. The location information of pests is also very important, as it can help forest protection workers find the specific location of pests. Once the pest species is known, we use the LabelImg tool to label the images, mainly annotating the species and locations of the pests.</p>
<p>We recruited three volunteers to assist with data annotation. First, each volunteer received guidance and training from the three forestry professionals to learn the basic characteristics of each type of pest. We then trained the volunteers to use the LabelImg tool; they needed to master its basic usage, including importing files and adding, modifying, and deleting annotation information. The experts assisted the volunteers in annotating some images in the early stage, after which the volunteers independently completed the remaining annotations. Images that were difficult to identify or annotate were resolved through consultation among the three experts. After all annotations were completed, the volunteers used the annotation visualization to check for wrong or defective annotation information and submitted the results to the experts for a final ruling.</p>
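LabelImg can export annotations in Pascal VOC XML format, one file per image, with a class name and bounding box per object. Reading such a file back for verification or training might look like the following sketch (an illustrative helper, not part of the annotation protocol described above; `parse_voc` is a hypothetical name).

```python
import xml.etree.ElementTree as ET

def parse_voc(xml_text: str):
    """Extract (class_name, xmin, ymin, xmax, ymax) tuples from one
    Pascal VOC-style annotation document."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        b = obj.find("bndbox")
        boxes.append((name,
                      int(b.findtext("xmin")), int(b.findtext("ymin")),
                      int(b.findtext("xmax")), int(b.findtext("ymax"))))
    return boxes
```

A loader like this also makes the final visual check described above easy to automate, e.g., flagging files with zero boxes or boxes whose coordinates fall outside the image.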
</sec>
</sec>
<sec>
<title>Dataset Split</title>
<p>Our Forestry Pest Dataset contains 7,163 images and 31 pest species. To ensure reliable training results, we randomly split the images 9:1 into a training+validation set and a test set, and then split the training+validation set 9:1 into training and validation sets. Specifically, the Forestry Pest Dataset is split into 5,801 training, 645 validation, and 717 testing images for the object detection task.</p>
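The two-level 9:1 split described above can be reproduced with a few lines of Python (a minimal sketch, not code from the article; `split_indices` is a hypothetical helper, and the truncating integer division is one assumption that happens to yield the stated set sizes).

```python
import random

def split_indices(n: int, seed: int = 0):
    """Shuffle n sample indices, split 90/10 into (train+val)/test,
    then split train+val 90/10 into train/val."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_trainval = int(0.9 * n)          # truncate, not round
    n_train = int(0.9 * n_trainval)
    return idx[:n_train], idx[n_train:n_trainval], idx[n_trainval:]

train, val, test = split_indices(7163)
```

With n = 7163 this gives 5,801 / 645 / 717 images, matching the split reported above.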
</sec>
<sec>
<title>Comparison With Other Forestry Pest Datasets</title>
<p>In <xref ref-type="table" rid="T2">Table 2</xref>, we compare our dataset with some existing datasets related to forestry pest identification tasks. Sun et al. (<xref ref-type="bibr" rid="B28">2018</xref>) and Hong et al. (<xref ref-type="bibr" rid="B14">2021</xref>) created related datasets using pheromone trap collection, but their datasets only cover specific species of pests. The forestry pest dataset proposed by Baidu was collected and processed in a controlled laboratory environment. Due to these limitations, these datasets are difficult to use in practical applications. Chen et al. (<xref ref-type="bibr" rid="B5">2019</xref>) and Liu et al. (<xref ref-type="bibr" rid="B23">2022</xref>) focus on the classification of forest pests. Their datasets are rich in pest species and have a sufficient number of samples, which has played a large role in practical applications. However, they have not made relevant attempts at pest detection tasks, and the relevant datasets have not been published.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Comparison with existing forestry pest datasets.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Dataset</bold></th>
<th valign="top" align="center"><bold>Year</bold></th>
<th valign="top" align="center"><bold>Class</bold></th>
<th valign="top" align="center"><bold>Sample size</bold></th>
<th valign="top" align="center"><bold>Avg</bold></th>
<th valign="top" align="left"><bold>Public</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Sun et al. (<xref ref-type="bibr" rid="B28">2018</xref>)</td>
<td valign="top" align="center">2018</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">2,183</td>
<td valign="top" align="center">-</td>
<td valign="top" align="left">Y</td>
</tr>
<tr>
<td valign="top" align="left">BaiDu</td>
<td valign="top" align="center">2019</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">2,183</td>
<td valign="top" align="center">311</td>
<td valign="top" align="left">Y</td>
</tr>
<tr>
<td valign="top" align="left">Chen et al. (<xref ref-type="bibr" rid="B5">2019</xref>)</td>
<td valign="top" align="center">2019</td>
<td valign="top" align="center">38</td>
<td valign="top" align="center">9,072</td>
<td valign="top" align="center">238</td>
<td valign="top" align="left">N</td>
</tr>
<tr>
<td valign="top" align="left">Hong et al. (<xref ref-type="bibr" rid="B14">2021</xref>)</td>
<td valign="top" align="center">2021</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">-</td>
<td valign="top" align="left">N</td>
</tr>
<tr>
<td valign="top" align="left">Liu et al. (<xref ref-type="bibr" rid="B23">2022</xref>)</td>
<td valign="top" align="center">2022</td>
<td valign="top" align="center">67</td>
<td valign="top" align="center">67,953</td>
<td valign="top" align="center">1,014</td>
<td valign="top" align="left">N</td>
</tr>
<tr>
<td valign="top" align="left">Ours</td>
<td valign="top" align="center">2022</td>
<td valign="top" align="center">31</td>
<td valign="top" align="center">7,163</td>
<td valign="top" align="center">231</td>
<td valign="top" align="left">Y</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>&#x0201C;Class&#x0201D; denotes the number of categories; &#x0201C;Public&#x0201D; indicates whether the dataset is open source and available; &#x0201C;Y&#x0201D; and &#x0201C;N&#x0201D; denote &#x0201C;yes&#x0201D; and &#x0201C;no,&#x0201D; respectively; &#x0201C;Avg&#x0201D; denotes the average number of samples per class</italic>.</p>
</table-wrap-foot>
</table-wrap></sec>
<sec>
<title>Diversity and Difficulty</title>
<p>Pests at different stages of their life cycle damage forests to different degrees, so we retained images of these different pest morphologies during data collection and annotation. However, because inter-class differences are small (many species share similar features) and intra-class differences are large (a single species passes through several life-cycle stages), accurately classifying pests is a difficult part of the detection task. In addition, the imbalanced class distribution challenges the feature learning of the model: a model trained on imbalanced data tends to be biased toward the classes with more samples.</p>
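<p>One common mitigation for the class imbalance described above, sketched here for illustration only (the paper does not state that the authors used it), is to weight the loss of each class inversely to its frequency:</p>

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency.

    Rare classes receive larger weights so the training loss is not
    dominated by the majority classes. On a perfectly balanced dataset
    every class gets a weight of 1.0.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    n_classes = len(counts)
    # Weight for class c is total / (n_classes * count_c).
    return {c: total / (n_classes * n) for c, n in counts.items()}

# Toy example: class "a" is four times as frequent as class "b",
# so "b" receives a four times larger weight.
weights = inverse_frequency_weights(["a"] * 8 + ["b"] * 2)
```

These weights would typically be passed to a weighted classification loss; the function and example labels here are hypothetical.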
</sec>
</sec>
<sec id="s4">
<title>4. Experiment</title>
<p>To explore the practical value of our proposed dataset, we evaluate several popular object detection algorithms on it. The two-stage Faster RCNN (Ren et al., <xref ref-type="bibr" rid="B27">2015</xref>) scans feature maps for potential objects with sliding windows, then classifies each proposal and regresses its coordinates. The one-stage YOLOV4 (Bochkovskiy et al., <xref ref-type="bibr" rid="B3">2020</xref>) and SSD (Liu W. et al., <xref ref-type="bibr" rid="B22">2016</xref>) directly regress category and location information. In addition, we evaluate Deformable DETR (Zhu X. et al., <xref ref-type="bibr" rid="B38">2021</xref>), a transformer-based end-to-end object detection algorithm.</p>
<sec>
<title>Experimental Settings</title>
<p>The software environment for the experiments is Python 3.8, PyTorch 1.9, and CUDA 11.1. The hardware configuration is shown in <xref ref-type="table" rid="T3">Table 3</xref>.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Configuration of experimental environment.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Hardware</bold></th>
<th valign="top" align="left"><bold>Model</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">CPU</td>
<td valign="top" align="left">i7-8700</td>
</tr>
<tr>
<td valign="top" align="left">Memory</td>
<td valign="top" align="left">64GB</td>
</tr>
<tr>
<td valign="top" align="left">GPU</td>
<td valign="top" align="left">RTX 3090 24GB</td>
</tr>
<tr>
<td valign="top" align="left">Hard disk</td>
<td valign="top" align="left">2.5TB</td>
</tr>
</tbody>
</table>
</table-wrap></sec>
<sec>
<title>Object Detection Algorithms</title>
<p>Building on R-CNN and Fast RCNN, Faster RCNN integrates feature extraction, proposal generation, bounding-box regression, and classification into a single network, which greatly improves overall performance, especially detection speed. SSD is a single-stage object detection algorithm that extracts features with a convolutional neural network and produces detections from feature layers at multiple scales, making it a multi-scale detection method. Building on the original YOLO architecture, YOLOV4 adopts many of the best recent optimization strategies in the CNN field, with improvements to data processing, the backbone network, network training, activation functions, and loss functions, achieving a strong balance of speed and precision. Deformable DETR improves on DETR by computing the attention mechanism over a sparse set of sampled points, which reduces the amount of computation and greatly shortens training time while preserving accuracy.</p>
</sec>
<sec>
<title>Parameters of Model Training</title>
<p>The initial parameter settings of SSD, Faster RCNN, YOLOV4, and Deformable DETR are shown in <xref ref-type="table" rid="T4">Tables 4</xref>, <xref ref-type="table" rid="T5">5</xref>. To balance accuracy and training time, we trained Deformable DETR for 150 epochs, since in preliminary runs the model converged at around 150 epochs. Deformable DETR decays its learning rate every 40 epochs, so we also report results at epoch 80 as an intermediate checkpoint when comparing the performance of the four models. To keep the training schedule consistent, the other three models were trained for the same number of epochs as Deformable DETR.</p>
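<p>The stepped learning-rate schedule mentioned above can be sketched as follows; note that the decay factor of 0.1 is an assumption for illustration (a common default), as the paper does not state the factor used:</p>

```python
def stepped_lr(initial_lr, epoch, step=40, gamma=0.1):
    """Learning rate after applying a decay of `gamma` every `step` epochs.

    `gamma=0.1` is an assumed (commonly used) decay factor, not a value
    reported in the paper.
    """
    return initial_lr * gamma ** (epoch // step)

# With the Deformable DETR settings (initial learning rate 2e-5,
# decay every 40 epochs), the rate at epochs 0, 80, and 150:
lrs = [stepped_lr(2e-5, e) for e in (0, 80, 150)]
```

At epoch 80 two decays have been applied, and at epoch 150 three, matching the choice of epoch 80 as the intermediate checkpoint.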
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Model parameter settings of SSD, Faster RCNN, and YOLOV4.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Name</bold></th>
<th valign="top" align="center"><bold>Value</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Batch size</td>
<td valign="top" align="center">16</td>
</tr>
<tr>
<td valign="top" align="left">Epoch</td>
<td valign="top" align="center">150</td>
</tr>
<tr>
<td valign="top" align="left">Learning rate</td>
<td valign="top" align="center">0.0001</td>
</tr>
<tr>
<td valign="top" align="left">NMS</td>
<td valign="top" align="center">0.3</td>
</tr>
<tr>
<td valign="top" align="left">Match threshold</td>
<td valign="top" align="center">0.5</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p>Model parameter settings of Deformable DETR.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Name</bold></th>
<th valign="top" align="center"><bold>Value</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Batch size</td>
<td valign="top" align="center">2</td>
</tr>
<tr>
<td valign="top" align="left">Epoch</td>
<td valign="top" align="center">150</td>
</tr>
<tr>
<td valign="top" align="left">Learning rate</td>
<td valign="top" align="center">0.00002</td>
</tr>
</tbody>
</table>
</table-wrap></sec>
<sec>
<title>Evaluation Metrics</title>
<p>We use <italic>mAP</italic> and <italic>Recall</italic>, two widely used object detection metrics, as evaluation metrics. <italic>mAP</italic> and <italic>Recall</italic> are calculated as follows:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>m</mml:mi><mml:mi>A</mml:mi><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mi>A</mml:mi><mml:msubsup><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>m</mml:mi><mml:mi>A</mml:mi><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mo>-</mml:mo><mml:mi>s</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>10</mml:mn></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munderover></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>5</mml:mn><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>05</mml:mn></mml:mrow><mml:mrow><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>95</mml:mn></mml:mrow></mml:munderover></mml:mstyle><mml:mi>m</mml:mi><mml:mi>A</mml:mi><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" 
accent="false"><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:munderover></mml:mstyle></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>TP</italic> is a positive sample correctly predicted as positive, <italic>FP</italic> is a negative sample incorrectly predicted as positive, and <italic>FN</italic> is a positive sample incorrectly predicted as negative. For each class, Precision and Recall trace out a PR curve, and the area under that curve is the class's <italic>AP</italic>. <italic>mAP</italic><sub>&#x003B1;</sub> and <italic>mAP</italic><sub><italic>multi</italic>&#x02212;<italic>scale</italic></sub> average the per-class <italic>AP</italic> over all classes at a given IoU threshold &#x003B1; and over multiple scales, respectively.</p>
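<p>Equations (1)&#x02013;(3) can be sketched in a few lines of Python; the piecewise-constant form of the AP integral below is one common convention, used here only for illustration:</p>

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP) and Recall = TP/(TP+FN), as in Eqs. (1)-(2)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(precisions, recalls):
    """Area under a piecewise-constant PR curve.

    `recalls` must be sorted in ascending order; each precision value is
    taken to hold over the recall interval ending at the paired recall.
    """
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)
        prev_r = r
    return ap

# A detector with 8 true positives, 2 false positives, and 2 missed
# objects has precision 0.8 and recall 0.8.
p, r = precision_recall(tp=8, fp=2, fn=2)
```

mAP is then the mean of `average_precision` over all classes, as in Equation (3).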
<p>In the MS COCO dataset, objects with an area smaller than 32 &#x000D7; 32 pixels are considered small objects, objects with an area between 32 &#x000D7; 32 and 96 &#x000D7; 96 pixels are considered medium objects, and larger objects are considered large objects.</p>
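<p>The MS COCO size convention amounts to a simple area threshold on each box:</p>

```python
def coco_size_bucket(width, height):
    """Classify a bounding box by area using the MS COCO convention:
    small  if area < 32*32 pixels,
    medium if 32*32 <= area < 96*96 pixels,
    large  otherwise."""
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"
```

For example, a 20 &#x000D7; 20 pixel egg would fall in the small bucket, while a 100 &#x000D7; 100 pixel adult would be large.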
</sec>
<sec>
<title>Experimental Results</title>
<p>We measure the average precision of each object detection method under different IoU thresholds. The results are shown in <xref ref-type="table" rid="T6">Table 6</xref>.</p>
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption><p>mAP<sub>&#x003B1;</sub> values of different models on Forestry Pest Dataset.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Model</bold></th>
<th valign="top" align="center"><bold>Epoch</bold></th>
<th valign="top" align="center"><bold><italic>mAP</italic><sub>0.5</sub></bold></th>
<th valign="top" align="center"><bold><italic>mAP</italic><sub>0.75</sub></bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">SSD</td>
<td valign="top" align="center">80</td>
<td valign="top" align="center">96.6</td>
<td valign="top" align="center">80.6</td>
</tr>
<tr>
<td valign="top" align="left">Faster RCNN</td>
<td valign="top" align="center">80</td>
<td valign="top" align="center">96.8</td>
<td valign="top" align="center">83.6</td>
</tr>
<tr>
<td valign="top" align="left">YOLOV4</td>
<td valign="top" align="center">80</td>
<td valign="top" align="center">98.8</td>
<td valign="top" align="center">70.2</td>
</tr>
<tr>
<td valign="top" align="left">Deformable DETR</td>
<td valign="top" align="center">80</td>
<td valign="top" align="center">96.6</td>
<td valign="top" align="center">89.8</td>
</tr>
<tr style="border-top: thin solid #000000;">
<td valign="top" align="left">SSD</td>
<td valign="top" align="center">150</td>
<td valign="top" align="center">98.1</td>
<td valign="top" align="center">91.1</td>
</tr>
<tr>
<td valign="top" align="left">Faster RCNN</td>
<td valign="top" align="center">150</td>
<td valign="top" align="center">97.5</td>
<td valign="top" align="center">85.2</td>
</tr>
<tr>
<td valign="top" align="left">YOLOV4</td>
<td valign="top" align="center">150</td>
<td valign="top" align="center">99.7</td>
<td valign="top" align="center">88.3</td>
</tr>
<tr>
<td valign="top" align="left">Deformable DETR</td>
<td valign="top" align="center">150</td>
<td valign="top" align="center">97.1</td>
<td valign="top" align="center">90.4</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The experimental results in <xref ref-type="table" rid="T6">Table 6</xref> show that, even with short training, mainstream object detection models reach good accuracy on our dataset. The recently proposed Deformable DETR also achieves roughly the same performance on our dataset as SSD, Faster RCNN, and YOLOV4. Example detections are shown in <xref ref-type="fig" rid="F4">Figure 4</xref>.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Sample detection results on adults and larvae. From left to right are <bold>(A,E)</bold> SSD, <bold>(B,F)</bold> Faster RCNN, <bold>(C,G)</bold> YOLOV4, and <bold>(D,H)</bold> Deformable DETR.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-857104-g0004.tif"/>
</fig>
<p>From the above results, the transformer-based Deformable DETR does not always perform as well as YOLOV4, or in some cases even Faster RCNN. Our analysis suggests the following reasons.</p>
<list list-type="simple">
<list-item><p>1) Deformable DETR uses no prior information. Both YOLOV4 and Faster RCNN take prior information as input, such as box sizes obtained by clustering the coordinate information of the dataset, which helps the model locate targets faster.</p></list-item>
<list-item><p>2) Although Deformable DETR improves the attention computation, it still operates at the pixel level, which makes computation expensive for high-resolution images. Deformable DETR also lacks a feature fusion module like that of YOLOV4, which hurts the detection of small objects.</p></list-item>
<list-item><p>3) Deformable DETR matches predictions to ground truth with the Hungarian algorithm, which does not fully guarantee the convergence and accuracy of the model.</p></list-item>
</list>
</sec>
<sec>
<title>Confusion Matrix</title>
<p>The confusion matrix in object detection is similar to that in classification, but the unit of analysis differs: in classification it is an image, whereas detection combines localization and classification, so the unit is each target in an image. To place positive and negative examples in the confusion matrix, we must therefore decide which detection results are correct and which are wrong, and assign the wrong ones to error categories. The most common way to judge whether a detection is correct is to compute the IoU between the predicted box and the ground-truth box and decide from the IoU whether the two boxes match. Targets whose IoU falls below the threshold, or that are not detected at all, are assigned to the background class. The confusion matrices of the models on the test set are shown in <xref ref-type="fig" rid="F5">Figure 5</xref>.</p>
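<p>The IoU-based matching procedure described above can be sketched as follows; the greedy one-to-one matching is one simple strategy, used here for illustration rather than as the exact implementation behind Figure 5:</p>

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def detection_confusion(gts, dets, n_classes, iou_thr=0.5):
    """Accumulate an (n_classes+1) x (n_classes+1) confusion matrix.

    `gts` and `dets` are lists of (box, class_index) pairs. Rows are
    ground-truth classes and columns are predicted classes; the extra
    index `n_classes` is the background class used for unmatched boxes.
    """
    bg = n_classes
    cm = [[0] * (n_classes + 1) for _ in range(n_classes + 1)]
    matched = set()
    for g_box, g_cls in gts:
        best, best_iou = None, iou_thr
        for i, (d_box, d_cls) in enumerate(dets):
            if i in matched:
                continue
            v = iou(g_box, d_box)
            if v >= best_iou:
                best, best_iou = i, v
        if best is None:
            cm[g_cls][bg] += 1        # missed ground truth -> background column
        else:
            matched.add(best)
            cm[g_cls][dets[best][1]] += 1
    for i, (_, d_cls) in enumerate(dets):
        if i not in matched:
            cm[bg][d_cls] += 1        # spurious detection -> background row
    return cm
```

A correctly localized and classified box increments the diagonal; an unmatched ground-truth box or a spurious detection falls into the background column or row, respectively.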
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Confusion matrix of the model on the test set. Epoch = 150, IoU threshold = 0.5, Index = 31 denotes background. <bold>(A)</bold> SSD, <bold>(B)</bold> Faster RCNN, <bold>(C)</bold> YOLOV4, and <bold>(D)</bold> Deformable DETR.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-857104-g0005.tif"/>
</fig></sec>
<sec>
<title>Case Study: Experiment on Large, Medium, and Small Targets</title>
<p>Small targets have always been difficult to detect because of their small size and lack of feature information, and the complexity of real forest scenes makes small-target detection in forest pest monitoring harder still. Our dataset contains small objects such as larvae and eggs, so we also evaluate each model&#x00027;s ability to detect small objects on our dataset. The results are shown in <xref ref-type="table" rid="T7">Tables 7</xref>, <xref ref-type="table" rid="T8">8</xref>, and detection examples of each model on small targets are shown in <xref ref-type="fig" rid="F6">Figure 6</xref>.</p>
<table-wrap position="float" id="T7">
<label>Table 7</label>
<caption><p>mAP<sub>multi&#x02212;scale</sub> values of multi-scale results achieved by different models on Forestry Pest Dataset.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Model</bold></th>
<th valign="top" align="center"><bold>Epoch</bold></th>
<th valign="top" align="center"><bold>mAP<sub><italic>small</italic></sub></bold></th>
<th valign="top" align="center"><bold><italic>mAP</italic><sub><italic>medium</italic></sub></bold></th>
<th valign="top" align="center"><bold><italic>mAP</italic><sub><italic>large</italic></sub></bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">SSD</td>
<td valign="top" align="center">80</td>
<td valign="top" align="center">27.4</td>
<td valign="top" align="center">53.5</td>
<td valign="top" align="center">72.7</td>
</tr>
<tr>
<td valign="top" align="left">Faster RCNN</td>
<td valign="top" align="center">80</td>
<td valign="top" align="center">14.2</td>
<td valign="top" align="center">49.0</td>
<td valign="top" align="center">74.0</td>
</tr>
<tr>
<td valign="top" align="left">YOLOV4</td>
<td valign="top" align="center">80</td>
<td valign="top" align="center">49.4</td>
<td valign="top" align="center">57.0</td>
<td valign="top" align="center">62.2</td>
</tr>
<tr>
<td valign="top" align="left">Deformable DETR</td>
<td valign="top" align="center">80</td>
<td valign="top" align="center">28.0</td>
<td valign="top" align="center">61.6</td>
<td valign="top" align="center">87.1</td>
</tr>
<tr style="border-top: thin solid #000000;">
<td valign="top" align="left">SSD</td>
<td valign="top" align="center">150</td>
<td valign="top" align="center">35.2</td>
<td valign="top" align="center">65.4</td>
<td valign="top" align="center">84.7</td>
</tr>
<tr>
<td valign="top" align="left">Faster RCNN</td>
<td valign="top" align="center">150</td>
<td valign="top" align="center">30.0</td>
<td valign="top" align="center">48.9</td>
<td valign="top" align="center">76.5</td>
</tr>
<tr>
<td valign="top" align="left">YOLOV4</td>
<td valign="top" align="center">150</td>
<td valign="top" align="center">56.2</td>
<td valign="top" align="center">63.1</td>
<td valign="top" align="center">73.2</td>
</tr>
<tr>
<td valign="top" align="left">Deformable DETR</td>
<td valign="top" align="center">150</td>
<td valign="top" align="center">30.3</td>
<td valign="top" align="center">63.8</td>
<td valign="top" align="center">87.7</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T8">
<label>Table 8</label>
<caption><p>Recall<sub>multi&#x02212;scale</sub> values of multi-scale results achieved by different models on Forestry Pest Dataset.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Model</bold></th>
<th valign="top" align="center"><bold>Epoch</bold></th>
<th valign="top" align="center"><bold><italic>Recall</italic><sub><italic>small</italic></sub></bold></th>
<th valign="top" align="center"><bold><italic>Recall</italic><sub><italic>medium</italic></sub></bold></th>
<th valign="top" align="center"><bold><italic>Recall</italic><sub><italic>large</italic></sub></bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">SSD</td>
<td valign="top" align="center">80</td>
<td valign="top" align="center">41.2</td>
<td valign="top" align="center">61.5</td>
<td valign="top" align="center">77.1</td>
</tr>
<tr>
<td valign="top" align="left">Faster RCNN</td>
<td valign="top" align="center">80</td>
<td valign="top" align="center">23.8</td>
<td valign="top" align="center">55.8</td>
<td valign="top" align="center">78.1</td>
</tr>
<tr>
<td valign="top" align="left">YOLOV4</td>
<td valign="top" align="center">80</td>
<td valign="top" align="center">53.0</td>
<td valign="top" align="center">61.0</td>
<td valign="top" align="center">67.7</td>
</tr>
<tr>
<td valign="top" align="left">Deformable DETR</td>
<td valign="top" align="center">80</td>
<td valign="top" align="center">31.4</td>
<td valign="top" align="center">68.8</td>
<td valign="top" align="center">91.3</td>
</tr>
<tr style="border-top: thin solid #000000;">
<td valign="top" align="left">SSD</td>
<td valign="top" align="center">150</td>
<td valign="top" align="center">44.9</td>
<td valign="top" align="center">69.6</td>
<td valign="top" align="center">87.4</td>
</tr>
<tr>
<td valign="top" align="left">Faster RCNN</td>
<td valign="top" align="center">150</td>
<td valign="top" align="center">38.8</td>
<td valign="top" align="center">54.7</td>
<td valign="top" align="center">80.0</td>
</tr>
<tr>
<td valign="top" align="left">YOLOV4</td>
<td valign="top" align="center">150</td>
<td valign="top" align="center">60.2</td>
<td valign="top" align="center">67.5</td>
<td valign="top" align="center">77.0</td>
</tr>
<tr>
<td valign="top" align="left">Deformable DETR</td>
<td valign="top" align="center">150</td>
<td valign="top" align="center">34.3</td>
<td valign="top" align="center">71.1</td>
<td valign="top" align="center">91.6</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Small sample detection results on our Forestry Pest Dataset. <bold>(A)</bold> SSD, <bold>(B)</bold> Faster RCNN, <bold>(C)</bold> YOLOV4, and <bold>(D)</bold> Deformable DETR.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpls-13-857104-g0006.tif"/>
</fig>
<p>As the tables above show, YOLOV4 clearly leads the other models in small-target detection thanks to its powerful network structure and feature fusion, whereas Deformable DETR, whose attention mechanism operates at the pixel level, is less suited to detecting small targets.</p>
</sec>
</sec>
<sec id="s5">
<title>5. Conclusion and Future Directions</title>
<sec>
<title>Conclusion</title>
<p>In this work, we collect a dataset for forest insect pest recognition containing over 7,100 images in 31 classes. Compared with previous datasets, ours covers a variety of forestry pests, meets the detection needs of both real and experimental environments, and includes pest forms from different life stages, which some previous forestry pest datasets neglected. Meanwhile, we evaluate several state-of-the-art recognition methods on our dataset. The dataset yields good results with mainstream object detection algorithms; however, for small objects, existing deep learning methods do not yet achieve the desired accuracy. Inspired by the success of the Transformer in computer vision, we also applied a Transformer model to the forestry pest identification problem. We hope this work will help advance future research on related fundamental issues as well as on forestry pest identification tasks.</p>
</sec>
<sec>
<title>Future Directions</title>
<p>To better promote the development of forestry pest identification, we will continue to collect forestry pest data and expand the dataset to 99 categories. There is also a lack of datasets and research on pest outbreaks and on diseases caused by pests; in response, we will collect images of diseases caused by insect pests.</p>
<p>Although the existing deep learning models have achieved good results in forest pest identification, small target recognition is still a challenge. We will optimize and improve the model in the follow-up to further improve the model&#x00027;s ability to detect small targets.</p>
</sec>
</sec>
<sec sec-type="data-availability" id="s6">
<title>Data Availability Statement</title>
<p>The datasets for this study can be found at <ext-link ext-link-type="uri" xlink:href="https://drive.google.com/drive/folders/1WnNDLEZCNpXKwJzjnJsQKSAYKljIIRCH?usp=sharing">https://drive.google.com/drive/folders/1WnNDLEZCNpXKwJzjnJsQKSAYKljIIRCH?usp=sharing</ext-link>.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>GW designed research and revised the manuscript. LL and BL conducted experiments, data analysis, and wrote the manuscript. RZ collected pest data. WC and RD revised the paper. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="funding-information" id="s8">
<title>Funding</title>
<p>The research is supported by the Educational Department of Jilin Province of China (Grant No. JJKH20210752KJ). The research is also supported by the project of research on independent experimental teaching mode of program design foundation based on competition to promote learning which is the Higher Education Research of Jilin Province of China (Grant No. JGJX2021D191).</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ahmed</surname> <given-names>F.</given-names></name> <name><surname>Al-Mamun</surname> <given-names>H. A.</given-names></name> <name><surname>Bari</surname> <given-names>A. H.</given-names></name> <name><surname>Hossain</surname> <given-names>E.</given-names></name> <name><surname>Kwan</surname> <given-names>P.</given-names></name></person-group> (<year>2012</year>). <article-title>Classification of crops and weeds from digital images: a support vector machine approach</article-title>. <source>Crop Prot</source>. <volume>40</volume>, <fpage>98</fpage>&#x02013;<lpage>104</lpage>. <pub-id pub-id-type="doi">10.1016/j.cropro.2012.04.024</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Al-Hiary</surname> <given-names>H.</given-names></name> <name><surname>Bani-Ahmad</surname> <given-names>S.</given-names></name> <name><surname>Reyalat</surname> <given-names>M.</given-names></name> <name><surname>Braik</surname> <given-names>M.</given-names></name> <name><surname>Alrahamneh</surname> <given-names>Z.</given-names></name></person-group> (<year>2011</year>). <article-title>Fast and accurate detection and classification of plant diseases</article-title>. <source>Int. J. Comput. Appl</source>. <volume>17</volume>, <fpage>31</fpage>&#x02013;<lpage>38</lpage>. <pub-id pub-id-type="doi">10.5120/2183-2754</pub-id><pub-id pub-id-type="pmid">32124447</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bochkovskiy</surname> <given-names>A.</given-names></name> <name><surname>Wang</surname> <given-names>C.-Y.</given-names></name> <name><surname>Liao</surname> <given-names>H.-Y. M.</given-names></name></person-group> (<year>2020</year>). <article-title>Yolov4: optimal speed and accuracy of object detection</article-title>. <source>arXiv preprint arXiv:2004.10934</source>. <pub-id pub-id-type="doi">10.48550/arXiv.2004.10934</pub-id><pub-id pub-id-type="pmid">34300543</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Carion</surname> <given-names>N.</given-names></name> <name><surname>Massa</surname> <given-names>F.</given-names></name> <name><surname>Synnaeve</surname> <given-names>G.</given-names></name> <name><surname>Usunier</surname> <given-names>N.</given-names></name> <name><surname>Kirillov</surname> <given-names>A.</given-names></name> <name><surname>Zagoruyko</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;End-to-end object detection with transformers,&#x0201D;</article-title> in <source>European Conference on Computer Vision</source> (<publisher-loc>Springer</publisher-loc>), <fpage>213</fpage>&#x02013;<lpage>229</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-58452-8_13</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>J.</given-names></name> <name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Wang</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>Pest image recognition of garden based on improved residual network</article-title>. <source>Trans. Chin. Soc. Agric. Machi</source> <volume>50</volume>, <fpage>187</fpage>&#x02013;<lpage>195</lpage>. <pub-id pub-id-type="doi">10.6041/j.issn.1000-1298.2019.05.022</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ebrahimi</surname> <given-names>M.</given-names></name> <name><surname>Khoshtaghaza</surname> <given-names>M.-H.</given-names></name> <name><surname>Minaei</surname> <given-names>S.</given-names></name> <name><surname>Jamshidi</surname> <given-names>B.</given-names></name></person-group> (<year>2017</year>). <article-title>Vision-based pest detection based on svm classification method</article-title>. <source>Comput. Electron. Agric</source>. <volume>137</volume>, <fpage>52</fpage>&#x02013;<lpage>58</lpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2017.03.016</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Estruch</surname> <given-names>J. J.</given-names></name> <name><surname>Carozzi</surname> <given-names>N. B.</given-names></name> <name><surname>Desai</surname> <given-names>N.</given-names></name> <name><surname>Duck</surname> <given-names>N. B.</given-names></name> <name><surname>Warren</surname> <given-names>G. W.</given-names></name> <name><surname>Koziel</surname> <given-names>M. G.</given-names></name></person-group> (<year>1997</year>). <article-title>Transgenic plants: an emerging approach to pest control</article-title>. <source>Nat. Biotechnol</source>. <volume>15</volume>, <fpage>137</fpage>&#x02013;<lpage>141</lpage>. <pub-id pub-id-type="doi">10.1038/nbt0297-137</pub-id><pub-id pub-id-type="pmid">9035137</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="web"><person-group person-group-type="author"><collab>FAO</collab></person-group> (<year>2020</year>). <source>New Standards to Curb the Global Spread of Plant Pests and Diseases</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.fao.org/news/story/en/item/1187738/icode/">https://www.fao.org/news/story/en/item/1187738/icode/</ext-link></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fina</surname> <given-names>F.</given-names></name> <name><surname>Birch</surname> <given-names>P.</given-names></name> <name><surname>Young</surname> <given-names>R.</given-names></name> <name><surname>Obu</surname> <given-names>J.</given-names></name> <name><surname>Faithpraise</surname> <given-names>B.</given-names></name> <name><surname>Chatwin</surname> <given-names>C.</given-names></name></person-group> (<year>2013</year>). <article-title>Automatic plant pest detection and recognition using k-means clustering algorithm and correspondence filters</article-title>. <source>Int. J. Adv. Biotechnol. Res</source>. <volume>4</volume>, <fpage>189</fpage>&#x02013;<lpage>199</lpage>.</citation>
</ref>
<ref id="B10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gandhi</surname> <given-names>R.</given-names></name> <name><surname>Nimbalkar</surname> <given-names>S.</given-names></name> <name><surname>Yelamanchili</surname> <given-names>N.</given-names></name> <name><surname>Ponkshe</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Plant disease detection using cnns and gans as an augmentative approach,&#x0201D;</article-title> in <source>2018 IEEE International Conference on Innovative Research and Development (ICIRD)</source> (<publisher-loc>Bangkok</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>5</lpage>.</citation>
</ref>
<ref id="B11">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Ren</surname> <given-names>S.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Deep residual learning for image recognition,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Las Vegas, NV</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>770</fpage>&#x02013;<lpage>778</lpage>.</citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>He</surname> <given-names>Y.</given-names></name> <name><surname>Zeng</surname> <given-names>H.</given-names></name> <name><surname>Fan</surname> <given-names>Y.</given-names></name> <name><surname>Ji</surname> <given-names>S.</given-names></name> <name><surname>Wu</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>Application of deep learning in integrated pest management: a real-time system for detection and diagnosis of oilseed rape pests</article-title>. <source>Mobile Inf. Syst</source>. <volume>2019</volume>, <fpage>4570808</fpage>. <pub-id pub-id-type="doi">10.1155/2019/4570808</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hong</surname> <given-names>S.-J.</given-names></name> <name><surname>Kim</surname> <given-names>S.-Y.</given-names></name> <name><surname>Kim</surname> <given-names>E.</given-names></name> <name><surname>Lee</surname> <given-names>C.-H.</given-names></name> <name><surname>Lee</surname> <given-names>J.-S.</given-names></name> <name><surname>Lee</surname> <given-names>D.-S.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Moth detection from pheromone trap images using deep learning object detectors</article-title>. <source>Agriculture</source> <volume>10</volume>, <fpage>170</fpage>. <pub-id pub-id-type="doi">10.3390/agriculture10050170</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hong</surname> <given-names>S.-J.</given-names></name> <name><surname>Nam</surname> <given-names>I.</given-names></name> <name><surname>Kim</surname> <given-names>S.-Y.</given-names></name> <name><surname>Kim</surname> <given-names>E.</given-names></name> <name><surname>Lee</surname> <given-names>C.-H.</given-names></name> <name><surname>Ahn</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Automatic pest counting from pheromone trap images using deep learning object detectors for matsucoccus thunbergianae monitoring</article-title>. <source>Insects</source>. <volume>12</volume>, <fpage>342</fpage>. <pub-id pub-id-type="doi">10.3390/insects12040342</pub-id><pub-id pub-id-type="pmid">33921492</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Krause</surname> <given-names>J.</given-names></name> <name><surname>Stark</surname> <given-names>M.</given-names></name> <name><surname>Deng</surname> <given-names>J.</given-names></name> <name><surname>Fei-Fei</surname> <given-names>L.</given-names></name></person-group> (<year>2013</year>). <article-title>&#x0201C;3D object representations for fine-grained categorization,&#x0201D;</article-title> in <source>Proceedings of the IEEE International Conference on Computer Vision Workshops</source> (<publisher-loc>Sydney, NSW</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>554</fpage>&#x02013;<lpage>561</lpage>.</citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Le-Qing</surname> <given-names>Z.</given-names></name> <name><surname>Zhen</surname> <given-names>Z.</given-names></name></person-group> (<year>2012</year>). <article-title>Automatic insect classification based on local mean colour feature and supported vector machines</article-title>. <source>Orient Insects</source>. <volume>46</volume>, <fpage>260</fpage>&#x02013;<lpage>269</lpage>. <pub-id pub-id-type="doi">10.1080/00305316.2012.738142</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>R.</given-names></name> <name><surname>Wang</surname> <given-names>R.</given-names></name> <name><surname>Xie</surname> <given-names>C.</given-names></name> <name><surname>Liu</surname> <given-names>L.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>F.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>A coarse-to-fine network for aphid recognition and detection in the field</article-title>. <source>Biosyst. Eng</source>. <volume>187</volume>, <fpage>39</fpage>&#x02013;<lpage>52</lpage>. <pub-id pub-id-type="doi">10.1016/j.biosystemseng.2019.08.013</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>X.-L.</given-names></name> <name><surname>Huang</surname> <given-names>S.-G.</given-names></name> <name><surname>Zhou</surname> <given-names>M.-Q.</given-names></name> <name><surname>Geng</surname> <given-names>G.-H.</given-names></name></person-group> (<year>2009</year>). <article-title>&#x0201C;Knn-spectral regression lda for insect recognition,&#x0201D;</article-title> in <source>2009 First International Conference on Information Science and Engineering</source> (<publisher-loc>Nanjing</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1315</fpage>&#x02013;<lpage>1318</lpage>.</citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Yang</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>Few-shot cotton pest recognition and terminal realization</article-title>. <source>Comput. Electron. Agric</source>. <volume>169</volume>, <fpage>105240</fpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2020.105240</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lim</surname> <given-names>S.</given-names></name> <name><surname>Kim</surname> <given-names>S.</given-names></name> <name><surname>Park</surname> <given-names>S.</given-names></name> <name><surname>Kim</surname> <given-names>D.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Development of application for forest insect classification using cnn,&#x0201D;</article-title> in <source>2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV)</source> (<publisher-loc>Singapore</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1128</fpage>&#x02013;<lpage>1131</lpage>.</citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name></person-group> (<year>2020</year>). <article-title>Tomato diseases and pests detection based on improved yolo v3 convolutional neural network</article-title>. <source>Front. Plant Sci</source>. <volume>11</volume>, <fpage>898</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2020.00898</pub-id><pub-id pub-id-type="pmid">32612632</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>W.</given-names></name> <name><surname>Anguelov</surname> <given-names>D.</given-names></name> <name><surname>Erhan</surname> <given-names>D.</given-names></name> <name><surname>Szegedy</surname> <given-names>C.</given-names></name> <name><surname>Reed</surname> <given-names>S.</given-names></name> <name><surname>Fu</surname> <given-names>C.-Y.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>&#x0201C;Ssd: single shot multibox detector,&#x0201D;</article-title> in <source>European Conference on Computer Vision</source> (<publisher-loc>Amsterdam</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>21</fpage>&#x02013;<lpage>37</lpage>.</citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>S.</given-names></name> <name><surname>Xu</surname> <given-names>J.</given-names></name> <name><surname>Kong</surname> <given-names>X.</given-names></name> <name><surname>Xie</surname> <given-names>L.</given-names></name> <name><surname>Chen</surname> <given-names>K.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>Forest pest identification based on a new dataset and convolutional neural network model with enhancement strategy</article-title>. <source>Comput. Electron. Agric</source>. <volume>192</volume>, <fpage>106625</fpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2021.106625</pub-id></citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Gao</surname> <given-names>J.</given-names></name> <name><surname>Yang</surname> <given-names>G.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>He</surname> <given-names>Y.</given-names></name></person-group> (<year>2016</year>). <article-title>Localization and classification of paddy field pests using a saliency map and deep convolutional neural network</article-title>. <source>Sci. Rep</source>. <volume>6</volume>, <fpage>1</fpage>&#x02013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1038/srep20410</pub-id><pub-id pub-id-type="pmid">26864172</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lowe</surname> <given-names>D. G.</given-names></name></person-group> (<year>2004</year>). <article-title>Distinctive image features from scale-invariant keypoints</article-title>. <source>Int. J. Comput. Vis</source>. <volume>60</volume>, <fpage>91</fpage>&#x02013;<lpage>110</lpage>. <pub-id pub-id-type="doi">10.1023/B:VISI.0000029664.99615.94</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maji</surname> <given-names>S.</given-names></name> <name><surname>Rahtu</surname> <given-names>E.</given-names></name> <name><surname>Kannala</surname> <given-names>J.</given-names></name> <name><surname>Blaschko</surname> <given-names>M.</given-names></name> <name><surname>Vedaldi</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Fine-grained visual classification of aircraft</article-title>. <source>arXiv preprint arXiv:1306.5151</source>. <pub-id pub-id-type="doi">10.48550/arXiv.1306.5151</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Ren</surname> <given-names>S.</given-names></name> <name><surname>He</surname> <given-names>K.</given-names></name> <name><surname>Girshick</surname> <given-names>R.</given-names></name> <name><surname>Sun</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;Faster r-cnn: towards real-time object detection with region proposal networks,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems 28 (NIPS 2015)</source> (<publisher-loc>Montreal, QC</publisher-loc>), <fpage>91</fpage>&#x02013;<lpage>99</lpage>.</citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Yuan</surname> <given-names>M.</given-names></name> <name><surname>Ren</surname> <given-names>L.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Chen</surname> <given-names>Z.</given-names></name></person-group> (<year>2018</year>). <article-title>Automatic in-trap pest detection using deep learning for pheromone-based dendroctonus valens monitoring</article-title>. <source>Biosyst. Eng</source>. <volume>176</volume>, <fpage>140</fpage>&#x02013;<lpage>150</lpage>. <pub-id pub-id-type="doi">10.1016/j.biosystemseng.2018.10.012</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Szegedy</surname> <given-names>C.</given-names></name> <name><surname>Liu</surname> <given-names>W.</given-names></name> <name><surname>Jia</surname> <given-names>Y.</given-names></name> <name><surname>Sermanet</surname> <given-names>P.</given-names></name> <name><surname>Reed</surname> <given-names>S.</given-names></name> <name><surname>Anguelov</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>&#x0201C;Going deeper with convolutions,&#x0201D;</article-title> in <source>Proceedings of the IEEE Conference on Computer vision and Pattern Recognition</source> (<publisher-loc>Boston, MA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>9</lpage>.</citation>
</ref>
<ref id="B30">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Torralba</surname> <given-names>A.</given-names></name> <name><surname>Murphy</surname> <given-names>K. P.</given-names></name> <name><surname>Freeman</surname> <given-names>W. T.</given-names></name> <name><surname>Rubin</surname> <given-names>M. A.</given-names></name></person-group> (<year>2003</year>). <article-title>&#x0201C;Context-based vision system for place and object recognition,&#x0201D;</article-title> in <source>Computer Vision, IEEE International Conference on, Vol. 2</source> (<publisher-loc>Nice</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>273</fpage>&#x02013;<lpage>280</lpage>.</citation>
</ref>
<ref id="B31">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wah</surname> <given-names>C.</given-names></name> <name><surname>Branson</surname> <given-names>S.</given-names></name> <name><surname>Welinder</surname> <given-names>P.</given-names></name> <name><surname>Perona</surname> <given-names>P.</given-names></name> <name><surname>Belongie</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <source>The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-001</source>. <publisher-loc>Pasadena, CA</publisher-loc>: <publisher-name>California Institute of Technology</publisher-name>.</citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>F.</given-names></name> <name><surname>Wang</surname> <given-names>R.</given-names></name> <name><surname>Xie</surname> <given-names>C.</given-names></name> <name><surname>Yang</surname> <given-names>P.</given-names></name> <name><surname>Liu</surname> <given-names>L.</given-names></name></person-group> (<year>2020</year>). <article-title>Fusing multi-scale context-aware information representation for automatic in-field pest detection and recognition</article-title>. <source>Comput. Electron. Agric</source>. <volume>169</volume>, <fpage>105222</fpage>. <pub-id pub-id-type="doi">10.1016/j.compag.2020.105222</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>R.</given-names></name> <name><surname>Liu</surname> <given-names>L.</given-names></name> <name><surname>Xie</surname> <given-names>C.</given-names></name> <name><surname>Yang</surname> <given-names>P.</given-names></name> <name><surname>Li</surname> <given-names>R.</given-names></name> <name><surname>Zhou</surname> <given-names>M.</given-names></name></person-group> (<year>2021</year>). <article-title>Agripest: a large-scale domain-specific benchmark dataset for practical agricultural pest detection in the wild</article-title>. <source>Sensors</source> <volume>21</volume>, <fpage>1601</fpage>. <pub-id pub-id-type="doi">10.3390/s21051601</pub-id><pub-id pub-id-type="pmid">33668820</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Z.</given-names></name> <name><surname>Wang</surname> <given-names>K.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Pan</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>A cognitive vision method for insect pest image segmentation</article-title>. <source>IFAC PapersOnLine</source> <volume>51</volume>, <fpage>85</fpage>&#x02013;<lpage>89</lpage>. <pub-id pub-id-type="doi">10.1016/j.ifacol.2018.08.066</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>X.</given-names></name> <name><surname>Zhan</surname> <given-names>C.</given-names></name> <name><surname>Lai</surname> <given-names>Y.-K.</given-names></name> <name><surname>Cheng</surname> <given-names>M.-M.</given-names></name> <name><surname>Yang</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Ip102: a large-scale benchmark dataset for insect pest recognition,&#x0201D;</article-title> in <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Long Beach, CA</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>8787</fpage>&#x02013;<lpage>8796</lpage>.</citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>H. T.</given-names></name> <name><surname>Hu</surname> <given-names>Y. X.</given-names></name> <name><surname>Zhang</surname> <given-names>H. Y.</given-names></name></person-group> (<year>2013</year>). <article-title>Extraction and classifier design for image recognition of insect pests on field crops</article-title>. <source>Adv. Mater. Res</source>. <volume>756</volume>, <fpage>4063</fpage>&#x02013;<lpage>4067</lpage>. <pub-id pub-id-type="doi">10.4028/www.scientific.net/AMR.756-759.4063</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>J.</given-names></name> <name><surname>Cheng</surname> <given-names>M.</given-names></name> <name><surname>Wang</surname> <given-names>Q.</given-names></name> <name><surname>Yuan</surname> <given-names>H.</given-names></name> <name><surname>Cai</surname> <given-names>Z.</given-names></name></person-group> (<year>2021</year>). <article-title>Grape leaf black rot detection based on super-resolution image enhancement and deep learning</article-title>. <source>Front. Plant Sci</source>. <volume>12</volume>, <fpage>695749</fpage>. <pub-id pub-id-type="doi">10.3389/fpls.2021.695749</pub-id><pub-id pub-id-type="pmid">34267773</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>X.</given-names></name> <name><surname>Su</surname> <given-names>W.</given-names></name> <name><surname>Lu</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>B.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Dai</surname> <given-names>J.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Deformable detr: deformable transformers for end-to-end object detection,&#x0201D;</article-title> in <source>International Conference on Learning Representations</source> (<publisher-loc>Vienna</publisher-loc>).</citation>
</ref>
</ref-list>
</back>
</article>