Selection of Reference Genes for the Normalization of RT-qPCR Data in Gene Expression Studies in Insects: A Systematic Review

Reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) is a reliable technique for quantifying the expression levels of target genes during various biological processes in many areas of clinical and biological research. Selection of appropriate reference genes for RT-qPCR normalization is an essential prerequisite for reliable measurement of gene expression levels. Here, by analyzing datasets published between 2008 and 2017, we summarized the current trends in reference gene selection for insect gene expression studies that used the SYBR Green method, the most widely used RT-qPCR detection chemistry. We curated 90 representative papers, mainly published in 2013–2017, in which a total of 78 insect species were investigated in 100 experiments. Furthermore, we identified the top five journals, the top 10 most frequently used reference genes, and the top 10 experimental factors. The relationships of the numbers of reference genes, experimental factors, and analysis tools with publication year were investigated by linear regression. We found that the more recently a paper was published, the more experimental factors it tended to explore and the more analysis tools it used. However, linear regression analysis did not reveal a significant correlation between the number of reference genes and publication year. Taken together, this meta-analysis should be of great help to researchers who plan gene expression studies in insects, especially non-model ones, as it summarizes appropriate reference genes for expression studies, considers the optimal number of reference genes, and reviews the average numbers of experimental factors and analysis tools per study.


INTRODUCTION
Reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) is a premier molecular biology tool and a powerful method for quantifying gene expression levels in real time (Vandesompele et al., 2002). Although RT-qPCR is one of the most efficient, reliable, and reproducible techniques for quantifying gene expression, multiple factors, including the quality and integrity of RNA samples, the efficiency of cDNA synthesis, and PCR efficiency, can significantly affect the measured signal and must therefore be corrected for by normalization (Bustin et al., 2005; Strube et al., 2008). RT-qPCR analysis generally involves normalizing the expression levels of target genes to those of a suite of stably expressed reference genes. Even though reference gene transcript levels should ideally be stable across a range of conditions, previous studies have shown that the expression of many commonly used reference genes differs dramatically under different treatment conditions (Kalushkov and Hodek, 2004; Bustin et al., 2013). The expression of many reference genes is thus condition-specific, and no single gene can serve as an internal control in all application scenarios. Reference genes for RT-qPCR analyses must therefore be selected on a case-by-case basis, even within the same species.
Over the last 10 years, RT-qPCR has been increasingly used in genome- and transcriptome-wide expression studies in insects. Furthermore, considerable advances have been made in identifying and validating appropriate reference genes under various biotic and abiotic experimental conditions in many insect species (Table 1). SYBR Green dye and TaqMan probes are the two most frequently used RT-qPCR detection chemistries, with SYBR Green used much more frequently. Here, we have summarized only the studies that used the SYBR Green method. Characterization of reference genes is an onerous task that requires well-designed molecular experiments followed by elaborate computational analyses (Andersen et al., 2004; Pfaffl et al., 2004). A comprehensive summary of published sets of experimentally validated reference genes, together with the relevant experimental conditions and analysis tools, is therefore timely (Sang et al., 2017).
To fill this gap and provide molecular biologists with guidance on selecting reference genes for their RT-qPCR experiments, the present review summarizes the current trends in reference gene selection for RT-qPCR normalization in gene expression studies performed on insects between 2008 and 2017 (Table 1). Specifically, we summarize the insect species, reference genes, experimental conditions, analysis tools, and publication years. Furthermore, the relationships of the numbers of reference genes, experimental factors, and analysis tools with publication year were investigated by linear regression. We hope that this meta-analysis will be of great help to researchers who plan gene expression studies in insects, especially non-model ones, as it summarizes appropriate reference genes for expression studies, considers the optimal number of reference genes, and reviews the average numbers of experimental factors and analysis tools per study.

NUMBER OF RELEVANT STUDIES IN INSECTS THAT UTILIZED EXPRESSION LEVELS OF REFERENCE GENES FOR NORMALIZATION OF RT-QPCR DATA
The relevant publications that analyzed reference gene expression in insects in 2008–2017 are summarized in Table 1. All data were retrieved from databases including https://www.ncbi.nlm.nih.gov/pubmed, https://scholar.google.com/, https://link.springer.com/, http://onlinelibrary.wiley.com/, and https://www.sciencedirect.com/ using the following search terms in the Title/Abstract field: ("internal control genes" OR "reference genes" OR "housekeeping genes") AND ("qPCR" OR "quantitative PCR" OR "qRT-PCR" OR "RT-qPCR"). We also curated relevant papers that came to our attention independently but were not retrieved by this search strategy. In total, we curated 90 representative papers published in 36 journals. The top five journals by the number of published reference gene studies in insects were PLoS One (26/90), Scientific Reports (9/90), Journal of Economic Entomology (6/90), Journal of Insect Science (5/90), and BMC Research Notes (4/90; Table 1). Most of these papers were published between 2013 and 2017, with an average of 14 papers per year over those 5 years (Figure 1A). Open access journals clearly provided the main platform for publications on this topic.
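To make the search strategy easier to reproduce, the sketch below assembles the same Boolean query in PubMed syntax. It is a hypothetical helper, not the authors' script: the `[tiab]` (Title/Abstract) field tag and the `2008:2017[dp]` publication-date range are PubMed conventions assumed here, and the other databases listed use their own query syntax.

```python
# Hypothetical helper (not from the review): rebuild the Title/Abstract search
# as a PubMed query string. The date filter mirrors the 2008-2017 window.

REFERENCE_TERMS = ["internal control genes", "reference genes", "housekeeping genes"]
QPCR_TERMS = ["qPCR", "quantitative PCR", "qRT-PCR", "RT-qPCR"]


def or_block(terms, field="tiab"):
    """Quote each phrase, attach a PubMed field tag, and join with OR."""
    return "(" + " OR ".join(f'"{term}"[{field}]' for term in terms) + ")"


def build_query(start_year=2008, end_year=2017):
    """AND the two term blocks together and restrict the publication date."""
    date_filter = f"{start_year}:{end_year}[dp]"
    return " AND ".join([or_block(REFERENCE_TERMS), or_block(QPCR_TERMS), date_filter])


if __name__ == "__main__":
    print(build_query())
```

The resulting string can be pasted into the PubMed search box; papers found independently, as the review notes, still have to be added by hand.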

NUMBER OF INSECT SPECIES THAT WERE ANALYZED FOR EXPRESSION OF REFERENCE GENES
The 90 reviewed papers reported results for 78 insect species in 100 separate experiments (Table 1). These insects belonged to 10 orders, predominantly the following four: Hemiptera (25 species), Lepidoptera (16 species), Diptera (13 species), and Coleoptera (12 species; Figure 1B). Some insects that cause serious damage to crops, such as Bemisia tabaci (Su et al., 2013; Collins et al., 2014; Liang et al., 2014; Dai et al., 2017; Lü et al., 2017) and Helicoverpa armigera (Chandra et al., 2014; Shakeel et al., 2015; Zhang et al., 2015), were investigated extensively. Six and three papers, respectively, that analyzed reference gene expression in these two species were published during the last 5 years.

DISTRIBUTION OF THE NUMBER OF REFERENCE GENES PER STUDY
In the 90 papers, 3–21 reference genes were investigated per study (Figure 2). Most studies evaluated 5–10 reference genes (Figure 2A), with the following breakdown: five genes (10%), six genes (16%), seven genes (14%), eight genes (15%), nine genes (14%), and ten genes (10%). Some recent studies analyzed more than 10 candidate reference genes to provide more options for normalization (Table 1). However, linear regression analysis did not reveal a significant correlation between the number of reference genes used in a study and its publication year (Figure 2B).

Frontiers in Physiology | www.frontiersin.org
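The percentages above are frequency tallies over the per-study counts. A minimal sketch of that bookkeeping, using made-up counts for ten hypothetical studies rather than the review's data; the function names are illustrative only:

```python
from collections import Counter


def percentage_breakdown(counts_per_study):
    """Percentage of studies for each distinct number of reference genes."""
    total = len(counts_per_study)
    tally = Counter(counts_per_study)
    return {n: round(100 * k / total, 1) for n, k in sorted(tally.items())}


def share_in_range(counts_per_study, low, high):
    """Percentage of studies whose count falls within [low, high]."""
    inside = sum(low <= n <= high for n in counts_per_study)
    return round(100 * inside / len(counts_per_study), 1)


# Hypothetical numbers of candidate reference genes in ten studies.
example = [3, 5, 6, 6, 7, 8, 8, 8, 10, 21]
print(percentage_breakdown(example))
print(share_in_range(example, 5, 10))
```

With real data, `share_in_range(counts, 5, 10)` gives the share of studies behind a statement such as "most studies evaluated 5–10 reference genes".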

TOP 10 REFERENCE GENES
In the 90 curated papers, the expression of reference genes was evaluated 841 times in total. The numbers of experiments that used the top 10 most frequently used reference genes, namely Actin, RPL, Tubulin, GAPDH, RPS, 18S, EF1A, TATA, HSP, and SDHA, are shown in Figure 3. Actin, which encodes a major structural protein, is expressed at various levels in many cell types. It has traditionally been considered an ideal reference gene for RT-qPCR analysis and was the most frequently investigated one (Figure 3). For example, previous studies have shown that Actin expression was the most stable among the candidate reference genes across different developmental stages of many insects, including Apis mellifera, Schistocerca gregaria, Drosophila melanogaster, Plutella xylostella, Chilo suppressalis, Chortoicetes terminifera, Liriomyza trifolii, and Diuraphis noxia (Scharlaken et al., 2008; Van Hiel et al., 2009; Chapuis et al., 2011; Ponton et al., 2011; Teng et al., 2012; Sinha and Smith, 2014; Chang et al., 2017). Nonetheless, Actin expression was less stable in several insects, including the coccinellids Coleomegilla maculata, Coccinella septempunctata, and Hippodamia convergens (Yang et al., 2015c, 2016).

Ribosomal proteins (RPs), the protein components of ribosomes, are among the most highly conserved proteins across all life forms. RPL and RPS family genes together accounted for 18.55% of the reference gene evaluations, making them collectively the most widely selected reference genes for expression studies in insects during the past 10 years. In most of these studies, RP-encoding genes were stable reference genes. For example, RPS24 and RPS18 were stable across different developmental stages and sexes of C. maculata; RPS13 and RPS23 were stable across different developmental stages of P. xylostella (Fu et al., 2013); and RPL11, RPS8, and RPL14 were the three most stable reference genes across different developmental stages and temperatures in Aphis craccivora. However, RP-encoding genes may be unstable under some conditions. For example, RPS20 was the least stable gene in P. xylostella strains that were collected from different fields, reared at different temperatures, exposed to different photoperiods, or differed in insecticide susceptibility (Fu et al., 2013).
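Rankings such as "the three most stable reference genes" are produced by stability algorithms. As one illustration, the sketch below computes the geNorm stability measure M described by Vandesompele et al. (2002): for each gene, the mean standard deviation of its log2 expression ratios with every other candidate across samples (lower M means more stable), followed by stepwise exclusion of the least stable gene. It is a simplified sketch, not any study's pipeline; it assumes 100% amplification efficiency, so log2 ratios reduce to Ct differences, and the Ct values are invented.

```python
import statistics


def genorm_m(ct):
    """geNorm-style stability measure M for each candidate reference gene.

    ct maps gene name -> list of Ct values, one per sample, in the same
    sample order for every gene. Assuming 100% efficiency, log2 expression
    ratios between two genes are just their Ct differences.
    """
    m_values = {}
    for gene in ct:
        pairwise_sd = [
            statistics.stdev([a - b for a, b in zip(ct[gene], ct[other])])
            for other in ct
            if other != gene
        ]
        m_values[gene] = statistics.mean(pairwise_sd)
    return m_values


def genorm_ranking(ct):
    """Drop the gene with the highest M and recompute until two genes remain.

    Returns genes from most to least stable; geNorm cannot separate the final
    pair, so those two are listed first in alphabetical order.
    """
    remaining = dict(ct)
    excluded = []
    while len(remaining) > 2:
        m_values = genorm_m(remaining)
        least_stable = max(m_values, key=m_values.get)
        excluded.append(least_stable)
        del remaining[least_stable]
    return sorted(remaining) + excluded[::-1]


# Invented Ct values for three candidate genes across four samples.
ct_values = {
    "Actin": [20.0, 21.0, 22.0, 20.5],
    "RPL": [18.0, 19.0, 20.0, 18.5],
    "GAPDH": [17.0, 21.0, 18.0, 23.0],
}
print(genorm_m(ct_values))
print(genorm_ranking(ct_values))
```

Here Actin and RPL shift in lockstep across samples, so their pairwise ratio never changes and both receive a lower M than GAPDH, which is excluded first.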
Tubulin genes (α-, β-, and γ-tubulin), which encode cytoskeletal proteins, ranked third among the most widely investigated reference genes (Figure 3). In many studies, Tubulin stability varied with treatment within the same species. For example, α-tubulin was stably expressed across different tissues and sexes of C. maculata, whereas its expression was unstable across developmental stages and following dsRNA treatment (Yang et al., 2015c).
GAPDH, another commonly used reference gene, ranked fourth (Figure 3). The stability of GAPDH expression sometimes varied with treatment within the same species. For example, GAPDH expression was not affected by tissue type, sex, photoperiod, or dsRNA treatment in H. convergens, but it varied across developmental stages and temperatures. In different strains of P. xylostella, GAPDH expression was not appreciably altered by temperature or mechanical injury; however, it was unstable across developmental stages and was affected by photoperiod (Fu et al., 2013).
18S rRNA, a component of the small ribosomal subunit, ranked sixth among the most widely investigated reference genes (Figure 3). It was stably expressed under the vast majority of biotic and abiotic conditions in the studies that used it as a reference (Table 1). However, using rRNA to normalize RT-qPCR signals is generally acknowledged to be problematic: rRNA makes up most of the total RNA pool (>80%), whereas mRNA accounts for only 3–5%, so subtle changes in target gene expression may be masked. It is therefore preferable to use mRNAs of the ribosomal machinery, such as RPL and RPS genes, rather than rRNA.

EF1A, TATA, HSP, and SDHA genes together accounted for 11.42% of the reference gene evaluations. These four genes also showed variable stability across treatments and insect species. For example, EF1A was the least stable reference gene in A. craccivora across developmental stages and temperatures. In contrast, EF1A was one of the best reference genes in H. convergens, as its expression was unaffected by three biotic factors (developmental stage, tissue type, and sex) and three abiotic conditions (temperature, photoperiod, and dietary RNAi; Pan et al., 2015b).

DISTRIBUTION OF THE NUMBERS OF EXPERIMENTAL FACTORS STUDIED
In the 90 papers, changes in reference gene expression were investigated under the influence of one to seven experimental factors. Of these studies, 10%, 16%, and 14% analyzed the influence of one, two, and three experimental factors, respectively (Figure 4A). The relationship between the number of experimental factors and publication year was investigated by linear regression: the more recently a paper was published, the more experimental factors it tended to explore (Figure 4B).

TOP 10 EXPERIMENTAL FACTORS
A total of 39 experimental factors were investigated in these 90 papers; the top 10, in descending order, were developmental stage, tissue, temperature, insecticide, diet, population, virus, sex, photoperiod, and starvation (Figure 5).
RNA interference (RNAi) is a conserved mechanism whereby messenger RNA transcripts are targeted by small interfering RNAs in a sequence-specific manner, leading to downregulation of gene expression. During the past 20 years, RNAi has been widely used as a tool to investigate the functions of insect genes (Zotti et al., 2018), whereas RT-qPCR is the method of choice for measuring gene expression because of its sensitivity and specificity. Genes that play important roles during insect metamorphosis and affect different tissues can serve as targets for manipulations that kill the insect or retard its growth. This is why gene

Insects are ectothermic organisms, and the body temperature of most insects changes with ambient temperature, ultimately influencing their growth and development. Temperature ranked as the third most widely investigated factor at 11.79% (Figure 5). The number and identity of the most stable reference genes under different temperatures varied among insects. For instance, GAPDH and EF1A were the most stable gene combination in Spodoptera litura (Lu et al., 2013), whereas RPS15, β-tubulin, and EF1A were the most stable reference genes in Nilaparvata lugens.
Many insects, including a number of the species summarized in this study, have developed resistance to insecticides, which poses a major challenge for pest control. The molecular mechanisms underlying insecticide resistance are under intense scrutiny, and RT-qPCR is an important technology for investigating the genes involved. Insecticide treatment ranked as the fourth most widely investigated factor at 5.00% (Figure 5). Different reference genes were used in different insects to study the effects of various insecticide treatments. RPS15 and RPL32 were stably expressed in insecticide treatment experiments in H. armigera (Zhang et al., 2015), while RPS11, EF1A, and β-tubulin were the best choice in insecticide-stressed N. lugens. Different classes of insecticides required different sets of reference genes to normalize target gene expression in B. tabaci (Liang et al., 2014).
Diet ranked as the fifth most widely investigated factor at 4.29% (Figure 5). Different gene combinations were required for different diet conditions. For example, RPL10 and GAPDH were the most stable reference genes in S. litura reared on different diets (Lu et al., 2013); Actin, RPS18, and RPS15 were the most stable reference genes across different diets in Bradysia odoriphaga (Shi et al., 2016); and Actin and 18S were the best reference gene combination for feeding assays with Aphis gossypii (Ma et al., 2016).
Population, virus, and sex tied as the sixth most widely investigated factors at 3.93% each (Figure 5). Different reference gene combinations were suggested for each factor. For example, RPL10 and EF1A were the most stable reference genes in S. litura collected from different locations (Lu et al., 2013), EF1A, Actin, and GAPDH were the most stable reference genes in P. xylostella (Fu et al., 2013), and the combination of Actin and EF1A was very useful for experiments involving A. gossypii (Ma et al., 2016). Likewise, different reference gene combinations were recommended for viral infection experiments in different insects. For example, GAPDH, RPL27, and β-tubulin were the best reference gene combination for nuclear polyhedrosis virus infection (Zhang et al., 2015), and HSP90 and RPL29 were the most stable reference genes in B. tabaci whether or not the whitefly carried tomato yellow leaf curl virus. Different reference gene combinations were also recommended for comparing females and males of different insects. For instance, GAPDH and CypA were the most stable reference genes for H. convergens, HSP90 and RP49 were the most stable for Harmonia axyridis (Yang et al., 2018), and 18S, EF1A, and GAPDH were the best for normalizing gene expression in Sesamia inferens (Sun et al., 2015).
Photoperiod and starvation ranked as the seventh and eighth most widely investigated factors at 3.21% and 2.86%, respectively (Figure 5). Again, different reference gene combinations were recommended for different insects. For instance, under different photoperiods, GAPDH and CypA were the most stable reference genes for H. convergens, EF1A and V-ATPase A were the most stable for Danaus plexippus, and HSP90 and β-tubulin were the best for H. armigera (Shakeel et al., 2015). Under starvation, RPL28 and RPS15 were the most stable reference genes for H. armigera (Shakeel et al., 2015), RPS3 and Actin were the best for S. litura (Lu et al., 2013), and RPS11, ArgK, and EF1A were recommended for N. lugens.

DISTRIBUTION OF THE NUMBER OF ANALYSIS TOOLS
In the 90 papers, one to five analysis tools were used to evaluate the stability of reference gene expression; using a single tool was the least common choice (4%) and using three tools the most common (34%; Figure 6A). Linear regression analysis showed that the more recently a paper was published, the more analysis tools it used (Figure 6B).
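When several tools are used, their stability rankings have to be reconciled. One simple approach, popularized by the web tool RefFinder, is to order genes by the geometric mean of their ranks across tools such as geNorm, NormFinder, BestKeeper, and the comparative ΔCt method. The sketch below assumes each tool's output is already available as a list ordered from most to least stable; the function name and the rankings in the example are invented for illustration.

```python
import math


def geometric_mean_rank(rankings):
    """Combine per-tool stability rankings by the geometric mean of ranks.

    rankings: dict mapping a tool name to a list of genes, most stable first.
    Returns (gene, score) pairs sorted from most to least stable; lower is better.
    """
    gene_sets = [set(order) for order in rankings.values()]
    if any(s != gene_sets[0] for s in gene_sets):
        raise ValueError("every tool must rank the same set of genes")
    scores = {}
    for gene in gene_sets[0]:
        ranks = [order.index(gene) + 1 for order in rankings.values()]
        scores[gene] = math.prod(ranks) ** (1 / len(ranks))
    # Break ties alphabetically so the output order is deterministic.
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))


example = {
    "geNorm": ["RPL", "Actin", "GAPDH"],
    "NormFinder": ["Actin", "RPL", "GAPDH"],
    "BestKeeper": ["RPL", "GAPDH", "Actin"],
}
for gene, score in geometric_mean_rank(example):
    print(f"{gene}: {score:.2f}")
```

The geometric mean rewards genes that rank consistently well and penalizes a single poor rank more gently than a sum would; other aggregation rules are possible and may give different orders.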

CONCLUSIONS
Our review clearly suggests that no reference gene is universally stable: the expression of even the most popular reference genes varied across conditions within the same insect species, and across insects under the same experimental condition. To obtain reliable data for a target gene, reference genes must therefore be screened under the specific experimental conditions of interest. Because the best reference genes differ greatly among species and conditions, an improper choice can produce multi-fold differences in estimated target gene expression, or even false conclusions. For instance, the expression of V-ATPase A in the gut was 7.7- to 22.4-fold higher than that in the carcass of C. septempunctata when normalized to the most- and least-stable sets of reference genes, respectively. Similarly, relative hsp83 expression varied noticeably across tissues and developmental stages of S. inferens when a less stable reference gene was used for normalization, whereas hsp83 was uniformly expressed when stable reference genes were used (Sun et al., 2015). Accurate normalization is thus essential for sound investigation of gene function. We strongly recommend validating the expression stability of reference genes before each RT-qPCR experiment and using multiple reference genes for normalization. This review should help researchers select reference genes and design experiments on gene expression in insects, especially non-model ones, in terms of the number of reference genes chosen, the experimental factors manipulated, and the analysis tools used.
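The effect of reference choice can be illustrated with the widely used 2^-ΔΔCt method of relative quantification. The sketch below normalizes a target gene to the mean Ct of one or more reference genes, which is equivalent to normalizing to the geometric mean of their relative quantities when amplification efficiency is 100%. The function name and all Ct values are hypothetical; they only show how an unstable reference can shift a fold change and are not the V-ATPase A or hsp83 data discussed above.

```python
import statistics


def fold_change(target_ct, ref_cts, calib_target_ct, calib_ref_cts):
    """Relative expression of a target gene by the 2^-ddCt method.

    Averaging the Ct values of several reference genes is equivalent to using
    the geometric mean of their relative quantities, assuming 100% efficiency.
    """
    d_ct_sample = target_ct - statistics.mean(ref_cts)
    d_ct_calibrator = calib_target_ct - statistics.mean(calib_ref_cts)
    return 2 ** -(d_ct_sample - d_ct_calibrator)


# Hypothetical gut (sample) vs carcass (calibrator) measurements.
stable = fold_change(22.0, [18.0, 20.0], 25.0, [18.0, 20.0])  # references unchanged
unstable = fold_change(22.0, [17.0], 25.0, [19.0])  # reference shifts by 2 cycles
print(f"stable references: {stable:.1f}-fold; unstable reference: {unstable:.1f}-fold")
```

In this toy case the stable reference pair reports an 8-fold difference, while the single reference whose own Ct shifts by two cycles between tissues reports only 2-fold, a four-fold discrepancy of the kind the review warns about.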

AUTHOR CONTRIBUTIONS
HP and YZ conceived the topic of the review. HP, CY, and JL performed the literature review and analyzed the data. HP and CY wrote the manuscript.

FUNDING
This work was supported by the National Key R&D Program of China (grant No. 2017YFD0200900), a project supported by GDUPS (2017), and a start-up fund from the South China Agricultural University. The granting agencies had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.