The Protocatechuate 3,4-Dioxygenase Solubility (PCDS) Tag Enhances the Expression and Solubility of Heterogenous Proteins in Escherichia coli

Escherichia coli has been developed as the most common host for recombinant protein expression. Unfortunately, some proteins still resist high-level soluble heterologous expression in E. coli. Protein and peptide fusion tags are among the most important tools for increasing target protein expression, and they can influence both expression efficiency and solubility. In this study, we identify a short 15-residue solubility-enhancing peptide, the PCDS (protocatechuate 3,4-dioxygenase solubility) tag, which enhances heterologous protein expression in E. coli. The PCDS tag is encoded by a 45-bp sequence and is involved in the soluble expression of protocatechuate 3,4-dioxygenase, encoded by the pcaHG98 genes of Pseudomonas putida NCIMB 9866. The 45-bp sequence was also beneficial for pcaHG98 gene amplification. The tag was shown to be necessary for the heterologous soluble expression of PcaHG98 in E. coli. Purified His6-PcaHG98e04-PCDS exhibited an activity of 205.63 ± 14.23 U/mg against protocatechuate as a substrate, and this activity was not affected by the PCDS tag. The PCDS tag was fused to the mammalian yellow fluorescent protein (YFP) to construct YFP-PCDS, which lacks the YFP termination codons, and YFPt-PCDS, which retains them. The total protein expression of YFP-PCDS and YFPt-PCDS was significantly increased, up to 1.6-fold and 2-fold, respectively, compared to YFP alone. Accordingly, His6-YFP-PCDS and His6-YFPt-PCDS had 1.6-fold and 3-fold higher soluble protein yields, respectively, than His6-YFP expressed under the same conditions. His6-YFP, His6-YFP-PCDS, and His6-YFPt-PCDS also showed consistent fluorescence emission spectra, with a peak at 530 nm over a scanning range from 400 to 700 nm. These results indicate that the PCDS tag is an effective way to improve heterologous protein expression in E. coli.


INTRODUCTION
Since the mammalian hormone somatostatin was first used to realize heterologous expression in Escherichia coli (Itakura et al., 1977), the expression of recombinant protein has developed rapidly for various applications, including industrial enzyme production (Headon and Walsh, 1994), biopharmaceuticals (Jozala et al., 2016), and vaccine production (Christodoulides et al., 2001). The production of recombinant proteins has been implemented in many different prokaryotic and eukaryotic host systems, including E. coli, yeast, filamentous fungi, insect cells, Arabidopsis, and even in mammalian cell culture lines (Demain and Vaishnav, 2009). Among these systems, E. coli is often preferred due to its fast growth, high cell density cultures, rich complex media, and easy transformation (Rosano and Ceccarelli, 2014). In order to improve the heterologous protein expression of E. coli systems, a number of protein tags have been developed, including the maltose-binding protein (MBP; Kapust and Waugh, 1999), glutathione S-transferase (GST; Smith and Johnson, 1988), small ubiquitin-related modifier (SUMO; Marblestone et al., 2006), N-utilization substance protein A (NusA; Davis et al., 1999), thioredoxin (Trx;LaVallie et al., 1993), and ubiquitin (Bachmair et al., 1986;Varshavsky, 2005). In addition, there are some short peptide tags for E. coli expression systems wherein the amino acid sequences are generally 15 residues or less (Ki and Pack, 2020), including the Arg-tag (Sassenfeld and Brewer, 1984), FLAG-tag (Hopp et al., 1988), His-tag (Hochuli et al., 1987), c-myc-tag (Ferrando et al., 2001), S-tag (Karpeisky et al., 1994), Strep-tags (Schmidt and Skerra, 1993), and Fh8 and H tags (Costa et al., 2013). These peptide tags can improve the solubility of a target protein by regulating the process of protein transcription and translation (Ki and Pack, 2020). 
However, no single fusion tag suits every protein (Zhao et al., 2019), and the function of an expressed protein may be negatively affected by the specific fusion tag used for expression (Arnau et al., 2006). Therefore, identifying additional protein and peptide fusion tags helps to reveal common features that benefit heterologous protein expression.
The chemical 2,4-xylenol, known to be harmful to human health, is listed as a toxic pollutant by the U.S. Environmental Protection Agency due to its damage to the environment. In Pseudomonas putida NCIMB 9866, 2,4-xylenol and p-cresol can be catabolized together, and the same enzymes catalyze the oxidation of their para-methyl groups (Chen et al., 2014). Moreover, both 2,4-xylenol and p-cresol are metabolized through the protocatechuate (3,4-dihydroxybenzoate, PCA) pathway (Elmorsi and Hopper, 1977; Chen et al., 2014; Chao et al., 2016). Protocatechuate is further metabolized through the ortho ring cleavage pathway. The pcaHG genes encoding protocatechuate 3,4-dioxygenase are located in the pch gene cluster of P. putida NCIMB 9866 (Chen et al., 2014). In this study, we found a 45-bp sequence encoding a peptide tag that is involved in the soluble expression in E. coli of protocatechuate 3,4-dioxygenase, encoded by the pcaHG genes of P. putida NCIMB 9866. These pcaHG genes were cloned and expressed in vitro, and the 45-bp sequence required for soluble expression, termed the PCDS tag (protocatechuate 3,4-dioxygenase solubility tag), was characterized. Without the PCDS tag, PcaHG98 forms inclusion bodies in E. coli. Moreover, this tag also significantly promoted the heterologous soluble expression of yellow fluorescent protein (YFP), which has relatively low expression levels in E. coli.

MATERIALS AND METHODS
Strains, Plasmids, Culture Media, Primers, and Chemicals
The strains used in this study are listed in Table 1.

Gene Cloning and Expression, Protein Purification
Plasmid preparation and DNA manipulation were carried out as described previously (Green and Sambrook, 2012). The primers used in this study are listed in Table 2 and are shown in Figure 1. All the targeted genes were amplified by PCR using Taq DNA polymerase or TransEco FastPfu DNA Polymerase (Transgen, Beijing, China), according to the manufacturer's recommendations. The pcaHG98 gene was amplified by PCR with primers pcahg9803 and pcahg9804 from the extracted genomic DNA of strain NCIMB 9866. Each 50-μl PCR reaction consisted of 5-30 ng of plasmid DNA or 100 ng of genomic DNA, 1x TransEco FastPfu Buffer, 250 μM dNTP, 0.4 μM of each primer, 2.5 U of TransEco FastPfu DNA Polymerase or Taq DNA polymerase, and ddH2O up to 50 μl. PCR amplification was performed with TransEco FastPfu DNA polymerase as follows: denaturation at 95°C for 5 min; 30 cycles of 95°C for 1 min, 58°C for 1 min, and 72°C for 1 min (1 kb/min for Taq, 2 kb/min for FastPfu); and a final extension at 72°C for 5 min. The PCR fragments were digested with NdeI/HindIII (Thermo Fisher Scientific, Shanghai, China) before being cloned into pET-28a (+) with T4 DNA ligase (TaKaRa, Dalian, China) to produce pET-28a-pcaHG98e01. The pcaHG98 gene was amplified using primers pcahg9801 and pcahg9802 from plasmid pET-28a-pcaHG98e01 and cloned into the NdeI/HindIII sites of pET-28a (+) with T4 DNA ligase to produce pET-28a-pcaHG98e02. The pcaHG98 gene was amplified using primers pcahg9802 and pcahg9803 from plasmid pET-28a-pcaHG98e01 and cloned into the NdeI/HindIII sites of pET-28a (+) with T4 DNA ligase to produce pET-28a-pcaHG98e03. The pcaHG98 gene was amplified using primers pcahg9801 and pcahg9804 from plasmid pET-28a-pcaHG98e01 and cloned into the NdeI/HindIII sites of pET-28a (+) with T4 DNA ligase to produce pET-28a-pcaHG98e04.
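The extension rates and final concentrations given above lend themselves to a small calculation. The following is a minimal sketch, not the authors' protocol: the amplicon length and the stock concentrations used in the usage lines are hypothetical values chosen only for illustration, and the function names are our own.

```python
# Extension rates stated in the text (kb of product per minute at 72 °C).
EXTENSION_RATE_KB_PER_MIN = {"Taq": 1.0, "FastPfu": 2.0}


def extension_seconds(amplicon_bp, polymerase):
    """Per-cycle extension time in seconds for an amplicon of the given length."""
    rate = EXTENSION_RATE_KB_PER_MIN[polymerase]
    return amplicon_bp / 1000.0 / rate * 60.0


def stock_volume_ul(final_conc, stock_conc, total_volume_ul=50.0):
    """Volume of stock needed to reach final_conc in the reaction (C1*V1 = C2*V2).

    final_conc and stock_conc must share the same unit.
    """
    return final_conc * total_volume_ul / stock_conc


# Hypothetical inputs: a 1,500-bp amplicon and a 10 μM primer stock (assumptions).
taq_time = extension_seconds(1500, "Taq")
pfu_time = extension_seconds(1500, "FastPfu")
primer_volume = stock_volume_ul(final_conc=0.4, stock_conc=10.0)
print(taq_time, pfu_time, primer_volume)
```

Under these assumed inputs, the faster polymerase halves the extension step, and the primer volume follows directly from the 0.4 μM final concentration in a 50-μl reaction.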
The nucleotide sequence for YFP was synthesized according to the sequence of pcDNA3YFP, which was a gift from Doug Golenbock (Addgene plasmid # 13033; http://n2t.net/addgene:13033; RRID: Addgene_13033). The YFP gene was amplified using primers pYFP01 and pYFP02 from this synthesized YFP gene (Sangon Biotech, Shanghai, China) and fused to the NdeI/HindIII restriction sites of pET-28a (+) with the ClonExpress II One Step Cloning Kit (Vazyme, Nanjing, China) to produce pET-28a-YFP. The YFP gene was amplified using primers pYFP03 and pYFP04 from pET-28a-YFP and fused to the NdeI/HindIII restriction sites of pET-28a (+) to produce pET-28a-YFPt-PCDS. The YFP gene was amplified by PCR using primers pYFP04 and pYFP05 and fused to the NdeI/HindIII restriction sites of pET-28a (+) to produce pET-28a-YFP-PCDS. PCR amplification conditions were the same as those described above for the pcaHG98 gene.
The eGFP gene was amplified using primers pegfp03 and pegfp04 from the mammalian expression vector pEGFP-N1 and fused to the NdeI/EcoRI restriction sites of pET-28a (+) with the ClonExpress II One Step Cloning Kit to produce pET-28a-eGFP. The primers pET28a-pcas03 and pET28a-pcas04 containing PCDS tag were used to amplify linearized recombinant vector from pET-28a-eGFP, which was then fused by ClonExpress II One Step Cloning Kit to form cyclized vector pET-28a-eGFP-PCDS.
The nucleotide sequence of the resulting plasmid was confirmed by Sangon Biotech (Shanghai, China).
Heterologous expression of the cloned pcaHG gene was accomplished by introducing the constructed plasmid into E. coli BL21 (DE3; Novagen, Madison, WI). The transformed cells were grown at 37°C to an OD600 of 0.4 in 100 ml of LB supplemented with 50 μg ml−1 of kanamycin in a 500-ml shake flask; protein expression was then induced with 0.1 mM isopropyl-β-D-thiogalactopyranoside (IPTG) for approximately 5 h at 30°C, resulting in an OD600 of 2. His6-PcaHG98e04-PCDS was purified using Ni2+-nitrilotriacetic acid agarose chromatography (Novagen) and eluted with 200 mM imidazole. The purified His6-PcaHG98e04-PCDS was then dialyzed to remove imidazole, using a Spectra/Por CE dialysis membrane with a molecular weight cut-off of 10,000 Da (Spectrum Laboratories Inc., Shanghai, China) at 4°C for 48 h against phosphate buffer (PB), before being preserved in glycerol at −80°C.

Molecular Weight Determination
The molecular weight of the purified recombinant His6-PcaHG98e04-PCDS was determined by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and liquid chromatography high-resolution accurate mass spectrometry (LC-MS). LC-MS data were obtained using a Thermo Scientific™ Q Exactive™ mass spectrometer (Thermo Fisher Scientific, Waltham, MA, United States). The recombinant His6-PcaHG98e04-PCDS was purified by SDS-PAGE. The His6-PcaHG98e04-PCDS sample was dissolved in 2% acetonitrile and 0.1% formic acid. The mixture was vortexed and centrifuged at 12,000 g for 15 min. The sample was then purified by ultrafiltration, concentrated using an Amicon Ultra-10K centrifugal filter device, and stored in 0.1% formic acid. MS was performed in the data-dependent/dynamic exclusion mode. A resolution setting of 70,000 (full width at half maximum, FWHM) was used for full-scan MS. All data were processed using Thermo Scientific™ Pinpoint™ software (Thermo Fisher Scientific, Waltham, MA, United States).

Enzyme Activity Assays
PCA-3,4-dioxygenase activities were measured according to previously described methods (Stanier and Ingraham, 1954; Gross et al., 1956). For the assay of PCA-3,4-dioxygenase activity, the reaction mixture contained 50 mM of PB (pH 7.5) and 200 μM of cell extracts or purified His6-PcaHG98e04-PCDS. All compounds except the substrate were added to the reference cuvette, and the enzyme activity assay was initiated by the addition of 30 μg of PCA. The spectra in the range of 220-400 nm were recorded every minute. The molar extinction coefficients for PCA at 290 nm and 3-carboxy-cis,cis-muconate at 270 nm were 3,890 M−1 cm−1 (Gross et al., 1956) and 6,390 M−1 cm−1 (Iwagami et al., 2000), respectively. One unit of enzyme activity was defined as the amount of protein required for the production (or disappearance) of 1 μmol of product (or substrate) min−1 at 30°C. The protein amounts for E. coli BL21 (DE3)/pET-28a-pcaHG98e01 and E. coli BL21 (DE3)/pET-28a-pcaHG98e04 refer to the soluble protein obtained after cells from a 100-ml IPTG-induced culture were disrupted by sonication and centrifuged. The amount of purified His6-PcaHG98e04-PCDS was measured directly. All enzyme activity assays were run in triplicate in three independent experiments.
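The unit definition above can be turned into a specific activity with the Beer-Lambert law. The sketch below is an illustration under stated assumptions, not the authors' calculation: it treats the absorbance change at 290 nm as arising from PCA alone (ignoring any product absorbance at that wavelength), assumes a 1-cm path length, and uses hypothetical values for the absorbance slope, reaction volume, and protein amount.

```python
# Molar extinction coefficient of PCA at 290 nm, as cited in the text (M^-1 cm^-1).
EPSILON_PCA_290 = 3890.0


def specific_activity_u_per_mg(delta_abs_per_min, epsilon_m_cm,
                               reaction_volume_ml, protein_mg, path_cm=1.0):
    """Specific activity in U/mg, where 1 U = 1 μmol of substrate consumed per minute.

    delta_abs_per_min: absorbance change per minute (sign is ignored).
    """
    # Beer-Lambert: concentration change (mol per litre per minute).
    molar_per_min = abs(delta_abs_per_min) / (epsilon_m_cm * path_cm)
    # Convert to μmol per minute in the actual reaction volume.
    umol_per_min = molar_per_min * (reaction_volume_ml / 1000.0) * 1e6
    return umol_per_min / protein_mg


# Hypothetical inputs: slope of -0.389 per minute, 1 ml reaction, 1 μg of enzyme.
activity = specific_activity_u_per_mg(-0.389, EPSILON_PCA_290, 1.0, 0.001)
print(round(activity, 2))
```

Tracking product formation at 270 nm instead would use the same function with the 6,390 M−1 cm−1 coefficient.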

Statistical Analysis
Statistical analysis was performed with SPSS version 23.0 software (IBM SPSS Inc., New York, NY, United States). One-way analysis of variance (ANOVA) was used to calculate p values for the PCA-3,4-dioxygenase activity analyses. Paired-sample tests were used to calculate p values for the expression of YFP. p values below 0.05 and 0.01 were considered statistically significant and highly statistically significant, respectively.
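For readers without SPSS, the two test statistics named above can be sketched with the Python standard library. This is only an illustration of the underlying arithmetic: it computes the F statistic for a one-way ANOVA and the t statistic for a paired-sample test, and it deliberately stops short of p values, which require the F and t distributions. The numbers in the usage lines are made up and are not data from this study.

```python
from math import sqrt
from statistics import mean, stdev


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of groups (lists of numbers)."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def paired_t(sample_a, sample_b):
    """t statistic for a paired-sample t-test (n - 1 degrees of freedom)."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Made-up triplicate measurements, for illustration only.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
t_stat = paired_t([2, 4, 6], [1, 2, 3])
print(f_stat, t_stat)
```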

RESULTS
pcaHG98 Gene Cloning and Expression
pcaHG98, which encodes a putative protocatechuate 3,4-dioxygenase, was cloned into pET-28a (+). Amplification of the pcaHG98 gene by PCR was first attempted with primers pcahg9801 and pcahg9802 (Figure 1) using strain NCIMB 9866 gDNA or bacterial lysate as a template. However, no PCR product was obtained with either Taq DNA polymerase or Pfu DNA polymerase. To verify this sequence in the genome of strain NCIMB 9866, two primers, pcahg98S01 and pcahg98S02 (Figure 1), were designed for sequencing analysis, and the results showed that the sequence was consistent with previous reports (Chen et al., 2014). Then, two primers, pcahg9803 and pcahg9804, were redesigned (Figure 1), and the pcaHG98 gene was successfully cloned into pET-28a (+) to generate pET-28a-pcaHG98e01.
pET-28a-pcaHG98e01, which contained the pcaHG98 gene amplified from strain NCIMB 9866 gDNA, was expressed in E. coli BL21(DE3). The molecular masses of PcaG and PcaH calculated from the nucleotide sequence were 22.7 and 26.3 kDa, respectively, which corresponded to the proteins observed by SDS-PAGE (Figure 2A). The PcaHG98e01 protein, however, did not bind to Ni2+-nitrilotriacetic acid agarose during chromatography. We obtained the same result in three separate experiments. This may have been because the pcaHG98 gene in pET-28a-pcaHG98e01 included 54 bp of upstream sequence and 45 bp of downstream sequence. The start and stop codons of the pcaHG98 gene were also retained in this recombinant plasmid. Therefore, the His-tag was not fused to the pcaHG98 gene product because of the presence of these start and stop codons.
The pcaHG98 gene in vector pET-28a-pcaHG98e02 was not effectively expressed, and most of its protein was insoluble; the α-subunit PcaG was not detected by SDS-PAGE (Figure 2A). The pcaHG98 gene was amplified from pET-28a-pcaHG98e01 to produce pET-28a-pcaHG98e03, from which the pcaHG98 gene was expressed as inclusion bodies (Figure 2B). However, the pcaHG98 gene in pET-28a-pcaHG98e04, which was amplified using primers pcahg9801 and pcahg9804, was expressed as soluble protein (Figure 2B).

Purification and Biochemical Properties of PcaHG98e04
The PcaHG98e04 construct was generated using the primers pcahg9801 and pcahg9804. The primer pcahg9801 (which removed the start codon ATG) starts at the 5′-end of the pcaH gene, whereas the primer pcahg9804 lies downstream of the pcaG gene (containing 45 bases: GACATCGCCGGGGCGCGCGC GCG). In contrast to the insoluble PcaHG98e02, the polypeptide translated from these 45 bases renders the His6-PcaHG98e04-PCDS protein soluble (Figure 2B); the reasons for this are explored further below.
His6-PcaHG98e04-PCDS, purified using Ni2+-nitrilotriacetic acid agarose chromatography, was then analyzed by LC-MS. The molecular weight of the detected protein was 27.6 kDa (Figure 3), which comprised 22.7 kDa encoded by the pcaG gene and 3.3 kDa encoded by the 45-bp sequence (the underlined DNA sequence) and the other end sequences (GACATCGCCGGGGCAGCTGCTCCGGCACGAGTACGAAACGCGTCTCAAGCTTGCGGCCGCACTCGAGCACCACCACCACCACCACTGAGATCCGGCTGCTAA). These results indicated that the expressed PcaG protein contained the PCDS tag.
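The sequence quoted above can be checked for length with a few lines of code. The sketch below strips whitespace from the quoted sequence and splits it into a leading 45-bp block and the remainder. That the 45-bp PCDS sequence is the first 45 bases is an assumption made here for illustration, because the underlining that marks it is not preserved in plain text; the function names are likewise our own.

```python
# Sequence as quoted in the text; any internal spaces are removed by clean_dna.
raw_sequence = (
    "GACATCGCCGGGG CAGCTGCTCCGGCACGAGTACGAAACGCGTCT CAAGCT "
    "TGCGGCCGCACTCGAGCACCACCACCACCACCACTGAGA TCCGGCTGCTAA"
)


def clean_dna(text):
    """Remove whitespace, upper-case, and reject anything other than A, C, G, T."""
    cleaned = "".join(text.split()).upper()
    if set(cleaned) - set("ACGT"):
        raise ValueError("unexpected character in DNA sequence")
    return cleaned


def split_leading_block(sequence, block_length_bp=45):
    """Split a sequence into its first block_length_bp bases and the rest."""
    return sequence[:block_length_bp], sequence[block_length_bp:]


sequence = clean_dna(raw_sequence)
# Assumption: the 45-bp PCDS sequence is the leading block of the quoted sequence.
pcds_region, remainder = split_leading_block(sequence)
codons_in_tag = len(pcds_region) // 3
print(len(sequence), len(pcds_region), codons_in_tag)
```

A 45-bp block corresponds to 15 codons, which agrees with the 15-residue length reported for the tag.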

Biochemical Properties of Recombinant His6-YFP, His6-YFP-PCDS, and His6-YFPt-PCDS
It has been reported that YFP has relatively low expression in E. coli compared to codon-optimized YFP (Nguyen et al., 2019). In this study, the YFP gene was synthesized according to the sequence of the mammalian expression vector pcDNA3YFP and cloned into the E. coli expression vector pET-28a (+). Based on our results, the YFP gene was expressed in our E. coli system and was soluble to some extent (Figure 5B). The 45-bp PCDS tag enhanced the soluble expression of the YFP gene in E. coli when fused to the YFP C-terminus (Figure 5B). YFP-PCDS did not have the YFP termination codons, whereas YFPt-PCDS contained them. The enhancement of YFP expression by the PCDS tag in E. coli was analyzed by fluorescence-activated cell sorting (FACS). FACS analysis showed that the fluorescence intensity of BL21 (DE3)/pET-28a-YFP-PCDS and BL21 (DE3)/pET-28a-YFPt-PCDS was higher than that of BL21 (DE3)/pET-28a-YFP, with BL21 (DE3)/pET-28a-YFPt-PCDS being the highest (Figure 5A). BL21 (DE3)/pET-28a-YFP showed weaker fluorescence intensity, as did the control BL21 (DE3)/pET-28a(+). The total protein expression of YFP-PCDS and YFPt-PCDS was significantly increased, up to 1.6-fold and 2-fold, respectively, compared to that of YFP alone. Accordingly, His6-YFP-PCDS and His6-YFPt-PCDS had 1.6-fold and 3-fold higher soluble expression yields, respectively, than His6-YFP under the same expression and purification conditions (Figure 5C). All protein amounts were determined three times in three independent experiments. His6-YFP, His6-YFP-PCDS, and His6-YFPt-PCDS also showed consistent fluorescence emission spectra, with a peak at 530 nm across the scanning range from 400 to 700 nm (Figure 5D).

DISCUSSION
The β-ketoadipate pathway is preceded by the preliminary conversion of a broad range of organic compounds into one of two aromatic ring compounds, catechol or PCA (Harwood and Parales, 1996; Wells and Ragauskas, 2012). Protocatechuate 3,4-dioxygenase catalyzes the cleavage of PCA into β-carboxy-cis,cis-muconate (Harwood and Parales, 1996). pcaHG98 encodes a putative protocatechuate 3,4-dioxygenase in NCIMB 9866. Unfortunately, this gene could not be directly amplified by PCR with primers pcahg9801 and pcahg9802 from strain NCIMB 9866 gDNA or bacterial lysate with either Taq or Pfu DNA polymerase. Sequencing analysis showed that the sequence of the pcaHG98 gene was consistent with the genome sequencing data. Then, two primers, pcahg9803 and pcahg9804, were redesigned (Figure 1), and the pcaHG98 gene was successfully cloned into pET-28a (+) to generate pET-28a-pcaHG98e01. The recombinantly expressed PcaHG98e01 (Figure 2A) had protocatechuate 3,4-dioxygenase activity (Figure 4A). These results show that the 54-bp and 45-bp sequences of the pcaHG98e01 gene (Figure 1) contain the sequences needed for its soluble expression, and they were also beneficial for gene amplification. To determine the role of these two sequences, we constructed pET-28a-pcaHG98e03, which contained the 54 bp at the N terminus, and pET-28a-pcaHG98e04, which contained the 45 bp at the C terminus (Figure 1). The PcaHG98e03 protein was insoluble (Figure 2B), and PCA 3,4-dioxygenase activity was not detectable in E. coli BL21 (DE3)/pET-28a-pcaHG98e03 (Figure 4C), whereas the PcaHG98e04 protein was soluble (Figure 2B), and E. coli BL21 (DE3)/pET-28a-pcaHG98e04 cell extracts exhibited a specific activity of 72.8 ± 4.41 U/mg against PCA as a substrate. Therefore, the 45 bp at the C terminus of the pcaHG98e04 gene was beneficial for its soluble protein expression, and we named it the PCDS tag.
To analyze the solubility impact of the PCDS tag, we selected YFP, which has a pI of 5.910 and a relatively low level of expression in E. coli according to the literature (Nguyen et al., 2019). Based on our results, the YFP gene was expressed in the E. coli system and was soluble to some extent (Figure 5B). FACS analysis showed that the fluorescence intensity of YFP-PCDS and YFPt-PCDS in E. coli BL21(DE3) was higher than that of YFP, with YFPt-PCDS being the highest (Figure 5A). Fusing the PCDS tag to the YFP C-terminus enhanced the soluble expression of the YFP gene in E. coli (Figure 5B). The total protein expression of YFP-PCDS (without a termination codon) and YFPt-PCDS (with a termination codon) was significantly increased, up to 1.6-fold and 2-fold, respectively, compared to that of YFP alone. Accordingly, His6-YFP-PCDS and His6-YFPt-PCDS had 1.6-fold and 3-fold higher soluble expression yields, respectively, than His6-YFP under the same conditions. They also showed consistent fluorescence emission spectra, with a peak at 530 nm over a scanning range from 400 to 700 nm (Figure 5D). These results also showed that the presence of the stop codon affected the improvement in solubility induced by the PCDS tag, which was located at a position similar to that in the pcaHG98 gene. In addition, the enhanced green fluorescent protein (eGFP) gene is also poorly expressed in the E. coli expression system and has been used in fusion tag studies (Marblestone et al., 2006). Herein, the eGFP gene from the mammalian expression vector pEGFP-N1 was cloned into pET-28a (+) to produce pET-28a-eGFP-PCDS. Induced expression results (Figure 6) showed that the recombinant eGFP (pI of 5.731) expressed in E. coli BL21 (DE3)/pET-28a-eGFP-PCDS was almost entirely soluble, so eGFP did not reveal the solubility effect of the 45-bp PCDS tag.
The acidity of fusion tags can significantly improve the solubility of fusion proteins; therefore, acidic fusion tags that are negatively charged at physiological pH have been widely screened and used (Ki and Pack, 2020). However, the 15-amino-acid PCDS tag in this study is a basic fusion tag with a pI value of 9.849 and six basic arginine residues. The pI values of PcaH and PcaG were 9.599 and 4.378, respectively. A PCDS tag located at the C-terminus of PcaG may promote PcaHG soluble expression by influencing the PcaG protein. The PCDS tag found in this study may be more suitable for proteins whose insoluble expression is due to protein acidity, which requires further analysis. In addition, the 45-bp sequence beyond the end of the pcaHG98 gene affects its PCR cloning from the genome, but the reason is unclear and needs further analysis. A protein has its lowest solubility when the ambient pH is equal to its pI, at which point it has zero net charge. Therefore, shifting the net charge to a positive or negative value improves the solubility of a protein (Lawrence et al., 2007). Depending on the pI value of the target protein, the use of fusion tags to induce electrostatic repulsion between proteins can provide sufficient time for the correct folding of proteins, thus preventing protein aggregation (Zhang et al., 2004; Kato et al., 2007; Paraskevopoulou and Falcone, 2018). The PCDS tag, with its high pI value of 9.849, may promote PcaHG soluble expression by influencing the electrostatic interaction between PcaH and PcaG. In this respect, the PCDS tag is similar to the protein fusion tag CBD. CBD also has a high pI value (Hopp et al., 1988; Marblestone et al., 2006) and improves the soluble expression of heterologous proteins in E. coli (Murashima et al., 2003).
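The relationship between pH, pI, and net charge described above can be made concrete with a Henderson-Hasselbalch estimate. The sketch below is a rough illustration only: the pKa values are one approximate set assumed here (other tools use different values, so the pI values reported in this paper will not be reproduced exactly), and the two peptides in the usage lines are hypothetical sequences, not the PCDS tag.

```python
# Approximate pKa values (an assumption; published pKa sets differ between tools).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(sequence, ph):
    """Estimated net charge of a peptide (one-letter codes) at the given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))    # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))   # deprotonated C-terminus
    for residue in sequence.upper():
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def isoelectric_point(sequence, tolerance=1e-4):
    """pH at which the estimated net charge is zero, found by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positive, so the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


# Hypothetical peptides: one arginine-rich (basic), one rich in Asp/Glu (acidic).
for peptide in ("GRRGRR", "GDDGEE"):
    print(peptide, round(net_charge(peptide, 7.0), 2), round(isoelectric_point(peptide), 2))
```

In this model a basic peptide carries a positive charge at neutral pH and an acidic one a negative charge, which is the property the charge-based solubility argument relies on.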
Most peptide fusion tags are located at the N-terminus of the expressed protein, where they promote correct folding and thus soluble expression by affecting transcription initiation (Seo et al., 2013; Nguyen et al., 2019). In contrast, some studies have shown that poly-lysine or poly-arginine tags fused to the C-terminus of a target protein are more likely to enhance its solubility than tags fused to the N-terminus (Park et al., 2003; Hage et al., 2015; Islam et al., 2015; Nautiyal and Kuroda, 2018). A PCDS tag located at the C-terminus of PcaG may promote PcaHG soluble expression by influencing the electrostatic interaction between the two subunits. In fact, the soluble expression of recombinant proteins is controlled by many factors, including the host organism, the type of expression vector used, codon bias, culture conditions, transcription initiation, mRNA stability, and protein toxicity (Rosano and Ceccarelli, 2014). Inclusion body formation is a common hurdle for the heterologous expression of recombinant proteins, which is usually addressed with solubility-enhancing tags, including protein fusion tags and peptide tags (Paraskevopoulou and Falcone, 2018). Peptide tags that have been developed and applied to enhance heterologous protein expression include FLAG-tags (Einhauer and Jungbauer, 2001), Arg-tags (Sassenfeld and Brewer, 1984; Terpe, 2003), Fh8 and H tags (Costa et al., 2013), and NT11-tags (Table 3; Terpe, 2003; Nguyen et al., 2019; Ki and Pack, 2020). The advantage of these peptide tags for heterologous expression is that their amino acid sequences usually have only 15 residues or fewer and therefore do not affect the structure or activity of heterologous proteins (Kato et al., 2007; Paraskevopoulou and Falcone, 2018; Nguyen et al., 2019; Ki and Pack, 2020). Thus, it may not be necessary to remove these peptide tags for further applications of the proteins, in contrast to the larger protein fusion tags.
In addition, protein and peptide fusion tags are among the most important tools for increasing target protein expression, in terms of both expression efficiency and solubility. Therefore, understanding the physicochemical properties of proteins, including their pI value, net charge, and GRAVY values, can help to select and design appropriate fusion tags. Although many protein and peptide tags have been reported to promote soluble protein expression, some proteins still cannot be expressed in soluble form in heterologous hosts, which calls for further research and development of a broader range of technologies and fusion tags.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
LZ and SL constructed the plasmids, cultured bacteria, and analyzed the data. NL and S-LR analyzed the data. H-JC designed and supervised the study. LZ, SL, JC, JW, DY, and H-JC wrote and revised the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by grants from the National Natural Science Foundation of China (NSFC; grant numbers 31770119, 31400068, and 32070098). The funding organization did not influence the design of the experiments, the analysis and interpretation of data, or the preparation of the manuscript.