Comparative mRNA and LncRNA Analysis of the Molecular Mechanisms Associated With Low Silk Production in Bombyx mori

Naked pupa sericin and Naked pupa are two mutant strains of Bombyx mori with extremely low or no fibroin production compared to the Qiufeng and Baiyu strains, both of which exhibit very high silk fibroin production. However, the molecular mechanisms by which long non-coding RNAs regulate fibroin synthesis need further study. In this study, we performed high-throughput RNA-seq to investigate lncRNA and mRNA expression profiles in the posterior silk gland of Qiufeng, Baiyu, Nd-sD, and Nd silkworms at the third day of the 5th instar. Our efforts yielded 26,767 novel lncRNAs and 6,009 novel mRNAs, the expression levels of silk protein genes and silk gland transcription factors were decreased in Qiufeng vs. Nd-sD and Qiufeng vs. Nd, while those of many genes related to autophagy, apoptosis, RNA degradation, ubiquitin-mediated proteolysis and heat shock proteins were increased. Moreover, the expression of a large number of genes responsible for protein synthesis and secretion was significantly decreased in Nd. GO and KEGG analysis results showed that nucleotide excision repair, mRNA surveillance pathways, amino acid degradation, protein digestion and absorption, ER-associated degradation and proteasome pathways were significantly enriched for the Qiufeng vs. Nd-sD and Qiufeng vs. Nd comparisons. In conclusion, our findings contribute to the lncRNA and mRNA database of Bombyx mori, and the identified differentially expressed mRNAs and lncRNAs help to reveal the molecular mechanisms of low silk production in Nd-sD and Nd, providing new insights for improvement of silk yield and elucidation of silk mechanical properties.


INTRODUCTION
Bombyx mori of Lepidoptera is an ideal and economical model insect for scientific research because of its ease of feeding, high fecundity and short life cycle (Zhu Xs and Xu, 2009). Qiufeng and Baiyu are two economical silkworm strains with large cocoons, high yields and excellent mechanical properties. In contrast, two other strains, the silk yield phenotypic mutants Naked pupa (Nd) (Takei et al., 1984) and Naked pupa sericin (Nd-s D ) (Inoue et al., 2005), exhibit extremely low or no fiber production.
Non-coding sequences were once considered to be the "junk" sequences of the genome because they do not have protein-coding functions (Balakirev and Ayala, 2003). With indepth study of the functions and mechanisms of microRNAs (miRNAs), it has gradually become recognized that noncoding RNAs (ncRNAs) are important regulators of proteincoding genes. NcRNAs are now generally named based on their locations in cells and on their sizes. A large number of ncRNA transcripts were identified in a large-scale sequencing study of the mouse transcriptome; that study also suggested the existence of long non-coding RNAs (lncRNAs) (Okazaki et al., 2002). However, further research was not conducted for a long time due to limitations in functional annotation techniques. The development of high-throughput sequencing technology has made large-scale lncRNA research covering the whole genome possible. LncRNA is a type of RNA transcript that is >200 nucleotides in length and does not encode a protein.
LncRNAs and mRNAs are both transcribed by RNA polymerase II, and their transcription and cleavage mechanisms are very similar; most of these sequences also contain 5 ′ cap and 3 ′ polyA tail structures. However, compared with mRNAs, lncRNAs exhibit relatively low expression, strong tissue specificity, limited homology among sequence families, and poor conservation across species (Derrien et al., 2012).
In recent years, studies have shown that lncRNAs are involved in various biological processes, and the mechanisms of gene expression regulation mediated by lncRNAs can be roughly divided into the following three categories: (1) epigenetic modification, which involves regulation of downstream genes by histone modification (Gong and Maquat, 2011) and chromatin remodeling (Beniaminov et al., 2008;Zhao et al., 2008;Mercer et al., 2009); (2) transcriptional regulation, which involves effects on the transcription of protein-coding genes (upregulation or downregulation of the expression of the genes) via interactions with their promoters or transcription factors (Igor et al., 2007;Zhao et al., 2008;Jing et al., 2010); and (3) posttranscriptional regulation, which involves processing of lncRNAs into small RNAs such as siRNAs (Tiash and Chowdhury, 2018;Tsai et al., 2018) and miRNAs (Yuya et al., 2008;Annilo et al., 2009;Pandey et al., 2018) that function via complementary binding to coding gene transcripts. Recent research on lncRNAs has mainly concentrated on humans and mice. Little research has been performed on insect ncRNAs, especially lncRNAs; furthermore, research focused on small ncRNAs, such as research on the functional annotations of miRNAs and siRNAs, and on the prediction of their target genes is still not very extensive. However, studies have shown that lncRNAs can facilitate multiple biological processes in Drosophila, such as sex determination (Mulvey et al., 2014), male courtship behavior (Chen et al., 2011) and X-chromosome gene regulation (Ilik et al., 2017;Samata and Akhtar, 2018).
Thus far, research on the silk gland development process has mostly concentrated on protein-coding genes. Similar to studies on other species, most Drosophila ncRNA investigations have explored the regulation of gene expression by miRNAs and have focused less on the regulation of gene expression by lncRNAs. In previous studies, Li et al. ( , 2014 identified 189 novel intermediate-size ncRNAs in the silkworm and speculated that some of the ncRNAs were to be involved in the regulation of silk gland development. Two of nine silk gland-enriched ncRNAs, Bm-102 and Bm-159, might play roles through epigenetic modification. A total of 11,810 lncRNAs were identified using RNA-seq, and coexpression network analysis revealed that hub lncRNAs associated with the middle silk gland (MSG) and posterior silk gland (PSG) may function as regulators of the biosynthesis, translocation, and secretion of silk proteins (Wu et al., 2016). The differentially expressed lncRNAs (DELs) in the silk gland are mainly associated with processes related to silk protein translation, which indicates that lncRNAs may participate in the regulation of silk protein production between domestic and wild silkworms (Zhou et al., 2018). Furthermore, lncRNAs play an important role in silk gland-related apoptosis by interacting with other RNAs (Chen et al., 2019).
In the current study, we performed high-throughput Illumina sequencing to systematically identify lncRNAs and differentially expressed genes (DEGs) in the PSG tissues of silkworms with high and extremely low silk fibroin production. Our efforts yielded a large number of silk gland lncRNAs and related target genes and revealed that some of the lncRNAs may serve as precursors of miRNAs. In this study, 26,767 novel lncRNAs and 6,009 novel mRNAs were identified, which identified more lncRNAs than previous RNA-seq studies. and greatly expanded the silkworm lncRNA database. Among them, 4,606 of identified lncRNA transcripts are Nd or Nd-s D strain-specific, which can be used as candidates to study the mechanism of low silk production. We further explored the potential regulatory functions of lncRNAs on the development of PSGs and the biosynthesis of fibroin. Our current study indicated that the function of the silkworm lncRNAs is not only related to fibroin synthesis and apoptosis of silk glands as stated in the previous studies (Wu et al., 2016;Zhou et al., 2018;Chen et al., 2019), but also promote the degradation of silk proteins by regulating RNA/amino acid degradation, the proteasome and proteolysis. Moreover, lncRNA may participate in nucleotide excision repair, heat shock proteins expression, autophagy activation and endoplasmic reticulum stress response to maintain the homeostasis of the PSG cells. Our study described more comprehensively than ever the elements that may affect the development of PSG and silk protein synthesis, as well as the potential regulatory function of lncRNA on these factors. These results enable a better understanding of the molecular mechanisms of PSG development and fibroin synthesis and thus provide a theoretical basis for improvement of the mechanical properties of silk.

Morphology of the PSG and Cocoon
The silk gland is a vital organ for the production of silk fibroin and is divided into three segments according to morphology and function, namely, the anterior silk gland (ASG), the MSG and the PSG. As shown in Figure 1, the PSGs of Nd and FIGURE 1 | Comparison of Qiufeng, Baiyu, Nd-s D and Nd. The dissected silk glands of silkworms at I5D3. The white long scale bars mark the PSG (A). The pupa of Qiufeng, Baiyu, Nd-s D and Nd from left to right (B). The cocoon of Qiufeng, Baiyu, Nd-s D and Nd from left to right (C). Comparison of the weight of the pupa (D) and cocoon shell (E) of Qiufeng, Baiyu, Nd-s D and Nd. Sixteen pupae and cocoons were collected for each species including eight males and eight females, and Student's t-test was performed to analyze their pupa and cocoon shell weight differentials. **P < 0.01. Scale bars, 1 cm in (A-C).
Nd-s D silkworms at the third day of the 5th instar (I5D3) were severely degraded and atrophied, while those of Qiufeng and Baiyu silkworms were long and folded ( Figure 1A). The morphology of the pupa was significantly different in different strains (Figure 1B), the weight of pupa of Qiufeng strain was the heaviest, followed by Baiyu and Nd-s D . However, the pupa weight of Nd was close to that of Qiufeng strain due to the dysfunction of silk protein secretion ( Figure 1D). Moreover, Nd-s D spun only very thin and easily torn cocoons mainly composed of sericin, called sericin cocoons, and Nd failed to spin normal cocoons such that they eventually formed naked pupae ( Figure 1C). The weight of cocoons of different strains was significantly different, among which Qiufeng strain had the heaviest cocoon, followed by Baiyu and Nd-s D ( Figure 1E).

Transcriptome Assembly and LncRNA Identification
To investigate the molecular mechanisms of mRNAs and lncRNAs in PSG development and fibroin synthesis, four cDNA libraries were constructed, and 96,750,098, 96,750,306, 96,749,478 and 96,748,404 clean reads were obtained in the Qiufeng, Baiyu, Nd-s D and Nd libraries, respectively. The clean data were mapped to the reference genome (Bombyx mori) using HISAT,and 76.81,76.57,78.96,and 79.70% of the sequences were successfully mapped for the Qiufeng, Baiyu, Nd-s D and Nd libraries, respectively. These data indicated that the four libraries were of good quality, that the numbers of reads obtained were large, and that most of the reads could be matched to the genome. The raw sequence reads was upload to Sequence Read Archive (SRA) (http://www.ncbi.nlm. nih.gov/) with accession PRJNA634502. In this study, 16,069 SilkDB-annotated mRNAs were all mapped from the assembled transcripts, and a total of 37,531 novel transcripts were identified (Supplementary Table 1). Since a true protein-coding transcript is more likely than a non-coding transcript to harbor a long, high-quality open reading frame (ORF), the coding potential of novel transcripts was predicted with the software programs CPC, txCdsPredict, and CNCI and the Pfam protein database. Herein, 27,589 and 8,291 transcripts were predicted to be isoforms of 26,908 novel lncRNA genes, and 6,009 novel mRNA genes, respectively (Figure 2). Among them, 141 unique transcripts can be compared to the existing lncRNA collected in Wu et al. (2016) and BmncRNAdb (Supplementary Table 2). Finally, 26,767 novel lncRNA genes were obtained. It is worth noting that 4,606 lncRNAs were specifically expressed only in Nd or Nd-s D (Supplementary Table 3).

Genomic Characteristics of the LncRNAs
In order to further study the genomic characteristics of the lncRNAs expressed in PSG tissues, we conducted a comparative analysis of the exon numbers, transcript numbers, and transcript lengths of all lncRNAs and mRNAs identified in this study (Supplementary Figure 1). Unlike the mRNAs, most lncRNAs were single-exon lncRNAs, while a smaller proportion were twoexon lncRNAs. LncRNAs with lengths between 0-500 nt and 500-1000 nt accounted for ∼90% of the total transcripts with lengths shorter than those of mRNAs; thus, the lengths of the lncRNAs were mainly within 1000 nt. Most lncRNAs encoded only one transcript, while some lncRNAs encoded multiple transcripts. In contrast, many mRNAs encoded multiple transcripts. In this study, the lncRNAs had fewer exons, fewer transcripts, and shorter transcript lengths than the mRNAs, similar to the results obtained for lncRNAs in other species. In addition, the lncRNAs exhibited lower expression levels than the mRNAs (Supplementary Figure 1).

Differential Expression Analysis
There were 2,143 DEGs (Supplementary Table 4) and 3,311 DELs (Supplementary Table 5) in the Baiyu library compared with the Qiufeng library. Among these genes and lncRNAs, 954 genes and 1,057 lncRNAs were upregulated, and 1,189 genes and 2,254 lncRNAs were downregulated. A total of 3,403 genes and 6,608 lncRNAs were differentially expressed in the Nd-s D library relative to the Qiufeng library (Supplementary Tables 6, 7). Among these, 1,805 genes and 3,406 lncRNAs were upregulated, including 637 Nd-s D strain-specific genes, and 1,598 genes and 3,202 lncRNAs were downregulated. In addition, 6,865 and 8,203 DEGs and DELs were identified between the Qiufeng library and the Nd library, respectively (Supplementary Tables 8, 9). In the Nd library compared with the Qiufeng library, 5,716 and 5,741 genes and lncRNAs were significantly upregulated, respectively, including 1,055 Nd strain-specific genes, 1,149 and 2,462 genes and lncRNAs were significantly downregulated. Scatter plots were drawn to represent the DEGs and DELs (Figure 3). Furthermore, to determine the primary patterns of gene expression, hierarchical clustering analysis of all DEGs and DELs was further conducted based on the FPKM values using pheatmap (Figure 4).
We further compared the DEGs in Qiufeng vs. Baiyu, Qiufeng vs. Nd-s D , and Qiufeng vs. Nd. The expression of fibroin genes (FibH,FibL,p25) in the Nd strains was significantly downregulated, and the fold change in FibH was >6-fold. The expression of SGF2 and Bmdimm was significantly decreased in Nd-s D , and the expression of Bmdimm was also lower in Qiufeng vs. Nd than in Qiufeng vs. Baiyu, and 23 rRNA genes related to protein biosynthesis were downregulated ( Table 1). Many genes related to RNA degradation, ubiquitin-mediated proteolysis, Hsps, autophagy, and apoptosis were upregulated in Nd. Three upregulated genes related to RNA degradation were identified in the PSG in Nd-s D , and many upregulated DEGs were also identified as being related to ubiquitin-mediated proteolysis, autophagy and apoptosis (Supplementary Table 10).

Analysis of Pre-miRNAs
LncRNAs may function as small RNA precursors; the long transcripts can be sequentially cleaved by Drosha and Dicer into multiple smaller RNAs, and each mature miRNA can perform its own function in specific subcellular structures (Wilusz et al., 2009). Mature miRNAs can act on multiple sites, inhibiting translation processes and affecting gene silencing, so identifying miRNAs and their target genes can aid in understanding of the PSG regulatory process. A total of 1,292 predicted positions on 961 lncRNAs were compared to miRNA precursors and were found to correspond to 183 miRNAs (Supplementary Table 11). Previous studies have confirmed that bmo-miR-275 has negative effects on the expression of SGF1 and BmSer-2, and SGF1 is an important transcription factor regulating the expression of FibH, Ser-1 and P25. Another study on miRNAs in the PSG has predicted that FibH can be targeted by bmo-mir-3334 and that bmo-miR-305 can simultaneously target FibL and SGF1. Notably, 11 lncRNAs were found to have the potential to be processed into mature bmo-mir-2774, which can target SGF3.

LncRNA Functions in Cis-/ Trans-Regulation
The potential target genes of lncRNAs acting in cis/trans were predicted in order to investigate lncRNA functions. A total of 17,874 lncRNAs were screened as potential regulators of 18,048 mRNAs within 10 kb upstream and 20 kb downstream, forming a total of 12,931 lncRNA-mRNA pairs (Supplementary Table 12). We sought to further elucidate the cis-regulatory mechanisms and identified 3,516 pairs of lncRNAs and mRNAs that partially or fully overlapped with each other. The forms of overlap could be divided into ten types. Interaction analysis of complementary lncRNAs-mRNAs was conducted to explore the trans actions of lncRNAs using RNAplex and produced 2,177 lncRNA-mRNA interaction pairs (Supplementary Table 12). The regulation of lncRNAs and mRNAs is not one-to-one, as lncRNAs can regulate one or more mRNAs at the same time; similarly, mRNAs can be regulated by multiple lncRNAs. Further investigation indicated that 811 lncRNAs were located on chromosome 25. Of particular interest was the observation that the fibroin gene Fib H, which has been confirmed to be responsible for the naked pupa phenotype  in a recent study (Hu et al., 2019b) resided within the 10 kb upstream/20 kb downstream of seven lncRNAs. LncRNAs were also screened in the 10 kb upstream and 20 kb downstream of FibL and p25, intimating that synthesis of silk fibroin may be regulated by lncRNAs and adjacent protein-coding genes. In addition, silk gland transcription factors involved in transcriptional regulation of fibroin genes were detected, and three lncRNAs were identified in the 20 kb downstream of SGF1 and SGF2 (Table 2).
We further studied the target genes of DELs in Qiufeng vs. Baiyu, Qiufeng vs. Nd-s D and Qiufeng vs. Nd (Table 3). We detected a large number of upregulated target genes involved in autophagy and apoptosis in both Nd-s D and Nd. Apart from this, two target genes related to RNA degradation and eight target genes related to ubiquitin-mediated proteolysis were differentially upregulated in Nd-s D ; more upregulated target genes were also identified in RNA degradation and ubiquitinmediated proteolysis, and 14 differentially downregulated rRNA genes related to protein biosynthesis were identified in Nd.

GO and KEGG Pathway Enrichment Analysis of DEGs
The DEGs in Qiufeng vs. Baiyu were enriched for 42 GO terms, including the metabolic process, cell, binding and catalytic activity terms. More upregulated DEGs were mapped to the cellular process, metabolic process, binding and biological regulation terms in Qiufeng vs. Nd-s D than in Qiufeng vs. Baiyu. A large number of upregulated DEGs in Qiufeng vs. Nd were enriched not only for the cellular process, metabolic process, binding and catalytic activity terms but also for the biological regulation and response to stimulus terms.
KEGG pathway analysis of the DEGs in Qiufeng vs. Baiyu, Qiufeng vs. Nd-s D and Qiufeng vs. Nd were performed, and the top 20 pathways for the three groups are shown in Figure 5. The enrichment results for Qiufeng vs. Baiyu showed that the DEGs were mapped mainly to the focal adhesion pathway and some signaling pathways, such as the PI3K-Akt signaling pathway, the Rap1 signaling pathway, and the Jak-STAT signaling pathway. A majority of DEGs in Qiufeng vs. Nd-s D were enriched in metabolic pathways, pyrimidine metabolism, nucleotide excision repair (NER) and some signaling pathways involved in the regulation of growth and tissue differentiation. In addition, DEGs were significantly enriched in disease-/cancer-related pathways and immune-related pathways, such as the T/B cell receptor signaling pathway and natural killer (NK) cell-mediated cytotoxicity. Among the top 20 pathways in Qiufeng vs. Nd were the cell cycle, notch signaling, cancer-related and natural killer cell-mediated cytotoxicity pathways, in which many mRNAs were enriched. Numerous DEGs were also matched to NER, base excision repair (BER), mismatch repair (MMR) and mRNA surveillance pathways. DEGs in Nd-s D and Nd were significantly enriched in the proteasome pathway and in pathways related to amino acid (valine, leucine, isoleucine, and lysine) degradation. A large number of DEGs in Nd were also enriched in the protein digestion and absorption (n = 118) and ubiquitin-mediated proteolysis (n = 96) pathways. Notably, protein processing in the endoplasmic reticulum (ER) was a pathway of significant enrichment for both Nd-s D and Nd (Supplementary Table 13).

GO and KEGG Pathway Enrichment Analysis of the DELs
The target genes of the DELs in the PSG in Qiufeng vs. Baiyu, Qiufeng vs. Nd-s D , and Qiufeng vs. Nd were annotated (Supplementary Figure 3). Many downregulated target genes in Qiufeng vs. Baiyu were enriched for the cellular process, metabolic process, cell, macromolecular complex, binding and Ribosomal protein BMSK0008917 Mitochondrial 28S ribosomal protein S14 0.5 0.7 −11.5 catalytic activity terms. The upregulated target genes in Qiufeng vs. Nd-s D were mainly annotated with the response to stimulus, development process, growth, and biological adhesion terms. A considerable number of upregulated target genes in Qiufeng vs. Nd were enriched for the cellular process, metabolic process, response to stimulus, biological regulation, binding and catalytic activity terms. The top 20 enriched KEGG pathways for the target genes in Qiufeng vs. Baiyu, Qiufeng vs. Nd-s D and Qiufeng vs. Nd were shown in Figure 6. The target genes in Qiufeng vs. Baiyu were mainly matched to the focal adhesion pathway and to eight signaling pathways. The pathways in which the target genes in Qiufeng vs. Nd-s D were enriched were roughly divided into three categories: disease, signaling and immunity pathways. Among them, eight disease-related pathways were associated mainly with cancer and hematological diseases, and ten signaling pathways were associated with immune responses, including the T/B cell receptor signaling pathway and the Fc epsilon RI signaling pathway. Notably, another enriched pathway, the estrogen signaling pathway, plays an important role in immune regulation in a variety of cancers and vascular diseases. In addition, the target genes were highly enriched in the NK cell-mediated cytotoxicity and actin cytoskeleton regulation pathways. The target genes in Qiufeng vs. Nd were associated with five diseases. Most of the target genes were enriched in seven signal transduction pathways and in the NK cell-mediated cytotoxicity pathway, which mediates inflammation and the immune response and induces programmed cell death. Finally, we found that a large number of target genes were associated with endocytosis (adjusted p = 0.00003); NER (adjusted p = 0.0015); regulation of autophagy (adjusted p = 0.003); the mRNA surveillance pathway (adjusted p = 0.006); lysine degradation (adjusted p = 0.0003); and valine, leucine and isoleucine degradation (adjusted p = 0.034) (Supplementary Table 14). These results were similar to the enrichment results for the DEGs in Qiufeng vs. Nd.

Expression Profiling of LncRNAs and mRNAs
In this study, RNA-seq was performed to construct RNA libraries, the obtained clean reads were compared to the silkworm genome (SilkDB 3.0) with SOAP, and the transcripts were assembled. In view of the low conservation of lncRNAs across species and tissues, we used three prediction software programs and the Pfam protein database together to predict the coding abilities of the new transcripts in order to reliably distinguish between lncRNAs and mRNAs and to reveal the potential functions of these molecules during PSG development and silk protein synthesis. In this study, 27,589 lncRNAs were compared with previously identified silkworm lncRNAs, and 23,770 novel lncRNAs were identified from the PSG tissues of four silkworm strains. Among them, 4,606 lncRNAs are only expressed in Nd or Nds D . Compared with protein-coding genes, these lncRNAs had significant differences in exon and transcript numbers and in transcript lengths, and the expression levels of the lncRNAs in all four strains were lower than those of the mRNAs. Wang et al. (2017) and Miao et al. (2018) identified lncRNAs in pigs and goats and similarly found that the lncRNAs exhibited fewer exons and transcripts and shorter transcripts than mRNAs. LncRNAs exhibit similar characteristics across different species, indicating that these characteristics may be important for the biological effects of lncRNAs. In this study, the cluster analysis showed that the mRNAs and lncRNAs expressed in Qiufeng, Baiyu, Nds D and Nd were clearly clustered and were classified reasonably well into four clusters. We found that among the mRNAs and lncRNAs, those for Qiufeng and Baiyu clustered closest together; the classification thus indicated that the two strains have high similarity. In contrast, the mRNAs and lncRNAs for Nd formed a separate cluster, indicating that the expression patterns of Nd mRNAs and lncRNAs are significantly different from those of the other three strains. The results of cluster analysis not only intuitively reflected the similarities and differences in PSG mRNAs and lncRNAs among the four silkworm strains but also reflected the objective and reasonable nature of the sequencing data. We further annotated lncRNAs that were assigned to "unknown regions" in the former analysis. If they are located upstream or downstream of a gene, lncRNAs can interact with cis-acting elements and participate in transcriptional regulation.
LncRNAs located in the 3'UTR or downstream of a gene may play other regulatory roles (Carninci et al., 2005). Given that upstream lncRNAs may overlap with promoters or other cisacting elements of a coexpressed gene, thereby regulating gene expression at the transcriptional or post-transcriptional level (Yazgan and Krebs, 2007;Mercer et al., 2009), we carefully classified the overlapping lncRNAs and target genes to help further elucidate the lncRNA-mediated cis-regulation.

Analysis of DEGs and DELs
Studies have shown that the silk mechanical properties and cocoon layers of Qiufeng are better than those of Baiyu, so we hypothesized that there would be DEGs in high-yield silkworm strains compared with lower-yield strains. We identified some specific DEGs with high expression levels in Qiufeng, such as the eukaryotic transcription initiation factors eIF2alpha (BMSK0011652) and eIF2beta (BMSK0003276), glycotransferase (BMSK0005817), and leucine tRNA ligase (BMSK0007808), which play important roles in transcription, protein translation and cell proliferation. In addition, Bmdimm (BMSK0010050) was also significantly upregulated in Qiufeng. These DEGs may enhance the ability of Qiufeng to synthesize silk proteins through transcription-and metabolism-related pathways.
There were fewer DEGs and DELs in the Qiufeng vs. Baiyu comparison than in the other two comparisons. Interestingly, many of the identified mRNAs and lncRNAs were expressed in only Nd or Nd-s D , which suggests that mickle mRNAs and lncRNAs are species-specific. These RNAs could regulate the development of the silk gland through specific spatiotemporal expression and in turn affect the yield and quality of silk.
The DEG analysis showed that the expression of silk proteinencoding genes and transcription factors was significantly downregulated in Nd, suggesting that the formation of naked pupae may be related to a significant decrease in the overall silk protein synthesis capacity. Studies have shown that SGF2 is an important transcription factor involved in the regulation of FibL expression (Ohno et al., 2013;Kimoto et al., 2014), and    the expression levels of SGF2 and Bmdimm were significantly reduced in Nd-s D in the current study, indicating that the FibL synthesis ability of Nd-s D is weakened. The sericin cocoon formation of Nd-s D may be related to the decreased transcription factor activity. In addition, lncRNAs located within a range of 10 kb upstream/20 kb downstream of silk protein-encoding genes and silk gland transcription factors were all downregulated in Nd. The expression of LTCONS_00013360, which has a potential cisregulatory effect on FibL, was also significantly downregulated in Nd-s D . These associations suggested that these lncRNAs may downregulate the biosynthesis of silk fibroin. Our analyses revealed that many DEGs and target genes of DELs related to ubiquitin-mediated proteolysis were upregulated in both Nd-s D and Nd. Nd-s D and Nd synthesize a large number of abnormal proteins due to mutations in FibL and FibH. Ubiquitin-mediated proteolysis is a necessary process for targeted degradation of most short-lived proteins in eukaryotic cells, including cell cycle regulatory proteins and proteins that are not properly folded in the ER, and timely degradation of cell cycle regulatory proteins is important for regulating cell division (Amm et al., 2014). We identified many upregulated DEGs and target genes of DELs that were related to RNA degradation, the proteasome and proteolysis. In addition, the expression of ribosomal proteins involved in protein synthesis and secretion was significantly decreased. These findings indicate that lncRNAs not only exert potential negative regulatory effects on fibroin synthesis but also promote protein degradation. mRNA molecules and ribosomal proteins fulfill indispensable functions in the protein synthesis process; for example, the ribosomal protein S9 is required in the early steps of ribosome biogenesis and is important for silk protein synthesis (Li et al., 2017). mRNA degradation, proteolysis and downregulation of the expression of large amounts of ribosomal proteins will greatly reduce silk production. Apart from the above results, we found that multiple Hsp genes related to the stress response were upregulated in Nds D and Nd, indicating that the PSG cells of Nd-s D and Nd are under stress and that Hsp is essential for homeostasis of the PSG cell environment, protein synthesis, transport, and folding. Unexpectedly, many DEGs and target genes of DELs related to autophagy and apoptosis were also identified in Nd-s D and Nd. However, PSG cell autophagy and apoptosis generally occur in the metamorphic stage; with the discharge of silk fibroin, the PSG gradually shrinks during the spinning process and eventually degrades. Premature activation of the two programmed cell death mechanisms (autophagy and apoptosis) likely causes the PSG tissue to dissociate and disappear earlier than normal, directly affecting the synthesis and secretion of silk protein.

GO and KEGG Pathway Enrichment Analysis
We performed GO and KEGG pathway enrichment analysis of the DEGs and DELs. DEGs enriched for basic cell processes such as cell metabolism, catalysis, and biological regulation were mainly downregulated in Baiyu compared with Qiufeng, indicating that Baiyu's physiological activity is weakened. In contrast, a large number of genes enriched for these biological processes were upregulated in Nd-s D and Nd, indicating that Nd-s D and Nd have enhanced metabolic activity but that this enhanced metabolic activity is not used to increase silk protein production. Similar enhancement of metabolism in the low-yield strain ZB is used not for the synthesis of silk protein but rather for growth and death (Wang et al., 2016).
KEGG pathway analysis of the DEGs in Qiufeng vs. Baiyu revealed that major genes were enriched in some signaling pathways and that other genes were involved in the regulation of basic cellular processes. The DEGs in Qiufeng vs. Nd-s D and in Qiufeng vs. Nd were mainly enriched in three metabolic pathways associated with cell growth; proliferation; and survival, diseases and immunity. These findings suggest that the mutant strains likely use metabolic modulation to maintain homeostasis and survive. The atrophy and degeneration of the PSG observed in Nd-s D and Nd actually result from malignant transformation. Some previous studies have confirmed that accumulation of damaged proteins caused by mutation/misfolding/aggregation disrupts cell homeostasis and endangers survival under severe stress conditions (Buchberger et al., 2010;Michalak, 2010). Since Nd and Nd-s D are silk fibroin-deficient mutants, many more target genes were mapped to immune/stress-related pathways than to other pathways; differential regulation of such genes enabled the mutants to cope with their silk fibroin deficiency. NK cells participate in early defenses against various forms of stress, such as malignant transformation, among autologous cells. NK cells release cytotoxic granules onto the surfaces of bound target cells, and the effector proteins contained in these granules penetrate the cell membrane and induce programmed death of PSG cells. More DEGs were associated with NER and MMR in Qiufeng vs. Nd-s D and in Qiufeng vs. Nd than in Qiufeng vs. Baiyu. However, BER enrichment was observed only for DEGs in Qiufeng vs. Nd, which implies that severe damage to PSG cells occurs in Nd. These enzymatic pathways are the main cellular mechanisms by which various structural abnormalities (DNA damage) are identified and correct and by which damage is eliminated (Sancar and Reardon, 2004;Li, 2008;Coey and Drohat, 2017). In addition, the mRNA surveillance pathway is a QC mechanism for detection and degradation of abnormal mRNA molecules that includes non-sense-mediated mRNA decay (NMD), non-stop mRNA decay (NSD) and no-go decay (NGD) processes. Among these processes, NMD and NGD were activated in Nd. Target genes were only significantly enriched in the NER and mRNA surveillance pathways in Qiufeng vs. Nds D and Qiufeng vs. Nd. It has been demonstrated that the Nd and Nd-s D phenotypes are caused by deletion of exons of FibH and FibL, respectively. Failure of elimination of damaged DNA or abnormal mRNA molecules can lead to mutagenesis, cancer and cell death, and lncRNAs can participate in activating NER and the mRNA surveillance pathway to eliminate intracellular DNA and mRNA damage. Therefore, we speculate that the PSG cell damage in Nd-s D and Nd directly hinders the synthesis and secretion of silk proteins. The results suggest that lncRNAs expressed in Nd exhibit certain functions that downregulate silk protein synthesis.
It is worth noting that target genes in Nd were also enriched for processes related to degradation of lysine, valine, leucine and isoleucine. These amino acids are indispensable constituents of silk fibroin. We analyzed the amino acid sequence of FibH and found that the four amino acids valine, leucine, isoleucine and lysine accounted for 24.5% (37 aa/151 aa) of the N-terminal domain of FibH. Silkworms synthesize large amounts of silk proteins on the third day of the fifth instar, but a large number of DEGs were enriched in amino acid degradation, protein digestion/absorption, and proteasome pathways in the mutants, suggesting that the synthesized proteins were continuously degraded.
Accumulation of unfolded or misfolded proteins in the ER can cause ER stress. Eukaryotic cells have many different mechanisms to deal with accumulation of unfolded proteins in the ER, such as the unfolded protein response (UPR) (Kudo et al., 2002). Nds D alleviates ER pressure by activating the proteins activating transcription factor 6 (ATF6) and JUN N-terminal kinase (JNK) in the UPR reaction to initiate apoptosis. On the one hand, Nd copes with ER stress through the actions of ATF6 and inositol-requiring protein 1 (IRE1). On the other hand, misfolded proteins activate the ER-associated degradation (ERAD) system, which ubiquitinates abnormal proteins that are transferred from the ER to the cytoplasm; the ubiquitinated proteins are then degraded by the proteasome. Similarly, activation of ERAD in Nd-s D helps the mutants to deal with ER stress (Figures 7A-C). We further studied DEGs in the proteasome pathway and found that the activity of regulatory particles and core particles (20S proteasome) was significantly enhanced, indicating that proteasome pathways are activated in Nd-s D and Nd (Figures 7D-F). Additionally, the upregulated target genes of the DELs in Nd were also enriched in the ERAD and proteasome pathways (Supplementary Figure 4), implying that lncRNAs participate in misfolded protein degradation through ERAD and the 26s proteasome to release the pressure on the ER caused by the continuous aggregation of synthesized proteins. ER stress is a prominent feature of many neurodegenerative diseases related to protein misfolding and aggregation, including amyotrophic lateral sclerosis (ALS), Parkinson's disease and Alzheimer's disease (Matus et al., 2011). Other previous studies have revealed consistent results indicating that DEGs in Bombyx mori with low silk yield are enriched in neurodegenerative disease pathways (Wang et al., 2014;Hu et al., 2019a). We hypothesize that the ER is under long-term stress due to the continuous accumulation of misfolded FibH, which causes irreversible damage to PSG cells. Nd PSGs gradually atrophy and degenerate, which ultimately leads to naked pupa formation and loss of the ability to synthesize and secrete silk proteins.

CONCLUSION
The potential lncRNA-and mRNA-mediated regulation underlying the low silk production in Nd-s D and Nd may be mainly related to four molecular mechanisms. First, decreases in the expression of silk protein genes and their transcription factors may cause the generation of sericin cocoons in Nd-s D . These genes are more downregulated in Nd than in Nd-s D , and their downregulation eventually leads to naked pupa. Second, the expression of autophagy and apoptosis-related genes is upregulated in Nd-s D and is more significantly upregulated in Nd. mRNAs and lncRNAs activate autophagy and apoptosis in the PSG through cis-regulation and promote dissociation and death of PSG cells. Third, decreases in the levels of genes encoding ribosomal proteins impair protein synthesis in Nd, and the numerous upregulated DEGs and target genes associated with RNA/protein degradation in Nd-s D and Nd indicate that proteins are continuously digested and degraded. Fourth, the truncated FibL and FibH produced by Nd-s D and Nd activate the UPS and ERAD to transfer abnormal proteins from the ER to the cytoplasm, where they are ubiquitinated. These proteins are subsequently degraded by the proteasome. The ER pressure caused by the constant accumulation of mutant FibL and FibH greatly hinders the ability of Nd-s D and Nd PSG cells to synthesize and secrete silk proteins.

Silkworm Strains and Samples
Domesticated silkworm strains with various quality and yield characteristics were selected in this study to enhance construction of lncRNA libraries and to eliminate bias due to strain specificity. The mutant strains Nd and Nd-s D (Mori et al., 1995;Inoue et al., 2005) and the wild-type strains Qiufeng and Baiyu were reared on fresh mulberry leaves under standard conditions until the third day of the fifth instar. For better lncRNA profiling and eliminating strain-specific effects, the tested samples of Qiufeng and Baiyu each contain 10 PSG tissues from different silkworms, Nd and ND-s D contain 40 PSGs from different silkworms. Each sample contained many silkworm individuals, which helps to minimize the biological error as much as possible. The PSGs were then dissected from the silkworms, washed with precooled 0.7% NaCl and stored in liquid nitrogen until further analysis.

RNA Isolation, LncRNA Library Preparation, and Illumina Sequencing
The PSGs of all individuals of each strains were mixed together separately, and then ground into powder as a pool for RNA extraction. Total RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions, with an average concentration of 3391.7 ng/µl. The RNA concentration and quality were measured using a Nano Drop 2000 (Thermo Scientific, Wilmington, DE, USA). Agarose (1%) gel electrophoresis was performed to confirm the integrity of the extracted total RNA, which was then stored at −80 • C for future use. All 4 samples had an RNA integrity number (RIN) value >9.0 (Supplementary Table 15).
For lncRNA-seq, a total amount of 3 µg of RNA per sample was used as input material for RNA library construction. After extracting the total RNA, the mRNAs and ncRNAs were enriched by removing rRNA from the total RNA with a kit. The mRNAs and ncRNAs were fragmented into short segments (approximately 200∼500 nt) with fragmentation buffer, and first-strand cDNA was then synthesized with random hexamer primers using the fragments as templates. dTTP was replaced with dUTP during synthesis of the second strand. The short fragments were purified and resolved with EB buffer for end reparation and single adenine (A) nucleotide addition. After that, the short fragments were connected with adapters, and the second strands were degraded using uracil-N-glycosylase (UNG) (Parkhomchuk et al., 2009). After agarose gel electrophoresis, suitable fragments were selected for PCR amplification as templates. An Agilent 2100 Bioanalyzer and an ABI StepOnePlus Real-Time PCR System were used for quantification and qualification of the sample libraries in the quality control (QC) steps. Finally, the libraries were sequenced using an Illumina HiSeq TM 2000.

Quality Analysis, Mapping, and Transcriptome Assembly
We first compared the reads to the ribosomal database using the short reads alignment tool Short Oligonucleotide Analysis Package (SOAP) , allowing up to 5 mismatches. We removed the reads aligned with ribosomal sequences and used the retained data for subsequent analysis. Clean reads were obtained by removing reads with adapters, reads containing poly-N sequences (N > 10%), and low-quality reads whose numbers of bases with mass values (Q-values) ≤ 10 accounted for more than 50% of the total reads. The filtered data were aligned to the silkworm genome (https://silkdb.bioinfotoolkits.net) using the software HISAT2 (v2.0.4, parameters: -phred64 -sensitive -nodiscordant -no-mixed -I 1 -X 1000) (Kim et al., 2015). Then, the reads were assembled with StringTie (v1.0.4, parameters: -f 0.3 -j 3 -c 5 -g 100 -s 10000 -p 8) (Pertea et al., 2015). All the reconstructed transcripts were compared with known mRNA and lncRNA sequences to obtain information about their positional relationships. Regarding the assembly results for a single transcript, we required each transcript to meet the following requirements: (1) a fragments per kilobase of exon per million mapped fragments (FPKM) value ≥0.5, (2) a coverage value >1, and 3) a length>200. Classification and statistical analysis of the lncRNAs were conducted with the NONCODE database (Bu et al., 2011). Finally, we used Cufflinks gffread software (v2.2.1, parameter: -p 12) to generate the transcript sequence assembled by StringTie (Pertea et al., 2015), and the merged transcripts were retained for subsequent analysis.

Sequence Analysis and LncRNA Identification
After transcript reconstitution, the coding abilities of the transcripts were predicted to distinguish between mRNAs and lncRNAs using four software programs, the Coding Potential Calculator (CPC) (Lei et al., 2007), txCdsPredict, and the Coding-Non-Coding Index (CNCI) (Liang et al., 2013), and the Pfam database (Finn et al., 2015). All three prediction software programs scored the coding abilities of the transcripts and then set a scoring threshold to distinguish between lncRNAs and mRNAs. Transcripts that could be compared to the Pfam protein database were considered to be mRNAs, while the others were defined as lncRNAs. Only when the results of at least three of the four judgment methods were consistent did we consider that a transcript was definitively an mRNA or a lncRNA. The scoring thresholds for the three software programs were as follows: CPC, 0 (transcripts with a value >0 were considered mRNAs, and those with a value <0 were considered lncRNAs); CNCI, 0 (transcripts with a value >0 were considered mRNAs, and those with a value <0 were considered lncRNAs); and txCdsPredict, 500 (transcripts with a value >500 were considered mRNAs, while those with a value <500 were considered lncRNAs). All predicted lncRNAs were aligned to lncRNAs collected in BmncRNAdb and previous study (Wu et al., 2016;Zhou et al., 2018). We introduced NCBI Blast tool for the comparison which was based on sequence similarity (Parameters: blastall -e 1e-5 -p blastn -a 8 -d xx.fa -i xx.fa -o blast). Only those blast hits with %identity >90% and % query_coverage >85% were considered to be already annotated transcripts.

Analysis of DEGs
The expression levels of both lncRNAs and coding genes in each sample were calculated in FPKM values using RSEM (Li and Dewey, 2011). The formula used to calculate FPKM was as follows: FPKM = 10 6 C NL/10 3 We defined the expression level of gene A as FPKM(A). In the equation, C is the number of unique fragments aligned to gene A, N is the total number of unique fragments aligned to the reference genome, and L is the number of bases in the coding region of gene A. The FPKM method can eliminate the influences of differences in gene length and sequencing amounts on the calculation of gene expression. The calculated gene expression can be used to directly compare the expression of genes between different samples. The PoissonDis method (Audic and Claverie, 1997) was used to analyze the DEGs and DELs in the Qiufeng vs. Baiyu, Qiufeng vs. Nd-s D and Qiufeng vs. Nd comparisons with filter parameters of a fold change ≥ 2.00 and a false discovery rate (FDR) ≤ 0.001.

Analysis of LncRNA-miRNA Interactions
Given that lncRNAs may function as precursors of miRNAs, the lncRNA sequences were aligned to miRNAs published in miRBase (Griffiths-Jones et al., 2006) (http://www.mirbase. org/) in order to identify the known miRNA precursors and their secondary structures using BLASTN with a hit coverage threshold ≥90%. In addition, the interactions between lncRNAs and miRNAs were predicted using the Support Vector Machine (SVM)-based software miRanda (Wu et al., 2011).

LncRNA Functions in Cis-/Trans-Regulation
The functions of lncRNAs can be reflected through analysis of the genes that are cis-and trans-regulated by the lncRNAs. Cisregulation refers to the interaction of lncRNAs with neighboring target genes located either upstream or downstream. In this study, the 10 kb upstream and 20 kb downstream of the lncRNAs were investigated for potential coding genes. LncRNAs may play a trans-acting roles by binding to the sense strands of mRNA molecules. To further reveal the potential antisense lncRNA-mRNA interactions, we searched all antisense lncRNA-mRNA duplexes with complementary base pairing using RNAplex (Carrieri et al., 2012) with the ViennaRNA package, which predicts the best base-base pairing by using a minimum free energy algorithm according to thermodynamic features.

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Analyses
Using Blast2GO software, GO analysis of the DEGs and target genes of the DELs was performed based on Nr annotation information deposited in NCBI (https://www.ncbi.nlm.nih.gov/). For pathway enrichment analysis, a hypergeometric test was used to identify pathways that were significantly enriched for the DEGs compared to the entire genome according to the KEGG (http://www.genome.jp/kegg/). To evaluate whether the candidate genes were enriched for a certain GO/KEGG pathway, we used a hypergeometric model to calculate the degrees of enrichment of the candidate genes; a smaller Pvalue was associated with greater enrichment of a gene in the corresponding pathway. Then, we performed FDR correction of the P-values, and a corrected P value <= 0.01 was set as the threshold for significant enrichment. Each P-value was calculated as follows:

DATA AVAILABILITY STATEMENT
The raw sequence reads has been uploaded to Sequence Read Archive (SRA) (http://www.ncbi.nlm.nih.gov/) with accession PRJNA634502.

AUTHOR CONTRIBUTIONS
BZ designed, supervised, and coordinated the study. JR, MW, XY, SZ, JL, LY, and ZY performed the study, analyzed, and interpreted the data. BZ and JR were involved in analyzing and processing the sequencing data. JR wrote the manuscript with the help of BZ. All authors read and approved the final manuscript.