Digital RNA Sequencing of Human Epidermal Keratinocytes Carrying Human Papillomavirus Type 16 E7

High-risk human papillomavirus (HPV) infections are the predominant cause of cervical cancer and its early gene E7 plays an important role in cellular proliferation and cell-cycle progression. While tremendous progress has been made in exploring the molecular mechanisms in late tumorigenesis, many pathways showing how HPV deregulates host gene expression in early inapparent infections and early tumorigenesis still remain undefined. Digital RNA sequencing was performed and a total of 195 differentially expressed genes were identified between the HPV16 E7-transfected NHEKs and control cells (p < 0.05, fold-change > 2). GO enrichment showed that HPV16 E7 primarily affected processes involved in anti-viral and immune responses, while KEGG pathway analysis showed enrichment of gene clusters of associated with HPV infection and MAPK signaling. Of the differentially expressed genes, IFI6, SLC39A9 and ZNF185 showed a strong correlation with tumor progression and patient survival in the OncoLnc database while roles for AKAP12 and DUSP5 in carcinogenesis and poor prognosis have previously been established for other cancer types. Our study identified several novel HPV16 E7-regulated candidate genes with putative functions in tumorigenesis, thus providing new insights into HPV persistence in keratinocytes and early onset of tumorigenesis.


INTRODUCTION
Cervical cancer ranks fourth globally among the most abundant female malignancies, with about 530,000 new cases and 270,000 deaths each year. It has been estimated that 85% of cervical cancer deaths occur in underdeveloped or developing countries, and mortality rates in low-and middle-income countries are 18 times higher than those in richer countries (Small et al., 2017). It is well-established that cervical cancer is associated with persistent human papillomavirus (HPV) infections and high-risk HPV types can be detected in 99.7% of the cervical cancer cases (Tummers and Burg, 2015).
Human papillomavirus are a family of small, non-enveloped viruses containing a circular double-stranded DNA genome. Six early genes (E1, E2, E4, E5, E6 and E7) play distinct roles in viral replication and cell transformation. E6 and E7 have shown to interact with various host proteins and carry out many modulatory functions (Moody and Laimins, 2010) as well as to block negative regulators of the cell cycle within the infected cell. The majority of HPV infections are asymptomatic and self-resolving. Once HPVs evade host immune defenses and establish persistence in basal keratinocytes, genetic alterations will accumulate and cells infected with high-risk HPV types can ultimately be driven into invasive cancer cells (Crosbie et al., 2013;Westrich et al., 2017) and early unapparent infections will finally develop into late-stage clinical neoplasm. However, it is still unclear which pre-cancerous lesions might progress (Schiffman et al., 2016).
Among high-risk HPVs, the most common type is HPV16 (Smith et al., 2007) and it has been shown that E7 plays an important role in the process of cellular proliferation and cell-cycle progression (Moody and Laimins, 2010). HPV16 E7 promotes the degradation of retinoblastoma protein (Rb), which leads to oncogenic transformation (Boyer et al., 1996). HPV displays a high affinity for mucosal and cutaneous epithelia and only infects basal proliferating keratinocytes (Doorbar, 2006). In addition, the HPV infectious life cycle is closely restricted by the differentiation state of the infected host cells (Doorbar et al., 2012), suggesting that primary human keratinocytes are the natural host for HPV infection. Therefore, primary normal human epidermal keratinocytes (NHEKs) are considered the ideal cell model for studying the mechanisms of HPV infection.
Digital RNA sequencing is a high-throughput sequencing method that is able to remove amplification biases. When constructing a sequencing library, PCR amplification steps are usually required to reach sufficient yields. However, targetsequence-dependent preferences during PCR amplification may lead to different amplification multiples of each target sequence. What is generated by PCR amplification is called duplication (Aird et al., 2011). The existence of duplication in traditional RNA sequencing will cause deviation between the quantitative results of subsequent gene expression and the true abundance of the sequence. However, during digital RNA sequencing molecular tags (UMI; unique molecular identifier) are added to unique mRNA transcripts, which prevent duplication and improves the accuracy of transcriptome quantification.
In this study, HPV16 E7-transfected NHEKs were investigated by digital RNA sequencing to identify novel genes, functions and signal transduction pathways to further explore E7-dependent mechanisms involved in HPV16 infection.

Cell Culture and Cell Transfection
HPV16-containing plasmid DNA (ATCC:45113D) was used as the template for PCR amplification and cloning of the HPV16 E7 gene into vector pEGFP-C1 following protocols described previously. NHEKs (ScienCell, Carlsbad, CA, United States) were cultured in 6-well plates using EpiLife culture medium (Gibco, United States) with 10% fetal bovine serum (Gibco, United States) and Human Keratinocyte Growth Supplement (Gibco, United States). NHEKs at 80% confluence were transfected with pEGFP-16E7 and pEGFP-C1 using Lipofectamine 3000 reagent (Invitrogen, Carlsbad, CA, United States). Six hours after Transfection, culture medium was replaced with fresh culture medium.

RNA Isolation and Digital RNA Sequencing
Total RNA was isolated and purified using TRIzol reagent (Invitrogen, LifeTechnologies, United States) following the manufacturer's procedures. RNA yields and sample purity were quantified using NanoDrop ND-1000 (NanoDrop, Wilmington, DE, United States). Digital RNA sequencing was conducted by LC-BIO Technologies Co., Ltd (Hangzhou, China). Each fragment was tagged with UMI tags (Islam et al., 2014). In the process of amplification, the products sharing the same UMI tag were merged and incorrect bases were corrected by multiple sequence alignment, thus effectively removing the duplication and retaining the true gene expression (Shiroguchi et al., 2012;Pflug and von Haeseler, 2018).

Cluster Analysis of Differentially Expressed Genes
By grouping genes with the same or similar expression patterns, the function of unknown genes or unknown functions of known genes can be identified. In order to visualize the gene clustering expression pattern, the Fragments Per Kilobase of transcript per Million mapped reads (FPKM) value of the differential gene was transformed with the Z value (see the formula below) to draw a clustering diagram.

Gene Ontology (GO) Enrichment Analysis
Gene Ontology annotation is divided into three primary categories: Biological process, Cellular component and Molecular function, and explains the biological role of genes from different perspectives. GO functional significance enrichment analysis was firstly mapped on all significant differentially expressed genes to each term of the Gene Ontology database, then the number of genes in each term was calculated and a hypergeometric test was applied to find significantly enriched GO items compared with the whole genomic background. The histogram and scatterplot of GO enrichment analysis results reflected the number distribution of differentially expressed genes on the GO term enriched in Biological processes, Cell components and Molecular functions.

KEGG Pathway Enrichment Analysis
In vivo, different genes coordinate with each other to perform their biological functions. KEGG pathway enrichment analysis can determine the most important biochemical metabolic pathways and signal transduction pathways in which differentially expressed genes participate. The abundant pathway information in the KEGG database helps to understand the biological functions of genes from a systematic level, such as metabolic pathways, transmission of genetic information and some complex biological functions such as cell processes.

Immunofluorescence Assays
HPV16 E7-transfected and control NHEKs were prepared for immunofluorescence assays at 80% confluence. Cells were washed with PBS and fixed with 4% paraformaldehyde (Solarbio, Beijing, China) for 20 min at room temperature. The fixed cells were washed and incubated in 0.5% (v/v) Triton X-100 (Solarbio, Beijing, China) for 10 min. Subsequently, cells were blocked with 10% goat serum in 1% bovine serum albumin (BSA) for an hour at room temperature and incubated overnight at 4 • C with rabbit polyclonal anti-HPV16 E7 antibody (1:800 dilution), which was prepared by our laboratory previously (Zheng et al., 2016). Cells were subsequently washed and incubated with Alexa Fluor 594conjugated secondary donkey anti-rabbit IgG (1:400 dilution, Yeasen, Shanghai, China) for an hour at 37 • C in a dark box. Finally, the cell nuclei were stained with 4,6-diamidino-2phenylindole (DAPI, 1:1000 dilution, Abcam) for 10 min at room temperature. Cells were observed under a fluorescence microscope (Olympus cellSens 2.3).
Quantitative Real-Time PCR Analysis cDNA was generated from the total RNA using the ReverTra Ace R qPCR RT Master Mix (TOYOBO, JAPAN). TB Premix Ex TaqTM (Takara, Japan) was used for quantitative real-time PCR analysis according to the manufacturer's protocols. Each reaction was run in triplicate. The relative gene expression levels were normalized with human beta-actin as internal reference following the 2-CT method. Primer sequences are provided in Supplementary Table S1.

Survival Rate Analysis
OncoLnc public database 1 is a comprehensive interactive tool which contains patient survival data of 21 cancer types from TCGA projects. The patients were divided into 2 subgroups according to gene expression levels by setting the lower and higher percentile to 20. In this study, we analyzed the relationship between specific gene expressions and overall survival in cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC). The patient information including Case ID, Days to last follow up, Status, Expression Group, race, age at diagnosis and primary diagnosis are shown in the Supplementary Table S2.

Digital RNA Sequencing of HPV16 E7-Transfected NHEKs
NHEKs were transfected with pEGFP-16E7 as the experimental group and with pEGFP-C1 as the control group, and E7 expression was confirmed by immunofluorescence (Figure 1). To identify novel E7-dependent genes and pathways involved in HPV16 infection, transfected cells were analyzed by digital RNA sequencing. Differentially expressed genes were selected based on the criteria "p < 0.05" and "[logFC] > 1, " which resulted in the identification of 195 significantly differentially expressed genes between the experimental and control group, including 62 up-regulated genes and 133 downregulated genes in the experimental group. A number of differentially expressed genes between the HPV16 E7-expressing NHEKs and control cells were visualized in a heat map (Figure 2), which showed overall good consistency between replicates. The complete list of all 195 genes is shown in Supplementary Table S3.

GO Enrichment Analysis
GO annotation analysis was used to analyze the 195 significantly differentially expressed genes to explore their potential biological function. GO terms were classified into three primary categories: Biological process, Cellular component and Molecular function (Figure 3). In biological process, most of the GO terms are related to virus infection and immune response. In the cellular component category, membrane, nucleus and cytoplasm are the top three components among others and we assume that HPV infection is a comprehensive process with gene pattern change happening in the whole cell. The greatest change in gene expression clustered in molecular function is protein binding, which may indicate the importance of protein-protein interactions during HPV infection. The 20 GO terms with the most significant p-value were displayed in a scatter plot representing the rich factor (ratio of the number of differentially expressed genes to the total number of genes in the GO term or pathway) and number of FIGURE 1 | Verification of E7 expression in HPV16 E7-transfected NHEKs. NHEKs were transfected with the E7-expression vector pEGFP-16E7 or the control vector pEGFP-C1. E7 was detected by immunostaining (red) and the cell nucleus was stained with DAPI (blue).
differentially expressed genes (Figure 4). It appears that HPV16 E7 induced strong anti-viral immune responses, since most differentially expressed genes were related to a variety of direct virus-related responses, such as response to virus (GO: 0009615), defense response to virus (GO:0051607), detection of virus (GO: 0009597), double-stranded RNA binding (GO: 0003725), and cellular response to exogenous dsRNA (GO: 0071360), and to immune responses, such as innate immune response (GO: 0045087), immune system process (GO: 0002376), and type I interferon signaling pathway (GO: 0060337). Furthermore, approximately one third of the differentially expressed genes seem to encode proteins with a membranerelated function and proteins displaying protein-or metalbinding activity.

KEGG Pathway Enrichment Analysis
KEGG pathway enrichment analysis identifies pathways that are particularly abundant among the differentially expressed genes. The 20 KEGG pathways with the most significant p value were plotted in a scatter plot (Figure 4), which showed that the HPV infection (ko05165), with seven differentially expressed genes (ATP6V0A4, COL9A3, FZD8, HES2, ITGB3, OASL, and PPP2R2C), and MAPK signaling pathway (ko04010), also with seven differentially expressed genes (DUSP1, DUSP3, DUSP5, HSPA1A, HSPA6, IL1B, and PLA2G4D) were the most significant pathways. Furthermore, several pathways associated with other viral or intracellular bacterial infectious diseases showed numerous differentially expressed genes, such as Measles (ko05162), Influenza A (ko05164), Tuberculosis (ko05152) and Legionellosis (ko05134). However, the association of these pathways might in part be explained by overlapping genes, since IL1B, HSPA1A, and HSPA6 belong to almost all of these KEGG pathways. Nevertheless, it appears that transfection with only HPV16 E7 already induced a specific response against HPV and other intracellular viral or bacterial pathogens.

Correlation of Differentially Expressed Genes With Cancer Survival
The relation between the 23 differential expressed genes and cancer survival was analyzed using OncoLnc public database (Anaya, 2016). Six genes were identified from the surviving analysis to have a significant impact on cancer survival (Figure 6). HPV16 E7-transfected cells showed reduced expression of IGFBP3, TFRC and ZNF146, however, lower expression of these genes was correlated with increased survival. In contrast, SLC39A9 and IFI6 were induced in HPV16 E7-transfected cells, while for these genes high expression was correlated with reduced survival. Finally, ZNF185 expression was reduced in HPV16 E7-transfected cells, while low expression is correlated with reduced survival. Thus far, these genes have not yet been studied in relation with HPV infections.

AKAP12 and DUSP5 Protein Expression
In our previous study (Zhang et al., 2018), we used keratinocytes to overexpress E7 from HPV type 6b and 16 and identified that AKAP12 was downregulated, which is consistent with AKAP12 expression in the current study. Also, KEGG pathway enrichment analysis showed that the MAPK signaling pathway was the most clustered signaling pathway and among the seven genes enriched in MAPK signaling pathway low DUSP5 expression has already been clearly established as a correlate for carcinogenesis and poor prognosis. Therefore, protein expression levels of AKAP12 and DUSP5 were further verified by Western analysis. Indeed, protein levels were significantly reduced in HPV16 E7-transfected cells compared with the control cells (Figure 7).

DISCUSSION
In this study, NHEKs were transfected with a high-risk HPV type 16 E7-expression vector or control vector and expression profiles were analyzed by digital RNA sequencing. Inevitable random errors in the sequencing process and automated library construction which may be amplified to a certain extent, as well as the relatively low transfection efficiency of primary keratinocytes cause the variability between the intra-groups. In addition, our study focuses on early changes of host cell with HPV 16E7, since relatively low transfection efficiency may reflect natural infection, further selection is not included considering it may maximize or minimize some effects of related genes. As a result, we identified 23 potential candidates among 195 differentially expressed genes. Nevertheless, it is well acknowledged that the inhibition of Rb family which controls the activity of E2F transcription factors for G1-S cell cycle by E7 is important for cellular proliferation and progression to precancer (Moody and Laimins, 2010), the missing of some cell cycle genes such as p16 in the list may mainly due to our transient transfection method which leads to a higher proportion of normal keratinocytes comparing with HPV 16E7 positive cells thus minimizing the effects of these genes. What's more, since no human cancer arises as the acute consequence of infection, the latency periods between infected cells and invasive cancer cells are frequently in the range of 15-40 years (Zur, 2009), it is necessary to further study the role of E7 in HPV-induced persistence and the antiviral defense of host cell in escaping the malignant transformation. Integrated analysis of the differentially expressed genes showed that nine of the 23 consistent differentially expressed genes were related to immunity. Further studies of virusmediated immune dysregulation would also be critical for preventive and therapeutic tools of elimination of virus-infected cells. IFIT1, IFIT2, IFIT3 and IFI44 are all interferon-induced proteins that were up-regulated in HPV16 E7-transfected cells, which is consistent with previous studies on HPV infection  (Boccardo et al., 2010;Kaczkowski et al., 2012;Ambuhl et al., 2017;Klymenko et al., 2017). These genes are commonly regarded as viral restriction factors and those immune alterations are critical for the prevention of HPV-infected cells during cancer progression. The up-regulated gene OASL is also involved in interferon gamma and cytokine signaling. A recent study showed that OASL expression is relatively high in HPV positive cervical cancer, and seemed to be related with resistance to cisplatin (Zhang et al., 2019a). According to the NCBI RefSeq database, the product of the down-regulated gene IGFBP3 forms a ternary complex with insulin-like growth factor acid-labile subunit (IGFALS) and either insulin-like growth factor (IGF) I or II. In this form, it circulates in the plasma, prolonging the half-life of IGFs and altering their interaction with cell surface receptors. It has previously been shown that high-level expression of IGFBP3 stimulated growth inhibition of tumors in FIGURE 5 | Validation of RNA sequencing data by quantitative real-time PCR. The 31 most significant differentially expressed genes displaying high expression abundance were selected.
FIGURE 6 | Correlation of between differentially expressed genes in HPV16 E7-transfected NHEKs and cancer patient survival using the OncoLnc database. The red curves display patient survival when expression of IGFBP3, TFRC, ZNF185, ZNF146, SLC39A9 or IFI6 is high, while the blue curves display survival when expression is low.
an HPV16 E7-induced cervical cancer model (Musil et al., 2014). Furthermore, HPV16 early gene depletion induced expression of IGFBP3 (Hanning et al., 2013). Therefore, HPV16 E7induced repression of IGFBP3 might be related with onset of carcinogenesis. The down-regulated gene TFRC encodes a transferrin receptor and plays an important role in the uptake of cellular iron. But HPV-mediated immune alteration analysis using The Cancer Genome Atlas shows that upregulation of TFRC may relate to a poor prognosis of survival of cervical cancer patients . The down-regulated gene IL2RG encodes an important signaling component of many interleukin receptors, and IL2RG deficiency can result in differential signaling defects (Illig et al., 2019) while its mutation may cause immunodeficiency (Hsu et al., 2015). The high abundance of immunity-related differentially expressed genes indicates that HPV16 E7-mediated immune dysregulation might be important to prevent the elimination of HPV from infected cells during virus persistence. Three of the differentially expressed genes belong to the dual-specificity protein phosphatase subfamily, with DUSP1 and DUSP5 downregulated and DUSP3 upregulated. These negative regulators of the mitogen-activated protein (MAP) kinases show different substrate specificities for various MAP kinases and different modes of induction by extracellular stimuli. These genes appear to play an important role in cellular proliferation and differentiation as well as cellular response to environmental stress (Jeffrey et al., 2007;Patterson et al., 2009). For example, DUSP1expressing Hela cells display a relatively low proliferation rate in vitro (Cheng et al., 2015), while DUSP3 interacts with high-expressing nucleolar proteins involved in DNA repair and senescence in Hela cells (Panico and Forti, 2013). As our data shows, the protein level of DUSP5 is reduced with HPV 16E7 overexpressing. And several studies showed that suppression of DUSP5 expression correlates with cancer cell proliferation, migration, invasion and poor prognosis in different cancer types Wang et al., 2019;Du et al., 2020), but the relationship between DUSP5 and HPV has thus far remained undiscovered. Therefore, these genes may be the candidates for further studies.
The two downregulated genes ZNF185 and ZNF146 belong to the zinc finger protein family, which plays important roles in various cellular functions such as cell proliferation, differentiation, and apoptosis. A previous study showed that ZNF185 silencing strongly impaired keratinocyte differentiation in the head and neck, and in cervical and squamous cell carcinomas (Smirnov et al., 2019), however, functions for ZNF146 have remained unclear to date.
Other down-regulated genes, like LOX, EDN2, FOLR1, and CSRNP1, could not be categorized, although some distinct functions have previously been described. LOX encodes a member of the lysyl oxidase family proteins, which has previously been proposed to promote epithelial to mesenchymal transition in prostate cancer (Gonzalez-Chavarria et al., 2018), and its propeptide domain may display tumor suppressor activities (Agra et al., 2013). EDN2 belongs to the endothelin protein family of secretory vasoconstrictive peptides, and it has already been shown that it is involved in ovulation (Cacioppo et al., 2017). FOLR1 is a member of the folate receptor family, and a previous study showed it is associated with the progression of cervical cancer through the ERK signaling pathway (Liu et al., 2017). CSRNP1 is a key protein of the noncanonical Wnt pathway (Miyasaka et al., 2007), and because of its decreased level of expression in tumor tissues it was suggested to serve as a tumor suppressor (Ishiguro et al., 2001;Wang et al., 2013). Finally, two little-studied genes were identified, the upregulated pseudogene RPL13AP20 and the down-regulated gene SNRPD2, which may participate in lncRNA splicing (Azam et al., 2019).
Previous study has shown that AKAP12 expression is enhanced after trans-uterine arterial chemoembolization (TUACE) in cervical cancer patients (Chen et al., 2019). And we verified that AKAP12 is downregulated in both mRNA and protein level with HPV 16E7 overexpressing, so we deduce AKAP12 may play a role in HPV infection and further carcinogenesis.
HPV16 E7-mediated dysregulation of expression of target genes might be also the complex control of microRNAs (miRNA), long non-coding RNAs (lncRNA) and circular RNAs (circRNA) which are important to prevent the elimination of HPV from infected cells during virus persistence and the consequent tumorigenesis induction. For example, the lncRNA MALAT1 is overexpressed with high risk HPVs infection (Jiang et al., 2014;Yang et al., 2015) while the lncRNA MALAT1/miR-145-5p/AKAP12 axis has been verified in prostate cancer (Xue et al., 2018). In addition, both the luciferase reporter assay and western blot analysis showed DUSP5 could be repressed by miR-181a (Li et al., 2007). To date, several studies have verified LUCAT1/miR-181a axis (Zhang et al., 2019b), miR-181a-5p/TGFβI axis  and miR-181a-5p/MMP14 axis (Shen et al., 2019) in cervical cancer. These might give new insights for further study.

CONCLUSION
Most high-risk HPV infections are transient and will not progress to clinically significant cervical cancer with host immune responses generally able to clear the virus. In the process of persistent infections, gene alterations and changes in cellular phenotype will accumulate and finally drive the infected cells into invasive cancer cells. Cervical cancers are almost exclusively caused by high-risk HPV infections, and HPV16 accounts for the largest portion. HPV E7 has been reported to be closely related with cellular proliferation and oncogenic transformation. Therefore, HPV16 E7-transfected keratinocytes were studied to identify associated genes and pathways. A total of 23 consistent differentially expressed genes were identified, with several of them displaying high credibility for the association between E7 and carcinogenesis. Interestingly, the existence of alternative splicing isoforms of the E7 gene may influence E7 translation (Brant et al., 2019), which may explain some of the observed bias in transcript levels and diversity between biological repeats (Supplementary Figures S1a,b). Digital RNA sequencing allowed in-depth analysis of differential gene expression with high credibility. HPV16 E7 showed a large impact on immune-related networks in keratinocytes. AKAP12, DUSP5, IFI6 and SLC39A9 are five important novel candidate genes related to E7 expression for which future exploration into the molecular mechanisms. Overall, this study has laid the foundation for further studies exploring HPV16 E7 related mechanisms for clearance and persistence of HPV16 in human epidermal keratinocytes.

DATA AVAILABILITY STATEMENT
The raw datasets for this study can be found in the Gene Expression 317 Omnibus (https://www.ncbi.nlm.nih.gov/geo/ query/acc.cgi?&acc=gse145224).