Comprehensive Analysis of the Carcinogenic Process, Tumor Microenvironment, and Drug Response in HPV-Positive Cancers

Human papillomavirus (HPV) is a common virus, and about 5% of all cancers worldwide is caused by persistent high-risk HPV infections. Here, we reported a comprehensive analysis of the molecular features for HPV-related cancer types using TCGA (The Cancer Genome Atlas) data with HPV status. We found that the HPV-positive cancer patients had a unique oncogenic process, tumor microenvironment, and drug response compared with HPV-negative patients. In addition, HPV improved overall survival for the four cancer types, namely, cervical squamous cell carcinoma (CESC), head and neck squamous cell carcinoma (HNSC), stomach adenocarcinoma (STAD), and uterine corpus endometrial carcinoma (UCEC). The stronger activity of cell-cycle pathways and lower driver gene mutation rates were observed in HPV-positive patients, which implied the different carcinogenic processes between HPV-positive and HPV-negative groups. The increased activities of immune cells and differences in metabolic pathways helped explain the heterogeneity of prognosis between the two groups. Furthermore, we constructed HPV prediction models for different cancers by the virus infection score (VIS) which was linearly correlated with HPV load and found that VIS was associated with drug response. Altogether, our study reveals that HPV-positive cancer patients have unique molecular characteristics which help the development of precision medicine in HPV-positive cancers.


INTRODUCTION
Human papillomavirus (HPV) is an important carcinogen since the HPV proteins E6 and E7 are intimately related to the events that cause malignant transformation of HPV-infected cells (1,2). A global case statistics reported that cancers caused by HPV infection account for at least 5% (3). Persistent high-risk HPV infection can cause cancer in many different anatomical sites including cervix, penis, head and neck, lungs, prostate, bladder, and breast (4)(5)(6)(7)(8)(9)(10)(11). Therefore, HPV has received more and more attentions as an independent carcinogen.
Present pan-cancer studies mainly focus on the impact of HPV on the tumor immune microenvironment, and most of them explain the possible benefits of HPV infection to patients from the perspective of immunotherapy. Gameiro et al. explained that the antitumor immunity activated by HPV might be the main source that improved the prognosis of HPV-positive patients in head and neck squamous cell carcinoma and such patients were suitable for immunotherapy (12). Varn et al. highlighted the changes in tumors caused by diverse virus infections and suggested that different families of viruses should be distinguished when designing immunotherapy methods (13). Cao et al. stated that viruses might help tumors escape the PD-1 immune checkpoint pathway in multiple cancer types (14). Tumorigenesis depends not only on the alterations of the tumor microenvironment but also on gene mutations and the synergy of multiple carcinogenic pathways (15)(16)(17). However, those studies considered neither the differences of the carcinogenic processes between the HPV infection and other elements nor the possible impact of the expression level of HPV. Therefore, there is an urgent need for comprehensive and detailed analyses on the carcinogenic process, tumor microenvironment, and even the treatment outcome affected by both HPV and its expression level.
Here, we analyzed a total of 3,542 human samples representing 10 different cancers to describe how HPV caused cancers and shaped the tumor microenvironment at the genomics and transcriptomics level in The Cancer Genome Atlas (TCGA). Survival analysis showed that HPV played an important role for patients' prognosis. Furthermore, we analyzed the differences in the carcinogenic processes between HPV-positive and HPVnegative groups from three aspects: driver genes, genome instability, and mitotic carcinogenic pathways. The results implied that HPV might trigger cancer through the cell-cycle disorder rather than genome instability. The tumor microenvironment is significantly related to the improvement of cancer patient survival and treatment effect (18,19). In order to explain why HPV-infected patients' survival was better than that of non-infected patients, we explored the impact of HPV on immune cell infiltration and metabolic pathway activity in the tumor microenvironment. We also found that the differences in the carcinogenic process and the tumor microenvironment mostly tended to appear in the tumor types with high HPV expression level. Next, we constructed HPV status prediction models to yield a virus infection score (VIS) for each cancer. VIS was positively correlated with HPV expression, and the classification efficiency of VIS was verified by both internal data from TCGA and external data from Gene Expression Omnibus (GEO). These models were also extended to Genomics of Drug Sensitivity in Cancer (GDSC) data and yielded VIS which represented the HPV-like status of GDSC cell lines. The higher VIS was related to the chemotherapy effect of TCGA patients and the drug sensitivity in the GDSC cell lines. In general, our research will help researchers to better understand the impact of HPV on the host genome and tumor microenvironment, and it will also be helpful in chemotherapy and immunotherapy for tumor patients with high HPV expression.

Datasets
TCGA samples were collected from the UCSC Xena pan-cancer project (http://xena.ucsc.edu/). The expression data were transcript per million (TPM) values with log 2 (x+0.001) transformed, and non-silent mutation was defined as gene-level mutation calls, where 1 represents non-silent mutation and 0 represents wild type. HPV expression (normalized reads per million, NRPM) was collected from a previous study (20), and the samples with more than 10 NRPMs were defined as infected by HPV. Only tumor types with at least 10 HPV-positive samples were considered, including cervical squamous cell carcinoma (CESC), uterine corpus endometrial carcinoma (UCEC), colon adenocarcinoma (COAD), rectum adenocarcinoma (READ), glioblastoma multiforme (GBM), ovarian serous cystadenocarcinoma (OV), esophageal carcinoma (ESCA), stomach adenocarcinoma (STAD), head and neck squamous cell carcinoma (HNSC), and kidney renal clear cell carcinoma (KIRC). At last, we collected 3,254 tumor samples in total (Supplementary Tables S1, S2) with matched clinical data and chemotherapy response data from previous studies (21,22). The driver gene and viral integration site for HNSC were collected from two other studies (23,24).
External data with HPV status for model validation were obtained from the gene expression omnibus (GEO) with accession numbers GSE117973 and GSE151666. Cell line expression data and drug sensitivity data were downloaded from Genomics of Drug Sensitivity in Cancer (GDSC: https:// www.cancerrxgene.org/, release8.2).

Survival Analyses
The log-rank test was performed to evaluate the prognosis difference between HPV-positive and HPV-negative patients in each cancer type. In order to further explore the importance and impact of HPV on patient survival compared with other common clinical indicators, we used multivariate Cox regression for HPV status, age, gender, clinical stage, and TNM staging. Next, we performed stepwise regression based on the Akaike information criterion (AIC) to select variables which have important impact on patients' survival. Survival analysis was performed by "survival" package in R.

Calculation of Pathway Activity Scores and Collection of Immune Indicators
We collected gene sets for DNA damage repair (DDR) pathways (25), mitotic oncogenic pathways (17), and metabolic pathways (26). Pathway activity scores were calculated using the single sample gene set enrichment analysis (ssGSEA) method in R package "GSVA". The abundance of immune cells were derived from xCell which was a gene signature-based method to quantify 64 cell types through ssGSEA (27).

Construction of the HPV Status Prediction Model
In order to observe the spatial proximity of these samples, we put all samples to a two-dimensional coordinate system using Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction through the "umap" package and then clustering analysis was performed by the "dbscan" package.
The differential gene expression analysis between HPV-positive and HPV-negative samples within a given cancer type was performed by the "DEseq2" package (29). The genes with both | Log2 fold change| > 1 and adjusted p-value < 0.05 represented the different transcriptome features between the two groups. To increase reliability and accuracy for predicting HPV status, the lasso regression model was constructed for each cancer type with the sample's HPV status as the response variable and the gene expression level as the predictor variable by the "glmnet" package. HPV signature gene sets (the predictor variables) were derived by stepwise regression and were used to calculate the virus infection score (VIS) by the corresponding lasso model in each cancer type. VIS was defined as the sum of (regression coefficient* signature gene expression level) in each sample (Supplementary Table S3). The relationship between VIS and NRPM was estimated by the Spearman correlation coefficient. AUC was calculated by the "pROC" package to verify the performance and ability of each cancer prediction model, and two external GEO datasets (GSE117973 and GSE151666) were further used. In addition, the prediction models were extended to GDSC cell line data to capture HPV-like infected samples which had similar transcriptomic features with HPV-positive cancer patients.

Connection Between VIS and Drug Response
To evaluate the connection between VIS and drug response, we combined all cancer types' VIS after z-score transformation and divided TCGA samples into four groups according to chemotherapy response: "complete response" (CR), "partial response" (PR), "stable disease" (SD), and "clinical progressive disease" (CPD). We explored the distribution of VIS in the four groups and calculated the proportion of chemotherapy response in different groups segmented by scaled VIS. GDSC data were divided into two categories according to the threshold of scaled VIS = 1, and the difference of the half maximal inhibitory concentration (IC50) was analyzed between the two categories.

Statistical Analyses
Fisher's exact test was used to evaluate the difference of gene mutation frequency between HPV-positive and HPV-negative groups. All the comparisons of pathway activity and other indicators between the two groups were performed by the twotailed Wilcoxon-sum rank test. In the GDSC data set, the difference of IC50 between two groups was calculated by the two-tailed T test or Mann-Whitney U test when the data were not normally distributed. All statistical analyses were performed by R.

HPV Improves Overall Survival for Four Cancer Types
Clinically, HPV-positive patients in HNSC have a better overall survival than HPV-negative patients (30). To confirm the impact of HPV on the prognosis for HPV-related cancers, we applied the log-rank test to analyze the differences in the survival times between HPV-positive and HPV-negative groups. In 4/10 cancer types including CESC (p = 0.076), HNSC (p = 0.00075), STAD (p = 0.012), and UCEC (p = 0.013), HPV-positive patients exhibited a significantly better prognosis ( Figure 1A). In particular, the survival rate of HPV-positive patients did not drop rapidly as HPV-negative patients within the first 5 years in HNSC. To further demonstrate the importance of HPV infection on patient survival, we examined the hazard ratio of HPV infection compared with other common clinical indicators through the multivariate Cox proportional hazard model among the above four cancer types. HPV remained as a favorable prognostic for the four cancer types after stepwise regression screening based on Akaike information criteria ( Figure 1B). This result implied that HPV could be an indicator of patient prognosis, which was as important as the clinical stage. These analyses hint that HPV infection induces an underlying mechanism that makes the prognosis of hosts better than that of non-infected samples.

HPV-Positive Patients Have Stronger Cell Cycle Activity in the Carcinogenic Process
The mutation frequencies of several driver genes in the HPVpositive group with CESC and HNSC were significantly lower than those in the HPV-negative group (Figure 2A and Supplementary Table S4). The lower mutation frequency of TP53 in the HPV-positive group of CESC and HNSC indicated that their genomes were more stable. In the HPV-positive group, the lower mutation frequency of ARID1A in CESC as well as FAT1, CDKN2A, and FGFR3 in HNSC demonstrated that abnormal cell proliferation of HPV-positive patients did not arise from driver gene mutations. Although CYLD and ZNF750 mutations were enriched in HPV-positive HNSC, the samples with these two gene mutations together accounted for only 20% of HPV-positive ones. These results indicated that those genes in the HPV-positive group were not the same as TP53 which was the main cause of cancer in the HPV-negative group. Additionally, TP53 mutations in UCEC were enriched in the HPV-negative group, and PTEN mutations were enriched in the HPV-positive group. The total number of mutations with TP53 or PTEN exceeded 80% in both HPV-positive and HPV-negative patients. This result illustrated that there was no difference in the driver gene level between HPV-positive and HPV-negative groups in UCEC. We further applied the t-test to compare the expression levels of different mutation driver genes with the HPV-negative group in the three cancer types. As shown in Supplementary Figure  The red horizontal lines correspond to the 95% CI, on which the dot reflects the hazard ratio. (Nx, regional lymph nodes could not be evaluated; N1, lymph node metastases with a maximum diameter of less than 3 cm; N2, lymph node metastases with a maximum diameter of less than 6 cm and greater than 3 cm; N3, the maximum diameter of metastatic lymph nodes is greater than 6 cm).
FGFR3, and TP53 group was higher in HPV-positive than those in HPV-negative group in HNSC. The expression of PTEN and TP53 was higher in the HPV-positive group than those in the HPV-negative group in UCEC, but there were no differences in CESC.
We examined the differences of DDR pathway activity and other genomic instability indicators between HPV-positive and HPV-negative groups ( Figure 2B). The DDR pathway activity in CESC and HNSC was enhanced in the HPV-positive group, which might be related to the less mutation of TP53. The DDR pathway activity was decreased in the HPV-positive group of UCEC, COAD, and READ, and there were a few changes in the rest of the cancer types. It is worth noticing that the alternation of indicators for genome instability was consistent with the DDR repair pathway activity only in HNSC. One possible explanation is that there may be other DDR repair mechanisms in addition to the 10 DDR repair pathways.
Next, we compared the difference in mitotic oncogenic pathways between HPV-positive and HPV-negative groups ( Figure 2B). HNSC was the most affected cancer by HPV infection, and the activity of the P53 pathway was also related to the low mutation rate of the TP53 gene in the HPV-positive group. In CESC and HNSC, HPV activated the cell cycle through different pathways like the PI3K and MYC signaling pathway in HNSC and the TGF-b signaling pathway in CESC. In addition, the cell-cycle activity of the HPV-positive group in COAD was lower than that of the HPV-negative group, indicating that the impact of HPV infection in this cancer was different from that in CESC and HNSC. In GBM, STAD, and UCEC, the cell-cycle activity did not change significantly, indicating that although individual mitotic oncogenic pathways of these cancers could be affected by HPV, it was not reflected in the cell cycle. In addition, we observed a considerable number of overlaps between DE genes and these essential pathways (Supplementary Table S5). Interestingly, we noticed that the changes in the DDR and mitotic oncogenic pathways were related to HPV expression. The cancer types with large-scale variations in the DDR and carcinogenic pathway activity exhibited a high HPV expression level. In most cases of CESC, HNSC, UCEC, COAD, and READ, the NRPM value exceeded 100 or even reached 1,000 ( Figure 2C). This phenomenon revealed that the impact of HPV on the host carcinogenic process might depend on its expression level. In summary, the carcinogenesis of the HPVpositive group in CESC and HNSC was triggered by the active cell cycle after HPV infection rather than genome instability, which was a major difference between HPV-positive and HPVnegative patients in CESC and HNSC.

HPV Affects the Tumor Microenvironment
To obtain insights into the immune infiltration affected by HPV, we examined the different abundance of immune cell infiltration between HPV-positive and HPV-negative groups. The results showed that HPV infection affected the tumor immune microenvironment in 8/10 cancer types and 49/64 cell types ( Figure 3). The immune cell infiltration of HNSC was most widely affected by HPV. The HPV-positive groups of CESC and HNSC had the common characteristics of elevated B cells and CD8+Tcm infiltration. The upregulation of immune cell infiltration may be the reason for the better prognosis in HPVpositive patients, such as B cells and CD8+ Tcm in CESC, CD8+ Tcm in HNSC, NKT cells in STAD, and B cell in UCEC. These cells can directly or indirectly kill tumor cells. At the same time, the stromal cells of CESC and HNSC were decreased on a large scale, which was helpful to improve the prognosis of patients (31,32).
We next examined the alteration of immune indicators by HPV infection ( Figure 4A). The CYT score increased in the HPVpositive group of CESC, HNSC, and COAD, indicating that HPV stimulated the enhancement of the cytotoxic T cells (CTL) of these three types of cancer. Studies have shown that cancer-testis antigen (CTA) contributes to tumorigenic signal transduction (33), and it has been regarded as a potential target of treatment (34,35). The CTA score in HPV-positive patients of CESC, HNSC, COAD, READ, and UCEC were decreased, implying that the treatment strategy for the CTA antigen might not work for these cancers. The reduction of neoantigens in the HPV-positive group of HNSC may be due to its lower mutation load. TCR is responsible for the detection of human "non-self" antigens (36). The increased TCR in the HPV-positive group of CESC, HNSC, and ESCA meant enhanced ability of T cell recognition. The higher BCR of the HPVpositive group in HNSC, COAD, and ESCA also indicated that HPV as a foreign substance stimulated the activation of the host humoral immune system.
The metabolic pathways were also affected by HPV infection. HNSC, UCEC, COAD, and READ patients received the energy for tumor cell growth through at least one metabolic pathway for the integration of energy or the tricarboxylic acid cycle. The upregulation of carbohydrate metabolism, nucleotide metabolism, and vitamin and cofactor metabolism metabolic subtypes is always associated with poor prognosis (37). The downregulation of these pathways in the HPV-positive group of HNSC and UCEC may be another reason for their better prognosis.
The integration of HPV DNA into the host genome is an important event that leads to abnormal proliferation and malignant progression during HPV-mediated carcinogenesis (38,39). The NHEJ (non-homologous end joining) pathway was more active in the HPV-positive group of HNSC (FDR = 2.84E-09), which provided the necessary conditions for HPV integration. HPV-integrated coding genes in HNSC tended to be enriched in GO terms that negatively regulated the host's immune response and cell adhesion (Supplementary Figure S2). Among the 60 HPV-integrated protein-coding genes, 47 genes were upregulated and 13 were downregulated according to the Tukey standard. The expression of HPV-integrated genes was increased abnormally in HPV-positive patients, which included the famous immune checkpoint genes CD274 and PDCD1LG2 ( Figure 4C).

Construction of the HPV Status Prediction Model by Transcriptome Characteristics
To explore the potential connection within and between tumor types of HPV-related cancer patients, we used the UMAP method to reduce the dimensionality of transcriptome in all samples and then projected samples into a two-dimensional coordinate system (Figures 5A, B). Cancer samples tended to cluster according to the cancer type and were also closer according to similar tissues among cancer types (the top-left corner were four types of pan-digestive tract cancers, and the center of the coordinate system was occupied by gynecological cancers). It is worth noting that CESC samples were close to HNSC samples and HPV-infected samples in HNSC tended to cluster with CESC, implying that HPV-positive samples in HNSC and CESC samples were relatively similar at the transcriptomic level ( Figure 5B). It is reasonable that HNSC and CESC are both squamous cell carcinomas in terms of cell  origin (40). This also explains the similar changes in several pathways and the tumor microenvironment in the HPV-positive group of the two cancers. In order to construct and evaluate the performance of HPV prediction models based on HPV-related transcriptome features, we used lasso regression to screen differential genes related to HPV infection status, then obtained the signature gene set and prediction model of 10 cancers (Supplementary Table S3). Next, lasso regression combined with the signature gene sets was used to calculate the virus infection scores (VISs) for each sample. The VIS was significantly positively correlated with the NRPM value (correlation coefficients from 0.47 to 0.96, Figure 5C). We also applied the prediction model for a specific cancer type into other cancer types and used the AUC value to evaluate the accuracy of each model across cancer types ( Figure 5D). The prediction model had the best efficiency in predicting the HPV infection status for its own cancer. However, some models still had high AUC values (AUC >0.90) when they were applied into other cancer types, such as the models established in CESC and HNSC as well as the models in COAD, READ, and UCEC ( Figure 5D).
High AUC values were still achieved when two sets of external data (HNSC: GSE117973, CESC: GSE151666) were used to verify the classification efficiency of the model ( Figure 5E), resulting in that the models of CESC and HNSC were interchangeable (AUC >0.9).

VIS Is Associated With Drug Response
To explore whether there is a relationship between VIS and drug response, we divided the TCGA samples into four groups according to the RECIST standard. We found that VIS was related to the chemotherapy response of the TCGA sample. When the scaled VIS >2, the rate of CR to the chemotherapy of the TCGA samples in stage III and stage IV was 94% without drug resistance ( Figures 6A, B), and the ratio of CR was much greater than that of the scaled VIS <2 group (OR = 11.10, p = 0.0034). We also found that scaled VIS had a connection with drug response in the GDSC cell line data of high-confidence cancer types whose correlation coefficients between VIS and NRPM were greater than 0.8. Some drugs had lower IC50 values in the cell lines with high-scaled VIS, indicating that these drugs were more sensitive in high-scaled VIS cell lines ( Figure 6C). Interestingly, more than half of samples (59%) in the CR group with scaled VIS >2 used platinum-based chemotherapy drugs in the TCGA dataset, and the lower IC50 value of cisplatin was also found in high-scaled VIS cell lines. We also found that the efflux gene ATP7B was significantly lower in the scaled VIS >2 group, which caused the different chemotherapy outcome by cisplatin (Supplementary Figure S3). There were no immunotherapy drugs in the above analysis, so we explored the feasibility of immunotherapy for scaled VIS >2 samples. Studies have shown that the HLA family (41), immune cells (42), and immune checkpoint (43) can affect immunotherapy. We found out the significantly increased expression of the HLA family, abundance of immune cells (except NKT, macrophages), and expression of immune checkpoints (except CD276) in the scaled VIS >2 group in TCGA data ( Figures 6D-F). These differences indicate that patients with higher VIS may be likely to benefit from immunotherapy.

DISCUSSION
We have discovered that HPV contributes to favorable prognosis for CESC, HNSC, UCEC, and STAD ( Figure 1B), implying that even if HPV is a carcinogen, it can also activate uncertain mechanisms of the host to prolong the survival. However, HPV viral load was not significantly correlated with overall survival (Spearman's rank correlation, p = 0.2012). HPVpositive clinical associations were further analyzed by the chisquare test or the Fisher test in R (Supplementary Table S6). HPV-positive tumors were more associated with lower stages (staging in UCEC, pathological T in STAD and HNSC, pathological N in HNSC, and pathological M in CESC) than the HPV-negative tumors. The above associations were the composite effects of the carcinogenic process and the tumor microenvironment.
The HPV E6 and E7 oncoproteins are the dominant paradigm for tumorigenesis. The expression of E6 stimulates p53 degradation, while the expression of E7 degrades Rb, leading to an increase in E2F-dependent transcription and a deregulation of the cell cycle without control of DNA replication, DNA repair, and apoptosis (44). In HPV-positive cervical cancer cell lines, the knocked-down E6/E7 could increase p53 at the protein level, thus hindering cell growth and triggering cell death in vitro and in vivo (45). The significantly lower frequency mutation ( Figure 2A) and higher expression (Supplementary Figure S1) of TP53 in the HPV-positive group of HNSC and UCEC implied a more functional P53 expression and might explain in part increased chemosensitivity and radiosensitivity (46,47). The loss control of the cell cycle induced by upregulated E2F1 in the HPVpositive group of CESC, HNSC, and UCEC and downregulated RB1 in the HPV-positive group of STAD played an important role on the formation and progression of the four cancers. The third oncoprotein E5 expressed together with two regulatory proteins (E1 and E2) attributes to p53-dependent enhanced proliferation in vitro and activates the FGFR pathway to accelerate tumorigenesis in vivo (48). Members of the FGFR family were upregulated in the HPV-positive group of the four cancers (Supplementary Table  S6), implying combined inhibition of FGFR and mTOR for targeted therapy (48,49).
We identified two cancer types, CESC and HNSC, whose driver genes were enriched in HPV-negative patients, especially the low-frequency mutation of TP53 which was a common feature of HPV-positive patients in CESC and HNSC. The important role of the P53 protein can maintain genome stability (50,51); thus, the activation of DNA damage repair pathways in HPV-positive patients of these two cancers was observed (Figures 2A, B). Mutated ARID1A can cause abnormal cell proliferation and block immune checkpoint therapy (52)(53)(54), and the co-occurrence of the lower mutation frequency of ARID1A and higher infiltration of CD8+Tcm may enable HPV-positive patients in CESC more suitable for immune checkpoint therapy (Figures 2A, 3). We also found that the mutation frequency of the antitumor gene PTEN (55,56) in the HPV-positive group of CESC was significantly lower than that in the HPV-negative group. The low mutation frequency of the two famous tumor-suppressor genes TP53 and PTEN in patients in the CESC HPV-positive group indicated that these patients received more "help" in the process of fighting against tumor cells. The mutation frequencies of several tumor-suppressor genes including FAT1, CDKN2A, FGFR3, and CASP8 were lower in the HPV-positive group of HNSC, and even the mutation frequencies of FAT1, CDKN2A, and CASP8 were 0 (Supplementary Table S4  processes regulated by these genes were not disrupted. Knocking down FAT1 and CASP8 separately or together resulted in enhanced cell motility and clonal development (57). Due to significantly lower mutation frequencies of the famous tumor-suppressor gene TP53 in the HPV-positive group of CESC and HNSC, we conjectured that the DNA damage repair mechanism was stronger in the HPV-positive group of CESC and HNSC. Therefore, we calculated the activity of 10 DNA damage-repair pathways through the ssGSEA method and examined the difference of pathway activity and other genomic instability indicators between HPV-positive and HPV-negative groups. The result was consistent with our conjecture. We did not observe changes in the genomic instability indicators in CESC, which might be that the DDR pathway was not activated as highly as that in HNSC. These results indicate that genomic instability might not be the major cause for the occurrence of CESC and HNSC in the HPV-positive group. Together with the differences in carcinogenic pathways, we found that HPV infection in CESC and HNSC allowed patients to bypass the genomic instability in the carcinogenic process and directly captured the characteristics of the active cell cycle, thereby causing abnormal proliferation (58)(59)(60). These results also remind us that the detection of TP53 mutations cannot be fit for all the people at risk of cancer, and gene mutation testing combined with HPV status is the best way to predict the risk of HNSC because of the low TP53 mutation frequency in HPVpositive patients. We also found that the most common mutations in TP53 were R248Q/W (19 of 431 mutations), E285K (3 of 28 mutations), and R273C/H/S (23 of 217 mutations) for HNSC, CESC, and UCEC, respectively. The R248 in p53's DNA-binding domain (DBD) could interact with DNA's minor groove directly and the R248Q mutation caused conformation alterations in areas of DBD far from the mutation site (61). Tumor mutations at site E285 in the H2 region of p53 may decrease essential interactions that stabilize H2, implying that the inactivation mechanisms may be linked to the loss of local structure around H2, reducing overall stability to a meaningful degree (62). Garg et al. found that the oncogenic p53 variations R273 (R273H, R273C, and R273L) not only lose their DNA-binding capabilities but also have different structural stability, aggregation, and toxicity profiles and lead to different types of cancer pathogenesis in vivo (63).
It is important to explore the impact of HPV infection on the tumor microenvironment (TME), because the immune infiltration and metabolism in TME are associated with patients' prognosis (64)(65)(66). HPV infection stimulates the immune system response which may be the reason why HPVinfected patients have better prognosis than non-infected patients in CESC, HNSC, UCEC, and STAD. The immune system of HNSC had the strongest response after being stimulated by HPV ( Figure 3). Interestingly, CD8+ memory T cells in CESC and HNSC were both increased in HPV-positive patients, implying that HPV vaccine injection may have the potential to prevent HPV infection which leads to the occurrence of HNSC. We also found that an increase in CD8+ T cells required the cooperation of dendritic cells. Upregulated dendritic cells presented more antigens to CD8+ T cells and made them upregulated ( Figure 3). The stromal cells were also affected by HPV in the tumor microenvironment. In the patients with HPVpositive CESC and HNSC, the reduction of multiple stromal cell types ( Figure 3) exerted a positive effect on the prognosis of patients (31,32). TCR and BCR increased in the HPV-positive group of HNSC without the increase of mutation load and neoantigen, implying that HPV might express viral antigens and be recognized by T cells and B cells. Notably, a general trend could be observed where significant differences in the carcinogenic process and tumor microenvironment occurred in cancers with high HPV expression levels (Figures 2A-C, 3). Expression analysis revealed that HPV integration disrupted gene expression, but the upregulation of CD274, PDCD1LG2, FOXA1, and TNFSF4 provided opportunities for tumor immunotherapy ( Figure 4C). Although HPV-integrated genes were enriched in GO terms that negatively regulate immunity, the presence of HPV still irreversibly activated the cell-mediated immune response (Supplementary Figure S2 and Figure 3).
Since HPV has a significant impact on the tumor microenvironment which is crucial to the chemotherapy effect of cancer patients (67-69), we analyzed whether the HPV could affect chemotherapy response. After developing HPV prediction models by transcriptome characteristics, the prediction score VIS was positively correlated with the abundance of virus expression and the correlation coefficient ranged from 0.47 to 0.96. The prediction models of the cancer types with a high correlation coefficient (R > 0.8) were extended to the GDSC data for the HPV-like propensity of the cell line. When scaled VIS reached a certain level (scaled VIS > 2), the patients were quite sensitive to the chemotherapy in TCGA ( Figure 6A). We further studied the sensitivity of HPV-like cell lines to drugs and screened out some drugs which were associated with scaled VIS ( Figure 6C). Although we have not collected appropriate immunotherapy data, we still analyzed the relationship between VIS and the signature of immunotherapy, and the results showed that patients with high VIS may benefit more from immunotherapy ( Figures 6D-F). If there are suitable data in the future, we can further explore the potential application of VIS as an immunotherapy marker.
Although there was a higher occurrence of HPV infection in males with STAD or HNSC (Supplementary Table S6), gender was not an important factor for overall survival ( Figure 1B). We also explored whether or not there was any difference in immune cell infiltration and drug response between males and females with HPV or without HPV for STAD and HNSC. The result showed that there was no significant difference in immune cell infiltration between males and females in HPV-positive patients, but there were some significant differences (such as for CD8+ Tcm) in HNSC HPV-negative patients ( Supplementary Table S8). Similarly, there was no significant difference in drug response both in the HPV-positive group (Fisher's exact test, p = 0.72) and in the HPV-negative group (Fisher's exact test, p = 0.83). Gender cannot be a considerable factor for HPV-positive cancer.
In conclusion, we conducted a multilevel analysis of a variety of cancer types with HPV infection, including the carcinogenic process of cancer and the tumor microenvironment, and propose that the high level of HPV expression may provide references for precision medicine for related cancer patients.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.