HPV infection and breast cancer risk: insights from a nationwide population study in Taiwan

Background The prevalence of cancer, specifically breast cancer, has risen globally. The etiology of breast cancer has been attributed to age, genetic mutations, reproductive history, hormone therapy, lifestyle factors, and viral infections. The human papillomavirus (HPV) has been one of the most widespread sexually transmitted infections in the United States. A role of HPV in breast oncogenesis has been hypothesized before, yet the association has remained unclear.
Methods In this nationwide population study, we used centralized patient data managed by the Ministry of Health and Welfare in Taiwan and the Taiwan Cancer Registry database. The breast cancer incidence rates of the 467,454 HPV patients were compared with those of twice as many non-HPV patients matched by sex and age. Cumulative breast cancer incidence rates were presented as a Kaplan-Meier curve, and the relative risk of breast cancer for HPV and non-HPV patients was calculated using a Cox regression model.
Results Our results indicated a crude hazard ratio (HR) and an adjusted hazard ratio (aHR) of 2.336 and 2.271, respectively, when comparing the risk of breast cancer in the HPV and non-HPV groups. The risk of breast cancer was comparable to or higher than that of head and neck cancer (aHR=1.595) and cervical cancer (aHR=2.225), both of which were found to have causal relationships with HPV. The Kaplan-Meier curve further illustrated a higher cumulative risk across 84 months for HPV patients (p<.0001). Besides HPV, age (p<.0001), insurance providers (p<.001), and comorbidities such as abnormal liver function (aHR=1.191, p=.0069) and hyperlipidemia (aHR=1.218, p=.0002) were found to be correlated with higher risks of breast cancer.
Conclusion A correlation between HPV and breast cancer can be inferred using national health databases. More molecular studies are required to understand the mechanism of virus-induced oncogenesis of the breast.


Introduction
Cancer is the leading cause of death and morbidity across the globe. In 2020, approximately 19.3 million new cancer cases were registered worldwide (1), with 122 thousand cases registered in Taiwan, equivalent to 311.34 per 10^6 persons (2). In both populations, the most prevalent cancer among female patients was breast cancer, accounting for 11.7% of all reported cancer cases globally and 12.5% in Taiwan (1,2). Globally, around 2.26 million female patients were diagnosed with breast cancer, and approximately 685 thousand cases were fatal in 2020 (1). In Taiwan, about 15.3 thousand females were diagnosed with breast cancer, and around 2.66 thousand cases were fatal (2). Notably, an elevated risk of breast cancer was reported in patients with lower educational attainment and among racial minorities (3). This finding highlights the need to promote accessible screening services and cancer awareness among individuals of lower socioeconomic status.
Factors contributing to the etiology of breast cancer include age (4), genetic mutations (5), reproductive history (6), hormone therapy (7), and lifestyle (8). The role of viral infection in oncogenesis has been examined, and 2.2 million new cancer cases across the world were related to infection in 2018 (9). Some viruses, such as hepatitis B virus and hepatitis C virus, can cause chronic inflammation, which furthers cell damage and raises the risk of carcinogenesis (10). Others, such as Epstein-Barr virus (EBV), were found to upregulate oncogenes and accelerate cell cycles, leading to rapid cell division and cancer development (11). In addition, EBV was also reported to downregulate the tumor suppressor gene through epigenetic, post-transcriptional, and post-translational modifications (12). Importantly, viruses like human papillomavirus (HPV) can integrate their viral DNA into the host genome, resulting in the dysregulation of cellular growth and subsequent tumorigenesis (13).
HPV is a sexually transmitted virus characterized by the development of genital warts in patients. It is the most prevalent sexually transmitted infection in the world, with a global estimate of 1 in 10 women being HPV carriers at any time (14). The oncogenic property of HPV is most prominently shown by its role as the leading cause of cervical cancer, as reflected by HPV vaccination campaigns in multiple countries that aim to eliminate cervical cancer in patients across sexual orientations (15,16). HPV was also shown to cause a proportion of anal (17) and oropharyngeal cancers (18), and is a known risk factor for head and neck, vulvar, vaginal, and penile cancers, as well as respiratory and laryngeal tumors (19). Other than HPV, human herpesvirus 8 (HHV-8) has also been reported to be associated with breast cancer, with HHV-8 antibodies found in the blood sera (20). Furthermore, breast cancer might involve multiple viral infections, such as a combination of HSV-1, HPV, HCMV, EBV, and HHV-8, found in breast tissue samples (21).

Study design and ethical considerations
In this population-based cohort study, the index date was designated as the point of origin, and cancer incidence was measured from the index date until December 31, 2015, with the male population excluded from the cervical cancer statistics. The study protocol was reviewed and approved by the institutional review board of Chung Shan University Hospital (IRB CS13168) to ensure adherence to ethical considerations. Patient data were obtained from the National Health Insurance Research Dataset and de-identified, which waived the need for signed informed consent.

Statistical analysis
Statistical analyses were conducted using SAS 9.4 (SAS Institute Inc., Cary, North Carolina). HPV-infected cases were matched to non-infected cases according to age, sex, and index date in a 1:2 fashion using structured query language (SQL). Patients with a cancer onset before the index date were excluded from the analysis. Comparisons were made between HPV-infected and non-HPV-infected individuals. Demographic data were analyzed using the chi-square test for categorical data and Student's t-test for numerical data. We used multivariate Cox regression to adjust for the potential confounding effects of age, urbanization, insured type, and co-morbidities (including ischemic heart disease, hypertension, ischemic stroke, diabetes mellitus, abnormal liver function, renal failure, gastrointestinal bleeding, hyperlipidemia, chronic kidney diseases, chronic obstructive pulmonary disease, peptic ulcer, and gout) and to estimate the hazard ratio of breast cancer in patients with HPV infection compared with non-HPV individuals. We used Schoenfeld residuals to test the proportional hazards assumption for breast cancer, and the assumption was not violated. Additionally, Kaplan-Meier curves were used to generate cumulative incidence rates of cancer, which were then compared using the log-rank test.
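To make the Kaplan-Meier and log-rank steps concrete, the following is a minimal Python sketch using only the standard library. It is not the SAS procedure used in this study; the function names (`kaplan_meier`, `logrank`) and the toy follow-up data are illustrative assumptions. Cumulative incidence at each event time is taken as one minus the Kaplan-Meier survival estimate.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up in months; events: 1 = breast cancer, 0 = censored.
    """
    n_at_risk = len(times)
    event_counts = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # everyone leaves the risk set at their own time
    surv, curve = 1.0, []
    for t in sorted(leaving):
        d = event_counts.get(t, 0)
        if d:
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving[t]
    return curve


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p) with 1 degree of freedom.

    Assumes list inputs and at least one informative event time.
    """
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 df
    return chi2, p


# Toy data (hypothetical): months of follow-up, censored at 84 months.
hpv_t, hpv_e = [5, 10, 12, 20, 84, 84], [1, 1, 1, 0, 0, 0]
ctl_t, ctl_e = [30, 84, 84, 84, 84, 84], [1, 0, 0, 0, 0, 0]
cumulative_incidence = [(t, 1.0 - s) for t, s in kaplan_meier(hpv_t, hpv_e)]
chi2, p = logrank(hpv_t, hpv_e, ctl_t, ctl_e)
```

The per-time-point loops are quadratic in the number of patients, which is acceptable for illustration but not for a cohort of this size; a production analysis would rely on an established survival analysis package.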

Results
About 26 million patients were registered in the National Health Insurance Research Datasets in Taiwan between 2007 and 2015. Among them, 1,103,771 patients had at some point been diagnosed with HPV infection. Patients with a history of HPV infection before 2008 were excluded from the study, as were those who developed cancer before the HPV infection. This resulted in a pool of 939,874 HPV patients included in this study [Figure 1 (25)]. As the control, twice as many non-HPV patients matched to the HPV patients by age, sex, and index date (p=1.0000) were selected from the remaining 25,462,267 non-HPV patients in the National Health Insurance Research Dataset using Structured Query Language (SQL) (Table 1). Nevertheless, several demographic characteristics differed between the HPV-positive and HPV-negative groups, such as the geographical distribution of patients (p<0.0001), the regional levels of urbanization (p<0.0001), the insurance providers (p<0.0001), and several co-morbidities such as ischemic heart disease (p<0.0001), hypertension (p<0.0001), diabetes mellitus (p<0.0001), renal failure (p<0.0001), GI bleeding (p=0.0006), hyperlipidemia (p<0.0001), chronic kidney diseases (p<0.0001), COPD (p<0.0001), peptic ulcer (p<0.0001), and gout (p<0.0001). Subsequently, breast cancer patients were identified from this pool of HPV-positive and HPV-negative patients and cross-checked against the Taiwan Cancer Registry to validate the breast cancer status of each selected patient.
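The 1:2 exact matching on age, sex, and index date was performed in SQL in this study. As a rough illustration of the idea only, the sketch below does the same kind of matching in Python with the standard library. The record fields (`id`, `age`, `sex`, `index_date`), the function name `match_controls`, and the choice to sample controls at random without replacement are assumptions made for the example, not details taken from the study.

```python
import random
from collections import defaultdict


def match_controls(cases, controls, ratio=2, seed=0):
    """Exact-match each case to `ratio` controls on (age, sex, index_date).

    Controls are sampled without replacement. Cases with fewer than
    `ratio` eligible controls are left unmatched (an assumption of this
    sketch; other policies are possible).
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for c in controls:
        pool[(c["age"], c["sex"], c["index_date"])].append(c)
    for candidates in pool.values():
        rng.shuffle(candidates)  # randomize which controls get picked

    matched = []
    for case in cases:
        key = (case["age"], case["sex"], case["index_date"])
        candidates = pool.get(key, [])
        if len(candidates) >= ratio:
            picked = [candidates.pop() for _ in range(ratio)]
            matched.append((case, picked))
    return matched


# Hypothetical toy records.
cases = [
    {"id": 1, "age": 45, "sex": "F", "index_date": "2010-03"},
    {"id": 2, "age": 30, "sex": "F", "index_date": "2011-07"},
]
controls = [
    {"id": 101, "age": 45, "sex": "F", "index_date": "2010-03"},
    {"id": 102, "age": 45, "sex": "F", "index_date": "2010-03"},
    {"id": 103, "age": 45, "sex": "F", "index_date": "2010-03"},
    {"id": 104, "age": 30, "sex": "F", "index_date": "2011-07"},
]
pairs = match_controls(cases, controls)
```

Because matching is exact on all three variables, the matched groups have identical distributions of age, sex, and index date, which is consistent with the p=1.0000 reported for the matching variables.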
The incidence rate of breast cancer in women was 109.67 per 100,000 person-years for HPV patients, compared with 46.97 per 100,000 person-years for non-HPV patients, giving a crude hazard ratio (HR) of 2.336 and an adjusted HR (aHR) of 2.271 (Table 2). Not only did this indicate that the incidence rate of breast cancer was higher for HPV than for non-HPV patients, but the relative risk of breast cancer was also comparable to or higher than that of head and neck cancer and cervical cancer in HPV patients: the aHR of breast cancer between HPV and non-HPV patients was approximately 42% and 2% higher than that of head and neck cancer and cervical cancer, respectively (Table 2). The cumulative breast cancer incidence rate over the span of 84 months was significantly higher in the HPV-positive than in the HPV-negative group, as shown in the Kaplan-Meier curve (p<0.001) (Figure 1). Besides HPV infection with an aHR of 2.271 (p<0.0001), other demographic factors might also correlate with the breast cancer incidence rate. The biggest factor was age, where the highest aHR was 4.938 in the group aged 40 to 60 (p<0.0001), followed by 4.058 in the group aged 60 to 80 (p<0.0001), then 1.937 in the group aged 80 and over (p<0.0001), and finally 0.011 in the group aged 20 and under (p<0.0001), with the group aged 20 to 40 serving as the reference (Table 3). In addition, the breast cancer incidence rates varied among patients covered under different insurance providers. Patients covered by the civil servants' insurance had a higher aHR of 1.234 (p=0.0002), whereas those covered by the insurance of the Farmers', Fisherman's Association and the Water Conservancy had a lower aHR of 0.796 (p=0.0009) (Table 3). The incidence rates also differed among patients with co-morbidities. Notably, the incidence rates of breast cancer were higher in patients with abnormal liver function (aHR=1.191, p=0.0069) and hyperlipidemia (aHR=1.218, p=0.0002).

Kaplan-Meier curve of the cumulative incidence rates of breast cancer with and without HPV infection. The x-axis represents months since the index date, supplemented with the number of HPV cases and twice the non-HPV cases collected within each span of twelve months, p<.0001. The y-axis indicates the incidence rates in 10^-3.
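The relative comparisons quoted in the Results can be reproduced from the reported figures with simple arithmetic. The short Python sketch below does this; the variable names are illustrative. Note that the ratio of the two incidence rates is only an approximation of the crude hazard ratio, which comes from the Cox model rather than from a direct division, so a small difference from 2.336 is expected.

```python
# Reported incidence rates per 100,000 person-years (women).
hpv_rate = 109.67
non_hpv_rate = 46.97

# Simple incidence rate ratio; close to, but not the same quantity as,
# the crude hazard ratio of 2.336 estimated by Cox regression.
rate_ratio = hpv_rate / non_hpv_rate

# Reported adjusted hazard ratios (HPV vs non-HPV) by cancer type.
ahr = {"breast": 2.271, "head_and_neck": 1.595, "cervical": 2.225}

# Relative excess of the breast cancer aHR over the other two aHRs.
excess_vs_head_and_neck = ahr["breast"] / ahr["head_and_neck"] - 1.0
excess_vs_cervical = ahr["breast"] / ahr["cervical"] - 1.0

print(f"rate ratio: {rate_ratio:.3f}")
print(f"breast vs head and neck aHR: +{excess_vs_head_and_neck:.1%}")
print(f"breast vs cervical aHR: +{excess_vs_cervical:.1%}")
```

The two relative differences round to the approximately 42% and 2% stated in the text.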

Discussion
This study presented a positive association between HPV infection and the risk of breast cancer using national population data from the Taiwanese single-payer healthcare registry over 84 months. The scale of this study provided an advantage in countering the inconsistency of past results that attempted to associate HPV infection with breast cancer. The 1:2 HPV-positive to HPV-negative study group design also strengthened the statistical power of the association. Additionally, our study included multiple demographic factors that might offer further insights into the correlation.
The most common transmission route of HPV is through sexual activity. Nevertheless, other transmission mechanisms related to contact with skin and mucosa have been proposed, including horizontal transfer through non-sexual contact with fomites, fingers, skin, and mouths, self-inoculation as observed in female virgins and children with no history of sexual abuse, and vertical transmission during childbirth (26). How HPV travels to and resides in breast tissues remains unclear. One theory suggests that HPV can be transmitted from a primary tumor, such as a cervical neoplasm, to the mammary glands through plasma circulation (24). Another proposes that viral particles can enter the milk ducts and populate the mammary glands by means of direct or hand-mediated sexual contact.
On the cellular level, HPV infects the cell at the cellular membrane and inserts the L2 capsid protein into the endosome, facilitated by a transmembrane protease. The endosome displaying the L2 protein protrusion is then trafficked to the Golgi apparatus, assisted by cytosolic host factors (27). HPV DNA is then sent to the microtubule-organizing center and, finally, is carried to the chromosomes via kinesins and spindle fibers during metaphase and anaphase (28).
Molecular studies have also indicated an association between HPV and breast cancer by attempting to provide a plausible mechanism. The comparison between HPV-positive and HPV-negative breast cancer tissues revealed a downregulation of p53 and an upregulation of BCL2, a hallmark of uninhibited cellular checkpoints (29). The phosphorylation of Erk1/2 and the ß-catenin pathway might also be enhanced in breast cancer tissue when HPV E6/E7 cooperated with LMP1 oncoproteins, leading to cell proliferation (30). The proinflammatory cytokine IL-6, which was reported to promote oncogenesis, exhibited increased expression in breast cancer patients with HPV (31). Lastly, the blood-transmission theory of HPV was further supported by the finding of HPV DNA in extracellular vesicles extracted from the serum of breast cancer patients (32).
Other factors obtained from the patient data might confound the association of HPV and breast cancer. Our study reported that age, insurance provider, and co-morbidities such as abnormal liver function and hyperlipidemia influenced the breast cancer incidence rate. Among these confounding factors, our study indicated that the highest risk of breast cancer in the female population was between the ages of 40 and 60, in contrast to all cancers combined, whose risk retained a positive correlation with age across age groups. Most menopause occurs between the ages of 40 and 60, when hormone homeostasis readjusts, and the HPV infection rate could also be positively associated with menopausal status and negatively associated with hormone replacement therapy (33). We therefore postulate that the breast cancer incidence rate was likely to be affected by HPV infection through the alteration of hormone levels. On one hand, the decreased level of estrogen during menopause might result in vaginal microbiomes being more favorable for HPV infection (34). On the other hand, molecular studies indicated a potential association between estrogen and the apolipoprotein B messenger RNA-editing enzyme catalytic polypeptide-like 3 (APOBEC3) family of cytidine deaminases, which not only induce an antiviral immune response during HPV infection but can also mutate host DNA and initiate breast carcinogenesis (35). Estrogen and the lack of p53 were reported to potentially upregulate APOBEC enzymes synergistically in breast cancer cells containing estrogen receptors (35,36).
Another confounding factor, abnormal liver function, could potentially be explained by HPV DNA circulating to the liver. HPV was found to act cooperatively with hepatitis B virus (HBV) in the development of hepatocellular carcinoma, and other viral infections might also be facilitated by HPV to cause abnormal liver function (37).
The current theory proposes that HPV may be a cofactor or mediator of breast cancer rather than a causative agent, partially due to conflicting results on the presence of HPV in breast cancer tissues (24). Moreover, demographic and other individual factors may influence the possibility of oncogenesis after HPV infection. Our result was consistent with this theory, and multiple demographic factors were examined in Table 3. The association between HPV infection status and breast cancer was further supported by the increased breast cancer rate found in patients infected by high-risk HPV, including HPV 16, 18, and 33, in a large-scale meta-analysis (38). Additionally, the association was verified by four different PCR approaches for HPV detection (39). In contrast, studies in different populations across the globe did not unanimously yield statistically significant results indicating a contribution of HPV to breast cancer development (40,41). However, the lack of HPV prevalence in breast cancer tissue could potentially be explained by the disappearance of HPV strains in the later stages of cancer (24).
A main obstacle is understanding the route of HPV transmission to breast tissues. Unlike in cervical cancer, where HPV infects the epithelial cells of the cervix via cervical lesions or the mucous membrane, HPV has less direct routes to infect breast tissues. Although two potential mechanisms were proposed above (24), more anatomic and molecular studies are required to understand the direct and indirect etiologies of HPV in breast cancer.
In summary, our results suggested a significantly higher risk of breast cancer in female patients with HPV than in those without. The risk of breast cancer in HPV-positive patients was about twice that in HPV-negative patients. The cumulative breast cancer incidence rates of HPV-positive and HPV-negative patients were shown to be significantly different in the Kaplan-Meier curve. Although a causal relationship cannot be concluded, HPV could be assumed to play a contributing role in breast carcinogenesis.
Interestingly, our analysis of patient demographics suggested a potential role of estrogen linking HPV infection and breast cancer, which can only be clarified by further investigation. The circulation of HPV in sera might also implicate other organs such as the liver, but this is beyond the scope of this study.
The hypothesis that breast cancer is associated with HPV was first proposed about 30 years ago (24). However, the link has not been strong enough due to mixed results on HPV prevalence rates in breast cancer patients across studies. Our study used large-scale datasets to provide a more robust statistical association between the incidence rates of breast cancer and HPV status. About 26 million people were registered in the Taiwanese Health Registry from 2007 to 2015. The breast cancer statuses of the patients were identified in the registry and verified against another database, the Taiwanese National Cancer Registry. A limitation of the study was that the subtype and localization of the HPV were unknown. Investigating the HPV subtype with respect to breast cancer incidence rate may provide clues about the oncogenic properties of high-risk HPV subtypes, while data on HPV localization may suggest its possible infectious mechanism in the development of breast cancer. Additionally, the population-based data used in this study were not collected solely for the purpose of this research. As a result, some ICD outcomes may have been misclassified. Moreover, confounding variables previously reported to be associated with the risk of breast cancer, such as reproductive history, breast feeding, family history of breast cancer, lifestyle, and environmental factors, were not fully accounted for in this study due to the inherent nature of the data collection, which may cause bias in the results. Finally, the results of the study should be interpreted with caution, as the risk factors for breast cancer may differ between populations.
Nevertheless, a causal relationship between HPV and breast cancer remains unclear. More hypotheses from molecular studies are required to understand the route of HPV transmission to breast tissues as well as the mechanism of viral transmission in breast cells in relation to oncogenesis. Moreover, the HPV subtypes of breast cancer patients could be examined in the future to provide insights into the mixed HPV prevalence rates in previous studies of breast cancer tissues. Furthermore, the societal impacts of cancer treatments are becoming more prominent each year. Not only was the national treatment cost for cancer in the United States estimated to reach 246 billion by 2030 (42), but cancer patients also pay four times more on average than patients without cancer (43), in addition to the higher risks of psychiatric disorders and mental distress for both cancer patients and their nuclear family members (44). Thus, our study included HPV status, age groups, co-morbidities, and other demographic variables to provide evidence supporting a more stringent approach to breast cancer screening. More research could be done to investigate these factors, which could be valuable in drafting novel health initiatives to combat breast cancer.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.