Cervical Cancer Stem-Like Cell Transcriptome Profiles Predict Response to Chemoradiotherapy

Cervical cancer (CC) represents a major global health issue, particularly impacting women from resource constrained regions worldwide. Treatment refractoriness to standard chemoradiotheraphy has identified cancer stem cells as critical coordinators behind the biological mechanisms of resistance, contributing to CC recurrence. In this work, we evaluated differential gene expression in cervical cancer stem-like cells (CCSC) as biomarkers related to intrinsic chemoradioresistance in CC. A total of 31 patients with locally advanced CC and referred to Mário Penna Institute (Belo Horizonte, Brazil) from August 2017 to May 2018 were recruited for the study. Fluorescence-activated cell sorting was used to enrich CD34+/CD45- CCSC from tumor biopsies. Transcriptome was performed using ultra-low input RNA sequencing and differentially expressed genes (DEGs) using Log2 fold differences and adjusted p-value < 0.05 were determined. The analysis returned 1050 DEGs when comparing the Non-Responder (NR) (n=10) and Responder (R) (n=21) groups to chemoradiotherapy. These included a wide-ranging pattern of underexpressed coding genes in the NR vs. R patients and a panel of lncRNAs and miRNAs with implications for CC tumorigenesis. A panel of biomarkers was selected using the rank-based AUC (Area Under the ROC Curve) and pAUC (partial AUC) measurements for diagnostic sensitivity and specificity. Genes overlapping between the 21 highest AUC and pAUC loci revealed seven genes with a strong capacity for identifying NR vs. R patients (ILF2, RBM22P2, ACO16722.1, AL360175.1 and AC092354.1), of which four also returned significant survival Hazard Ratios. This study identifies DEG signatures that provide potential biomarkers in CC prognosis and treatment outcome, as well as identifies potential alternative targets for cancer therapy.


INTRODUCTION
Cervical Cancer (CC) is the second most frequent cancer and the fourth leading cause of cancer deaths in women worldwide. The Global Cancer Observatory (GLOBOCAN) estimates the burden of CC incidence in 2018 reached almost 570,000 women and a mortality rate of 54.6% (311,365 patients). Approximately 85% of cases occur in low-and middle-income countries (LMICs), and are predominantly diagnosed in advanced stages (1). Despite the scientific advances in primary and secondary prevention (vaccine, HPV screening and precancerous lesions treatment, respectively), CC continues to be a major global health challenge. Impressively, around 50% of patients with CC died as a consequence of treatment failure and other cancer-related complications. This dismal scenario is also reflected in South American patients (2), and highlights the importance of developing novel therapeutic approaches for CC and achieving a more personalized medical care.
The heterogeneous cellular composition of tumors leads to extreme genetic and epigenetic diversity and thereby produces a plethora of biological factors that can result in a poor prognosis and low survival rates (3). Cancer Stem Cells (CSCs) are a pivotal participant in these processes. In normal tissues, stem cells are generally defined by a controlled self-renewal feature, with the ability to produce both specialized and undifferentiated tissue-maintaining cells. Conversely, CSCs can display perturbed growth properties, leading to cancer initiation, chemoresistance, and metastasis (4). Tumor heterogeneity can therefore be supported by the CSC paradigm, where a subset of cells, organized into hierarchical structures based on differentiation capacity, drive malignancy and therapeutic refractoriness (5).
Given the importance of CSCs in cancer pathogenicity, it is not unexpected that many studies have sought to uncover the molecular pathways related to stemness. As observed in normal stem cells, CSCs exhibit genetic markers and pathways typically associated with proliferation (6). Aberrant expression of transcription factors SOX2, NANOG, OCT3/4, c-Myc (7), and disruption of Wnt/b-catenin, Hedgehog, Notch and PI3K/AKT/ mTOR signaling pathways are representative hallmarks that sustain the stem cell phenotype in CSC and support therapy resistance (8). Alternatively, mutations encompassing the tumor suppressor genes TP53, PTEN, and INK4A-ARF locus have been implicated in stem cell DNA damage pathways and self-renewal deregulation (6).
Given the importance of CSCs in the tumorigenic process of solid tumors, we aimed to study differential expressed genes (DEG) in cervical cancer stem-like cells (CCSC) from tumor biopsies taken before treatment began in a cohort of CC patients that responded or not to chemoradiotherapy. The analysis of DEGs from sorted CCSC between responder vs. non-responder patients points to a set of potential biomarkers for prediction of response to chemoradiotherapy, as well as offering new insights into the potential role of CCSCs in therapy failure.

Patient Recruitment and Sample Selection
A total of 31 patients with CC and referred to Maŕio Penna Institute (Belo Horizonte, Brazil) from August 2017 to May 2018 were recruited to the study. Inclusion criteria was histopathological diagnosis of Squamous cell carcinoma or Adenocarcinoma, no previous history of cancer or immune diseases. Cervical biopsies (FIGO stages II and III) were collected after participation agreement and signed the consent form. All patients were further submitted to radiotherapy concomitantly with chemotherapy, and clinical data were collected from medical records. The study was approved by the local Institutional Review Board (Number 1.583.784).
Patients were screened as Responders (R) or Non-Responders (NR) based on Gynecologic Oncology parameters of absence/persistence of cervical lesions 8 months after the chemoradiotherapy. Clinical examination (vaginal, pelvic, abdominal), laboratorial analysis (cytology, new biopsy) and imaging (Ultrasound, Computer Tomography and Magnetic Resonance Imaging, when available) were assigned up to 8 months after treatment. Patients were considered R when cervical lesions were undetected after chemoradiotherapy. Patients with persistence of lesions after the treatment (partial response, tumor progression or stable disease) were classified as NR.
Cell concentration was increased to 5 x10 6 cells/mL. CCSCsenriched subpopulations were isolated using FACSAria ® flowsorter (BD-Biosciences). Yield mode was performed at 45 psi with 85-mm nozzle at a frequency of ∼51 kHz. Two fluorescence channels were analyzed (APC-H7 and PE). Cells were distinguished from debris in the sample by distinct FSC values, since debris can be identified as particles with lowest FSC values. Sorting of CCSCs was performed using a gate containing the CD45-/CD34+ population to eliminate contamination with hematopoietic stem cells, thereby enriching for CCSC-like cells (Supplementary Figure 1). Cells were considered positive above 10 2 for each parameter based on negative populations defined by their autofluorescence. The CCSCs were sorted into cytometry tubes containing 1 mL of Lysis Solution (Lysis buffer and RNAse inhibitor from Takara ® Kit Smart Seq V4 RNA) for genomic library construction and sequencing.
cDNA Synthesis SMART-Seq v4 Ultra-low Input RNA Kit for Sequencing (Takara Bio USA, CA) was used to generate the full-length cDNA from the selected sorted cells following the manufacturer's instructions. Reactions with positive (Control Total RNA, provided by SMART-Seq v4 Ultra-low Input RNA Kit) and negative controls were carried out for quality control. Successful cDNA synthesis and amplification were considered when an Agilent High Sensitivity DNA Chip run on the Agilent Bioanalyzer 2100 (Agilent, CA) showed an electropherogram with a distinct peak spanning from 400 bp to 10,000 bp, and cDNA concentration ≥ 0.3 ng/ml detected with Qubit ™ dsDNA HS Assay Kit on a Qubit 3 Fluorometer (Thermo Fisher Scientific, MA). Purified cDNAs were stored at -20°C for further processing.

Sequencing Libraries
Library preparation of suitable cDNAs were performed using Nextera ® XT Library Prep Kit (Illumina, CA) with Nextera ® XT Index Kit V2 Set A (Illumina, CA). Samples were normalized to 40 pg/ml for a total 200 pg input of amplified cDNA. The protocol was performed as described by the manufacturer. Libraries were purified with 0.6 x bead ratio using Agencourt AMPure beads XP (Beckman Coulter, IN) and eluted in 52.5 ml of elution buffer. Quality parameters as size (440 bp average) and concentration (1.03 ng/ml average) were measured using High Sensitivity D1000 ScreenTape and reagents run on 2200 TapeStation System (Agilent, CA) and Qubit ™ dsDNA HS Assay Kit on a Qubit 3 Fluorometer (Thermo Fisher Scientific, MA), respectively. Good quality libraries were normalized to 1 nM. Thirteen samples were pooled to further perform 101 cycles of single read sequencing using a NextSeq ® 500/550 High Output Kit v2 (150 cycles) and NextSeq ® 550 sequencer (Illumina, CA). Sixty-two libraries with more than 20 million reads each were considered for analysis.

RNA-Seq Data Analysis
Sequencing quality control and adapters were analyzed in the FastQ files using FastQC version 0.11.9. Trimming of the adaptor content and overrepresented sequences was performed using Trimmomatic (9). Quality check using FastQC was also performed on the trimmed sequences. Reads from the fastq files were aligned to the human reference (build GRCH38 and annotation file Homo_sapiens.GRCh38.83.gtf) using the 2-Pass protocol from the STAR software (10). The resulting alignment file was compressed, indexed and name-sorted using the samtools (version 0.1.19-44428cd). The count table was generated using GeneCounts mode from STAR. All steps are summarized in the Supplementary Figure 2; command lines and Pearl scripts of the workflow are available upon request.

Differentially Expressed Genes (DEG) Analysis and Biomarkers Selection
Count data were imported to R software (11) and DESeq2 package (12) were used to perform differential expression analysis. Benjamini and Hochberg procedure, implemented in DESeq2, were used for p-values adjustment and a False Discovery Rate (FDR) cutoff of 0.05 was applied for assigning a given gene as a DEG between the groups NR and R. The panel of biomarkers were proposed based on the following criteria: (i) 22 highest values of AUC (area under the curve), (ii) 21 highest values of pAUC (partial area under the ROC curve statistic -(pAUC -0.10) (13) and lastly, (iii) the common loci between the two selection criteria. All statistics were calculated using the R package "genefilter".

Stemness Genes
The transcript abundance of stemness genes described in CC studies (Supplementary Table 1) across NR and R samples were compared by using Wilcoxon rank-sum test. Distribution of regularized log2 expression for statistically significant genes (p<0.05) were displayed as boxplots.

Screening microRNA Targets
To assess the putative regulation of target genes of miRNA, we downloaded all predictions for representative transcripts performed by targetScan -v72 (14). Only both miRNA and targets differentially expressed (adjusted p-value<0.05) with opposite regulation among NR and R according to Rfold change values were used in the further analysis.

Gene Enrichment Analysis and Networks
GO (Gene Ontology) function enrichment analyses of the 1050 DEGs and gene targets of miRNA were performed through clusterProfiler package (15). STRING (https://string-db.org) was employed to predict potential interactions between ILF2 and other proteins.

Statistical Analysis
Survival outcomes and the hazard ratio for disease progression or death were assessed via Kaplan-Meier methods and compared using Log-rank tests, using GraphPad Prism 5.0 (GraphPad Software Inc., La Jolla, Ca, USA). Forest plots represent the hazard ratio analysis of gene expression, obtained using a univariate Cox regression model in R (version 3.6.3, (Team, 2020 #3817); https://www.R-project.org, within survival package (version 3.2-3, https://CRAN.R-project.org/package=survival). Genes were selected by pAUC analysis.

Clinicopathological Characteristics of the Cohort
Samples from 31 women with an average age of 52.3 ± 16.8 years-old (range 24-82) were submitted to NGS analysis. Approximately 96% of patients had squamous cell carcinoma, only one patient had adenocarcinoma, stages FIGO II-B (48.3%) and III-B (48.3%). The mean tumor size was 6.8 cm. Most patients had bilateral parametrial (71%) and vaginal involvement (90%) and tumor lesions were moderately (42%) or poorly differentiated (42%). The overall response was evaluated 8 months after the end of treatment and 21 patients were classified as Responders (R) and 10 as Non-responders (NR). Clinical information from the patients and histopathological analysis are summarized in Supplementary Table 2. Kaplan-Meier curves highlighted the poorer survival outcomes of NR patients with 12% survival at 12 months, whereas the R group showed 79% survival (Hazard Ratio [HR]: 6.44 95% CI: 0.11-3.62, p<0.0001). (Figure 1).

Differential Expressed Genes (DEGs) in Cervical Cancer Stem-Like Cells (CCSC)
DEGs between NR and R patients were analyzed using an adjusted p value of < 0.05. This analysis returned 1050 differentially expressed transcripts. Ninety-one percent (91%; 633/694) of the coding genes were under expressed in NR patients; conversely, 86% of long non-coding RNAs (lncRNAs; 132/154) and 81% of pseudogenes (121/150) were overexpressed in the NR group. The snoRNA (Small nucleolar RNA), Mt_tRNA (Mitochondrial transfer RNA), misc_RNA (miscellaneous RNA), miRNA (microRNA), TEC (To be Experimentally Confirmed) and snRNA (Small nucleolar RNA) were grouped as "Other RNA" (n=52). The proportion of the DEGs and functional classes are detailed in Figure 2A.
An unsupervised clustering analysis of the 1050 DEGs revealed four main clusters related to the patients' treatment outcome (NR vs. R) ( Figure 2B). A heterogeneous expression profile is apparent across the genes as exemplified by the heatmap. The patients segregated into clusters based on an expression pattern in agreement with the failure/success to chemoradiation treatment, with few exceptions. Clinicopathological characteristics of the patients, including pathologic parametrial involvement, FIGO stages and tumor pathological grades did not differentially segregate between the groups.

Gene Ontology (GO) Classification and Functional Enrichment Analysis
Analysis and visualization of enriched functional profiles from the 1050 DEGs was performed to explore gene-related biological processes using the clusterProfiler package. Overall, transcripts were statistically assigned to 171 GO terms with all downregulated in the NR group (Supplementary Table 3). Genes were enriched for transcriptional/translational processes, metabolism, and respiratory pathways ( Figure 3). Specifically, the top gene ontology (GO) terms were related to "RNA catabolic process", "Oxidative phosphorylation", "Protein targeting to ER" and "Mitochondrial ATP synthesis" ( Figure 3A). The enrichment map shows two main functional modules, clustering the GO terms related to nucleic acids/protein metabolism and energetic activity in disconnected networks ( Figure 3B). The genes composing the two networks and their associations with the GO Biological Processes are shown in Supplementary Figure 3.

Stemness Genes
The distribution of regularized gene expression levels (rlog) from 51 stemness genes described in CC studies and tumor suppressor genes related to stemness TP53 and PTEN (Supplementary Table 1) was compared between NR and R patients. Wilcoxon rank-sum test revealed that CCSCs transcript levels of the genes CDKN2A (p=0.038) and PTEN (p=0.048) were decreased, while NANOG (p=0.006) was increased in the NR group. (Figure 4).

Non-Coding RNAs and Epigenetic Findings in CCSC
The long-noncoding RNAs (lncRNAs) differentially expressed in NR and R (n=154) represent 14.7% of all transcripts. From those, we detected nine transcripts previously related to CC progression ( Table 1) and 15 related to diverse cancers (Supplementary Table 4). From the small non-coding transcripts (snoRNA,    and coding SLC36A2 and SDS and have not previously been shown to be transcripts in cancer. The 21-top ranked pAUC genes from the CCSC DEGs separated into two groups using unsupervised cluster analysis, reflecting the treatment status between NR and R patients ( Figure 6A). The only exception was patient PCU124 who clustered with the NR group.

miRNA-Target Gene Pairs Analysis
The Hazard ratios (HR, 95% CI) for the 21 highest pAUC genes under a univariate analysis are shown in Figure 6B. To refine the number of robust biomarkers with potentially diagnostic utility, we also identified the highest 22 AUC values from the 1050 DEGs (Supplementary Table 6). Then, selecting the common genes between the pAUC and AUC analysis we identified seven loci as potential biomarkers for distinguishing NR vs. R patients: ILF2, SNX2, HNRNPA0, RBM22P2, ACO16722.1, AL360175.1 and AC092354.1 ( Figure 6C; Tables  2, 3). Of these seven potential biomarkers for distinguishing NR vs. R patients, four also returned a significant HR for protection (ILF2) or for increased risk (RBM22P2, ACO16722.1 and AL360175.1). To assess the quality parameters of prognostic prediction, accuracy, sensitivity, and specificity were evaluated using a univariate Logistic Regression (LR) for the seven loci ( Table 3). All parameters showed predictive values higher than 70%. The genes ILF2 and RBM22P2 presented with the highest accuracy (93%) and specificity (100%); AL360175.1 had the best sensitivity (90%).
Gene interaction network analysis identified ten proteins with the highest likelihood of functional relationships with ILF2: HNRNPA1, HNRNPA2B1, HNRNPC, PTBP1, ERH, HNRNPH1, ILF3, CDC5L, HNRNPL and HNRNPR. Amongst these, the first three genes are DEGs under expressed in NR.

DISCUSSION
Nearly half a million new cases of CC occur each year, with the majority of cases being diagnosed in developing countries and at advanced disease stages (1). In addition, the number of deaths from CC is expected to increase to 410,000 by 2030 (37). It is evident that CC continues to be an important public health challenge globally. Despite screening programs and the recent advent of HPV vaccines, the quality of screening and treatment options must be improved. Based on cancer stem cell research and deep sequencing approaches, molecular alterations in CC have been widely explored including tumor heterogeneity, which has brought new insights with the potential to impact clinical practice.
CSCs are self-renewing cancer cells responsible for expansion of the malignant mass in a dynamic process shaping the tumor microenvironment. CSCs have multiple differentiation capacities, anchorage independent growth, and chemoresistance (38). They represent approximately 0.1-10% of all tumor cells and express low levels of typical tumor-associated antigens compared to other tumor cells (39). After cytotoxic therapy regimens, residual tumor cells are enriched with CSCs, suggesting the importance of these cells in chemoresistance and disease relapse. CSCs may hijack host immune surveillance to escape the toxic effects of chemotherapy and evasion of apoptosis, resulting in typically aggressive tumors with poor prognosis (40). Despite representing a minimal number in the bulk tumor, CSCs actively coordinate intrinsic and extrinsic adaptation throughout carcinogenesis and therapy refractoriness. These unique properties place CSCs as a key component to be investigated, since their unique molecular signature might uncover effective therapeutic strategies in CC (41).
CD34 is a sialomucin family-related transmembrane protein that is involved in the modulation of cell adhesion and signal transduction (42). While it was first identified as a hallmark of hematopoietic stem cells, studies demonstrated that CD34 is also present on nonhematopoietic cell lines, including embryonic fibroblasts and vascular endothelial progenitors (43). Here we used CD34+ CD45-to sort subpopulations of tumor cells using flow cytometry as the most effective approach towards enriching stem-like tumor cells, thereby eliminating hematopoietic stem cells which are CD34+CD45+.
Based on the enriched CCSC from the bulk tumor biopsy, we conducted a low input RNA sequencing in patients with success (n=21) and refractoriness (n=10) to conventional   chemoradiotherapy. The landscape of DEGs between the Nonresponders (NR) and Responders (R) revealed 1050 genes (Figure 2A), characterized by coding, pseudogenes, long and small non-coding transcripts. Interestingly, most of the coding DEGs from CCSC showed negative values of expression when comparing NR to R patients (Figures 2A, B). The wide-ranging pattern of under expression in CCSCs might suggest a state of dormancy/quiescence, a key feature of tumor plasticity that protects stem-like cells from antiproliferative chemotherapeutic agents (44). While CCSCs represent a key driving factor of tumorigenesis and metastasis, the majority maintain in a quiescent or dormant state until changes occur in the microenvironment (45). Two main mechanisms account for tumor resistance to classical therapeutic approaches: Darwinian selection of cells harboring novel genetic variations (extrinsic resistance), or epigenetic events (chromatin remodeling and activation of pathways to cell stress), where dormancy and/or tumor quiescence can be achieved (46,47).
Gene Ontology (GO) analysis revealed enrichment of biomolecule synthesis GO terms, where pathways of DNA/ RNA and protein processing, as well as cell metabolism are overrepresented by suppressed genes (Figure 3). This is consistent with previous studies that suggest regulatory gene networks are downregulated in quiescent stem cells in glioblastoma (48) and ovarian cancer (49).
Indeed, epigenetic adaptations are often reflected in CCSCs resistance capabilities. Non-coding elements in the genome hold a diversity of regulatory factors responsible for the expression of proto-oncogenes or tumor-suppressor genes (50). Given that the deregulation of small and lncRNAs are strongly implicated in the tumorigenesis of CC, we performed a thorough investigation of these RNAs in our dataset. Amongst the 24 DEGs reported ( Table 1; Supplementary  Table 4), the canonical oncogenic lncRNAs MALAT1 (17,18), NEAT1 (20) and NORAD (24) showed negative expression values in NR as compared to R patients. Our finding differs from that seen in some of these studies, but it  is noteworthy that we are comparing expression levels from purified CCSCs, which represented a small percentage of cells in the entire tumor mass. Interestingly, decreased expression of the oncosupressive lncRNA GAS5 is correlated with tumor development and worse clinical outcome in CC patients (22,23). These apparently opposing findings are not unexpected in cancer studies, especially due to the molecular heterogeneity of tumors, diverse sample sources (cervical tissues, cell exfoliates, mucus, serum, cell cultures, and purified cell populations) and the variety of methodologies used. In fact, non-coding RNA in CC are mainly evaluated from immortalized cell lines (HeLa, CaSki, SiHa) and cervical tissues ( Table 1), and gene expression analyses were performed using non-cancerous cells and tissues as the reference. In our study we are using purified human biopsy derived CCSCs and comparing between NR vs. R groups. Correlation analysis and binding prediction were conducted to investigate the relationship between the miRNAs from CCSCs transcriptome and their potential target genes. The miRNAs MIR4278, MIR4422 and MIR4779 have been reported to exert a role as tumor suppressors, induction of apoptosis, cell cycle arrest and in overall cancer survival (51)(52)(53). Sixteen genes, enriched for the functions of cell cycle transition, immune response cell surface signaling, protein targeting/localization, and mitochondrial membrane organization pathways showed negative correlations with overexpression of these miRNAs ( Figure 5 and Supplementary Table 5). Noteworthy, a subset of genes directly involved in translational machinery (ZFP36L1, RSP8, SEC62), proteolytic degradation of intracellular proteins (PSMB1) regulation of centrosome cycle and progression (CALM1), cell surface structure adhesion, migration and organization (EZR) and oxidative stress (HEBP2) are included in the potentially downmodulated genes ( Figure 5).
The molecular mechanisms of non-coding RNAs in CC requires further characterization, particularly regarding the interplay between the diverse classes of RNAs and deregulation of metabolic pathways. Furthermore, the detection of these molecules in the serum of CC patients might lead to biosignatures of clinical relevance in non-invasive liquid biopsies (54).
The CSCs model and single-cell technologies have provided an opportunity to study a heterogeneous collection of cells with distinct genetic and phenotypic properties within tumors and investigate what roles they play in disease processes and therapeutic response (55). For CC, numerous markers have been associated with stemness properties of these cells (Supplementary Table 1), mainly regarding cellular self-renew and pluripotency maintenance features. When comparing rlog2 expression of stemness markers in NR and R samples, a statistical difference in three genes was observed (Figure 4). The overexpression of the canonical transcriptional factor NANOG is a well-known marker in CCSCs (56), and it shows an increased expression in the NR cells. Conversely, higher expression of tumor suppressors CDKN2A and PTEN were observed in R cells.
CDKN2A (P16/INK4A) acts as critical regulators of stem cell functions and defines a worse prognosis in CC (57). The PTEN gene is involved in key mechanisms specific for CSCs, such as self-renewal, quiescence/cell cycle, Epithelial-to-Mesenchymal-Transition (EMT) and treatment refractoriness (58), with decreased expression associated with aggressive cancer phenotypes (59). These findings support a speculation that higher stemness features in CCSCs are associated with treatment failure. In addition to driving cells to an undifferentiated stage with high proliferative capacity, stemness can poise cells to enter a senescence/quiescent state to escape death and reach a state of reversible cell cycle arrest (60,61).
Here we focused on a set of genes with the highest potential for classifying NR vs. R patients using a combination of pAUC and AUC analyses. The 21-gene signature selected using the pAUC analysis ( Table 2) clearly separated the NR and R patients in a cluster analysis ( Figure 6A). The transcriptional profile of a subgroup of these genes; ILF2, SCAND1, AK6, REXO2, SDS, OR1X1P, RBM22P2, MTND5P25, AC073324.1, AC016722.1, AL360091.3 and AC093801.1 indicate significant prognostic capacity for CC pathogenesis ( Figure 6B). Choosing genes that overlapped between the pAUC and AUC analysis led to a collection of seven genes that strongly segregated the NR vs. R groups: ILF2, SNX2, HNRNPA0, RBM22P2, ACO16722.1, AL360175.1 and AC092354.1 ( Figure 6C and Table 3). The parameters of accuracy, specificity and sensitivity highlights the strong potential of these molecules to predict failure or success to chemoradiotherapy (Table 3). Moreover, single gene or multigene signature assays can be used to measure specific molecular pathway perturbations that could guide therapeutic decisions in the future. Thus, our approach using high-throughput RNAseq, where thousands of individual molecules were investigated, offers an initial set of biomarkers with the potential for clinical use upon further validation.
The Interleukin enhancer-binding factor 2 gene (ILF2, NF45) forms a complex with ILF3 (NF90) involved in transcription regulation (62), mitosis (63), and DNA repair by nonhomologous end joining (64). ILF2 acts as a tumor promoter, with overexpression associated with poor prognosis in CC, pancreatic ductal adenocarcinoma, non-small cell lung cancer and breast cancer (28,65). However, our CCSC analysis revealed the opposite association with ILF2 overexpressed in CCSCs from patients with no CC recurrence after the chemoradiotherapy ( Figure 6). This pattern of higher expression related to better survival in CC patients is also seen in The Human Protein Atlas database (p=0.087; https:// www.proteinatlas.org/ENSG00000143621-ILF2/pathology/ cervical+cancer). Interestingly, from the ten proteins with the highest functional interactions with ILF2, network interaction analysis revealed three transcripts also under expressed in NR patients. The genes HNRNPA1, HNRNPA2B1 and HNRNPC are ubiquitously expressed heterogeneous nuclear ribonucleoproteins (hnRNPs) involved in mRNA metabolism, splicing and regulation of alternative splicing events. Indeed, these genes are associated with carcinogenesis and metastasis with a diverse set of tumor types (66,67). The spliceosomal host machinery is especially important for the high-risk HPV life cycle and transformation in CC. The viral oncoproteins E6 and E7 are responsible to abrogate P53 and RB functions, with the prevalence of different E6 isoforms linked to lesion severity (68).
Overall, our findings point towards a highly elaborated mechanism of treatment refractoriness and metastasis in cancer: the plasticity of CSCs in regulating the inter-conversion between dormancy and proliferation. When compared to R patients, NR patient CCSCs tend to show a comprehensive decrease in gene expression, mostly related to transcriptional and translational processes. We speculate that therapy resistance in CC amongst these patients is mediated by molecular signatures of growth arrested in CCSCs, where a quiescent/slow cycling state could promote survival and disease recurrence.
In conclusion, we performed a comprehensive transcriptome analysis of CC stem-like cells enriched from fresh tumor biopsies. Despite numerous efforts, the discovery and establishment of new biomarkers for CC prognosis are lacking. Currently, overall prediction is mainly defined by clinical parameters, and our results bring novel insights to the field. First, the landscape of intrinsic resistance to conventional chemoradiotherapy revealed seven distinguishing genes as novel putative biomarkers for predicting response to therapy in CC patients, with four returning significant HRs. Second, we defined a distinct subset of non-coding transcripts in stem-like cells from CC, adding to our knowledge concerning the epigenetic factors driving treatment refractoriness. Third, the selected genes, in addition to standard clinical parameters, offer new insights towards prognostic assessment and therapeutic support in clinical practice. Importantly, further molecular characterization in a larger cohort of cervical cancer patients is required to validate our findings and possibly develop their use as clinically actionable biomarkers in the future.

DATA AVAILABILITY STATEMENT
The data used in this manuscript is available at the NCBI's BioProject site under the accession number: PRJNA705088.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Mario Penna Institutional Review Board #1.583.784. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
LZ performed the NGS, contributed with the experimental design and wrote the paper. CM performed the FACS experiments, PM contributed with patient recruitment, clinicopathological and survival analysis. LC contributed with patient recruitment and FACS experiments, WM contributed with the experimental design and together IS and RD performed the bioinformatics analysis, LB contributed with paper writing, AT-C and OF participated in FACS experimental design and analysis. TF and SP were responsible for patient recruitment and clinicopathological data acquisition. KG was responsible for the study design, data analysis and paper writing, PS performed the anatomopathological analysis and contributed to experimental design and writing. All authors contributed to the article and approved the submitted version.  Seven loci intersect between the highest 21 values of pAUC and AUC analysis from the differential expression genes in cervical cancer stem-like cells. Log2 fold change values (Log2FC, adjusted p-value <0.05) represent the difference in expression in Non-responders (n=10) compared to Responders (n=21) to the chemoradiotherapy.