Identification of a Five-Gene Signature and Establishment of a Prognostic Nomogram to Predict Progression-Free Interval of Papillary Thyroid Carcinoma

Background: The incidence of papillary thyroid carcinoma (PTC) is high and increasing worldwide. Although prognosis is relatively good, it is important to select the minority of patients with poorer prognosis to avoid side effects associated with unnecessary over-treatment in low-risk patients; this requires accurate prognostic predictions. Materials and Methods: Six PTC expression datasets were obtained from the gene expression omnibus (GEO) database. Level 3 mRNA expression and clinicopathological data were obtained from The Cancer Genome Atlas Thyroid Cancer (TCGA–THCA) database. Through integrated analysis of these datasets, highly reliable differentially-expressed genes (DEGs) between tumor and normal tissue were identified and lasso Cox regression was applied to identify DEGs related to the progression-free interval (PFI) and to establish a prognostic gene signature. The performance of a five-gene signature was evaluated based on a Kaplan–Meier curve, receiver operating characteristic (ROC), and Harrell's concordance index (C-index). Multivariate Cox regression analysis was used to identify factors associated with PTC prognosis. Finally, a prognostic nomogram was established based on the TCGA-THCA dataset. Results: A novel five-gene signature was established to predict the PTC PFI, which included PLP2, LYVE1, FABP4, TGFBR3, and FXYD6, and the ROC curve and C-index showed good performance in both training and validation datasets. This could classify patients into high- and low-risk groups with distinct PFIs and differentiate PTC tumors from normal tissue. Univariate Cox regression revealed that this signature was an independent prognostic factor for PTC. The established nomogram, incorporating the prognostic gene signature and clinical parameters, was able to predict the PFI with high efficiency. The gene signature-based nomogram was superior to the American Thyroid Association (ATA) risk stratification to predict PTC PFI. Conclusions: Our study identified a five-gene signature and established a prognostic nomogram, which were reliable in predicting the PFI of PTC; this could be beneficial for individualized treatment and medical decision making.


INTRODUCTION
The incidence of thyroid cancer is highest among endocrine tumors, and has been increasing over the past 20 years; it is now the eighth-most commonly diagnosed cancer in the world (1). Differentiated thyroid cancer accounts for the most prevalent thyroid cancer and among this type, thyroid papillary cancer (PTC) and follicular thyroid cancer have relatively good prognoses with long term survival rates higher than 90% (2). Further, PTC accounts for ∼85% of thyroid cancers and by 2030, it will rank fourth among the most common malignant tumors in the United States (3,4). A variety of risk factors such as radiation are associated with the onset of differentiated thyroid cancer. Moreover, exposure to ionizing radiation during childhood or adolescence can lead to the development of PTC. Iodine deficiency and Hashimoto's thyroiditis were also reported to be associated risk factors for thyroid cancer (5).
The BRAF V600E mutation is the most common mutation in PTC. RAS mutations are also common, especially in the follicular variant of PTC, which is relatively indolent. TERT promoter mutations are reported to be predictive of a worse prognosis for PTC. Although the incidence of thyroid cancer continues to rise, its mortality rate remains low. However, the mechanisms underlying PTC recurrence remain unknown.
Secondary surgery for PTC recurrence results in surgical trauma and a higher risk of recurrent laryngeal nerve injury for patients. Further, 77% of PTC recurrence occurs within 5 years post-surgery (6). Lobectomy or total thyroidectomy is the main treatment for this disease and regular postoperative neck ultrasound examination and the detection of TSH and thyroglobulin levels are the major means to monitor recurrence in patients with postoperative PTC. In addition, patients at a high risk of recurrence and with aggressive tissue subtypes might require radioactive iodine (131I) remnant ablation. Postoperative thyroid cancer patients also undergo routine TSH inhibition therapy to inhibit tumor recurrence and improve prognosis. However, long-term subclinical hyperthyroidism caused by TSH inhibition might lead to a variety of potential side effects such as osteoporosis, atrial fibrillation, cardiac insufficiency, and increased risk of fracture and heart disease in elderly patients (7). Therefore, the challenge for PTC therapy lies in the balance between side effects due to treatment and benefits to patients. Accordingly, the accurate assessment of postoperative PTC prognosis is critical to ensure that low-risk patients are not over-treated, but that high-risk and advanced patients receive necessary and more aggressive therapeutics. The American Thyroid Association (ATA) currently recommends using TNM staging to predict mortality and have also proposed a system to estimate the risk of recurrence (8).
With advances in gene chips and high-throughput sequencing, an increasing number of studies has shown that gene signatures based on mRNA expression levels has great potential to predict PTC prognosis. Choi et al. established a 12-gene predictive model (including BCC8, CHI3L1, CLCNKA, FAM155B, GABRG1, LUM, MRO, MT1G, MT1H, SELV, SLC4A4, and TMEM92) that might accurately predict nodal metastasis in PTC using data from The Cancer Genome Atlas Thyroid Cancer (TCGA THCA) dataset (9). Moreover, using the TCGA dataset, Lin et al. proposed a seven-gene prognostic signature (including AGTR1, CTGF, FAM3B, IL11, IL17C, PTH2R, and SPAG11A) based on immune-related genes that might predict the prognosis of PTC (10). Thus, further exploration of public databases such as gene expression omnibus (GEO) and TCGA could reveal additional genes associated with PTC prognosis to establish a reliable prognostic prediction model. Such models combined with clinical pathological parameters might ultimately represent powerful tools to predict PTC prognosis and guide individualized postoperative treatment.
In this study, we integrated six PTC datasets from the GEO database and the TCGA-THCA dataset to identify reliable differentially-expressed genes (DEGs) in PTCs. Further, univariate Cox survival analysis and lasso Cox regression analysis were performed to identify DEGs associated with the progression-free interval (PFI) of PTC, and we proposed a prognostic gene prediction model using gene expression and clinical data from the TCGA-THCA dataset. The molecular mechanisms underlying the gene prediction models were also studied. The potential of this model to differentiate malignant thyroid nodules from normal tissue was also explored. We further applied multivariate Cox survival analysis to identify independent risk factors associated with PTC prognosis. Finally, a nomogram was established combining the gene prediction model with clinical pathological parameters to predict disease outcome. Overall, our new model and nomogram might provide a powerful tool to predict PTC prognosis.

Gene Expression and Clinical Data
mRNA expression data and related clinical data for PTC were searched and downloaded from GEO (https://www.ncbi.nlm.nih. gov/geo/). The keywords "Thyroid cancer, " "Thyroid carcinoma, " and "PTC" were used for retrieval. Studies based on "Homo sapiens" described as "Expression profiling by array" were included for the next round of screening. Studies involving only cases of "follicular thyroid cancer" or "undifferentiated thyroid cancer" were excluded. Studies focusing only on "cell lines" and "xenografts" were also excluded. Finally, six gene expression microarray datasets (GSE5364, GSE29265, GSE33630, GSE35570, GSE38545, GSE60542) were chosen and downloaded for DEG analysis. The selected datasets all met the following criteria: (1)contained human thyroid tissue samples; (2) contained tumor and non-tumor thyroid tissue control samples; (3) contained at least 40 samples. The probes were matched to the gene symbols using the annotation file provided by the manufacturers. The median ranking value was used to calculate the expression value if a single gene symbol was matched by multiple probes. The expression data were normalized based on the Robust Multi-array Average (RMA).
Harmonized RNA sequencing data (HTSeq-counts and HTSeq-FPKM) and associated clinical information for thyroid carcinoma (THCA) were downloaded from TCGA (https:// portal.gdc.cancer.gov/, up to June 30, 2019) using TCGAbiolinks R package (11), which included 507 cases, 510 tumor samples, and 58 normal tissue samples. After removing five cases without corresponding tumor samples, two cases with the pathological diagnosis of poorly differentiated oncocytic carcinoma or follicular carcinoma, five cases with a history of neoadjuvant therapy, eight samples of metastasis, 495 cases with corresponding tumor tissues and clinical information and 58 normal thyroid tissue samples were ultimately included in the study. Mutation and copy number alteration data were obtained from Cbioportal (http://www.cbioportal.org/) (12). Information regarding BRAF-like and RAS-like classification proposed by TCGA and was obtain from the official TCGA publication (13).

Integrated Analysis of Microarray Datasets and Identification of DEGs
For the GEO datasets, DEGs between tumor and normal tissues were identified using the LIMMA package from R software (14). For the TCGA dataset, DEG analysis was applied using TCGAbiolinks in R with harmonized RNA sequencing data in the form of HTSeq-counts following the official instruction (11). The cutoff value was set to | Log 2 FC (fold-change) | >1, p < 0.05, and false discovery rate (FDR) < 0.05. Integrated analysis of DEGs identified based on six GEO datasets was applied using the robust rank aggregation (RRA) method-based R package "RobustRankAggreg"; p < 0.05 was considered statistically significant. The intersection between integrated DEGs from GEO datasets and DEGs of the TCGA-THCA dataset was identified to obtain reliable DEGs indicative of PTC. Gene ontology and KEGG enrichment analyses were applied to explore the potential biological processes, cellular components, molecular functions, and significantly relevant signaling pathways associated with the DEGs using DAVID (https://david.ncifcrf.gov/) (15). p < 0.05 was considered statistically significant. The significantly relevant signaling pathways were visualized using Cytoscape v3.7.1 (https://cytoscape.org/).

Survival Analysis and Establishment of Prognostic Gene Signature
Recurrence and metastasis after initial surgery are the main factors associated with poor outcomes for patients with PTC. Meanwhile, considering the relatively good prognosis and the extremely low risks associated with overall survival, the PFI was chosen as the primary endpoint in this study. All follow-up data were derived from TCGA Pan-Cancer clinical data (16). The TCGA-THCA dataset was used to determine whether the DEGs were associated with the PFI. Normalized gene expression data in the form of Transcripts Per Million (TPM) were transformed based on the base-2 logarithm for further survival analysis. Three cases with follow-up ≤30 days were excluded from the survival analysis. A total of 492 TCGA cases with a follow-up >30 days were randomly and equally divided into a training dataset and a validation dataset. The expression levels of DEGs were then analyzed in the entire TCGA dataset using a univariate Cox proportional hazards regression model. DEGs with a p < 0.05 were considered statistically significant and included for further analysis. The training dataset was then used to construct the prognostic gene model. Lasso penalized Cox regression analysis was performed to select prognostic genes associated with the PFI and to construct a prognostic gene signature for patients with PTC based on a linear combination of the regression coefficients derived from the lasso Cox regression model coefficients (β) multiplied by normalized mRNA expression levels. X-Tile software was used to determine the optimal cut-off value of the gene signature (17). Patients were then divided into low-and high-risk groups accordingly. Kaplan-Meier analysis, the area under the (AUC) of the receiver operating characteristic (ROC) curve, and Harrell's concordance index were used to evaluate the performance of the prognostic gene signature. The validation and entire datasets were used for validation. The performance of the gene signature was also compared with the previously defined seven-gene signature proposed by Lin et al. (10). Risk scores for each case were calculated using the same formula and the optimal cut-off value for each dataset was determined using X-Tile software. The performance of the gene signature to differentiate PTC tissues from normal tissues was also tested based on the AUC of the ROC curve.

Identification of Independent Prognostic Parameters for PTC
To identify independent prognostic parameters for PTC associated with the PFI and to validate the independent prognostic value of the gene signature, univariate and multivariate Cox regression analyses were performed based on the prognostic gene signature and clinical parameters, including BRAF V600E mutation status, RAS mutation status, TERT mutation status, TERT expression level, age, histological type, aggressive subtypes, T stage, N stage, M stage, AJCC stage, residual tumor status, extrathyroidal extension, tumor size, multifocality, and the anatomic site of tumors based on the entire TCGA dataset. Parameters with p < 0.25 based on univariate analysis were further included in the multivariate Cox regression analysis. p < 0.05 was considered statistically significant.

Building and Validation of a Predictive Nomogram
After testing for collinearity, independent prognostic parameters and relevant clinical parameters were included to construct a prognostic nomogram to predict 1-, 2-, 3-, 4-, and 5year progression-free survival for PTC patients in the entire TCGA dataset using a stepwise Cox regression model. Kaplan-Meier analysis, AUC of the ROC curve, Harrell's concordance index, and a calibration plot comparing predicted progressionfree survival and observed survival were used to evaluate the performance of the prognostic nomogram. Harrell's concordance index was calculated to assess the discrimination of the nomogram based on a bootstrap method with 1,000 replicates. The calibration curve of the nomogram was plotted to compare predicted progression-free survival with observed survival rates. Based on the total points of the nomogram, patients were divided into two or three groups according to the optimal cutoff values determined by X-Tile. Kaplan-Meier analysis was used to plot the survival curves for groups with different risk levels of disease progression.

Statistical Analysis
Statistical analysis was performed using R software v3.4.3 and GraphPad Prism v8.01 (https://www.graphpad.com). A χ2 test or Fisher's exact test was used to analyze categorical variables. A Student's t-test was used to analyze continuous variables for paired samples. One-way ANOVA tests were used to analyze multiple groups of continuous variables. Univariate and multivariate Cox regression analyses were performed for survival analysis. Hazard ratios (HRs) and 95% confidence intervals (CIs) were calculated to identify DEGs associated with progression-free survival. P < 0.05 was considered statistically significant unless otherwise indicated.

Identification of DEGs
This study was carried out based on the flowchart shown in Figure 1. The detailed information regarding the six GEO datasets included in this study is shown in Table 1. In GSE35570, GSE33630, GSE5364, GSE60542, GSE58545, and GSE29265 datasets, a total of 1554, 807, 222, 731, 753, and 558 DEGs, respectively, were identified between tumor and normal thyroid tissues (Supplementary Figure 1). A total of 321 DEGs including 178 upregulated and 143 downregulated genes were identified after the integrated analysis of the GEO datasets using the RRA method. The top 20 upregulated and downregulated DEGs identified by integrated analysis are shown in Figure 2A. In the TCGA-THCA dataset, a total of 2,264 DEGs including 912 upregulated and 1352 downregulated genes were identified (Supplementary Figure 1). Finally, a total of 295 reliable DEGs including 158 upregulated and 137 downregulated genes were identified based on the intersection between GEO and TCGA results ( Figure 2B, Supplementary Table 1). Hierarchical cluster analysis showed that the 295 differential genes had significantly different expression patterns between tumor and non-tumor tissues, which could distinguish these tissue types ( Figure

Identification of PFI-Related DEGs and Establishment of a Five-Gene Prognostic Signature
The PFI was chosen as the primary endpoint in this study. A total of 492 TCGA cases with a follow-up period >30 days were included in survival analysis. The included cases were randomly and equally divided into a training dataset and a validation dataset. The baseline characteristics of the patients are shown in Table 2. The entire TCGA dataset was utilized to discover DEGs associated with the PFI of PTC using a univariate Cox regression model. A total of 50 DEGs were identified as associated with the PFI (Figure 3). A prognostic gene signature composed of five genes was then developed based on the training dataset using the lasso Cox penalized regression model, including proteolipid protein 2 (PLP2), transforming growth factor beta receptor type 3 (TGFBR3), lymphatic vessel endothelial hyaluronic acid receptor 1 (LYVE1), FXYD domain-containing ion transport regulator 6 (FXYD6), and fatty acid-binding protein, adipocyte (FABP4) (Supplementary Figure 4). Among these DEGs, PLP2 (upregulated) with a HR > 1 was considered an oncogene, whereas TGFBR3, LYVE1, FXYD6, and FABP4 (all downregulated) with HRs < 1 were considered tumor suppressor genes. The risk score was equal to A risk score for each case was then calculated according to the formula. X-Tile software was used to determine the optimal cut-off of the risk score. Patients in the training dataset were then divided into high-and low-risk groups accordingly (cut-off value = −1.24). The Kaplan-Meier survival curve revealed significantly worse prognosis in the high-risk group (p < 0.0001) ( Figure 4D). Next, the prognostic value of the five-gene prognostic signature was assessed based on the time-dependent ROC and C-index. The AUCs of the 1-, 2-, 3-, and 4-year risk scores for PFI prediction were 0.783, 0.783, 0.764, and 0.728, respectively ( Figure 4A). The C-index of the risk score was 0.734 (95% CI, 0.653-0.815). Further, the expression of the five genes changed gradually along with the increase in the risk score ( Figure 4G). In general, the results indicated good performance for the five-gene signature in predicting the PFI of PTC.

Validation of the Performance of the Five-Gene Prognostic Signature in Predicting PFI
The validation and entire TCGA datasets were then used to validate the performance of the five-gene prognostic signature in predicting PFI. A risk score for each case was calculated using the same formula. The optimal cut-off value was also determined for each dataset using X-Tile software. Patients in each dataset were then divided into high-and low-risk groups accordingly. Kaplan-Meier survival curves revealed that PFIs were significantly distinct between high-and low-risk groups in both datasets ( Figures 4E,F). Specifically, patients in the high-risk groups had notably poorer prognosis than those in the low-risk groups. The prognostic predictive power of the five-gene signature was then assessed based on the time-dependent ROC and C-index. In the validation dataset, the AUCs for 1-, 2-, 3-, and 5-year PFI prediction based on the gene signature were 0.584, 0.602, 0.619, and 0.593, respectively, and the C-index of the gene signature was 0.603 (95% confidence interval(CI): 0.484, 0.722) ( Figure 4B).
The distributions of the risk scores and gene expression data are shown in Figures 4H,I. Collectively, validation results indicated that the five-gene signature had good performance in predicting the PFI of PTC patients. The gene expression levels of the five genes were explored in the TCGA dataset (Figures 5A-E). The performance of the fivegene signature in differentiating normal thyroid tissue from PTC tissue was evaluated in the GEO and TCGA datasets based on    Figure 5F). The relationship between the 5-gene signature, transcriptome profiles and mutational profiles (BRAF, RAS, EIF1AX, NTRK3-fusion, NTRK1-fusion, RET-fusion and BRAFfusion) of PTC were also analyzed and presented in Figure 5G.

Building and Validation of a Prognostic Nomogram
A prognostic nomogram to predict the 1-, 2-, 3-, 4-, and 5-year PFI of PTC patients was established based on the 376 patients with complete clinical information from the TCGA-THCA dataset using the stepwise Cox regression model ( Figure 6A). Risk score, age, RAS mutation status, tumor size, aggressive subtype, N stage, and M stage were parameters included in the nomogram. The AUCs for the 1-, 2-, 3-, 4-, and 5-year PFI were 0.7480, 0.7097, 0.7550, 0.7761, and 0.7627, respectively ( Figure 6B). The C-index was 0.7600 (95% CI, 0.6759, 0.8440). The patients were then divided into two or three groups associated with different levels of risk based on the cut-off value determined by X-Tile software. Groups with a lower risk score were associated with better prognosis (Figures 6D,E). The calibration curve further revealed that the nomogram had good performance in predicting the PFI of PTC patients (Figure 6C). When the risk of progression was less than 0.15, the nomogram might overestimate the risk but when the risk of progression is greater than 0.15, the nomogram might underestimate the risk.
To compare the prognostic performance of the gene signature-based nomogram to the currently available risk stratification system, the estimated risk of tumor recurrence based on the 2009 American Thyroid Association guidelines for each case was retrieved from the TCGA integrated genomic data of PTC (16). For this, 346 cases with complete clinical information for the nomogram and sufficient information for ATA risk stratification from the entire TCGA-THCA dataset were included to compare prognostic performances. The AUCs to predict the 1-, 2-, 3-, 4-, and 5-year PFI for the nomogram were 0.7778, 0.7200, 0.7688, 0.7892, and 0.7621 respectively (Figures 7A-E). In contrast, the AUCs for 1-, 2-, 3-, 4-, and 5year PFI prediction based on ATA risk stratification were 0.6986, 0.6154, 0.6409, 0.6608, and 0.6636, respectively. The C-index of the nomogram was 0.7747 (95% CI, 0.6870-0.8625), whereas the C-index of ATA risk stratification was 0.6377 (95% CI, 0.5601-0.7153), we also compare the prognostic performance of the nomogram with the 2015 version of modified risk stratification  proposed by the American Thyroid Association (Noted that modification involving lymph node size is not included) (8).
The MACIS score was also used as control. The risk score had the highest C-index (0.7747 vs. 0.6449 and 0.6507) indicating a superior prognostic value (Supplementary Figures 6A-E). The results indicated that the gene signature-based nomogram was superior to ATA risk stratification and MACIS in predicting the PFI of PTC. The performance of the gene signature-based nomogram was further explored in different subgroups of BRAF-like and RAS-like PTCs proposed by TCGA (13). In compare with the ATA risk stratification and the previously defined seven-gene signature, the gene signature-based nomogram had the best prognostic performance in all the subgroups (Supplementary Figures 7B-G). The performance of both gene signatures was limited in the RAS-like subgroup indicating the potential role of RAS mutation as an independent prognostic factor.

Clinical Relevance of the Gene Signature
The relationships between the gene signature and clinical parameters were then analyzed. In terms of tumor stage, patients with stage III and IV PTC had significantly higher risk scores than patients with stage I and II disease ( Figure 7F). The risk scores of patients with lymph node metastasis were higher than those for patients without lymph node metastasis; however, this was not statistically significant ( Figure 7G). In terms of mutation status, patients with the BRAF V600E mutation had significantly higher risk scores than those without this alteration, whereas risk scores were comparable between patients with and without RAS mutations (Figures 7H,I). Patients with TERT mutation also had significantly higher risk scores than the wildtype. But the TERT expression level was comparable between high-risk and lowrisk group (Figures 7J,K). We further explored the difference of the 5-gene signature between BRAF-like, RAS-like and the Others proposed by TCGA. The risk scores were comparable between BRAF-like group and RAS-like group. But the BRAFlike group has significant higher risk score than the Others (Supplementary Figure 7A).

DISCUSSION
The incidence of PTC is high and increasing worldwide, resulting in a heavy disease burden on a global scale. Although the prognosis of PTC is relatively good, patients with recurrent PTC still suffer from additional surgical trauma and are at higher risk of surgical complications such as recurrent laryngeal nerve injury (2). In contrast, PTC patients with low recurrence risk suffer from long-term subclinical hyperthyroidism caused by unnecessary postoperative TSH inhibition therapy, which can lead to multiple potential side effects such as osteoporosis, atrial fibrillation, and cardiac insufficiency. Traditional clinicopathological parameters such as TNM staging can predict the mortality associated with PTC, but it is difficult to accurately estimate the risk of recurrence (8). The ATA recurrence risk stratification can predict the risk of recurrence for thyroid cancer, but the accuracy needs to be further improved. Moreover, it does not reflect the biological progression of PTC. The accurate prediction of prognosis for patients with PTC will help to select patients that could benefit from more aggressive treatments including a wider range of surgical treatments, I131 treatment, and a higher degree of TSH inhibition. It also allows patients with a low risk of recurrence to avoid unnecessary I131 treatment and TSH inhibition therapy. Therefore, treatment can be individualized to improve PTC prognosis and improve patient quality of life. Gene signatures can be quantified by standardized detection means, vary with the biological progression of the tumor, and can dynamically reflect prognosis as the patient's condition changes using such approaches. Thus, it might be more accurate and convenient to predict patient prognosis and risk of recurrence. In addition, these prognostic genes could play an important role in the progression of PTC and might represent potential targets to inhibit recurrence and metastasis. Combined with the detection of tumor-associated exosomes and circulating tumor cells (CTCs), the real-time detection of disease recurrence and response to treatment in patients with PTC after tumor resection can be achieved. Because these prognostic genes are closely related to the development of this disease, they are also potential markers for differential diagnosis and the evaluation of biological characteristics of tumors. Active surveillance of thyroid papillary microcarcinoma is currently advocated (23). Predicting the biological characteristics of tumors based on gene signatures through fine needle aspiration could make active surveillance safer. Further, PTC is a highly heterogeneous disease and tumor progression involves a complex network comprising multiple signaling pathways. Therefore, the combination of multiple genes can more accurately reflect the biological characteristics and prognosis of PTC, rather than a single marker. Nomograms are widely used in oncology to evaluate clinical prognosis as they integrate multiple prognostic determinants including molecular biology and clinicopathological parameters to estimate the individual numerical probabilities of clinical   events (24). Accordingly, personalized medicine can be achieved. Compared to a conventional staging system, nomograms might predict prognosis more accurately and are easier for patients to understand. Therefore, they could contribute to clinical decision making. In this study, 321 reliable DEGs in PTC were identified based on the integrated analysis of GEO and TCGA datasets. Survival analysis revealed that 50 DEGs were closely associated with the PFI of PTC. A novel five-gene signature was then established using lasso-Cox regression analysis to predict the PFI of PTC based on a training dataset (TCGA dataset). Among these genes, PLP2 was upregulated and positively associated poorer survival, whereas LYVE1, FABP4, TGFBR3, and FXYD6 were downregulated and identified as tumor suppressor genes. The five-gene signature was able to classify patients into groups with distinct PFIs and was an independent prognostic factor for PTC. Patients in the high-risk group had a significantly poorer prognosis than patients in the low-risk group. The prognostic performance of the five-gene signature was also confirmed based on the validation dataset and the entire TCGA dataset using AUC and C-index parameters. The five-gene signature also had good performance in differentiating PTC tissues from normal tissues. Moreover, a prognostic nomogram was established based on the five-gene signature and clinical pathological parameters to predict the 1-, 2-, 3-, 4-, and 5-year PFI of PTC.
Among this five-gene signature, two were previously reported to be associated with PTC. LYVE1 acts as a hyaluronic acid transporter and is involved in the catabolism of lymphatic endothelial cells and transport of substances (25). It is also considered a marker of lymphatic vessels (26). The upregulated expression of LYVE1 in tumor tissues indicates tumor-associated lymphangiogenesis and was reported to be associated with worse prognosis in breast cancer, renal cancer, and lung cancer (27)(28)(29). LYVE1 might also play a tumor suppressor role. In hepatocellular carcinoma, its expression was demonstrated to decrease progressively from cirrhotic nodules to cancer  tissues (30,31). In prostate cancer, LYVE1 was found to be downregulated and associated with the relapse of localized prostate cancer (32). Its downregulation was also identified in ovarian cancer and was associated with poorer survival (33). In PTC, current study results suggested that the downregulation of LYVE1 is associated with worse prognosis. In accordance with our study, LYVE1 was previously reported to be downregulated in PTC tumors based on microarrays and this result was confirmed by qPCR and IHC (34). However, Gao et al. reported that the expression of steroid receptor coactivator-1 (SRC-1), a potential oncogene, is positively associated with LYVE1 and associated with lymphatic metastasis in PTC (35). Thus, the role of LYVE1 in PTC and its relationship with lymph node metastasis remains to be elucidated. FABP4 is a lipid transporter in adipocytes that binds longchain fatty acids and retinoic acid, presenting these molecules to their receptors in the nucleus (36). In accordance with the results of our study, FABP4 was identified as a tumor suppressor in multiple cancers. In colorectal cancer, FABP4 was found to be downregulated and its upregulation inhibited the migration, invasion, and proliferation of cancer cells (37). In lung cancer, it was determined that the expression of FABP4 can be induced by the transcriptional activity of PPARγ , mediating lipolysis and tumor growth suppression (38). Further, in invasive breast cancer, the loss of FABP4 expression is associated with a higher risk of progression (39). In contrast, FABP4 plays an oncogenic role in hepatocellular carcinoma, promoting proliferation and migration via downregulation of the HIF1 pathway (40). In ovarian cancer, FABP4 was identified as a key regulator of  metastasis and was associated with poorer prognosis (41). FABP4 was also previously reported to convert T4 to T3 in adipocytes, mediating adaptive thermogenesis (42). In PTC, it was found that FABP4 is downregulated and partially mediates the tumorsuppressive effect of PROX1 (43). In our study, FABP4 was found to be associated with a short PFI in PTC. The tumor suppressor effect of FABP4 in PTC and its molecular mechanisms deserve further investigation. The roles of PLP2, TGFBR3, and FXYD6 in PTC have not yet been reported. PLP2 is a membrane protein of the endoplasmic reticulum (44). It was found to be highly expressed in glioma cells and positively associated with tumor grade and poorer prognosis. PLP2 mediates tumor proliferation, invasion, and metastasis via the p38/ERK pathway (45). In breast cancer, PLP2 is the direct target of the tumor suppressor MiR-422a (46). Further, its upregulation promotes the proliferation of breast cancer cells. PLP2 also plays an oncogenic role in melanoma (47,48). The upregulation of PLP2 in melanoma, caused by miR-664 downregulation, enhances the proliferation and metastasis of melanoma via the PI3K/AKT pathway. In hepatocellular carcinoma, an amplitude-modulated electromagnetic field was reported to inhibit the proliferation of cancer cells via PLP2 downregulation (49). However, its role in PTC has not yet been reported. As PLP2 was also found to be highly expressed in PTC and associated with poor prognosis in this study, its role in PTC and the underlying molecular mechanisms deserve further attention.
TGFBR3, also known as betaglycan, can bind TGF-beta. It functions as a co-receptor of TGFBR2 and also activates downstream signaling pathways in a non-canonical manner (50). Although its role in PTC has not yet been reported, it functions as a tumor suppressor in multiple cancers. For example, TGFBR3 was found to be downregulated in prostate cancer via a loss of heterozygosity at its encoding genomic locus and epigenetic regulation (51,52). The downregulation of TGFBR3 also promotes the invasion and progression of prostate cancer, as well as upregulation of the prostate stem cell marker CD133 (53). In non-small cell lung cancer, TGFBR3 is also downregulated and promotes cancer cell migration and invasion (54). In breast cancer, decreased TGFBR3 expression was correlated with the loss of heterozygosity of its gene locus and was associated with shorter recurrence-free survival and enhanced tumor invasion, metastasis, and angiogenesis (55). In pancreatic cancer, TGFBR3 is the target of exosomal miR-501-3p and inhibits tumor formation and metastasis (56). Similar tumor suppressor functions for TGFBR3 have also been reported for renal cell carcinoma, endometrial carcinoma, and bladder carcinoma, among others (57,58). In breast cancer and melanoma, loss of TGFBR3 in dendritic cells results in altered Treg cell infiltration and the suppression of antitumor immunity, indicating its potential role in the tumor immune microenvironment (59).
FXYD6, located at the plasma membrane, is a member of that FXYD family, which regulates the Na+/K+-ATPase (60). FXYD6 plays an important role in sensory organs such as the  inner ear and is associated with various mental illnesses such as Alzheimer's disease (61,62). In cancer, FXYD6 was identified as positively associated with chemotherapy sensitivity in advanced colorectal cancer (63). FXYD6 was also found to be upregulated in cholangiocarcinoma and hepatocellular carcinoma (64,65). The upregulation of this marker in hepatocellular carcinoma promotes the migration and proliferation of cancer cells via the Src-ERK pathway (65). FXYD6 was also found to be upregulated in osteosarcoma and was identified as the direct target of miR-372-3p and microRNA-137 (66). Accordingly, the upregulation of FXYD6 reverses the tumor suppressing effects of these miRNAs in osteosarcoma (67,68). Despite this evidence, the role of FXYD6 in PTC and other tumors remains unknown. In our study, the downregulation of FXYD6 was identified in PTC and was associated with worse prognosis. In addition to that in PTC, FXYD6 was also downregulated in tumor tissues of most cancer types with a |log 2 FC| > 1 based on TCGA expression data from GEPIA (http://gepia.cancer-pku.cn), including adrenocortical carcinoma, bladder urothelial carcinoma and breast invasive carcinoma, among others. Whether FXYD6 exerts a tumor suppressor role in these tumors and its molecular mechanisms deserves further study.
To the best of our knowledge, a prognostic model based on these five genes and the associated nomogram have not been reported to date. Our predictive model is based on the expression level of a limited number of genes, which is more economical and clinically practical than whole genome sequencing. Further, our nomogram combined with gene prognostic prediction models and clinicopathological parameters could provide clinicians with a convenient and accurate method to assess the prognosis of patients with PTC after surgery. The graphical scoring system is easy for patients to understand and is helpful to make medical decisions, thereby enabling individualized treatment.
However, our current research has some limitations. First, the main source of clinical information for our dataset was the TCGA database. The majority of patients were from North America, and thus, caution should be taken when expanding our results to patients of other ethnicities. Further, protein expression levels of the DEGs also require further investigation. Their role in the pathogenesis and progression of PTC depends on further experimental studies to elucidate the associated molecular mechanisms. The establishment and validation of the nomogram was also based on the TCGA database, and thus it is necessary to validate the clinical information and gene expression data using external datasets in future studies. Finally, the 2009 version of ATA risk stratification were used as the primary reference of evaluation in the current study since the publicly available TCGA data does not include information about the size of lymph nodes. Prospective study is required to further validate the performance of the five-gene signature and the associated nomogram using the latest version of ATA risk stratification.

CONCLUSION
In our study, we established a five-gene signature and developed a prognostic nomogram in combination with prognosis-related clinical pathological parameters to predict the PFI of PTC. The five DEGs are closely related to the progression and prognosis of PTC and are thus also potential therapeutic targets. The predicted nomogram proved to be reliable in predicting the PFI of PTC and might thus be beneficial for individualized treatment and medical decision making.