Prognostic Factors for Overall Survival in Nasopharyngeal Cancer and Implication for TNM Staging by UICC: A Systematic Review of the Literature

This study aims to identify prognostic factors in nasopharyngeal carcinoma (NPC) to improve the current 8th edition TNM classification. A systematic review of the literature published between 2013 and 2019 in PubMed, Embase, and Scopus was conducted. Studies were included if they (1) were original clinical studies, (2) included ≥50 NPC patients, and (3) analyzed the association between prognostic factors and overall survival. The data elements of eligible studies were abstracted and analyzed. A level of evidence was synthesized for each suggested change to the TNM staging and for each prognostic factor. Of 5,595 studies screened, 108 studies (44 on anatomical criteria and 64 on non-anatomical factors) were selected. Proposed changes/factors with strong evidence included upstaging paranasal sinus involvement to T4, defining parotid lymph node involvement as N3, upstaging the N-category based on the presence of lymph node necrosis, and incorporating non-TNM factors including EBV-DNA level, primary gross tumor volume (GTV), nodal GTV, neutrophil-lymphocyte ratio, lactate dehydrogenase, C-reactive protein/albumin ratio, platelet count, SUVmax of the primary tumor, and total lesion glycolysis. This systematic review provides a useful summary of suggestions and prognostic factors that could potentially improve the current staging system. Further validation studies are warranted to confirm their significance.


INTRODUCTION
Nasopharyngeal carcinoma (NPC) is an important global health burden with approximately 130,000 new cases diagnosed and more than 70,000 deaths in 2018 (1). It is a unique disease with distinctive natural behavior, epidemiology, and histopathology that differ from those of other head and neck cancers. Estimation of prognosis is a fundamental step in patient management. Among the various prognostic factors, the tumor-node-metastasis (TNM) staging, which has been jointly adopted by the American Joint Committee on Cancer (AJCC) and the Union for International Cancer Control (UICC), remains the most robust factor for global application. The TNM 5th Edition issued in 1997, which introduced a customized staging system for NPC by merging the strengths of the AJCC/UICC 4th edition and Ho's system, is a historic milestone with worldwide acceptance. Subsequent revisions refined the staging system based on diagnostic and therapeutic advances (2,3); the current 8th Edition, released in 2017, is another milestone with the unification of the TNM and the Chinese staging systems (4).
In addition to the refinement of TNM parameters, there is a growing interest in the incorporation of non-anatomical prognostic factors that reflect biological tumor behavior. These factors are potentially useful as biomarkers for personalized risk stratification, especially with regard to metastatic risk, and for tailoring the treatment intensity. There is increasing evidence that incorporating these factors/biomarkers into the TNM staging system could further improve risk stratification (5,6).
To provide the best available evidence for the upcoming TNM 9th Edition and associated prognostic grouping, a comprehensive systematic review was carried out to identify potentially important suggestions on anatomic and non-anatomic prognostic factors. These suggestions will then be confirmed by a multicenter validation study before the final recommendation to UICC and AJCC for consideration. The current paper is our summary of suggested prognostic factors that warrant further validation.

Study Protocol
This review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline (7). A systematic search of PubMed, Scopus, and Embase for relevant literature published from January 1, 2013, to September 13, 2019, was performed. This timeframe was selected because the construction of the TNM 8th Edition was based on literature reviews up to December 31, 2012. Publications in both English and Chinese were accepted, although unpublished studies were not included in the search. The search terms (Supplementary Table 1) were as follows: ("staging" or "TNM" or "prognostic") and ("nasopharyngeal carcinoma" or "nasopharyngeal cancer" or "nasopharyngeal neoplasm").

Inclusion Process and Criteria
From the literature identified in the initial search, the following studies were excluded after screening their titles and citations: duplicated studies, conference abstracts, reviews, letters, editorials, case reports, book chapters, and basic science studies. The remaining studies were further assessed to determine eligibility, which included original clinical studies, either prospective or retrospective, with a sample size of at least 50 NPC patients, treated with intensity-modulated radiotherapy (IMRT) or equivalent, and showing a significant association between prognostic factors and overall survival (OS). Novel prognostic markers with limited potential for global applicability (e.g., radiomics, micro-RNA, circulating tumor cells, and genetic signatures) were excluded from this review. In cases of multiple studies from one institution, the study with the largest number of patients and the most recently published study were prioritized.
Two independent teams (University of Hong Kong-Shenzhen Hospital and Fujian Cancer Hospital) performed the first review to exclude the ineligible studies. Three independent reviewers (AL, W-TN, and C-LC) further assessed papers that generated disagreements based on the inclusion/exclusion before a final decision was made on the list of studies to be selected for inclusion in this review.

Data Extraction and Analyses
The primary data from the articles were extracted. The primary endpoint for the assessment of prognostic value in this review was OS; the secondary endpoints of distant-metastasis-free survival (DMFS) and local-relapse-free survival (LRFS) were included if they were reported by the original study.
We used the QUality In Prognosis Studies (QUIPS) tool to assess the risk of bias within individual studies (8). The QUIPS tool was originally designed to assess bias in studies of prognostic factors. The tool originally comprised six domains (Study Participation, Prognostic Factor Measurement, Outcome Measurement, Statistical Analysis and Reporting, Study Confounding, and Study Attrition), each of which is guided by three to seven prompting items. Based on the risk of bias, the overall quality of each study was determined as high (score 5-6), moderate (score 3-4), or low (score 0-2); low-quality studies were excluded from this review.
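The score bands described above can be written as a small lookup. The following is a minimal sketch only: the text does not say how the overall score is derived from the six domains, so the function simply takes an overall score between 0 and 6 as input, and the function names and study labels are hypothetical.

```python
def quips_quality(score: int) -> str:
    """Map an overall QUIPS score (0-6) to the quality band used in this review."""
    if not 0 <= score <= 6:
        raise ValueError("QUIPS score must be between 0 and 6")
    if score >= 5:
        return "high"       # score 5-6
    if score >= 3:
        return "moderate"   # score 3-4
    return "low"            # score 0-2


def eligible_studies(scores: dict[str, int]) -> list[str]:
    """Keep studies that are not rated 'low' (low-quality studies were excluded)."""
    return [study for study, s in scores.items() if quips_quality(s) != "low"]


# Hypothetical study labels and scores, for illustration only.
example = {"study_A": 6, "study_B": 3, "study_C": 2}
print(eligible_studies(example))  # ['study_A', 'study_B']
```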
The criteria adopted in this systematic review were designed to synthesize the level of evidence (9), which was defined as "strong," if there were consistent recommendations (≥75%) in multiple high-quality cohorts; "moderate," if recommendations were consistent in ≥67% of multiple high-quality cohorts; "limited," if the recommendation was based on a single cohort; and "inconclusive," if there were inconsistent recommendations.
[...] by both teams, whereas 198 studies were selected by only one team. The studies with a discrepancy in agreement were further reviewed by three independent reviewers, and 74 were accepted for inclusion. Thus, a total of 108 original studies were included in this in-depth systematic review.
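The level-of-evidence criteria defined in the Data Extraction and Analyses section can be sketched as a decision rule. This is an illustrative reading, not the authors' procedure: the function name and inputs are hypothetical, the 75% and 67% thresholds are applied literally, and treating anything below 67% agreement across multiple cohorts as "inconclusive" is an assumption.

```python
def level_of_evidence(n_cohorts: int, n_consistent: int) -> str:
    """Classify a recommendation from the number of high-quality cohorts
    and how many of them gave a consistent recommendation."""
    if n_cohorts < 1 or not 0 <= n_consistent <= n_cohorts:
        raise ValueError("need n_cohorts >= 1 and 0 <= n_consistent <= n_cohorts")
    if n_cohorts == 1:
        return "limited"                      # based on a single cohort
    # Integer arithmetic avoids floating-point edge cases at the thresholds.
    if 100 * n_consistent >= 75 * n_cohorts:
        return "strong"                       # consistent in >=75% of cohorts
    if 100 * n_consistent >= 67 * n_cohorts:
        return "moderate"                     # consistent in >=67% of cohorts
    return "inconclusive"                     # assumption: below 67% agreement


print(level_of_evidence(4, 3))   # strong
print(level_of_evidence(10, 7))  # moderate
```

Note that, with the thresholds taken literally, two consistent cohorts out of three (66.7%) fall just below the "moderate" cutoff; whether the source intends two-thirds to qualify is not stated.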

Risks of Bias
The assessment of study quality using the QUIPS tool showed that 62 (57.4%) of the included articles were classified as high quality and 46 (42.6%) as moderate quality. Supplementary Figure 1 presents an algorithm of the study selection process, and Supplementary Tables 2-6 list the QUIPS scores of the included studies. Suggestions from well-conducted studies with large sample sizes or with evidence supported by multiple studies were identified for inclusion in this review.

Proposed Changes and Prognostic Factors
The level of evidence for the recommendations and studied prognostic factors is summarized in Table 1. Among the 44 reports on TNM parameters (Table 2), 13 proposed changes to the current TNM-8 were identified: six on T-category, eight on N-category, and one on M-category. The recommendations that were considered to have a strong level of evidence included the involvement of the paranasal sinus (PNS) as T4 disease (16,18-20), parotid lymph node (PLN) as N3 disease (34,35), and the upstaging of N-classification in the presence of lymph node necrosis (LNN) (39,40,42).

DISCUSSION
To our knowledge, this systematic review, which evaluated the prognostic factors for NPC patients in 108 articles published from 2013 to 2019, is the most comprehensive review on this topic. The TNM 8th Edition, based entirely on the anatomical tumor extent, is the most widely used prognostic tool for NPC and remains the most robust factor for guiding treatment decisions, evaluating treatment results, and comparing outcomes between institutions worldwide. However, continuous improvement is necessary in view of the advances in investigations and treatments. Furthermore, refinement of prognostic tools by the incorporation of novel proposals based on functional imaging, plasma biomarkers, and molecular tumor characteristics is desirable in the current era of personalized oncology. For tumors at disease sites such as the prostate, breast, and skin (i.e., melanoma), non-anatomical factors have been successfully incorporated while still maintaining essential anatomical information. For NPC, considerable progress on both anatomical and non-anatomical prognostic factors has been made since the publication of the TNM 8th Edition. This systematic review examined the latest evidence to facilitate the formulation of a comprehensive proposal for designing the upcoming TNM 9th Edition.

T-Classification
A major change in the TNM 8th Edition was the replacement of the ambiguous terms infratemporal fossa (IF)/masseter space involvement with a clear specification of extensive soft tissue infiltration beyond the lateral surface of the lateral pterygoid muscle (LP) as T4, and the downstaging of medial pterygoid muscle (MP)/LP/prevertebral muscle (PM) involvement to T2. This change was supported by two studies (10,13,14). However, five studies showed that MP and/or LP involvement was associated with a worse prognosis than T2 and should be upstaged; suggestions included categorizing MP as T3 and LP as T4 disease (n = 1) (16), MP as T2 and LP as T4 (n = 3) (13,15,30), and both MP and LP as T3 disease (n = 1) (10). Thus, further validation of the prognostic significance of MP/LP is recommended.
Three studies, comprising a total of 1,348 patients, showed that PNS involvement should be upstaged from current T3 to T4 disease given its poorer outcomes (5-year OS rate of 53.7-83.7%) (16,19,20). Of note, Zhang et al. reported worse prognoses among patients with ethmoid sinus or maxillary sinus involvement as T4 disease, but better prognosis in those with sphenoid sinus invasion alone as T3 disease (18); further studies on the relapse risks of various PNS are warranted.
The widespread use of magnetic resonance imaging (MRI) has improved the accuracy of detection of the extent of involvement of the skull base and of intracranial extension. With better disease characterization, Li et al. proposed the subdivision of skull base involvement into T3-slight (pterygoid process and/or base of the pterygoid bone only) and T3-severe (others) (24); similarly, Cao et al. suggested the subdivision of T4 into T4a (without intracranial extension) and T4b (with intracranial extension) based on the presence of intracranial extension (29). Further studies are needed to validate these findings.
With the technological advances in both diagnostics and treatment, the differences in survival and local control between T-categories have diminished. Eight of the included studies proposed the simplification of the T-category (6,21-27); these included three studies that suggested merging T1 and T2 disease (21,23,24), one that suggested combining T1, T2, and T3 disease (22), and one that proposed merging T2 and T3 (27). Other studies proposed simplification of the definition of T-classification, refinement of T2-T4 disease, and reclassification as T1 and T2 only (6,25,26).

N-Classification
Despite the rarity of PLN metastasis (0.4-2.8%), consistent findings were noted on its adverse prognostic outcome, which was similar to that of N3 disease, as demonstrated in two studies that included a total of 11,742 patients. Both reports recommended PLN involvement as a criterion for N3 classification (34,35). Also, suspicion of PLN metastasis, especially in patients with advanced nodal disease, should be raised on pretreatment imaging, and biopsy is indicated in suspected cases.
Furthermore, in five studies, there was consensus that LNN was an adverse prognostic factor (hazard ratio [HR]: 1.75-5.79) (38-42). In the largest study by Lan et al. [...] both p < 0.001); the authors proposed that patients with LNN should be upstaged in their respective N-category (39). In addition to the proposals identified in the current literature search, extra-nodal extension (ENE) was recently advocated as a new criterion for N3-classification in the TNM 8th Edition for other head and neck cancers, but not for NPC. Specifically, Ai et al. proposed that ENE with infiltration into the adjacent muscle/skin/salivary gland be categorized as N3 disease (36). Lu et al. showed that ENE was a poor prognostic factor for NPC and proposed to grade ENE as G0: lymph nodes without ENE; G1: tumor infiltration beyond the individual nodal capsule(s) into the surrounding fat plane; G2: coalescent nodal mass with unequivocal evidence of ENE; G3: tumor infiltration beyond the nodal capsule into adjacent structures (37). Only G2/G3 ENE, but not G1, was independently prognostic of death; the authors hence proposed a refined N-classification: New-N1: N1/N2 without G2-/G3-ENE; New-N2: N1 with G2-ENE; New-N3: N2 with G2-ENE, N1/N2 with G3-rENE, or N3. On the contrary, Guo et al. suggested that ENE was not a poor prognostic factor, but the definition of ENE was not mentioned in their study (38).
Abbreviations: NLR, neutrophil-lymphocyte ratio; CRP, C-reactive protein; PDW, platelet distribution width; LDH, lactate dehydrogenase; Hs-CRP, high-sensitivity CRP; PNI, prognostic nutrition index; AGR, albumin/globulin ratio; TIL, tumor-infiltrating lymphocytes; SUVmax, maximum standardized uptake value; TLG, total lesion glycolysis; MTV, metabolic tumor volume.
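The refined N-classification attributed to Lu et al. in this section combines the current N-category with the ENE grade, and the mapping can be easier to check when written out. The sketch below is one reading of the rule as quoted in the text; the function name is hypothetical, and combinations the text does not cover (for example N0) are rejected rather than guessed.

```python
def refined_n(n_category: str, ene_grade: str) -> str:
    """Map a current N-category (N1-N3) and ENE grade (G0-G3) to the
    refined N-classification as described in the text."""
    if n_category not in {"N1", "N2", "N3"}:
        raise ValueError("mapping is only described for N1, N2, and N3")
    if ene_grade not in {"G0", "G1", "G2", "G3"}:
        raise ValueError("ENE grade must be G0, G1, G2, or G3")
    # New-N3: current N3, or N1/N2 with G3 ENE.
    if n_category == "N3" or ene_grade == "G3":
        return "New-N3"
    # G2 ENE: N1 becomes New-N2, N2 becomes New-N3.
    if ene_grade == "G2":
        return "New-N2" if n_category == "N1" else "New-N3"
    # New-N1: N1/N2 without G2/G3 ENE (that is, G0 or G1).
    return "New-N1"


print(refined_n("N2", "G1"))  # New-N1
print(refined_n("N1", "G2"))  # New-N2
print(refined_n("N2", "G2"))  # New-N3
```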
The current TNM 8th Edition categorizes retropharyngeal lymph node involvement (≤6 cm) as N1 disease, regardless of whether the involvement is unilateral or bilateral. Tang [...] Furthermore, four studies proposed the simplification of the N-classification and supported the current N3 disease with merging of the previous N3a and N3b (14,23,45,47). Other studies on PLV LN, cervical LN level, and the number of LN regions had limited evidence (22,43,44,46

M-Classification
Several suggestions have been made on the subcategorization of de novo oligo-metastatic disease based on the number of metastatic lesions and the site(s) of involvement (48-52). However, given the diversity of definition and management of patients with oligo-metastasis, no conclusive recommendation could be made. Most studies have shown that the number of metastatic lesions and the number of organ involvements were independent poor prognostic factors. Furthermore, both Shen et al. and Zou et al. reported that single (or oligo-) metastatic lesions without liver involvement had better prognoses compared with lesions with liver involvement (49,52). In a multicenter study of 977 patients that was reported by Zou et al., liver metastases represented a worse prognostic factor regardless of the number of metastatic lesions with a 3-year OS rate of 34.3-72.8% vs. 22.6-23.6% (52).
Several studies have highlighted the important role of EBV-DNA to refine the prognosis of patients with similar TNM stage groups. In a study of 385 patients with Stage II (TNM 7th edition) disease, the 3-year PFS, LRFS, and DMFS rates for the detectable and undetectable EBV-DNA groups were 89 [...] (57). Similarly, Jin et al. showed that the prognosis of patients with stage IVa-b (TNM 7th Edition) and low EBV-DNA level was similar to that of patients with Stage III disease and high EBV-DNA level (61).
The EBV-DNA concentration could provide biological information on tumors beyond the anatomical factors and thereby improve the prognostic performance of the staging system. Nonetheless, the heterogeneity of cutoff values has hindered the wide application of EBV-DNA in NPC staging. The EBV-DNA cutoff values varied markedly among our included studies (1,500-25,000 copies/ml), with 4,000 copies/ml being the most frequently used cutoff value (54,57,59,66). Plasma EBV-DNA is a laboratory-developed test with heterogeneity arising from different DNA extraction, purification, and stabilization methods; different instruments; different primers and probes that target different parts of the EBV genome; and different quantification controls (118). An earlier study showed that different PCR assays using primer/probe sets for latent membrane protein-2 (LMP-2) and BamHI-W might yield slightly different plasma EBV-DNA concentrations from the same sample (119). Also, the low sensitivity of EBV-DNA assays in patients with low-volume NPC is another concern (120). Thus, further international efforts are encouraged to harmonize the assay and validate it in large prospective cohorts to ensure that plasma EBV-DNA can reach its full potential and be incorporated into the staging system.
Level of Evidence: Strong: Pretreatment EBV-DNA level
The current T- and N-classifications of the staging system are primarily based on the extent of tumor invasion and the maximum diameter of the LN, respectively. Tumor volume might correlate better with the number of clonogenic tumor cells, leading to a more accurate prediction of the chance of cure (121). Volumetric stratification has been demonstrated to improve the prognostic ability of the TNM staging system. Jeong et al. divided stage II-IV (TNM 8th Edition) patients into volume subgroups and found that the 5-year OS was significantly better in participants with GTV-P ≤33 ml compared to those with GTV-P >33 ml (87.3 vs. 66.7%) (80); Chen et al. showed that, among 385 Stage II patients classified by the TNM 8th Edition, those with a total GTV <30 cm³ had a better prognosis than those with a total GTV ≥30 cm³ (63).
Despite the growing body of evidence, tumor volume is yet to be used for cancer staging in routine clinical practice for several reasons. Firstly, there are significant intra- and inter-observer variations in volume delineation. Secondly, malignant tumors often grow into irregular shapes, and accurate measurement of tumor volume is hard to achieve with conventional imaging. Furthermore, the cutoff value of the tumor volume is difficult to define due to the differences in assessment software, measurement timing, and methods of statistical analysis (122,123). Future efforts are needed to overcome these challenges before tumor volume can be used as a widely applied prognostic marker.
Level of Evidence: Strong: Primary GTV and nodal GTV

Blood Inflammatory/Hematological Markers
In the 18 studies that were included, nine inflammatory/hematological markers were evaluated: the most frequently [...] Table 5) (83,85,91,92,97). Evidence suggested that proinflammatory tumor microenvironments are closely related to cancer development and progression. Lymphocytes are immune cells that exhibit an antitumor function, while neutrophils are inflammatory cells that influence the cytotoxic activity of the immune system. Therefore, an increased NLR, with an elevated neutrophil count and/or reduced lymphocyte count, is a biomarker that reflects an imbalance in pro- and antitumor activities in the host's immune system. Various cutoff values of NLR have been suggested (range 2.28-3.00, median 2.32), and the analysis suggested that NLR was a reliable prognostic marker regardless of the cutoff value (124).
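Because the NLR is a simple ratio of two routine blood counts, a short calculation shows how a patient would be dichotomized against a cutoff. This is a minimal sketch: the function names and the example counts are hypothetical, the cutoff is left as a required argument because the included studies used different values, and treating values strictly above the cutoff as "high" is an assumption.

```python
def nlr(neutrophils: float, lymphocytes: float) -> float:
    """Neutrophil-lymphocyte ratio; both counts must use the same unit (e.g., 10^9/L)."""
    if neutrophils < 0 or lymphocytes <= 0:
        raise ValueError("neutrophils must be >= 0 and lymphocytes must be > 0")
    return neutrophils / lymphocytes


def nlr_group(neutrophils: float, lymphocytes: float, cutoff: float) -> str:
    """Dichotomize a patient as 'high' or 'low' NLR relative to a chosen cutoff."""
    # Assumption: values strictly above the cutoff count as "high".
    return "high" if nlr(neutrophils, lymphocytes) > cutoff else "low"


# Hypothetical counts (10^9/L); 2.32 is the median cutoff reported in the text.
print(nlr(4.5, 1.5))                     # 3.0
print(nlr_group(4.5, 1.5, cutoff=2.32))  # high
```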
Other hematological markers such as hemoglobin, platelet count, LDH, and CRP have the advantages of easy accessibility, inexpensive measurement, and high reproducibility and therefore hold promise for integration into the international prognostic system. In particular, the significance of LDH and CRP has long been recognized (125-127), and these parameters have been incorporated in various recently published prognostic nomograms of NPC (128-130). Accordingly, further validations of these findings are encouraged.
Four studies consistently showed that a high SUVmax of the primary tumor was associated with poor OS (HR 1.07-4.88) (99,101,102,111); however, conflicting results were shown with regard to a high SUVmax of nodal and metastatic disease (102,113). High TLG was associated with inferior OS in two studies, and MTV was a poor prognostic factor in one study (Supplementary Table 6) (99,104,111). Therefore, we recommend further validation of the role of a high SUVmax of the primary tumor and high TLG.
Abbreviations: HR, hazard ratio; CI, confidence interval; SUVmax, maximum standardized uptake value; SUVmean, mean standardized uptake value; MTV, metabolic tumor volume; TLG, total lesion glycolysis; T, primary tumor; N, lymph node; M, metastasis; LR, local recurrence.
The metabolic information of FDG-PET could predict tumor aggressiveness and be correlated with patient survival (131). The majority of FDG-PET studies evaluated the prognostic role of the SUVmax of the tumor mass; however, the SUVmax is limited in that it represents only the maximum uptake within the volume of interest (VOI) rather than uptake across the entire mass. Emerging metabolic parameters such as TLG and MTV have been proposed to overcome these limitations: MTV is measured by contouring margins defined by thresholds, whereas TLG is calculated by multiplying the MTV by the mean SUV. Additional studies are encouraged to define the prognostic role of the abovementioned factors. However, the diverse range of cutoff values of these PET parameters used in different studies is attributable to several reasons. First, variables such as tumor delineation and definition of VOI may affect the MTV and TLG values; second, the cutoff values are established by the statistical parameters of each institution without cross-validation. Based on the evidence in the current literature, we cannot recommend a concrete cutoff value for further validation as the wide range of values has limited its reproducibility and global applicability.
Level of Evidence: Strong: High SUVmax of the primary tumor and TLG; Limited: MTV; Inconclusive: High SUVmax of nodal disease and SUVmax of metastatic disease
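The relationship between the PET parameters discussed above (MTV from a thresholded contour, and TLG as MTV multiplied by the mean SUV) can be illustrated with a toy calculation. This is a sketch under stated assumptions, not a clinical implementation: the function name and voxel values are hypothetical, and the 40%-of-SUVmax threshold is only one common convention chosen for illustration; the text says only that margins are "defined by thresholds."

```python
def pet_metrics(voxel_suvs: list[float], voxel_volume_ml: float,
                threshold_fraction: float = 0.4) -> dict[str, float]:
    """Compute SUVmax, MTV, SUVmean, and TLG for a volume of interest.

    Voxels with SUV >= threshold_fraction * SUVmax are counted as tumor.
    The default 40% threshold is an illustrative assumption.
    """
    if not voxel_suvs:
        raise ValueError("voxel_suvs must not be empty")
    if voxel_volume_ml <= 0 or not 0 < threshold_fraction <= 1:
        raise ValueError("invalid voxel volume or threshold fraction")
    suv_max = max(voxel_suvs)
    cutoff = threshold_fraction * suv_max
    included = [s for s in voxel_suvs if s >= cutoff]
    mtv = len(included) * voxel_volume_ml      # metabolic tumor volume (ml)
    suv_mean = sum(included) / len(included)   # mean SUV inside the contour
    tlg = mtv * suv_mean                       # total lesion glycolysis
    return {"suv_max": suv_max, "mtv_ml": mtv, "suv_mean": suv_mean, "tlg": tlg}


# Hypothetical five-voxel VOI with 0.5 ml voxels.
print(pet_metrics([10.0, 8.0, 6.0, 3.0, 1.0], voxel_volume_ml=0.5))
# {'suv_max': 10.0, 'mtv_ml': 1.5, 'suv_mean': 8.0, 'tlg': 12.0}
```

Changing `threshold_fraction` changes which voxels are included and therefore both MTV and TLG, which is one concrete way in which delineation choices can shift these values between studies.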

Limitations
The limitations of this research merit discussion. Firstly, despite the exclusion of poor-quality studies, most of the included studies had a retrospective observational design, which is prone to biases. Secondly, the majority of the included studies that evaluated the non-anatomical markers used dichotomous variables to determine the prognostic value. The cutoff value of parameters varied among different studies, as it was calculated statistically in each study to achieve the most significant prognostic effect; therefore, the generalizability of the findings is uncertain. Thirdly, due to the heterogeneity of study designs, study populations, measurement techniques, and cutoff values, we were unable to perform a meta-analysis to estimate a pooled value reliably. Also, some of the studies of plasma EBV-DNA in early years were not included in the present analysis; however, our conclusion remains consistent with the previous findings (115-117). Lastly, some of the novel markers, such as radiomics, micro-RNA, circulating tumor cells, and genetic signatures, were not included in this review due to their limited global applicability at present.

Summary Remarks
This systematic review has identified a comprehensive list of prognostic factors and suggestions that could contribute toward more accurate risk stratification for designing personalized treatment for NPC. Further validation studies are needed to confirm the reproducibility of these factors and to define the optimal cutoff criteria, so that recommendations can be formulated for the upcoming 9th Edition of the TNM staging system.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
WN, JP, and AL: conception and design. CC, QG, TM, ZX, HC, and JL: collection and assembly of data. CC, WN, HC, JP, and AL: data analysis and interpretation. CC, WN, and AL: manuscript writing. All authors contributed to the article and approved the submitted version.