Preoperative Molecular Markers in Thyroid Nodules

The need for distinguishing benign from malignant thyroid nodules has led to the pursuit of differentiating molecular markers. The most common molecular tests in clinical use are Afirma® Gene Expression Classifier (GEC) and Thyroseq® V2. Despite the rapidly developing field of molecular markers, several limitations exist. These challenges include the recent introduction of the histopathological diagnosis “Non-Invasive Follicular Thyroid neoplasm with Papillary-like nuclear features”, the correlation of genetic mutations within both benign and malignant pathologic diagnoses, the lack of follow-up of molecular marker negative nodules, and the cost-effectiveness of molecular markers. In this manuscript, we review the current published literature surrounding the diagnostic value of Afirma® GEC and Thyroseq® V2. Among Afirma® GEC studies, sensitivity (Se), specificity (Sp), positive predictive value (PPV), and negative predictive value (NPV) ranged from 75 to 100%, 5 to 53%, 13 to 100%, and 20 to 100%, respectively. Among Thyroseq® V2 studies, Se, Sp, PPV, and NPV ranged from 40 to 100%, 56 to 93%, 13 to 90%, and 48 to 97%, respectively. We also discuss current challenges to Afirma® GEC and Thyroseq® V2 utility and clinical application, and preview the future directions of these rapidly developing technologies.

INTRODUCTION
Thyroid nodules are common among adults over the age of 60 years, with a prevalence of 50-70% (1, 2). Moreover, the incidence of thyroid cancer in the United States increased by 211% between 1975 and 2013 (3), due to both improved detection of small (<2 cm) thyroid nodules by thyroid ultrasonography and a true increase in thyroid cancer incidence (4). Nevertheless, the vast majority (85-95%) of thyroid nodules are benign (5). For this reason, the ability to distinguish between benign and malignant nodules is important in order to spare patients unnecessary diagnostic surgery.
In 2009, the National Cancer Institute introduced the Bethesda classification system for fine-needle aspiration (FNA) reporting. The system, recently revised, comprises six categories based upon cytopathological features, each with an associated malignancy rate and a standardized management recommendation (10). FNA reliably establishes the diagnosis of a benign or malignant nodule in 70-80% of all cases (11) and has decreased the proportion of benign nodules unnecessarily resected from 86 to 50% (12). However, 20-30% of FNA cases have indeterminate or suspicious cytological results that include Bethesda III, IV, and V categories (12) and, of these, 6-75% are malignant on final surgical pathology (13,14). Because of the uncertainty of malignancy in these patients, their management has been challenging and usually includes a repeat FNA or a diagnostic lobectomy. The need to distinguish benign from malignant lesions in this subset of thyroid nodules has therefore led to the pursuit of differentiating molecular markers.
Interest in achieving this distinction increased in 2002 with the recognition of the oncogenic role of the BRAF V600E mutation in approximately 58-69% of papillary thyroid cancers (PTC) (15,16). However, genetic testing for BRAF V600E alone is inadequate for clinical decision-making in the detection of PTC because of its low sensitivity of 60% (17). Indeed, our group first reported its use in indeterminate and suspicious thyroid lesions and found that it added minimal clinical value (18). In addition to studying the diagnostic utility of BRAF V600E, numerous studies have investigated the association between BRAF V600E and patient prognosis. However, studies correlating BRAF V600E with the clinical features of PTC have yielded inconsistent results. Some studies report that BRAF V600E is associated with a more advanced phenotype, including an increased risk of lymph node metastasis, cancer recurrence, and patient mortality (19-21), while others report no such associations (22). Moreover, thyroid cancer with both BRAF V600E and TERT promoter mutations has been associated with worse clinico-pathological outcomes (23,24). BRAF testing can also be useful in deciding treatment in the setting of known metastatic thyroid cancer. Direct tyrosine kinase inhibitors, such as vemurafenib (25), dabrafenib (26), and sorafenib (27), have been shown to be effective in BRAF V600E metastatic thyroid cancers.
Since mutational analysis of single genes has not proven adequate in guiding management decisions in indeterminate or suspicious thyroid nodules, attention turned to using panels of molecular markers. Currently, the most common molecular tests in clinical use are Afirma® Gene Expression Classifier (GEC) and Thyroseq® V2 (28). Introduced in 2011 by Veracyte, the Afirma® GEC has been considered a "rule-out" malignancy test. It includes a 142-gene expression molecular assay and uses microarray technology to measure mRNA expression profiles to determine whether a thyroid nodule is "suspicious" or "benign." The test's primary aim is to spare patients with cytologically indeterminate FNA samples unnecessary diagnostic surgery (29). Among indeterminate/suspicious nodules (Bethesda Type III-V), the test has both a high Se (92%) and a high NPV (93%) (30) (Table 2). In contrast, it has a low Sp (52%) and PPV (47%), and therefore cannot by itself accurately identify malignant lesions.
Thyroseq® V2, introduced in 2014 by CBL Path, is designed to identify malignant thyroid nodules by next generation sequencing (NGS), detecting 14 thyroid cancer-related genetic mutations, including RAS and BRAF mutations, 42 types of gene fusions associated with thyroid cancer, including PAX8/PPARγ and RET/PTC rearrangements, and mRNA expression levels for 16 genes; it is therefore considered a "rule-in" malignancy test (29). Among Bethesda Type III-IV nodules, the test is marketed as having a high Se, Sp, PPV, and NPV of 90-91%, 92-93%, 77-83%, and 96-97%, respectively, as well as having the ability to stratify risk based on the mutation detected (52, 53) (Table 3).
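The dependence of PPV and NPV on the malignancy rate of the tested population can be illustrated with a short calculation. The following is a minimal sketch, not part of any cited study: it applies Bayes' rule to the Afirma® GEC sensitivity and specificity quoted above, and the three prevalence values (15, 32, and 50%) are hypothetical inputs chosen only for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test, given Se, Sp, and pretest prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Se and Sp as quoted in the text for Afirma GEC (Bethesda III-V).
AFIRMA_SE, AFIRMA_SP = 0.92, 0.52

# Hypothetical prevalence values (assumptions, not taken from the text).
for prevalence in (0.15, 0.32, 0.50):
    ppv, npv = predictive_values(AFIRMA_SE, AFIRMA_SP, prevalence)
    print(f"prevalence={prevalence:.0%}  PPV={ppv:.0%}  NPV={npv:.0%}")
```

Under these assumptions, a prevalence of 32% yields a PPV of about 47% and an NPV of about 93%, in line with the figures quoted above; at other prevalence values the same Se and Sp give different predictive values.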
In this manuscript, we review the current published literature surrounding Afirma® GEC and Thyroseq® V2, discuss current challenges to their utility and clinical application, and preview the future directions of these rapidly developing technologies.

CLINICAL MANAGEMENT OF INDETERMINATE THYROID NODULES
The 2015 American Thyroid Association (ATA) (60) and the 2016 American Association of Clinical Endocrinologists (AACE) (9) clinical guidelines recommend "considering" molecular testing for indeterminate nodules. If molecular testing is being considered, the ATA recommends that patients "should be counseled regarding the potential benefits and limitations of testing and about the possible uncertainties in the therapeutic and long-term clinical implications of results" (strong recommendation, low-quality evidence) (60). However, long-term outcome data on the use of molecular markers for therapeutic decision-making are currently unavailable. A recent report estimated that standard application of the GEC for all indeterminate thyroid nodules would result in only a 7.2% decrease in thyroidectomy volume (61). Similarly, two studies by our group showed that molecular markers did not significantly affect the surgical decision-making process: only 7.9-8.4% of patients had altered clinical management as a result of molecular testing (39,62). For these reasons, patient benefit from molecular marker use in routine clinical practice is likely marginal. Moreover, the AACE 2016 guidelines recommend molecular testing to complement cytologic evaluation in indeterminate nodules (Grade A recommendation), but only when the "results are expected to influence clinical management" (Grade A recommendation) (9). Testing for detection of BRAF, RET/PTC, PAX8/PPARG, and RAS mutations can be considered (Grade B recommendation). Furthermore, with the exception of BRAF V600E, there is insufficient evidence "to recommend in favor of or against the use of mutation testing as a guide to determine the extent of surgery" (Grade A recommendation) (9).
Importantly, molecular testing is not recommended in patients with an indeterminate thyroid nodule if other indications for surgery are present, such as a nodule greater than 4 cm, compressive symptoms, or personal preference (63). The utility of molecular testing in Bethesda Type V nodules at institutions with a high prevalence of malignancy is low: a "positive" test result provides little additional benefit because its PPV is similar to that of a Bethesda Type V FNA result, and a diagnostic lobectomy would still be recommended in the case of a "negative" result. Finally, a limitation of the current molecular markers is that there are insufficient data to recommend their use among pediatric patients (≤18 years) (64). Until these tests can be validated in this patient population, they cannot be routinely used to complement indeterminate FNA cytology results.

CURRENT PUBLISHED LITERATURE
Current published literature regarding Afirma® and Thyroseq® V2 validation studies is summarized in Tables 2 and 3. The data were summarized from the results of a PubMed search for English language studies, published up to November 30, 2017, that reported diagnostic accuracy for GEC and Thyroseq® V2 in observational clinical settings. References from the retrieved articles were also searched for additional studies. Studies were included if they reported molecular marker diagnostic accuracy, or enough information to calculate sensitivity, specificity, NPV, and PPV, among Bethesda Type III or IV lesions. All calculations were made using the available published information. To adhere to current clinical guidelines, non-invasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP) and malignant pathologies were categorized as malignant or "requiring resection." Of the published literature, only two studies included pediatric patients in their cohort (31,32).
Among Afirma® GEC studies, Se, Sp, PPV, and NPV ranged from 75 to 100%, 5 to 53%, 13 to 100%, and 20 to 100%, respectively. Among Thyroseq® V2 studies, Se, Sp, PPV, and NPV ranged from 40 to 100%, 56 to 93%, 13 to 90%, and 48 to 97%, respectively. Valderrabano et al. report that the wide variation among reported diagnostic values can be explained by different defining characteristics of the study populations, such as institutional prevalence of malignancy, sample size, the Bethesda Type included or the combination thereof used, the proportions of each Bethesda Type, the definition of "benign" used in the study, and Hürthle cell (HC) predominance (65). Furthermore, among post-validation studies, the molecular test outcome itself influenced the clinical management.
Supporting this, numerous studies have reported a lower specificity or higher false positive rate in GEC tests among indeterminate nodules with HC predominance. Brauner

CURRENT CHALLENGES OF MOLECULAR MARKER ASSAYS
Current challenges to the application of molecular markers are fourfold: (A) the recent introduction of the histopathological diagnosis NIFTP, (B) the correlation of genetic mutations within both benign and malignant pathologic diagnoses, (C) the lack of follow-up of molecular marker negative nodules, and (D) the cost-effectiveness of molecular markers.

Non-invasive Follicular Thyroid Neoplasm with Papillary-like Nuclear Features
In March 2015, Nikiforov et al. introduced the new histopathological term NIFTP, previously known as encapsulated follicular variant of papillary thyroid cancer, representing an indolent entity with very low risk of recurrence (70). Major diagnostic characteristics of NIFTP include features of FVPTC, such as a follicular growth pattern and nuclear features of PTC (enlargement, crowding, elongation, irregular contours, grooves, pseudoinclusions, and chromatin clearing), but a lack of vascular or capsular invasion, the key features differentiating NIFTP from FVPTC.
This new diagnosis represents a dramatic shift in thyroid pathology where an estimated 61% of lesions previously classified as FVPTCs will now be classified as NIFTP, thus decreasing the percentage of "malignancies" on final pathology compared with FNA. On pre-operative cytology, NIFTP is associated with FNA Bethesda Category III, IV, V, or VI in 15, 56, 27, and 2% of tumor samples, respectively (71). As a consequence, this has created a shift in the malignancy rate associated with each Bethesda category. Strickland (73).
Despite NIFTP's extremely low recurrence rate of 0.6% (two cases), there remains disagreement regarding its true malignant potential (74-80). Despite its likely benign and, at worst, indolent nature, current ATA guidelines recommend lobectomy as definitive therapy for NIFTP. More importantly for this review, the Afirma® GEC and Thyroseq® V2 validation studies were conducted before NIFTP was established as a distinct entity. One therefore needs to be circumspect about the real utility of these marker panels, and these molecular diagnostic panels require recalibration to appropriately account for NIFTP, a lesion that should likely not be considered malignant (70,81,82).

Correlating Mutations to Pathology
The correlation between the presence of mutations and malignancy is imprecise. Among 967 Bethesda Type III, IV, and V nodules, the detection of any mutation conferred a risk of histologic malignancy of 88, 87, and 95%, respectively (83). However, even in nodules with no detected mutations, the malignancy rates were 6, 14, and 28%, respectively. A systematic review by our group included 8,162 patients, of whom 42.5% had benign lesions (84). Among the benign lesions, RAS mutations, RET/PTC rearrangements, and PAX8/PPARγ rearrangements were present in up to 48, 68, and 55% of cases, respectively. Thus, benign nodules frequently harbor mutations, while some malignant lesions harbor no detected mutations. The variable and potentially high prevalence of mutations among benign nodules may explain the low specificity and PPV seen with Afirma® GEC. Furthermore, the prominence of these mutations in benign lesions may also challenge the reported PPV of Thyroseq® V2. This issue is further complicated when an indolent tumor, such as NIFTP, should be resected according to current ATA guidelines. NIFTP is commonly associated with RAS mutations (8/27, 29.6%), and its diagnosis is incompatible with the presence of BRAF V600E mutations (70,79). Moreover, Nikiforov et al. describe that 22% (6/27) of NIFTP samples harbor no detectable mutations. To conform to the recommendation that this indolent lesion be resected, new validation studies must show that molecular markers reliably identify NIFTP, an unlikely occurrence given that benign lesions also harbor these mutations.

Molecular Marker Negative Nodule Follow-Up
Despite a number of studies exploring the diagnostic value of GEC and Thyroseq® V2, the current published literature includes discrepancies in the follow-up of molecular marker negative nodules and in their consideration as a benign pathology (85,86). Consequently, this may lead to inaccuracies in diagnostic value calculation. A systematic review by Duh et al. (85) highlighted these issues. The review included 12 studies and discussed the exclusion of cytologically indeterminate, GEC benign nodules from diagnostic performance calculations (malignant versus benign), which leads to an erroneous decrease in Sp and NPV. This exclusion is due to the lack of surgical pathology specimens to establish a definitive diagnosis, as well as a lack of follow-up of GEC benign nodules to establish a reference diagnosis. To establish a diagnostic "reference standard" in these nodules that have not undergone surgery and to include them in calculations, the authors argue that they should be considered "true negative" only if no suspicious changes are noted on scheduled interval ultrasound examinations. However, even the natural history of benign thyroid nodules has been described to involve size changes. Indeed, a 5-year prospective study involving 1,567 sonographically or cytologically benign thyroid nodules showed nodule growth in 11.1% (87). Thyroid cancer was diagnosed in only five of the original nodules (0.3%), of which only two had increased in size. Furthermore, a retrospective study with follow-up ranging from 1 month to 5 years reported that 39% of the 268 benign thyroid nodules showed at least a 15% change in nodule volume (88). Only one of the 74 repeat FNAs was malignant. The authors conclude that an increase in nodule volume alone is not a reliable predictor of malignancy.
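The effect of excluding unresected, test-negative nodules from diagnostic performance calculations can be sketched with a simple 2x2 table. The counts below are hypothetical and are not drawn from any of the cited studies; the sketch only illustrates why counting followed, unresected test-negative nodules as true negatives raises the calculated Sp and NPV.

```python
def specificity_and_npv(tp, fp, fn, tn):
    """Return (Sp, NPV) from the four cells of a 2x2 diagnostic table."""
    specificity = tn / (tn + fp)
    npv = tn / (tn + fn)
    return specificity, npv


# Hypothetical counts for nodules with surgical pathology (illustration only).
resected = {"tp": 30, "fp": 40, "fn": 3, "tn": 10}

# Hypothetical number of test-negative nodules followed without surgery.
unresected_test_negative = 60

# Approach 1: only nodules with surgical pathology enter the calculation.
sp_excluded, npv_excluded = specificity_and_npv(**resected)

# Approach 2: unresected test-negative nodules without suspicious changes
# on follow-up are counted as true negatives.
with_follow_up = dict(resected, tn=resected["tn"] + unresected_test_negative)
sp_included, npv_included = specificity_and_npv(**with_follow_up)

print(f"Excluded: Sp={sp_excluded:.1%}, NPV={npv_excluded:.1%}")
print(f"Included: Sp={sp_included:.1%}, NPV={npv_included:.1%}")
```

The second approach is only as reliable as the follow-up used to label those nodules as true negatives, which is why the definition of the reference standard matters.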
Two studies have described their experience with follow-up of GEC benign nodules on ultrasound. In a study by Angel et al., cytologically indeterminate, GEC benign nodules in 56 patients followed for a median of 13 months exhibited growth (≥20% in two dimensions or ≥50% in volume) similar to that of cytologically benign nodules (86). Furthermore, in a grouped cohort analysis by Kloos et al. of 443 GEC benign patients in six studies with a reported follow-up time of 7-26 months, 380 patients (85.8%) were spared unnecessary surgery (89). Clearly, the currently available follow-up periods are inadequate for a definitive assessment, and larger, prospective studies are needed to further evaluate the behavior of cytologically indeterminate, GEC benign nodules.

Cost-effectiveness
The cost-effectiveness of GEC and Thyroseq® has also been an intense area of research. The cost of GEC combined with MTC is $4,875, while the cost of Afirma® MTC alone is $975 and that of Afirma® BRAF is $475 (Table 5). The cost of Thyroseq® V2 is $3,200 (29). Despite these high costs, insured patient costs are capped at $300 for either GEC or Thyroseq® V2. Numerous studies have reported on the cost-effectiveness of both GEC (90,91) and Thyroseq® V2 (92).
A 5-year cost-effectiveness study of routine use of GEC reported 74% fewer operations for benign nodules with no increase in untreated cancers. Compared with standard clinical management based only on indeterminate FNA results, GEC may lower overall costs (standard cost $12,172 versus GEC cost $10,719) and improve quality of life for patients (91). Another study reported that, to be cost effective, GEC would need a specificity greater than 68% and would need to decrease the number of unnecessary surgeries performed on benign nodules by more than 50% (90). However, a study by Yip et al. compared the average cost per patient with Bethesda IV nodules larger than 1 cm over the 10 years following a follicular neoplasm diagnosis in three groups: standard of care, GEC, and Thyroseq®. The authors reported a 13% increase in average cost per patient when using GEC, at $13,027 (range $12,373-$13,666), compared with the standard of care at $11,505 (range $10,676-$12,347), but a 30% reduction when using Thyroseq®, at $7,683 (range $7,174-$8,333) (92).
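The two GEC cost comparisons above point in opposite directions, which is easiest to see as relative differences against each study's own standard-of-care cost. The following minimal sketch uses only the per-patient GEC figures quoted in the text; the helper name `percent_change` is introduced here for illustration.

```python
def percent_change(baseline, alternative):
    """Relative difference of `alternative` versus `baseline`, in percent."""
    return (alternative - baseline) / baseline * 100.0


# Per-patient costs in US dollars, as quoted in the text.
five_year_study = percent_change(baseline=12_172, alternative=10_719)  # ref. 91
yip_gec = percent_change(baseline=11_505, alternative=13_027)          # ref. 92

print(f"5-year study, GEC vs standard: {five_year_study:+.1f}%")
print(f"Yip et al., GEC vs standard:   {yip_gec:+.1f}%")
```

A negative value indicates that GEC was modeled as cheaper than standard management, and a positive value indicates that it was modeled as more expensive.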
Despite the conflicting results of GEC cost-effectiveness studies and the paucity of analyses on Thyroseq®, these studies highlight the need to closely examine the cost-effectiveness of genetic studies. However, the true impact of thyroid molecular tests on a population's health care costs can only be determined by taking cancer prevalence into account. Furthermore, cost-effectiveness also depends on proper education to ensure that these tests are used only in indicated clinical settings.

CONCLUSION AND FUTURE DIRECTION
Molecular markers are a rapidly developing field despite the current limitations. Veracyte and CBL Path have begun to address the challenges discussed here. In 2017, Veracyte launched the Afirma® Genomic Sequencing Classifier (GSC), its newest product, which tests a total of 10,196 genes. The Afirma® GSC test system is composed of a series of classifiers including parathyroid (mRNA expression), MTC (mRNA expression), BRAF (mRNA expression + variants), and RET/PTC fusion (fusion transcripts). GSC also includes a follicular content index (mRNA expression), HC index (mRNA expression and mitochondrial transcripts), and Hürthle neoplasm index (mRNA expression and chromosomal level loss of heterozygosity). The reliance on mRNA could prevent detection of mutations undetectable in transcriptome-based assays, such as telomerase reverse transcriptase (hTERT) promoter mutations. Veracyte states that GSC addresses the weaknesses of GEC by significantly increasing molecular test specificity, using a validation cohort with 15 NIFTP samples, and improving the diagnostic performance among HC lesions. Among Bethesda Type III and IV nodules, Veracyte quotes a Se, Sp, NPV, and PPV of 91, 68, 96, and 50%, respectively, compared with the GEC parameters of 89, 50, 93, and 46%, respectively. Among HC lesions, Sp has increased from 11.8 to 58.8%.
In 2017, CBL Path sought to address similar issues with the release of Thyroseq® V3 (93). It has expanded its assay from 56 to 112 genes, detecting mutations, gene fusions, gene expression alterations, and copy number variations. In a prospective double-blind multicenter study using 257 nodules with Bethesda Types III-V (including 11 NIFTP samples, 10 HC carcinomas, 34 HC adenomas, and 5 hyperplastic nodules with HC predominance), Thyroseq® V3 is reported to have an increased diagnostic value, including a Se of 98% (from 96.9%) and a Sp of 81.8% (from 74.0%). Moreover, HC samples had a Se and Sp of 100.0 and 66.7%, respectively.
As experience accumulates with these next generation tests, we will gain a better understanding of how well they mitigate the limitations and challenges addressed herein. As our understanding of the genetic drivers of malignancy and of NIFTP's malignant potential becomes clearer, molecular markers should identify malignant nodules more accurately and spare more patients unnecessary surgery.

AUTHOR CONTRIBUTIONS
ZS conducted the literature review. ZS, PS, and CU drafted and revised the manuscript. MZ drafted and revised the manuscript and, as the senior and corresponding author, takes full responsibility.