TruSeq-Based Gene Expression Analysis of Formalin-Fixed Paraffin-Embedded (FFPE) Cutaneous T-Cell Lymphoma Samples: Subgroup Analysis Results and Elucidation of Biases from FFPE Sample Processing on the TruSeq Platform

Cutaneous T-cell lymphomas (CTCLs) are a heterogeneous group of malignancies with courses ranging from indolent to potentially lethal. We recently studied in a 157 patient cohort gene expression profiles generated by the TruSeq targeted RNA gene expression sequencing. We observed that the sequencing library quality and depth from formalin-fixed paraffin-embedded (FFPE) skin samples were significantly lower when biopsies were obtained prior to 2009. We also observed that the fresh CTCL samples clustered together, even though they included stage I–IV disease. In this study, we compared TruSeq gene expression patterns in older (≤2008) vs. more recent (≥2009) FFPE samples to determine whether these clustering analyses and earlier described differentially expressed gene findings are robust when analyzed based on the year of biopsy. We also explored biases found in FFPE samples when subjected to the TruSeq analysis of gene expression. Our results showed that ≤2008 and ≥2009 samples clustered equally well to the full data set and, importantly, both analyses produced nearly identical trends and findings. Specifically, both analyses enriched nearly identical DEGs when comparing benign vs. (1) stage I–IV and (2) stage IV (alone) CTCL samples. Results obtained using either ≤2008 or ≥2009 samples were strongly correlated. Furthermore, by using subgroup analyses, we were able to identify additional novel differentially expressed genes (DEGs), which did not reach statistical significance in the prior full data set analysis. Those included CTCL-upregulated BCL11A, SELL, IRF1, SMAD1, CASP1, BIRC5, and MAX and CTCL-downregulated MDM4, SERPINB3, and THBS4 genes. With respect to sample biases, no matter if we performed subgroup analyses or full data set analysis, fresh samples tightly clustered together. While principal component analysis revealed that fresh samples were spatially closer together, indicating some preprocessing batch effect, they remained in the proximity to other normal/benign and FFPE CTCL samples and were not clustering as outliers by themselves. Notably, this did not affect the determination of DEGs when analyzing ≥2009 samples (fresh and FFPE biopsies) vs. ≥2009 FFPE samples alone.

Cutaneous T-cell lymphomas (CTCLs) are a heterogeneous group of malignancies with courses ranging from indolent to potentially lethal. We recently studied in a 157 patient cohort gene expression profiles generated by the TruSeq targeted RNA gene expression sequencing. We observed that the sequencing library quality and depth from formalinfixed paraffin-embedded (FFPE) skin samples were significantly lower when biopsies were obtained prior to 2009. We also observed that the fresh CTCL samples clustered together, even though they included stage I-IV disease. In this study, we compared TruSeq gene expression patterns in older (≤2008) vs. more recent (≥2009) FFPE samples to determine whether these clustering analyses and earlier described differentially expressed gene findings are robust when analyzed based on the year of biopsy. We also explored biases found in FFPE samples when subjected to the TruSeq analysis of gene expression. Our results showed that ≤2008 and ≥2009 samples clustered equally well to the full data set and, importantly, both analyses produced nearly identical trends and findings. Specifically, both analyses enriched nearly identical DEGs when comparing benign vs. (1) stage I-IV and (2) stage IV (alone) CTCL samples. Results obtained using either ≤2008 or ≥2009 samples were strongly correlated. Furthermore, by using subgroup analyses, we were able to identify additional novel differentially expressed genes (DEGs), which did not reach statistical significance in the prior full data set analysis. Those included CTCL-upregulated BCL11A, SELL, IRF1, SMAD1, CASP1, BIRC5, and inTrODUcTiOn Cutaneous T-cell lymphomas (CTCLs) represent ~4-8% of all non-Hodgkin's lymphomas and are characterized by infiltration of malignant T lymphocytes into the skin (1). Most patients first present with stage I disease, limited to the skin, which can either follow an indolent course (in 70-80% of cases) or progress to a potentially devastating, deadly malignancy with a median survival of <3 years (2). The diagnosis of CTCL is rather challenging for several reasons. First, mycosis fungoides (MF) and Sézary syndrome (SS), the most recognized variants of CTCL, can have variable presentation (3). Second, other common and rare benign inflammatory dermatoses can mimic CTCL and vice versa. Classically, MF may present with centrally distributed erythematous patches and plaques that are not specific to CTCL and are commonly misdiagnosed as chronic eczema, psoriasis, pityriasis rubra pilaris, drug eruptions, and dermatophyte infections. Finally, histopathological analysis of skin biopsies and PCR evaluation of T-cell receptor clonality lacks sensitivity in early MF patients and in erythrodermic disease. Unfortunately, current time to CTCL diagnosis from its initial presentation averages ~6 years (4).
Recently, we analyzed using Illumina's TruSeq targeted RNA gene expression platform a new cohort of 157 patients, with biopsy-confirmed CTCL and compared it to a cohort of patients with normal skin and benign skin conditions (41). A number of patients in this study provided longitudinal biopsy samples (41). Analyzed samples included (A) 29 formalin-fixed paraffinembedded (FFPE) tissues from benign inflammatory dermatoses and skin tag biopsies (1 sample per patient; 7 skin tag samples and 22 benign inflammatory dermatoses samples); (B) 134 FFPE samples of lesional CTCL skin from 110 patients; and (C) an additional 18 samples of freshly obtained and liquid nitrogen snapfrozen skin samples from a different group of CTCL patients. We processed 181 skin biopsy samples either freshly obtained or FFPE using TruSeq platform, capturing 284 genes that were previously identified as important for CTCL diagnosis and/or prognosis (32,48). We identified 75 statistically significant differentially expressed genes (DEGs) between benign skin samples and either all CTCL or stage IV CTCL samples (41) and validated a number of our previous diagnostic and prognostic expression markers (3,41).
However, we noticed non-trivial heterogeneity when performing clustering based on the TruSeq gene expression data, where early-stage CTCL samples and benign samples were admixed in the same clusters with the stage IV advanced CTCL disease. We hypothesized that this could be due to differences in TruSeq library sequencing depth and/or variation in the quality of the FFPE samples obtained during 2007-2008 (older) vs. 2009-2012 (more recent) years. Indeed, recent samples that were freshly obtained and snap frozen had comparable total number of sequencing reads (400-1,000 K reads), while older FFPE samples had often <300 K sequencing reads (41). In addition, we observed that freshly obtained snap-frozen CTCL samples were often tightly grouped in the same cluster, independent of their disease stage (41). This may indicate that TruSeq gene expression analysis may be affected by intrinsic biases based on the very natures of the samples analyzed (e.g., FFPE vs. fresh-frozen biopsies).
Notably, these variables (i.e., old vs. new; FFPE vs. freshly obtained snap frozen) were not formally evaluated in the prior publication but may contribute to the observed heterogeneity. These variations contribute toward a larger problem, known as the batch effect, in the field of gene expression-based analyses that utilize TruSeq, RNA-Seq, gene expression microarrays, and other approaches to identify DEGs. Differences in preprocessing, sequencing runs, technicians/centers, date of experiments, populations, and experimental design can account for heterogeneity that will remain despite normalization and use of control samples. Potential consequences of batch effect include reduction of statistical accuracy, introduction of spurious DEGs, and discrepancies between observed and true correlations (49). Several techniques can be used to minimize batch effects without removing true signals including surrogate variable analysis (50), ComBat (51), and principal component-based approaches (i.e., EIGENSTRAT among others) (52).
In this study, we aimed to characterize TruSeq gene expression patterns separately in older (≤2008) vs. more recent (≥2009) FFPE samples to determine whether clustering analyses results display robustness when compared to the full data set. We also explored sample processing biases (old vs. new and FFPE vs. freshly obtained snap frozen).

Patients and samples
As described before (41), all patients were enrolled in the study in accordance with the IRB-approved protocols: PA12-0267, PA12-0497, and Lab97-256 at the MD Anderson Cancer Center (MDACC) and A09-M106-13A and 13-201-GEN at McGill University/McGill University Health Centre (MUHC). This study was carried out in accordance with the recommendations of the Research Ethics Board of the McGill University/MUHC with written informed consent from all subjects in accordance with the Declaration of Helsinki. This study was carried out in accordance with the recommendations of the MDACC Research Ethics Board, which exempted us from obtaining written informed consent from patients, who earlier signed a hospital consent allowing their stored biopsy samples to be used for research.

Data acquisition
Processed TruSeq data from Litvinov et al (41) were re-analyzed in this study based on transcripts per million (TPM) and RNA integrity number (RIN) parameters. Raw data were deposited in the NCBI SRA, accession number SRP114956. We separated CTCL FFPE samples obtained from the MDACC into two subgroups: older (≤2008) vs. more recent (≥2009).
clustering Unsupervised hierarchal clustering was performed in R, using packages stats, cluster, and gplots. Pairwise dissimilarity (distance) matrix was calculated using Gower's method, which performs well in the case of incomplete/missing data when compared to other methods (53). Clusters were obtained using Ward's clustering method and criteria (54). Silhouette plots followed by visual inspection (to ensure appropriately sized clusters) were used to assess clusters and subclusters divisions. We repeated similar comparisons for all samples, benign samples vs. stage IV CTCL disease, and early (stage ≤IIA) vs. intermediate (stages IIB and III) vs. advanced (stage IV) CTCL.

Principal component analysis (Pca)
Principal component analysis was performed on scaled, centered TPM data using package pcaMethods (55). Probabilistic PCA was used to account for missing data. Score plots of principal components 1 and 2 were generated.

statistical analyses
Differences in mean TPMs were determined using two-tailed Ward's t-test. Power analysis showed an 86% power to detect a twofold expression change at a significance level of 0.05 for the comparison between the smallest subgroups, with complete data points. Correlations were computed using Spearman's rho, on log-2 ratios. Mean RINs were compared using a Bayesian analysis with Markov Chain Monte Carlo (MCMC) simulations, using R package rjags; at least 100,000 iterations were performed to estimate p values.

resUlTs subgroup clustering analysis of all samples
We previously noted that the ≤2008 FFPE samples had significantly decreased number of sequence reads per sample when compared to the ≥2009 samples (mean 103,406 ± 96,620 vs. 437,218 ± 550,840 reads, respectively). Therefore, we repeated unsupervised hierarchical clustering for benign samples (skin tags and benign inflammatory dermatoses), fresh liquid nitrogen snap-frozen CTCL samples, and either ≤2008 or ≥2009 FFPE CTCL samples. For ≤2008 FFPE sample analysis (Figure 1), we observed three major clusters. Cluster 1 comprised exclusively the FFPE CTCL samples, mostly early-stage (≤IIA) (12/33), along with two mid-stage (IIB and III) (2/8) and one late-stage (IV) (1/14) disease. In Cluster 2, 21 of 22 samples were from CTCL patients representing advanced stages (mid = 4/8 and late = 9/14), along with one eczema sample and a number of early-stage CTCL samples (8/33). Cluster 3 formed multiple subgroups (~4) that comprised mostly benign samples (28/29) and fresh CTCL samples, along with many early-stage and mid-late stage FFPE CTCL samples (early = 13/33, mid = 2/8, and late = 4/14). As previously discussed (41), one of the subgroups encompassed all fresh CTCL samples, which tightly clustered together (18/18). Two of the subgroups contained mostly benign samples, while the last one had early-stage FFPE CTCL samples. For ≥2009 FFPE samples (Figure 2), we noted two small clusters and two larger clusters. The first small cluster on the left panel (Cluster 1) contained six FFPE CTCL samples (two early, one mid, and three late). The second small cluster on the right (Cluster 4) included 18/18 fresh CTCL samples similarly to our previous analyses along with 2 benign biopsies. The first large cluster on the center left panel (Cluster 2) exhibited significant molecular disease heterogeneity. The first subgroup (A) had primarily mid-stage (n = 9) CTCL skin biopsies, one early and two late-stage samples, while the other two subgroups (B and C) were very heterogeneous with respect to their composition. For the second large cluster on the center right panel (Cluster 3), a similar admixture was observed with three subgroups, one subgroup being comprised primarily benign skin

subgroup clustering analysis of healthy skin/Benign inflammatory Dermatoses samples vs. stage iV cTcl samples
We then performed unsupervised hierarchical clustering for benign samples (which included skin tags and benign dermatoses that often clinically mimic CTCL) vs. stage IV CTCL disease. Similarly, two analyses were performed for ≤2008 and ≥2009 FFPE biopsies. In the case of ≤2008 samples (Figure 3), there were two major clusters that separated quite well these biopsies based on gene expression changes. Cluster 1 had 13 samples, 12 of which were stage IV CTCL disease (including 12/14 of total stage IV CTCL samples) and 1 sample form a patient with chronic eczema. Cluster 2 contained 30 samples in total and comprised mostly benign dermatoses and skin tags (n = 28) and 2 stage IV CTCL samples.
Surprisingly, for ≥2009 samples (Figure 4), greater overall heterogeneity was observed. However, we noted one small cluster in the right panel and one large cluster with three subgroups in the center. Cluster 1 (right panel) had nine samples, eight of which were stage IV samples (8/20 total stage IV CTCL samples). Cluster 2 was subdivided into three subgroups, where 2A samples (n = 7) with advanced CTCL disease tightly clustered together, while 2B (n = 20) and 2C (n = 11) samples included primarily benign dermatoses and skin tags (85 and 82%, respectively, for each subcluster).  (2) stage IV CTCL. In our initial report (41) and other inflammatory cytokines. In this study, we repeated the analysis of the FFPE samples obtained ≤2008 vs. ≥2009. As presented in Table 1, our analysis revealed 54 DEGs (p < 0.05), when ≤2008 stage I-IV CTCL or ≤2008 stage IV CTCL samples were compared to benign skin samples. This list included 47/75 DEGs that were enriched in the initially reported full data set (41). New highlighted CTCL-upregulated targets in this analysis included BCL11A, SELL, IRF1, SMAD1, CASP1, and BIRC5, while THBS4 was upregulated in benign skin samples. For ≥2009 samples, 41 significant DEGs (p < 0.05) were found when freshly obtained and for ≥2009 samples, FFPE CTCL biopsies were analyzed together in a similar way ( Table 2). Importantly, the same 41 DEGs were identified using only the ≥2009 FFPE samples alone (i.e., excluding the freshly obtained biopsies from this analysis). In the latter analysis, four additional CTCL-upregulated DEGs (EP400, NFKB1, TRRAP, and MAX) were revealed as being statistically significant ( Table 2).
Based on these combined results, 42/75 DEGs were confirmed in both analyses, which highlights significant robustness of these tests. Of course, many of the initially identified DEGs did not achieve statistical significance since the number of samples analyzed in each of these subanalyses (i.e., ≤2008 and ≥2009) was significantly smaller than when all the data were analyzed as one set. Moreover, based on the original TruSeq data, subgroup analysis showed consistency in log-2 ratios between ≤2008 and ≥2009 CTCL samples. Indeed, rank-rank correlation when comparing benign dermatoses vs. all FFPE CTCL samples was ρ = 0.71 (strong; p < 10 −16 ), while this indicator was ρ = 0.55 (medium; p < 10 −16 ) when comparison was made between benign dermatoses and stage IV FFPE biopsies. We then performed unsupervised hierarchical clustering analysis for early (≤IIA) vs. mid (IIB and III) vs. late (IV) stage CTCL for ≤2008 vs. ≥2009 samples. Similarly, we noted a significant molecular heterogeneity that was seen in our original report (41). However, for the ≤2008 CTCL FFPE samples (Figure 5), there were two major clusters. Cluster 1 had 12 samples, 10 of which were early-stage CTCL biopsies (10/31 of the total early-stage CTCL samples). Cluster 2 was rather heterogeneous with respect to its composition and could be subdivided into two subclusters: 2A, larger, with samples from all different stages and 2B with the well-defined subgroup of early-stage CTCL biopsies (10/11) on the right side of this subcluster. For ≥2009 samples (Figure 6), the distribution of samples was very heterogeneous as was seen in our earlier report (41 Table 3). For ≥2009 samples, three different genes were identified: SKAP1 and GTSF1 were upregulated in late-stage CTCL, while BCL11A was upregulated in early-stage CTCL ( Table 4). Overall, merging both subgroups, we validated three of the four genes observed in the full data set when performing the same analysis: TOX and GTSF1 were upregulated in late-stage CTCL, and LTBP4 was upregulated in early-stage CTCL.

comparison of the Truseq Data Quality in FFPe vs. Freshly Obtained snap-Frozen samples
With respect to the RINs, a measure of sample quality prior to conducting the TruSeq analysis, we observed lower RINs for FFPE samples than freshly obtained snap-frozen samples.
However, these RINs were within the expected range for FFPE samples (56,57). RINs were much higher in fresh samples than in the FFPE samples, as expected (fresh: mean 6.1, 95% CI 5. To detect possible batch effects, we performed PCA on TPM data. We aimed to determine whether (1) freshly obtained flash  snap-frozen samples cluster together and (2) ≤2008 and ≥2009 FFPE samples cluster in different areas. We observed a tight cluster of fresh CTCL samples (gray), whether using ≤2008 (Figures 7A,B) or ≥2009 (Figures 7C,D) FFPE CTCL samples, indicating that differences in preprocessing protocols might explain these findings (tight associations in clustering analyses). However, these freshly obtained samples were also in close spatial proximity to normal/benign samples and a number of FFPE samples. When comparing ≤2008 and ≥2009 FFPE samples, we observed no clear clusters (Figures 7E,F), but rather two loose associations. First, many newer samples (≥2009) were clustering around the center of the distribution, toward normal/benign and freshly obtained samples, indicating less preprocessing batch effect. Second, samples showing greater variability were mostly older samples (≤2008), indicating that there might be some processing batch effect among these. Taken together, these findings may explain why performing individual subgroup analyses enabled us to uncover additional DEGs.

DiscUssiOn
In this study, we used subgroup analysis to determine whether older ≤2008 FFPE samples, which were sequenced at a lower depth on the TruSeq platform, were comparable to those obtained ≥2009.
In this study, we also systematically analyzed sample processing biases based on the year of biopsy and the nature (i.e., FFPE vs. freshly obtained snap frozen) of the samples. Clustering analysis showed that ≤2008 and ≥2009 samples clustered equally well to the full data set and, furthermore, in a number of instances they demonstrated even better defined clusters. In particular, for ≤2008 samples, clusters were more reminiscent of the three clusters found in the landmark Boston CTCL cohort (3,32,48) when looking at all samples. There was also a better discrimination between benign and stage IV CTCL samples in ≤2008 samples than in the ≥2009 samples. Both analyses produced nearly identical trends and findings. Specifically, both analyses enriched nearly identical DEGs when comparing benign vs. (1) stage I-IV and (2) stage IV (alone) CTCL samples. Importantly, in this subgroup analysis, we recapitulated most of the targets seen within the full data set. Results obtained using either ≤2008 or ≥2009 samples were strongly correlated. Known upregulated targets in CTCL vs. benign dermatoses were validated, including TOX, FYB, LEF, and STAT signaling genes, inflammatory interleukins, NF-κB pathway signaling members, cancer testis genes, etc. We had previously reviewed in detail how these genes relate to the biology of CTCL tumorigenesis (3,31). Furthermore, this subgroup analysis enabled us to discover additional genes, which did not reach statistical significance in the full data set analysis. One may find it to be counterintuitive. However, indeed, despite the inherently decreased power, potential reasons why additional DEGs can be identified through subgroup analysis may include reduced variability on a per-sample basis due to increased in-group similarity and removal of outliers in some groups.
Those new DEGs included CTCL-upregulated BCL11A (regulation of RNA transcription), SELL (cell adhesion molecule in the    selectin family), IRF1 (Interferon transcription factor), SMAD1 (BMP signaling and gene expression), CASP1 (caspase involved in proteolysis), BIRC5 (inhibitor of apoptosis, survivin), MAX (Myc-associated transcription factor), and CTCL-downregulated MDM4 (negative regulator of p53), SERPINB3 (serine protease involved in inflammatory response), and THBS4 (cell-cell and cell-matrix interactions) genes. Of note, THBS4 promoter was previously found to be frequently hypermethylated in 52% of CTCL samples, which leads to the downregulation in expression of this tumor suppressor gene (58). CASP1 single-nucleotide polymorphisms were associated with changes in NF-κB signaling and development of other non-Hodgkin lymphomas, including diffuse-large B cell lymphomas and small lymphocytic lymphoma/ chronic lymphocytic leukemia (59). By using the full data set analysis, we found significant heterogeneity in our clusters (41). When we performed clustering on ≤2008 or ≥2009 FFPE CTCL samples, we still did not obtain three clusters that were previously described in the historic Boston cohort of CTCL patients (3,32,48). However, in this study of subgroup analyses, we noted less heterogeneity than we observed in the full data set analysis (41). PCA results also supported this conclusion. Indeed, in this subgroup analysis, samples of similar clinical disease stages were most often grouped together.
Subsequently, when we studied the DEGs enriched in both (1) early vs. mid and late CTCL and (2) stage I vs. stage IV disease, four genes were differentially expressed: TOX (involved in chromatin processes and T-cell development), FYB (T-cell adaptor protein), and GTSF1 (germ cell maintenance) were upregulated, and LTBP4 (latent TGF-beta binding protein) was downregulated in later CTCL stages. By merging subgroup analysis of ≤2008 and ≥2009 FFPE samples, our targets included TOX, GTSF1, and LTBP4 as well. In particular, TOX overexpression is a hallmark of poor prognosis in CTCL, although low level of TOX expression has been previously reported in benign dermatoses (31,60). TOX and GTSF1 are aberrantly expressed developmental and meiotic genes that can prognosticate CTCL progression toward advanced disease (29,31,34). We also found that EED (Polycomb complex member expressed in embryonic stem cells), SKAP1 (T-cell adhesion), and LCP2 (T-cell receptor-mediated signaling) were upregulated in advanced CTCL stages. Surprisingly, we also found multiple genes with higher expression in early-stage tumors. These included BCL11A (see above), ATXN7 (chromatin remodeling, AKT signaling), HUNK (AMPK-related kinase), CHD1 (chromatin remodeling), TP63 (transcription factor), KIT (receptor tyrosine kinase), JUNB (transcription factor), HDAC2 (histone deacetylase), and OTUB2 (deubiquitinase, inhibits proteolysis). Based on these combined results, transcription factors, chromatin remodelers, and global cell signaling processes are upregulated early in the disease, while in the advanced stages of CTCL, T-cell-specific genes, inflammatory mediators, and stem cell/germ cell maintenance genes appear to be driving cancer progression. These results further argue that subgroup analysis can often yield additional clues into the biology of cancers.
Formalin-fixed paraffin-embedded samples have RNA of lesser quality than the freshly obtained snap-frozen samples (61). However, FFPE samples are much easier to obtain in the clinical setting, have longer storage half-life, and are suitable for immunohistochemistry in a clinical pathology lab (62). Our FFPE RINs were comparable to those obtained in previous studies (56,57). No matter if we performed subgroup analyses or full data set analysis, fresh samples tightly clustered together. While PCA revealed that fresh samples were spatially closer together, indicating some preprocessing batch effect, they remained in the proximity to other normal/benign and FFPE CTCL samples and were not clustering as outliers by themselves. However, this observed batch effect did not affect the determination of DEGs when analyzing all ≥2009 samples (fresh and FFPE biopsies) vs. ≥2009 FFPE samples alone. Other reports comparing freshly obtained frozen samples to FFPE samples showed a strong correlation (ρ > 0.70) in gene expression analysis (63). Formalin acts as a crosslinking agent for protein-protein, DNA-protein, and RNA-protein interactions (64). Crosslinking nucleic acid to proteins has its advantages in molecular medicine and is especially useful in characterizing transcription factor binding sites via chromatin immunoprecipitation (65) or RNA-protein interactions using RNA immunoprecipitation (66). In this study, we have successfully applied TruSeq targeted RNA sequencing to CTCL samples, both fresh and FFPE. A recent, direct comparison of TruSeq-analyzed RNA obtained from matched FFPE vs. fresh samples produced strongly correlated gene expression findings (R 2 > 0.70) (67). Interestingly, previous studies showed that the RINs can range from 2.2 to 2.8 (median 2.3) for FFPE samples and 3.8 to 8.0 (median 6.8) for freshly obtained samples (67), which is consistent with our findings detailed in this report. In the study by Graw et al., illumina sequence reads between FFPE and freshly obtained matched samples showed a 0.33% error rate (67), which is consistent to previous reports for identical samples processed on the Illumina platform, when a 0.30% error rate was reported (68). In summary, our results indicate that performing targeted gene expression studies on the TruSeq platform from FFPE samples is a viable option that can be used in the real-life, clinical medicine setting.