Comprehensive and Computable Molecular Diagnostic Panel (C2Dx) From Small Volume Specimens for Precision Oncology: Molecular Subtyping of Non-Small Cell Lung Cancer From Fine Needle Aspirates

The Comprehensive, Computable NanoString Diagnostic gene panel (C2Dx) is a promising solution to address the need for a molecular pathological research and diagnostic tool for precision oncology utilizing small volume tumor specimens. We translate subtyping-related gene expression patterns of Non-Small Cell Lung Cancer (NSCLC) derived from public transcriptomic data which establish a highly robust and accurate subtyping system. The C2Dx demonstrates supreme performance on the NanoString platform using microgram-level FNA samples and has excellent portability to frozen tissues and RNA-Seq transcriptomic data. This workflow shows great potential for research and the clinical practice of cancer molecular diagnosis.

Currently, the treatment of NSCLC is dependent on accurate histological and molecular subtyping as well as other clinical and pathological features (1,2). Clinical guidelines such as the NCCN (the National Comprehensive Cancer Network) guidelines (Version 5.2018) (3) and the 2015 World Health Organization classification (4) recommend that histologic and molecular features be used in determining treatment options. Pathologists use hematoxylin-eosin (H&E)-staining on samples to identify NSCLC subtypes like adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC). Immunohistochemical stains are used to diagnose subtype, but in the event further stains are unable to do so, a diagnosis of "NSCLC, not otherwise specified (NOS)" is made. Positive staining for thyroid transcription factor 1 (TTF1, coded by gene NKX2-1) and napsin A (novel aspartic proteinase of the pepsin family) help to classify LUAD. LUSC stains for markers tumor protein p63, and cytokeratins (CK) 5, 6, and 7. After lung cancer subtyping is complete, non LUSC are evaluated for targetable driver genomic alterations such as epidermal growth factor receptor (EGFR) mutations, anaplastic lymphoma kinase (ALK) rearrangements, c-Ros oncogene 1 (ROS1) rearrangements, and BRAF V600E mutation. Checkpoint markers such as PD-1/PD-L1 (programmed cell death-1 and programmed death-ligand 1) (1) are further examined in all subtypes of NSCLC. This process determines the treatment options for patients.
In advanced and metastatic NSCLC cases, small volume biopsies are generally obtained through the least invasive diagnostic option. The morphology-based subtyping can utilize a significant amount of limited tissue and can lead to insufficient biopsy material for molecular testing. Despite the fast-growing quantity demands for diagnostic tissues, minimally invasive small volume biopsies are commonly utilized for diagnosis. The fine-needle aspirate (FNA) is now the most routinely utilized biopsy approach for NSCLC (5)(6)(7). While sufficient for confirming the existence of cancers through transthoracic needle biopsy (TTNB), the amount of cellular material from FNAs can be sparse. This is often cited (8) as the limiting factor to the correct classification of NSCLC histology, a problem further compounded by the poor cellular differentiation characteristic of advanced-stage disease.
Over the last decade, we have been introduced to precision oncology initiatives (9) and novel immunotherapy treatment options that require additional biopsy tissue for molecular testing and next generation sequencing (10,11). New and emerging therapies on or off clinical study require additional tissues for examination and continues to contribute to the growing demand on limited tissues obtained at diagnosis. In real-world clinical settings, small tumor sample volume limits the implementation of many next-generation sequencing (NGS) technologies (12). For example, approximately 14% of clinical FNA samples qualified for the 324-gene FoundationOne CDx targeted genomics sequencing (13)(14)(15) after the samples were used for diagnosis and molecular profiling. NGS devices that are specially designed for small sample volumes, such as the 26-gene Oncomine Dx Target Test, can only test a small number of genes. Liquid biopsy technologies, though available in NSCLC patients (16,17), are only utilized if tissue samples are exhausted and additional specimens cannot be obtained. Therefore, greater effort and novel strategies are required to minimize consumption of diagnostic tissues in order to maximize diagnostic output.
We have been pioneering the development of a novel device, the Comprehensive, Computable NanoString Diagnostic gene panel (C2Dx), as a promising solution to address the burning clinical need in molecular pathological diagnosis with small volume tumor specimens. The NanoString nCounter ® (18) is a digital multiplexed molecular diagnostic platform supporting the molecular profiling of up to 800 genes in a single panel. It uses extremely small amounts of tissues and has been used successfully in precision oncology applications (19,20), including the detection of fusion genes (21) in lung cancer cytology and molecular subtyping of diffuse large B-cell lymphoma (22). We established clinical and laboratory protocols for reliable lung cancer RNA acquisition from a single transbronchial or transthoracic needle pass using fine needle aspiration (23). Key differentially expressed genes in LUAD and LUSC cases demonstrated high consistency across the NanoString platform using FNA samples and the RNA-Seq platform using the bulk tumor samples. We have also demonstrated the successful translation of knowledge learned from bulk-based transcriptomics data to a NanoString gene panel in exploring novel prognostic molecular subtypes in appendiceal mucinous neoplasms (24). A critical step from what we have achieved toward a clinical C2Dx device is to transfer knowledge discovered from existing omics data to a highly robust computable gene panel. In this work, we used the NSCLC subtyping task as a working example to demonstrate the feasibility of such quantitative translational modeling from bulk transcriptomic data to FNA-based NanoString gene panels.
In this work, we developed a highly robust, high performing, low tissue demanding, molecular subtyping system to compliment histology determination (Figure 1). Using a 67gene NanoString transcriptomic gene panel learned from four existing NSCLC omics datasets and generated from nanogramlevel RNA samples, we implemented an intensive resampling strategy for robust elastic net regularization of logistic regression. We established a NanoString-based 19-gene panel (15 signature genes and 4 housekeeping genes) and demonstrated a competitive NSCLC subtyping classification with sound accuracy when compared to interpretations from experienced pathologists. We were able to utilize a single FNA pass to determine adenocarcinoma from squamous histology. This strategy potentially allows for minimal diagnostic tissue utilization for subtyping which can lead to a decrease rate of tissue depletion and need for additional invasive biopsies. The 19-gene subtyping model demonstrates high accuracy across different sample types (FNA specimens and frozen tumor tissues) and transcriptomic platforms (NanoString gene panels and RNA-seq). This work proved the concept of transferring knowledge from omics data generated at bench-side to a robust computable device implementable at bedside through robust translational modeling and FNA-based NanoString C2Dx system.

Overview of the Pipeline
The aim of this study is to establish a robust, comprehensive, computable molecular subtyping device using FNA-based nanoString gene panel and knowledge of subtype-related gene expression from public data to distinguish LUAD and LUSC in NSCLC cases. As demonstrated in Figure 1, as previously reported (23), knowledge about LUAD/LUSC subtyping related genes was learned from a meta-cohort (n = 490) microarray transcriptomics data extracted from public source [Gene Expression Omnibus: GSE10445 (25), GSE4573 (26), and GSE3141 (27); the Director's Challenge Consortium for the Molecular Classification of Lung Adenocarcinoma (28)] and translated to a nanoString 67-gene panel for clinical use. This panel was used to generate targeted transcriptomics data on tissues collected by FNA-based biopsy from 83 WFBH's NSCLC patients. A robust 19-gene molecular subtyping model were trained using the generated data, intensive resampling, and elastic-net regularized logistic regression. The molecular phenotyping model was validated using nanoString-generated data of 44 frozen samples collected from WFBH's tumor tissue bank and RNA-Seq data of 1,016 samples from TCGA's LUAD and LUSC cohorts.

Patient Selection and Procedures
The study was approved by the Institutional Review Board of the Wake Forest Baptist Medical Center from 2013 and 2015. Patients completed informed consent forms prior to enrollment. Eligibility requirements were for radiographic evidence for lung cancer or have a previously diagnosed NSCLC with potential recurrence requiring re-biopsy.
Patients enrolled on study agreed to provide one additional small volume aspirate pass for RNA collection after completion of standard of care diagnostic tissue collection for presumed NSCLC. Patients underwent FNA of primary or secondary metastatic lesions. Exclusion criteria included patients whose FNA biopsy was unable to provide subtype classification by pathology, consistent with small cell carcinoma, non-malignant etiologies, or returned non-diagnostic. Small volume FNA diagnostic modalities were obtained by: endobronchial ultrasound guided transbronchial needle aspiration (EBUS-TBNA), conventional trans-bronchial needle aspiration (cTBNA) and trans-thoracic needle biopsy (TTNB). Bronchoscopy cases were performed by pulmonogists using rapid on-site cytology evaluation (ROSE) to help decipher results. TTNB were performed by interventional radiologists with training and experience with CT-guided thoracic procedures.
Histology classification of the 127 WFBH samples was determined through diagnostic pathological review of Haemotoxylin and Eosin (H&E) stained microscopic slides of tumor tissues by two mutually blinded pathologists and discordant cases were labeled as NOS.

NanoString-Based Transcriptomic Data Generation
A 67-gene panel has been constructed and the nanoString-based transcriptomic data has been generated for the study cohort as previously reported (23). Among these genes, 23 were LUADassociated, 40 were LUSC associated, and 4 were housekeeping genes for normalization purpose. The full gene list can be found in Supplement Table S6.

Individual Normalization of Gene Expression
For NanoString-based FNA data, the raw Reporter Code Count data was used. For TCGA samples, the expressions of the 67 genes in the form of FPKM-UQ (fragments per kilobase per million reads, upper quartile normalized) (29) were used. For each sample, original expression of all genes were log2 transformed, normalized against the mean of the 4 housekeeping genes, and z-transformed. Cross-sample normalization was avoided to simplify clinical implementation. Details are provided in Supplement: Data Processing.

Logistic Classification Model
Assuming that the subtypes LUAD and LUSC followed a binomial distribution, the odd of the probability of a given sample i was from an LUAD patient (p) vs. from an LUSC patient (1p) was modeled using a logistic regression model given by: log

Feature Selection and Model Training
Elastic net regularization (30) and cross-validation with intensive resampling was used for feature selection. We first determined optimal model complexity, that is, the optimal number of genes in the model. Let y i indicate the true subtype of sample i, y i = 1 for LUAD and y i = 0 for LUSC, the objective function for the elastic net penalized logistic regression model was: where n was the sample size, l > 0 was the tuning factor, 0 ≤ a ≤ 1 was the elastic net mixing factor that controlled the mixing percentage between the ridge regularization (a = 0) and the LASSO regularization (a = 1), ||•|| 1 and ||•|| 2 represented the L1 and L2 norm, respectively. To reach a robust solution, 3-fold cross validation was used for 1,000 resampling iterations with a screened from 0 to 1 at step size of 0.1. Evaluating the model performance by accuracy, the 15-signature-gene model was selected with estimated accuracy of 0.902 ± 0.053. All FNA samples was then used to train the final 15-signature-gene classification model using logistic regression with elastic net regularization. Supplement Figure S2 provided more details on elastic net feature selection. R (version: 3.3.1) packages glmnet (version 2.0-10) (31) and caret (version 6.0-77) (32) was used for feature selection and model training.

Model Validation
The classification model, with authentic coefficients, was directly implemented on the internal (44 NanoString samples from frozen tumor tissues) and external (1,016 RNA-Seq data from TCGA datasets) to classify TCGA sample subtypes for evaluating model performance.

Profiling NOS Cases
The 11 FNA samples diagnosed as NOS were analyzed with the 19-gene molecular subtyping model and compared with the LUAD and LUSC cases. The logistic probability patterns of NOS, LUAD, and LUSC cases were analyzed using enrichment analysis and Kolmogoro-Smirnov test.

Statistics Methods
The convergence of model stability during elastic net regularization with respect to resampling rounds was measured by the estimated values as well as the corresponding 95% confidence intervals of the model complexity, accuracy, and elastic net mixing factor a. The receiver operating characteristic (ROC) curves were generated for model performance on the FNA, internal tissue bank, and external TCGA cohorts and the corresponding concordance statistic (cstatistic, a.k.a. the area under the ROC curve, AUC) (33-35) were calculated for model performance comparison.

Patient Characteristics
Patients' demographics and clinical characteristics were obtained for the Exploring, Training, and Validation cohorts summarized in Table 1 and Supplement Table S1. Most characteristics were consistent across all cohorts. Such consistency between the research cohorts (Exploring and TCGA) and the clinical cohorts (WFBH-FNA and WFBH-TB) were important for knowledge translation from scientific studies to clinical applications.

Feature Selection and Model Training
The original NanoString 67-gene panel data as well as the individually normalized gene profile of the Exploring cohort and FNA cohort were visualized in Figure 2 and Supplement Figure S1, respectively. The panel was composed of 63 candidate diagnostic genes and, for internal normalization purpose, 4 housekeeping genes. The candidate diagnostic genes demonstrated strong differential expression pattern in the LUAD vs. LUSC samples. Specifically, the 23 LUAD-specific genes were highly expressed in LUAD samples, and vice versa for the 40 LUSC-specific genes. To reach a robust model, a large-scale resampling for cross-validation was performed over the full range of elastic net mixing factor a. Through 1,695,000 training-and-testing rounds, the best performance of an accuracy of 0.902 ± 0.053 was achieved at a = 0.5 with a model complexity of 15 predictive genes (Supplement: Feature Selection and Figure S2 and Figure S3). The boxplots of key model features such as the optimal a, l, accuracy, and the model complexity during the first 100 resampling were shown in Supplement Figure S4. The convergence of model training during resampling was shown in Supplement Figure S5.

The Molecular Subtyping Model
The final 15 signature genes as well as corresponding coefficients were listed in Table 2 and Supplement Table S2. The performance of the model on FNA cohort was demonstrated in Figure 3 and listed in Supplement Table S3.
The intensive resampling was crucial since the sample size was relatively small (n = 72) comparing with the candidate gene set (m = 63). As shown in Supplement Figure S4, significant variations were observed for the FNA dataset, comparing with the much more stable results for The Cancer Genome Atlas (TCGA) dataset (n = 1,016) (Supplement Figure S5). The convergence analysis (Supplement Figure S4) showed that, to confidently estimate the model complexity and the elastic net mixing factor, at least 1,000 rounds of resampling was required. Resampling approach also provided a realistic estimation of the model performance (Supplement Figure S5). The model performance on internal tissue bank dataset (accuracy of 89.3%) and on external TCGA dataset (accuracy of 94.5%) were consistent with the predicted performance (0.903 ± 0.053).

Model Validation
The model was validated internally with the 67-gene panel transcriptomics generated on the NanoString platform with frozen tumor LUAD and LUSC tissues, and externally with the RNA-Seq transcriptomics from TCGA LUAD and LUSC dataset. The performance on the WFBH tissue bank cohort and on the TCGA cohort were shown in Supplement Tables S4, S5, and Figure 3, respectively. The c-statistics of the model on the WFBH-FNA, WFBH-TB, and the TCGA cohorts were 0.986, 0.911, and 0.982, respective, with the corresponding Receiver Operating Characteristic (ROC) curves shown in Figure 3C.
Control of overfitting is a major challenge in molecular subtyping, since comparing with the large size of candidate biomarkers made possible by modern high-throughput or multiplex technologies. The clinical cohorts are usually small. We have significantly reduced the candidate biomarkers from microarray data to 63 genes. Such a focused gene panel still raise overfitting concern for our clinical cohort. With elastic net regularization and intensive resampling, we reduced the model complexity (gene size = 15) and achieved an events-per-variable (EPV) level of about 5. Such EPV level was around the minimum statistically acceptable (36) ratio for model development. Therefore, the validation independent dataset was necessary. The TCGA cohort validation confirmed that the established molecular subtyping model was not overfitted.

Choosing Thresholds for Confident Subtyping in Clinical Implementation
In clinical practice, it is important to identify which cases can be confidently subtyped without pathologists' review. Subtypespecific thresholds, q LUAD and q LUSC , can be used to reach required confidence. Samples falling between these thresholds need to be examined by pathologists. As shown in Supplement Figure S6, 94.7% of clinical samples can be confidently subtyped with 95% specificity when using q LUAD = 0.84 and q LUSC = 0.74. About 87.5% clinical samples can be subtyped at a specificity of 97% if thresholds q LUAD = 0.89 and q LUSC = 0.66 are used.

Comparison With Current Subtyping Approaches
The classification model demonstrated high accuracy when compared to standard NSCLC clinical subtyping. The TCGA LUAD and LUSC cohort study model outperformed the histopathological image-based artificial intelligence approach using deep convolutional neural network (37), with an AUC of 98.2% vs. 97%. On clinical samples, a subtyping specificity of 97% is at the highend of current pathological diagnosis utilizing immunohistochemical markers (38). The performance analysis showed that only about 12.5% clinical samples are below this specificity and thus need to be further reviewed by pathologists. Implementing our subtyping model will significantly relieve the burden of pathologists.

NOS Case Profiling
The NOS cases in the WFBH-TB cohort were examined using the developed molecular subtyping model and the subtypes of these cases predicted (Figure 3). Further analysis demonstrated unique subtyping probability distribution of NOS cases (the gray cumulative density function curve and the gray probability density function peak in the left and right panel of Figure 4A) comparing with typical LUAD and LUSC cases (Kolmogoro-Smirnov test p-values are 2×10 -4 for both NOS vs. LUAD and NOS vs. LUSC). In contrast, there was no similar NOS-like probability density function peak in TCGA cohort (right panel in Figure 4B), in which the NOS cases were excluded. The profiling results suggested that the NOS cases at WFBH cohort might be molecularly distinct from typical LUAD and LUSC subtypes.

Clinical Relevance
Our work proved the concept that a Comprehensive, Computable NanoString Diagnostic gene panel (C2Dx) is a promising tool to conserve diagnostic tissue specimens that require molecular pathological diagnosis when limited small volume tumor specimens are available. The C2Dx platform was developed from underlying classification models for molecular diagnosis, the NanoString gene panel to reliably generate targeted genomics data, and the clinical and experimental protocols for obtaining FNA specimens and generating NanoString genomics data. C2Dx platform demonstrates high potential in addressing the challenges in molecular diagnosis on FNA samples. The current advance of clinical practice raises a challenge to traditional pathology approaches: the gap between the dramatically reduced sample volume when less invasive approaches such as FNA are used and the fast-increasing needs of molecular diagnosis required by modern therapies such as immune therapies, targeted therapies, and, of course, chemotherapies. NanoString data generated from the clinical FNA samples can address such clinical need. Our approach provides a reliable clinical solution to this emerging clinical need. We chose to use the LUAD vs. LUSC subtyping to prove the concept of C2Dx. First, the task is representatively complex for typical molecular diagnosis tasks, and requires a large number of genes used in a multivariate model. It is both experimentally and mathematically more complex than the standard PD-1/PD-L1 test which only requires a single or few genes. It is representative for other molecular diagnostic needs by using metagenes and gene signatures. Second, this task is suitable for examining the performance of the C2Dx platform. It has been well-established clinically, pathologically, and molecularly. Finally, subtyping LUAD vs. LUSC cases is still an essential pathological task in clinical practice and impacts treatment decisions.
Beyond the use of the C2Dx platform for fresh FNA samples, the computable subtyping model can be applied for other clinical scenarios. Scenario 1: centralized C2Dx laboratory services, as many clinical sites rely on such external resources for genomics profiling tasks. Scenario 2: retrospective calibration and molecular diagnosis for frozen samples stored in tissue banks at low cost and low demands of sample volume. Our work demonstrated that reliable molecular diagnoses can be achieved for frozen samples and thus both scenarios can be archived.
Finally, our subtyping model allows linking the emerging clinical FNA samples to previous patient cohorts for integrative analysis. For example, samples collected at WFBH are comparable to TCGA samples using the subtyping probability

Limitations and Future Directions
The main goal of this work is to use a well-established molecular subtyping task (distinguishing LUSC from LUAD cases among NSCLC patients) to prove the concept of a crucial step toward a comprehensive, computable gene panel for molecular diagnosisrobust molecular modeling for bench--to-bedside knowledge translation. To be implemented in real-world clinical settings, the model needs to be expanded to distinguish small-cell lung cancer cases. Another major future direction will be the incorporation of biomarkers representing the pathological and prognostic knowledge from different domains including genomic abnormalities such as EGFR and BRAF mutations, molecular subtypes such as LUSC and LUAD, immunotherapy markers such as PD-1/PD-L1 expression, and other transcriptomic and genomic signatures for precision oncology.  Samples classified as LUAD (p ≥ 0.5) or LUSC (p < 0.5). Signature genes used in the molecular subtyping model were listed according to the absolute values of their logistic regression coefficients. The categories of the genes were color coded with respect to the associated subtypes (vermilion for LUAD and bluish green for LUSC). (C) The receiver operating characteristic curves (ROCs) of the model performance on the WFBH FNA, the WFBH tissue bank, and the TCGA cohorts. The corresponding c-statistics (AUC, area under the ROC curve) were 0.986, 0.911, and 0.982, respectively. When the probability threshold is set at q = 0.5, model accuracies are 0.931, 0.881, and 0.945 for the WFBH FNA, the WFBH tissue bank, and the TCGA cohorts, respectively (marked as open circles). The optimal accuracies are reached when optimal q levels are used, which are: 0.958 at q = 0.60 for the WFBH FNA cohort, 0.905 at q = 0.59 for the WFBH tissue bank cohort, and 0.946 at q = 0.68 for the TCGA cohort (solid gray circles).

CONCLUSIONS
Our work using FNA-based nanoString gene panel for robust molecular subtyping of NSCLC patients dramatically can reduce the demand of diagnostic samples for subtyping. It can avoid the complications and risks of re-biopsy when specimens are limited. It allows for more tissues to support molecular diagnoses for emerging targeted and immune therapies, and release pathologists from subtyping burden. It may also be used in the research setting, when utilizing small volume "left over" sampling. This translational modeling work is a crucial step toward comprehensive, computable molecular diagnosis devices and potentially revolutionizes current NSCLC diagnosis procedures for precision oncology

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author. The 19-gene molecular subtyping model was also available as an R package at http:// github.com/Su-informatics-lab/FNASubtype, with sample R code demonstrated in the supplements.

ETHICS STATEMENT
The study was approved by the Institutional Review Board of the Wake Forest Baptist Medical Center. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
JS, LM, and JR prepared the manuscript. LM and JR designed the experiments. JS and LH analyzed the data. RB developed the github distribution of the tool. GP, JC, CB, TD, LC, IP, JH, HC, and JP contributed to sample collection and data generation. JS, LH, BPr, MC, LM, and JR participated the development of the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
The work is partially supported by the Cancer Center Support Grant from the National Cancer Institute to the Comprehensive Cancer Center of Wake Forest Baptist Medical Center (P30 CA012197). The work is also partially supported by the Indiana University Precision Health Initiative (funded to JS). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.