Mutant Proteomics of Lung Adenocarcinomas Harboring Different EGFR Mutations

Epidermal growth factor receptor EGFR major driver mutations may affect downstream molecular networks and pathways, which would influence treatment outcomes of non-small cell lung cancer (NSCLC). This study aimed to unveil profiles of mutant proteins expressed in lung adenocarcinomas of 36 patients harboring representative driver EGFR mutations (Ex19del, nine; L858R, nine; no Ex19del/L858R, 18). Surprisingly, the orthogonal partial least squares discriminant analysis performed for identified mutant proteins demonstrated the profound differences in distance among the different EGFR mutation groups, suggesting that cancer cells harboring L858R or Ex19del emerge from cellular origins different from L858R/Ex19del-negative cells. Weighted gene coexpression network analysis, together with over-representative analysis, identified 18 coexpressed modules and their eigen proteins. Pathways enriched differentially for both the L858R and Ex19del mutations included carboxylic acid metabolic process, cell cycle, developmental biology, cellular responses to stress, mitotic prophase, cell proliferation, growth, epithelial to mesenchymal transition (EMT), and immune system. The IPA causal network analysis identified the highly activated networks of PARPBP, HOXA1, and APH1 under the L858R mutation, whereas those of ASGR1, APEX1, BUB1, and MAPK10 were highly activated under the Ex19del mutation. Interestingly, the downregulated causal network of osimertinib intervention showed the highest significance in overlap p-value among most causal networks predicted under the L858R mutation. We also identified the causal network of MAPK interacting serine/threonine kinase 1/2 (MNK1/2) highly activated differentially under the L858R mutation. Tumor-suppressor AMOT, a component of the Hippo pathways, was highly inhibited commonly under both L858R and Ex19del mutations. Our results could identify disease-related protein molecular networks from the landscape of single amino acid variants. Our findings may help identify potential therapeutic targets and develop therapeutic strategies to improve patient outcomes.

HIGHLIGHTS -The first study to perform mutant proteomic analysis of clinical tissue specimens obtained from patients of lung adenocarcinoma with EGFR oncogenic driver mutations.
-Surprisingly, the OPLS discriminant analysis revealed profound differences among the profiles of mutant proteins identified under the different EGFR mutation statuses, which were never seen before. -Weighted gene coexpression network analysis (WGCNA) screened by the over-representative test identified 18 significant network modules under the respective EGFR mutation statuses. -Interestingly, the downregulated causal network of osimertinib intervention and highly activated MNK1/2 were associated with L858R-positive lung adenocarcinoma. Upstream regulators and causal networks predicted suggested a close link to EGFR mutation-positive cancers, mainly NSCLC.

INTRODUCTION
The discovery of somatic mutations in the tyrosine kinase domain of the epidermal growth factor receptor (EGFR) (1,2) drastically changed the therapeutic perspective of non-smallcell lung cancer (NSCLC). The representative EGFR oncogenic mutations are in-frame deletions in exon 19 (Ex19del) (44.8%) and a point mutation at Leu-858 substituted with arginine (L858R) (39.8%) (3). Personalized and/or precision medicine (PM) have been successful by targeting those mutations with tyrosine kinase inhibitors (TKIs) gefitinib, erlotinib, and afatinib. Because most patients, however, suffer from drug resistance after a year of treatment, therapeutic strategies have been challenged to improve the survival benefit of first-line treatment. The efficacy of the first-and second-generation EGFR-TKIs is limited by the result of drug resistance conferred by another mutation involving the substitution of threonine 790 with methionine (T790M) (4).
Osimertinib is an irreversible third-generation EGFR-TKI that is selective for sensitizing EGFR and T790M mutations. The randomized phase III AURA3 trial demonstrated that the efficacy of osimertinib was significantly greater than that of platinum therapy plus pemetrexed in patients with T790M-positive advanced NSCLC (5). Recently, osimertinib was recommended as first-line treatment for patients with EGFR-mutant NSCLC according to the FLAURA trial that reported significantly better PFS and OS with osimertinib than with first-generation EGFR-TKIs (gefitinib or erlotinib) (6,7).
Today, the field of proteomics is strongly dominated by mass spectrometry (MS)-based methodologies, largely due to that modern mass spectrometers offer high mass resolution and accuracy required for correct protein identification. The most successful approach is "shotgun" proteomics that employs proteases (often trypsin) to enzymatically cleave proteins resulting in peptides, which are more convenient to separate and sequence (21). The most reliable protein identification strategy in shotgun proteomics is based on tandem (MS/MS) mass spectra generation of tryptic peptides by fragmentation and their consecutive search against databases of canonical/consensus sequences. MS-based proteomics has been extensively applied to investigate EGFR regulations, including phosphorylation, ubiquitination, and protein-protein interactions as well as post-translational modifications (22,23). Zhang et al. performed quantitative phosphoproteomics to unveil global phosphorylation changes upon the erlotinib treatment of EGFR mutation-positive lung adenocarcinoma cells (24,25).
Unfortunately, mutant proteins, those products of nonsynonymous single nucleotide polymorphisms (nsSNP), are overlooked in general MS-based proteomic data because these proteoforms are excluded in canonical protein databases (26). Although, the high number of nsSNPs, estimated to be >3 million, suggests that single amino acid variants (SAAVs) are widely distributed in the human proteome (27), only a couple of mutant proteins have been detected at expression level in human samples (28). Cancerous diseases are often characterized by high mutation rates (29) that are tightly associated with the physiological and pathological traits of individuals (30), whereas the allele-specific gene expressions in the heterozygous state are also associated with various traits of individuals (31,32).
Because many of these mutant proteins are exclusively expressed in cancer cells (33), they can be identified as lead candidates of optimal disease biomarkers. The qualitative and quantitative analyses of these proteoforms, thus, can provide novel diagnostic and prognostic values.
A laser microdissection (LMD) technique enables the collection of target cells of a certain type from sections of formalin-fixed paraffin-embedded (FFPE) cancer tissue (34,35). Label-free spectral counting and identification-based semiquantitative shotgun proteomic analysis of microdissected target cancerous cells of a certain type were used that characterized lung adenocarcinoma (35).
A pivotal challenge is to understand how the major driver mutations-EGFR L858R and Ex19del-affect disease-related downstream networks together with other upstream driver mutation crosstalk, which plays a central role in the context of lung cancer progression, malignancy, and outcome and/or resistance regarding TKI therapies (28). We performed mutant proteomic analysis and applied the weighted gene correlation network analysis (WGCNA), which is an unsupervised geneclustering method based on the correlation network of gene expression (36)(37)(38) as well as spectral counting-based comparative analysis. The main aim of this study was to identify the key modules and networks of mutant proteins associated with the EGFR mutations L858R and Ex19del. To our knowledge, this is the first proteomics study performed to identify mutant proteoforms expressed in clinical tissue specimens.

FFPE Tissue Specimens and Sample Preparation
Among 974 patients who underwent surgical lung cancer resection at St. Marianna University Hospital between 2000 and 2014, only 674 (69.3%) had tumors that were histologically confirmed adenocarcinomas. Pathological specimens were reviewed by pathologists to confirm that they satisfied the 2015 WHO classification of lung tumors (histological criteria) (39). For tissue microdissection, 10-µm-thick sections from the FFPE tumor blocks were cut onto DIRECTOR slides (OncoPlex Diagnostics Inc., Rockville, MD, USA). The sections were deparaffinized and stained only with hematoxylin using standard histological methods prior to dissection. Microdissection was performed using a Leica LMD7 Microdissection Microscope (Leica, Wetzlar, Germany). A total area of 4 mm 2 with about 15,000 tumor cells was transferred from the FFPE sections via laser dissection directly into the cap of a 200-µL low-binding tube. Proteins were extracted and digested with trypsin using Liquid Tissue MS Protein Prep kits (OncoPlex Diagnostics, Inc.). The procedures have been described in detail elsewhere (34,35,38).

Liquid Chromatography-Tandem Mass Spectrometry
Digested protein samples were used for liquid chromatographytandem mass spectrometry (LC-MS/MS) analysis on a Q-Exactive Orbitrap mass spectrometer (Thermo-Fisher Scientific, Bremen, Germany) equipped with an LC system operated at 500 nL/min via a nano-ESI device (AMR Inc., Tokyo, Japan). The gradient was 110 min long and a 5-µL sample was injected in each analysis.
All LC-MS/MS data were acquired using Xcalibur, version 2.8 SP1 (Thermo Fisher Scientific) in high-resolution data-driven analysis (DDA) mode with the survey scan (MS in the mass range m/z 400-1,600) acquired in the Orbitrap at 70,000 resolution (at m/z 200) in profile mode. The survey scan was followed by the top 10 higher energy collision-induced dissociation (HCD) MS/MS spectra, acquired in centroid mode in the Orbitrap at 17,500 resolution.
For MS/MS acquisition of top 10 precursors, the following settings were used: minimal signal threshold = 1,700; isolation width = 1.6 m/z; normalized collision energy = 27%. Monoisotopic precursor selection, charge-state screening, and charge-state rejection were enabled with rejection of singly charged and unassigned charge states. Dynamic exclusion was enabled to remove selected precursor ions (±10 ppm) for 15 s after MS/MS acquisition. The expression levels of identified mutant proteins were assessed by spectral count-based protein quantification. Fold changes of expressed proteins in the base 2 logarithmic scale (R SC ) (40) were calculated using the spectral count (SpC) that is the number of MS/MS spectra assigned to each mutant protein.

Identification of Mutant Proteoforms
A strategy to identify mutant proteoforms in lung cancer samples was designed using high-quality shotgun proteomics tandem mass spectra. The central component of the approach was a unique set of protein sequences, which included SAAV sequences translated from known genomic studies. Using a custom-made software tool (FastaWriter v1.4.0), a new database of mutant protein sequences (ProteoFinder v17.04.12) was generated to include a mutation in each new entry that, thus, differed in amino acid from the consensus protein. Titin (Q8WZ42) was removed from the database to decrease the size of the database as 21,045 mutations were registered only on this protein.
The resulting in silico derived proteoforms (total number of searchable mutations of 1,899,031) were denoted following the neXtProt nomenclature, including the access codes but adding information also about the position of the mutation (such as NX_P07288-S132L). These SAAV sequences were then shortened to reduce redundancy, keeping only the part of the protein sequence where the amino acid exchange took place surrounded by two additional tryptic peptides at both N-and C-termini. The new database entries were rendered as a combination of consensus (neXtProt database 2017-04-12 release) and mutant proteoform sequences in standardized fasta format. Figure 1 illustrates a general workflow of identification of SAAVs by tandem mass spectra searching a specialized protein database, ProteFinder (PFdb), and MS-based sequencing of a mutant peptide is exemplified.
The MS raw files of 108 runs (36 patient samples as triplicate) were imported into PEAKS Studio v8.5 (Bioinformatics Solutions Inc., Waterloo, Canada) (41) for database searching against the PF v17.04.12 database, appended with contaminant sequences (cRAP). PEAKS database searches were performed with a precursor ion error tolerance of 10 ppm, fragment ion error tolerance of 0.05 Da, fixed carbamidomethyl cysteine, and modifications of oxidation (M), deamidation (NQ), and acetylation (N-term) were set dynamically. Trypsin was specified as the enzyme, allowing for two missed cleavages. The technical triplicates were searched together, generating a single combined result file of each biological sample.
The search results were further filtered for hits with mutant specific tryptic peptides removing all multiple protein identifications while multi-isoform hits with the same amino acid change were included in the final list. Non-tryptic peptides with the mutation were not considered as reliable identification and were excluded in the additional filtering steps. Isobaric amino acid mutations, i.e., exchange of Ile to Leu and Leu to Ile, were registered for future experimental verification and kept as potentially valid identifications. The summary of each search, including score distributions and statistical data, which are available as PDF files (e.g., AZ0x_summary.pdf) in Search summaries in Supplementary Information File 1.
The technical triplicates together have resulted in rich data with an MS/MS spectra range of 44,798-143,118, providing

WGCNA
The similarity among protein expression patterns for all protein pairs was calculated according to their pairwise Pearson's correlation coefficient, i.e., the similarity between proteins i and j was defined as (1r i,j )/2, where r i,j is the correlation of the protein expression patterns between the two proteins i and j. An adjacency matrix was then computed by increasing the similarity matrix up to the power of 10 to generate a coexpression network with scale-free properties. Subsequently, from the resultant scale-free coexpression network, we generated a topological overlap matrix (TOM) that considers topological similarities between a pair of proteins in the network. Using the dissimilarity according to TOM (1 -TOM), we conducted hierarchical clustering to generate a tree that clustered proteins in its branches. Dynamic tree cutting was used to trim the branches to identify protein modules. A protein module was summarized by the top hub protein (referred to as eigen-protein) with the highest connectivity in the module. To identify the protein modules associated with clinical traits, we calculated the correlation coefficients between the eigen-proteins and clinical traits. WGCNA was conducted using a Garuda Platform gadget (The Systems Biology Institute, Tokyo, Japan) that implemented the WGCNA pipeline based on the WGCNA R-package (36).

Protein-Protein Interaction Network Construction and Functional Enrichment
To construct a protein interaction network for a protein module, we used the STRING database (version 11.0) (42), which accumulates information on protein-protein interactions from various other databases, such as IntAct, Reactome, DIP, BioGRID, MINT, KEGG, NCI/Nature PID, the Interactive Fly, and BioCyc. STRING networks were constructed under the criteria for linkage only with experiments, databases, text mining, and coexpression using the default settings, i.e., a medium confidence score of 0.400, a network depth of 0 or 50 interactions. Subsequently, protein networks imported from the STRING database were visualized using Cytoscape version 3.7.2. Functional enrichment results were obtained for canonical pathways considering p < 0.05 to be statistically significant.

Comparative Analysis of the Causal Networks and Pathways Predicted by IPA
Canonical pathways, upstream regulators, and causal networks were predicted using the ingenuity pathway analysis (IPA) software (43). Mutant protein expression data (quantilenormalized for selected modules) were used as input data sets. Comparative analysis of the predicted causal networks (p-value < 0.05) was performed to elucidate networks associated with the three clinical traits: Ex19del, L858R, and no Ex19del/L858R mutations, where activation and inhibition of a predicted network were defined by z-values >2.0 and < −2.0, respectively, and upregulation and downregulation were defined by z-values >1.0 and < −1.0, respectively.

Mutant Proteome Data Sets of Lung Adenocarcinoma
MS-based proteomic analysis was conducted for 36 FFPE tissue specimens of lung adenocarcinoma (35 involved the acinar subtype and one involved the papillary subtype). These specimens were selected for their preserved condition, tumor area, and well-clarified pathological diagnosis and EGFR mutation status (nine specimens of the clinical trait M1: L858R mutation, nine specimens of the clinical trait M2: Ex19del mutation, and 18 specimens of the clinical trait NM: no Ex19del or L858R mutation; see Table 1). Pre-surgical treatment was not performed in any of the cases.
A total of 1,100 mutant proteins were identified, in which M1, M2, and NM were 678, 612, and 837, respectively, and 405 (34.1%) were expressed commonly (Figure 2A). The proportion of mutant proteins unique to the L858R mutation was 121 (11.0%), and that to the Ex19del mutation was 84 (7.8%), whereas the proportion of proteins expressed in only no EGFR mutation cases was 273 (24.8%). GO analysis using PANTHER Ver. 14.1 (44) exhibited mostly similar profiles in gene hits for all the traits (M1, L858R mutation; M2, Ex19Del mutation; NM, no Ex19del or L858R mutation; see Figure S1). Mutation proteins with high hits in GO biological process included cellular process (GO:0009987), localization (GO:0051179), cellular component organization or biogenesis (GO:0071840), biological regulation (GO:0065007), metabolic process (GO:0008152), and response to stimulus (GO:0050896). Those in the GO protein class included cytoskeletal protein (PC00085), nucleic acid-binding protein (PC00171), and metabolite interconversion enzyme (PC00262). An orthogonal partial least square-discriminant analysis (OPLS-DA) (45) was applied to identified mutant proteins and interestingly exhibited profound differences in distance among the EGFR mutation statuses (Figure 2B), whereas a conventional hierarchical clustering of patients according to mutant protein abundance failed to show a clear separation among the three clinical traits. Surprisingly, clear differentiation was found between the NM group and the M1 and M2 groups. The data points of the M1 group appeared to be to some extent scattered, whereas those of the M2 group clustered closely. These findings seem to unveil the mutant proteome landscape correlating with the EGFR mutation type in lung adenocarcinoma.

Identification of Key Mutant Protein Modules by WGCNA
A weighted gene coexpression network was constructed in which all the identified mutant proteins were clustered, and we found 23 mutant protein modules (Figures 3A,B). A spectral counting-based heat map (46) for eigen-proteins in the modules is shown in Figure 3C. In the WGCNA, a soft threshold power of 15 was selected to define the adjacency matrix according to the criteria of approximate scale-free topology with a minimum module size of 30 and a module detection sensitivity deepSplit of 4. The clinical traits for patients were set according to the EGFR mutation status with M1, M2, and NM traits corresponding to L858R mutation, Ex19del mutation, and neither Ex19del/L858R mutation, respectively. The correlations between resultant modules and clinical traits were determined to identify mutant protein modules whose expressions were upor downregulated in L858del, Ex19del, or no Ex19del/L858R mutation samples (Figure S2).
Among the 23 modules, only the WM6 module was moderately correlated with the EGFR Ex19del mutation status (r = 0.41, p < 0.05). Most of the other WGCNA modules were not statistically significant. However, several modules seem to be characteristic to the clinical traits ( Figure S2). The WM10, WM12, WM14, and WM22 modules seem to be characteristic to the L858R mutation status (r = 0.3, p = 0.08). The WM17 module showed a positive correlation with the Ex19del mutation status (r = 0.3, p = 0.08). We could find no modules characteristic to the NM trait (no L858R or Ex19del mutations).

WGCNA Modules Screened by ORA and Functional Enrichment Analysis
The computational WGCNA framework (36) has been proven to be powerful in identifying coexpression protein modules (37,38). However, it should be noted that traditional trait analysis of the correlations between eigen components of WGCNA modules and clinical traits might overlook important modules for investigating molecular mechanisms differentially behind a disease. Especially for clinical traits quite close to each other, difficulties would be sometimes encountered to attain identification of key WGCNA modules with a high significance. Multiple correction testing, such as Bonferroni, Benjamini-Hochberg, etc., would result in that none of the modules associated with M1 or M2 remains significant. Statistical over-representative analysis (ORA) would help to evaluate potential key WGCNA modules with identified proteins uniquely expressed and upregulated to each trait.
We conducted an ORA-based screening of WGCNA mutant protein modules to further identify key protein modules to The red and pink cells in the "L858R mutation" and "Ex19del mutation" columns indicate that the proteins in the group are uniquely expressed and significantly upregulated, respectively, in samples with the mutations [Upregulated with |Rsc| > 1 (M1 > M2 or M1 < M2)]. The fourth column shows the number of proteins in each protein group. The fifth column provides notes for each protein group. The WGCNA modules with significant overlap with each protein group are listed in the sixth column ("Modules" column). (B) Overlap in proteins between the groups by the protein expression profiles and the modules by WGCNA. Each row in the embedded table represents overlap analysis results for each module. The first and second columns in the table represent module ID and color name of the module. The third through eighth columns indicate the q-values for overlap in proteins between a module by WGCNA and the five protein groups. In the six columns, significant q-values are highlighted in red. The eighth column represents the value of the most significant q-value (max q-value) in each module. The 18 modules with max q-values <0.05 are listed in order.
investigate the differential disease mechanisms associated with the EGFR L858R and Ex19del mutation statuses; 121, 84, and 273 mutant proteins identified were expressed uniquely to the respective traits: M1, M2, and NM ( Figure 3A); 132 and 142 mutant proteins were upregulated differentially to M1 and M2 with |R SC | >1 (higher than 2-fold change) in the comparison between M1 and M2 ( Figure S3). The overlaps between the WGCNA-derived protein modules and identification-based significantly expressed proteins were then assessed using the over-representation test. We confirmed that five WGCNA modules overlapped significantly (maximum q-value among the groups <0.05) with protein groups unique to each trait and/or highly upregulated to M1 or M2 (Figures 4A,B).
To characterize those five modules, we analyzed the biological connectivity among the proteins in each module by mapping the module proteins in the human protein-protein interaction (PPI) network and among the biological pathways by pathway enrichment analysis (Figures 5, 6).
Three WGCNA modules-WM10, WM14, and WM22significantly overlapped with protein groups uniquely identified and highly upregulated under the L858R mutation (Figure 4). The enriched pathways of the WM10 mutant protein module involved DNA duplex unwinding, canonical glycolysis, glucose catabolic process to pyruvate, DNA unwinding involved in DNA replication, COPI-dependent Golgi-to-ER retrograde traffic, the role of GTSE1 in G2/M progression after G2 checkpoint, and formation of tubulin folding intermediates by CCT/TriC ( Figure 5B). The hub protein alpha-enolase (also known as MBP-1) encoded by ENO1 is involved in the subnetworks related to the carboxylic acid metabolic process (as indicated by the pink dotted line 1 in Figure 5A) and is associated with glycolysis, growth control, and hypoxia tolerance. MBP-1 binds to the myc promoter and acts as a transcriptional repressor and so maybe a tumor suppressor. The cell cycle-related subnetwork is denoted by the pink dotted line 2 in Figure 5A. DHX9 encodes ATP-dependent RNA helicase A [also known as nuclear DNA helicase II (NDH II) or leukophysin (LKP)] participates in multiple processes of gene regulation, including transcription, translation, and DNA replication, and plays important roles at the maintenance of genomic stability. DHX9 has been reported to be overexpressed in various types of malignant tumors and might be a potential therapeutic target for the treatment of NSCLC (47).
The enriched pathways of the WM14 module include epithelial cell differentiation, tissue development, cell death, programmed cell death, developmental biology, and collagen degradation ( Figure 5). The hub protein KRT14 is associated with developmental biology, which subnetwork is indicated by the pink dotted line 3 in Figure 5A. The enriched pathways of the WM22 module involve the immune effector process, immune response, cytokine signaling in immune system, and cellular responses to stress (Figure 5B). The subnetworks related mostly to the immune system (the pink dotted line 4 in Figure 5A), in which sterile alpha motif and HD domain-containing protein 1 (SAMHD1), a deoxyribonucleoside triphosphate triphosphohydrolase is known to play roles in defense response to the virus and cellular response to DNA damage stimulus, and is dysregulated in breast and other cancers (48). Frequently mutated SAMHD1 found in colon cancers was suggested to be involved in tumorigenesis with defective mismatch repair (MMR) (49) and also act as a resistance factor for anticancer drugs (50). Two WGCNA modules-WM6 and WM17-significantly overlapped with protein groups uniquely identified and highly upregulated under the Ex19del mutation (Figure 4). The enriched pathways of the WM6 module involved DNA replication-dependent nucleosome assembly, chromatin silencing, double-strand break repair via non-homologous end joining, cellular responses to stress, DNA damage/telomere stress-induced senescence, and M phase (Figure 6B). The hub protein is the mutant H3.1t encoded by the mutant HIST3H3. Histone H3.1t protein (also known as H3t) itself is a core component of the nucleosome and plays a central role in transcription regulation, DNA repair, DNA replication, and chromosomal stability. The subnetworks related to both cellular responses to stress and mitotic prophase are indicated by the pink dotted line 5 in Figure 6A. Calcineurin-like phosphoesterase domain containing 1 (CPPED1, also known as CSTP1) blocks cell cycle progression and promoting cell apoptosis by dephosphorylating AKT family kinase (51). CDK5RAP3 itself encodes CDK5 regulatory subunit associated protein C53 (Cdk5rap3, also known as C53 and LZAP) that is a probable tumor suppressor involved in signaling pathways governing transcriptional regulation and cell cycle progression. Its specific mutant protein was reported to prevent apoptosisinduced cleavage of nuclear substrates, including nuclear shrinkage, chromatin condensation, and DNA fragmentation (52). The homeobox protein Nkx-2.1 [also known as thyroid transcription factor-1 (TTF-1)] has a role in lung development and surfactant homeostasis and is highly expressed in both small-cell lung carcinoma (SCLC) and lung adenocarcinoma (53,54). Based on a quantitative real-time RT-PCR study of the NSCLC cell lines, Zu et al. (55) concluded that TTF-1 may serve as a tumor suppressor because of its inverse correlation with Ki-67 proliferative activity and increase of cellular apoptosis.   Target molecules in the dataset are molecules in our dataset whose expression is potentially controlled by an upstream regulator.
The enriched pathways of the WM17 module involved neutrophil degranulation, immune response, and immune system ( Figure 6B). The hub protein is the RAS-related protein Rab-5C (also known as L1880 or RAB5L). Rab-5C itself is one of the three isoforms of Rab-5, which is a master regulator of the endocytic pathway. The subnetworks related mostly to the immune system process are indicated in the pink dotted line 6 in Figure 6A. Protein FAM83D (also known as spindle protein CHICA), a probable protooncogene, plays a role in cell proliferation, growth, migration, and epithelial to mesenchymal transition (EMT) (56). Elevated FAM83D expressions were reported in several cancers including metastatic lung adenocarcinomas (57). Recently, Shi et al. suggested its oncogenic activity by regulating cell cycle in lung adenocarcinoma (58).

Comparative Analysis of Causal Networks Predicted by IPA
The ORA-based screening of the WGCNA modules was performed to capture clinically important modules and their upstream regulators, which reflect the disease mechanisms affected differentially under the different driver EGFR mutations in lung adenocarcinoma. Both upstream regulators and causal networks using IPA (http://www.ingenuity.com) software (43) were performed especially for the two selected modules, WM 10 and WM6, which were significantly associated with the Ex19del and L858R mutation (Figures 5, 6). Causal networks predicted for these mutant protein modules included chemical drugs, transcriptional regulators, transmembrane receptors, growth factors, kinases, transporters, etc. Table 2 summarizes top causal networks significant to each module (|z-value| > 1.5) representative under the EGFR L858R or Ex19del mutation status in the order of higher overlap significance, p-value. Figure 7 presents the representative modules of master and participating regulators with the target mutant proteins differentially significant to the EGFR L858 or Ex19del mutation status.
Regarding the WM10 module associated with the M1 trait, the EGFR L858R mutation, PARPBP, HOXA1, and APH-1 were highly activated or upregulated under the EGFR L858R mutation, whereas AMOT was highly inhibited under both L858R and Ex19del mutations. PARPBP encodes poly (ADP-ribose) polymerase-1 (PARP-1) binding protein, which plays a central role in DNA repair and the maintenance of genomic stability, regulating DNA repair, and negatively double-strand break repair via homologous recombination. Xu et al. reported that PARPBP expression was enhanced in lung adenocarcinoma tissues and correlated with poor prognosis in lung adenocarcinoma patients (59) and also that its high expression was closely correlated with pathologic stages, suggesting its utility as an independent predictor in lung adenocarcinoma patients. HOXA1 encodes homeobox protein Hox-1F, a member of the Homeobox (HOX) transcription factor family. HOXA1 mRNA and protein expression levels were significantly upregulated in breast cancer, and its overexpression was associated with poor prognosis and tumor progression in breast cancer patients (60). Anterior FIGURE 7 | The representative modules of master and participating regulators with the target mutant proteins differentially significant under the EGFR L858 or Ex19del mutation status, which were obtained by using IPA software. The modules of (A) HOXA1, (C) CXCL14, and (D) EP300 were predicted to be highly activated and (B) AMOT highly inhibited in association with the L858R mutation status. The modules of (E) ASGR1 and (F) TYK2 were highly and/or differentially activated on the Ex19del mutation status. Node shapes indicate molecular types: triangle, kinase; square (dashed), growth factor; rectangle (horizontal), ligand-dependent nuclear receptor; rectangle (vertical), ion channel; diamond (vertical), enzyme; diamond (horizontal), peptidase; trapezoid, transporter; oval (horizontal), transcription regulator; oval (vertical), transmembrane receptor; double circle, complex; circle, other. Red or light red colors indicate highly or moderately increased expression of a mutant protein in the data set. Orange or light orange colors indicate the extent of confidence for predicted activation and the blue and light blue for predicted inhibition. Lines denote predicted relationships. A solid or dashed line indicates direct or indirect interaction, respectively. Orange indicates leading to activation; blue, leading to inhibition; yellow, findings inconsistent with the state of a downstream molecule; gray, an effect not predicted. pharynx defective 1 (APH1) is the group APH1A and APH1B, which are the members of the gamma-secretase complex, comprising presenilin (PSEN1 and PSEN2), anterior pharynx defective 1 (APH1), presenilin enhancer 2 (PEN2), and nicastrin. Gamma secretase substrates are known to include the four wellcharacterized mammalian Notch receptors (Notch1-4) and the five canonical transmembrane Notch ligands. Aberrant Notch activation drives development, tumorigenesis, and progression of lung cancer and is known to participate in resistance to anti-VEGF therapy (61). The inhibition of Notch activation by gamma-secretase inhibitors (GSIs) then could benefit NSCLC patients (62). Angiomotin (AMOT) and its related proteins, scaffold proteins, AMOT family proteins, were identified to have a strong interaction with the transcription factors Yes-associated protein (YAP) and TAZ (transcriptional coactivator with PDZbinding motif) by tandem affinity purification (TAP) and mass spectrometry (63). Scaffold proteins angiomotin negatively regulated the transcription factors YAP and TAZ by preventing their nuclear translocation, suggesting a tumor-suppressing role of AMOT family proteins as components of the Hippo pathway. However, Hong reported the controversial results that AMOT may promote nuclear translocation of YAP and act as a transcriptional cofactor of the YAP-TEAD complex to facilitate the proliferation of epithelial cells and cancer development (64). It has been pointed out that the functional roles of AMOTs in different cancer types are controversial, highly depending on cell context (65).
Interestingly, among all causal networks predicted from the WM10 module, the downregulation of osimertinib intervention showed the highest significance in overlap p-value, in which EGFR, ERBB2, ERBB3, ERBB4, BLK, and TNK and their downstream pathways were maintained. In this study, we used FFPE tissue specimens collected from lung adenocarcinoma patients who did not receive any EGFR tyrosine kinase inhibitors, such as osimertinib. It has been reported that L858R-positive patients of NSCLC had a poor prognosis and difference in therapeutic outcome compared to Ex19del-positive patients (6). Moreover, the comparative IPA analysis predicted the MNK1/2 causal network highly and differentially activated under the L858R mutation status (Table S1), which have been targeted by several chemical drug inhibitors for EGFR mutation-positive lung cancers. Those inhibitors, including dacomitinib, tomivosertib, BAY1143269, and ETC-1907206, have been developed for various types of EGFR mutation-positive cancers mainly including NSCLCs and various types of clinical trials are currently undergoing. Other top causal networks activated differentially under the EGFR L858R mutation included max-myc (complex), MYC, F8, STK11, and RAD21.
For the WM6 module associated with the M2 trait, the EGFR Ex19del mutation, ASGR1 and APEX1 were highly activated, and BUB1, MAPK10, and TGFB1 were upregulated. CEBPB was activated commonly under all the traits, whereas Cbp/p300 was activated under both Ex19del and L858R mutations. ASGR1 encodes a subunit of the asialoglycoprotein receptor (ASGR) expressed in the extracellular region and a complex of the receptor and binding ligand is internalized. ASGR has been suggested to promote cancer metastasis by activating the EGFR-ERK pathway through interactions with counter-receptors on cancer cells, responding to endogenous lectins in the tumor microenvironment (66). APEX1 (also known as APE1, APX, HAP1, and REF1) and encodes DNA-apurinic/apyrimidinic (AP) site endonuclease (protein names, such as APEN, APE-1, and REF-1), which plays a central role in the cellular response to oxidative stress, in which its two major activities are DNA repair and redox regulation of transcriptional factors. The elevated levels of APEX1 have been reported in several cancers, including lung cancer (67), and also to be associated with resistance to chemotherapy and radiotherapy in some cancers (68). MAPK10 encodes mitogen-activated protein kinase 10 (also known as stress-activated protein kinase JNK3), which is involved in a wide variety of cellular processes, including stress response, proliferation, differentiation, transcription regulation, and development. MAPK10 functions as a tumor suppressor and the deletion of this proapoptotic gene would favor the survival and proliferation of cancer cells (69). BUB1 encodes mitotic checkpoint serine/threonine-protein kinase BUB1 or budding uninhibited by benzimidazoles 1 (Bub1), which is required for chromosome alignment and resolution of spindle attachment errors but does not play a major role in the spindle assembly checkpoint (SAC) activity. Overexpression of Bub1 in breast cancer is associated with a poor clinical prognosis (70). Recent tumor xenograft studies suggested that the Bub1 kinase inhibitor BAY 1816032 in combination with taxanes or PARP inhibitors enhanced their efficacy and suppressed the development of therapy resistance (71). CEBPB encodes CCAAT/enhancerbinding protein beta (C/EBP beta), which is important in the regulation of genes involved in immune and inflammatory responses. C/EBP beta induces elevated IL-6 expression levels frequently observed in human lung adenocarcinomas (72) and interacts with peroxisome proliferator-activated receptor-gamma (PPARG) involved in pathways of transcriptional misregulation in cancer (73). The study using the inducible EGFR T790M-L858R transgenic mouse models suggested that C/EBP beta is dispensable for lung tumorigenesis in EGFR-driven murine lung cancer (73).

DISCUSSION
Outcomes of lung adenocarcinoma patients receiving EGFR TKIs were reported to be affected depending on the types of EGFR gatekeeper mutation (6,74), which are serious clinical challenges. Targeting disease-associated dual core networks rather than targeting a single protein (gene) as in conventional approaches is expected to greatly improve the outcomes of individual patients, such as efficacy and safety, in line with the concept of precision medicine. Such a concept, so-called network pharmacology, was first proposed by Hopkins (75), which aims to induce synthetic lethality by targeting dual hub molecules involved in different disease core networks. We have first conducted a mutant proteomic analysis for clinical tissue specimens of 36 lung adenocarcinoma patients who harbored distinct EGFR mutations, Ex21 L858R, Ex19del, and no L858R/Ex19del. Disease-related network modules are elucidated from mutant protein expression data sets, which would be potentially associated with the activation of downstream and/or upstream networks affected under distinct EGFR mutations. In particular, this study focuses on influence in disease-related networks of lung adenocarcinoma, which would take place under the L858R mutation. Our analytical workflow combining WGCNA with ORA-screening identified several mutant protein modules significantly overlapping with upregulated mutant proteins under the EGFR L858R mutation.
Our goal with the present study was to apply an unbiased bioinformatic method to characterize the mutant profiles of detectable SAAVs after filtering with stringent criteria of database identifications in pathologically well-described patient samples. Mass spectrometry-based proteomic data is widely recognized as an information-rich source of uniquely expressed proteoforms, but tandem mass spectra interpretation is dependent on fragmentation efficiency and identification strategies. Because the number of subjects was limited in each patient group, we presented quality control data in the Supplementary Material demonstrating the overall homogeneity of the mass spectrometric data due to low technical variability of sample preparation and data acquisition. Careful interpretation of the findings highlighted potential differences between phenotypes, which suggests that different oncogenic driver EGFR mutations would affect activation or inactivation of their downstream disease-related molecular networks, which are often associated with protein mutations.
Surprisingly, the OPLS-DA performed for identified mutant proteins demonstrated profound differences in distance among the different EGFR mutation groups, L858R, Ex19del, and no L858R/Ex19del, suggesting that cancer cells harboring L858R or Ex19del emerge from cellular origins differently from L858R/Ex19del-negative cells (Figure 2B). Aberrant cells would, thus, emerge as a subpopulation of tumor cells of genetic intratumor heterogeneity, which would rapidly grow and predominantly survive by disrupting the tumor environment. To confirm our observation, a further large-scale investigation with genomic alteration analysis by next-generation sequencing (NGS) is required.
The pathways of the carboxylic acid metabolic process, cell cycle, developmental biology, and immune system were centrally associated under the L858R mutation. The top IPA causal networks predicted for the representative mutant protein module-WM10 were associated with the regulation of DNA repair, cancer development, tumorigenesis, and maintenance of genomic stability as well as therapeutic resistance. Interestingly, the downregulation of osimertinib intervention showed the highest significance rank in overlap among all causal networks predicted from the WM10 module ( Table 2). This finding might suggest the potential usefulness of osimertinib to be revisited for the L858R-positive patients of lung adenocarcinoma. Both the causal networks of osimertinib intervention and MNK1/2 identified significantly and differentially, respectively, may evidence disease mechanisms associated with EGFR mutationpositive lung adenocarcinoma (Figure 8).
The pathways of cellular responses to stress, mitotic prophase, cell proliferation, growth, migration, epithelial to mesenchymal transition (EMT), and immune system process were mostly involved under the Ex19del mutation. The IPA causal networks elucidated for the representative mutant protein module, WM6, seem to be associated dominantly with the EGFR-ERK pathway. The pathways related to the Hippo pathway and tumorigenesis were commonly involved under both L858R and Ex19del mutations.
The limitations of this study are as follows: first, the number of patients examined is limited to be 36, which was attributed to collect the homogeneous tumor-derived samples with the best effort. Second, genomic alteration analysis was not conducted for the same samples.
In conclusion, we successfully applied WGCNA combined with ORA-based protein screening to clinical mutant proteomic data sets from 36 patients of lung adenocarcinoma. The proteomic discovery method detecting mutant proteoforms has revealed specific profiles distinguishing the phenotypically characterized patient groups. Our results could confirm the usefulness of mutant proteomics to identify activated or inactivated disease-related mutant protein networks affected under distinct EGFR mutations. Verification and quantitative analysis of these molecular features in an independent patient cohort are yet to be undertaken by either using targeted proteomics or RNAseq and combining the resulting data in a systems biology approach. Additionally, our findings may help in the development of therapeutic strategies to improve patient outcomes. Differences in mutant proteomes between L858R and Ex19del mutation cells help to demonstrate the difference in efficacy of various EGFR-TKIs. Further verifications with a greater number of patient samples and targeted analysis of mutant proteoforms throughout the cohorts are planned in follow-up studies to achieve a better understanding of the expression profiles of SAAVs in phenotypic groups and establish a relationship between the detected networks in connection to disease progression.

DATA AVAILABILITY STATEMENT
The unfiltered mass spectrometry data sets generated and analyzed in this study have been deposited in the ProteomeXchange (http://proteomecentral.proteomexchange. org) and jPOST with the data set identifiers PXD015862 and JPST000687, respectively.

AUTHOR CONTRIBUTIONS
TN and ÁV conceptualized this study, designed the bioinformatics methodology, wrote the main manuscript text, prepared Figures 1-7 and Supplementary Information Files, and wrote the first draft of the manuscript. ÁV performed the identification of mutant protein from the clinical proteome raw data sets. TN analyzed the mutant proteome data sets by using weighted gene coexpression network analysis (WGCNA), over representative analysis, and the ingenuity pathway analysis software. TN, HN, HK, and HS initiated and managed the collaboration. All authors reviewed the manuscript.

FUNDING
The authors declare that this study received funding from AstraZeneca K.K. on this research project under the Externally Sponsored Research ESR-16-12243. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc. 2020.01494/full#supplementary-material Supplementary Information File 1 | Search summaries, including score distributions and statistical data are presented using the PEAKS reporting function.