Cancer cell immunity-related protein co-expression networks are associated with early-stage solid-predominant lung adenocarcinoma

Background Solid-predominant lung adenocarcinoma (SPA), which is one of the high-risk subtypes with poor prognosis and unsatisfactory response to chemotherapy and targeted therapy in lung adenocarcinoma, remains molecular profile unclarified. Weighted correlation network analysis (WGCNA) was used for data mining, especially for studying biological networks based on pairwise correlations between variables. This study aimed to identify disease-related protein co-expression networks associated with early-stage SPA. Methods We assessed cancerous cells laser-microdissected from formalin-fixed paraffin-embedded (FFPE) tissues of a SPA group (n = 5), referencing a low-risk subtype, a lepidic predominant subtype group (LPA) (n = 4), and another high-risk subtype, micropapillary predominant subtype (MPA) group (n = 3) and performed mass spectrometry-based proteomic analysis. Disease-related co-expression networks associated with the SPA subtype were identified by WGCNA and their upstream regulators and causal networks were predicted by Ingenuity Pathway Analysis. Results Among the forty WGCNA network modules identified, two network modules were found to be associated significantly with the SPA subtype. Canonical enriched pathways were highly associated with cellular growth, proliferation, and immune response. Upregulated HLA class I molecules HLA-G and HLA-B implicated high mutation burden and T cell activation in the SPA subtype. Upstream analysis implicated the involvement of highly activated oncogenic regulators, MYC, MLXIPL, MYCN, the redox master regulator NFE2L2, and the highly inhibited LARP1, leading to oncogenic IRES-dependent translation, and also regulators of the adaptive immune response, including highly activated IFNG, TCRD, CD3-TCR, CD8A, CD8B, CD3, CD80/CD86, and highly inhibited LILRB2. Interestingly, the immune checkpoint molecule HLA-G, which is the counterpart of LILRB2, was highly expressed characteristically in the SPA subtype and might be associated with antitumor immunity. Conclusion Our findings provide a disease molecular profile based on protein co-expression networks identified for the high-risk solid predominant adenocarcinoma, which will help develop future therapeutic strategies.


Introduction
Lung cancer is the leading cause of death globally, of which lung adenocarcinoma is the most common pathological subtype.Invasive nonmucinous adenocarcinoma is primarily categorized into five histopathological subtypes: lepidic, acinar, papillary, micropapillary, and solid, based on the 2021 World Health Organization Classification of Lung Tumors (1).Micropapillaryand solid-predominant lung adenocarcinoma (MPA and SPA, respectively) are high-risk subtypes with high metastatic potential and the worst prognosis.In contrast, low-risk subtypes characterized by well-and moderately-differentiated morphologies, such as lepidic predominant adenocarcinoma (LPA) have a favorable prognosis (2,3).Lung adenocarcinomas usually contain complex mixtures of different subtypes.MPA exhibits a micropapillary pattern, which is the primary histological pattern assessed semi-quantitatively in 5% increments in resected specimens.It is associated with lymphatic invasion, pleural invasion, and lymph node metastases (4).
Caso et al. conducted a genomic characterization of prognostically important predominant histologic subtypes of lung adenocarcinoma and found that MPA and SPA exhibited a higher tumor mutational burden, increased chromosomal instability (CIN), higher APOBEC (the enzyme with DNA mutagenesis function) mutational signatures, more oncogenic pathway alterations, and the lowest frequency of targetable genomic alterations among the subtypes (5).Micropapillary or solid patterns are risk factors for predicting poor recurrence-free survival in early-stage IA lung adenocarcinoma (6).Recently, Jeon et al. compared the clinicopathological features and clinical course of patients with the MPA and SPA subtypes, including predominant and non-predominant subtypes after curative resection of stage I lung adenocarcinoma and analyzed the prognostic factors.The clinical results were different for stage I high-grade adenocarcinoma and the predominant micropapillary subtype was an independent prognostic factor for recurrence, whereas the solid subtype was a significant factor for overall survival (7).
A pivotal challenge is to unravel the underlying cancer biology of those high-risk lung adenocarcinomas and how tumor morphologic appearances are linked to malignant clinical outcomes.Solid predominant adenocarcinoma (SPA) shows a major component of polygonal tumor cells forming sheets that lack recognizable patterns of adenocarcinoma, such as acinar, papillary, micropapillary, or lepidic growth.SPA subtype and the presence of solid pattern are associated with numerous poor prognostic factors, including higher mitotic count, high risk of occult lymph node metastases, thyroid transcription factor-1 (TTF-1) negativity, and less frequent epidermal growth factor receptor (EGFR) mutations (8).However, molecular profiles characterizing the SPA subtype remain unclear.In this study, we focus on identifying disease-related co-expression protein networks and their upstream regulators associated with the SPA subtype.
Mass spectrometry (MS)-based proteomics has proven feasible in the identification and quantification of proteins expressed in clinical specimens.Quantitative proteome data can be used to identify key disease-related proteins and therapeutic targets (9).We have adopted label-free spectral counting-based semiquantitative MS-based proteomics, following the collection of target cancerous cells from formalin-fixed paraffin-embedded (FFPE) tumor specimens by laser microdissection (LMD).This study aimed at identifying the co-expression protein networks associated with the early-stage high-risk lung adenocarcinoma SPA, by comparing the early-stage low-risk subtype LPA and another high-risk subtype MPA.Upstream regulator and causal network analysis (10) were performed for data-driven protein coexpression networks significant to SPA, obtained by the weighted gene co-expression network analysis (WGCNA), an unsupervised clustering method based on the correlation network expression (11), applied to quantitative proteome datasets.

Materials and methods
reviewed by two pathologists (H.N. and S. N.) to confirm that they fulfilled the 2015 World Health Organization classification criteria for lung tumors (histological criteria) (12).FFPE tumor tissue blocks from 12 surgical specimens histologically confirmed as lung LPA, MPA, and SPA were obtained without patient identifiers from the St. Marianna University School of Medicine Hospital.Informed consent was obtained from all participating subjects.The protocol was approved by the Institutional Review Board of St. Marianna University School of Medicine (approval no.1461) and the study adhered to the Helsinki Declaration.
For tissue microdissection, 10-mm-thick sections from the FFPE tumor blocks were cut and placed on DIRECTOR slides (OncoPlex Diagnostics Inc., Rockville, MD, USA).The sections were deparaffinized and stained with hematoxylin using standard protocols before dissection.Microdissection was performed using a Leica LMD7 microdissection microscope (Leica, Wetzlar, Germany) (Figure 1B).A total area of 4 mm 2 with approximately 15,000 tumor cells was directly transferred from the FFPE sections via laser dissection into the cap of a 200 mL low-binding tube.Proteins were extracted and digested with trypsin using the Liquid Tissue ™ MS Protein Prep kit (OncoPlex Diagnostics Inc.) according to the manufacturer's instructions (13).Briefly, dried microdissected pellets were suspended in 20 mL of Liquid Tissue buffer and heated at 95°C for 90 min and then cooled on ice for 3 min before the addition of 0.1 mg of trypsin.The tubes were then incubated at 37°C overnight.Dithiothreitol was added to a final concentration of 10 mM and the samples were heated for 5 min at 95°C.The digested samples were dried, resuspended in 50 mL of a 2% acetonitrile aqueous solution containing 0.1% trifluoroacetic acid, and stored at −20°C until analysis.

Proteomic analysis by LC-MS/MS
A label-free quantitation approach using spectral counting by LC-MS/MS was adopted for global proteomic analysis.The digested samples (5 mL for a single run) were analyzed in triplicate by LC-MS/MS using a reverse-phase LC system interfaced with a Q Exactive Orbitrap mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) via a nano-electrospray ionization device (AMR Inc., Tokyo, Japan).The mass spectrometer was operated in datadependent acquisition mode.Full-scan MS spectra were acquired in the range of m/z 350-1600 at a resolution of 70,000.The top ten most intense peaks from the survey scan were selected for fragmentation with higher-energy collisional dissociation with a normalized collision energy of 27% and isolation window of m/z 1.6.The dynamic exclusion time for precursor ions selected for MS/ MS fragmentation was 15 s, and the automatic gain control target values for MS and MS/MS were 1 × 10 6 and 1 × 10 5 , respectively.

Japan
).An auto-sampler (HTC-PAL, CTC Analytics, Zwingen, Switzerland) was used to load the samples into the trap, which was then washed with solvent A (2% acetonitrile aqueous solution containing 0.1% formic acid) to concentrate and desalt the peptides in the trap.Subsequently, the trap was connected in series to the separation column, and the peptides were eluted from the whole column with 0.1% formic acid aqueous solution and acetonitrile using a linear 5%-40% acetonitrile concentration gradient over 90 min at a flow-rate of 500 nL min −1 .

Protein identification
The raw data were processed using PatternLab for Proteomics software v4.0.Peptide sequence matching (14) was done using the Comet algorithm against the UniProt Homo sapiens database.A target-reverse strategy was employed for increased confidence in protein identification.This search considered tryptic peptide candidates as well as the formylation of lysine and oxidation of methionine as variable modifications.The Comet search engine considered a precursor mass tolerance of 40 ppm and a fragment bin tolerance of 0.02 Da.The validity of the peptide spectrum matches was assessed using the Search Engine Processor (SEPro) module of PatternLab.The acceptable FDR for spectra, peptide, and protein were 3%, 2%, and 1%, respectively.The expressions of the identified proteins were assessed using spectral count-based protein quantification.The spectral count represents the number of MS/MS spectra assigned to each protein.

Weighted correlation network analysis
The similarity in protein expression patterns for all protein pairs was calculated according to their pairwise Pearson's correlation coefficient [i.e., the similarity between proteins i and j was defined as (1-r i,j )/2, where r i,j is the Pearson's correlation coefficient of the protein expression pattern between the two proteins].We performed a network topology analysis for various soft-thresholding powers ranging from 1 to 2 to choose an optimal value of balance between independence and mean connectivity.A topological overlap matrix (TOM) that considers topological similarities between a pair of proteins in the network was then generated from the resultant scale-free co-expression network.We generated a tree by hierarchical clustering using dissimilarity according to TOM (1−TOM), and protein modules were determined using dynamic tree-cutting to trim the branches (11).
The modules were summarized by the first principal component, which is referred to as eigen proteins in the text, as they express the highest connectivity in the module.Module membership, defined as the correlation between the protein expression profile and the module eigen-protein, was measured with values ranging from 0 to 1, with "0" representing a gene that is not part of the module and "1" representing high connectivity with the module.Subsequently, the module-trait association was determined using the correlation between the module eigenprotein and the three subtypes: MPA, SPA, and LPA.A protein module was summarized by the top hub protein (referred to as "eigen-protein") with the highest connectivity in the module.The WGCNA analysis was performed using the WGCNA R-package (11) implemented in RStudio.

Protein-PPI network construction
We used the STRING database (version 11.5) (https://stringdb.org/) to construct a PPI network for a protein module (15).STRING networks were calculated under the criteria for linkage with experiments, databases, text mining, and co-expression using the default settings (medium confidence score: 0.400; network depth: 0 interactions).Functional enrichment results were obtained for canonical pathways with a p-value <0.05.Proteins in a module were mapped in the PPI network from the STRING database to produce the results of the enrichment analysis regarding the biological process (GO) and Reactome pathways (HAS).Protein networks were subsequently exported to Cytoscape (version 3.9.1)(Institute for Systems Biology, Seattle, WA, USA: https:// cytoscape.org/)(16) from the STRING database.The hub proteins in each module were identified according to their intramodular connectivity and their correlation with module eigen-proteins.The proteins inside the co-expression modules exhibit high connectivity and the proteins within the same module may play similar roles.The top 10 high-degree proteins were identified using the cytoHubba plugin (17).The top-ranked proteins in each module were considered hub proteins and designated "highly connected proteins."Functional enrichment results were obtained for canonical pathways by considering a network bias-corrected p-value of <0.05 for statistical significance.
The multivariate correlation analysis (MVA) of semiquantitative key protein expressions was performed using the JMP software (SAS Institute, Cary, NC, USA), and which result was visualized using the Intervene Shiny App (https://intervene.shinyapps.io/intervene/)(18).

Upstream regulator and causal network analysis by IPA
Upstream regulators, causal networks, and canonical pathways were predicted using IPA software (http://www.ingenuity.com)(10).Quantile-normalized protein expression data of the selected modules were used as input datasets.Both the upstream regulators and causal networks (p < 0.05) predicted from the WGCNA network modules were significantly associated with the three subtypes (LPA, MPA, and SPA), in which the activation and the inhibition of a predicted network were defined by z-values that were >2.0 and <−2.0, respectively.The upregulation was defined by zvalues > 1.5 and < 2.0, whereas downregulation was defined by zvalues > −2.0 and < −1.5.

Proteome datasets
At the St. Marianna University School of Medicine Hospital (Kawasaki, Kanagawa, Japan), 1,293 lung cancer patients were pathologically confirmed and underwent surgery between January 2008 to April 2022.These cases consisted of the following subtypes: LPA (14%), MPA (1%), SPA (4%), and others (81%) (Figure 1Aa).The SPA cases exhibited worse overall survival (OS) compared with the LPA cases (log-rank test: p = 3.06 × 10 −5 ) (Figure 1Ab), whereas the MPA cases were not statistically significant because of the limited number of samples.
Overall, 2,112, 1,753, and 1,361 proteins were identified from LPA, MPA, and SPA, respectively.A gene ontology (GO) analysis was performed using the Protein Analysis Through Evolutionary Relationships (PANTHER, version 17.0) program (Paul D. Thomas, University of Southern California, Los Angeles, CA, USA) (19) and revealed similar results among the three subtypes (Supplementary Figure S1).Volcano plots were generated from the protein expression data obtained by SimpliFi ™ software (PROTIFI, Farmingdale, NY, USA; https://simplifi.protifi.com).Upregulated proteins significant to SPA included large ribosomal subunit protein uL29 (RPL35), human leukocyte antigen G (HLA-G), GTP-binding nuclear protein Ran (RAN), human leukocyte antigen B (HLA-B), and putative heat shock protein HSP90-beta-3 (HSP90AB3P).The expression of HLA-G protein was significantly associated with and highly upregulated in the SPA subtype (Figure 2A), which was confirmed by its ANOVA test with a high significance to the SPA subtype with p-ANOVA = 0.0026 (Figure 2C).

Identification of protein co-expression networks by WGCNA
Weighted correlation network analysis, also known as weighted gene co-expression network analysis (WGCNA) incorporates traditional data exploration techniques, but its intuitive network language and analysis framework go beyond standard analysis techniques.It uses a network approach and is suitable for the integration of complementary genomic/proteomic datasets so that it can be interpreted as a data analysis technique for systems biology (11).
Following hierarchical clustering of the samples based on protein abundance (Figure 3A), a WGCNA analysis (11) was performed with a soft threshold power of 10, which was selected to approximate a scale-free topology, a minimum module size of 10, and a module detection sensitivity (deepSplit) of 4. Correlations between the resultant modules and the traits were obtained to identify protein modules that were significantly associated with the respective traits.Forty protein modules were identified by clustering all of the proteins and constructing weighted protein co-expression networks.The protein cluster dendrogram is presented in Figure 3B.A heatmap of eigen-protein expression for pairwise correlations between the modules in the connectivity measure (kME) of the module eigen-protein are presented in Figure 3C, together with lists of module IDs, module colors, and their eigenproteins representing protein expressions in a module.
Several other WGCNA modules were also correlated (r > 0.5) with one of the three traits.WM8 (dark-magenta: r = 0.76), WM19 (light-cyan: r = 0.61), and WM32 (violet: r = 0.61) correlated with the LPA subtype, WM11 (dark-red: r = 0.72) and WM18 (orange: r = 0.59) with the MPA subtype, and WM2 (turquoise: r = 0.57) with the SPA subtype.However, none of these correlations were significant (q > 0.05).Trait correlation analysis often tends to overlook important modules.Statistical over-representative analysis (ORA) may help evaluate potential WGCNA modules concerning overlap with uniquely upregulated proteins for each trait.The numbers of proteins in each subtype more than twice as highly expressed compared with other subtypes were 407 in LPA, 321 in MPA, and 482 in SPA, respectively.Overlap of the WGCNA modules with that of the protein groups was assessed using ORA (Figure 4).In this study, we focused on the WGCNA modules, WM1 (pink) and WM2 (turquoise), significantly associated with the high-risk subtype, SPA (Figure 4).

Protein-protein interaction networks and functional enrichment
Using the Search Tool for the Retrieval of Interacting Genes/ Proteins (STRING) database version 11.5 (https://string-db.org/)(15), human PPI networks were obtained for the WGCNA modules.The PPI networks for the WM1 and WM2 modules associated with the SPA subtype were reconstructed using Cytoscape (version 3.9.1)software (Institute for Systems Biology, Seattle, WA, USA: https:// cytoscape.org/)(16) (Supplementary Figure S2).Top hub proteins were calculated using the cytoHubba plugin with maximal clique centrality (MCC) (17).In these data-driven protein co-expression networks, eigen-proteins are indicated in red letters, hub proteins in red to orange fill colors, and some key proteins are denoted with red borders.Top STRING enrichment results for the WGCNA modules, WM1 and WM2, significant to the SPA subtype are shown in Supplementary Table S1.
The functional enrichment obtained for the WM1 module included the following: (i) SRP-dependent co-translational protein targeting to membrane, translational initiation, nuclear-transcribed mRNA catabolic process, nonsense-mediated decay, and protein targeting to membrane as biological processes (GO); and (ii) L13amediated translational silencing of ceruloplasmin expression, GTP hydrolysis and joining of the 60S ribosomal subunit, eukaryotic translation termination, and NMD independent of the exon junction complex (EJC) (Supplementary Table S1).The eigenprotein RPS4X (small ribosomal subunit protein eS4, also known as SCR10) is a component of the 40S subunit and is involved in L13a-mediated translational silencing of ceruloplasmin expression, eukaryotic translation termination, and NMD independent of the exon junction complex (EJC).
The pathways enriched for the WM2 (turquoise) module included the following: (i) antigen processing and presentation of peptide antigen via MHC class I, NIK/NF-kB signaling, and regulation of transcription from RNA polymerase II promoter in response to hypoxia as biological processes (GO); (ii) regulation of expression of SLITs and ROBOs, cross-presentation of soluble exogenous antigens (endosomes), and NIK to noncanonical NF-kB signaling (Supplementary Table S1).The eigen-protein tropomyosin 4 (TPM4) is a member of the tropomyosin family of  actin-binding proteins, which is involved in stabilizing cytoskeleton actin filaments.

Semi-quantitative protein expression and multivariate correlation analysis
Forty eigen-proteins together with fifteen key proteins, which were representatively expressed throughout all 40 modules, were subjected to multivariate correlation analysis (MVA), assuming three-group clustering (Figure 2D).The eighteen eigen-proteins of the WGCNA modules relatively well correlated (r > 0.4) with respective subtypes are denoted in colored filled circles (LPA: red; MPA: blue; SPA: green).Those eighteen eigen-protein expressions were clustered into Cluster A, Cluster B, and Cluster C, which well corresponded to the SPA, LPA, and MPA subtypes, respectively.(Figure 2D).Upregulated proteins characteristic of the respective subtypes were also well clustered consistently: HLA-G and HLA-B in SPA; C3, RCC1, and HSPG2 in LPA; MYOF and CRABP2 in MPA.
Zhou et al. (20) performed proteomic analyses between low-risk and high-risk subtypes of early-stage lung adenocarcinomas, where the low-risk LPA subtype group (n = 31) and the high-risk subtype group (n = 28) consisting of both MPA and SPA and reported several proteins differentially expressed more to the high-risk group.
Their results were not well linked to our proteomic observations.Some proteins differentially expressed in their high-risk subtype group, such as DNA replicating licensing factors MCMs, were oppositely upregulated in the low-risk subtype LPA subtype in our study, except that the expression of upregulated P4HA2 (Prolyl-4 hydroxylase subunit alpha-2) was observed in the high-risk subtype MPA (p = 0.03).There are intrinsic differences between their approach and ours.They directly used sections of tissues collected immediately after resection for proteomic analysis while we collected only cancerous cells directly from the FFPE tissue sections via laser dissection, and they grouped both MPA and SPA as one high-risk group while we did not group MPA and SPA into one.

Canonical pathways enriched by IPA
The canonical pathways and their pathway categories enriched were obtained by the IPA software (http://www.ingenuity.com)(10) for the WM1 and WM2 modules significant to the SPA subtype (Figure 5).The WM1 network module was associated predominantly with activated pathways of EIF2 signaling, which are involved in the following pathway categories: Cellular Growth, Proliferation and Development, Cellular Stress and Injury, and Intercellular and Second Messenger Signaling (Figure 5A).The

Upstream regulators and causal networks enriched by IPA
Upstream regulator and causal network analysis were performed for the WGCNA modules using IPA software (http:// www.ingenuity.com)(10).Table 2 summarizes the top upstream regulators, causal networks, canonical pathways, and diseases or functions predicted for the four WGCNA modules.
Highly inhibited La-related protein 1 (LARP1), the master regulator of the cap-dependent top mRNA translation, strongly suggests the inhibition of protein synthesis via cap-dependent mRNA translation or the activation of the cap-independent, IRES-mediated translation of mRNA subsets encoding oncogenic proteins, such as HIF1A, and MYC (22).
FGFR1 is a cell-surface receptor for fibroblast growth factors that regulate embryonic development, cell proliferation, differentiation, and migration.Its amplification has been frequently reported in NSCLC (23).Activation of CTNNb1 and CTNNb-LEF1 implied activation of Wnt/b-catenin signaling in which b-catenin is stabilized and accumulated as free b-catenin in the cytosol and is subsequently translocated into the nucleus and activates the TCF (T-cell factor)/LEF (lymphoid enhancer factor)dependent transcription of Wnt target genes, key factors in cell proliferation and invasion (24,25).Proto-oncogene LCK encodes lymphocyte-specific protein tyrosine kinase LCK, an Src family tyrosine kinase, which plays an important role in TCR-linked signal transduction.CD28-induced LCK activation is an important mediator of T-cell activation.Recently, mathematical models for LCK autophosphorylation suggest that LCK is involved in the early stages of T-cell activation and response derived from PD-1 signaling suppression (26).It also potentially mediates PD-1-induced inhibition of early TCR signaling (27), which is important to cancer immune checkpoint therapy.LILRB2 is Leukocyte immunoglobulin-like receptor subfamily B member 2 (also known as ILT-4, immunoglobulin-like transcript 4) is a tumor Relationship between module eigen-proteins and the clinical traits of three subtypes LPA, MPA, and SPA.Each row in the embedded table represents weighted gene co-expression network analysis results for each module.The first and second columns in the table represent the module identification and color name of a module, respectively.The third column represents the number of proteins in each module.The table is colorcoded by the correlation coefficient, in which the intensity and direction of the correlations are indicated (red, positive correlation; blue, negative correlation).The seventh to ninth columns present the p-values of the correlation coefficients, and the tenth to thirteenth columns are q-values by multiple testing correction using the Benjamini-Hochberg method, where significant p-value (<0.05) and q-values (< 0.1) are highlighted in bright red background.p-values between 0.05 and 0.1 are highlighted in yellow background.The fourteenth to sixteenth columns present a number of overlaps between module proteins and proteins upregulated in each subtype.The eighteenth to twenty-one columns present results of the statistical over-representative analysis (ORA) using the hypergeometric overlapping test of all the WGCNA modules with proteins upregulated (more than twice) uniquely for the individual subtypes: 397 proteins in LPA, 314 in MPA, and 474 in SPA, where the Benjamini-Hochberg corrected significant q-values (< 0.05) are highlighted in bright red background.immune checkpoint molecule.Caseinolytic protease P (CLPP) is involved in the mitochondrial unfolded protein response and cellular bioenergetics (28).
Interestingly, all of Hif (complex), TRCD, FGFR1, IFNG, CD80, CD86, LCK, and LILRB2 target molecules including HLA-G in the WM2 module, whereas LILRB2 targets module molecules including both HLA-G and eigen-protein TPM4 (Table 2).Schreiber et al. demonstrated that the effects of IFNG are generally protumorigenic during the immune escape stage of cancer immunoediting, in which IFNG increases inhibitory immune checkpoint molecules, including HLA-G, to promote the formation of a tolerant immune microenvironment (29).
HLA-G exerts inhibitory effects on both innate and adaptive effectors through direct binding to inhibitory receptors, such as immunoglobulin-like transcripts, LILRB1 (also known as ILT-2), and LILRB2, with high affinity compared with other HLA class I molecules.HLA-G can inhibit many players associated with the anti-tumor response throughout early and late tumor stages, although CTLA-4 and PD-1 are predominantly expressed in higher tumor grades (30, 31).The networks associated with LILRB2 inhibition include the activated hub regulator, toll-like receptor 4 (TLR4, also known as CD284).TLR signaling activates various signaling molecules, including nuclear factor kB (NFkB), extracellular signal-regulated kinase (ERK/JNK/p38), and induces the synthesis of immunologic factors including IL-6, IL-12, PD-L1, and HLA-G.This results in the resistance of tumor cells to CTL attack and tumor cell immune evasion (32).Highly expressed HLA-G and highly inhibited LILRB2 implicated involvement of immune tolerance or anti-tumor immune escape.The volcano plots revealed high expression of HLA-G associated with the SPA subtype (Figures 2A-C).A web-based survival analysis (KMplot) for mRNA data of lung adenocarcinoma (n = 719) indicated that high HLA-G expression was significantly associated with poor OS (log-rank test p = 1.0 × 10 −11 ; hazard ratio: 3.1 (2.2-4.36))(https:// kmplot.com/analysis/)(Supplementary Figure S3) (33).Yan et al. interestingly showed a predominant expression of soluble HLA-G (sHLA-G) in tumor cells not from squamous cell carcinoma but adenocarcinoma of lung (34).

Module
In connection with the above genomic alterations, upregulated expressions of HLA I class molecules such as HLA-G and HLA-B were observed in proteomic analysis.(Figures 2A, B).It has been reported that there is a positive correlation between mutational burden and the expression of most HLA class I molecules (35, 36), and HLA class I molecules are associated with the innate immune response as ligands of inhibitory killer cell immunoglobulin-like receptors (KIRs) of Natural Killer (NK) cells (37).

Discussion
The WGCNA analysis was successfully applied to MS-based proteomic data.Forty co-expression network modules were identified, among which two network modules were significantly associated with solid predominant adenocarcinoma, SPA (Figure 4).Volcano plots of the proteins revealed high expression levels of HLA-G (human leukocyte antigen G) in the SPA subtype (Figures 2A, B).A multivariate correlation analysis (MVA) of eigen-proteins mostly grouped into Cluster A, B, and D, which correspond well to the SPA, MPA, and LPA subtypes, respectively (Figure 2C).Oncoprint analysis of three subtypes exhibited that SPA interestingly exhibited a highly frequent p53 alteration of 75% although MPA and LPA harbored a frequency of p53 alterations, 41% and 17%, respectively (Supplementary Figure S4).High TP53 alteration seems to reflect a high malignancy of SPA.
Upstream analysis using IPA predicted that the highly activated oncogenic regulators, MYC, MLXIPL, and MYCN may reflect an aberrant feature of the SPA subtype, that activation of the redox master regulator NFE2L2 indicates an occurrence of oncogenic signaling in response to oxidative stresses and that both the highly inhibited LARP1 and activated pathways of EIF2 signaling indicated a switch to IRES-mediated mRNA translation, resulting in the generation of oncogenic proteins, such as hypoxia-inducible factors (Hif).Activation of Wnt/b-catenin signaling also might participate in the disease-related interaction networks of SPA, which results in Wnt target gene products by the TCF/LEF-dependent transcription.
Interestingly, SPA-significant co-expression network modules, WM1 and WM2, also involved highly activated IFNG, TCRD, CD3-TCR, CD8A, CD8B, CD3, CD80/CD86, and highly inhibited LILRB2, all of which are associated with the adaptive immune system and defense, and importantly which all target module molecules including the HLA class I molecule HLA-G.Representative immune system-related causal networks associated with the SPA subtype are presented in Figure 6.Moreover, our observation of the immune checkpoint molecule, HLA-G, significantly upregulated in the SPA subtype and its counterpart LILRB2 predicted to be significant and highly inhibited implicate the involvement of the immune tolerance or anti-tumor immune escape processes in early-stage SPA, although CTLA-4 and PD-1 may not yet be expressed at early-stage.Representative cancer cell immunity-related causal networks of (A) CD80/CD86, (B) CD-TCR, and (C) LILBR2, associated with the SPA subtype, predicted by IPA.Node shapes indicate molecular types: triangle, kinase; square (dashed), growth factor; rectangle (horizontal), ligand-dependent nuclear receptor; rectangle (vertical), ion channel; diamond (vertical), enzyme; diamond (horizontal), peptidase; trapezoid, transporter; oval (horizontal), transcription regulator; oval (vertical), transmembrane receptor; double circle, complex; and circle, other.Orange/light orange and blue/ light blue colors indicate the extent of confidence for predicted activation and inhibition, respectively.Lines denote predicted relationships.A solid or dashed line indicates direct or indirect interaction, respectively.Orange, leading to activation; blue, leading to inhibition; yellow, findings inconsistent with the state of a downstream molecule; grey, not predicted effect.CD80/CD86, T-lymphocyte activation antigens CD80/CD86 (Activation B7-1/B7-2 antigens); CD3-TCR, CD3, TCR (T cell receptor complex); LILRB2, Leukocyte immunoglobulin-like receptor subfamily B member 2, also known as Immunoglobulin-like transcript 4 (ILT-4).The Applied Proteogenomic Organizational Learning and Outcomes research network (38) conducted the deep proteogenomic profiling of 87 lung adenocarcinoma tumors from a cohort of individuals in the United States, which utilizes the integrative approaches, including whole-genome, transcriptome, and MS-based proteomic and phosphoproteomic sequencing, and in which adenocarcinoma tumor subtypes were stratified according to RNA expression subtypes, such as terminal respiratory unit (TRU), proximal proliferative (PP), and proximal inflammatory (PI).They reported that PI enriched with solid predominant histological subtype (SPA) overexpressed the immune cell marker clusters, and its network is characteristically associated with enhanced IFNG signaling and inflammation, high tumor mutational burden (TMB), coupled with PD-L1 protein and CTLA4 RNA expression, and suggested that the PI subtype may have the subset of tumors most likely to respond to immune checkpoint inhibitors (38).
Recently, Li et al. (39) conducted a multi-omics analysis of 1,078 untreated lung adenocarcinoma patients with clinicopathologic, genomic, transcriptomic, and proteomic data from public and internal cohorts.SPA had molecular features including significantly higher tumor mutation burden (TMB), the higher frequency of TP53 mutation together with EGFR/TP53 comutation, and higher immuno-resistant microenvironment, that indicates a poor response to chemotherapy, such as tyrosine kinase inhibitors (TKIs).Indeed, SPA showed upregulated expression of immunotherapy-related genes relevant to strong immunogenicity.Moreover, it was shown in the cohort of lung adenocarcinoma patients who received neoadjuvant immunotherapy that SPA relatively well responded to immunotherapy.They concluded that SPA would be more suitable for immunotherapy while less suitable for chemo-and targeted therapy (39).
Both studies (38,39) seem to consistently reflect the molecular features of SPA and give corroboration to our findings in this study.One limitation of this study was the number of patients examined because of the limited number of early-stage cases pathologically confirmed and available for proteomic analysis in our hospital.

Conclusion
In summary, we successfully identified disease-related coexpression protein networks by WGCNA analysis applied to the proteomic datasets, following MS-based proteomic analysis of cancerous cells laser microdissected from FFPE tissue specimens of three lung adenocarcinoma subtypes, LPA, MPA, and SPA.Upstream regulator and causal network analysis performed for protein coexpression networks significantly associated with the SPA subtype implicated not only oncogenic signalings including cap-independent IRES-dependent mRNA translation and NRF2-mediated oxidative stress response but also highly activated molecular networks relating to the adaptive immune system and immune tolerance more likely associated with malignancies characterizing the SPA subtype, together with its high frequency of TP53 mutation.Although the number of patient samples examined was limited in this study, we are planning a larger cohort study of patient-derived samples, including genomic alteration analysis to investigate core data-driven proteogenomic networks.This approach will provide clinically important information on proteogenomic landscapes of the high-risk adenocarcinoma subtype SPA of the lung.organizations, or those of the publisher, the and the reviewers.Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
FIGURE 1 (A) 1,293 lung cancer patients were pathologically confirmed and received surgical operations between January 2008 to April 2022.at the St. Marianna University School of Medicine Hospital (Kawasaki, Kanagawa, Japan) The proportions of subtypes were LPA (14%), MPA (1%), SPA (4%), and others (81%).The SPA cases showed a worse overall survival (OS) compared with the LPA cases (log-rank test: p = 3.06 × 10 -5 ), whereas the MPA cases were not statistically significant due to the limited number of cases against the LPA and SPA cases.(B) Representative images of tumor tissues from solid predominant adenocarcinomas (SPA) (sample SPA_T21) stained with hematoxylin & eosin (HE) using standard histological methods, and only with hematoxylin prior to and after laser microdissection (LMD).Scale bars in red color are indicated at the bottom right.(C) Venn map of the identified proteins.

2
FIGURE 2 Semi-quantitative protein expression and multivariate analysis of eigen-protein expression.(A) Volcano plots of protein expression in SPA vs. LPA.(B) Expressions of HLA class I proteins, HLA-G and HLA-B, are presented in all the sample tissues.(C) An ANOVA test of HLA-G expressions among three subtypes.(D) Multivariate correlation analysis (MVA) for the spectral counting-based expression of forty eigen-protein and fifteen proteins characteristic of the respective subtypes, assuming three-group clustering (correlation coefficient: Pearson; heatmap order: hierarchical clustering; agglomeration method: complete; the number of clusters: 3).The eighteen eigen-proteins of the WGCNA modules relatively well correlated (r > 0.4) with and upregulated proteins characteristic of respective subtypes are denoted in colored filled-circles and colored circles (LPA: red; MPA: blue; SPA: green).

3
FIGURE 3The forty protein network modules identified by weighted gene co-expression network analysis (WGCNA).(A) Sample dendrogram and trait heatmap, (B) protein cluster dendrogram, and (C) pairwise correlations between the 40 identified modules concerning the connectivity measure (KME) of the module eigen-protein (correlation coefficient: Pearson; heatmap order: Euclidean; agglomeration method: complete; the number of clusters: 3), where the color bars represent the three clusters, together with lists of module ID, module color and eigen-protein.

TABLE 1
Clinicopathological information of the recruited patients.