Quantitative proteomics identifies tumour matrisome signatures in patients with non-small cell lung cancer

Introduction The composition and remodelling of the extracellular matrix (ECM) are important factors in the development and progression of cancers, and the ECM is implicated in promoting tumour growth and restricting anti-tumour therapies through multiple mechanisms. The characterisation of differences in ECM composition between normal and diseased tissues may aid in identifying novel diagnostic markers, prognostic indicators and therapeutic targets for drug development. Methods Using tissue from non-small cell lung cancer (NSCLC) patients undergoing curative intent surgery, we characterised quantitative tumour-specific ECM proteome signatures by mass spectrometry. Results We identified 161 matrisome proteins differentially regulated between tumour tissue and nearby non-malignant lung tissue, and we defined a collagen hydroxylation functional protein network that is enriched in the lung tumour microenvironment. We validated two novel putative extracellular markers of NSCLC, the collagen cross-linking enzyme peroxidasin and a disintegrin and metalloproteinase with thrombospondin motifs 16 (ADAMTS16), for discrimination of malignant and non-malignant lung tissue. These proteins were up-regulated in lung tumour samples, and high PXDN and ADAMTS16 gene expression was associated with shorter survival of lung adenocarcinoma and squamous cell carcinoma patients, respectively. Discussion These data chart extensive remodelling of the lung extracellular niche and reveal tumour matrisome signatures in human NSCLC.


Introduction
Lung cancer, including non-small cell lung cancer (NSCLC), is a common cancer and the leading cause of cancer-related deaths (1). Lung cancer, therefore, is a significant cause of morbidity and mortality, and improving lung cancer outcomes remains a clinically unmet need. Underpinning improved outcomes are a better understanding of the underlying biological process of tumour progression, identification of novel drug targets and establishment of markers that aid clinicians in determining diagnosis, prognosis and treatment decisions. Thus, proteomic assessment of primary cancer samples may help to advance these aims by uncovering differences in the protein composition of tumour and non-tumour tissues (2,3).
The extracellular matrix (ECM) proteome, or matrisome, is composed of core ECM proteins, including collagens, glycoproteins and proteoglycans, which form the structure of the ECM, and ECM-associated proteins, such as mucins, enzymes that can modify the ECM and secreted factors such as cytokines (4). Matrisome proteins have several key biological functions, including providing physical support, regulating pH, hydration and organisation of tissue and modulating signalling in tissue by binding growth factors and cytokines (5,6). The matrisome is frequently altered in neoplastic tissues, impacting on multiple hallmark features of cancer (7,8); for example, ECM dysregulation is associated with suppression of anti-tumour immunity and immunotherapy resistance, including in lung cancer (9,10). Despite having important functions in cancer progression, the matrisome is an under-explored region of tumour tissues (11-13), owing in part to technical difficulties in its analysis, including the poor solubility of many ECM proteins (14), the multiplicity of their post-translational modifications (15), the limited number of robustly validated antibodies targeting ECM proteins (16) and low abundance of ECM proteins in many tissues compared to intracellular proteins (17). While bulk lung tumour tissue samples have been analysed by mass spectrometry (MS)-based proteomics to search for candidate biomarkers of disease (18-23), the tumour-associated matrisome of lung cancer patients has not been comprehensively documented. We hypothesised that deep, quantitative characterisation of ECM isolated from tumours from NSCLC patients would enable the detection of new prospective extracellular protein markers of lung cancer.
This study aimed to detect changes in matrisome proteins between human NSCLC tissues and patient-matched noncancerous lung using a quantitative proteomic approach to curate an extensive resource of lung tumour matrisome proteins. We analysed tissue from 34 patients undergoing curative intent resections for NSCLC, coupling MS-based proteomics to fractionation of tissue samples to enrich for matrisome proteins. We quantified the differential abundance of proteins between tumour tissue and non-cancerous lung, characterising tumour matrisome signatures in NSCLC. Functional network analysis identified a collagen hydroxylation module enriched in tumour tissue, and we validated two novel putative extracellular markers, peroxidasin and a disintegrin and metalloproteinase with thrombospondin motifs 16 (ADAMTS16), as being up-regulated within tumours in separate patient cohorts.

NSCLC proteomic study population characteristics
The cohort comprised 34 NSCLC patients, 20 female and 14 male, ranging from 49 to 87 years old, who underwent surgical resection of NSCLC as treatment with curative intent (details in Supplementary Table 1). Seventeen of the patients were diagnosed with adenocarcinomas, 12 with squamous cell carcinomas, three with large cell tumours and two with pleomorphic tumours. Using TNM classification (8th edition), one patient had a T1, 17 had T2, 10 had T3 and six had T4 tumours (24). Lymph node (N) component included stage N0 in 24 patients, seven patients with N1 and three with N2 disease. As this was a curative intent cohort, no patients had distant metastasis. All patients except one were either previous or current cigarette smokers. The standardised uptake value of the PET imaging tracer 18F-FDG from the pre-operative CT scans of patients was high for all but two patients (one moderate, one not recorded). Non-cancerous lung tissue was retrieved from adjacent areas of resected lung tissue removed with the lung tumours (Figure 1A). Histopathological abnormalities within the non-malignant regions of the lung included single abnormalities or combinations of pneumonia, emphysema, inflammation, sarcoidosis and pleural fibrosis, and seven of the tissues were recorded as being histopathologically normal lung (Supplementary Table 1).

Quantification of NSCLC matrisome proteins by MS
To enrich extracellular proteins from patient-derived lung tissue for proteomic analysis, we used detergent and alkaline extractions and DNase treatment to deplete the lung tissue of cells and intracellular material, with minimal loss of relatively insoluble ECM proteins (Figure 1A; Supplementary Figure 1A). We performed label-free liquid chromatography-coupled tandem mass spectrometry (LC-MS/MS) analysis of ECM isolated from NSCLC tumour and non-tumour tissue, identifying a total of 1,662,859 peptide-spectrum matches and quantifying 3,602 proteins with a false discovery rate (FDR) of 1% (Supplementary Data 1). Extracellular proteins were significantly over-represented in the isolated ECM fractions (Figure 1B; Supplementary Data 2), and 1,420 identified proteins (39.4%) were annotated as extracellular in the Gene Ontology database (25) (Supplementary Data 1, 2). The median sequence coverage of extracellular proteins, determined by MS, was significantly higher than that of all other identified proteins (Figure 1C), and the spectral counts for extracellular proteins were significantly higher than for other proteins (Figure 1D).
Two hundred and fifty-six proteins in the dataset were classified as curated matrisome proteins (26, 27), which comprised 108 core matrisome proteins and 148 matrisome-associated proteins. The core matrisome group consisted of 70 glycoproteins, 26 collagens and 12 proteoglycans; the matrisome-associated group consisted of 37 ECM-affiliated proteins, 73 ECM regulators and 38 secreted proteins. This is comparable in number and matrisome subgroup distribution to recent proteomic analyses of lung tumour tissue (28-30) (Supplementary Figure 1B; Supplementary Table 2). Together, these data indicate strong detection and enrichment of extracellular proteins in the patient-derived lung ECM samples.
Of the 3,602 identified proteins, 1,805 significantly differed by at least 2-fold between tumour and non-tumour tissues (P < 0.05, paired two-sided Student's t-test with Benjamini-Hochberg correction) ( Figure 1E

Down-regulation of ECM-organising proteins in NSCLC
The majority of matrisome proteins up-regulated in tumour ECM were matrisome-associated proteins, predominantly ECM regulators and secreted proteins, whereas proteins down-regulated in tumour ECM included a substantial number of both matrisome-associated proteins and core matrisome proteins (Figure 2;
Figure 2. Cluster analysis of patient characteristics and matrisome protein expression. Heatmap representation of the 161 differentially expressed matrisome proteins (P < 0.05, paired two-sided Student's t-test with Benjamini-Hochberg correction). Proteins were quantified by log2-transformed label-free quantification intensities, standardised by row-wise Z-scoring and hierarchically clustered on the basis of Euclidean distance. The two principal clusters are indicated; proteins enriched or depleted by at least four-fold are labelled next to the respective cluster (selected proteins are labelled with gene names for clarity). SUV, standardised uptake value.
Supplementary Figure 3A). These proteins catalyse the post-translational formation of hydroxylysine and hydroxyproline residues in collagen chains, which are critical for collagen helix stability (31), and have been reported to be up-regulated in fibrotic lung ECM in murine and human model systems (32,33). LC-MS/MS analysis revealed that the majority of hydroxylated residues were found in type I and type III collagen chains (COL1A1, COL1A2, COL3A1) (Figure 3B). While there were both up- and down-regulated hydroxyproline-containing peptides quantified by LC-MS/MS (Supplementary Figure 3B), almost all hydroxylysine-containing peptides were up-regulated in tumour ECM, and all but one of these were derived from type I collagen chains (Figure 3C).
To identify clusters of residues that showed similar regulation of hydroxylation across the tissue samples, we mapped the changes in abundance of hydroxyproline- and hydroxylysine-containing peptides for matched tumour-non-tumour pairs. This analysis indicated variable modulation of proline hydroxylation in type I and type III collagen chains, including a substantial cluster of hydroxyproline-containing peptides derived from COL1A1 that were up-regulated in tumour ECM (Figure 3D Figure 3D), whereas modulation of lysine hydroxylation in type III collagen was more variable (Figure 3C; Supplementary Figure 3C). These data imply that type I collagen lysine residues are extensively hydroxylated in tumour ECM, in concert with an up-regulation of enzymes that catalyse collagen hydroxylation.
Figure 3. Up-regulation of lysine hydroxylation in patient-derived lung tumour ECM. (A) Gene ontology enrichment analysis of the principal clusters of differentially regulated matrisome proteins identified in Figure 2. All enriched terms for each respective cluster are shown (Fisher's exact test with Benjamini-Hochberg correction); for cluster 2, the term "extracellular structure organisation", the parent term of ECM organisation, which comprises the same contributing proteins and is enriched with the same P-value, was omitted. Proteins belonging to enriched terms were used to construct corresponding protein interaction networks (right panels). Proteins (nodes) are coloured according to enrichment or depletion in tumour samples and sized according to statistical significance (P < 0.05, paired two-sided Student's t-test with Benjamini-Hochberg correction). Protein interactions (edges) were weighted according to evidence of co-functionality. Unconnected proteins are not shown.

Up-regulation of matrisome-associated proteins in NSCLC
In addition to enrichment of enzymes that regulate ECM hydroxylation in tumour tissue, there was an increase in matrisome proteins associated with ECM turnover in tumour samples. Several cathepsins, some of which have reported roles in the degradation of extracellular proteins, were up-regulated in tumour samples, including the secreted thiol proteases cathepsin B (2.1-fold) and cathepsin S (1. In addition to many matrisome-associated proteins, several core matrisome glycoproteins were up-regulated in tumour tissue. The most enriched glycoproteins in tumour ECM compared to non-tumour ECM included thrombospondin-2 (16.9-fold), MXRA5

Peroxidasin and ADAMTS16 are increased in NSCLC tumour samples
Of the extracellular proteins we quantified as up-regulated in tumours, we selected peroxidasin and ADAMTS16 for further analysis (Figure 4A). The activity of peroxidasin, an enzyme that mediates collagen cross-linking, has recently been linked to the promotion of lung cancer through inhaled air-pollution particles, which adsorb peroxidasin and induce aberrant ECM thickening (34), providing a rationale to examine peroxidasin expression in early-stage NSCLC ECM. ADAMTS16 DNA methylation has been shown to be dysregulated in several epithelial cancers, including lung cancer (35), suggesting that regulation of ADAMTS16 protein expression in tumour-associated ECM, and its potential as a protein biomarker in NSCLC, warrant investigation. We constructed a tumour microarray (TMA) containing control and tumour samples from a separate cohort of lung cancer patients undergoing curative intent surgery (Supplementary Table 3). Selected proteins were analysed by immunohistochemistry (IHC) and scored on a four-point scale (0-3) of increasing immunostaining, from no staining to strong staining. For lung cores in the TMA derived from tumour samples, the distribution and scoring of immunostained proteins were assessed separately for tumour stroma and tumour cells. In tumours, peroxidasin and ADAMTS16 were found variably distributed between tumour stroma and tumour cells (Figure 4B). Importantly, both proteins had significantly increased immunostaining intensities in tumour cells compared to non-tumour cells (Figures 4B, C). These data confirm the results from the MS-based proteomics analysis and imply that peroxidasin and ADAMTS16 are up-regulated in NSCLC tumours.
To examine the expression of the genes encoding the selected extracellular proteins, we analysed an integrated dataset of RNA-seq data derived from paired tumour and adjacent normal tissue from lung adenocarcinoma or lung squamous cell carcinoma patients (36). Expression of both PXDN (which encodes peroxidasin) and ADAMTS16 was significantly up-regulated in adenocarcinoma tumours compared to matched adjacent normal lung tissue (1.96-fold and 10.7-fold change in median expression, respectively) (Figure 4D). Both genes were also significantly up-regulated in squamous cell carcinoma tumours compared to matched adjacent normal lung tissue (1.78-fold and 5.62-fold change in median expression, respectively) (Figure 4E). These data indicate that PXDN and ADAMTS16 are transcriptionally up-regulated in NSCLC tumours.
We next assessed whether the protein expression levels of peroxidasin and ADAMTS16 were predictors of survival in the TMA cohort (Supplementary Table 3). Although up-regulated in tumour tissue (Figures 4A-C), the degree of tumour stroma immunostaining of these two markers, as determined by IHC scoring of all interpretable lung cores, did not significantly correlate with survival outcome within the limited TMA cohort (Supplementary Figure 4). To examine patient survival in larger NSCLC cohorts, we used The Cancer Genome Atlas (TCGA) data derived from primary tissue samples from lung adenocarcinoma (513 patients) or lung squamous cell carcinoma (501 patients). We found that higher expression of PXDN was significantly associated with shorter survival of adenocarcinoma patients (hazard ratio (HR) = 1.57; 95% confidence interval (CI), 1.05-2.37; P = 0.028, log-rank test), whereas that of ADAMTS16 was not (Figure 4F). Median survival of adenocarcinoma patients was 54.4 months for the low PXDN expression subgroup and 39.0 months for the high PXDN expression subgroup. In contrast, there was no statistical association of expression of PXDN with survival of squamous cell carcinoma patients, whereas higher expression of ADAMTS16 was significantly associated with shorter survival of squamous cell carcinoma patients (HR = 1.72; 95% CI, 1.18-2.50; P = 0.0043) (Figure 4G). Median survival of squamous cell carcinoma patients was 74.1 months for the low ADAMTS16 expression subgroup and 33.4 months for the high ADAMTS16 expression subgroup. Together, these results indicate that these proteins, identified in the patient lung tumour matrisome, are increased in NSCLC and that their corresponding gene expression provides prognostic information for lung adenocarcinoma (PXDN) or squamous cell carcinoma (ADAMTS16) patients undergoing curative intent surgery.
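The log-rank comparison of low- and high-expression subgroups reported above can be illustrated with a minimal Python sketch. It assumes each subgroup is given as follow-up times with event indicators (1 = death observed, 0 = censored); the function name `logrank_test` and the toy data are hypothetical and are not taken from the study, and hazard ratio estimation is not shown.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]

    # Distinct times at which at least one event (not censoring) occurred.
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        at_risk_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        deaths_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        deaths_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n = at_risk_a + at_risk_b
        d = deaths_a + deaths_b

        # Expected deaths in group A under the null of equal hazards.
        observed_minus_expected += deaths_a - d * at_risk_a / n
        if n > 1:
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy follow-up times in months (hypothetical, for illustration only).
high_times, high_events = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]
low_times, low_events = [10, 11, 12, 13, 14], [1, 1, 1, 1, 1]
chi2, p_value = logrank_test(high_times, high_events, low_times, low_events)
```

In this toy case every patient in one subgroup dies before any patient in the other, so the test returns a small P-value; two subgroups with identical follow-up would give a statistic of zero.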

Discussion
We present herein a comprehensive characterisation of the NSCLC matrisome. To our knowledge, this is the first unbiased matrisome-scale proteomic analysis of enriched ECM from lung cancer patient tissue. Our quantitative analyses reveal the differential expression of a substantial number of extracellular proteins in lung tumour tissue from patients with early-stage NSCLC, implying extensive remodelling of the extracellular niche. We found that many core matrisome proteins were less abundant in lung tumour tissue as compared to non-cancerous lung tissue. For example, all differentially expressed collagens and proteoglycans were relatively decreased in tumour samples, while a subset were not significantly altered. These findings are consistent with MS-based proteomic data from a murine model of lung adenocarcinoma, which identified decreased or little change in abundance of many core matrisome proteins, including the majority of detected collagens and laminins, in tumour ECM as compared to non-tumour ECM (29).
The general association of lung cancer with desmoplasia, however, suggests that our observations may be linked to early stages of tumour ECM remodelling prior to accumulation of collagen. Indeed, recent examination of idiopathic pulmonary fibrosis lung tissue identified induction of collagen-modifying enzymes that contribute to collagen cross-linking, including lysyl hydroxylase 2, but not increased collagen synthesis, as a defining feature of lung fibrosis that increases tissue stiffness and promotes fibrotic progression (37, 38). Lysyl hydroxylase 2 (encoded by PLOD2) is one of several collagen-modifying enzymes that we detected as up-regulated in NSCLC tumour tissue in this study, alongside a concomitant increase in collagen lysine hydroxylation. Lysyl hydroxylase 2 is secreted by lung cancer cells in culture, and its hydroxylation of collagen telopeptidyl lysine residues leads to the formation of stable hydroxylysine aldehyde-derived collagen crosslinks that are up-regulated in lung cancer tissue and generate stiffer tumour tissue (39, 40). Thus, dysregulation of collagen architecture or cross-linking, and consequential ECM stiffness, may be more prominent features of early-stage NSCLC than changes in total collagen synthesis or density.
We observed the up-regulation of enzymes involved in extracellular protein degradation in lung tumour tissue, including several cathepsins and MMPs. These proteinases target a wide range of ECM proteins, such as collagen, laminin and elastin (41,42), almost all of which were depleted in tumour samples. This suggests that there may be increased turnover of core ECM macromolecules in NSCLC tumours, consistent with the remodelling of the extracellular niche found in various respiratory diseases and invasive tumour growth (41,43). Indeed, the expression of MMP2 and MMP14, which were up-regulated in lung tumour ECM, is linked to poorer patient outcomes in NSCLC (44,45), and MMP12, also up-regulated in tumour samples, is associated with faster disease relapse and metastasis in NSCLC patients (46) and the occurrence of bronchioloalveolar adenocarcinomas in patients with emphysema (47).
We validated in a separate patient cohort the up-regulation in tumour tissue of two extracellular proteins identified by MS-based proteomics. Peroxidasin, an extracellular peroxidase, mediates the formation of sulfilimine cross-links between methionine and hydroxylysine residues in type IV collagen (48, 49). Interestingly, the activity of peroxidasin is associated with accelerated tumorigenesis in murine models of lung carcinoma in the presence of inhalable fine particulate matter, which adsorbs peroxidasin and leads to accumulation of collagen cross-linking (34). In our analyses, peroxidasin clustered in a tumour-enriched collagen-modifying protein subnetwork, which together with our identification of a tumour-enriched collagen hydroxylation functional network, implies that collagen modifications and modulation of collagen cross-linking are key characteristics of early-stage NSCLC. The other selected extracellular tumour marker candidate, ADAMTS16, a matrisome-associated protease, targets fibronectin to inhibit ECM assembly (50), further suggesting that ECM organisation is dysregulated in NSCLC tissue. IHC analyses determined that both peroxidasin and ADAMTS16 were enriched in tumour cell-rich regions of lung tumour tissue, although IHC scoring of either of these candidates did not represent a significant predictor of survival in the TMA cohort of NSCLC patients. The assessment of patients with early-stage disease undergoing curative intent surgery in this study precludes the detection of matrisome protein changes that occur in more advanced disease, which could limit the ability to identify robust late-stage disease biomarkers but likely also explains the identification of putative early events in the remodelling of collagen cross-linking. In addition, tissue was only collected from a small portion of the tumour. 
Tumours are known to be heterogeneous (51,52); therefore, sampling from one area may not be representative of all the pathological protein changes within a tumour. Analysis of transcriptomic data from larger lung cancer patient cohorts, however, revealed increased tumour expression of genes encoding both peroxidasin and ADAMTS16, and this was separately linked to poorer survival of lung adenocarcinoma and squamous cell carcinoma patients, respectively. Recent analysis of matrisome gene expression in multiple transcriptomic datasets showed that extracellular protein levels are generally concordant with corresponding gene expression in human tissues (53). Our findings are consistent with this observation and suggest that high expression of PXDN and ADAMTS16 can be used as surrogate readouts for up-regulation of these two potential extracellular lung tumour markers.
In summary, this study provides an extensive analysis of lung tissue ECM in patients with lung cancer, charting the remodelling of the matrisome in early-stage NSCLC. We show that proteomic profiling of patient-derived lung tumour ECM enables the identification of candidate extracellular markers of tumour cells. In addition, the systems-level changes to the lung matrisome we report here, including the up-regulation of a collagen hydroxylation network in NSCLC tissue, reveal potential molecular networks that could modulate ECM organisation and regulate lung cancer progression.

Study approvals
Patient samples were collected with ethical approval and written patient consent. Ethical approval was granted by Lothian NRS Bioresource, REC number 15/ES/0094 (reference SR419). All samples were assigned an anonymised code, and researchers were blinded to patient details for experiments. The TMA dataset was approved by Lothian NRS Bioresource, REC number 15/ES/0094 (reference SR1208), and approved by the NHS Lothian Caldicott Guardian (reference CRD19031).

Patient samples
Tissue samples used for mass spectrometry (MS) were collected from patients with NSCLC undergoing curative-intent surgical procedures. Following resection, samples were handled by an experienced thoracic pathologist, and samples of tumour and non-cancerous lung (from the most distal portion of the resection specimen) were dissected and provided for MS analysis. Samples were snap frozen and stored at −70°C until required. Anonymised patient details were recorded, including age, gender, smoking history, histopathological diagnosis of tumour and non-tumour tissues, degree of differentiation of the tumour tissue, tumour stage and lymph node stage (determined by TNM status), survival and PET tracer uptake.

Enrichment of matrisome proteins
Tissue samples were enriched for predominantly insoluble matrisome proteins by depleting soluble intracellular proteins (Figure 1A). Methods were adapted from previously published work (54). Tissue samples were finely minced with scalpels, and the presence of any necrosis and tissue pigmentation was noted. Samples were homogenised twice in 1 ml of chilled phosphate-buffered saline (PBS) (without Ca2+ or Mg2+), containing 1% (v/v) protease inhibitor cocktail (Sigma-Aldrich), using a Precellys 24 tissue homogeniser (Bertin Instruments) at 6,500 rpm for 50 s. Samples were incubated for 5 min on ice between homogenisation cycles. The homogenate was centrifuged at 14,000 × g for 10 min at 4°C. The supernatant was removed (fraction 1), and the remaining pellet was resuspended and incubated in 10 mM Tris-HCl, pH 8, 150 mM NaCl, 25 mM EDTA, 1% (v/v) Triton X-100, 1% (v/v) protease inhibitor cocktail for 30 min on ice. The lysate was centrifuged at 14,000 × g for 10 min at 4°C. The supernatant was removed (fraction 2), and the remaining pellet was resuspended and incubated in 20 mM NH4OH containing 0.5% (v/v) Triton X-100 in PBS (without Ca2+ or Mg2+). The lysate was centrifuged at 14,000 × g for 10 min at 4°C. The supernatant was removed (fraction 3), and the remaining pellet was incubated with 10 µg/ml DNase I in PBS (with Ca2+ and Mg2+) for 30 min on ice. Samples were centrifuged at 14,000 × g for 10 min at 4°C, and the supernatant was removed (fraction 4). The remaining pellet was washed in ice-cold PBS and centrifuged at 14,000 × g for 10 min at 4°C three times. The final insoluble pellets enriched for ECM proteins were then stored at −70°C until further use.

SDS-PAGE and western blotting
To confirm matrisome proteins were enriched in the final protein pellet prior to proteomics experiments, the supernatants from serially extracted fractions from a subset of samples were probed by SDS-PAGE and western blotting. Three percent of the supernatant from each fraction was used for western blotting to allow comparison between fractions. Fractions 1-4 were incubated in 1× Laemmli buffer containing 50 mM dithiothreitol for 10 min at 95°C. The final ECM pellet was precipitated using TCA-acetone (see below), homogenised and incubated in 8 M urea, 100 mM NH4HCO3, pH 8, 10 mM dithiothreitol for 30 min at 37°C. Samples were resolved by SDS-PAGE using 4-20% Tris-glycine gels. Proteins were transferred to PVDF membranes using an iBlot 2 dry blotting system (Thermo Fisher Scientific) according to the manufacturer's instructions. Membranes were blocked using milk blocking buffer (5% (w/v) non-fat skimmed milk powder (Marvel) in 1× Tris-buffered saline containing 0.1% (v/v) Tween 20 (TBS-T)) for 1 h at room temperature with shaking. Following blocking, membranes were incubated with primary antibodies, diluted 1:1,000 in milk blocking buffer, overnight at 4°C with rolling. Antibodies used were rabbit polyclonal anti-fibronectin (#ab2413, Abcam), mouse monoclonal anti-lamin A/C (clone 4C11; #4777, Cell Signaling Technology) and rabbit monoclonal anti-GAPDH (clone 14C10; #2118, Cell Signaling Technology). Membranes were washed three times in TBS-T and then incubated with anti-rabbit or anti-mouse horseradish peroxidase-conjugated secondary antibodies, diluted 1:10,000 or 1:5,000, respectively, in milk blocking buffer, for 45 min at room temperature with rolling. Membranes were washed three times in TBS-T. Membranes were developed using Pierce ECL western blotting substrate (Thermo Fisher Scientific) according to the manufacturer's instructions. Membranes were imaged using an Odyssey Fc imaging system (LI-COR Biosciences).

Matrisome protein precipitation
Ice-cold TCA (10% (v/v) final concentration) was added to final ECM-enriched fractions for 20 min at 4°C. The sample was centrifuged at 16,000 × g for 30 min at 4°C. The supernatant was discarded, and the ECM-enriched pellet was resuspended in ice-cold acetone, using vortexing and sonication, and incubated for 20 min at −20°C. The sample was centrifuged at 16,000 × g for 30 min at 4°C, and the supernatant was discarded. The acetone wash step was then repeated. Samples were air dried at room temperature until residual solvent had evaporated. Precipitated protein pellets were stored at −70°C until further use.

Matrisome protein digestion
Precipitated protein pellets were resuspended in 300 µl solubilisation buffer (8 M urea, 100 mM NH4HCO3, pH 8, 10 mM dithiothreitol). Samples were sonicated on ice using a probe sonicator and then incubated for 30 min at 37°C. After cooling to room temperature, sample pH in the range pH 8-9 was verified using pH indicator paper. To 50 µl sample, 8.3 µl of 175 mM iodoacetamide in 100 mM NH4HCO3, pH 8, was added (final concentration 25 mM) and incubated for 30 min at room temperature in the dark to block thiol groups of cysteine residues. Excess iodoacetamide was quenched by the addition of 3 µl of 100 mM dithiothreitol in 100 mM NH4HCO3, pH 8. For protein digestion, urea was diluted from 8 M to 2 M using 100 mM NH4HCO3, pH 8. Samples were incubated with 833 units of PNGase F for 2 h at 37°C with shaking (900 rpm), then with 800 ng of Lys-C for 2 h at 37°C, then with 1.6 µg of MS-grade trypsin for 16 h at 37°C with shaking (1,200 rpm). Samples were then incubated with an additional 0.8 µg of trypsin for 2 h at 37°C with shaking (1,200 rpm). Samples were acidified using trifluoroacetic acid (TFA) to obtain sample pH in the range pH 3-4, and samples were clarified by centrifugation at 18,000 × g for 15 min.
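The stated final concentrations can be checked with simple dilution arithmetic (C1V1 = C2V2). The short Python sketch below is illustrative only: it reads the iodoacetamide volume as 8.3 microlitres, and the helper names and the assumption that all urea comes from the 50 microlitre sample aliquot are ours, not the authors'.

```python
def final_concentration(stock_conc, stock_vol, sample_vol):
    """Concentration of a reagent after adding stock_vol of stock to sample_vol."""
    return stock_conc * stock_vol / (sample_vol + stock_vol)


def total_volume_for_target(start_conc, start_vol, target_conc):
    """Total volume at which a solute starting at start_conc reaches target_conc."""
    return start_conc * start_vol / target_conc


# Iodoacetamide: 8.3 microlitres of 175 mM stock added to a 50 microlitre sample.
iodoacetamide_mM = final_concentration(175, 8.3, 50)

# Urea: 50 microlitres at 8 M brought to 2 M
# (assumes only this aliquot contributes urea).
urea_total_ul = total_volume_for_target(8, 50, 2)
```

Under these assumptions the iodoacetamide concentration works out at about 24.9 mM, consistent with the stated 25 mM, and the urea reaches 2 M at a total volume of 200 microlitres.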

Peptide desalting prior to MS analysis
Peptide concentrations in the digested samples were estimated using a Nanodrop spectrophotometer measuring absorbance at 280 nm. Stop-and-go extraction (Stage) tips were made in-house, using a method adapted from Rappsilber et al. (55). Briefly, using an 18-gauge blunt-ended needle, two disks were cut from C18 solid-phase extraction material and placed on top of each other inside a 200-µl EasyLoad pipette tip (Greiner Bio-One). Stage tips were loaded into a custom-built tip holder over a deep 96-well plate. Methanol was added to each Stage tip, and tips were centrifuged at 300 × g for 2 min. Stage tips were then equilibrated with 0.1% (v/v) TFA and centrifuged at 500 × g for 5 min. Samples estimated to contain 10 µg of acidified peptide (based on Nanodrop readings) were then added to Stage tips, and tips were centrifuged at 500 × g for 5 min. Stage tips with bound protein were stored at −20°C until elution.

Desalted peptide elution
Peptides were eluted from C18-containing Stage tips with 40 µl of 80% (v/v) acetonitrile, 0.1% (v/v) TFA, and tips were centrifuged at 200 × g for 5 min. Acetonitrile was evaporated using a vacuum centrifuge. Samples were adjusted to 15 µl volume with 0.1% (v/v) TFA, and peptide concentration was re-measured using a Nanodrop spectrophotometer measuring absorbance at 280 nm.

MS data acquisition
'Bottom-up' liquid chromatography-coupled tandem MS (LC-MS/MS) was used to elucidate the structure of isolated peptides and to detect post-translational modifications. LC-MS/MS analysis was carried out using an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher Scientific) coupled to an UltiMate 3000 UHPLC Nano (Thermo Fisher Scientific), Aurora C18 column (IonOpticks), column oven (maintained at 50°C; Sonation) and Proxeon Nanospray ion source (Thermo Fisher Scientific). Peptides (1 µg) were injected onto an Aurora C18 column in buffer A (2% (v/v) acetonitrile, 0.5% (v/v) acetic acid) and eluted with a linear 120-min gradient of 2%-45% (v/v) buffer B (80% (v/v) acetonitrile, 0.5% (v/v) acetic acid). Eluting peptides were ionised in positive-ion mode before data-dependent analysis. A dynamic exclusion window of 30 s was enabled and lockmass was not used.

MS data analysis
The MS data were normalised and quantified using MaxQuant software (version 1.6.10.43) (56). MaxQuant quantifies proteins using a label-free technique, which calculates a normalised peptide abundance from ion signal intensities (57). Peptide lists were searched against the human UniProt Knowledgebase (version 2019_09) (58) and a common contaminants database using the Andromeda search engine implemented in MaxQuant. Cysteine carbamidomethylation was set as a fixed modification, and methionine oxidation, lysine oxidation, proline oxidation, N-terminal deamidation and protein N-terminal acetylation were set as variable modifications, with up to five modifications per peptide. Peptide identifications were matched between runs if they eluted within a time window of 0.7 min. Enzyme specificity was C-terminal to arginine and lysine, except when followed by proline. A maximum of two missed cleavages was permitted in the database search; minimum peptide length was seven amino acids. At least two peptide ratios were required for label-free quantification, and large label-free quantification ratios were stabilised. Peptide and protein false-discovery rates (FDRs) were set to 1%, determined by applying the target-decoy search strategy implemented in MaxQuant. Proteins matching to the common contaminants or decoy databases, and matches only identified by site, were omitted. Label-free quantification intensities were log2-transformed, and proteins quantified in fewer than one-third of samples were removed. Missing values were imputed from a width-compressed, down-shifted Gaussian distribution using Perseus (version 1.6.2.3) (59).
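The transformation, filtering and imputation steps described above can be illustrated with a minimal Python sketch. It is not the Perseus implementation. The function name, the choice to impute separately for each sample, and the width (0.3) and down-shift (1.8) factors are assumptions, since these settings are not stated in the text.

```python
import math
import random
import statistics


def log2_filter_impute(intensities, width=0.3, down_shift=1.8, seed=0):
    """Hypothetical helper: log2-transform, filter and impute LFQ intensities.

    intensities maps a protein name to a list of raw intensities, one per
    sample; 0 or None marks a missing value. width and down_shift are
    assumed settings, not values reported in the text.
    """
    rng = random.Random(seed)
    # log2 transformation; missing values stay missing
    logged = {p: [math.log2(v) if v else None for v in vals]
              for p, vals in intensities.items()}
    # remove proteins quantified in fewer than one-third of samples
    kept = {p: vals for p, vals in logged.items()
            if 3 * sum(v is not None for v in vals) >= len(vals)}
    if not kept:
        return kept
    n_samples = len(next(iter(kept.values())))
    for j in range(n_samples):
        # each sample needs at least two observed values to estimate spread
        observed = [vals[j] for vals in kept.values() if vals[j] is not None]
        mu = statistics.mean(observed)
        sd = statistics.stdev(observed)
        for vals in kept.values():
            if vals[j] is None:
                # draw from a narrowed Gaussian shifted towards low intensity
                vals[j] = rng.gauss(mu - down_shift * sd, width * sd)
    return kept
```

Shifting the distribution downwards reflects the assumption that values are missing mainly because the protein was below the detection limit in that sample.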

Functional enrichment analysis
Gene Ontology (GO) over-representation analysis was performed using DAVID 2021 (DAVID Knowledgebase, version 2022q2) (60). The functional annotation tool category of GO term enrichment was used to filter out very broad GO terms. GO over-representation analysis of matrisome proteins used the entire matrisome database as background. Significant over-representation of terms was determined using a Fisher's exact test with Benjamini-Hochberg correction.
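As an illustration of the statistics named above, the following Python sketch computes a one-sided Fisher's exact (hypergeometric) P-value for one GO term and applies the Benjamini-Hochberg adjustment to a list of P-values. The function names are hypothetical, and the score reported by DAVID may differ in detail from this plain version of the test.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided Fisher's exact P-value, P(X >= k).

    N: proteins in the background, K: background proteins annotated with
    the term, n: proteins in the query list, k: annotated query proteins.
    """
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A term would then be called over-represented when its adjusted P-value falls below the chosen threshold.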

Matrisome data analysis
Identified proteins were defined as belonging to the matrisome if they exist in searchable databases of matrisome proteins based on data from 17 studies of the ECM, MatrisomeDB (27), and in-silico and in-vivo data from the Matrisome Project (26). Label-free quantification intensities for proteins derived from tumour and non-tumour samples were compared using a paired two-sided Student's t-test, with FDR set to 5% and artificial within-groups variance (s0) set to 1 using Perseus. Statistical data were visualised using Prism (version 9.2.0; GraphPad). For differentially expressed matrisome proteins, potential confounding variables were also considered. These variables included necrotic tumour samples versus non-necrotic tumour samples, adenocarcinoma versus squamous cell carcinoma, well and moderately differentiated tumours versus poorly differentiated tumours, tumours with local lymph node spread versus no known lymph node metastasis and normal non-tumour samples versus non-tumour samples with underlying pathology. Necrosis was crudely assessed based on the composition of the tumour samples when dissected and minced. PET standardised uptake value was scored as mild, moderate or marked. In some instances, absolute values were provided in reports rather than categorical values. These numbers were reclassified as no uptake for 0.6-0.8, low uptake for 1.0-2.0, moderate uptake for 1.5-2.0 and high uptake for >2.5. These variables were compared using multiple Student's t-tests (P < 0.05, FDR 5%).
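The role of the artificial within-groups variance (s0) in the paired comparison can be sketched in Python as follows. This assumes the commonly used form in which s0 is added to the standard error in the denominator of the t-statistic; the function name is hypothetical, and the FDR control applied in Perseus is not reproduced here.

```python
import math
import statistics


def paired_t_s0(tumour, non_tumour, s0=1.0):
    """s0-modified paired t-statistic for one protein (illustrative only).

    tumour and non_tumour hold log2 intensities for matched samples,
    in the same patient order.
    """
    diffs = [t - n for t, n in zip(tumour, non_tumour)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    # s0 inflates the denominator, so small but consistent differences
    # are ranked below large fold changes
    return mean_diff / (std_err + s0)
```

With s0 set to 0 the expression reduces to the ordinary paired Student's t-statistic.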

Hierarchical cluster analysis
To enable relative comparison of protein enrichment, log2-transformed label-free quantification intensities were standardised by row-wise (protein-wise) Z-scoring. Differentially expressed matrisome proteins were hierarchically clustered on the basis of Euclidean distance, computed using average linkage, using Cluster 3.0 (C Clustering Library, version 1.54) (61). The following variables were included in the protein enrichment analysis: sex, age, smoking history, whether patients had died at the time of analysis, histopathology results, degree of tumour differentiation, tumour stage, lymph node stage and PET 18F-FDG tracer uptake. Modified peptide fold changes between matched non-tumour-tumour pairs were hierarchically clustered on the basis of Euclidean distance, computed using average linkage. For sample correlation analysis, Spearman rank correlation coefficient-based distance matrices were computed using average linkage. Clustering results were visualised using Java TreeView (version 1.1.5r2) (62).
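Row-wise Z-scoring followed by average-linkage clustering on Euclidean distance can be sketched in Python as below. This is a small illustrative implementation, not the Cluster 3.0 code; the function names are hypothetical, and the use of the sample standard deviation for Z-scoring is an assumption.

```python
import math
import statistics


def zscore_rows(matrix):
    """Standardise each row (protein) to mean 0 and unit standard deviation."""
    scored = []
    for row in matrix:
        mu = statistics.mean(row)
        sd = statistics.stdev(row)  # assumption: sample standard deviation
        scored.append([(x - mu) / sd for x in row])
    return scored


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(rows):
    """Agglomerative clustering; returns merges as (members, members, distance)."""
    clusters = [[i] for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average linkage: mean of all pairwise distances
                d = statistics.mean([euclidean(rows[i], rows[j])
                                     for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges
```

Z-scoring first means that proteins are grouped by the shape of their profile across samples rather than by absolute intensity.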

Interaction network analysis
Composite functional association networks were constructed using GeneMANIA (version 3.5.2; human interactions) (63) in Cytoscape (version 3.8.0) (64). Networks were based on reported physical and predicted protein-protein interactions; edges, representing protein-protein interactions, were weighted according to evidence of co-functionality using GeneMANIA. Networks were clustered using Markov clustering (granularity parameter 2.5), and graph layouts were determined using the force-directed algorithm in the Prefuse toolkit (65).
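The principle of Markov clustering can be illustrated with the short Python sketch below, which alternates expansion (matrix squaring) and inflation (element-wise powering followed by column normalisation) on a weighted adjacency matrix. It is a simplified illustration rather than the Cytoscape implementation; the function name is hypothetical, and treating the granularity parameter as the inflation exponent is an assumption.

```python
def markov_cluster(adjacency, inflation=2.5, iterations=50, tol=1e-6):
    """Cluster nodes of a weighted, symmetric adjacency matrix (list of lists)."""
    n = len(adjacency)

    def normalise(m):
        # make every column sum to 1 (column-stochastic matrix)
        for j in range(n):
            s = sum(m[i][j] for i in range(n))
            if s > 0:
                for i in range(n):
                    m[i][j] /= s
        return m

    # add self-loops so that every node can retain some flow
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalise(m)
    for _ in range(iterations):
        # expansion: flow spreads along paths of length two
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # inflation: strong flows are boosted, weak flows decay
        m = normalise([[x ** inflation for x in row] for row in m])
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > tol)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)
```

Larger inflation values generally yield more, smaller clusters, which is why the parameter is often described in terms of granularity.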

TMA and immunohistochemistry
The TMA was constructed for sequential patients undergoing surgical resection for lung cancer over a 2-year period at a regional thoracic surgery centre. Formalin-fixed paraffin pathological blocks were annotated by an experienced pathologist, and 1-mm cores taken from tumours and non-cancerous lung were embedded into new blocks and, subsequently, 4-µm sections were cut onto glass slides. Slides were deparaffinised and rehydrated, and antigen retrieval was undertaken with citrate buffer (#ab64214, Abcam) twice for 5 min in a microwave. Slides were processed with a commercial DAB cell and tissue staining kit (#CTS019, R&D Systems). Primary antibody immunostaining was optimised for polyclonal rabbit anti-human peroxidasin (#abx101905, Abbexa) and polyclonal rabbit anti-human ADAMTS16 (#TA322059, AMS Biotechnology) (diluted 1:100 and 1:200, respectively, and incubated at room temperature for 1 h). Secondary antibodies were incubated at room temperature for 30 min and DAB was developed. Slides were counterstained with haematoxylin and mounted, and images were acquired on an Axioscan microscope slide scanner (Zeiss).
For peroxidasin immunostaining, there were 151 non-cancerous lung cores that were available for evaluation (149 interpretable following staining) and 138 tumour cores, of which 119 were paired samples (Supplementary Table 3). For ADAMTS16 immunostaining, there were 150 available non-cancerous lung cores (148 interpretable for inflammatory cells and 149 for non-tumour cells) and 155 available tumour cores (154 interpretable for tumour stroma and 152 for tumour cells), of which 133 were paired samples (Supplementary Table 3). Slides were scored for intensity of staining of tumour cells or stromal areas on a four-point scale: 0, no staining; 1, low staining; 2, moderate staining; 3, high staining. Data are presented as the proportion of positive staining within each category of staining for tumour and normal samples. Significant difference in the distribution of staining scores was determined by a chi-square test. Outcome data were recorded for each patient who had a tumour resection, with the median time to follow-up of 1,432 days (range 1,054-1,905 days). Outcome included a record of death as defined by the clinical care team. Immunohistochemistry (IHC) scoring was assessed against survival for tumour staining.
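The comparison of staining score distributions can be sketched as a chi-square test of independence on a table with one row per tissue type and one column per staining score. The Python sketch below uses hypothetical function names and assumes that every row and column total is positive. The closed-form P-value applies only to 3 degrees of freedom, which is the case for two tissue types scored on a four-point scale.

```python
import math


def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a contingency table.

    table is a list of rows of counts, e.g. [tumour_counts, normal_counts],
    each with one count per staining score (0, 1, 2, 3).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


def chi2_p_value_dof3(stat):
    """Upper-tail P-value of the chi-square distribution with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(stat / 2))
            + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))
```

For tables of other dimensions, a general chi-square survival function from a statistics library would be needed in place of the closed form.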

Differential gene expression analysis
Gene expression data were derived from RNA-seq datasets of primary tissue samples from cohorts of lung adenocarcinoma or lung squamous cell carcinoma patients extracted from TCGA, the Genotype-Tissue Expression repository and the Therapeutically Applicable Research to Generate Effective Treatments database using TNMplot (36). Data from tumour samples were compared to paired data from matched adjacent normal samples.

Data availability
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD041066.

Survival analysis
Kaplan-Meier curves were computed from RNA-seq datasets of primary tissue samples from cohorts of lung adenocarcinoma or lung squamous cell carcinoma patients extracted from TCGA using KMplot (66). The lower and upper quartiles of gene expression were compared. Follow-up time was truncated to retain at least 10 patients at risk.
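For reference, the Kaplan-Meier estimator itself can be sketched in a few lines of Python. This is an illustration of the estimator, not the KMplot code; the function name and the `min_at_risk` parameter are hypothetical, and the way follow-up is truncated here (stopping once fewer than `min_at_risk` patients remain at risk) is an assumption about how such a rule might be applied.

```python
def kaplan_meier(times, events, min_at_risk=1):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up time per patient; events: 1 if the patient died at
    that time, 0 if the observation was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        if at_risk < min_at_risk:
            break  # truncate follow-up when too few patients remain
        t = data[i][0]
        at_this_time = [e for tt, e in data if tt == t]
        deaths = sum(1 for e in at_this_time if e)
        if deaths:
            # multiply by the conditional probability of surviving time t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= len(at_this_time)
        i += len(at_this_time)
    return curve
```

Curves for the lower and upper expression quartiles would be obtained by calling the function once for each group of patients.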

Data availability statement
The original contributions presented in the study are publicly available. These data can be found here: https://proteomecentral.proteomexchange.org, PXD041066.

Ethics statement
Patient samples were collected with ethical approval and written patient consent. Ethical approval was granted by Lothian NRS Bioresource, REC number 15/ES/0094 (reference SR419). All samples were assigned an anonymised code, and researchers were blinded to patient details for experiments. The TMA dataset was approved by Lothian NRS Bioresource, REC number 15/ES/0094 (reference SR1208), and approved by the NHS Lothian Caldicott Guardian (reference CRD19031).

Author contributions
HT, AK, JW, RO'C, AB and AA undertook experimentation and data generation. HT, AK, JW, RO'C, KD, MF, AB and AA interpreted the data and conceptualised the manuscript. AA, SP and DD constructed the TMA, retrieved the clinical data and provided samples for ECM extraction. HT, AB and AA wrote the manuscript. AB and AA jointly supervised the work. All authors contributed to the article and approved the submitted version.

Funding
HT was funded by the EPSRC and MRC Centre for Doctoral Training in Optical Medical Imaging (EP/L016559/1). AK was supported by a Wellcome Trust multiuser equipment grant (208402/Z/17/Z). MF was funded by Cancer Research UK (C157/A15703 and C157/A24837). AA is supported by a Cancer Research UK Clinician Scientist Fellowship (A24867).