Cathelicidin-3 Associated With Serum Extracellular Vesicles Enables Early Diagnosis of a Transmissible Cancer

The identification of practical early diagnostic biomarkers is a cornerstone of improved cancer prevention and treatment. A case in point is devil facial tumor disease (DFTD), a highly lethal transmissible cancer afflicting virtually an entire species, the Tasmanian devil (Sarcophilus harrisii). Despite a latent period that can exceed one year, DFTD diagnosis still requires visual identification of tumor lesions. To enable earlier diagnosis, which is essential for the implementation of effective conservation strategies, we analyzed the extracellular vesicle (EV) proteome of 87 Tasmanian devil serum samples using data-independent acquisition mass spectrometry. The antimicrobial peptide cathelicidin-3 (CATH3), released by innate immune cells, was enriched in serum EV samples both of devils with clinical DFTD (87.9% sensitivity and 94.1% specificity) and of devils with latent infection (i.e., sampled while overtly healthy, 3-6 months before subsequent DFTD diagnosis; 93.3% sensitivity, 94.1% specificity and 93.8% accuracy). Although high expression of antimicrobial peptides has mostly been associated with inflammatory diseases, our results suggest that these peptides can also serve as accurate cancer biomarkers and may play a mechanistic role in tumorous processes. This EV-based approach to biomarker discovery is applicable to improving the understanding and diagnosis of a broad range of diseases in other species, and these findings enhance the capacity of conservation strategies to ensure the viability of the imperiled Tasmanian devil population.


INTRODUCTION
Cancer is a condition that affects all multicellular species, with differing degrees of susceptibility. One of the main challenges in oncology is the lack of diagnostic tools for the early detection of cancerous processes. Cancer diagnosis commonly relies on biomarkers present either in identified cancerous masses (solid biopsy) or in the bodily fluids of the patient (liquid biopsy). Over the past decade, liquid biopsies have gained increasing attention relative to traditional solid biopsies as a source of cancer biomarkers, largely because of their potential for early disease detection (1). One approach increasingly used in liquid biopsies is the analysis of extracellular vesicles (EVs). EVs are nano-sized structures enclosed by a lipid bilayer and released by virtually all cells. They mediate intercellular communication, including mechanisms of cancer progression (2), via functional cargo such as proteins, lipids, and nucleic acids (3).
EVs are a promising biomarker source because they are accessible from almost all bodily fluids (4). EV-based assays have shown high sensitivity and specificity in cancer diagnosis and prognosis (5-7), and EVs have organotropic characteristics that may indicate organ-specific metastasis (8). Further, EV cargo is biologically stable because the lipid bilayer protects it from enzymatic degradation (9). Given these advantages, there has been considerable interest in the molecular analysis of EVs as an approach to cancer biomarker discovery in liquid biopsies. Proteins are a well-studied class of EV cargo (1), and isolating EVs from serum enables the enrichment and detection of a wider range of proteins that would otherwise be masked by high-abundance serum/plasma proteins (10). Several EV protein biomarkers enabling early diagnosis of human cancers have been identified to date (7,11,12).
Cancer is commonly understood as a disease of the individual, as tumors usually emerge and die with their hosts. However, in several transmissible cancers, tumor cells have acquired the capacity to be transmitted from one individual to another as allografts (13). Like other infectious diseases, transmissible cancers become a health problem at the population level, even to the point of threatening populations with extinction. One such case is devil facial tumor disease (DFTD), which affects the Tasmanian devil (Sarcophilus harrisii; herein 'devil'). Since DFTD was first identified in 1996, the disease has spread across more than 90% of the devils' range, leading to an 82% decline in local densities and reducing the total population to as few as 16,900 individuals (14). Because of the high mortality and epidemic nature of DFTD, the Tasmanian devil was listed as endangered by the International Union for Conservation of Nature in 2008 and is protected by both Tasmanian State and Australian Federal legislation (15). DFTD is a clonal cancer of Schwann cell origin that is transmitted as a malignant tissue transplant among devils through bites (16,17). It is almost always lethal, killing its host within 6 to 12 months after tumors become clinically apparent on the facial, oral and neck regions (18). A second transmissible cancer (DFT2), also of Schwann cell origin, was reported in 2016 (19). In this manuscript, DFTD refers to the transmissible cancer identified in 1996. DFTD is currently diagnosed by the appearance of macroscopic tumors, followed by laboratory confirmation based on positive periaxin staining, karyotypic aberrations, and PCR of tumor biopsies (20,21). However, there is direct evidence that DFTD has a long latent period, as devils can develop tumors 3 to 13 months after initial exposure to the disease (22). McCallum et al. (23) suggested that the disease is unlikely to spread between individuals before clinical signs develop; however, this assumption has not been validated owing to the lack of a preclinical test. In an effort to identify DFTD serum biomarkers that could potentially serve to predict preclinical stages, Karu et al. (24) demonstrated that a panel of fibrinogen peptides and seven metabolites could differentiate devils with overt DFTD from healthy controls with high sensitivity and specificity. Another study found elevated levels of the receptor tyrosine-protein kinase ERBB3 in the serum of DFTD-infected devils compared with healthy controls (25). Despite the potential value of these serum biomarkers for DFTD diagnosis, neither study confirmed its findings in samples from devils in the latent stage of DFTD (3 to 13 months before the clinical manifestation of tumors). The discovery and validation of a biomarker for early DFTD diagnosis would greatly improve the capacity for DFTD surveillance and population management, and could ultimately assist in recovering devil numbers in wild populations.
To enable the preclinical diagnosis of DFTD, in this study we analyzed the proteome of EVs derived from devil serum collected over five years of quarterly trapping expeditions at several remote field sites in Tasmania. The longitudinal nature of this long-term monitoring program allowed the collection of serum from devils during the presumed "latent period", i.e., while they were clinically healthy (no palpable or visible tumor masses), 3-6 months before their subsequent recapture and clinical diagnosis of DFTD. We included EV samples from three classes of wild devils: devils with clinically diagnosed, overt DFTD; devils sampled in the presumed latent stage of DFTD infection, before their clinical diagnosis (herein "latent"); and healthy devils from an offshore island population isolated from DFTD. Captive devils never exposed to DFTD were also included as healthy controls. These samples were divided into discovery and validation cohorts to identify DFTD biomarkers that would enable early detection using serum collected during routine monitoring.

Serum Samples
The study comprised two phases: proteomic analysis of a discovery cohort, followed by a validation cohort (Table 1). The discovery phase aimed to identify potential EV-associated protein biomarkers for DFTD using a cohort of 12 DFTD-infected devils and 10 healthy controls. The DFTD-infected devils were considered to be in advanced (mid-late) stages of the disease based on their large tumor volumes (15 ml to 161 ml). Tumor volumes were calculated with the ellipsoid formula described by Ruiz-Aravena et al. (26) from the length, width, and depth of each DFTD tumor. Because DFTD-infected devils often present more than one tumor at multiple body locations, total tumor volume was calculated by summing the volumes of all tumors present at the time of sampling. The second phase was designed to validate the first-phase data in an independent cohort and to further investigate the potential biomarkers in preclinical, presumed latent DFTD devils. The validation cohort comprised 17 healthy controls, 15 latent (preclinical) DFTD-infected devils, and 33 confirmed DFTD-infected devils at different clinical stages of the disease. Of these 33 devils, 17 were sub-classified as early stage (tumor volumes from 0.05 ml to 2.63 ml), 14 as medium stage (tumor volumes from 5.0 ml to 40.73 ml), and 2 as late stage (tumor volumes of 26 ml and 56 ml). The animal with a 26 ml tumor volume was categorized as late rather than medium stage because it was emaciated and had to be euthanized. The samples from presumed latent devils were collected 3 to 6 months before the confirmed diagnosis of DFTD and are herein referred to as "latent" (Table 1). The serum samples of the DFTD-infected devils used in both phases of the study were collected from two wild populations in northwest Tasmania during 10-day field expeditions every 3 months between February 2015 and August 2019 (Table 1).
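The tumor-volume step can be illustrated with a short sketch. It assumes the standard ellipsoid formula V = (π/6) × L × W × D, with the three diameters measured in millimetres (1,000 mm³ = 1 ml); the exact variant and units used by Ruiz-Aravena et al. (26) may differ, and the function names and measurements below are hypothetical.

```python
import math


def ellipsoid_volume_ml(length_mm: float, width_mm: float, depth_mm: float) -> float:
    """Ellipsoid volume from three perpendicular diameters, converted to ml.

    V = (pi / 6) * L * W * D, which equals (4/3) * pi * (L/2) * (W/2) * (D/2).
    Dividing mm^3 by 1000 converts to cm^3, i.e. ml.
    """
    return math.pi / 6.0 * length_mm * width_mm * depth_mm / 1000.0


def total_tumor_volume_ml(tumors: list[tuple[float, float, float]]) -> float:
    """Sum the volumes of every tumor present at one sampling event."""
    return sum(ellipsoid_volume_ml(l, w, d) for l, w, d in tumors)


# Hypothetical devil with two tumors, (length, width, depth) in mm.
tumors = [(30.0, 20.0, 15.0), (10.0, 10.0, 5.0)]
print(round(total_tumor_volume_ml(tumors), 2))  # 4.97
```

Summing per-tumor volumes, rather than measuring only the largest mass, matches the text's handling of devils with tumors at several body locations.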
The serum samples of the healthy cohort were obtained from captive devils held at Bonorong Wildlife Sanctuary and Richmond facilities (discovery cohort; samples collected between 2018 and 2019) and from wild devils in a DFTD-free insurance population (validation cohort; samples collected between 2014 and 2015) (Table 1). Because DFTD-induced extinction was a genuine concern predicted by mathematical and epidemiological models (23), government managers established this DFTD-free insurance population of wild devils on Maria Island, an isolated island off Tasmania's east coast (27). Blood (0.3-1 ml) was collected from conscious (wild) or anesthetized (captive) devils by venipuncture of either the jugular or the marginal ear vein and transferred into plain or clot-activator tubes. Within a maximum of approximately five hours, samples were centrifuged at 1,000 g for 10 minutes, and the serum was pipetted off and stored frozen at -20°C (short-term storage, up to 3 months) or -80°C (long-term storage, up to 6 years) until further use. All animal procedures were performed under a Standard Operating Procedure approved by the General Manager, Natural and Cultural Heritage Division, Tasmanian Government Department of Primary Industries, Parks, Water, and the Environment and under the auspices of the University of Tasmania Animal Ethics Committee (permit numbers A0017550, A0012513, A0013326, and A0015835).

Extracellular Vesicle Purification
Serum samples were thawed on ice, and 500 µl and 300 µl of serum were used for the discovery and validation cohorts, respectively. The serum samples were first centrifuged at 1,500 g for 10 minutes at 4°C to remove cells and debris, and then at 10,000 g for 10 minutes at 4°C to pellet larger extracellular vesicles. The supernatant was collected and immediately subjected to size exclusion chromatography on qEV2/35nm columns (IZON) following the manufacturer's instructions. Briefly, after a 14 ml void volume was collected, EVs were eluted in phosphate buffered saline (PBS) containing 0.05% sodium azide as eight 1 ml fractions, which were pooled. The EV samples were concentrated with Amicon Ultra-15 centrifugal filters (MWCO 100 kDa) to a final volume of 1 ml and stored in 500 µl aliquots at -80°C until further use.
Table 1 footnotes: Control, healthy devils never exposed to DFT1; IQR, interquartile range; *, captive holding facilities; NA, not applicable.

Transmission Electron Microscopy
Copper TEM grids with a formvar-carbon support film (GSCU300CC-50, ProSciTech, Qld, Australia) were glow discharged for 60 seconds in an Emitech K950X with a K350 attachment. Two 5 µl drops of EV suspension were pipetted onto each grid, allowed to adsorb for at least 30 seconds, and then blotted with filter paper. The adsorbed particles were negatively stained with two successive drops of 2% uranyl acetate, each blotted off after 10 seconds. Grids were then allowed to dry before imaging with a JEOL JEM-2100 transmission electron microscope (JEOL Australasia Pty Ltd) equipped with a Gatan Orius SC200 CCD camera (Scitek Australia).

Nanoparticle Tracking Analysis (ZetaView)
EV size distribution and concentration were determined using a ZetaView PMX-120 nanoparticle tracking analyzer (Particle Metrix, Inning am Ammersee, Germany) with ZetaView Analyze software version 8.05.12. Before measurement, the system was calibrated according to the manufacturer's instructions with 100 nm Nanosphere 3100A size standards (Thermo Fisher Scientific). Measurements were performed in scatter mode, with the cell temperature maintained at 25°C. Each sample was diluted in PBS to a final volume of 1 ml. Capture settings were sensitivity 80, shutter 100, and frame rate 30; post-acquisition settings were minimum trace length 10, minimum brightness 30, minimum area 5, and maximum area 1000.
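NTA reports the particle concentration of the diluted suspension in the measurement cell. Because the dilution factor is not reported here, the sketch below uses hypothetical values to show how a reading could be converted back to particles per ml of the original serum, given that 500 µl or 300 µl of serum yielded a 1 ml EV preparation. Whether this correction is needed depends on whether the dilution was already entered in the analysis software; the function name is hypothetical.

```python
def particles_per_ml_serum(measured_per_ml: float, dilution_factor: float,
                           ev_prep_volume_ml: float, serum_input_ml: float) -> float:
    """Back-calculate particle concentration in the original serum.

    measured_per_ml   -- concentration reported for the diluted sample (particles/ml)
    dilution_factor   -- fold dilution of the EV preparation in PBS before measurement
    ev_prep_volume_ml -- final volume of the concentrated EV preparation (ml)
    serum_input_ml    -- volume of serum processed by size exclusion chromatography (ml)
    """
    prep_per_ml = measured_per_ml * dilution_factor    # particles/ml in the EV preparation
    total_particles = prep_per_ml * ev_prep_volume_ml  # particles recovered overall
    return total_particles / serum_input_ml            # particles per ml of serum


# Hypothetical validation-cohort sample: 300 µl serum -> 1 ml EV prep, measured at 1:1000.
print(f"{particles_per_ml_serum(2.0e7, 1000, 1.0, 0.3):.3e}")  # 6.667e+10
```

Normalizing to input serum volume makes counts comparable between the discovery (500 µl) and validation (300 µl) cohorts.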

Sample Preparation
EV samples (500 µl aliquots) were thawed on ice, mixed with acetonitrile to a final concentration of 50% (v/v), and evaporated in a centrifugal vacuum concentrator to recover EV sample proteins for mass spectrometry analysis. The EV-associated proteins were resuspended in 150 µl of denaturation buffer (7 M urea and 2 M thiourea in 40 mM Tris, pH 8.0). Protein concentration was measured with the EZQ Protein Quantitation Kit (Thermo Fisher Scientific), and 30 µg of protein from each sample was reduced with 10 mM dithiothreitol overnight at 4°C. The next day, EV protein samples were alkylated with 50 mM iodoacetamide for 2 hours at ambient temperature in the dark and then digested into peptides with 1.2 µg of proteomics-grade trypsin/Lys-C (Promega) according to the SP3 protocol described by Hughes et al. (28). EV peptide samples were desalted using ZipTips (Merck) according to the manufacturer's directions.
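The amounts above imply an enzyme:protein ratio of 1:25 (w/w). As a small worked sketch, the volume of resuspended EV protein needed for the 30 µg load follows from the EZQ concentration; the concentration used below and the function names are hypothetical.

```python
def load_volume_ul(target_protein_ug: float, conc_ug_per_ul: float) -> float:
    """Volume of resuspended EV protein that contains the target amount."""
    if conc_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_protein_ug / conc_ug_per_ul


def protein_to_enzyme_ratio(protein_ug: float, enzyme_ug: float) -> float:
    """Protein:enzyme mass ratio; 25.0 means 1:25 enzyme:protein (w/w)."""
    return protein_ug / enzyme_ug


# 30 µg protein digested with 1.2 µg trypsin/Lys-C, as in the protocol above.
print(protein_to_enzyme_ratio(30.0, 1.2))  # 25.0
# Hypothetical EZQ reading of 0.4 µg/µl in the 150 µl resuspension.
print(load_volume_ul(30.0, 0.4))  # 75.0
```

With a 150 µl resuspension volume, samples below 0.2 µg/µl could not supply the full 30 µg load.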

High-pH Peptide Fractionation
A sample-specific peptide spectral library was created for devil serum EVs using off-line high-pH fractionation. A pooled peptide sample (180 µg) composed of aliquots of each EV sample from the discovery cohort (n=22 individuals) was desalted with Pierce desalting spin columns (Thermo Fisher Scientific) according to the manufacturer's guidelines. The sample was evaporated to dryness, resuspended in 25 µl of HPLC loading buffer (2% acetonitrile with 0.05% TFA), and injected onto a 100 x 1 mm Hypersil GOLD HPLC column (particle size 1.9 µm). Peptides were separated on an Ultimate 3000 RSLC system with micro-fractionation and automated sample concatenation enabled, at 30 µl/min with a 40 min linear gradient from 96% mobile phase A (water containing 1% triethylamine, adjusted to pH 9.6 with acetic acid) to 50% mobile phase B (80% acetonitrile with 1% triethylamine). The column was then washed in 90% mobile phase B and re-equilibrated in 96% mobile phase A for 8 minutes. Sixteen concatenated fractions were collected into 0.5 ml low-bind Eppendorf tubes, evaporated to dryness, and reconstituted in 12 µl of HPLC loading buffer.
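Fraction concatenation usually pools primary fractions that lie far apart in the high-pH gradient, so that each pool spans the full elution range of the first dimension. The sketch below assumes a simple round-robin scheme and a hypothetical number of primary collection intervals (48); the exact scheme implemented by the Ultimate 3000 fraction collector is not described here. It also computes %B at a given time for the 40 min linear gradient from 4% B (96% A) to 50% B.

```python
def concatenate_fractions(n_primary: int, n_pools: int) -> list[list[int]]:
    """Round-robin concatenation: pool k receives fractions k, k + n_pools, k + 2*n_pools, ...

    Each pool therefore combines early-, mid- and late-eluting peptides.
    """
    pools: list[list[int]] = [[] for _ in range(n_pools)]
    for i in range(n_primary):
        pools[i % n_pools].append(i)
    return pools


def percent_b(t_min: float, start_b: float = 4.0, end_b: float = 50.0,
              duration_min: float = 40.0) -> float:
    """%B of a linear gradient at time t (clamped to the gradient window)."""
    t = min(max(t_min, 0.0), duration_min)
    return start_b + (end_b - start_b) * t / duration_min


pools = concatenate_fractions(48, 16)  # 48 primary fractions is a hypothetical value
print(pools[0], pools[15])  # [0, 16, 32] [15, 31, 47]
print(percent_b(20.0))  # 27.0
```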

Mass Spectrometry - Data-Dependent Acquisition
Peptide fractions were analyzed by nanoflow HPLC-MS/MS using an Ultimate 3000 nano RSLC system (Thermo Fisher Scientific) coupled to a Q-Exactive HF mass spectrometer fitted with a Nanospray Flex ion source (Thermo Fisher Scientific) and controlled with Xcalibur software (version 4.3). Approximately 1 µg of each fraction was injected, preconcentrated onto a 20 mm x 75 µm PepMap 100 C18 trapping column, and separated on a 250 mm x 75 µm PepMap 100 C18 analytical column held at 45°C, using a 90-minute segmented gradient at a flow rate of 300 nl/min. Data acquisition parameters set in MS Tune software (version 2.9) were: 2.0 kV spray voltage, S-lens RF level of 60, and heated capillary temperature of 250°C. MS1 spectra (390-1500 m/z) were acquired at a resolution of 120,000, followed by MS2 scans using a Top15 DDA method with 30-second dynamic exclusion of fragmented precursors. MS2 spectra were acquired at a resolution of 15,000 using an AGC target of 2e5, a maximum injection time (IT) of 28 ms, and a normalized collision energy of 30.
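The Top15 method with 30 s dynamic exclusion can be pictured as a selection loop: after each MS1 scan, the most intense precursors not fragmented within the last 30 s are chosen for MS2. The sketch below is a simplified model, not the instrument's acquisition logic; the 10 ppm matching tolerance and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TopNSelector:
    """Simplified Top-N precursor selection with dynamic exclusion.

    Real acquisition software also applies charge-state, isotope and
    intensity-threshold filters; those are omitted here.
    """
    n: int = 15                # Top15, as in the method above
    exclusion_s: float = 30.0  # 30 s dynamic exclusion, as above
    tol_ppm: float = 10.0      # hypothetical m/z matching tolerance
    _excluded: list[tuple[float, float]] = field(default_factory=list, init=False, repr=False)

    def _is_excluded(self, mz: float) -> bool:
        return any(abs(mz - ex_mz) / ex_mz * 1e6 <= self.tol_ppm for ex_mz, _ in self._excluded)

    def select(self, precursors: list[tuple[float, float]], now_s: float) -> list[float]:
        """Pick up to n precursor m/z values from one MS1 scan of (mz, intensity) pairs."""
        # Forget exclusions whose 30 s window has expired.
        self._excluded = [(mz, t_end) for mz, t_end in self._excluded if t_end > now_s]
        chosen: list[float] = []
        for mz, _intensity in sorted(precursors, key=lambda p: p[1], reverse=True):
            if len(chosen) == self.n:
                break
            if self._is_excluded(mz):
                continue
            chosen.append(mz)
            self._excluded.append((mz, now_s + self.exclusion_s))
        return chosen


sel = TopNSelector(n=2)  # n=2 only to keep the example small
scan = [(500.0, 1e6), (600.0, 5e5), (700.0, 1e5)]
print(sel.select(scan, now_s=0.0))   # [500.0, 600.0]
print(sel.select(scan, now_s=10.0))  # [700.0]
print(sel.select(scan, now_s=31.0))  # [500.0, 600.0]
```

Dynamic exclusion is what lets lower-abundance precursors, such as the 700 m/z ion in the second scan, be fragmented instead of repeatedly re-sampling the most intense ions.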

Mass Spectrometry - Data-Independent Acquisition
Individual EV peptide samples were analyzed by nanoflow HPLC-MS/MS using the instrumentation and LC gradient conditions described above, but in DIA mode. MS1 spectra (390-1240 m/z) were acquired at a resolution of 120,000, followed by sequential MS2 scans across 26 DIA windows of 25 amu each over the range of 397.5-1027.5 m/z, with a 1 amu overlap between adjacent windows. MS2 spectra were acquired at a resolution of 30,000 using an AGC target of 1e6, a maximum IT of 55 ms, and a normalized collision energy of 27.

Proteomic Database Search
Both DDA-MS and DIA-MS raw files were processed using Spectronaut software (version 13.12, Biognosys AB). The sample-specific library was generated with the Pulsar search engine by searching DDA MS2 spectra against the Sarcophilus harrisii UniProt reference proteome (22,388 entries, last modified in August 2020). Spectral libraries were generated using default software (BGS factory) settings, including N-terminal acetylation and methionine oxidation as variable modifications, cysteine carbamidomethylation as a fixed modification, up to two missed cleavages, and false discovery rate thresholds of 0.01 at the peptide, protein and PSM levels. For protein identification and relative quantitation between samples, DIA-MS data were processed with BGS factory settings, except that single-hit proteins were excluded. For uncharacterized proteins, the protein sequences provided by UniProt were searched against the Tasmanian devil reference genome (GCA_902635505.1 mSarHar1.11) using the online NCBI protein BLAST tool (29).
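The 0.01 thresholds are false discovery rate (FDR) cut-offs. Search engines commonly estimate FDR with a target-decoy strategy: spectra are also matched against reversed or shuffled "decoy" sequences, and decoy hits above a score threshold estimate the number of false target hits. The generic sketch below illustrates only this idea; it is not Pulsar's implementation, which uses its own scoring and protein-level procedures. The function name and scores are hypothetical.

```python
def target_decoy_qvalues(scores: list[float], is_decoy: list[bool]) -> list[float]:
    """q-values from target-decoy competition (higher score = better match).

    At each score threshold the FDR is estimated as
    (#decoy hits at or above it) / (#target hits at or above it);
    a hit's q-value is the smallest FDR at which it would still be accepted.
    Ties are not handled specially in this sketch.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr = [0.0] * len(scores)
    n_target = n_decoy = 0
    for i in order:  # walk from best to worst score
        if is_decoy[i]:
            n_decoy += 1
        else:
            n_target += 1
        fdr[i] = n_decoy / max(n_target, 1)
    qvals = [0.0] * len(scores)
    running_min = float("inf")
    for i in reversed(order):  # walk from worst to best, keeping the running minimum
        running_min = min(running_min, fdr[i])
        qvals[i] = running_min
    return qvals


scores = [10.0, 9.0, 8.0, 7.0, 6.0]
decoy = [False, False, True, False, True]
q = target_decoy_qvalues(scores, decoy)
accepted = [i for i, (qi, d) in enumerate(zip(q, decoy)) if qi <= 0.01 and not d]
print(accepted)  # [0, 1]
```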

Statistical Analysis
Spectronaut protein quantification pivot reports, including protein descriptions, gene names and UniProt accession numbers, were created for the discovery, validation, and combined datasets. The combined dataset includes the discovery and validation datasets and was used to search for EV protein markers recommended by the Minimal Information for Studies of Extracellular Vesicles 2018 (MISEV2018) guidelines (30) and to evaluate the relationship between EV-associated proteins and tumor volume. The protein quantitation pivot reports were uploaded into Perseus software (version 1.6.10.50) for further data processing and statistical analysis. Quantitative values were log2-transformed, and proteins were filtered according to the number of valid values: a protein was retained only if it had valid values in ≥70% of samples in at least one group (discovery: control/diseased; validation: control/latent/early/advanced; combined: captive healthy/wild healthy/latent/early/medium/late). Remaining missing values were imputed with random low-abundance intensity values drawn from a normal distribution, using default Perseus settings. The filtered proteins in the discovery and validation datasets were tested for differential expression using two-tailed Student's t-tests with a permutation-based false discovery rate (FDR) controlled at 5% and an s0 value of 0.1 to exclude proteins with very small differences between means. Significantly upregulated EV-associated proteins from the filtered datasets were exported from Perseus and analyzed in R version 3.6.2 (31). The utility of each upregulated discovery dataset protein as a disease status classifier was evaluated by receiver operating characteristic (ROC) curve analysis of healthy versus diseased sample values, calculating the area under the curve, sensitivity, specificity, and accuracy with bootstrapped confidence intervals.
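The preprocessing and testing steps above can be sketched in NumPy. The imputation mirrors the idea behind Perseus' default "replace missing values from normal distribution" (width 0.3 and down shift 1.8 standard deviations, applied per sample), and the t statistic adds s0 to the standard error in the SAM style; this is not Perseus' code, and the permutation-based FDR (which reshuffles group labels to build a null distribution for this statistic) is omitted for brevity. Function names are hypothetical.

```python
import numpy as np


def filter_valid_values(log2_mat: np.ndarray, groups, min_frac: float = 0.7):
    """Keep proteins (rows) with >= min_frac valid values in at least one group.

    log2_mat -- proteins x samples array of log2 intensities, np.nan = missing
    groups   -- one group label per sample (column)
    """
    groups = np.asarray(groups)
    keep = np.zeros(log2_mat.shape[0], dtype=bool)
    for g in np.unique(groups):
        frac_valid = np.mean(~np.isnan(log2_mat[:, groups == g]), axis=1)
        keep |= frac_valid >= min_frac
    return log2_mat[keep], keep


def impute_low_abundance(log2_mat: np.ndarray, width: float = 0.3,
                         downshift: float = 1.8, seed: int = 0) -> np.ndarray:
    """Replace NaNs per sample with draws from a narrowed, down-shifted normal.

    Mean = sample mean - downshift * SD and spread = width * SD, so imputed
    values sit at the low end of that sample's intensity distribution.
    """
    rng = np.random.default_rng(seed)
    out = log2_mat.copy()
    for j in range(out.shape[1]):
        col = out[:, j]  # view into `out`, so assignments modify it in place
        missing = np.isnan(col)
        valid = col[~missing]
        mu, sd = valid.mean(), valid.std()
        col[missing] = rng.normal(mu - downshift * sd, width * sd, missing.sum())
    return out


def t_with_s0(a, b, s0: float = 0.1) -> float:
    """Two-sample Student's t statistic with s0 added to the standard error."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    se = np.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return float((a.mean() - b.mean()) / (se + s0))
```

Adding s0 to the denominator shrinks the statistic most for proteins with tiny standard errors, which is how small but very consistent differences between means are kept from reaching significance.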
Classification cut-off values were determined using Youden's index. Discovery dataset proteins with an area under the ROC curve greater than 0.9 for disease status classification were then, if detected in the validation dataset, evaluated by ROC curve analysis in that dataset (excluding the latent samples). Proteins with an area under the ROC curve greater than 0.9 in both the discovery and validation datasets were then tested on the latent cohort versus wild healthy controls, using protein abundance cut-off values trained in the validation cohort to distinguish DFTD-infected devils from healthy controls. Kendall rank correlation was used to test for correlations between protein abundance in the combined dataset and tumor volume. Linear models were used to test for associations between the levels of EV-associated proteins and tumor burden, calculated as total tumor mass divided by (body weight minus total tumor mass) and expressed as a percentage. Tumor mass was calculated assuming a tumor density of 1.1 g per ml of tumor volume, as described by Ruiz-Aravena et al. (26).
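A minimal sketch of the cut-off logic, assuming a sample is called positive when its abundance is at or above the threshold: Youden's index selects the threshold that maximizes sensitivity + specificity - 1 in the training comparison, and that fixed threshold is then reused unchanged in a new comparison (here, as in the study design, latent devils versus healthy controls). AUC and bootstrapped confidence intervals are omitted, the data are synthetic, and the function names are hypothetical.

```python
import numpy as np


def youden_cutoff(scores, labels):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1.

    A sample is called DFTD-positive when its score >= threshold.
    labels: 1 = DFTD-infected, 0 = healthy.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):  # every observed value is a candidate cut-off
        pred = scores >= t
        sens = pred[labels == 1].mean()
        spec = (~pred[labels == 0]).mean()
        if sens + spec - 1 > best_j:
            best_t, best_j = float(t), sens + spec - 1
    return best_t, float(best_j)


def apply_cutoff(scores, labels, threshold):
    """Sensitivity, specificity and accuracy of a fixed (pre-trained) threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pred = scores >= threshold
    tp = int(np.sum(pred & (labels == 1)))
    tn = int(np.sum(~pred & (labels == 0)))
    sens = tp / int(np.sum(labels == 1))
    spec = tn / int(np.sum(labels == 0))
    return sens, spec, (tp + tn) / len(labels)


# Synthetic log2 abundances, not study data.
train_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # e.g. DFTD-infected vs healthy
train_labels = [0, 0, 0, 1, 1, 1]
cut, j = youden_cutoff(train_scores, train_labels)
print(cut, j)  # 4.0 1.0
# The trained cut-off is applied unchanged to a new comparison.
print(apply_cutoff([3.5, 4.5, 5.0, 2.0], [0, 1, 1, 0], cut))  # (1.0, 1.0, 1.0)
```

Reusing a threshold trained on overt disease, rather than re-optimizing it on the latent samples, avoids an optimistic bias in the latent-stage estimates.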

Proteome of Extracellular Vesicles Derived From DFTD Cultured Cells In Vitro
To identify possible signals from DFTD tumors, an EV proteome database derived from cultured DFTD cells was used to identify proteins in serum EVs that may originate from DFTD cells (32). Proteins upregulated in EVs from cultured DFTD cells relative to EVs from healthy fibroblasts were compared with proteins upregulated in serum EVs of DFTD-infected devils relative to healthy controls in the discovery cohort.
We have submitted all relevant data of our experiments to the EV-TRACK knowledgebase (EV-TRACK ID: EV220126) (33).

Characterization of EVs Derived From Tasmanian Devil Serum
First, we used size exclusion chromatography columns to isolate extracellular vesicles from serum samples of healthy (DFTD-free control) devils and DFTD-infected devils at different stages of the disease (Table 1). Transmission electron microscopy (TEM) and nanoparticle tracking analysis (NTA) were used to evaluate the morphology and size of the isolated extracellular vesicles. TEM images confirmed the presence of EV structures in all disease stages and in healthy controls, showing the typical EV morphology of closed, cup-shaped vesicles described in other studies [Figure 1A and Figure S1; (34)]. NTA demonstrated a heterogeneous nanoparticle population with a small to medium size distribution, which did not differ with DFTD clinical stage (Figure 1B); clinical stages were classified according to tumor volume (see Serum Samples). Although health status/DFTD stage had a significant effect on the total number of nanoparticles (one-way ANOVA, p = 0.02), no significant pairwise differences between groups were found (Figure 1C).
A proteome dataset comprising the combined discovery and validation cohorts (n=87) was generated by data-independent acquisition mass spectrometry to obtain an overview of the serum EV proteome and to evaluate the presence of commonly recovered EV protein markers and serum contaminants (30). Of a total of 345 filtered proteins, 23 were established EV markers, including CD9, annexins, heat shock proteins and major histocompatibility complex proteins [Figure 1D; (35)]. Serum-derived contaminants, which included albumin and five lipoproteins, all decreased in abundance as DFTD progressed (Figure 1E).

Discovery of EV-Associated Biomarkers for DFTD
For the biomarker discovery process, we first analyzed the proteome of extracellular vesicles isolated from a cohort of 12 DFTD-infected devils with advanced disease and 10 healthy captive controls (Figure 2A and Table 1). Based on Student's t-tests, 96 proteins (FDR-corrected p < 0.05) were upregulated in EVs derived from DFTD-infected devils relative to those from healthy controls (Figure 2B and Table S1A).
Of these upregulated proteins, ROC curve analysis identified 31 that distinguished diseased from healthy individuals with high accuracy [area under the ROC curve ≥ 0.9; (36)] (Table S2).
Proteins such as cathelicidin-3 (CATH3), connective tissue growth factor (CTGF) and complement component 5 (C5) perfectly classified advanced-stage DFTD-infected devils versus healthy controls (area under the ROC curve = 1, sensitivity and specificity = 100%; Figure 2C and Table S2). CATH3 was the most significantly upregulated protein in serum EV samples from DFTD-infected devils relative to healthy controls (p < 10⁻⁶), with a 4.7-fold increase (Figures 2B, D). CTGF and C5 were significantly upregulated by 5.7- and 2.4-fold, respectively, in DFTD-infected devils compared with healthy controls (Figures 2B, D).
To evaluate whether the proteins upregulated in serum EVs of advanced DFTD-infected devils relative to healthy controls were potentially released by DFTD cells, we used a proteomic database of EVs derived from cultured DFTD cells (32). Of the 96 upregulated EV-associated proteins from the serum of DFTD-infected devils, 19 overlapped with proteins upregulated in EVs from DFTD cells relative to EVs from healthy fibroblasts (Figure S2). Six of these 19 proteins yielded an area under the ROC curve greater than 0.9: F-actin-capping protein subunits alpha (CAPZA) and beta (CAPZB), profilin-1 (PFN1), fructose-bisphosphate aldolase A (ALDOA), tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein zeta (YWHAZ), and actin-related protein 3 (ACTR3) (Table S2 and Figure S2).
However, none of the three perfect classifiers detected in the discovery cohort (CATH3, CTGF, and C5) were present in the EV cell culture database, suggesting an origin other than DFTD tumors.
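The cross-referencing step is essentially a set intersection followed by the AUC filter. A minimal sketch with hypothetical protein identifiers and AUC values (not the study's lists); the function name is also hypothetical.

```python
def shared_candidates(serum_up, culture_up, auc, min_auc=0.9):
    """Proteins upregulated in both serum EVs and cultured-DFTD-cell EVs whose
    serum AUC exceeds min_auc, returned in sorted order."""
    shared = set(serum_up) & set(culture_up)
    return sorted(p for p in shared if auc.get(p, 0.0) > min_auc)


serum_up = ["PROT_A", "PROT_B", "PROT_C", "PROT_D"]   # hypothetical
culture_up = ["PROT_B", "PROT_C", "PROT_E"]            # hypothetical
auc = {"PROT_A": 1.0, "PROT_B": 0.95, "PROT_C": 0.80}  # hypothetical
print(shared_candidates(serum_up, culture_up, auc))  # ['PROT_B']
```

In this toy example PROT_A is a perfect classifier that is absent from the cell culture list, which is the pattern observed for CATH3, CTGF and C5.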

CATH3 and PFN1 as EV-Associated Protein Biomarkers for DFTD
To validate the discovery cohort results, the proteome of EV samples was analyzed in an independent cohort of 33 DFTD-infected devils at different stages of the disease, to test whether our potential EV-associated protein biomarkers could identify animals across a broader range of cancer progression (Figure 3A and Table 1). We also included 17 healthy devils from the DFTD-free wild insurance population on Maria Island as negative controls (Figure 3A and Table 1). Based on Student's t-tests, 51 proteins (FDR-corrected p < 0.05) were upregulated in EV samples from DFTD-infected devils relative to healthy controls (Figure 3B and Table S1B). Of these 51 upregulated proteins, only four yielded an area under the ROC curve greater than 0.9 (Table S3). In agreement with the discovery cohort results, CATH3 and PFN1 were significantly upregulated in DFTD-infected devils at different stages relative to the wild healthy controls, by 2.9- and 4.1-fold, respectively (Figures 3B, C). ROC curves indicated that CATH3 and PFN1 classified devils with DFTD with 87.9% and 90.9% sensitivity and 94.1% and 88.2% specificity, respectively (Figure 3D and Table S3). Unlike CATH3, PFN1 was detected in the cell culture DFTD EV database (Figure S2), suggesting a possible tumor origin. In contrast, other candidates identified in the discovery cohort, such as CTGF and C5, performed worse at distinguishing different stages of DFTD from the wild healthy controls, with 48.5% sensitivity and 88.2% specificity for CTGF and 84.8% sensitivity and 70.6% specificity for C5 (Figure S3).

EV-Associated CATH3 Detects Latent Stage DFTD 3-6 Months Before Overt Disease
Further analysis of EVs derived from serum samples of the validation cohort revealed that CATH3 levels in EV samples could successfully distinguish devils in the latent stage of DFTD (n=15) from healthy wild individuals (n=17). Devils were presumed to be in the latent stage of DFTD because their samples were collected 3 to 6 months before subsequent pathological and clinical diagnosis of DFTD (Figure 4A and Table 1). Specifically, CATH3 levels were consistently upregulated in latent DFTD samples relative to the wild healthy group, following the same pattern observed in the discovery and validation cohorts (Figures 4B, C). In contrast, PFN1 was not significantly upregulated in latent devils relative to healthy controls (Figures 4B, C).
We calculated the sensitivity, specificity, and accuracy of CATH3 and PFN1 for classifying latent devils versus healthy controls, using protein abundance cut-off values trained in the validation cohort to distinguish DFTD-infected devils from healthy controls. CATH3 classified latent devils versus healthy controls with 93.3% sensitivity, 94.1% specificity and 93.8% accuracy, supporting its utility as a biomarker for all stages of DFTD and its potential for early detection of this transmissible cancer (Figure 4D). PFN1 was less effective than CATH3 at distinguishing latent devils from healthy controls (Figure 4D).
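With 15 latent and 17 healthy devils, the reported CATH3 percentages correspond to 14 of 15 latent devils above the cut-off and 16 of 17 healthy devils below it. These counts are inferred from the percentages rather than stated in the text:

$$
\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{14}{15} = 93.3\%, \qquad
\text{Specificity} = \frac{TN}{TN + FP} = \frac{16}{17} = 94.1\%
$$

$$
\text{Accuracy} = \frac{TP + TN}{N} = \frac{14 + 16}{32} = 93.75\% \approx 93.8\%
$$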

MYH10, TGFBI, and CTGF Are Associated With Tumor Burden
We used the filtered proteome dataset comprising the combined discovery and validation cohorts to evaluate the relationship between EV-associated protein abundance and tumor volume. CTGF and C5 were significantly and positively correlated with tumor volume in DFTD-infected devils (Figure 5A), consistent with their high predictive power for classifying advanced DFTD stages (large tumor volumes) versus healthy individuals in the discovery phase. Myosin heavy chain 10 (MYH10), transforming growth factor beta induced (TGFBI) and CTGF were the proteins that correlated best with tumor volume (Figure 5A and Table S4), and their levels increased with tumor volume (Figure 5B). CATH3 and PFN1 did not show a significant positive correlation with tumor volume (Figure 5A), but instead showed the binary disease/healthy relationship demonstrated in the discovery and validation cohorts. Linear regressions were performed to evaluate the ability of tumor burden (as % of body mass) to predict MYH10, TGFBI, CTGF, and C5 relative abundance values. Percent tumor burden was a significant predictor of CTGF (F(1,43) = 9.55, p < 0.01), TGFBI (F(1,43) = 9.41, p < 0.01), and MYH10 abundance (F(1,43) = 7.96, p < 0.01). Percent tumor burden explained a modest amount of the variation in abundance of both CTGF and TGFBI (R² = 0.18) and slightly less for MYH10 (R² = 0.16). The models estimate that CTGF, TGFBI, and MYH10 abundances increase 1.95-, 0.54- and 1.13-fold, respectively, for each 1% increase in tumor burden (Figure 5C).
Figure 4 (partial caption): (B) Volcano plot of protein relative abundance fold changes (log2) between EVs derived from serum of DFTD latent devils (n=15) and healthy wild controls (n=17) vs fold change significance. (C) Dot plot showing the relative abundance of EV CATH3 and PFN1 detected in 17 wild healthy, 15 latent, and 33 DFTD-infected devils; different letters "a" and "b" indicate significant pairwise differences between groups (i.e., groups denoted with the same letter are not significantly different; one-way ANOVA and Tukey post-hoc test, p < 0.05). (D) Receiver operating characteristic curve analysis performed to classify latent devils (n=15) versus healthy controls (n=17). Sensitivity and specificity (95% confidence intervals) were calculated for latent devils based on the protein threshold trained with the full validation dataset (n=50).
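The tumor-burden and association analyses can be sketched as follows. Tumor mass uses the 1.1 g/ml density and the burden formula given in the Statistical Analysis section; expressing body weight in kilograms is an assumption about units. Kendall's tau is computed here as tau-a without tie correction for brevity (SciPy's `scipy.stats.kendalltau`, for example, defaults to tau-b), and p-values are omitted. The regression returns the F statistic on (1, n-2) degrees of freedom, which for n = 45 gives the F(1,43) form reported above. Function names are hypothetical.

```python
import numpy as np

TUMOR_DENSITY_G_PER_ML = 1.1  # tumor density stated in the Methods, after (26)


def tumor_burden_percent(total_volume_ml: float, body_weight_kg: float) -> float:
    """Tumor mass as % of tumor-free body mass: mass / (body weight - mass) * 100."""
    tumor_mass_g = TUMOR_DENSITY_G_PER_ML * total_volume_ml
    return 100.0 * tumor_mass_g / (body_weight_kg * 1000.0 - tumor_mass_g)


def kendall_tau_a(x, y) -> float:
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs; no tie correction."""
    n, s = len(x), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return float(s / (n * (n - 1) / 2))


def linear_fit(x, y):
    """OLS fit y = intercept + slope * x; returns slope, intercept, R^2 and F(1, n-2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slope, intercept = np.polyfit(x, y, 1)
    ss_res = np.sum((y - (intercept + slope * x)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    f_stat = (ss_tot - ss_res) / (ss_res / (len(x) - 2))
    return float(slope), float(intercept), float(1 - ss_res / ss_tot), float(f_stat)


# Hypothetical devil: 10 ml total tumor volume, 7 kg body weight.
print(round(tumor_burden_percent(10.0, 7.0), 3))  # 0.157
```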

DISCUSSION
The ongoing transmission of DFTD and the consequent decline of the Tasmanian devil population have been intensively investigated over the past 25 years. However, diagnosis of this transmissible cancer still relies solely on the visual identification of tumors and confirmatory biopsy, despite previous efforts to develop preclinical diagnostic tests. Here, using contemporary methods for extracellular vesicle isolation and quantitative proteomics, we identified promising biomarker candidates in liquid biopsies with the potential to detect this transmissible cancer at a preclinical stage. Specifically, elevated levels of cathelicidin-3 (CATH3) in serum-derived EV samples had high predictive power to detect DFTD in two independent cohorts. Further, CATH3 enrichment was detectable 3 to 6 months before tumors were visible or palpable, providing the first preclinical biomarker for DFTD and confirming a consistent latent period of DFTD infection. The preclinical detection of EV-associated CATH3 in routinely collected devil serum samples provides a means to improve the health management of endangered devils, along with insights for the future development of mammalian cancer biomarkers.
Cathelicidins are a family of peptides with roles in antimicrobial responses (37). Relative to placental mammals, devils have a notable diversity of cathelicidin peptides, several of which are widely expressed in devil immune tissues; digestive, respiratory, and reproductive tracts; milk; and marsupium [i.e., pouch; (38)]. Even though cathelicidins are thought to play important roles in the devil immune system, they have not been explored in DFTD pathogenesis. In humans, by contrast, the peptide LL-37, produced by the sole human cathelicidin gene, has been identified as a potential anti-tumor therapeutic agent for oral squamous cell carcinoma because it induces apoptotic cell death, autophagy, and cell cycle arrest (39,40). Conversely, other studies have suggested that LL-37 can promote cancer cell proliferation, migration, and tumor progression via activation of the MAPK/ERK signaling pathway (41,42). Interestingly, this pathway is interconnected with the ERBB-STAT3 axis, thought to be a primary mechanism of tumorigenesis in DFTD (43). These intriguing findings raise the possibility that CATH3 expression in the course of DFTD infection is associated with a protective response by the host's innate immune system or, alternatively, with a yet-undescribed evasion mechanism induced by the transmissible tumor.
As CATH3 was not identified in EVs derived from cultured DFTD tumor cells (32), we propose that this early DFTD biomarker is likely associated with host-cell-derived EVs rather than with those of tumor origin. EV protein cargo found in the plasma/serum of cancer patients reflects the systemic effects of cancer, displaying markers associated not only with the primary tumor but also with the tumor microenvironment, distant organs, and the immune system (6). These EV protein signatures have also demonstrated diagnostic power in discriminating between healthy and cancer samples, indicating that host-cell-derived EVs can serve as sensitive cancer biomarkers. A host-cell-derived EV protein may also have advantages as an early biomarker. Based on the finding that EV-associated CATH3 abundance was independent of tumor volume, and on the consistent upregulation of CATH3 across latent and overt DFTD stages relative to healthy samples, we propose that the increase in CATH3 arises from a uniform host response to this clonal cancer rather than from the tumor cells themselves. This independence from tumor volume is a desirable feature for an early cancer biomarker, as its sensitivity will depend less on a minimum tumor burden.
In contrast with the likely host origin of CATH3, we found that profilin-1 (PFN1), the overt-DFTD biomarker identified in this study, was highly expressed in EVs derived from DFTD cells in vitro (32). As DFTD cells are of Schwann cell origin (17), upregulation of the actin-binding protein PFN1 is not surprising, as PFN1 is required for Schwann cell development and migration (44). Additionally, PFN1 has been reported to be overexpressed in renal cell carcinoma (45) and proposed as a urine biomarker for bladder cancer aggressiveness (46). Considering these lines of evidence, we suggest that the upregulation of PFN1 in serum EV samples isolated from devils with overt DFTD likely originates from DFTD tumor cells. This is consistent with the poor performance of PFN1 in classifying latent DFTD, given that tumor volume is presumably minimal at the preclinical stage.
Compared with the likely DFTD-cell origin of PFN1 upregulation, a host origin of CATH3 may enhance performance in classifying preclinical DFTD, but it could also raise concerns about clinical specificity. Cathelicidins are also associated with inflammation and secondary infections (47), and altered abundance of other cathelicidins has been associated with purely inflammatory diseases such as bovine mastitis (48). However, we found no evidence of elevated CATH3 levels in 16 of the 17 serum EV samples from the wild devils in our healthy cohort, despite elevated values of other common inflammatory markers such as C-reactive protein, serum amyloid P-component, and several complement proteins [(49, 50); see Figure S4]. The higher expression of these inflammatory markers in wild healthy devils relative to captive healthy individuals is most likely due to the high prevalence of injuries from intra-species biting, a common social behavior among devils (16) that leaves wounds susceptible to microbial infection. Thus, the high specificity of CATH3, but not of other cathelicidins or common inflammatory markers, strongly implies that CATH3 is not associated with general inflammation. We suggest investigating the mechanism of action of CATH3 in the pathogenesis and progression of DFTD to identify this marker's role. In addition, we suggest developing anti-CATH3 antibodies to determine how CATH3 is associated with EVs (e.g., as surface or intravesicular cargo) using immunoaffinity techniques, enabling a better understanding of how this biomarker is packaged in EV samples.
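Specificity is the proportion of disease-free animals that a test correctly classifies as negative, TN/(TN + FP). Suppose the 17 wild healthy devils mentioned above constitute the entire negative set used to compute CATH3 specificity. That is an assumption on our part, as the text does not say so explicitly. In that case, 16 true negatives and 1 false positive give 16/17, or about 94.1%, which matches the specificity reported in the abstract. The Python sketch below spells out this arithmetic. The function names are illustrative only, and the sensitivity helper is included for symmetry rather than to reproduce any reported value.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """TN / (TN + FP): fraction of healthy animals correctly classified as negative."""
    negatives = true_negatives + false_positives
    if negatives <= 0:
        raise ValueError("need at least one healthy (negative) sample")
    return true_negatives / negatives


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """TP / (TP + FN): fraction of diseased animals correctly classified as positive."""
    positives = true_positives + false_negatives
    if positives <= 0:
        raise ValueError("need at least one diseased (positive) sample")
    return true_positives / positives


if __name__ == "__main__":
    # Assumed reading of the text: 16 of 17 healthy wild devils lacked elevated CATH3
    spec = specificity(true_negatives=16, false_positives=1)
    print(f"specificity = {spec:.1%}")  # prints "specificity = 94.1%"
```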
Of the proteins found at greater abundance in devil EV samples at advanced DFTD stages, many were among the subset of well-characterized EV markers, such as heat shock proteins, annexins, and integrins (30,35). This is consistent with previous reports of a strong correlation between general EV markers and advanced cancer stages, indicating their potential prognostic value (8,51). The three proteins most strongly correlated with tumor volume were myosin heavy chain 10 (MYH10), transforming growth factor beta induced (TGFBI), and connective tissue growth factor (CTGF). These proteins are not generic EV markers and were not found in EVs derived from DFTD cells in vitro (32). However, all three have been associated with aggressive tumor progression. For instance, MYH10 is overexpressed in glioma cells and implicated in cell migration and invasion (52), and it also has a pro-tumorigenic effect in a murine lung cancer model (53). High expression of TGFBI predicts poor prognosis in patients with colorectal and ovarian cancer (54,55) and also promotes breast cancer metastasis (56). High levels of CTGF expression correlate positively with glioblastoma growth (57), invasive melanoma behavior (58), poor prognosis in esophageal adenocarcinoma (59), aggressive behavior of pancreatic cancer cells (60), and bone metastasis in breast cancer (61). Thus, the mechanisms that induce high levels of MYH10, TGFBI, and CTGF expression in late stages of DFTD warrant further investigation to better understand DFTD pathogenesis.
The development of an antibody-based assay for serum EV-associated CATH3 would provide a scalable and cost-effective diagnostic test for latent DFTD. The implementation of such a test would enhance the capabilities for management and conservation actions, which may aid the recovery of wild devil populations and help ensure that this species can fulfil its ecological niche in the future. First, it would ensure that only healthy wild devils are introduced into insurance populations, significantly reducing the cost of maintaining devils in quarantine prior to release, which is currently required for at least fifteen months (22). Second, it would greatly improve the capacity of ongoing monitoring programs, which are critical for early warning and response and underpin research on the epidemiology and evolutionary dynamics of this disease system. Finally, early detection of DFTD would improve the implementation of any future vaccination or other therapeutic intervention (62). Because the evidence suggests that latency can exceed one year, further studies are required to determine whether CATH3 is also elevated earlier than 3 to 6 months before diagnosis, and thus how far in advance of clinical disease CATH3 can distinguish latent devils from healthy controls.
The results herein demonstrate that DFTD is a valuable model for comparative oncology to explore cancer biomarkers. It allows the effect of a single, genetically identical cancer on the EV profile to be examined across numerous individual animals, providing a level of replication not possible in other systems. Identifying a devil cathelicidin as an early DFTD biomarker could provide insight into cancer responses more broadly and represent a possible target for the development of anticancer drugs, given that human antimicrobial peptides have been proposed as novel cancer biomarkers and therapeutic agents (39,63-65). Characterizing CATH3 expression in response to a single cancer in a natural system could offer insight into host cancer adaptation strategies, as antimicrobial peptides have shown rapid evolutionary diversification within species with specific anti-pathogen activities (66). Finally, this marker widens the scope of human and animal cancer studies to include non-tumor-derived cancer markers that result from altered physiology during tumor development.

DATA AVAILABILITY STATEMENT
The datasets generated and analyzed for this study can be found in the ProteomeXchange Consortium via the PRIDE (67) partner repository with the dataset identifier PXD021480.

ETHICS STATEMENT
The animal study was reviewed and approved by The General

FUNDING
This work was funded by the National Geographic explorer early career grant (to CE and AL), Holsworth Wildlife Research Endowment grants (to CE, AL, and GW, and to MR-A for field sample collection), and the University of Tasmania Foundation through funds raised by the Save the Tasmanian Devil Appeal (to AL, GW, RW, and RH). Proteomics infrastructure was funded by ARC LE180100059 (to RW and GW). Sample collection from wild devils was funded by US National Institutes of Health (NIH) grant R01-GM126563-01 and US National Science Foundation (NSF) grant DEB1316549 (to MJ) as part of the joint NIH-NSF-USDA Ecology and Evolution of Infectious Diseases program.
Marita Crombie from Mistover Cottage in Yolla for providing accommodation and logistic support during fieldwork. We thank volunteers who helped with data collection. We also thank David Gell, Cherie Blenkiron and Kirsty Danielson for their comments on the manuscript.