Plasma Protein Biomarkers for the Prediction of CSF Amyloid and Tau and [18F]-Flutemetamol PET Scan Result

Background: Blood biomarkers may aid in recruitment to clinical trials of Alzheimer’s disease (AD) modifying therapeutics by triaging potential trials participants for amyloid positron emission tomography (PET) or cerebrospinal fluid (CSF) Aβ and tau tests. Objective: To discover a plasma proteomic signature associated with CSF and PET measures of AD pathology. Methods: Liquid chromatography-tandem mass spectrometry (LC-MS/MS) based proteomics were performed in plasma from participants with subjective cognitive decline (SCD), mild cognitive impairment (MCI), and AD, recruited to the Amsterdam Dementia Cohort, stratified by CSF Tau/Aβ42 (n = 50). Technical replication and independent validation were performed by immunoassay in plasma from SCD, MCI, and AD participants recruited to the Amsterdam Dementia Cohort with CSF measures (n = 100), MCI participants enrolled in the GE067-005 study with [18F]-Flutemetamol PET amyloid measures (n = 173), and AD, MCI and cognitively healthy participants from the EMIF 500 study with CSF Aβ42 measurements (n = 494). Results: 25 discovery proteins were nominally associated with CSF Tau/Aβ42 (P < 0.05) with associations of ficolin-2 (FCN2), apolipoprotein C-IV and fibrinogen β chain confirmed by immunoassay (P < 0.05). In the GE067-005 cohort, FCN2 was nominally associated with PET amyloid (P < 0.05) replicating the association with CSF Tau/Aβ42. There were nominally significant associations of complement component 3 with PET amyloid, and apolipoprotein(a), apolipoprotein A-I, ceruloplasmin, and PPY with MCI conversion to AD (all P < 0.05). In the EMIF 500 cohort FCN2 was trending toward a significant relationship with CSF Aβ42 (P ≈ 0.05), while both A1AT and clusterin were nominally significantly associated with CSF Aβ42 (both P < 0.05). Conclusion: Associations of plasma proteins with multiple measures of AD pathology and progression are demonstrated. 
To our knowledge this is the first study to report an association of FCN2 with AD pathology. Further testing of the proteins in larger independent cohorts will be important.


INTRODUCTION
The accumulation of amyloid-beta (Aβ) plaques followed by the deposition of hyper-phosphorylated tau protein in neurofibrillary tangles, central to Alzheimer's disease (AD) neuropathology, is thought to develop around 20 to 30 years in advance of clinical symptom onset (Jansen et al., 2015). Given the long prodromal phase of the disease, a biomarker of these early neuropathological changes would be beneficial in participant selection and cohort enrichment for clinical trials of disease modifying therapies targeting AD neuropathology.
To date, the best characterized and most frequently used biomarkers relating to amyloid and tau pathology are PET imaging measures of brain amyloid deposition and cerebrospinal fluid (CSF) measures of Aβ, total tau (tTau) and phospho-tau (pTau) (Kang et al., 2013; Palmqvist et al., 2015). However, PET scans can be expensive and access to scanners and radioligands remains limited, whilst CSF collection is relatively invasive and samples can therefore be difficult to obtain, particularly if repeated measures are required. Blood-based biomarkers have therefore been investigated as a less invasive and potentially cost-effective option for early detection and monitoring of AD pathology.
Many studies have investigated blood-based proteomic biomarkers to distinguish Alzheimer's disease cases from cognitively healthy elderly controls as reviewed (Thambisetty and Lovestone, 2010;Fu et al., 2014). However, to date there has been a relatively low rate of replication of these biomarkers across the field. This may be in part due to issues surrounding a study design that compares AD to cognitively healthy elderly control subjects. Given that AD neuropathology precedes clinical presentation of the disease by a number of years, some cognitively healthy elderly subjects may in fact be harboring silent AD neuropathology. This reduces the ability to find biomarkers specifically relating to AD using this design, as some control subjects will instead be preclinical cases. To overcome this issue biomarkers specific to the underlying disease have been sought including candidates relating to rate of cognitive decline, progression from mild cognitive impairment (MCI) to AD or disease pathology (MRI measures of brain atrophy, measures of brain amyloid) as we have reviewed earlier .
In such studies predicated not on disease category but on 'endophenotypes' of disease, our group has previously identified protein markers in blood relating to brain atrophy, disease severity and progression, and accumulation of cerebral amyloid as measured using PET imaging (Kiddle et al., 2012; Hye et al., 2014; Sattlecker et al., 2014; Ashton et al., 2015; Voyle et al., 2015; Westwood et al., 2016). This has included the identification of a panel of 10 proteins which, coupled with APOE genotype, could predict MCI conversion to AD with 87% accuracy (Hye et al., 2014). Moreover, we have previously identified a number of proteins associated with PET amyloid in both an AD-based cohort (Ashton et al., 2015) and in cognitively healthy elderly (Westwood et al., 2016). Notably, one protein, fibrinogen gamma chain (FGG), when combined with age was able to predict neocortical amyloid burden with 59% sensitivity and 78% specificity (Ashton et al., 2015).
However, to date we are not aware of any studies that have been designed to discover blood protein biomarkers relating to CSF measures of AD pathology including both measures of Aβ42 and of tau. Therefore, in the present study we first sought to discover and then to perform a technical replication of candidate biomarkers correlating with and predicting CSF Tau/Aβ42 pathology in samples collected in the Amsterdam Dementia Cohort at VU University Medical Center, using an untargeted mass spectrometry proteomic approach. We then aimed to validate these candidates by relating their levels to disease pathology and progression in two independent cohorts. In the first, we utilized plasma from people with MCI who were also assessed with brain amyloid PET using [18F]Flutemetamol [GE067-005 study (Wolk et al., 2018)], measuring in these samples both the proteins identified in the discovery and replication phases and protein markers of endophenotypes identified previously in the studies referenced above. In the second, we utilized plasma from AD, MCI and cognitively healthy control individuals sourced through the European Medical Information Framework (EMIF) platform (www.emif.eu) who also had a CSF Aβ42 measure; here we focused on measuring protein markers of AD pathology identified in the discovery and replication phases and also from our previous studies. Our overall aim was the identification, replication and validation of blood markers indicative of brain pathology that might be used to reduce screening failure when seeking to recruit people with pathology to clinical trials.

Amsterdam Dementia Cohort
Biomarker discovery proteomics were performed on plasma samples from participants with AD, MCI or with mild subjective cognitive decline (SCD) visiting the Alzheimer Center Amsterdam as previously described (van der Flier et al., 2014; van der Flier and Scheltens, 2018). In brief, the diagnosis of probable AD was made according to common clinical and research criteria (McKhann et al., 1984, 2011); for MCI, Petersen's criteria were used (Petersen et al., 1999); and SCD was assigned to patients who presented with cognitive complaints but did not meet the criteria for dementia, MCI or any neurological or psychiatric conditions affecting cognition (van der Flier et al., 2014). Venous blood samples were processed for plasma and stored according to international consensus standard operating procedures (Teunissen et al., 2009).
Plasma samples were selected on the basis of CSF Aβ42 and tau measures and, using an extreme phenotype approach, were stratified as very low CSF pathology (low CSF tau/high CSF Aβ42, n = 25) or very high CSF pathology (high CSF tau/low CSF Aβ42, n = 25) (Table 1). The CSF pathology score was calculated using the discrimination line x = (373 + 0.82[tau])/[Aβ42] (Mulder et al., 2010).
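As an illustration of the stratification step, the discrimination line quoted above can be evaluated per participant. The following is a minimal Python sketch only: the function name and the example concentrations are hypothetical and are not taken from the study data, and no classification cut-off is implied.

```python
def csf_pathology_score(tau, abeta42):
    """Return x = (373 + 0.82 * tau) / abeta42, the discrimination line
    quoted in the text (Mulder et al., 2010). Both inputs are CSF
    concentrations expressed in the same units."""
    if abeta42 <= 0:
        raise ValueError("abeta42 must be positive")
    return (373 + 0.82 * tau) / abeta42


# Hypothetical values for illustration only (not study data).
low_pathology = csf_pathology_score(tau=200, abeta42=1000)   # low tau, high Abeta42
high_pathology = csf_pathology_score(tau=800, abeta42=400)   # high tau, low Abeta42
```

Because tau appears in the numerator and Aβ42 in the denominator, higher tau and lower Aβ42 both increase x, which is why the score can be used to order samples from very low to very high CSF pathology.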
Replication studies using immunocapture techniques of the biomarkers identified in the discovery phase were carried out in the plasma samples included in the discovery phase and in additional plasma samples from a further 50 subjects visiting the Alzheimer Center Amsterdam, also stratified by CSF pathology score (Mulder et al., 2010). In total 50 plasma samples per group were included in the replication phase (Table 1). Data from clinical assessments were available for all subjects, including Mini-Mental State Examination (MMSE) and Apolipoprotein E (APOE) genotype data (van der Flier et al., 2014).

GE067-005 Study
Plasma samples from MCI participants enrolled in the GE067-005 study (Wolk et al., 2018) were included in the independent validation study (n = 173, Table 2). This included amnestic MCI subjects who converted to probable AD within a 3-year time frame (MCI converter, n = 52) and subjects who remained MCI over this time period (MCI non-converter, n = 121). Participants were assessed clinically every 6 months until conversion, dropout or completion of the 3-year follow-up.
[18F]Flutemetamol PET amyloid imaging data were available for all subjects, who were categorized as having either an "abnormal" PET amyloid scan (positive for the presence of amyloid, n = 68) or a "normal" PET amyloid scan (negative for the presence of amyloid, n = 105). Visual image interpretation of the PET scan was performed by five independent trained readers who were blinded to the participants' clinical history and diagnosis. Scan interpretation was based on the majority classification from these five independent readers. The details of image interpretation are published elsewhere (Wolk et al., 2018). General clinical and demographic data were also available for all subjects, including APOE genotype, body mass index (BMI), prevalence of diabetes and years of education. Whole blood was collected in EDTA tubes and processed for plasma (Supplementary Methods Section 1).

Discovery Phase: Gel LC-MS/Mass Spectrometry Based Proteomics
Discovery proteomics was carried out by gel LC-MS/MS coupled with tandem mass tagging (TMT). The data acquisition and preprocessing pipelines are described in detail elsewhere (Ashton et al., 2015). In brief, the proteomics workflow consisted of plasma samples labeled in a TMT6plex configuration with TMT126-TMT130 (Thermo Fisher Scientific), and the study reference with TMT131. The tagged samples within each TMT6plex were pooled and then separated by one-dimensional gel electrophoresis. Ten equal fractions were excised from each gel and the gel pieces were destained, tryptically digested and peptides extracted. LC-MS/MS acquisition was performed using the Orbitrap Velos Pro instrumentation (Thermo Fisher Scientific) coupled to a Proxeon EASY-nLC II system (Thermo Fisher Scientific). The LC-MS/MS raw Xcalibur data files (Thermo Fisher Scientific) were processed by Proteome Discoverer (Thermo Fisher Scientific, version 1.3) using Mascot (version 2.3) to determine peptide identifications. Processing of the Mascot output data files was performed in R, and included median ratio normalization, calculation of peptide ratios and subsequent protein scores using median and mean roll-up methods of peptide ratios as described (Ashton et al., 2015). Where the same protein was observed by electrophoresis in multiple different molecular weight regions of the one-dimensional gel, these observations were treated as separate protein molecular weight isoforms in subsequent analysis. Analysis of each protein molecular weight isoform was performed on the data produced from both the mean and median protein roll-up methods, and age, sex, APOE ε4 allele presence and sample storage duration were included as covariates in regression models.

Replication Phase: Immunocapture Assay Based Proteomics
Technical replication was performed on proteins in the same 50 samples that underwent LC-MS/MS based proteomics with an additional 50 samples selected from the Amsterdam Dementia Cohort (Table 1). The criteria for protein selection for replication were: (1) nominal statistical significance, (2) quantification by ≥2 peptides, and (3) detection by 1D gel electrophoresis in the molecular weight range of the native protein. Eight proteins were selected for replication: ficolin-2 (FCN2), apolipoprotein C-IV (ApoC-IV), C4b-binding protein alpha chain (C4BPA), fibrinogen β chain (FGB), Ig gamma-3 chain C region (IGHG3), apolipoprotein A-I (ApoA-I), serotransferrin and apolipoprotein A-IV (ApoA-IV). These proteins included novel candidates and proteins that had previously been identified in AD-based biomarker studies (Yu et al., 2003; Liu et al., 2006; Ijsselstijn et al., 2011; Hu et al., 2012; Song et al., 2012; Hye et al., 2014; Ashton et al., 2015; Westwood et al., 2016). Proteins were measured by commercially available single-analyte enzyme-linked immunosorbent assays (ELISAs) according to the manufacturer's instructions (Supplementary Table 1). ELISA absorbance at 450 nm was detected using a microplate reader (PHERAstar FS, BMG LABTECH). The background corrected (570 nm) absorbance data were exported into Sigma Plot (Systat Software; version 12.5) for estimation of protein concentrations using a 5-parameter logistic fit. Intra-assay and inter-assay variability was assessed by calculation of the percentage coefficient of variation (% CV). Average intra-assay CVs were calculated using the duplicate measures for each sample and average inter-assay CV was calculated using the measurement of a control sample, which was analyzed on each plate. All protein concentration values were log10 transformed prior to statistical analysis and age, sex, APOE ε4 allele presence,

Independent Validation Phase
Proteins were selected from the replication phase in the Amsterdam Dementia Cohort, based upon nominal statistical significance, for further extension studies using the GE067-005 cohort. In addition, other proteins previously identified as markers related to AD pathology and progression, including candidate markers of MCI conversion to AD (Hye et al., 2014), rates of cognitive decline and disease severity and brain atrophy (Hye et al., 2014; Sattlecker et al., 2014) and [11C] Pittsburgh compound B (PiB) PET amyloid (Kiddle et al., 2012; Ashton et al., 2015; Voyle et al., 2015; Westwood et al., 2016) were also selected (Supplementary Table 2). In total 37 targets (including alpha-2-macroglobulin measured by two different assays) were measured in the GE067-005 study cohort (Supplementary Table 2). Of these, twenty-six proteins were measured by multiplex bead assays across 7 MagPlex MAP panels using the Luminex 200 instrument (Supplementary Table 1). Median fluorescent intensity (MFI) was measured using xPONENT 3.1 (Luminex Corporation). Eleven proteins were quantified by commercially available ELISAs as described earlier (Supplementary Table 1). Intra-assay and inter-assay variability was calculated as described earlier. All protein concentration values were log10 transformed prior to statistical analysis in order to approximate a normal distribution, and the following covariates were included in regression models: age, sex, APOE ε4 allele presence, BMI, diabetes, center, batch variation and sample storage duration.
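The assay variability calculation referred to above (average intra-assay % CV from duplicate measures, inter-assay % CV from a control sample run on each plate) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function names are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Percentage coefficient of variation: 100 * SD / mean.
    Assumption: SD is the sample standard deviation (n - 1)."""
    return 100.0 * stdev(values) / mean(values)


def average_intra_assay_cv(duplicates):
    """Mean % CV across samples; each sample is a tuple of its
    duplicate readings from the same plate."""
    return mean(percent_cv(pair) for pair in duplicates)


def inter_assay_cv(control_by_plate):
    """% CV of one control sample measured once on every plate."""
    return percent_cv(control_by_plate)
```

For example, `average_intra_assay_cv([(10.0, 10.0), (9.0, 11.0)])` averages a CV of 0% and a CV of about 14.1%, and `inter_assay_cv([100.0, 110.0, 90.0])` gives 10%.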
Proteins were also selected from our earlier discovery studies for independent validation in the EMIF 500 study cohort. In total 21 proteins were measured by multiplex bead assays using the Luminex 200 instrument and by commercially available ELISAs as described earlier (Supplementary Table 1). As plasma proteins were measured in singlicate, the data were assessed for outliers on a protein-by-protein basis. Extreme outliers were defined as values falling outside three times the inter-quartile range of all samples measured and were removed from the dataset prior to subsequent statistical analysis. The following covariates were included in regression models: age, sex, APOE ε4 allele presence, center, and batch variation.
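The outlier rule described above can be sketched in Python as follows. The text does not state where the three inter-quartile ranges are measured from; this sketch assumes the common convention of fences placed 3 × IQR below the first quartile and above the third quartile, and the function name and quartile interpolation method are likewise illustrative assumptions.

```python
from statistics import quantiles


def remove_extreme_outliers(values, k=3.0):
    """Drop values lying more than k * IQR below Q1 or above Q3.
    Assumption: fences are anchored at the quartiles (Tukey-style)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]
```

Applied protein by protein, a call such as `remove_extreme_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])` would discard only the value 100.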

Univariate Statistical Analysis
All statistical analyses were performed in R (version 1.3.3). For both discovery and replication phase studies, proteomic data were analyzed using the Mann-Whitney U test and logistic regression to compare dichotomized high and low CSF pathology groups. Associations between proteomic data and continuous measures of CSF Tau/Aβ42 were assessed by Spearman rank correlation and linear regression. Validation phase studies were analyzed by both linear and logistic regression to assess the relationship between the proteomic data and pathology endophenotypes when accounting for covariates. Benjamini-Hochberg q values were calculated as a multiple testing correction for all analyses.
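For readers unfamiliar with the Benjamini-Hochberg correction, the q value for each test is the smallest false discovery rate at which that test would be declared significant. The study performed this step in R; the standalone Python sketch below is only an illustration of the standard procedure, with a hypothetical function name.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

With four nominal p-values of 0.01, 0.04, 0.03 and 0.5, the corresponding q values are 0.04, 0.0533, 0.0533 and 0.5, which shows how a nominally significant result (P < 0.05) may no longer pass a 5% false discovery rate threshold.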

Classification Analysis for the Prediction of Amyloid Status
A generalized linear regression model (GLM) was used to adjust the data for covariates. Machine learning (i.e., classification) was performed in R on the GLM-adjusted data. The minimal protein set with optimal AUC characteristics for prediction of amyloid status was assessed by support vector machines (SVM) combined with LASSO as a variable selection method, and performance was assessed using 100 repeats of 10-fold cross validation.
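The resampling scheme named above (100 repeats of 10-fold cross validation) can be sketched as an index generator. This Python sketch shows only how the train/test partitions are formed; the SVM and LASSO fitting performed in R is not reproduced, and the function name, the fixed seed and the unstratified shuffling are assumptions made for illustration.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=100, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated
    k-fold cross validation. Assumption: simple unstratified shuffling."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices into n_folds nearly equal folds.
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for k, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != k for j in fold]
            yield repeat, k, train_idx, test_idx
```

Each sample is used for testing exactly once per repeat, and performance metrics such as AUC would then be averaged over all 1,000 train/test splits.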

Pathway Analysis: Using LC-MS/MS Proteomic Data
Differential regulation of pathways associated with CSF Tau/Aβ42 pathology was identified through gene enrichment analysis on the results of the LC-MS/MS analysis. For this analysis only protein molecular weight isoforms detected in 80% or more of the TMT6plexes and for which gene IDs were available were included. The p-values derived from the univariate analysis of the protein molecular weight isoforms were used to estimate a single p-value per protein by applying Fisher's method as follows. For each protein the sum of logarithms of the p-values of all the molecular weight isoforms was calculated. The chi-squared distribution was then used to derive the protein p-value from this sum of logarithms.
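Fisher's method, as described above, combines k p-values through the statistic −2 Σ ln(p), which is referred to a chi-squared distribution with 2k degrees of freedom. The Python sketch below is an illustration rather than the study's R code; the function name is hypothetical. It uses the closed-form upper tail of the chi-squared distribution that is available when the degrees of freedom are even.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method."""
    k = len(pvalues)
    if k == 0 or any(p <= 0 or p > 1 for p in pvalues):
        raise ValueError("need at least one p-value, each in (0, 1]")
    # Test statistic: -2 * sum of natural logs, chi-squared with 2k df.
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    half = statistic / 2.0
    # Upper tail for even df = 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

A protein with a single molecular weight isoform keeps its original p-value, whereas two isoforms each with P = 0.05 combine to a protein-level p-value of about 0.017.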
We next expanded the analysis to include proteins that directly interact with the proteins detected in 80% or more of the TMT6plexes. Firstly, the p-values were log transformed and then the proteins known to interact with these proteins were identified by STRING (Szklarczyk et al., 2015). Only the most stringent protein-protein interactions (those with direct experimental evidence) were considered, with a confidence level >0.4. For each STRING protein, the average of the normalized p-values of the proteins that directly interact with it was then calculated.

RESULTS
In order to identify a biomarker that might predict brain amyloid pathology we performed a proteomic study in three phases in three independent sample collections. First, we used mass-spectrometry based discovery in the Amsterdam Dementia Cohort including a technical replication phase using immunocapture. Then we used immunocapture to replicate these and previous findings in a cohort derived from a clinical trial of a radiotracer for detection of brain amyloid (GE067-005 study) and finally an independent validation phase study, again using immunocapture methods in an independent set of samples collated from three separate cohorts sourced using the EMIF. The study workflow is illustrated in Figure 1. In order to further assess these discovery phase observations, we performed a combined technical and clinico-biological replication study for eight proteins. Intra-assay CV was <13% for all assays, and batch variation was included as a covariate in regression analysis to control for any inter-assay differences.

Pathway Analysis: LC-MS/MS Proteomics in the Amsterdam Dementia Cohort
In order to explore the potential biological significance of these findings, we then performed a pathway analysis. The total summed p-values were calculated for 77 proteins from 233 protein MW isoform p-values for inclusion in gene enrichment analysis. Using STRING this list was expanded to include proteins for which there is direct experimental evidence of an interaction, giving a total of 769 proteins. When comparing this protein set to the Reactome database, three pathways were significant after FDR correction for multiple comparisons: HDL-mediated lipid transport (q = 0.010), lipoprotein metabolism (q = 0.035), and lipid digestion, mobilization, and transport (q = 0.035) (Table 5). Comparison to the DisGeNET database revealed three diseases were significant post FDR correction: hypercholesterolemia, familial (q = 0.034), brain diseases (q = 0.034), and metabolic bone disorder (q = 0.034) (Table 5).
FIGURE 1 | Schematic diagram illustrating the experimental work flow of the present study for the discovery, replication, and validation of plasma proteins associated with brain pathology. AD, Alzheimer's disease; MCI, mild cognitive impairment; SCD, subjective cognitive decline; CSF, cerebrospinal fluid; TMT, tandem mass tagging; LC-MS/MS, liquid chromatography tandem mass spectrometry; Aβ, amyloid-beta.

Independent Validation Phase: Immunocapture Assay Based Proteomics in the GE067-005 Study Cohort
Using immunocapture, we then performed a validation phase study in an independent cohort using a different end point measure of AD pathology ([18F]-Flutemetamol PET amyloid). From 37 proteins measured, three proteins were excluded from analysis due to technical failure of the assays (soluble receptor for advanced glycation end products, complement C4-B and IGHG3). Intra-assay CV was <12% for all other assays, and batch variation was included as a covariate in regression analysis to control for any inter-assay differences.

Classification Analysis for the Prediction of Amyloid Status in the GE067-005 Cohort ([18F]-Flutemetamol PET Group)
After excluding subjects with missing data, the classification analysis was performed on 78 GE067-005 subjects. These subjects were split between amyloid-positive and amyloid-negative groups as follows: negative [18F]-Flutemetamol PET, n = 44; positive [18F]-Flutemetamol PET, n = 34, as measured by visual inspection according to the approved methods for image interpretation. The minimal protein set with optimal AUC characteristics for prediction of amyloid group was assessed by SVM combined with LASSO and performance was assessed using 100 repeats of 10-fold cross validation. Two proteins (Aβ40 and ApoC4) formed the minimal protein panel for classifying amyloid positivity and achieved modest accuracy (AUC = 0.69 (Figure 3), PPV = 0.52, NPV = 0.51, sensitivity = 0.57, and specificity = 0.44).

Independent Validation Phase in the EMIF 500 Study Cohort
Univariate Statistical Analysis of Plasma Proteins in Relation to CSF Aβ42
In order to explore the relationship between these proteins and disease we obtained samples from three cohorts sourced through EMIF and including participants with MCI and AD as well as normal controls. We first considered the relationship between plasma proteins and CSF Aβ42 pathology in all AD, MCI and CTL subjects combined. In the high Aβ42 pathology group a reduction in A1AT (β = −0.248, P < 0.05, Supplementary Table 9) and an increase in clusterin (β = 0.278, P < 0.05, Supplementary Table 9) were observed by logistic regression. There was also a trend toward increased FCN2 in the high Aβ42 pathology group (β = 0.216, P = 0.055, Supplementary Table 9). A1AT was also nominally significantly associated with CSF Aβ42 pathology by linear regression (β = 31.690, P < 0.05, Supplementary Table 9). The relationship between the 21 plasma proteins and CSF Aβ42 was then examined within each of the separate diagnostic groups (n = 162 AD, n = 235 MCI, n = 97 CTL). In the AD group, only RANTES was associated with CSF Aβ42, as shown by both logistic regression (β = −1.192, P < 0.01, Supplementary  Supplementary Table 11). In the CTL group, a significant decrease in CFHR-1 and FGG in association with high CSF Aβ42 by logistic regression (β = −1.435, P < 0.05 and β = −1.003, P < 0.05, respectively, Supplementary Table 12) was observed, whilst A1AT was nominally significantly associated with CSF Aβ42 by both logistic and linear regression (β = −1.186, P < 0.05 and β = 77.990, P < 0.01, respectively, Supplementary Table 12).
A trend was also observed for increased FCN2 with high CSF Aβ42 in the MCI group, by both logistic and linear regression (β = 0.315, P = 0.058 and β = −42.573, P = 0.051, respectively, Supplementary Table 11), replicating the association of FCN2 with CSF Tau/Aβ42 in the Amsterdam Dementia Cohort and with PET amyloid in MCI in the GE067-005 study.

Classification Analysis for the Prediction of Amyloid Status in the EMIF 500 Cohort (CSF Aβ42 Group)
After excluding subjects with missing data, the classification analysis was performed on 96 subjects from the EMIF 500 study. These subjects were roughly evenly split between high and low amyloid groups (low CSF Aβ42, n = 42; high CSF Aβ42, n = 54). The minimal protein set with optimal AUC characteristics for prediction of CSF Aβ42 amyloid status was assessed by SVM combined with LASSO and performance was assessed using 100 repeats of 10-fold cross validation. Five proteins (A1AT, HAGP, Ig Kappa Chain C region, PEDF, and RANTES) formed the minimal protein panel for classifying amyloid positivity and achieved modest accuracy (AUC = 0.67 (Figure 3), PPV = 0.47, NPV = 0.47, sensitivity = 0.55, and specificity = 0.41).

DISCUSSION
In this study, we describe the discovery, replication and validation of plasma protein biomarkers relating to AD pathology and progression using an amyloid and tau pathology endophenotype based design. The success of recent clinical trials of disease-modifying therapies targeting Aβ has been hampered by lack of brain amyloid pathology in clinically diagnosed AD participants (Salloway et al., 2014), and in future it is likely that many trials of potential disease-modifying agents will utilize biomarkers such as CSF amyloid and tau and amyloid PET measures of pathology in participant selection. However, such markers are relatively invasive and demanding of resource and participant commitment. Identifying participants with pathology using such methods in the preclinical and prodromal phase of disease is difficult and results in high screen failure in clinical trials. The cost of such screen failure is high, often prohibitively so. Even a modest reduction in screen failure rates would represent a major advance, certainly reducing costs and potentially accelerating speed of recruitment to such trials of disease-modifying agents. Blood-based biomarkers that can detect individuals likely to harbor AD pathology may therefore provide a cost-effective aid in triaging potential trial participants for PET or CSF based tests, helping to reduce screen failure, patient burden and costs. Moreover, a minimally invasive and cost-effective biomarker of AD pathology may help facilitate trials where repeated testing and monitoring of pathology is required.
We first used LC-MS/MS to identify twenty-five plasma protein biomarkers of CSF Tau/Aβ42, and then replicated by ELISA the nominal association with CSF Tau/Aβ42 of three of the eight proteins subjected to further analysis: FCN2, ApoC-IV and FGB. Lack of ELISA-based replication of the remaining five proteins may be in part due to key platform differences. Mass spectrometry involves the analysis of peptides resulting from denatured protein, whilst ELISA measures intact protein, or more precisely the region of the intact protein where the epitope recognized by the antibody resides. It is possible therefore that protein region differences in turnover and abundance and other post-translational modifications including phosphorylation, glycosylation, and other changes may impact these results. Further replication studies examining protein region-specific abundance would therefore be needed to confirm the association of the proteins with CSF Tau/Aβ42.
Pathway analysis of the LC-MS/MS proteomic data revealed the significant differential regulation of a number of lipid-related pathways in association with CSF Tau/Aβ42 pathology. This is in line with the high representation of apolipoproteins we observed significantly associated with CSF measures of pathology. Furthermore, these findings are in agreement with the alteration to brain lipid metabolism observed in AD (Bales, 2010) and with pathway analysis of genomic association data (Jones et al., 2010). The proteins significantly associated with CSF measures of AD pathology were also shown to be associated with three disease groups using informatics approaches: familial hypercholesterolemia, brain diseases and metabolic bone disorder. These findings support the previously documented association of cholesterol and hypercholesterolemia with AD pathology (Refolo et al., 2000; Gibson Wood et al., 2003) and the association of osteoporosis with risk of developing AD (Zhou et al., 2014). The most striking finding from all three phases of the current study (discovery, replication and validation) is the nominal association of FCN2 with AD pathology measures. This finding was consistent across measures used to assess pathology (CSF and PET) and using independent or orthogonal protein assay technologies (mass spectrometry and ELISA). Moreover, the association of FCN2 with CSF Aβ42 but not CSF Tau or pTau in the Amsterdam Dementia Cohort suggests that this is driven by an association with amyloid pathology, in line with the replication results we report here from the GE067-005 PET amyloid study. In the EMIF 500 cohort, however, we do see a significant association of FCN2 with pTau, although this association is only found in AD subjects, whereas earlier in the disease course we see a trend toward a significant relationship between FCN2 and CSF Aβ42 in the MCI subjects from this cohort.
Given the studies that suggest CSF biomarkers are more sensitive to early change than PET biomarkers (Toledo et al., 2015) it might be that change in FCN2 measures could follow change in amyloid pathology particularly in the preclinical and prodromal stages of the disease. Such a hypothesis emphasizes the need for longitudinal studies of biomarkers in AD.
To our knowledge this is the first study to identify and validate FCN2 as a biomarker of AD pathology. Ficolins and mannose-binding lectins (MBL) are both activators of the lectin complement pathway (Fujita et al., 2004) and CSF MBL levels have been shown to be reduced in AD (Lanzrein et al., 1998). Another member of the ficolin family, ficolin-3 (FCN3), which shares approximately 50% amino acid sequence homology with FCN2 (Kilpatrick and Chalmers, 2012), is also associated with insulin resistance and diabetes (Li et al., 2008; Chen et al., 2012; Zhang et al., 2016). The association of the ficolin family with diabetes is interesting given that the relationship between diabetes and AD is well documented (Janson et al., 2004; Talbot et al., 2012).
Whilst Aβ40 and ApoC4 were included in the minimal protein panel with optimal accuracy for classifying high [18F]-flutemetamol PET amyloid from low [18F]-flutemetamol PET amyloid subjects, the accuracy of the classifier was only modest (AUC = 0.69). Given that the proteins measured in this study included those that were previously identified as markers of other AD-related measures [including cognitive decline, CSF Tau/Aβ and brain atrophy (Hye et al., 2014; Sattlecker et al., 2014)] they may not necessarily be specific to the load of fibrillised amyloid deposits in brain. Moreover, changes in CSF Aβ and tau, PET amyloid, MRI measures of brain atrophy and clinical measures of decline are all detectable at different stages of disease (Jack et al., 2013). We would therefore not necessarily expect all of these proteins to be related to amyloid at the MCI stage. In order to evaluate the biomarker utility of these proteins further, testing in larger independent cohorts with measures relating to various aspects of disease pathology and stage would be useful.
In the GE067-005 study cohort, associations of Apo(a), ApoA-I, ceruloplasmin and PPY with MCI conversion to AD were observed, and increased levels of ApoA-I also tended toward an association with high [18F]-flutemetamol PET amyloid. All four proteins have previously been suggested as putative blood markers related to AD. For example, Apo(a) has previously been shown to be increased with high PiB PET amyloid (Ashton et al., 2015). Whilst increased plasma ApoA-I has been observed in association with cognitive decline (Song et al., 2012) and brain atrophy (Hye et al., 2014), decreased ApoA-I levels in AD versus controls (Liu et al., 2006; Shih et al., 2014) and in association with increased risk of clinical progression to MCI and AD (Slot et al., 2017) and PiB PET amyloid (Ashton et al., 2015; Westwood et al., 2016) have also been shown. Moreover, ApoA-I has been implicated in amyloid pathology, binding Aβ and protecting hippocampal neuronal cultures from Aβ-induced neurodegeneration (Paula-Lima et al., 2009).
In this study we use a range of proteomics approaches, building on previous studies from our group and others that have indicated a protein signature in blood that differentiates disease from non-disease and correlates with 'endophenotypes' of that disease state, as previously reviewed (Thambisetty and Lovestone, 2010; Baird et al., 2015; Shi et al., 2017). Others have taken a more direct route to blood biomarkers of AD, seeking to measure amyloid directly. Early studies using immunocapture were largely unsuccessful in identifying a marker methodology that predicted brain amyloid and was stable across studies and disease phases. More recently, using mass spectrometry and immunocapture with novel antibodies, studies have reported excellent power in predicting brain amyloid load (Pesini et al., 2012; Ovod et al., 2017; Nakamura et al., 2018). However, whilst these studies show enormous potential, in some cases the methods are not yet suitable for application at scale in large multi-site studies, require bespoke sample collection protocols and are likely to be resource intensive. These studies have, however, clearly confirmed the findings of early biomarker studies that there is a signature in blood that reflects disease pathology. The use of multiplexed immunocapture as we describe here is a low-cost technology, readily applicable in the context of very large multi-center trials, and therefore may have real-world utility alongside any other approach to blood-based biomarkers being developed.
In conclusion, in this study we have identified a number of proteins that are associated with CSF Aβ42/tau pathology. We identified and replicated FCN2 as a novel biomarker of both CSF and PET measures of AD pathology in an independent cohort and by independent proteomic platforms. Furthermore, we find an association of complement component 3 (C3) with [18F]-flutemetamol PET amyloid and of four proteins (Apo(a), ApoA-I, ceruloplasmin and PPY) with MCI conversion to AD, building upon previous findings of their relationship with AD and amyloid pathology. These results suggest a biologically relevant role for these proteins in AD. Further analysis of the potential of these proteins as biomarkers of AD pathology and progression, in combination with other proteins or multimodal measures, and in larger independent cohorts, will be essential. Such a blood-based biomarker could be of value as a triaging tool for PET and CSF-based tests and hence aid in recruitment to clinical trials of disease-modifying treatments.

DATA AVAILABILITY STATEMENT
The datasets for this manuscript will be made available by the authors to qualified researchers upon reasonable request.
Requests to access the datasets should be directed to the corresponding author.

ETHICS STATEMENT
This study was carried out in accordance with the recommendations of the METC of VU University Medical Center for the Amsterdam Dementia Cohort, and the medical ethics committee at each site for the GE067-005 and EMIF 500 cohorts (Supplementary Tables 21 and 22, respectively), with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the METC of VU University Medical Center for the Amsterdam Dementia Cohort and the medical ethics committee at each site for the GE067-005 and EMIF 500 cohorts (Supplementary Tables 21 and 22, respectively).

AUTHOR CONTRIBUTIONS
AB, SL, AH, CB, MW, CTH, SB, VN, PS, and CT contributed to study design. BN, MZ, KD, SB, VN, WvdF, DG, LP, AL, PV, PS, and CT contributed to sample selection and provision. AB, SW, AH, SA, and NA were responsible for data acquisition. AB, SW, AH, SK, AN-H, BL, and DN carried out data analysis and interpretation. SW and AB drafted the manuscript. All authors revised the manuscript.