Multicenter Analytical Validation of Aβ40 Immunoassays

Background Before implementation in clinical practice, biomarker assays need to be thoroughly analytically validated. There is currently a strong interest in implementation of the ratio of amyloid-β peptide 1-42 and 1-40 (Aβ42/Aβ40) in clinical routine. Therefore, in this study, we compared the analytical performance of six assays detecting Aβ40 in cerebrospinal fluid (CSF) in six laboratories according to a recently standard operating procedure (SOP) developed for implementation of ELISA assays for clinical routine. Methods Aβ40 assays of six vendors were validated in up to three centers per assay according to recently proposed international consensus validation protocols. The performance parameters included sensitivity, precision, dilutional linearity, recovery, and parallelism. Inter-laboratory variation was determined using a set of 20 CSF samples. In addition, test results were used to critically evaluate the SOPs that were used to validate the assays. Results Most performance parameters of the different Aβ40 assays were similar between labs and within the predefined acceptance criteria. The only exceptions were the out-of-range results of recovery for the majority of experiments and of parallelism by three laboratories. Additionally, experiments to define the dilutional linearity and hook-effect were not executed correctly in part of the centers. The inter-laboratory variation showed acceptable low levels for all assays. Absolute concentrations measured by the assays varied by a factor up to 4.7 for the extremes. Conclusion All validated Aβ40 assays appeared to be of good technical quality and performed generally well according to predefined criteria. A novel version of the validation SOP is developed based on these findings, to further facilitate implementation of novel immunoassays in clinical practice.

inTrODUcTiOn Cerebrospinal fluid (CSF) biomarkers have proven to be helpful in early diagnosis of Alzheimer's disease (AD). They reflect preclinical early events in AD by as many as 10-15 years before clinical symptoms occur (1). Therefore, CSF biomarkers have been incorporated in the diagnostic criteria of AD (2)(3)(4). The CSF biomarkers most prominently used in Alzheimer's diagnostics are amyloid-β peptide 1-42 (Aβ42), total tau protein (t-tau), and tau phosphorylated at threonine 181 (p-tau), because they reflect the pathological hallmarks of AD (5,6). A decrease in CSF Aβ42 levels probably reflects the extent of amyloid-β accumulation in the formation of plaques in the brain, while an increase in CSF t-tau and p-tau levels likely reflects neuronal degeneration and intracellular tangle formation. The ratio of Aβ42/Aβ40 is helpful to correct levels of Aβ42 for the total amyloid production (7) and provide better assessment of the presence of amyloid pathology in case of for instance cerebral amyloid angiopathy (8). For example, recent studies demonstrated that the CSF Aβ42/Aβ40 ratio significantly improved the diagnostic performance compared to CSF Aβ42 alone in distinguishing controls or non-AD patients from mild cognitive impairment or AD patients (9)(10)(11)(12)(13). The concordance with amyloid positron emission tomography increased when the CSF Aβ42/Aβ40 ratio was used as compared to CSF Aβ42 alone (14). Therefore, there is currently a strong interest to implement the Aβ42/Aβ40 ratio into clinical practice. However, before implementation, it is important to assess the quality of the Aβ40 assays and validation is an essential step in implementation of novel assays in clinical routine. In this study, we evaluated the performance parameters of six Aβ40 assays currently commercially available and one in-house assay according to an international consensus protocol following the ISO 15189 guidelines (15,16). The results will provide insight into the real-life use of this standard operating procedure (SOP) for implementation of novel immunoassays in clinical routine. Moreover, we determined the inter-laboratory variation and compared the quality and outcomes of the ELISA assays.

csF samples
The inter-laboratory variation was tested by using 20 CSF samples centrally distributed in aliquots by one center. The samples were shipped on dry ice and upon receipt stored at −80°C in all laboratories. These samples were patient samples from an outpatient clinic, mostly patients who came for dementia diagnostic screening. Participants or their legal representatives gave informed consent. The study conforms with The Code of Ethics of the World Medical Association (Declaration of Helsinki) (17). Furthermore, for all the other parameters, samples were used that were available in each of the laboratories participating in this study. The samples were diluted according to protocols provided by the manufacturers ( Table 1).

Participants and assay Kits
Six laboratories participated in this study and validated seven Aβ40 assays ( Table 1). Every commercial assay was validated by two or three experienced laboratories, which collaborated within the EU Joint Programme Neurodegenerative disease (JPND) BIOMARKAPD consortium. Additionally, the inhouse Aβ40 assay was only tested by the developers. The following assays for quantification of Aβ40  (18), VUmc, Amsterdam, The Netherlands]. Aβ40 assays of the same production batches from each vendor were directly distributed to the participating laboratories. The assays were performed manually according to the manufacturers' protocols, some laboratories used plate washers for the washing steps. The laboratories tested all performance parameters according to the BIOMARKAPD SOPs (15,16). Calculations of the analytical performance parameters were done using the corresponding Data Sheets S2 and S3 in Supplementary Material (15).

sensitivity
For the determination of the LLOQs, 16 blank samples were measured in one plate for each assay. The calibration curves were calculated using a four parameter logistic curve fit for all assays, which gave the optimal fit.

Precision
Intra-assay variation (repeatability) was determined by analysis of samples (n = 15) in four replicates within one plate. Some deviations were made from the original protocol: in Lab #6, n = 3 samples instead of n = 15 samples were analyzed due to technical reasons. In Lab #2, n = 14 samples for the MSD assay and, in Lab #5, n = 16 samples for the Invitrogen assay were tested. The mean coefficient of variation (%CV) was calculated by averaging the CVs of all tested samples. A %CV <20% was defined as acceptable.
Inter-assay variation (intermediate precision) was measured to determine the variation of analyses between different days. To quantify inter-assay variation, samples with low, medium, and high concentrations were selected from the samples used for the intra-assay variation [quality control (QC) low, QC medium and QC high]. These samples were measured in duplicate in four different plates at identical positions in the assay plates on four different days. In Lab #2, three different plates were tested on three different days for the Invitrogen enzyme-linked immunosorbent  assay (ELISA). The mean %CV was calculated for all samples. A %CV <20% was defined as acceptable. Intra-plate variation was determined to explore the influence of different positions within a plate on the measured concentrations (stability of the plate). QC low, QC medium, and QC high samples were measured in four replicates at different positions of the plate (columns 3/4 vs. columns 11/12). The mean %CV was calculated for all samples. A %CV <20% was defined as acceptable.

Dilutional linearity
Three different CSF samples were used to perform the dilution linearity experiments. In Table S1 in Supplementary Material, the spiked Aβ40 calibrator concentration and the dilution factors used to serially dilute the samples per laboratory and assay are summarized. These samples were analyzed in duplicate. The dilutional linearity was calculated and expressed as follows: %Linearity = observed C * dilution factor previous observed C * pr ( ) ( e evious dilution factor * 100 ) C = concentration(pg/mL) A linearity between 80 and 120% was defined as acceptable.
To compare dilutional linearity for the kits from the various vendors and analyses by the different labs, we determined the length of dilution range in which dilutional linearity was within the acceptable range of 80-120%. This dilution range length was calculated using the following formula: Dilution range length = highest dilution factor in which the curve is l linear lowest dilution factor in which the curve is still linear Despite that protocols were distributed among the participating laboratories, some of them deviated from this protocol in the execution of the experiments to assess the dilutional linearity, since these laboratories did not spike the calibrator into the CSF samples. In addition, different interpretations of the distributed protocols may have resulted in the large differences in spiked Aβ40 calibrator concentrations by the various laboratories to assess dilutional linearity (Table S1 in Supplementary Material).
recovery Five different CSF samples, measured in duplicate, were diluted according to manufacturers' protocols and spiked with recombinant Aβ40 calibrator at three different levels. An overview of the spike low, medium, and high concentrations for each laboratory in every assay can be found in Table S2

Parallelism
Five different CSF samples, with high endogenous protein concentrations, were serially diluted. Both reciprocal relative dilution factor and OD450 absorbance signals of the samples and calibrator were log-transformed to be able to use linear regression to calculate the slopes of the sample and calibrator curves. The slope of the linear parts of the log-log transformed calibrator and sample dilution series were compared to determine the degree of parallelism by calculating the "in range%" using the following formula: in range% = slope of sample dilution series slope of calibration curve e *100 A calculated in range% between 80 and 120% was defined as acceptable.

statistical analysis
Bland-Altman plots were drawn to define if differences in results between assays were dependent on the concentration and to define the % deviation of each assay from the overall mean results for all clinical CSF samples. Correlation coefficients for comparison of results between vendors and between labs were performed by Spearman's ρ. A p-value of 0.05 was considered significant.
resUlTs sensitivity An overview of the mean LLOQs per assay kit and for each laboratory per assay are presented in Table 2 and Table S3 in Supplementary Material, respectively. The mean LLOQs of the MSD and Fujirebio assays showed the largest variation between laboratories (CV = 99%), while the Euroimmun assay showed the smallest variation between laboratories (CV = 8%).

Precision
The mean intra-assay CVs in all laboratories were below the predefined value of 20% for all assays ( Table 2). However, it should be noted that the CVs of some individual samples were above this threshold for specific tests in single laboratories ( Figure S1A in Supplementary Material).
The mean inter-assay CVs for all assays, but not for all individual laboratories, were below the predefined value of 20% (Table 2; Figure S1B in Supplementary Material).
The mean intra-plate CVs for all assays in all laboratories were below the predefined value of 20% (Table 2; Figure S1C in Supplementary Material). Only one individual sample in one laboratory showed an intra-plate CV above 20% ( Figure S1C in Supplementary Material).

Dilutional linearity and hook-effect
No correct dilutional linearity data were obtained for the Fujirebio assay, Lab #2 and Lab #5. The remaining results are shown in Table 2 (overview) and Table S4 in Supplementary Material. The MSD assay showed an acceptable dilutional linearity between the predefined ranges of 80-120% over the longest dilution range, whereas this dilution range with acceptable linearity was the shortest for the Euroimmun and VUmc assays. Of note, the dilution range lengths with acceptable dilutional linearity may, in some cases, be longer than reported in Table S4 in Supplementary Material (as indicated by footnotes), since in these cases the highest or lowest dilution factor in which the curve was linear, corresponded to the highest or lowest dilution factor tested by the laboratory.
A hook effect, i.e., suppression of signal at concentrations above the upper limit of quantification, was observed for one sample tested in the Invitrogen ELISA by Lab #2 (between dilution factor 50 and 250; data not shown). Other samples tested by this and other laboratories neither showed a hook effect for the Invitrogen ELISA nor for any other assay (data not shown).

recovery
The results of the recovery (%R) experiments for all assays of different vendors are detailed in Table S5 in Supplementary Material. A large variation in recovery results between laboratories was observed for several assays. As a result, acceptable recovery within the predefined range of 80-120% for all three laboratories was only obtained for the IBL assay. In addition, acceptable recovery was obtained by individual laboratories for the MSD (1/3 laboratories), Invitrogen (1/3 laboratories), Novex (1/2 laboratories), Euroimmun (1/3 laboratories), and Fujirebio (1/3 laboratories) assays. Due to this large inter-laboratory variation, an overview of recovery results per vendor is not included in Table 2.

Parallelism
Parallelism results are displayed in Table 2 (overview) and Table  S6 in Supplementary Material. The mean percentage parallelism (%P) results for the assays of MSD, IBL, Novex, Euroimmun, and VUmc were within the predefined ranges of 80-120% (Table  S6 in Supplementary Material), respectively: MSD (92%), IBL (100%), Novex (95%), Euroimmun (98%), and VUmc (93%). A large inter-laboratory variation for %P was observed using the Invitrogen and Fujirebio assays. For both these assays, the %P was within the predefined range for two out of three laboratories, while one laboratory per assay showed a strongly deviating result.

Variation in clinical sample concentrations between laboratories
The intra-assay variation in Aβ40 concentrations was below 10% for all CSF samples (n = 20) in all laboratories for all different vendors, well below the predefined CV of 20% ( Table 2). Per vendor, a strong correlation (ρ > 0.8) between sample Aβ40 concentrations assayed by the different laboratories was observed, except for the results by Lab #1 and Lab #2 that showed a lower correlation coefficient for the MSD (ρ = 0.77) and Fujirebio (ρ = 0.67) assays (Figure 1). The best mean correlation coefficient between laboratories was measured for Aβ40 concentrations using the IBL kit (ρ = 0.93) ( Table 2). In terms of absolute concentrations, results for the Novex assay had the lowest similarity in mean values of the clinical samples between laboratories (Figure 1). The mean Aβ40 concentrations of all CSF samples varied up from ~2,000 pg/mL for the Novex assay at the one extreme to ~9,500 pg/mL for the IBL assay at the other end. The %difference in Aβ40 concentrations per assay compared to the mean concentration of all assays per sample showed that the highest deviation was obtained by the IBL and Novex assays (mean deviation of 65 and −64%, respectively) (Bland-Altman plots in Figure 2). The lowest variation (within 20% deviation from the mean Aβ40 concentration of all assays per sample) was observed for the MSD assay.

DiscUssiOn
The CSF Aβ42 levels reflect the extent of amyloid-β accumulation in the form of plaques in the brain. However, it has been shown in previous studies that the concentration of Aβ42 not only depends on the presence or absence of plaques but also may on the total turnover of Aβ peptides in the brain, resulting in different total Aβ concentrations in the CSF (11,(19)(20)(21). Besides, recent studies indicated that absorption of Aβ42 to collection and storage tubes can be overcome by calculation of the ratio of Aβ42/Aβ40, since both peptides will be absorbed to a similar degree (22,23). The ratio of Aβ42/Aβ40 is helpful to correct levels of Aβ42 for the total amyloid production. In this study, we validated Aβ40 assays from six different vendors and one in-house developed assay for future use in clinical practice.
The assignment of the different assays to the laboratories was performed first based on the availability of the MSD and Luminex systems in the laboratories and second as random as possible, to overcome that two laboratories measured the same set of assays as a source of center bias. We have chosen to have the validation performed by up to three centers per assays, which we expect to provide sufficient insight into real-life performance of the assays and deemed to be cost-effective. We reasoned that testing all assays in a larger number of laboratories would not lead to dramatically different results for the assays than presented in this study.
The performance parameters of all assays were within the predefined ranges, except for the recovery results, for which no definite conclusions could be drawn due to large inter-laboratory variation, and the parallelism results of Invitrogen and Fujirebio assays, for each of which one out of three laboratories obtained results out of the predefined range. The variation between laboratories in the Aβ40 concentrations in the clinical samples was small in the duplicate measurements for each assay in every laboratory (%CV <10%). Furthermore, high mean correlation coefficients (ρ > 0.8) per vendor between sample concentrations were found for all assays. In terms of variation in absolute values between vendors, the least deviation was found for the MSD assay compared to the mean Aβ40 concentration of all assays per sample. The LLOQs were slightly higher compared to the kit specifications provided by most vendors, but well below the ranges found in clinical samples. The IBL and Euroimmun assays are the exceptions, because the LLOQ was lower in our results compared to the kit specifications (data not shown). It should be noted, however, that the LLOQs are calculated differently by the vendors (mean + 2 to 3*SD), while we did use a more strict method of calculation (mean + 10*SD). Although the mean precision data per assay were well below the predefined CV of 20% for all assays, if we would consider a more strict maximum of a mean CV of all laboratories below 15%, the results of Invitrogen and Novex assays are out of range.
The dilutional linearity description in the SOP gave room for many different interpretation options and should be revised. Some laboratories pre-diluted the CSF samples and, therefore, did not test the lower dilutions in which a hook effect still could be present. In the Invitrogen assay, a high variation was found, probably because Lab #6 tested lower steps of dilution (a factor 2.5) than the other two labs did (factor 4 and 5). Also, the amount of spiked antigen differed per laboratory; some laboratories did not spike antigen and, therefore, actually performed a parallelism experiment instead of the dilutional linearity experiment and had to be excluded from analysis for this parameter.
The recovery was out-of-range for the majority of the experiments. This could be due to the large variability between the laboratories in terms of preparation of the samples for the recovery experiments, e.g., the variation in concentrations chosen for spiking. In some laboratories, too high concentrations of Aβ40 were spiked, which resulted in an overflow or too low concentrations were spiked, resulting in very low precision of the result and thus low recovery. We have adapted the wording in the SOP to increase the compliance in execution (Supplemental Data for the improved version), e.g., by expanding on the choice of spikes and adding notes to standardize the spike concentrations in multicenter studies. Moreover, we added a proposal to include analysis of the actual spiked concentrations in reagent diluent for calculation instead of using a theoretical spiked value, in order to track possible dilution errors.
The parallelism results for the majority of the assays were within the predefined ranges, except the results of the Invitrogen and Fujirebio assays, because one laboratory was out of range. No explanation could be given for these two deviating results. However, with regard to this occurrence, to get more insight, it may be helpful to include more laboratories to test each validation parameter in future validation studies.
The IBL, Invitrogen, and Euroimmun Aβ40 assays seem to perform slightly better in terms of inter-laboratory comparisons, as shown by consistent high correlation coefficients for the clinical samples between laboratories. The mean Aβ40 concentrations of all CSF samples varied between the vendors. This is a hurdle in implementation of the ratio of Aβ42/Aβ40 in clinical practice to improve diagnostic accuracy (9-13). As long as this variation between assays is not overcome by the use of certified reference methods and materials, which are being developed (24), cut-offs will depend on the combination of assays used at every site. Usage of automated platforms instead of manual performance of biomarker measurements could mitigate inter-laboratory variations.
In conclusion, all Aβ40 assays perform generally well and are promising for implementing in clinical practice. We showed some deviating results in specific parameters, but these are more likely the result of inter-laboratory variation and the lack of reference materials. Furthermore, of note is that the absolute Aβ40 concentrations in clinical CSF varied between assays. Therefore, the assays could be improved by the availability of certified reference materials to calibrate kits as well as usage of automated platforms. The upcoming availability of Aβ40 assays in a European Conformity (CE)-certified format is an important first step for their routine implementation in Europe.

eThics sTaTeMenT
Participants or their legal representatives gave informed consent. The study conforms with The Code of Ethics of the World Medical Association (Declaration of Helsinki) (World_Medical_ Association 1964). The biobank containing the coded patient samples was approved by the Local Ethical committee of the VU University Medical Center Amsterdam. aUThOr cOnTriBUTiOns LWD: performed data analysis and drafted the manuscript. LK: supervision of data collection and study design, critical review of the manuscript. MK-S: study design and coordination, data collection, and analysis. HK, SE, AP-L, SL, MV, and EV: supervision of data collection and study design, critical review of the manuscript. AV: data collection, critical review of the manuscript. HS, HT, and AF: data collection and analysis, critical review of the manuscript. EV: supervision of data collection and study design, critical review of the manuscript. CT: study design and coordination, data analysis, critical review of the manuscript.