Harmonization of Multiple SARS-CoV-2 Reference Materials Using the WHO IS (NIBSC 20/136): Results and Implications

Background: There is an urgent need for harmonization between severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) serology platforms and assays prior to defining appropriate correlates of protection and to inform the development of new rapid diagnostic tests that can be used for serosurveillance as new variants of concern (VOC) emerge. We compared multiple SARS-CoV-2 serology reference materials to the WHO International Standard (WHO IS) to determine their utility as secondary standards, using an international network of laboratories with high-throughput quantitative serology assays. This enabled the comparison of quantitative results between multiple serology platforms.
Methods: Between April and December 2020, 13 well-characterized and validated SARS-CoV-2 serology reference materials were recruited from six different providers to qualify as secondary standards to the WHO IS. All the samples were tested in parallel with the National Institute for Biological Standards and Control (NIBSC) 20/136 and parallel-line assays were used to calculate the relevant potency and binding antibody units.
Results: All the samples saw varying levels of concordance between diagnostic methods at specific antigen–antibody combinations. Seven of the 12 candidate materials had high concordance for the spike-immunoglobulin G (IgG) analyte [percent coefficient of variation (%CV) between 5 and 44%].
Conclusion: Despite some concordance between laboratories, qualification of secondary materials to the WHO IS using arbitrary international units or binding antibody units per milliliter (BAU/ml) does not provide any benefit to the reference materials overall, because the international unit (IU) or BAU/ml conversions do not agree consistently between laboratories. Secondary standards should be qualified to well-characterized reference materials, such as the WHO IS, using serology assays that are similar to the ones used for the original characterization of the WHO IS.


INTRODUCTION
There is an urgent need for harmonization between severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) serology platforms and assays prior to defining appropriate correlates of protection and to inform the development of new rapid diagnostic tests that can be used for serosurveillance as new variants of concern (VOC) emerge (Berry et al., 2020; Ciotti et al., 2021; Giavarina and Carta, 2021; Infantino et al., 2021; Perkmann et al., 2021; Petrone et al., 2021; Knezevic et al., 2022).
Conversion of results from different laboratory methods to a harmonized international unit reduces the interlaboratory/method variability (Cooper et al., 2018; McDonald et al., 2018; Mattiuzzo et al., 2019, 2020; Ciotti et al., 2021; Knezevic et al., 2022). The WHO International Standards (ISs) are considered the highest quality materials to use for comparison between diagnostic methods using international units (Mattiuzzo et al., 2020). The WHO IS for SARS-CoV-2 serology is the National Institute for Biological Standards and Control (NIBSC) 20/136 (United Kingdom, 2020). This standard, like most biological standards, was produced in limited quantities, making it difficult to use exclusively as a calibrant for comparing results between multiple SARS-CoV-2 serology assays on a global scale. Therefore, there is a pressing need to increase the availability of appropriate reference materials that are considered equivalent to the WHO IS. Other well-characterized reference samples can be evaluated against the WHO IS to obtain a valid measurement and calibrated to the arbitrary WHO IS values of 1,000 international units per milliliter (IU/ml) for neutralization assays and 1,000 binding antibody units per milliliter (BAU/ml) (National Institute for Biological Standards Control, 2020).
We compared multiple SARS-CoV-2 serology reference materials to the WHO IS to determine their utility as secondary standards, using an international network of laboratories with high-throughput quantitative serology assays. This enabled the comparison of quantitative results between multiple serology platforms. Furthermore, each serology method can derive a BAU/ml (or IU/ml as appropriate) conversion for multiple antigen-antibody combinations within each sample that are scaled to the arbitrary 1,000 BAU/ml value assigned to the WHO IS. We also note that neutralization assays that report IU/ml may additionally be calibrated to the WHO IS.

Recruitment of Severe Acute Respiratory Syndrome Coronavirus 2 Serology Reference Materials
Between April and December 2020, 13 well-characterized and validated SARS-CoV-2 serology reference materials were recruited from six different providers (Table 1) (National Institute for Biological Standards Control, 2020; Frederick National Laboratory for Cancer Research, 2021; Oneworld Accuracy, 2021; Thermo Fisher Scientific, 2021; Windsor et al., 2021; Zeichhardt and Kammel, 2021). Reference materials were selected based on the following criteria: they were originally characterized by the suppliers with the relevant test's thresholds for positive and negative results, they are readily available, enough panels will exist after this study to distribute for widespread use, and the providers intend to distribute their reference materials to other (primarily low-resource) laboratories. All the materials were individually evaluated against the WHO IS using previously validated diagnostic tests given in Table 2 and characterized according to the anticipated results shown in Table 1. All the reference materials and diagnostic tests were handled according to manufacturers' instructions and the respective Clinical Laboratory Improvement Amendments (CLIA) laboratory-developed test instructions.

Neutralization Assays
Severe Acute Respiratory Syndrome Coronavirus 2 Focus Reduction Neutralization Test
Vero E6 cells (ATCC, CRL-1586; Manassas, Virginia, USA) were maintained at 37 °C in Dulbecco's Modified Eagle Medium (DMEM) (HyClone 11965-084; Logan, Utah, USA) supplemented with 10% fetal bovine serum and 100 U/ml penicillin-streptomycin. SARS-CoV-2 strain 2019 n-CoV/USA-WA1/2020 was obtained from ATCC. The virus was passaged once in Vero E6 cells and titrated by the focus reduction neutralization test (FRNT) on Vero E6 cells. All the work with infectious SARS-CoV-2 was performed in Biosafety Level 3 (BSL3) facilities at the University of Colorado School of Medicine.
The FRNT was performed as previously described (Schultz et al., 2021; Taylor et al., 2021). Vero E6 cells were seeded in 96-well plates at 10^4 cells/well. On the next day, serum samples were heat inactivated at 56 °C for 30 min and then serially  Shaker Heights, Ohio, USA). The FRNT50 titers were calculated relative to a virus only control (no serum) set at 100%, using GraphPad Prism 9.1.2 default nonlinear curve fit constrained between 0 and 100%.

CPass α-Receptor-Binding Domain (GenScript) Neutralization Antibody Test
The cPass α-receptor-binding domain (RBD) neutralization antibody (nAb) test is a quantitative assay that specifically measures a subset of spike-binding antibodies that can block the interaction between the RBD on the SARS-CoV-2 spike protein and the human host receptor angiotensin-converting enzyme 2 (ACE2) (GenScript, 2021). The assay is performed as a blocking ELISA as described in the Food and Drug Administration (FDA) Emergency Use Authorization (EUA) instructions for use in the cPass™ SARS-CoV-2 Neutralization Antibody Detection Kit. The surrogate virus neutralization test (SVNT) cPass assay was clinically validated and shown to be 100% sensitive and specific when compared to a gold standard plaque reduction neutralization test (PRNT), with qualitative analysis results 100% in agreement (GenScript, 2021). The reference materials were diluted and preincubated 1:1 with RBD protein conjugated to HRP at 37 °C for 30 min. The mixture (100 µl) was then added to a 96-well plate coated with human ACE2 receptor protein; the plate was sealed and incubated for an additional 15 min at 37 °C. The plate was washed four times with 260 µl/well Wash Solution provided in the kit before addition of 100 µl per well of 3,3',5,5'-tetramethylbenzidine (TMB) substrate for 15 min at room temperature. Then, 50 µl of 1 N sulfuric acid solution was added to each well and the optical density (OD) was measured at 450 nm using a spectrophotometer. The nAb assay readout was percent signal inhibition by neutralizing antibodies, calculated as one minus the ratio of the sample OD to the negative control OD, expressed as a percentage (Tan et al., 2020; Petrone et al., 2021; Taylor et al., 2021). Results with ≥ 30% signal inhibition are reported positive and results with < 30% are reported negative, based on previously conducted clinical validation studies (Petrone et al., 2021).
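The percent signal inhibition readout can be illustrated with a minimal Python sketch. Only the formula (one minus the ratio of sample OD to negative control OD) and the 30% cutoff are taken from the text; the function names and the OD readings are hypothetical and are not part of the kit instructions.

```python
def percent_signal_inhibition(od_sample: float, od_negative_control: float) -> float:
    """Percent inhibition = (1 - OD_sample / OD_negative_control) * 100."""
    if od_negative_control <= 0:
        raise ValueError("negative control OD must be positive")
    return (1.0 - od_sample / od_negative_control) * 100.0


def interpret_cpass(inhibition_pct: float, cutoff_pct: float = 30.0) -> str:
    """Apply the >= 30% positive cutoff quoted in the text."""
    return "positive" if inhibition_pct >= cutoff_pct else "negative"


# Hypothetical OD450 readings, for illustration only.
inhibition = percent_signal_inhibition(od_sample=0.5, od_negative_control=2.0)
print(inhibition, interpret_cpass(inhibition))  # prints: 75.0 positive
```

A sample whose OD equals that of the negative control gives 0% inhibition, so stronger blocking of the RBD-ACE2 interaction appears as a lower OD and a higher percentage.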

Binding Antibody Assays
Platelia α-Nucleocapsid Total Antibody Test
The Platelia α-nucleocapsid (anti-N) total antibody test detects antibodies [IgG, immunoglobulin M (IgM), and immunoglobulin A (IgA) combined; Bio-Rad Incorporation] to the nucleocapsid protein. The assay is performed as a one-step antigen capture ELISA as described in the FDA EUA instructions for use for the Platelia SARS-CoV-2 Total Antibody Test Kit (Bio-Rad, 2021). The diluted plasma (1:5) and the WHO IS (1.5-fold serial dilution series up to 8 times, starting at 1:90 dilution) were mixed with SARS-CoV-2 nucleocapsid protein coupled with horseradish peroxidase (HRP) enzyme at a 1:1 ratio and 100 µl was added to a 96-well plate coated with the nucleocapsid protein. The plate was covered with an adhesive plate sealer and incubated at 37 °C for 1 h. The plate was then washed five times with the Working Washing Solution provided in the kit and 200 µl of the Enzyme Development Solution was added to each well. After a 30-min incubation in the dark at room temperature (18-30 °C), the reaction was stopped by adding 100 µl per well of an acidic stopping solution and mixing thoroughly before measuring the OD at 450 nm using a spectrophotometer. The assay readout was a ratio of the specimen OD to cutoff control OD. A specimen-to-cutoff ratio ≥ 1.0 is reported positive, a ratio < 0.8 is reported negative, and a ratio in between is reported equivocal, with the recommendation that another specimen be collected 3 days later.
The Platelia assay has FDA EUA clearance for a qualitative interpretation of results (Bio-Rad, 2021).
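The qualitative interpretation of the specimen-to-cutoff ratio can be sketched in Python as follows. The thresholds (1.0 and 0.8) come from the text; the function names and OD values are hypothetical, and the sketch is not the manufacturer's software.

```python
def specimen_to_cutoff_ratio(od_specimen: float, od_cutoff_control: float) -> float:
    """Ratio of the specimen OD to the cutoff control OD."""
    if od_cutoff_control <= 0:
        raise ValueError("cutoff control OD must be positive")
    return od_specimen / od_cutoff_control


def interpret_platelia(ratio: float) -> str:
    """Positive at >= 1.0, negative below 0.8, equivocal in between."""
    if ratio >= 1.0:
        return "positive"
    if ratio < 0.8:
        return "negative"
    return "equivocal"


# Hypothetical OD450 readings, for illustration only.
ratio = specimen_to_cutoff_ratio(od_specimen=0.45, od_cutoff_control=0.5)
print(round(ratio, 2), interpret_platelia(ratio))
```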

Statistical Analysis
Parallel-line assay (PLA) was used to compare all the secondary standard candidate samples to the WHO IS, with all the WHO IS analytes set at 1,000 IU/ml or BAU/ml (Finney and Schild, 1966). All the samples were tested in triplicate with each diagnostic test at dilutions within each assay's given linear range for the WHO IS. Data were analyzed with a PLA script that we created in R 3.5.0 (R Core Team, 2021). Sample results and their corresponding dilutions were log-transformed and assessed for parallelism using the relative slope calculated individually between each sample and the WHO IS. To satisfy the parallelism assumption of PLA, a relative slope between 0.8 and 1.2 was considered parallel; samples with relative slopes outside this range were excluded from further analysis (Mattiuzzo et al., 2020). The relative potency was calculated for each sample whose slope was within 20% of the WHO IS slope. Relative potencies were then converted to IU/ml or BAU/ml based on the assay used (Finney and Schild, 1966) and parametric bootstrapping was used to calculate CIs for each sample (Efron, 1979; Landes et al., 2019). The full reproducible code and readme file are both available at github.com/yroell/pla, and an overview of the PLA analysis applied to each sample is shown in Supplementary Figure 1. IU/ml and BAU/ml conversions were then compared for interassay variability using percent coefficient of variation (%CV) (Reed et al., 2002; Wood et al., 2012).
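The parallelism check and relative potency calculation can be sketched in Python as follows. This is a minimal illustration under stated assumptions and does not reproduce the R script at github.com/yroell/pla: it assumes natural-log transformation of both signal and relative concentration (1 / dilution factor), ordinary least-squares slopes for each series, and a pooled common slope for the potency estimate. Bootstrapped CIs are omitted, and the function names and example readings are hypothetical.

```python
import math
from statistics import mean

WHO_IS_UNITS = 1000.0  # IU/ml or BAU/ml assigned to the WHO IS in the text


def _centered_sums(x, y):
    """Return (mean_x, mean_y, Sxx, Sxy) for one dilution series."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return mx, my, sxx, sxy


def parallel_line_potency(std_dilutions, std_signal, smp_dilutions, smp_signal,
                          slope_range=(0.8, 1.2)):
    """Return (relative_potency, relative_slope).

    Dilutions are dilution factors (100 means 1:100). The potency is None
    when the relative slope falls outside slope_range (not parallel).
    """
    xs = [math.log(1.0 / d) for d in std_dilutions]
    ys = [math.log(v) for v in std_signal]
    xt = [math.log(1.0 / d) for d in smp_dilutions]
    yt = [math.log(v) for v in smp_signal]

    mxs, mys, sxx_s, sxy_s = _centered_sums(xs, ys)
    mxt, myt, sxx_t, sxy_t = _centered_sums(xt, yt)

    # Slope of the sample line divided by slope of the standard line.
    relative_slope = (sxy_t / sxx_t) / (sxy_s / sxx_s)
    if not (slope_range[0] <= relative_slope <= slope_range[1]):
        return None, relative_slope

    # Pooled slope shared by both lines, then the horizontal shift between them.
    common_slope = (sxy_s + sxy_t) / (sxx_s + sxx_t)
    intercept_std = mys - common_slope * mxs
    intercept_smp = myt - common_slope * mxt
    log_potency = (intercept_smp - intercept_std) / common_slope
    return math.exp(log_potency), relative_slope


# Hypothetical, noise-free readings: the sample is half as potent as the standard.
dilutions = [100, 200, 400, 800]
standard_signal = [5000.0 / d for d in dilutions]
sample_signal = [2500.0 / d for d in dilutions]

potency, rel_slope = parallel_line_potency(dilutions, standard_signal,
                                           dilutions, sample_signal)
if potency is not None:
    print("relative slope:", round(rel_slope, 3))
    print("BAU/ml:", round(potency * WHO_IS_UNITS, 1))
```

In this sketch the unit conversion is simply the relative potency multiplied by the 1,000 units assigned to the WHO IS, so a sample with half the potency of the standard is reported at about 500 BAU/ml.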

Analysis of Samples and Binding Antibody Unit Conversions
Thirteen samples (including the WHO IS) from six different providers (Table 1) were tested using six different SARS-CoV-2 serology diagnostic platforms. Twenty-one total antigen-antibody (Ag-Ab) combinations were evaluated. Three of the platforms were multiplexed platforms targeting multiple Ag-Ab combinations. The remaining three platforms consisted of two SARS-CoV-2 neutralization tests and one nucleocapsid-specific ELISA (Table 2). Each laboratory performed serial dilutions of the WHO IS to establish the linear range of the WHO IS within each testing platform. All the reference samples were then serially diluted within the WHO IS linear range and tested in triplicate. Results from each laboratory were compiled and evaluated using PLA. Reference material samples were considered "parallel" if their relative slope against the WHO IS was between 0.8 and 1.2. Samples that failed to fall within the range were excluded from further analysis. For each sample at each Ag-Ab combination, BAUs (or IUs for neutralization tests) were calculated using sample relative potency. BAU conversions for each sample are shown in Table 3 and Figure 1 summarizes

Interlaboratory/Method Binding Antibody Units Concordance
Once BAUs were calculated, we evaluated results for overall intermethod concordance, where multiple laboratories yielded results for an Ag-Ab combination, using percent coefficient of variation (%CV). Lower %CV values (<21%) indicate close agreement between laboratories. None of the samples tested yielded universally high concordance between methods (regardless of Ag-Ab combination). For specific Ag-Ab combinations, there was no universal concordance between methods regardless of the sample tested. Samples 1WA-A, 1WA-B, and 1WA-C saw high concordance between laboratories for both the IgG and IgM bound to S, RBD, and N antigens (%CV range between 5 and 57%). CSCP-HR and CSCP-WR were highly concordant within the IgG-S combination (5 and 12%, respectively) and IgA and IgM bound to S, RBD, and N antigens (%CV range between 2 and 53%). Sample 416029 was highly concordant between laboratories for IgG-RBD and IgG-S1 combinations (%CV 6 and 1%, respectively). Sample 416048 saw high concordance with IgG S, S1, and RBD combinations (%CV = 19, 2, and 8%, respectively). The highest %CV value in Figure 2 was found in sample 416006 at the IgG-N analyte, which is likely because that particular sample was acquired from a vaccinated individual and was not highly reactive to IgG-N during its characterization (Table 1). The National Cancer Institute (NCI) Frederick sample saw good overall concordance between laboratories for all the measured analytes. The IgG-S1 of the Thermo Fisher Scientific sample was highly concordant between laboratories (%CV = 6%). Result concordance between testing methods at each Ag-Ab combination is shown in Figure 2.
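The %CV used for concordance can be illustrated with a short Python sketch. It assumes the sample standard deviation (n - 1 denominator), which the text does not specify; the 21% threshold is taken from the text, and the function names and BAU/ml values are hypothetical.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Percent coefficient of variation: SD / mean * 100.

    Returns None when fewer than two laboratories reported a value,
    mirroring the "not enough labs" case for a given Ag-Ab combination.
    """
    if len(values) < 2:
        return None
    return stdev(values) / mean(values) * 100.0


def is_highly_concordant(cv, threshold=21.0):
    """Apply the < 21% criterion for high concordance quoted in the text."""
    return cv is not None and cv < threshold


# Hypothetical BAU/ml conversions for one sample and analyte from three laboratories.
bau_by_lab = [900.0, 1000.0, 1100.0]
cv = percent_cv(bau_by_lab)
print(round(cv, 1), is_highly_concordant(cv))
```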

DISCUSSION
We evaluated multiple candidate reference materials against the WHO IS (NIBSC 20/136) to determine whether secondary standards could be established. We then evaluated the applicability of using arbitrary BAU conversions to compare results between laboratories and serology diagnostic methods. Many seroprevalence studies use different serology assays to estimate transmission and/or herd immunity. The differences between assays make it nearly impossible to harmonize and establish a reliable limit of detection. A reference standard would theoretically allow for comparison between such studies.
A number of studies have determined that international standards provided by the WHO for various pathogens may be useful and should be used to compare results across laboratories and diagnostic methods to help establish correlates of protection for SARS-CoV-2 and other high-threat pathogens (Cooper et al., 2018; McDonald et al., 2018; Mattiuzzo et al., 2019, 2020; Ciotti et al., 2021; Knezevic et al., 2022). For example, when assessing candidate reference materials for enterovirus serology, one study evaluating the interassay variability for both the raw neutralization titer and the calculated relative potencies found a marked decrease in interassay variability. Their calculated percent geometric coefficient of variation (%GCV) was between 30 and 94% (Cooper et al., 2018), indicating that although their candidate materials had decreased interassay variability after the results were converted to a harmonized metric, it is difficult to know what is considered an acceptable coefficient of variation across methods in this context. Two additional studies that evaluated candidate reference materials for Zika virus found similar improvements to intermethod concordance with the reference material, yet GCVs remained exceptionally high, suggesting that a threshold for acceptable intermethod concordance may be difficult, if not impossible, to establish in these contexts (Mattiuzzo et al., 2019; Berry et al., 2020).
FIGURE 2 | Inter-method concordance of binding antibody unit conversions among reference materials for each analyte. %CV, percent coefficient of variation; light blue, higher concordance between methods; dark blue, lower concordance between methods; blank, not enough labs yielded PLA results to compute concordance. Thick black outlines indicate that the particular analyte was evaluated by that sample's provider. The following samples were removed because they were classified as "non-reactive" during testing: 1WA-D, 416006, CSCP-NR. The following Ag-Ab combinations were removed due to lack of sufficient PLA data due to linearity violations: Whole-Virus-Total, S2-Total, S2-IgM, S2-IgA, S2-Total, S-Total, RBD-Total.
Finally, the developers of the WHO IS conducted a robust evaluation of the candidate standard that included 125 different SARS-CoV-2 serology assays (Mattiuzzo et al., 2020; Knezevic et al., 2022). When evaluating the interassay variability of results, they stratified their comparisons into neutralization assays, ELISAs, and "other" assays relative to what is now the WHO IS. Interassay variability between neutralization assays for samples tested relative to the WHO IS did not fall below 67% (%GCV range 67-250%). The interassay variability for the WHO IS itself was 241% (Mattiuzzo et al., 2020). Similar results were found when comparing ELISA methods and there were no data that evaluated the "other" methods included in the characterization. The assignment of an arbitrary 1,000 IU for neutralization assays and 1,000 BAU/ml for other assays, despite the large interassay variability relative to the WHO IS, does not account for the vast differences between assays. Additionally, the interassay variability between all the methods used was not presented, which makes it difficult to fully understand how best to harmonize results between multiple laboratories in order to assess correlates of protection. This study evaluated the interassay variability relative to the WHO IS across all the methods used. We also present the variability between laboratories for multiple Ag-Ab combinations to differentiate which ones are more likely to remain consistent or be highly variable within each sample.
Other studies also suggest that SARS-CoV-2 serology tests cannot be calibrated to the same measurement "ruler" and have their results compared between assays (Cooper et al., 2018; Bradley et al., 2021; Castillo-Olivares et al., 2021; Giavarina and Carta, 2021; Infantino et al., 2021; Perkmann et al., 2021; Solastie et al., 2021; Knezevic et al., 2022). It is also important to note that the IU or BAU assigned to the WHO IS is arbitrary and not based on an analytical concentration measurement. Additionally, results attained using the WHO IS are highly variable between assays. Our results demonstrate that any reference material should be characterized independently for each assay and it is not advisable to compare quantitative IU or BAUs between different assays. Therefore, arbitrary BAUs that were not calculated should not be used to benchmark any characterizations made for other reference materials, especially candidate secondary standards (Bradley et al., 2021; Giavarina and Carta, 2021; Perkmann et al., 2021). International Standards are not able to account for the wide variety of reagent formulations and nuances between testing methods using a universal metric such as an IU or BAU conversion. Finally, our findings show that qualifying secondary standards against the WHO IS, using 1,000 IU or BAU as a baseline metric, does not yield consistent IU or BAU conversions between assays.
Regardless of the pathogen, many other evaluations of "candidate" reference materials from the WHO have revealed a high degree of interassay and interlaboratory variability during characterization (Bozsoky, 1963; Holder et al., 1995; Wood et al., 2012; Dimech et al., 2013; Cooper et al., 2018; McDonald et al., 2018; Mattiuzzo et al., 2019; Kempster et al., 2020; Timiryasova et al., 2020). Although these findings cannot be verified within the context of this study, our findings reinforce that SARS-CoV-2 serology reference materials face the same challenges and interpretation issues that other groups have seen (Mattiuzzo et al., 2020; Castillo-Olivares et al., 2021; Ciotti et al., 2021; Giavarina and Carta, 2021; Infantino et al., 2021; Kristiansen et al., 2021). Standardization of IU or BAU values for candidate secondary standards relative to the WHO IS could not be achieved across different laboratory assays using methods consistent with the NIBSC characterization of the WHO IS (Mattiuzzo et al., 2020). This calls into question the feasibility of standardizing different serology assays in the future and what this means when interpreting seroprevalence or distinguishing between natural infections and vaccine-induced responses.

Limitations
Some limitations are noted for this study. Among our laboratories, some were unable to yield relative potency values to use for a BAU/ml conversion for certain Ag-Ab combinations. Our criteria for PLA parallelism were stricter (relative slope = 0.8-1.2) than the standards set by the NIBSC (relative slope = 0.8-1.25) during the initial characterization of the NIBSC 20/136 because we wanted to set a more consistent range for relative slopes on either end (Mattiuzzo et al., 2020). Furthermore, the NIBSC does not clarify why an acceptable relative slope range of 0.8-1.25 was chosen. Manufacturing convalescent plasma/serum samples at scale is not common practice due to low-volume donations and lot-to-lot differences. Therefore, unlike molecular standards, it is difficult to generate large batches and consistent lots for harmonization or even for testing (in a postharmonization world). Two of the six methods used were neutralization assays; one did not yield relative potency for any samples tested and the other only yielded a relative potency for a single sample. Even after log transformation, the raw candidate sample neutralization results failed to fall within the parameters to accurately perform PLA (Taylor et al., 2021).
Similar studies have used a variety of different interassay comparability methods that include, but are not limited to, Spearman's rank correlation coefficient, the Mann-Whitney U test, and Passing-Bablok regression (McDonald et al., 2018; Castillo-Olivares et al., 2021; Giavarina and Carta, 2021; Perkmann et al., 2021). Percent coefficient of variation (%CV) is a flexible metric commonly used in clinical laboratories and by the developers of International Standards to evaluate interassay, intralaboratory, and lot-to-lot variations (Reed et al., 2002; Mattiuzzo et al., 2020). Furthermore, each of these alternative comparison methods excludes outlier results from analysis, which biases comparisons to appear erroneously "better" in a study context where outlier laboratory results are important to consider when determining the effectiveness of candidate reference materials.
The MMA method tested the WHO standard as nonreactive (no reaction present) for IgM against the nucleocapsid and spike S2 and indeterminate (no result due to PLA violation) for IgA against the nucleocapsid. Even though the assay was sensitive enough to give values for these analytes, these numbers are below what was considered reactive. Because the standard was so low and set to 1,000 BAU/ml, any sample with detectable but similarly low quantities of an analyte will give a misleadingly high BAU/ml value and should be interpreted with caution. Finally, each method in this study used different formulations of commercial reagents as noted in the Materials and Methods section. For coronavirus disease 2019 (COVID-19) and detection of anti-SARS-CoV-2 antibodies, the field is complicated by multiple antigen sources, multiple host experiences (one or more natural infections and/or vaccines and boosters), multiple variants, and multiple test platforms. This makes it very difficult to achieve harmony. The nuanced differences between these reagents, platforms, and host experiences might contribute to the differences between IU and BAU conversions. Serology is extremely dense with methods and tests, regardless of the pathogen, which highlights the difficulty of applying the same standards for interpretation because it does not account for the nuances that accompany a wide range of assays. This highlights the need for a more precise interpretation of reference material characterizations, so these differences can be accounted for in future studies and allow for better harmonization of results between methods.

Topic
Recommendation(s)

Regulatory Bodies
• Replace the process that qualifies candidate secondary materials to an international standard with standards or best practices set for the "characterization" process of any potential reference materials using historical development of WHO ISs as a framework. This will elevate the quality standards for characterization of samples.
• Regulatory bodies must also require more precise interpretation of how to use particular reference materials based on the results from their characterization. These interpretations must take into account the nuances of reagent formulation, testing platform, and the results interpretation in a clinical setting.
• Once these interpretations are more precise, future studies can then appropriately compare the results between seroprevalence studies for SARS-CoV-2 and potentially other incoming pathogens of interest.

Reference Material Characterization
• When characterizing reference materials, the methodology, reagent formulation, and validation information must be shown and included in the interpretation of reference material testing results. Different assays with different reagent formulations might yield slightly different results.
• Establish a minimum number of laboratory methods to include when characterizing potential reference materials.
• Require that the development, manufacturing, and distribution of secondary standards align with Good Manufacturing Practices.
• Establish a minimum list of pathogens to test for when determining sample microbial bioburden.
• Establish a list of minimum requirements for a "suitable assay" used to demonstrate the expected immunological activity of a reference material.
• Establish an acceptable level of concordance (%GCV or %CV) between laboratories for the average BAU/IU conversion to be considered "reliable."

Interpretation
• Clarify that reference material (international standards and secondary standards) characterization is extremely assay and context dependent, which can affect accuracy of result interpretations. Similar tests with similar reagents must be used when comparing BAU conversions and seroprevalence study results.
• Revoke the encouraged removal of outlier method results during sample characterization. Exclusion of outlier laboratory data that fall within the PLA assumptions makes reference materials less comparable between methods, which might remove the ability to adequately compare results between seroprevalence studies.
• In order to continue using any WHO IS after their supply runs out, consider the development of an artificial IS for serology.
• Clarify and establish that the intended use of standard reference materials is for external quality assurance schemes, for comparing results between studies using similar assays or reagents, and as "anchors," by testing the same standards at the beginning and the end of a longitudinal research study, which will attest to the quality of the results presented by that research study.

CONCLUSION
Harmonization of serology reference materials will increase the accessibility of reference materials, particularly in low-resource settings, provided the methods used for comparison are accurate and reliable. Our findings indicate that the arbitrary units of the WHO IS are not an accurate means to compare SARS-CoV-2 serology results between different laboratories or methods. This study also shows that even after IU or BAU conversion, candidate secondary material results are still drastically different between laboratory methods. Both the International Standards and candidate secondary standards should only be used to compare the results within the same laboratory methods, provided they are using identical testing platforms, protocols, and reagent formulations (Bradley et al., 2021; Giavarina and Carta, 2021; Perkmann et al., 2021). This must be highlighted by regulatory bodies to accurately portray the use of the WHO IS as an assay calibrator during development or external quality assurance material for intramethod comparison, not as a universal comparator (Holder et al., 1995; Infantino et al., 2021).
Finally, despite some concordance between laboratories, qualification of secondary materials to the WHO IS using arbitrary IU or BAU/ml does not provide any benefit to the reference materials overall, because the IU or BAU/ml conversions do not agree consistently between laboratories. Secondary standards should be qualified to well-characterized reference materials, such as the WHO IS, using serology assays that are similar to the ones used for the original characterization of the WHO IS. However, secondary standards qualified with assays similar to those of the original characterization remain useful for source traceability: they can be used for intra-assay adjustments and in external quality assessment to compare binding to the antigen(s) presented in an assay against a reference, thereby supporting intralaboratory operations (Table 4).

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
WW oversaw the entire study operation, conducted the analyses, constructed the tables, and drafted the manuscript. YR conducted the data analyses in R, built an open source R script for the parallel-line assay (github.com/yroell/pla), and built all the figures (both the manuscript and supplementary). ML provided data analysis oversight. MC assisted WW with the overall study design, execution, and result interpretation. HT and WL provided their open source parallel-line assay analysis program to use as a comparison to the R script developed by YR. WW, HZ, MK, HW, AD, and DT contributed SARS-CoV-2 serology reference materials. The remaining authors tested the samples in their respective laboratories. All authors contributed insights, edits, revisions, and interpretations to the final manuscript.

FUNDING
This study is an expansion of the original COVID-19 Serology Control Panel Study funded by the Bill & Melinda Gates Foundation.