Longitudinal evaluation of laboratory results and method precision in worldwide erythropoietin external quality assessments

Introduction: This study presents a longitudinal analysis of external quality assessment (EQA) results for erythropoietin (EPO) determinations conducted between 2017 and 2022 with a continuously increasing number of participating laboratories. The aim of this work was to evaluate participant performance and methodological aspects. Methods: In each of the eleven EQA surveys, a blinded sample set of lyophilized human serum containing one sample with lower EPO concentrations (L) and one with higher EPO concentrations (H) was sent to the participating laboratories. Results: A total of 1,256 measurements were included. The median (interquartile range) fraction of participants not meeting the criteria of acceptance set at 20% around the robust mean of the respective survey was 9.5% (6.1%–10.7%) (sample L) and 9.1% (5.8%–11.8%) (sample H) but lacked a clear trend in the observed period. Some surveys exhibited unusually high interlaboratory variation, suggesting interfering components in the EQA samples. Different immunological methods and reagent manufacturers also showed variability in measurement outcomes to some extent. Conclusion: These findings highlight the need for continuous quality assessment in EPO measurements to ensure patient safety and identify areas for further research and investigation.


Introduction
The quantitative determination of erythropoietin (EPO) in blood is mainly performed using immunoassays. By measuring serum EPO levels, useful information can be obtained on various pathogenic changes. The resulting therapeutic algorithms can guide treatment. Chronic kidney disease, as well as systemic inflammation and malignancies, can lead to a decrease in EPO biosynthesis and, therefore, to low EPO levels in the blood (Jelkmann, 2011; Portolés et al., 2021). Higher concentrations can be measured for secondary erythrocytosis, mostly caused by hypoxemia. In addition, non-renal anemia results in a higher renal EPO production and an exponential increase in serum levels (Artunc and Risler, 2007; Jelkmann, 2011; Bunn, 2013). In combination with other parameters, EPO also serves as a marker for possible myeloproliferative diseases (Michiels et al., 2007). Endogenous EPO levels should also be determined before injecting erythropoiesis-stimulating agents to treat, for example, myelodysplasia (Fried, 2009).
An adequate measurement quality is essential for ensuring patient safety, and a formal proof of the analytical competence to measure certain parameters (namely, accreditation) is mandatory or at least recommended in most countries (Zima, 2017). Adequate treatment and patient safety require reliable test results at a consistently high standard (Laudus et al., 2022). Clinicians and especially patients expect precise test results from diagnostic testing during treatment monitoring, regardless of the laboratory performing the tests (De La Salle et al., 2017). External quality assessment (EQA) is used to independently evaluate, continuously monitor, and compare laboratory performance, and frequent participation in EQA programs is mandatory for accredited medical laboratories (Favaloro et al., 2018; Sciacovelli et al., 2018; DIN EN ISO 15189:2023-03, 2023). It is a helpful tool for assessing the current status and can help to identify areas in need of improvement (Laudus et al., 2022). Additionally, EQA can assess the precision of the methodology used by the laboratories (Favaloro et al., 2018). Marsden et al. emphasized the need for the establishment of an EPO EQA scheme in 2006 after they found fluctuations of 2.9-200 IU/L in a sample distribution program involving six laboratories (Marsden et al., 2006). INSTAND e.V. is an independent scientific medical society and accredited organization located in Germany supporting quality assurance in medical laboratories by performing EQA in laboratory medicine. INSTAND introduced its first worldwide EQA for EPO measurements in 2017. Since then, it has been performed twice a year, and the certificate is valid for 12 months.
In order to observe developments in the general measurement quality of medical laboratories and their methodology for EPO measurement, an established EQA scheme with a certain number of participants and EQA runs is required. This study is the first to show a longitudinal analysis of the results of the INSTAND EPO EQA from 2017 to 2022 with participating laboratories from all over the world. The study also aims to summarize the results of all runs of this EQA and to present the development of the EPO EQA since its introduction.

EPO EQA procedure
A total of eleven surveys of the EPO EQA were performed between 2017 and 2022, one in 2017 and two per year thereafter (surveys S1/S2), with an increasing number of participants from all over the world. For each survey, every participating laboratory was asked to analyze two blinded lyophilized human serum samples containing different EPO concentrations. In this work, the sample with the lower concentration is always referred to as sample L, and the one with the higher concentration as sample H. In some cases, specimens were enriched with recombinant EPO by the sample manufacturer. Due to an unexpectedly high number of participants in 2020-S1 and 2020-S2, participants were divided into two subsets (2020-S1a and 2020-S1b, and 2020-S2a and 2020-S2b), and each subset received a different sample set. The lyophilized EQA samples had to be reconstituted with 1 mL of distilled water for 30 min at room temperature and then analyzed like a normal patient sample.
Laboratories reported their results and information about the assay they used to INSTAND via the RV-Online platform (https://rv-online.instandev.de). Between 2017 and 2022, the EQA criteria of acceptance (CoA) for EPO were set to a 20% deviation from the robust mean calculated using Algorithm A (ISO13528:2015, 2020). Laboratories that reported measurements outside the CoA would not pass the quality assessment. The German Medical Association has not yet defined a maximum permissible relative deviation in EQA schemes for EPO. Therefore, the CoA used for the evaluation of the INSTAND EPO EQA is based on the mean value of the permissible relative deviations recommended in the guideline of the German Medical Association for EQA schemes for other quantitative parameters in clinical chemistry (Bundesärztekammer, 2022).
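The evaluation rule described above can be illustrated with a minimal Python sketch. It assumes the commonly described iterative form of Algorithm A (start from the median and the scaled median absolute deviation, then repeatedly winsorize at 1.5 robust standard deviations); the function names, the convergence tolerance, and the example values are hypothetical and are not taken from the INSTAND evaluation software.

```python
import statistics


def algorithm_a(values, tol=1e-6, max_iter=100):
    """Robust mean and robust SD via an Algorithm A style iteration (sketch)."""
    data = list(values)
    # Initial estimates: median and scaled median absolute deviation.
    robust_mean = statistics.median(data)
    robust_sd = 1.483 * statistics.median([abs(v - robust_mean) for v in data])
    for _ in range(max_iter):
        delta = 1.5 * robust_sd
        low, high = robust_mean - delta, robust_mean + delta
        # Winsorize: pull values outside [low, high] back to the limits.
        clipped = [min(max(v, low), high) for v in data]
        new_mean = statistics.fmean(clipped)
        new_sd = 1.134 * statistics.stdev(clipped)
        converged = (abs(new_mean - robust_mean) < tol
                     and abs(new_sd - robust_sd) < tol)
        robust_mean, robust_sd = new_mean, new_sd
        if converged:
            break
    return robust_mean, robust_sd


def outside_coa(value, target, limit=0.20):
    """True if a result deviates from the target by more than the limit."""
    return abs(value - target) > limit * target


# Hypothetical EPO results (IU/L) reported for one sample in one survey.
results = [10.2, 9.8, 10.5, 9.9, 10.1, 10.0, 14.9, 10.3]
target, _ = algorithm_a(results)
failed = [r for r in results if outside_coa(r, target)]
print(round(target, 2), failed)
```

In this sketch, the single high value has little influence on the target because it is pulled back to the winsorization limit in every iteration, which is the property that makes a robust mean suitable as a consensus value.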

Data analysis
Microsoft Excel (Version 16.56, Microsoft Corporation, Redmond, WA, USA) was used for data management. The statistical analysis and visualization of the results were performed using RStudio (R version 4.1.1 (2021-08-10); RStudio PBC, Boston, MA, USA). Figures were created using the R package ggplot2 (Wickham, 2016). In the boxplots, the box captures the middle 50% of the data, and the whiskers extend to the most extreme observations lying within 1.5 times the interquartile range (IQR) above and below the box. The dots mark outliers, which are defined as observations that lie more than 1.5 times the IQR beyond either edge of the box.
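The boxplot convention can be made explicit with a short Python sketch. It is not the code used for the figures (these were created with ggplot2); the function name and the example values are hypothetical, and the quartiles are computed with the "inclusive" method of the Python standard library, which is assumed here to match the default quantile definition in R.

```python
import statistics


def boxplot_stats(values, k=1.5):
    """Box, whisker ends, and outliers following the 1.5 x IQR convention."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical EPO results (IU/L) with one clearly deviating value.
print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 30]))
```

For the example values, the box spans 3 to 7, the whiskers end at 1 and 8, and the value 30 is drawn as an outlier because it lies above the upper fence of 13.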
The mean absolute deviation (MAD) to median ratio was calculated to evaluate the interlaboratory variation. The data distribution was analyzed depending on the immunological methods used by the laboratories. Methods used by the participating laboratories were enzyme-linked immunosorbent assay (ELISA), chemiluminescence immunoassay (CLIA), or luminescent enzyme immunoassay (LEIA). Reagent manufacturer-dependent data distributions were also analyzed. The manufacturers were Beckman Coulter, Inc. (BE), Siemens (DPC-Biermann; DG), and IBL International GmbH (IB). Measurements with missing information on test method or reagent manufacturer, as well as manufacturer collectives with n < 14, were grouped as "other" due to a lack of statistical validity.
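As an illustration of this variation measure, the following Python sketch computes a MAD/median ratio in percent. The text does not state the reference point of the deviations, so the sketch assumes the mean of the absolute deviations from the median; the function name and the example values are hypothetical.

```python
import statistics


def mad_median_ratio(values):
    """Mean absolute deviation from the median, relative to the median (%)."""
    data = list(values)
    median = statistics.median(data)
    # Assumption: deviations are taken around the median, not the mean.
    mad = statistics.fmean([abs(v - median) for v in data])
    return 100 * mad / median


# Hypothetical EPO results (IU/L) for one sample in one survey.
print(round(mad_median_ratio([9, 10, 10, 11, 15]), 1))
```

With the example values, the median is 10 IU/L and the absolute deviations average 1.4 IU/L, giving a ratio of 14%. If the deviations were instead taken around the arithmetic mean, the ratio would differ slightly.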
Nine measurements each for sample L and sample H were excluded from the dataset due to suspected sample mix-ups or data submission errors and were not included in later calculations (Supplementary Table S2).

Results
Overall, 1,256 measurements were evaluated. The first EQA survey conducted in 2017 had ten participating laboratories. In subsequent years, the number of participants increased to an annual average of 85 laboratories in 2022 (Figure 1; Table 1). The overall median (IQR) percentage of participants not meeting the CoA was 9.5% (6.1%-10.7%) for sample L and 9.1% (5.8%-11.8%) for sample H. Relatively high rates (46.9% and 38.2%, respectively) of measurements outside the CoA for sample L were observed for 2019-S2 and 2020-S1a (Figure 1; Table 1). The interlaboratory variation was determined by calculating the MAD/median ratio for each survey. The overall MAD/median ratio (median; IQR) was 11.0% (7.5%-13.1%) (sample L) and 9.9% (8.8%-10.6%) (sample H) but showed an unusual peak for 2019-S2 at 25.0% for sample L, which is in line with the low passing rate for this survey (Figure 1).
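The way such summary figures are obtained can be sketched in Python as follows. The example is illustrative only: the function names and all numbers are hypothetical and do not reproduce the study data, and the quartile definition ("inclusive" method of the standard library) is an assumption.

```python
import statistics


def percent_outside(results, target, limit=0.20):
    """Percentage of results deviating from the target by more than the limit."""
    fails = sum(1 for r in results if abs(r - target) > limit * target)
    return 100 * fails / len(results)


def median_iqr(percentages):
    """Median with first and third quartile across all surveys."""
    q1, median, q3 = statistics.quantiles(percentages, n=4, method="inclusive")
    return median, q1, q3


# Hypothetical results (IU/L) for one survey with a target of 10 IU/L.
print(percent_outside([8, 10, 12, 13], target=10))

# Hypothetical failure percentages from five surveys.
print(median_iqr([0.0, 10.0, 20.0, 30.0, 40.0]))
```

In the first call, only the value 13 lies outside the range of 8 to 12 IU/L, so one in four results fails. The second call condenses the per-survey percentages into a median with its interquartile range, which is the form in which the failure rates are reported above.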
The results were also evaluated based on the immunological methods used by the laboratories (Table 2). The scatter for the individual methods was quite low when considered in relation to the overall distribution of the data. Overall, the method-specific data distributions were mostly within the quartiles of the total data distribution. In some cases, the value distribution for ELISA shifted upwards, especially for the less concentrated samples (sample L) between 2019-S2 and 2020-S2b but also for sample H in 2020-S2a and 2021-S1 (Figure 2). With 824 measurements for samples L and H combined, CLIA was the most frequently used method in every survey. LEIA had the lowest frequency, with 118 total observations. ELISA was used 132 times.
Regarding the reagent manufacturer-dependent data analysis, the most frequently used manufacturer was DG, with 846 measurements for sample L and sample H combined (Table 3). Manufacturer BE was used 158 times. IB was used the least (n = 70). IB showed a tendency for higher values, and upward shifts could be observed in some surveys, especially for sample L between 2019-S2 and 2021-S1, but also for sample H in the 2019 and 2021 surveys (Figure 3). In some cases, BE tended towards values in the lower range of the overall distribution and, in some cases, even outside the lower quartile. One shift outside the upper quartile could ...

Discussion
This study summarizes quantitative EQA results for EPO determination conducted between 2017 and 2022. The MAD/median ratio was below 15% in almost every case. Survey 2019-S2 showed a higher value of 25.0% for sample L. Also, some immunological methods and reagent manufacturers showed variability in measurement outcomes to some extent. These findings should also be placed in relation to their clinical relevance. EPO determination is mainly used in a diagnosis of exclusion to identify, for example, chronic kidney disease as the cause of anemia. Therefore, the focus is on the concentration of EPO in relation to other anemia markers rather than on the exact prevailing EPO concentration. Low EPO concentrations in the blood, in combination with hemoglobin concentrations below 13.0 g/dL (adult males) and 12.0 g/dL (nonmenstruating females), may indicate a renal cause (Lankhorst and Wish, 2010). Non-renal anemia usually results in increased EPO levels, and, in severe cases, an increase of up to 1000-fold can be reached (Artunc and Risler, 2007; Higgs et al., 2015). Hence, small measurement deviations may be clinically less critical if the EPO value is considered in relation to the relevant biomarkers. Nevertheless, clinical laboratories should always strive for the highest measurement precision so that patient safety, as the highest priority, is never compromised. To date, further investigation is needed to obtain clear statements on quality specifications for EPO measurement variation.
Scattering in the EPO levels of the investigated immunological methods and reagent manufacturers could be observed in some cases. Immunoassays have an analytical error rate of 0.4%-4% (Ismail, 2017). This can be attributed to exogenous factors such as variability in sample pipetting and other handling errors or systematic exogenous error sources such as calibration errors (Sturgeon and Viljoen, 2011). Furthermore, interfering factors, such as the reagents used, have been known to affect measurement outcomes (Alhajj and Farhana, 2022). There also may be excessive non-specific binding of the antibody or antigen in the assay performed (Gan and Patel, 2013). It is known that the imprecision of EPO quantification immunoassays depends on the concentration (Marsden et al., 2006). Especially for the reagent manufacturer IB, scatter could be observed at median sample concentrations of 10 IU/L or less. This manufacturer was only used in combination with the ELISA and "other" method collective. According to the manufacturer's website, the concentration range of the calibration curve of the commercially available ELISA kit from this manufacturer is 10.7-469 IU/L (IBL International GmbH, 2023). Thus, the EPO concentration in the samples might have been too close to the detection limit of the assay. However, due to the comparably small number of IB applications, more measurements would be needed to corroborate this assumption. In comparison, the lowest limit of detection for the manufacturer DG device Immulite 2000 was found to be 0.16 IU/L, with the manufacturer's recommended detection limit being 0.24 IU/L (Benson et al., 2000). The lowest limit of detection for the DG device Advia Centaur Systems is given as 0.75 IU/L (Siemens Healthcare Diagnostics Inc, 2019). The dynamic range of the BE family of Access Immunoassay Systems EPO assays was determined to be 0.6-750 IU/L (Retka et al., 2005; Beckman Coulter, Inc., 2023). Marsden et al. compared different EPO ELISA test kits with radioimmunoassay as a reference test. One kit from the manufacturer IB was also included in that comparison and showed a slight positive bias compared to the reference method. Even though the study by Marsden et al. was conducted in 1999, and no radioimmunoassay was used in the present study, these results are in line with some observed upward shifts for this manufacturer (Marsden et al., 1999).
In some cases, slight fluctuations were also observed for BE. Owen and Roberts compared the test performance of the Access 2 device of this manufacturer with the Immulite 2000 device by the manufacturer DG and obtained comparably good results with both manufacturers (Owen and Roberts, 2011). As the sample sizes for both manufacturers were the same in the study mentioned (n = 101), whereas the frequencies of use varied greatly in this EQA, the results obtained here do not yet indicate a clear difference in the measurement range of the two methods. Owen and Roberts also compared the two manufacturers DG and BE in terms of cross-reactivity with recombinant EPO preparations and found that both differed considerably in the measurement results of samples spiked with Epoetin alfa and Darbepoetin alfa, as the values for BE were in a much higher range (109 IU/L and 242 IU/L higher than DG, respectively) (Owen and Roberts, 2011). Because the samples used for these EQA surveys were sometimes spiked, differences in cross-reactivity with recombinant EPO as the cause of variability cannot be safely excluded.
The manufacturer DG was used most frequently by the EQA participants in this work. A study by Abellan et al. from 2004 compared the Immulite 2000 system from DG, which is based on CLIA, with an ELISA kit by a different manufacturer that was not used by any participant in the present study. The DG device showed better intra-laboratory precision and a lower variation in the interlaboratory comparison. Both immunoassay methods correlated well, although ELISA tended to show lower values (Abellan et al., 2004). In the methodological comparison of the present study, some cases were observed in which ELISA tended to show higher values than CLIA and LEIA, which contrasts with the tendency observed in the mentioned article. Because there is not yet any reference method for quantitative EPO determination, no valid statement can be made as to which method or which manufacturer offers the highest precision. External quality controls are, therefore, even more important when comparing the measuring ranges of the laboratories and the methodology. Methodological comparisons require representative sample sizes, which are partially not yet given due to the low frequency of use in some cases. Because the number of participants in the EPO EQA has been increasing, more specific comparisons might be made in future studies.
It should also be noted that the standards used for the IBL-ELISA were calibrated against the first international erythropoietin standard (87/684) (IBL International GmbH, 2022; National Institute for Biological Standards and Control, 2008). The calibrator of the Immulite 2000 by manufacturer DG and the devices used in this study from manufacturer BE are traceable to the second international erythropoietin standard (67/343) (Owen and Roberts, 2011; Beckman Coulter, Inc., 2020). The second international standard is derived from urine but is used to calibrate detection in human serum or plasma (National Institute for Biological Standards and Control, 2013). It remains questionable whether accurate results can be obtained in blood if the calibrators of the assays are traceable to a standard from a completely different matrix. The Siemens Advia Centaur device from manufacturer DG, which was used by some participants in this study, is traceable to the second international standard and the third international erythropoietin standard (11/170), which is mainly based on a recombinant EPO preparation (National Institute for Biological Standards and Control, 2012; Siemens Healthcare Diagnostics Inc, 2019).
EQAs may not be passed for different reasons, most of which can be attributed to human error, such as sample mix-ups or errors during the reconstitution process. Li et al. found that potential reasons for not passing EQAs include errors in the management of the measurement results, such as transcription errors or reporting of incorrect units, which were also noticed in this work. However, technical errors, such as calibration problems, were described as the main reason (Li et al., 2019). To successfully complete the EQA, it is important that participants follow the details of the test scheme and apply good laboratory practices, like checking the methods for quality and ensuring that the staff is adequately trained (Edson et al., 2007). Two surveys (2019-S2 and 2020-S1a) stood out with a particularly high failure rate and high interlaboratory variation for sample L. The same batch of sample sets was used in these two surveys. This suggests that there might be interfering components in this batch for sample L. This may be due to unusually high concentrations of regular serum components prevailing in the sample, leading to falsely high or falsely low results (Sequeira, 2019). Insufficient commutability of the sample may also have negatively impacted the test performance. It is often not possible to use authentic clinical samples in the context of proficiency testing. However, artificially generated samples do not always mirror the patient samples that are routinely examined in laboratories (Laudus et al., 2022). In the EQA surveys performed, samples were sent to participants in lyophilized form. The samples used in 2019-S2 and 2020-S1a were not spiked with recombinant EPO, but other samples used in this study were. Both sample preparation and sublimation have been described as possible influencing factors (Vesper et al., 2007; Miller et al., 2011). As mentioned above, there can also be differences in cross-reactivity with recombinant EPO preparations depending on the assay manufacturer (Owen and Roberts, 2011).
The study had the following limitations: The exact isoform of recombinant EPO spiked into some of the samples is unknown. This makes it difficult to draw conclusions about any possible cross-reactivity in the samples. It should also be reiterated here that commutability studies of the EQA samples have not yet been carried out, so a possible influence of the sample preparation on EPO detection is not known. Whether the test performance is affected by the sample itself should be evaluated. As mentioned above, there is no validated reference method for quantitative EPO detection. Accordingly, no analytical target value can be determined for the evaluation of the EQA, and the robust mean value must be used as the target value for evaluation, which is a common practice. The most represented method or manufacturer also has the strongest influence on the overall mean. Because the true value is unknown, this can lead to biases in the evaluation to an unknown extent (Kristensen and Meijer, 2017). Furthermore, it is not possible to include the exact specifications given by the manufacturer for each method at any given time, as the corresponding reagent kits and batches are not known. The EQA is also intended to provide an overall picture of the analyses rather than comparing individual kits and batches.
However, the results presented in this study are of importance despite the limitations mentioned, as this is the first longitudinal evaluation of EPO EQA data to date. Medical laboratories should always aim to keep their measurement quality at the highest standard, and this work can be used to reflect on the institution's methodology and to see how their detection method or the assay manufacturer used performs in relation to others.

Conclusion
This work shows that variations in laboratory results and in methodological terms for quantitative EPO determination do persist to some degree, and knowledge about sources of errors is vital in order to optimize measurement quality and thus ensure patient safety. However, in terms of clinical relevance, small deviations might be considered less critical for the diagnostic assessment and the resulting therapeutic consequence in patients because, in anemia diagnostics, the level of EPO in combination with other relevant biomarkers is of decisive importance. Thresholds for maximum acceptable variation in EPO measurement quality and their clinical consequences should be further investigated in the future.

FIGURE 1
General outcome/information of the INSTAND EPO EQA from 2017 to 2022. (A) Number of laboratories participating between 2017 and 2022 (green) and a corresponding trend line (blue) starting with one survey (S) in 2017 (2017-S1) and continuing with two runs per year (S1/S2) until 2022. (B) The percentage of measurements outside the criteria of acceptance (CoA; %) calculated for each survey for sample L (red) and sample H (turquoise). The CoA was defined as ± 20% around the robust mean for the individual surveys shown. (C) Mean absolute deviation (MAD)/median ratio (%) for sample L (red) and sample H (turquoise) for every survey.

FIGURE 2
Method-dependent analysis of EQA results for EPO levels from 2017 to 2022. (A) Distribution of the EPO measurement results (IU/L) for the individual methods CLIA (red), ELISA (green), LEIA (turquoise), and "other" (violet) in relation to the overall distribution of all measured values in the individual surveys (black) for sample L from 2017 to 2022. In this plot, the box captures the middle 50% of the data, and the whiskers extend to the most extreme observations lying within 1.5 times the IQR above and below the box. The red, green, turquoise, violet, and black dots mark outliers, which are defined as observations that lie more than 1.5 times the IQR beyond either edge of the box. (B) The same representation as in (A) but for sample H. (C) Percentage of the frequencies for the respective measurement methods of the total of all measurements per survey per sample.

FIGURE 3
Manufacturer-dependent analysis of EQA results for EPO levels from 2017 to 2022. (A) Distribution of the EPO measurement results (IU/L) for the individual reagent manufacturers BE (red), DG (green), IB (turquoise), and "other" (violet) in relation to the overall distribution of all measured values in the individual surveys (black) for sample L from 2017 to 2022. In this plot, the box captures the middle 50% of the data, and the whiskers extend to the most extreme observations lying within 1.5 times the IQR above and below the box. The red, green, turquoise, violet, and black dots mark outliers, which are defined as observations that lie more than 1.5 times the IQR beyond either edge of the box. (B) The same representation as in (A) but for sample H. (C) Percentage of the frequencies for the respective manufacturers of the total of all measurements per survey per sample.

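TABLE 1
Robust mean values (IU/L) calculated by Algorithm A (ISO13528:2015, 2020) and measurements outside of the criteria of acceptance (CoA) at ± 20% around the robust mean for each of the eleven surveys (S) from 2017 to 2022 for sample L (L) and sample H (H).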

TABLE 2
Method-dependent and total median (interquartile range; IQR; IU/L) and respective frequencies in each survey (S) from 2017 to 2022 for sample L (L) and sample H (H).

TABLE 3
Manufacturer-dependent and total median (interquartile range; IQR; IU/L) and respective frequencies in each survey (S) from 2017 to 2022 for sample L (L) and sample H (H).
