A Comparison of Serum and Plasma Blood Collection Tubes for the Integration of Epidemiological and Metabolomics Data

Blood is a rich biological sample routinely collected in clinical and epidemiological studies. With advancements in high throughput -omics technology, such as metabolomics, epidemiology can now delve more deeply and comprehensively into biological mechanisms involved in the etiology of diseases. However, the impact of the blood collection tube matrix of samples collected needs to be carefully considered to obtain meaningful biological interpretations and understand how the metabolite signatures are affected by different tube types. In the present study, we investigated whether the metabolic profile of blood collected as serum differed from samples collected as ACD plasma, citrate plasma, EDTA plasma, fluoride plasma, or heparin plasma. We identified and quantified 50 metabolites present in all samples utilizing nuclear magnetic resonance (NMR) spectroscopy. The heparin plasma tubes performed the closest to serum, with only three metabolites showing significant differences, followed by EDTA which significantly differed for five metabolites, and fluoride tubes which differed in eleven of the fifty metabolites. Most of these metabolite differences were due to higher levels of amino acids in serum compared to heparin plasma, EDTA plasma, and fluoride plasma. In contrast, metabolite measurements from ACD and citrate plasma differed significantly for approximately half of the metabolites assessed. These metabolite differences in ACD and citrate plasma were largely due to significant interfering peaks from the anticoagulants themselves. Blood is one of the most banked samples and thus mining and comparing samples between studies requires understanding how the metabolite signature is affected by the different media and different tube types.


INTRODUCTION
Classical epidemiological studies seek to identify risk factors to determine the presence or absence of disease and health in a population. Given the technological advancements in high-throughput -omics, epidemiology is now in a more powerful position to be able to uncover biological mechanisms involved in the etiology of different diseases. While research in past decades has identified candidate metabolites contributing to diseases, metabolomics, in particular, offers an unprecedented opportunity to enhance an epidemiologist's traditional toolbox as metabolites are the final products of the genomics, transcriptomics, and proteomics cascade and provide a chemical "snapshot" of an organism's entire metabolic state at any given time.
A wide variety of biological specimens (urine, blood, cerebrospinal fluid, saliva, etc.) may be utilized for metabolomics analysis. Blood is a rich biological sample that is sensitive to the effects of health or disease, genetic variation, environment, nutrition, or the impact of toxicants. It is easily obtained and commonly biobanked as serum and plasma in large repositories that collect, process, store, and distribute samples for future scientific investigations. Biobanks are important resources for studies of the connection between genes and diseases, response to drugs and treatments, and other outcomes related to understanding diseases. Several countries have established national and international biobanks as repositories for biological samples, including blood samples. Established in 2006, the United Kingdom (UK) Biobank (Elliott et al., 2008), for example, has collected EDTA plasma, acid citrate dextrose (ACD) plasma, and urine, among other biosamples for future use. In 2015, the National Institutes of Health (NIH) initiated the Precision Medicine Initiative (PMI) All of Us Research Program (Sankar and Parker, 2017), which will be the largest longitudinal study in the United States with a cohort of one million volunteers. PMI All of Us aims to understand how a person's genetics, environment, and lifestyle can help to determine the best approach to prevent or treat disease by collecting genetic data, health data, and biological samples (including serum, EDTA plasma, citrate plasma, and heparin plasma). In 2020, NIH announced that blood samples collected from PMI All of Us participants will be tested for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) antibodies to track prior infections within the US population to address the unprecedented coronavirus disease 2019 (COVID-19) global pandemic, exemplifying the importance of biorepository samples.
Metabolomics analysis could be used in a similar endeavor to understand how biomarkers are associated with disease progression, or how metabolites are related to molecular changes in treatment therapies, for instance (Shen et al., 2020). However, understanding whether and/or how the blood collection tube matrices (anticoagulants) affect measurement of the metabolome is critical to successfully merge epidemiology and metabolomics.
Serum is often considered the gold standard as it is obtained from blood that has been coagulated and requires no additives, whereas plasma is obtained by mixing blood with an anticoagulant to inhibit the blood from clotting, followed by collecting the plasma supernatant. While the choice of serum or plasma may depend on the specific research purpose, it may also depend on sample availability, such as in clinical and epidemiological studies which routinely biobank biological samples for future analysis. Several previous studies have investigated differences in the metabolic profile based on blood collection tubes (Supplementary Table S1). However, most of the extant literature is limited to investigating EDTA and heparin plasma (Teahan et al., 2006; Pereira et al., 2010; Bernini et al., 2011; Denery et al., 2011; Wedge et al., 2011; Yu et al., 2011; López-Bascón et al., 2016; Suarez-Diez et al., 2017; Cruickshank-Quinn et al., 2018; Liu et al., 2018; Nishiumi et al., 2018; Paglia et al., 2018), while a few studies have also investigated other plasma tubes [such as citrate plasma (Bernini et al., 2011) and potassium oxalate (Yin et al., 2013)] or have done comparisons using animal models (Zhou et al., 2017). Given the assortment of other plasma tube types available and the range of potential blood samples available in biobanks, here we sought to determine how the NMR-based metabolic profile of serum compares with the profiles of plasma collected in ACD tubes, citrate plasma tubes, EDTA plasma tubes, fluoride plasma tubes, and heparin plasma tubes under the same collection conditions. We aimed to clarify the compositional differences between serum and plasma across the various collection tubes.

Blood Samples
Blood samples were collected from eight healthy volunteers (22-40 years old, with a BMI between 20 and 25, all female) after an overnight fast. For each subject, serum and plasma samples were collected into six different tubes: plastic tubes with no additives (for serum), and acid citrate dextrose (ACD), sodium citrate (Citrate), ethylenediaminetetraacetic acid (EDTA), sodium fluoride (Fluoride), and sodium heparin (Heparin) tubes (for plasma) (BD Biosciences, San Jose, CA). Each plasma and serum sample was processed according to the manufacturer's specifications. Briefly, for plasma, collection tubes were inverted eight times followed by centrifugation at ≤1,300 RCF for 10 min at 20°C. Serum tubes were gently inverted five times followed by a 45-60 min resting period at room temperature to obtain complete coagulation before performing the same centrifugation process as for plasma. Each serum/plasma sample was aliquoted into three 1.0 ml fractions and stored at -80°C until metabolomics analysis. All volunteers gave their written informed consent prior to participation in the study. The study was approved by the institutional review board at the University of California, Davis.

Nuclear Magnetic Resonance Experiment
Three analyses were performed (three independent sample preparations on separate days) on each sample as follows. All samples were prepared in the same laboratory, using the same standard protocols. Data were collected on the same instrument operating with the same settings, but on different days. Plasma and serum samples were thawed, then filtered through an Amicon 3,000 MW cut-off Centrifugal Device to remove lipids and proteins. A 207 µL aliquot of the water-soluble filtrate was collected and combined with 23 µL of an internal standard (ISTD) consisting of 4.47 mM DSS-d6 [3-(trimethylsilyl)-1-propanesulfonic acid-d6] and 0.2% NaN3 in 99.8% D2O. The pH of each sample was adjusted to 6.8 ± 0.1 by adding small amounts of NaOH or HCl. The volumes of added HCl and NaOH were recorded. A 180 µL aliquot of the mixture was transferred to a labeled 3 mm Bruker NMR tube and stored at 4°C until NMR acquisition (within 24 h of sample preparation). Samples were run on a 600 MHz Bruker AVANCE III NMR spectrometer equipped with a TCI cryoprobe and SampleJet autosampler using the NOESY-presaturation pulse sequence (noesypr). NMR spectra were acquired at 25°C, with water saturation of 2.5 s during the prescan delay, a mixing time of 100 ms, 12 ppm sweep width, an acquisition time of 2.5 s, eight dummy scans, and 32 transients. All spectra were zero-filled to 128K data points and Fourier transformed with a 0.5-Hz line broadening applied. Spectra were manually phased and baseline-corrected, and metabolites were identified and quantified using NMR Suite v8.1 (Chenomx Inc., Edmonton, Canada) (Weljie et al., 2006). After analysis, a list of compounds together with their respective concentrations, based on the concentration of the added internal standard (DSS-d6), was generated.
All compounds in the database have been verified against known concentrations of reference NMR spectra of the pure compounds and have been shown to be reproducible and accurate (Slupsky et al., 2007; Smilowitz et al., 2013; Supplementary Figure S1).
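Because quantification is referenced to the added internal standard, it can help to work through the dilution arithmetic implied by the volumes above. The following is a minimal sketch in Python, not the authors' procedure: the volumes and stock concentration come from the text, the function names are illustrative, and the small volumes of NaOH or HCl added during pH adjustment are ignored as a simplifying assumption.

```python
# Dilution arithmetic for the NMR sample described above.
# Volumes and the ISTD stock concentration are taken from the text;
# function names are illustrative, and pH-adjustment volumes are ignored.

FILTRATE_UL = 207.0      # water-soluble filtrate
ISTD_UL = 23.0           # internal standard solution
ISTD_STOCK_MM = 4.47     # DSS-d6 concentration in the ISTD stock

def final_concentration_mm(stock_mm, stock_ul, sample_ul):
    """C1 * V1 = C2 * V2: concentration of the standard after mixing."""
    return stock_mm * stock_ul / (stock_ul + sample_ul)

def filtrate_dilution_factor(stock_ul, sample_ul):
    """Factor converting a concentration in the mixture back to the filtrate."""
    return (stock_ul + sample_ul) / sample_ul

dss_mm = final_concentration_mm(ISTD_STOCK_MM, ISTD_UL, FILTRATE_UL)
factor = filtrate_dilution_factor(ISTD_UL, FILTRATE_UL)

print(f"DSS-d6 in the NMR sample: {dss_mm:.3f} mM")   # 0.447 mM
print(f"Filtrate dilution factor: {factor:.3f}")      # 1.111
```

Under these assumptions the standard is diluted tenfold (23 µL in 230 µL), and any concentration read from the mixture would be multiplied by roughly 1.11 to refer back to the undiluted filtrate.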

Statistical Analysis
Metabolite concentrations were log10 transformed, and principal component analysis (PCA) was performed using the "prcomp" function, with each variable centered by subtracting its mean (center = TRUE) but not scaled to its standard deviation (scale = FALSE); plots were generated with the ggplot2 library in R. The mean difference and percent coefficient of variation (%CV, defined as the standard deviation/mean × 100) of raw metabolite concentrations between serum (control) and each plasma tube type (ACD, Citrate, EDTA, Fluoride, or Heparin) were calculated to compare differences in specific metabolite concentrations between tubes. The Mann-Whitney U test was used to evaluate the significance of those differences because not all metabolites followed a normal distribution. To account for multiple testing, we adjusted p-values by controlling the false discovery rate (FDR) at 5% using the Benjamini-Hochberg procedure (p.adjust.method = "BH"), with p-values < 0.05 considered statistically significant. Effect size between serum and each plasma tube was calculated using Cliff's delta (δ) statistic (cliff.delta function from the effsize package). A |δ| < 0.33 corresponds to a small, 0.33 ≤ |δ| < 0.474 to a medium, and |δ| > 0.475 to a large effect size in metabolite concentration differences. The methodological precision for each metabolite was calculated as the mean ± SD (%) of the %CV for the 48 samples (six tube types tested across eight subjects) that were individually prepared and analyzed in triplicate by 1H NMR (for a total of 144 analyses). Statistical computing and graphical generation were performed using the R programming environment. The identity of each sample was unblinded only after the analysis was completed. Literature-derived reference values from the Human Metabolome Database (HMDB) are also presented for each metabolite to provide readers a better estimate of the potential concentration variations (Table 1).
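The three summary statistics named above (%CV, Cliff's delta, and Benjamini-Hochberg adjustment) are simple enough to sketch directly. The block below is an illustrative plain-Python stand-in for the R functions the authors cite, not their code; all input values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption. The Mann-Whitney U test itself is left out; in Python it is available as `scipy.stats.mannwhitneyu`.

```python
from statistics import mean, stdev

def percent_cv(values):
    """%CV = standard deviation / mean x 100 (sample SD is an assumption)."""
    return stdev(values) / mean(values) * 100.0

def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs of observations."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))

def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical example values (not data from this study).
triplicate = [9.0, 10.0, 11.0]                 # one sample, three preparations
serum, plasma = [5.0, 6.0, 7.0], [1.0, 2.0, 6.0]
raw_p = [0.01, 0.04, 0.03, 0.20]

print(percent_cv(triplicate))        # 10.0
print(cliffs_delta(serum, plasma))   # 6/9, about 0.667
print(benjamini_hochberg(raw_p))     # about [0.04, 0.0533, 0.0533, 0.2]
```

With only eight subjects per tube type, a rank-based effect size such as Cliff's delta is a natural companion to the Mann-Whitney test, since both depend only on the ordering of the observations.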

RESULTS
A total of 52 metabolites were identified and quantified in our study. However, the metabolites cis-aconitate (which was only identified in ACD tubes) and ascorbate (which fell below the limit of detection for EDTA tubes) were excluded from any further analysis as they were not present in all tube types (Supplementary Figure S2). Therefore, a total of 50 metabolites were identified in all collection tubes and used in the analysis. These included amino acids and their metabolites (2-aminobutyrate, 2-hydroxybutyrate, 2-hydroxyisovalerate, 2-oxoisocaproate, 3-hydroxyisobutyrate, 3-methyl-2-oxobutanoate, alanine, arginine, asparagine, betaine, glutamate, glutamine, histidine, isoleucine, leucine, lysine, methionine, N,N-dimethylglycine, ornithine, phenylalanine, proline, serine, taurine, threonine, tryptophan, tyrosine, and valine), ketone bodies (3-hydroxybutyrate, acetoacetate, acetone), pyruvate metabolism (lactate and pyruvate), short-chain fatty acids (acetate and butyrate), sugars (glucose, mannose, and myo-inositol), tricarboxylic acid cycle metabolites (2-oxoglutarate, citrate, and succinate), creatine, creatinine, choline, dimethyl sulfone, methanol, and urea. The mean (SD) and median (interquartile range) of the metabolites, as well as the average %CV (SD) of the technical replicates for each tube type, are provided in Table 1. The average %CV ranged from 2 to 15% over all metabolites and tube types (average 7%), indicating a high degree of repeatability for sample preparation, data acquisition, and data analysis amongst all tube types.
Furthermore, we found that 31 out of 50 metabolite serum concentrations exhibited excellent agreement with concentrations reported in the literature (i.e., fell within one standard deviation of the literature value). An additional 11 metabolites fell within two standard deviations or the reported reference range. However, not all our metabolite concentrations agree with literature-derived values. The greatest discrepancies between our measured serum values and the literature-derived values were for 3-hydroxyisobutyrate, 3-methyl-2-oxobutanoate, arginine, butyrate, formate, glutamate, methanol, and tryptophan. Although we have attempted to find reference values reported in the literature that were collected in a similar manner (i.e., NMR-derived serum values reported in healthy adult populations), some discrepancies between our values and those in the literature may be due to different analytical methods utilized (NMR versus mass spectrometry), other anticoagulants utilized, differences in the study population, or possibly sample size effects. Nonetheless, 42 of the 50 metabolites (84%) exhibited good agreement with literature-derived values.
Principal component analysis (PCA) was applied to investigate inherent patterns in the metabolomic profiles (Figure 1). On the scores plot, each point represents a sample, with the same color representing the same tube type, and the same letter representing the same subject. The loadings plot indicates the contribution of the measured metabolites to the principal components. On the scores plot, principal component 1 (PC1) accounted for 55.6% of the variation and PC2 accounted for 10.5% of the variation. The corresponding loadings plot identified that citrate concentration greatly contributed to separation along PC1. Citrate is an additive in citrate tubes and in ACD plasma tubes (which also contain high levels of glucose). As such, the concentrations of citrate (in citrate plasma and ACD plasma), and glucose (in ACD plasma) do not reflect true biological concentrations (Supplementary Figure S3). However, there is a clear overlap in the metabolic profiles of all blood collection tube types, particularly for serum and heparin plasma. Additionally, differences between subjects were apparent along PC2, showing that subjects B and E tended to cluster toward the top of the plot regardless of tube type.
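To illustrate why an anticoagulant-derived metabolite can dominate PC1 when data are centered but not scaled, the following sketch runs a tiny PCA in plain Python. It is an assumption-laden illustration, not the authors' analysis: they used R's prcomp, whereas this uses power iteration on the covariance matrix as a stand-in, and the four samples and their concentrations are hypothetical.

```python
import math

# Hypothetical concentrations (mM); rows are samples, columns are metabolites.
METABOLITES = ["citrate", "glucose", "alanine"]
SAMPLES = {
    "serum":          [0.10, 5.0, 0.30],
    "heparin plasma": [0.12, 5.2, 0.28],
    "citrate plasma": [10.0, 5.1, 0.25],
    "ACD plasma":     [12.0, 20.0, 0.24],
}

def log10_center(rows):
    """log10 transform, then subtract each column mean (no scaling)."""
    logged = [[math.log10(v) for v in row] for row in rows]
    n = len(logged)
    means = [sum(col) / n for col in zip(*logged)]
    return [[v - m for v, m in zip(row, means)] for row in logged]

def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(cov, iterations=200):
    """Leading eigenvalue and eigenvector by power iteration."""
    p = len(cov)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, vec

cov = covariance(log10_center(list(SAMPLES.values())))
eigenvalue, loadings = first_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

print(f"PC1 explains {100 * explained:.1f}% of the variance")
for name, loading in zip(METABOLITES, loadings):
    print(f"  {name}: loading {loading:+.3f}")
```

In this toy example the citrate column spans two orders of magnitude, so without scaling it carries most of the variance and the largest PC1 loading, which mirrors the pattern described for the real loadings plot.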
The heparin plasma tubes performed the closest to serum, with only three metabolites showing significant differences, followed by EDTA which significantly differed for five. Specifically, only 3 of 50 metabolites (6%), all of which were amino acids (arginine, glutamate, and taurine), were higher in serum compared to Heparin plasma. EDTA plasma differed in only 5 of 50 metabolites (10%) as compared to serum, which included higher levels of the amino acids arginine and taurine, and lower levels of pyruvate, acetate, and formate in serum compared to EDTA plasma. We also found that 11 of 50 metabolites (22%) were higher in serum compared to Fluoride plasma. These metabolites included amino acids (alanine, arginine, glutamate, glutamine, glycine, histidine, ornithine, taurine), as well as lactate, pyruvate, and myo-inositol. In contrast, metabolite measurements from plasma ACD tubes differed significantly for more than half of the 50 metabolites assessed. Specifically, ACD plasma varied in 29 of the 50 metabolites (58%) compared to serum. Interestingly, most of these metabolites had higher concentrations in serum compared to ACD plasma, which included amino acids and their metabolites, as well as o-acetylcarnitine, choline, urea, lactate, butyrate, and myo-inositol. As expected, glucose and citrate (both of which are additives in ACD plasma) were much lower in serum than in ACD tubes. Succinate also had a lower concentration in serum as compared to ACD plasma. Similar to ACD plasma, a high proportion (24 of 50 metabolites, 48%) were also significantly different in citrate plasma. Again, amino acids and their metabolites had increased levels in serum compared to citrate plasma, as well as carnitine, urea, lactate, glucose, and myo-inositol. In contrast, acetoacetate, acetone, acetate, and, as expected, citrate concentrations were significantly lower in serum compared to citrate plasma tubes.
The mean differences in the concentration of serum metabolites versus plasma samples are summarized in Table 2.
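As a quick check on the proportions reported above, the counts of significantly different metabolites per tube type can be tallied and ranked. This small Python sketch uses only the counts stated in the text; the variable names are illustrative.

```python
# Counts of metabolites that differed significantly from serum,
# as reported in the text (out of 50 metabolites compared).
N_METABOLITES = 50
n_different = {"Heparin": 3, "EDTA": 5, "Fluoride": 11, "Citrate": 24, "ACD": 29}

# Percentage of the 50 metabolites that differed, per tube type.
percent = {tube: 100 * n / N_METABOLITES for tube, n in n_different.items()}

# Tube types ordered from most to least similar to serum.
ranked = sorted(n_different, key=n_different.get)

for tube in ranked:
    print(f"{tube}: {n_different[tube]} of {N_METABOLITES} ({percent[tube]:.0f}%)")
```

The ordering (heparin, EDTA, fluoride, citrate, ACD) matches the narrative ranking of how closely each plasma type approximates serum.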

DISCUSSION
Epidemiology has immensely contributed to public health by pinpointing important risk factors (often multifactorial environmental and genetic components) that contribute to disease outcome. Together with metabolomics, epidemiology is now in the position to be able to uncover biological mechanisms to refine the relationship between exposure and disease in humans, which in turn may offer opportunities for intervention. Blood is one of the most banked samples and thus mining and comparing samples between studies requires understanding how metabolite signatures are affected by different matrices and different tube types. Here we investigated differences in the metabolite profile of blood samples collected as serum compared to plasma collected utilizing acid citrate dextrose (ACD), citrate, EDTA, fluoride, and heparin anticoagulants.
Utilizing targeted NMR-based metabolomics analysis, we identified and quantified 52 metabolites, 50 of which were present in all blood samples and were compared. Our results show a high degree of repeatability in terms of sample preparation, data acquisition, and data analysis, showing that the NMR method is precise and produces robust, reproducible quantitative data. Historically, serum has been the preferred assay material because it does not require any anticoagulants for its collection. Serum is used to assess clinical chemistry parameters, drug levels, and blood bank procedures, and as such was used as our gold standard. Overall, we found that heparin plasma, followed by EDTA plasma and fluoride plasma, had metabolic profiles similar to serum. Heparin, in particular, only differed in 3 of 50 metabolites (arginine, glutamate, and taurine), and had a nearly identical metabolic fingerprint to serum (Figure 1). Previous studies have also found minimal differences between serum and heparin plasma samples (Teahan et al., 2006). EDTA plasma also had a similar metabolic profile to serum. However, EDTA produces strong signals in 1H-NMR spectra which could obscure neighboring metabolites, such as choline, dimethylamine, and one signal of citrate (Bernini et al., 2011). Our results agree with previous findings from Barton et al. (2010), who also found EDTA had negligible effects on the overall metabolic fingerprint. Very limited work has been done on fluoride tubes, and we found notably higher levels of pyruvate in serum compared to fluoride plasma. Fluoride tubes are specialized tubes that contain sodium fluoride to inhibit the metabolic processes of glycolysis by erythrocytes. This explains the difference in pyruvate concentration obtained by the analysis of samples from fluoride tubes and serum tubes.
Nonetheless, most metabolites in fluoride tubes were very similar to serum tubes including ketone bodies, lipid metabolism metabolites, short-chain fatty acids, tricarboxylic acid cycle intermediates, most amino acids, and sugars.
We found that ACD plasma and citrate plasma were very different from serum, largely due to significant interfering peaks (from citrate and glucose) in the NMR spectra which originate from the anticoagulants themselves. One could exclude glucose and citrate metabolites from analysis to utilize these plasma tube types; however, it comes with the cost of losing the ability to quantify these two important biological compounds. Interestingly, although substantial differences were associated with tube types in the metabolic profiles, clear differences between subjects are preserved, even among ACD and Citrate tubes. For example, in Figure 1, samples from subjects B and E largely cluster together due to high levels of ketone bodies (acetoacetate, acetone, and 3-hydroxybutyrate) regardless of tube type. Indeed, we have successfully utilized ACD plasma samples in a previous study to investigate metabolomic differences among individuals with developmental disabilities in an epidemiological case-control study bridging metabolomics and epidemiology (Orozco et al., 2019). However, we limited our analysis to only include individuals with ACD samples and did not include any serum samples to avoid any confounded results based on measurement errors from the collection tube rather than true biological differences, and we excluded citrate and glucose from our analysis.
An interesting finding in the present study is that most amino acids and their derivatives had higher concentrations in serum compared to all plasma tube types. This finding agrees with previous studies, even across different analytical platforms. For example, Denery et al. (2011) also found higher concentrations of amino acids and their metabolites in serum compared to heparin plasma using liquid chromatography−mass spectrometry (LC-MS). Paglia et al. (2018) found amino acid concentrations were higher in serum compared to EDTA and citrate plasma utilizing LC-MS. Nishiumi and colleagues (Nishiumi et al., 2018) reported higher amino acids and derivatives levels in serum compared to EDTA plasma utilizing LC-MS. Additionally, Yu et al. (2011) also found significantly higher amino acid levels in serum compared to EDTA plasma. One possible explanation for the difference in amino acid concentrations is that the added anticoagulants themselves are likely to dilute the samples. ACD tubes, for example, contain trisodium citrate (22.0 g/L), citric acid (8.0 g/L), and dextrose (24.5 g/L). Similarly, citrate plasma tubes contain 3.2% buffered sodium citrate solution. Additionally, differences in amino acids could also be due to the coagulation step of serum collection, which is likely to concentrate metabolites in serum in a reduced volume (Paglia et al., 2018). Both of these factors may play a role in the different plasma amino acid levels relative to serum.
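The dilution explanation can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the draw volume, additive volume, and hematocrit are hypothetical values not given in the text, and it assumes the liquid additive stays entirely in the plasma fraction.

```python
def plasma_dilution_factor(blood_ml, additive_ml, hematocrit):
    """Fraction of the original plasma concentration that remains after a
    liquid additive mixes into the plasma fraction of whole blood."""
    plasma_ml = blood_ml * (1.0 - hematocrit)
    return plasma_ml / (plasma_ml + additive_ml)

# Hypothetical example: 4.5 mL blood drawn into 0.5 mL of liquid
# anticoagulant, with a hematocrit of 0.40 (all values assumed).
factor = plasma_dilution_factor(blood_ml=4.5, additive_ml=0.5, hematocrit=0.40)

true_um = 100.0                      # hypothetical metabolite level in plasma
measured_um = true_um * factor

print(f"Dilution factor: {factor:.3f}")
print(f"A {true_um:.0f} µM metabolite would read about {measured_um:.1f} µM")
```

Under these assumed volumes the additive lowers measured concentrations by roughly 16%, which shows how a liquid anticoagulant alone could produce systematically lower plasma values without any biological difference.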
Regarding other notable differences in serum compared to plasma, we found that pyruvate, acetate, and formate were significantly lower in serum compared to EDTA plasma. Suarez-Diez et al. (2017) also reported lower concentrations of formate and pyruvate in serum compared to EDTA plasma. Additionally, lactate was notably higher in serum compared to all plasma tubes, though it only reached statistical significance in ACD, Citrate plasma, and Fluoride plasma. López-Bascón et al. (2016) also reported significantly higher lactate in serum compared to EDTA plasma, while Teahan et al. (2006) found that lactate was higher in serum compared to heparin plasma. López-Bascón et al. (2016) also found higher concentrations of myo-inositol in serum compared to EDTA plasma. Similarly, we found myo-inositol was higher in serum compared to ACD, citrate plasma, and fluoride plasma.
Overall, there are important aspects that should be considered when designing an experiment where metabolomics analysis might be performed, preparing to bank samples, or using banked samples. We recommend the use of serum for metabolomics studies since anticoagulants that interfere with downstream laboratory applications are avoided, and the impact of these anticoagulants on the concentration of certain metabolites, such as amino acids, can be avoided. In the case that serum is unavailable, we have shown that both heparin plasma and EDTA plasma approximate the concentrations observed in serum closely. Further, we suggest that the mean difference summarized in our study (Tables 1, 2) can be utilized as a correction factor to adjust metabolite concentrations collected in plasma to be similar to concentrations in serum. This could be useful to pool data from biobanked samples across epidemiological studies that were not collected using the same tube. Similarly, corrections could be used for meta-analyses combining results across studies of metabolites collected in different tubes.
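A correction of the kind suggested above could be applied as a simple additive offset per metabolite and tube type. The sketch below is a hypothetical illustration: the offset values are invented placeholders rather than values from Tables 1 and 2, the sign convention (serum minus plasma) is an assumption, and metabolites without a listed offset are passed through unchanged.

```python
# Hypothetical mean differences (serum minus plasma, in µM) per tube type.
# These numbers are placeholders for illustration, not values from the study.
MEAN_DIFF_UM = {
    "EDTA": {"arginine": 15.0, "taurine": 20.0},
}

def correct_to_serum(sample, tube, table=MEAN_DIFF_UM):
    """Shift plasma concentrations toward serum-equivalent values by adding
    the tabulated mean difference; unlisted metabolites are left as-is."""
    offsets = table.get(tube, {})
    return {metabolite: conc + offsets.get(metabolite, 0.0)
            for metabolite, conc in sample.items()}

# Hypothetical EDTA plasma measurement (µM).
edta_sample = {"arginine": 60.0, "taurine": 40.0, "valine": 210.0}
corrected = correct_to_serum(edta_sample, "EDTA")

print(corrected)  # {'arginine': 75.0, 'taurine': 60.0, 'valine': 210.0}
```

An additive offset is only one possible design; a multiplicative factor might suit metabolites whose difference scales with concentration, and either approach would need validation in an independent cohort before pooling data across studies.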
Additional considerations in blood tube choice may depend on other intended downstream analyses. Heparin plasma (which inhibits thrombin activity) and EDTA plasma (which binds calcium ions) are both broadly used in clinical and epidemiological research. However, heparin binds to DNA during purification and inhibits Taq polymerase used for polymerase chain reaction (PCR). Although we have shown this is not problematic for NMR-based metabolomics, these samples would not be recommended for DNA work. Likewise, one also needs to consider if the large EDTA peaks in the NMR spectra may interfere with metabolites of interest [such as choline, dimethylamine, and one peak of citrate (Bernini et al., 2011)]. Furthermore, an important distinction also needs to be made between conventional blood collection tubes and those utilizing gel separator tubes for investigators considering metabolomics analysis. Gel separator tubes are used to accelerate the process of serum or plasma separation and theoretically should not change the metabolite composition because of the inertness of gel. Yet, several studies have shown changes in the metabolite fingerprints of samples collected utilizing polymeric gel tubes compared to conventional tubes, particularly for amino acids (Yu et al., 2011; López-Bascón et al., 2016). As such, the use of gel separator tubes is not recommended.
We have chosen to utilize serum as the gold standard in our study by which to compare all other plasma tubes due to the broader applications of serum, the limitations of some anticoagulants in plasma, and because previous studies have found that serum samples had the greatest number of recovered metabolites compared to plasma (Denery et al., 2011; Cruickshank-Quinn et al., 2018; Nishiumi et al., 2018). However, a limitation of serum is that the processing time can be subject-dependent (i.e., clotting time may vary across individuals) (Tuck et al., 2009). Therefore, metabolic processes from biologically active analytes may still be occurring and affect accurate metabolite quantification in serum. Other studies have shown pre-analytical steps, such as freeze-thaw cycles, can negatively affect the metabolome profile (Bernini et al., 2011; Townsend et al., 2016; Cruickshank-Quinn et al., 2018; Nishiumi et al., 2018). A strength in our study is that aliquots of all samples were immediately frozen at -80°C after collection, were under the same storage duration and conditions, and never underwent previous freeze-thaw cycles before NMR-based metabolomics analysis, which could have negatively affected metabolite stability.

CONCLUSION
Careful consideration about which blood collection matrix to use in a study is critical to obtain meaningful biological inferences from metabolome data. While serum is considered the gold standard, we have shown that Heparin and EDTA plasma are comparable to serum for NMR-based metabolomics studies. We also found that ACD plasma and citrate plasma were the most different from serum tubes, largely due to significant interfering peaks (from citrate and glucose). Yet, despite the differences in metabolite concentration based on tube type (particularly for ACD and Citrate plasma), clear differences between subjects were preserved regardless of tube types. Our results, and others, show serum samples have higher levels of amino acids and their derivatives compared to plasma. Bridging technological advancements in metabolomics with classical epidemiological approaches can provide new insight into the etiology of diseases.