Assessment of Circulating Protein Signatures for Kidney Transplantation in Pediatric Recipients

Identification and use of non-invasive biomarkers for kidney transplantation monitoring is an unmet need. A total of 121 biobanked sera, collected from 111 unique pediatric kidney transplant (KT) patients (children and adolescents) and 10 age-matched healthy controls, were used to profile serum proteins by semi-quantitative proteomics. The proteomics data were analyzed to identify panels of serum proteins specific to various transplant injuries, including acute rejection (AR), BK virus nephropathy (BKVN), and chronic allograft nephropathy (CAN). Gene expression data from matching peripheral blood mononuclear cells were interrogated to investigate the association between soluble serum proteins and altered expression of the corresponding genes in different injury phenotypes. Analysis of the proteomics data from different patient phenotypes, with criteria of a false discovery rate <0.05 and at least a twofold change in either direction, yielded 10 proteins that distinguished KT injury from no injury. Similar analyses to identify proteins specific to chronic injury, acute injury, and AR after kidney transplantation identified 22, 6, and 10 proteins, respectively. The Elastic-Net logistic regression method was applied to the 137 serum proteins to classify different transplant injuries. This algorithm identified panels of 10 serum proteins specific for AR, BKVN, and CAN, with classification rates of 93, 93, and 95%, respectively. The identified proteins could serve as potential surrogate biomarkers for routine monitoring of the injury status of pediatric KT patients.

… of the detailed mechanism of transplant injury would help in the proper management of transplanted organs for long-term survival (4). Recent developments in high-throughput assays such as gene microarrays and mass spectrometry-based proteomics have proven useful for profiling gene transcripts (mRNA) and proteins, and this global-scale molecular profiling has contributed to a better understanding of homeostasis and organ failure (5)(6)(7)(8)(9). By analyzing carefully selected samples with gene microarrays and mass spectrometry-based proteomics, combined with advanced statistics, we can obtain a global picture of gene expression and corresponding protein levels specific to disease conditions (6)(7)(8)(10)(11)(12)(13)(14)(15). We and other groups have previously reported genes, proteins, and antibodies associated with different transplant injuries, including AR (5,10,12,16,17). Our recent studies have demonstrated that mechanisms of graft rejection are common across different solid organs (10,18,19). These observations suggested a strong signal for T cell activation, T cell receptor engagement, and interferon gamma- and STAT1-regulated pathways driven through various chemokines (10). Additional molecular data from carefully selected patient cohorts would add to our current understanding and provide potential biomarkers for the detection of transplant injuries.
The objective of this study was to identify serum protein signatures associated with KT injury in the pediatric patient population. To meet this objective, we performed mass spectrometry-based proteomics on a unique set of 121 serum samples from pediatric KT patients and healthy controls, followed by statistical and bioinformatic analyses to identify proteins associated with different kinds of kidney transplant injury. In addition, we interrogated gene expression data from matching peripheral blood mononuclear cells (PBMCs) to investigate whether serum protein levels correlated with the expression of the corresponding genes in the matching PBMCs.

Patient samples
Serum samples were collected from 111 unique pediatric kidney transplant (KT) patients at Lucile Packard Children's Hospital, Stanford University. Sera from 10 age-matched healthy controls (HC) were also included in the study as non-transplant controls. Each sample was matched with a biopsy collected at the time of serum collection. The study included 27 AR, 20 calcineurin inhibitor toxicity (CNIT), 25 chronic allograft nephropathy (CAN), … (Table 1 and Figure 1). All kidney biopsies were analyzed in a blinded manner by a Stanford University pathologist and graded according to the Banff classification (20)(21)(22) for AR, and intragraft C4d staining was performed (23,24) to assess for acute humoral rejection (25,26). Transplant "injury" was defined as a >20% increase in serum creatinine from its previous steady-state baseline value with an associated biopsy classified as AR, CAN, CNIT, BKVN, or STA. At minimum, AR was defined per the Banff schema as a tubulitis score ≥1 accompanied by an interstitial inflammation score ≥1, with both C4d and donor-specific antibody (DSA) negative. Only T-cell-mediated AR was included in this study; antibody-mediated rejection cases were excluded. CAN was defined at minimum as a tubular atrophy score ≥1 accompanied by an interstitial fibrosis score ≥1. The histological lesions of CAN were extensively characterized, and a semi-quantitative CAN score was assigned to each biopsy based on standardized definitions from the Banff (2), chronic allograft damage index (3), and chronic CNIT (19) scores. BKVN was defined as a positive polyomavirus PCR in peripheral blood together with a positive SV40 stain in the concomitant renal allograft biopsy. Normal (STA) allografts were defined by the absence of significant injury pathology per the Banff schema.
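The minimum diagnostic rules above can be expressed as simple predicates. The sketch below is illustrative only: the `Biopsy` record and its field names (`t`, `i`, `ct`, `ci`, `c4d_positive`, `dsa_positive`) are hypothetical, the scores stand for the Banff lesion scores named in the text, and the functions encode only the stated minimum thresholds, not a full Banff assessment.

```python
from dataclasses import dataclass


@dataclass
class Biopsy:
    """Hypothetical biopsy record holding Banff lesion scores and ancillary tests."""
    t: int              # tubulitis score
    i: int              # interstitial inflammation score
    ct: int             # tubular atrophy score
    ci: int             # interstitial fibrosis score
    c4d_positive: bool  # intragraft C4d stain result
    dsa_positive: bool  # donor-specific antibody result


def creatinine_injury(current: float, baseline: float) -> bool:
    """True when serum creatinine rises more than 20% above its steady-state baseline."""
    return (current - baseline) / baseline > 0.20


def meets_min_ar(b: Biopsy) -> bool:
    """Minimum T-cell-mediated AR criteria: t >= 1 and i >= 1, C4d- and DSA-negative."""
    return b.t >= 1 and b.i >= 1 and not b.c4d_positive and not b.dsa_positive


def meets_min_can(b: Biopsy) -> bool:
    """Minimum CAN criteria: ct >= 1 and ci >= 1."""
    return b.ct >= 1 and b.ci >= 1


if __name__ == "__main__":
    biopsy = Biopsy(t=2, i=1, ct=0, ci=0, c4d_positive=False, dsa_positive=False)
    print(creatinine_injury(1.3, 1.0), meets_min_ar(biopsy), meets_min_can(biopsy))
    # prints: True True False
```

Note that the creatinine rule uses a strict inequality, matching the ">20%" wording; BKVN (PCR plus SV40 stain) is left out of this sketch.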

Serum sample collection and storage
Blood samples (4.5 mL) were collected into 5 mL red-top tubes and incubated at room temperature for 30 min to allow clotting. The samples were then centrifuged at 2,000 × g for 5 min in a swinging-bucket rotor, and the upper serum layer was transferred to a cryotube and stored at −80°C until use.

Mass spectrometry
Serum samples were depleted of the 20 most abundant serum proteins using the ProteoPrep20 Plasma Immunodepletion Kit (Sigma-Aldrich, St. Louis, MO, USA, Cat. PROT20). The eluate from each sample was then digested with trypsin using a standard protocol. Tryptic peptides were reconstituted in 0.2% formic acid, 2% acetonitrile, and 97.8% water prior to mass spectrometry. High-performance liquid chromatography was performed on an Eksigent nano2D system (Eksigent) with a self-packed 150 μm ID, 15 cm C18 column. The electrospray source was a Michrom Advance operated at 600 nL/min, coupled to an LTQ Orbitrap Velos (Thermo Fisher). Data were acquired in a data-dependent manner in which the 12 most intense peptide ions with charge states 2+ and 3+ were selected for MS/MS fragmentation. Raw data were converted to mzXML format with msconvert prior to Sorcerer (SAGE-N) analysis with the Sequest algorithm. The IPI human database was searched using a 50 ppm precursor mass window. Propionamide on cysteine was set as a static modification, and methionine oxidation and lysine acetylation as variable modifications. All search results were compiled and displayed in Scaffold (Proteome Software Inc., Portland, OR, USA), which listed the identified proteins with cumulative spectral counts for each protein.
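To make the 50 ppm precursor window concrete, the sketch below converts an observed m/z and charge state into a neutral peptide mass and computes the corresponding ±50 ppm tolerance. The precursor value is a made-up example, and whether a search engine applies the tolerance to the neutral mass or to the [M+H]+ mass is implementation-specific, so this illustrates the arithmetic only.

```python
PROTON_MASS = 1.007276466  # mass of a proton in daltons


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral peptide mass from an observed precursor m/z and charge state."""
    return charge * (mz - PROTON_MASS)


def ppm_window(mass: float, ppm: float = 50.0) -> tuple:
    """Lower and upper bounds of a symmetric +/- ppm tolerance around a mass."""
    delta = mass * ppm * 1e-6
    return mass - delta, mass + delta


if __name__ == "__main__":
    # Hypothetical doubly charged precursor observed at m/z 650.3
    m = neutral_mass(650.3, 2)
    low, high = ppm_window(m)
    print(f"M = {m:.4f} Da, window = {low:.4f}-{high:.4f} Da")
```

At 50 ppm, a peptide of roughly 1,300 Da is matched within about ±0.065 Da, so the window scales with peptide mass rather than being a fixed width.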

Gene expression analysis
Gene expression analysis was performed on a subset of whole-blood samples collected from the proteomics study subjects at the same time as the serum samples. This analysis included blood samples from 11 AR, 11 STA, 9 CAN, 9 BKVN, and 9 CNIT patients. Blood was collected in 2.5 mL PAXgene™ Blood RNA Tubes (PreAnalytiX, Qiagen, Valencia, CA, USA). Total RNA was extracted, and cDNA was synthesized using a previously published protocol (27). The synthesized cDNA was hybridized onto GeneChip Human Genome U133 Plus 2.0 Arrays (Affymetrix Inc., Santa Clara, CA, USA), and the arrays were washed and scanned as recommended by the Ovation Biotin RNA Amplification and Labeling System User Guide (version 1.0) (NuGEN Inc., San Carlos, CA, USA). The data were analyzed using AltAnalyze software (28). Genes were considered differentially expressed at an empirical Bayes t-test p value <0.05 and a fold change ≥1.5.
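AltAnalyze's empirical Bayes t test is not reproduced here. As a rough illustration of the idea, the sketch below shrinks each gene's variance toward a common prior, then applies the p < 0.05 and fold-change ≥1.5 filter described above. The fixed prior degrees of freedom `d0` and the prior variance taken as the mean gene variance are simplifying assumptions (limma-style methods estimate both from the data), and inputs are assumed to be log2-scale expression matrices.

```python
import numpy as np
from scipy import stats


def moderated_t(group_a, group_b, d0=4.0):
    """Two-group moderated t statistics for log2 expression (genes x samples).

    d0 is a fixed prior degrees of freedom (assumption); the prior variance is
    taken as the mean of the per-gene pooled variances (assumption).
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    na, nb = a.shape[1], b.shape[1]
    df_resid = na + nb - 2
    # Pooled within-group variance for each gene
    s2 = ((na - 1) * a.var(axis=1, ddof=1) + (nb - 1) * b.var(axis=1, ddof=1)) / df_resid
    s0_2 = s2.mean()
    # Shrink each gene's variance toward the common prior variance
    s2_post = (d0 * s0_2 + df_resid * s2) / (d0 + df_resid)
    log2_fc = a.mean(axis=1) - b.mean(axis=1)
    t = log2_fc / np.sqrt(s2_post * (1.0 / na + 1.0 / nb))
    # Moderated statistic is referred to a t distribution with df_resid + d0 df
    p = 2.0 * stats.t.sf(np.abs(t), df=df_resid + d0)
    return log2_fc, t, p


def call_de(log2_fc, p, p_cut=0.05, fc_cut=1.5):
    """Flag genes with p < p_cut and a fold change of at least fc_cut in either direction."""
    return (p < p_cut) & (np.abs(log2_fc) >= np.log2(fc_cut))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ar = rng.normal(0.0, 0.2, size=(50, 6))   # synthetic "AR" samples
    sta = rng.normal(0.0, 0.2, size=(50, 6))  # synthetic "STA" samples
    ar[0] += 2.0                              # one truly shifted gene
    fc, t, p = moderated_t(ar, sta)
    print(np.flatnonzero(call_de(fc, p)))
```

Borrowing strength across genes this way stabilizes variance estimates when each group has only about 10 samples, which is the usual motivation for empirical Bayes tests in microarray studies.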

Data analysis
Data analysis was done in three steps. In the first step, we performed ANOVA to identify serum proteins specific to KT injury types. Mass spectrometric data were filtered at a protein-level false discovery rate (FDR) of 0.1%, based on target-decoy analysis using Sequest®, and a unique-peptide-level FDR of 0.2%. Spectral counts for identical IPI accession numbers were summed. After proteins that were not consistently identified across samples were filtered out, the remaining proteins were quantile-normalized and analyzed by ANOVA. Phenotype-specific proteins were identified with the criteria FDR <0.05 and at least a twofold change in either direction. Principal component analysis (PCA) plots and heatmaps were generated using Partek Genomics Suite (Partek Inc., St. Louis, MO, USA).
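The first analysis step can be approximated as follows. Partek Genomics Suite was the tool actually used; this NumPy/SciPy version rests on assumptions not stated in the text: spectral counts are log2-transformed (for example, log2(count + 1)) before normalization, the FDR is Benjamini-Hochberg, and each contrast is a two-group one-way ANOVA with fold change computed from group means on the log2 scale.

```python
import numpy as np
from scipy import stats


def quantile_normalize(x):
    """Quantile-normalize the columns (samples) of a proteins x samples matrix.

    Ties are broken by position rather than averaged (a simplification).
    """
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    mean_sorted = np.sort(x, axis=0).mean(axis=1)  # reference distribution
    return mean_sorted[ranks]


def bh_fdr(p):
    """Benjamini-Hochberg adjusted p values."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity by taking running minima from the largest p value down
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def two_group_anova(x, is_case, fdr_cut=0.05, fc_cut=2.0):
    """Per-protein one-way ANOVA, BH FDR, and a twofold-change filter.

    x: normalized, log2-scale proteins x samples matrix (assumed).
    is_case: boolean mask of case samples (e.g. injury vs no injury).
    """
    x = np.asarray(x, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    case, ctrl = x[:, is_case], x[:, ~is_case]
    # With two groups, one-way ANOVA is equivalent to a pooled-variance t test
    p = np.array([stats.f_oneway(c, k).pvalue for c, k in zip(case, ctrl)])
    q = bh_fdr(p)
    log2_fc = case.mean(axis=1) - ctrl.mean(axis=1)
    hits = (q < fdr_cut) & (np.abs(log2_fc) >= np.log2(fc_cut))
    return log2_fc, q, hits
```

A twofold change corresponds to |log2 fold change| ≥ 1, which is why the filter compares against `np.log2(fc_cut)`.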
In the second step, we used penalized logistic regression to identify proteins that could serve as potential surrogate biomarkers for different KT injuries. This approach provides not only accurate estimates of the regression coefficients but also a probability estimate for each patient (29,30). Estimation used regularization paths for generalized linear models via coordinate descent (30). The logistic model is

$$\log\frac{\pi(x)}{1-\pi(x)} = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K,$$

where π(x) = P(case | x) and x1, …, xK are the expression levels of the K proteins for observation x. The Elastic-Net fits this model by adding a mixed penalty term to the negative log-likelihood:

$$\min_{\beta_0,\,\beta}\; -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\,(\beta_0 + x_i^{\top}\beta) - \log\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big] + \lambda\Big[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Big],$$

where N is the number of samples, yi ∈ {0, 1} indicates case status, λ ≥ 0 sets the overall strength of the penalty, and α ∈ [0, 1] mixes the ridge (ℓ2) and lasso (ℓ1) penalties. We fitted 100 Elastic-Net logistic regression models to the 137 proteins using bootstrapped samples, with ~75% of the samples in each training set and ~25% in each test set, to classify transplant vs non-transplant samples and samples with different transplant injury status. For each bootstrap, a nested cross-validation loop selected the best value of λ according to the deviance. The mixing parameter α was fixed at 0.95, the value recommended by Friedman et al. (30). The mean test classification rate was 96%. For each bootstrap sample, the Elastic-Net assigned non-zero coefficients to a subset of the 137 proteins, and for each protein we counted the number of bootstrap samples in which its coefficient was non-zero. After running the 100 bootstrapped models, we selected the K proteins that most frequently had non-zero coefficients. Classification with the reduced set of K proteins: to obtain an unbiased estimate of predictive performance (classification rate, sensitivity, specificity, PPV, and NPV), we ran another 100 bootstrap Elastic-Net classifications with nested cross-validation for λ, this time using only the K proteins selected above.
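The selection procedure described above (repeated ~75/25 splits, an inner cross-validation over λ scored by deviance, α fixed at 0.95, and counting how often each protein receives a non-zero coefficient) can be sketched in plain NumPy. This is not glmnet: the model is fitted by proximal gradient descent rather than coordinate descent, the splits are drawn without replacement (an assumption about how the "bootstrap" was implemented), the λ grid, fold count, and iteration count are arbitrary, and all function names are hypothetical. Predictors are assumed to be standardized.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_enet_logistic(X, y, lam, alpha=0.95, n_iter=2000):
    """Elastic-Net logistic regression by proximal gradient descent.

    Minimizes mean logistic loss + lam * ((1 - alpha) / 2 * ||b||_2^2 + alpha * ||b||_1);
    the intercept b0 is not penalized.
    """
    n, k = X.shape
    Xa = np.hstack([np.ones((n, 1)), X])
    # Inverse Lipschitz constant of the smooth part's gradient is a safe step size
    step = 1.0 / (np.linalg.norm(Xa, 2) ** 2 / (4 * n) + lam * (1 - alpha))
    b0, b = 0.0, np.zeros(k)
    for _ in range(n_iter):
        resid = 1.0 / (1.0 + np.exp(-(b0 + X @ b))) - y  # fitted prob minus label
        grad = X.T @ resid / n + lam * (1 - alpha) * b
        b0 -= step * resid.mean()
        b = soft_threshold(b - step * grad, step * lam * alpha)
    return b0, b


def deviance(y, b0, b, X):
    """Binomial deviance of a fitted model on (X, y)."""
    p = np.clip(1.0 / (1.0 + np.exp(-(b0 + X @ b))), 1e-12, 1 - 1e-12)
    return -2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))


def cv_lambda(X, y, lams, rng, n_folds=5, n_iter=2000):
    """Return the lambda with the lowest cross-validated deviance."""
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    dev = np.zeros(len(lams))
    for j, lam in enumerate(lams):
        for test in folds:
            train = np.setdiff1d(np.arange(len(y)), test)
            b0, b = fit_enet_logistic(X[train], y[train], lam, n_iter=n_iter)
            dev[j] += deviance(y[test], b0, b, X[test])
    return lams[int(np.argmin(dev))]


def selection_frequency(X, y, lams, n_resamples=100, train_frac=0.75,
                        n_folds=5, n_iter=2000, seed=0):
    """Count, per protein, how many resampled fits give it a non-zero coefficient."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    counts = np.zeros(k, dtype=int)
    for _ in range(n_resamples):
        train = rng.permutation(n)[: int(round(train_frac * n))]
        lam = cv_lambda(X[train], y[train], lams, rng, n_folds, n_iter)
        _, b = fit_enet_logistic(X[train], y[train], lam, n_iter=n_iter)
        counts += (b != 0).astype(int)
    return counts
```

The K most frequently selected proteins would then be `np.argsort(-counts)[:K]`. The held-out ~25% of each split, which this sketch does not use, is where the test classification rate would be computed.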
Finally, in the third step, we fitted the selected models to the whole dataset to give the regression coefficients.
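Once final coefficients are available (as reported in Table 3), a new sample's score and predicted probability follow directly from the logistic model. The sketch below assumes that the "projection score" shown in the figures is the linear predictor β0 + Σ βk xk. That interpretation, the protein names, and the coefficient values are hypothetical and are not taken from Table 3.

```python
import math


def projection_score(intercept, coefs, sample):
    """Linear predictor b0 + sum_k b_k * x_k over the proteins in the panel."""
    return intercept + sum(beta * sample[protein] for protein, beta in coefs.items())


def case_probability(score):
    """Logistic transform of the linear predictor: P(case | x)."""
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    # Hypothetical two-protein panel; real panels and coefficients are in Table 3
    coefs = {"PROT_A": 0.5, "PROT_B": -0.25}
    sample = {"PROT_A": 4.0, "PROT_B": 2.0}
    s = projection_score(-1.0, coefs, sample)
    print(s, round(case_probability(s), 3))  # prints: 0.5 0.622
```

A score above zero corresponds to a predicted case probability above 0.5, which is the natural default threshold for classification.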

Results
The objective of this study was to identify KT injury-specific proteins. Proteomic analysis of serum identified 137 proteins across all samples with the criteria of (i) a minimum of two peptides per protein for a positive identification and (ii) an FDR of 0.1% for protein and 0.2% for peptide identification using Sequest® (31). Enrichment analysis of the 137 proteins returned complement and coagulation cascades (p = 4.79E−38) and peptidase inhibitor activity (p = 1.02E−13) as the top biological associations.
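The enrichment tool is not named in the text. Over-representation p values of this kind are commonly computed with a one-sided hypergeometric test; the sketch below shows that test, with parameter names chosen for illustration, and makes no claim to reproduce the reported p values.

```python
from scipy.stats import hypergeom


def enrichment_p(n_universe, n_in_term, n_list, n_overlap):
    """One-sided hypergeometric p value: P(overlap >= n_overlap).

    n_universe: size of the background set of genes/proteins.
    n_in_term: background members annotated to the pathway or term.
    n_list: size of the query list (e.g. the identified proteins).
    n_overlap: query members annotated to the term.
    """
    # For a discrete distribution, sf(k - 1) = P(X >= k)
    return hypergeom.sf(n_overlap - 1, n_universe, n_in_term, n_list)
```

Very small p values such as those reported arise when a large fraction of a short query list falls into one pathway that is rare in the background.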
Serum proteins specific to different injury subtypes
To identify proteins specific to KT injury status, ANOVA was performed on the proteomics data with the criteria of FDR <0.05 and at least a twofold change in either direction. (i) Injury vs no injury: samples were classified into injury (AR, CAN, CNIT, BKVN) and no injury (STA and HC). Ten unique differentially expressed proteins specific to kidney injury after transplantation were identified (Table 2). These 10 proteins were significantly associated with the complement activation pathway (FDR = 0.006).
(ii) Chronic injury-specific serum protein markers: the contrast between CAN/CNIT (chronic) and AR/BKVN/STA (non-chronic) identified 22 differentially expressed proteins (Table 2). These proteins are associated with positive regulation of cholesterol esterification (FDR = 5.26E−10), plasma lipoprotein particle remodeling (FDR = 5.64E−08), regulation of response to wounding (FDR = 5.64E−08), and regulation of inflammatory response (FDR = 1.02E−07). (iii) Acute injury-specific proteins: we identified 6 differentially expressed proteins between patients with acute injury (AR and BKVN) and those with chronic injury (CAN and CNIT) or STA (Table 2). These proteins are associated with blood coagulation pathways. (iv) Identification of AR: 10 differentially expressed proteins were identified between patients with AR and those without AR (CAN, CNIT, BKVN, and STA) (Table 2). … identified for injury, chronic injury, acute injury, and AR demonstrated good separation of the phenotypes (Figures 2A-D).
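The four contrasts above amount to different binary groupings of the same phenotype labels. A minimal sketch, with dictionary keys chosen for illustration; samples that fall outside both groups of a contrast (for example, healthy controls in the AR contrast, which the text does not list) are assumed to be excluded and are marked `None`.

```python
# Case and control phenotype groups for each contrast, as listed in the text
CONTRASTS = {
    "injury_vs_no_injury": ({"AR", "CAN", "CNIT", "BKVN"}, {"STA", "HC"}),
    "chronic_vs_non_chronic": ({"CAN", "CNIT"}, {"AR", "BKVN", "STA"}),
    "acute_vs_other": ({"AR", "BKVN"}, {"CAN", "CNIT", "STA"}),
    "ar_vs_non_ar": ({"AR"}, {"CAN", "CNIT", "BKVN", "STA"}),
}


def contrast_labels(phenotypes, contrast):
    """Map phenotype labels to 1 (case), 0 (control), or None (excluded)."""
    case, control = CONTRASTS[contrast]
    labels = []
    for ph in phenotypes:
        if ph in case:
            labels.append(1)
        elif ph in control:
            labels.append(0)
        else:
            labels.append(None)
    return labels


if __name__ == "__main__":
    print(contrast_labels(["AR", "HC", "CAN", "BKVN"], "acute_vs_other"))
    # prints: [1, None, 0, 1]
```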

Identification of serum protein panels for different KT injuries
To identify transplant injury-specific panels, we fitted 100 Elastic-Net logistic regression models to the 137 proteins using … Figure 3A shows the separation of projection scores for the different phenotypes. ROC analysis of classification performance yielded an AUC of 1 for the model consisting of 10 proteins. The intercept and regression coefficients are shown in Table 3. … Figure 3C shows the separation of projection … (Table 3). Figure 3 shows the separation of projection scores for AR from the other phenotypes. (vi) Serum protein biomarkers to distinguish CAN from CNIT: a panel of 10 proteins was identified that classified CAN (n = 25) vs CNIT (n = 20). The mean classification rate was 0.95, and the sensitivity, specificity, PPV, and NPV were 0.96, 0.94, 0.96, and 0.95, respectively. The intercept and regression coefficients are shown in Table 3.
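For reference, the reported performance measures follow from a 2 × 2 confusion matrix, and the AUC can be computed as the probability that a randomly chosen case scores higher than a randomly chosen control (ties counting one half). The counts and scores used below are hypothetical, and the reported values are means over bootstrap test sets, so they need not correspond to any single confusion matrix.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Classification rate, sensitivity, specificity, PPV, and NPV from counts."""
    return {
        "classification_rate": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auc_rank(scores_pos, scores_neg):
    """AUC as P(random case outscores random control); ties count 1/2."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical counts and projection scores
    print(confusion_metrics(tp=9, fp=1, tn=8, fn=2))
    print(auc_rank([0.9, 0.8, 0.4], [0.3, 0.2, 0.4]))
```

An AUC of 1 means every case scored above every control, that is, the projection scores separate the groups perfectly, which in a small cohort is also a signal to watch for overfitting.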

Gene expression analysis of whole blood from a subset of serum samples used in proteomics analysis
A subset of PBMCs with serum samples matched by collection date for each patient was analyzed. A total of 9,665 genes were significantly different in whole blood by ANOVA (p < 0.05); of these, only 39 gene products were among the proteins identified in our serum proteomics data. With the significance criteria of an empirical Bayes t-test p value <0.05 and a twofold change vs stable graft (STA), 626 upregulated genes were identified, which were enriched for biological system process (FDR = 0.0004). In addition, there were 7,316 downregulated genes in AR, which were associated with cellular macromolecule metabolic process (FDR = 5.1E−14) and primary metabolic process (FDR = 2.1E−13). There were 12 upregulated and 713 downregulated genes in CAN, and CAN-associated genes were highly enriched in immune system process (FDR = 7.4E−9) and response to stress (FDR = 2.4E−6). Blood from BKVN patients showed 141 upregulated and 1,321 downregulated genes, which were enriched in anatomical structure morphogenesis (FDR = 4.2E−05) and transcription regulatory region DNA binding (FDR = 0.005). There were seven upregulated and 290 downregulated genes associated with CNIT, which were enriched in viral transcription process (FDR = 0.003). Although we did not expect to see a correlation between serum protein levels and gene expression in whole blood, some serum protein levels aligned with the corresponding gene expression levels in the PBMCs. Among these, complement factor properdin (CFP), attractin (ATRN), serum IgA (IGHG2 gene), and clusterin (CLU) were downregulated in acute injury, including AR, at both the gene expression and protein levels.
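The cross-platform comparison can be framed as a set intersection followed by a check that the serum-protein and blood-mRNA changes point in the same direction. A minimal sketch, assuming both inputs are dictionaries keyed by gene symbol with log2 fold changes vs STA; the symbols and values in the example are hypothetical placeholders, not study data.

```python
def concordant_changes(protein_log2fc, gene_log2fc):
    """Return symbols present in both datasets whose changes share the same sign."""
    shared = protein_log2fc.keys() & gene_log2fc.keys()
    return sorted(s for s in shared if protein_log2fc[s] * gene_log2fc[s] > 0)


if __name__ == "__main__":
    proteins = {"GENE1": -1.2, "GENE2": -0.8, "GENE3": 1.0}
    genes = {"GENE1": -0.5, "GENE2": 0.3, "GENE4": -0.4}
    print(concordant_changes(proteins, genes))  # prints: ['GENE1']
```

Because only 39 gene products overlapped with the serum proteome, the intersection step, not the sign check, is what limits how many concordant pairs can be found.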

Discussion
There is an urgent need to improve the long-term survival of transplanted organs, which continue to succumb to immune- and non-immune-related causes (32). Sequencing of the human genome and the subsequent development of "omic" platforms have enabled us to investigate the biology of transplant rejection and failure (15,17,33). Recently published reports have improved our understanding of immune-related processes (3)(4)(5)(6)(7). Similar studies to identify serum proteins in transplant injuries have been published, but they suffer from either a small sample size (34) or a lack of diversity of injuries needed to understand the perturbations of those proteins (35). In addition, rejection is a heterogeneous biological process, and it is now accepted that a single gene or protein is unlikely to serve as a surrogate biomarker or to represent the multitude of biological events that occur concurrently during transplant injury (36). The combination of semi-quantitative proteomics and penalized logistic regression used in this study to identify potential biomarker protein panels for different KT injury phenotypes is a novel approach for serum protein biomarker discovery in kidney transplantation. The method provided accurate estimates of the regression coefficients and probability estimates for each transplant injury phenotype in pediatric KT recipients (29,30). There are several important findings in this report. First, we identified proteins associated with different transplant injuries. A broad analysis of transplant injury-associated proteins identified CFP, a known positive regulator of complement activation (37), and CLU, which is known to aid the clearance of cellular debris and apoptosis (38). Six of the eight proteins identified were associated with complement activation and the alternative pathway.
Complement and coagulation cascade-related proteins such as complement component 5 (C5), complement factor B (CFB), and protein S alpha (PROS1) were associated with chronic injury. Involvement of the complement system in transplant rejection and chronic injury has been reported (39,40). The association of complement and coagulation cascade-related proteins with injury is consistent with a response to immune-mediated and vascular injury to the organ. A set of immune regulatory proteins, namely alpha-2-HS-glycoprotein (AHSG), apolipoprotein A-IV (APOA4), ATRN, IgG, and fibronectin 1 (FN1), was associated with acute injury. Fibronectin is known to participate in cell adhesion, growth, migration, and differentiation (41). Soluble fibronectin (FN) is produced by hepatocytes (42) and in response to vascular injury (43). Our observation of a decreased serum FN level in chronic injury and an increased FN level in acute injury, which includes AR and BKVN, is interesting because its protein level in the blood and its expression in the rat kidney have been studied in the context of transplant rejection with no clear association (44)(45)(46)(47). Molecules involved in the regulation of defense and inflammatory responses include AHSG/fetuin-A, APOA4, ATRN, CFB, SERPINA3, apolipoprotein A-I, IgG, complement component 5 (C5), immunoglobulin heavy constant alpha 1 (IgA), and histidine-rich glycoprotein. The elevation of these proteins in the blood during an AR episode is likely attributable to the response to acute inflammation due to rejection.
Second, using penalized logistic regression, we identified 10-protein panels as candidate surrogate biomarkers for different transplant injuries. After several efforts in biomarker discovery for AR and other transplant injuries, it has become evident that finding a single gene or protein as a biomarker is almost impossible. Biomarker discovery and validation is an arduous task that requires long-term studies and many validation steps. Although our aim was to identify a panel of potential biomarker proteins and validation of biomarker panels was beyond the scope of this study, we used the data generated here to identify potential biomarker panels for transplant injuries and evaluated them on held-out test sets comprising ~25% of the study samples. To this end, using our penalized logistic regression model, we identified 10-protein panels for transplant, transplant injury, AR, and acute injury (AR and BKVN), as well as a panel that differentiates CAN from CNIT. The sensitivity, specificity, PPV, and NPV reported for the different models suggest some promise for these proteins as biomarkers. We acknowledge that, given the small sample size, our results could suffer from overfitting. Validation will require prospectively collected samples from a larger patient cohort in a clinical trial setting.
The third important finding of this study is that profiling of serum proteins and of the whole-blood transcriptome provided complementary information about perturbations in the blood during different injuries, allowing us to compare the molecular pathways identified by serum proteomics with those identified from gene expression in blood cells. Although the overlap between the serum proteins (n = 137) and the gene expression data (n ≈ 20,000 genes) was minimal, this comparison is a step toward understanding molecular perturbations across different tissue types. This approach is important because analyzing different tissue types will help build a complete picture of the biological events that occur during an immune or non-immune insult to the transplanted kidney.
In conclusion, using a high-throughput, semi-quantitative mass spectrometry-based proteomics approach with serum samples from pediatric KT patients, we identified proteins associated with different injuries in kidney transplantation. We acknowledge some limitations of this study, including that its conclusions are based on discovery data and lack independent validation. In addition, depletion of highly abundant proteins may have removed potential biomarkers bound to albumin. Discovery studies such as this one should be followed by larger validation studies in independent cohorts to establish these proteins and genes as surrogate biomarkers of KT injury. With these promising results in hand, we are planning studies to validate the identified proteins and genes in a set of prospectively collected samples. This longitudinal sample set will also determine whether increased or decreased levels of the serum proteins correlate with injury status over time.

Ethics statement
This study was carried out in accordance with the recommendations of the University of California San Francisco with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Human Research Protection Program, Institutional Review Board (IRB) of University of California San Francisco.

Author contributions
TS and MS conceptualized the study. TS carried out the experiments and data generation. Both TS and MS analyzed the data and prepared the manuscript.