Integration of Proteomics, Bioinformatics, and Systems Biology in Traumatic Brain Injury Biomarker Discovery

Guingab-Cagmat, Joy; Cagmat, Emilio; Hayes, Ronald  L; Anagli, John

doi:10.3389/fneur.2013.00061

REVIEW article

Front. Neurol., 31 May 2013

Sec. Neurotrauma

Volume 4 - 2013 | https://doi.org/10.3389/fneur.2013.00061

This article is part of the Research Topic: Towards translating research to clinical practice: Novel Strategies for Discovery and Validation of Biomarkers for Brain Injury

J.D. Guingab-Cagmat*

E.B. Cagmat

R.L. Hayes

J. Anagli*

Banyan Biomarkers, Inc., Alachua, FL, USA

Traumatic brain injury (TBI) is a major medical crisis without any FDA-approved pharmacological therapies that have been demonstrated to improve functional outcomes. It has been argued that discovery of disease-relevant biomarkers might help to guide successful clinical trials for TBI. Major advances in mass spectrometry (MS) have revolutionized the field of proteomic biomarker discovery and facilitated the identification of several candidate markers that are being further evaluated for their efficacy as TBI biomarkers. However, several hurdles have to be overcome even during the discovery phase, which is only the first step in the long process of biomarker development. The high-throughput nature of MS-based proteomic experiments generates a massive amount of mass spectral data, presenting great challenges in downstream interpretation. Currently, different bioinformatics platforms are available for functional analysis and data mining of MS-generated proteomic data. These tools provide a way to convert data sets to biologically interpretable results and functional outcomes. A strategy that has promise in advancing biomarker development involves the triad of proteomics, bioinformatics, and systems biology. In this review, we give a brief overview of how bioinformatics and systems biology tools analyze, transform, and interpret complex MS datasets into biologically relevant results. In addition, challenges and limitations of proteomics, bioinformatics, and systems biology in TBI biomarker discovery are presented. A brief survey of studies that utilized these three overlapping disciplines in TBI biomarker discovery is also presented. Finally, examples of TBI biomarkers and their applications are discussed.

Introduction

Tremendous efforts have been put into the discovery of biomarkers that can diagnose disease or injury. A quick search for scholarly articles that include the word biomarker can yield more than half a million hits. However, the overall status of biomarker development and clinical validation is very disappointing. There are only a handful of novel biomarkers that are of clinical relevance, and the rate at which a biomarker is introduced to the market is dismal. One estimate shows that since 1998, new protein biomarkers that were approved by the US Food and Drug Administration fell to one per year (Rifai et al., 2006). The reasons for this trend are numerous and one strategy to reverse the fall is a better understanding of the whole process itself. One key strategy in hunting for that robust biomarker is the combination of scientific disciplines. In traumatic brain injury (TBI), an interdisciplinary approach is employed by combining the methods and tools from three fields, namely, proteomics, bioinformatics, and systems biology.

Proteomics is the study of protein populations or proteomes. The term “proteome” was coined in 1995 (Wasinger et al., 1995) to describe the protein complement of a genome. This came after realizing that at least half of the proteins encoded by the human genome have no known functions. The move from studying the message (mRNA or cDNA) to focusing on the product of the message (proteins) gave birth to proteomics. Proteomics assesses the expression level of proteins, post-translational modifications, and interactions of proteins within a tissue, cell, subcellular compartment, or biofluid. The goal is to obtain a large-scale, global view of physiological conditions and disease processes. However, studying the global systems of proteins has produced a large amount of data, and making sense of the complex data generated became a problem. From the beginning, it was clear that processing a vast amount of data requires the aid of computers. Like genomics a decade ago, proteomics tackled the problem by enlisting the help of bioinformatics and, later on, systems biology.

Bioinformatics combines mathematics and computer technology to deal with the analyses of large numbers of proteins while systems biology unveils the global network of physiological environments. Bioinformatics has become an integral part of proteomics, strategically mining data for sensible results. Systems biology on the other hand tries to look at the big picture by mapping interactions of isolated proteins, akin to looking at the ecosystem of the whole forest, rather than just the individual trees.

The triad of proteomics, bioinformatics, and systems biology has been applied to study protein behaviors in myriad disease pathologies. It was no different with neurological conditions such as TBI, Alzheimer’s disease, and stroke. Neuroproteomics (Choudhary and Grant, 2004), a field under the proteomics umbrella, has zeroed in on these disorders, extracting insights into the dynamics and interactions of proteins in these disease states (Ottens et al., 2006, 2010; Bayes and Grant, 2009; Alzate, 2010; Shoemaker et al., 2012).

One of the neurological conditions that has received a fair amount of media attention lately is TBI. Although TBI is known as the “silent epidemic,” the public is becoming aware of the injury as war veterans come home from war-zone blasts. Even the military has acknowledged that TBI is the “signature injury of the conflicts in Iraq and Afghanistan” (Risdall and Menon, 2011).

Media coverage of concussive injury has increased lately. This is partly due to the increase in suicides of football players. High-profile cases of professional football players have captivated the public, highlighted by the suicide of Dave Duerson. Mr. Duerson, a professional football player, was suspected to have suffered TBI during his playing years. He was found dead with a gunshot to his chest, not to his head, to preserve his brain for science.

Statistically, TBI is one of the leading causes of disability in the United States. It is considered one of the major health problems annually claiming 5% of the lives of the two million victims. Around 25% are hospitalized and approximately half are treated and released after emergency care (Smith et al., 2003; Johnson et al., 2004). It is estimated that by 2030, the public health impact of TBI will increase (Mathers and Loncar, 2006). This should alarm us as road traffic accidents will be the most common cause of blunt trauma, making TBI the fourth leading cause of disability.

The disturbing reality for victims and their families is that currently, there are no FDA-approved treatments or therapies (except for pain relievers) that can alleviate the effects of TBI. One of the most pressing needs, however, is the accurate diagnosis and monitoring of patients. Physicians need guidance on whether patients are responding to treatment and improving. But to this day, clinicians are limited to parameters such as brain pH, pO₂, intracranial pressure (ICP), and temperature. Brain imaging techniques such as computed tomography (CT) and magnetic resonance imaging (MRI) scans provide information on damaged regions non-invasively, but capture the injury only over a short window of time. The limitations of traditional diagnosis have hindered the overall progress in understanding the condition, highlighting the need for more accurate diagnostic tools. The goal is that a robust biomarker or panel of biomarkers will complement existing diagnostic methods, and eventually replace the more traditional ones.

In this paper, advances and limitations of proteomics, bioinformatics, and systems biology will be discussed. We shall then try to integrate the three fields in relation to biomarker discovery, limiting the discussion to protein biomarkers in TBI. This article is structured as follows. In Section “Biomarkers, TBI Models, Proteomics, Bioinformatics, and Systems Biology, Their Definition,” we define biomarkers, TBI animal models, proteomics, bioinformatics, and then systems biology. In Section “Protein Profiling,” we shall review the methods, challenges, and technical difficulties inherent in identifying proteins. Section “Biomarker Applications” deals with the present panels of proteins that can be used as biomarkers for TBI.

Biomarkers, TBI Models, Proteomics, Bioinformatics, and Systems Biology, Their Definition

Biomarkers are indicators of normal biological processes or disease states. A biomarker can also be a gauge of pharmacological response in therapeutic interventions (Lesko and Atkinson, 2001).

The idea in biomarker discovery is that organs secrete specific molecules that can indicate a physiological malfunction. In general, these are any biomolecules that can serve as a fingerprint, showing up in samples of affected tissue or in peripheral fluids of the affected area. In the context of TBI and proteomics, ideal biomarkers are proteins that are present only in the brain and that leak out through the blood-brain barrier and into the person’s blood or cerebrospinal fluid (CSF) during or after brain injury. These molecular signatures should be proportional to the impact and the extent of damage in the brain, and should reflect differences between age groups and sex.

Numerous animal models of TBI have been developed to understand the heterogeneous nature of brain injury (recently reviewed by Chopp et al.) (Xiong et al., 2013). Due to their low cost and the availability of more standardized outcome measurements, rodent models are particularly common in TBI studies, although larger animals are closer to human physiology. Controlled cortical impact (CCI) uses a controlled degree of impact delivered by a pneumatic or electromagnetic impact device (Lighthall, 1988; Dixon et al., 1991). The penetrating ballistic-like brain injury (PBBI) model mimics severe to moderate TBI such as gunshot wounds. PBBI is induced by transmission of high energy projectiles and a leading shockwave, producing a temporary cavity in the brain that is many times the size of the projectile itself (Williams et al., 2005). Another widely used model is fluid percussion injury (FPI), where a contusion force is produced by the movement of a fluid in a chamber. In the drop-weight impact acceleration injury, the skull (with or without craniotomy) is exposed to a weight that is dropped from a certain height, and injury severity can be altered by adjusting the mass of the weight and the height from which it falls. The more recent TBI models are the blast models that mimic TBI induced by explosive devices. Blast-induced brain injuries have been predominant among military personnel who have been exposed to a blast but do not have external injuries (Warden, 2006; Benzinger et al., 2009). Different variations of blast TBI animal models have been developed to elucidate the effects of primary blast waves on the brain (Wang et al., 2005; Cheng et al., 2010; Svetlov et al., 2010; Risling et al., 2011). Elucidation of the mechanisms of blast injury, identification of biomarkers and, eventually, the development of strategies for mitigating blast-induced brain injury will benefit from further design optimization, characterization, and standardization of the experimental parameters of blast TBI models.

While TBI can occur as a result of auto accidents, violence, or sports injuries, it has left the shadows with the wars in Iraq and Afghanistan. Twenty-first century warfare exposes military personnel to blast injuries resulting from high order explosives. The Kevlar helmet, although an excellent protection against penetrating brain injury, offers little protection from blast injuries (Lew, 2005; Okie, 2005). Accurate statistics are not currently available, but it is estimated that more than 50% of all casualties from the Afghanistan and Iraq theaters have sustained head injuries (Warden, 2006) compared to 15–25% from twentieth century conflicts (Carey, 1996). Of the 1.4 million TBIs that occur annually, the vast majority, between 75 and 90%, are mild or moderate (mTBI) (Jager et al., 2000; Gerberding, 2003). Mild and moderate TBI, also called concussion, occurs when an impact or forceful motion of the head results in a brief alteration of mental status, such as confusion, disorientation, brief memory loss, or brief loss of consciousness. Because they produce a number of imprecise perceptual symptoms without diagnosable objective structural brain alterations, mTBIs are challenging to diagnose (Lyeth et al., 1990; Hamm et al., 1993; Kibby and Long, 1996; Margulies, 2000). Furthermore, many sufferers fail to recognize the potential severity and seriousness of their injury and thus do not seek medical attention (Alexander, 1995; Kushner, 1998). TBI is thus under-diagnosed and under-represented in medical statistics. However, even brief alterations in mental status can inflict profound and persistent impairment of physical, cognitive, and psychosocial functioning (Binder, 1997; Ruff and Jurica, 1999). Furthermore, TBI is an epigenetic risk factor for Alzheimer’s and Parkinson’s diseases (Smith et al., 2003; Szczygielski et al., 2005).
Although TBI is a major focus of casualty care in combat areas and the principal cause of mortality and morbidity due to improvised explosive devices (IEDs) and other hazards, there are no FDA-approved pharmacologic therapies that have been demonstrated to improve functional outcomes.

Proteomics is defined by many as the study of the protein complement of the genome, the proteome (Blackstock and Weir, 1999; Stults and Arnott, 2005). The proteome is the set of proteins from the whole organism or a specific organ under specific physiological conditions.

Several definitions of bioinformatics can be found in the literature today. What suits us is the idea that bioinformatics is a tool to mine vast amounts of data using computer technology and mathematics (Hagen, 2000; Kumar and Mann, 2009).

Systems biology came into the picture as soon as proteins were identified from proteomics experiments. For example, low-concentration proteins can now be identified in an injured brain; however, a list of individual proteins may not make sense on its own. Systems biology came in to explain the connections among these isolated proteins.

The science of systems biology is still considered to be in its infancy and a consensus on its definition has not been fully reached (Ideker et al., 2001; Kitano, 2002a,b; Chuang et al., 2010). For us, it is an approach to study the complex interactions of biological systems. It examines, assembles, and maps the properties and regulations of tightly interconnected biological systems.

Protein Profiling

Identification of proteins is one of the main goals of biomarker discovery. The conventional method of identifying proteins as a marker for disease is by measuring a specific compound known to be part of the pathophysiology. In TBI for example, the presence of glial fibrillary acidic protein (GFAP) in the blood can mean damage to the glia. Also, tau and spectrin protein breakdown products in the blood indicate damage to the axons. One can also examine the unregulated breakdown products of necrotic cell death. Breakdown products of calpain mediated proteolysis can be used as biomarkers of TBI. This is the same for products of apoptotic cell death, from the activation of caspase (Büki et al., 1999; Farkas et al., 2005; Svetlov et al., 2009; Risdall and Menon, 2011).

A novel method, which is the subject of this review, is the data driven and high-throughput approach of discovery. In this strategy, the samples from normal and TBI patients are compared, screening for differences between the two. This approach consistently uses mass spectrometry (MS) and most of the time it is discovery driven instead of being hypothesis driven (Stults and Arnott, 2005). In discovery driven types of experiments, information is collected and then patterns are sought. Unlike hypothesis driven research that disproves or proves a defined hypothesis, discovery driven research collects a huge amount of information first then extracts questions and answers from lots of data. It may sound like a “blind shot” to find answers, but our current technology enables us to do this. If history is a good indicator, it worked with genomics and metabolomics, so performing discovery driven experiments with an entire proteome is logical.

Historical Background

It was almost 40 years ago that two-dimensional electrophoresis was invented and described in a paper (O’Farrell, 1975), paving the way for the separation of more complex mixtures. A few years later, in the early 1980s, the first profiles of human CSF (Merril et al., 1983) and mammalian brain (Klose and Feller, 1981) were reported. These started the systematic classification of proteins from the brain. By the mid-80s, the first proteomic database SWISSPROT was established (Bairoch and Boeckmann, 1991; Bairoch and Apweiler, 1997; Peitsch et al., 1997). At the end of that decade, two ionization techniques for MS analysis were introduced, making large protein analyses possible (Karas and Hillenkamp, 1988; Fenn et al., 1989). High-throughput and gel-free proteomics came into being when liquid chromatography (LC) was integrated with MS around 1996 (Appella et al., 1995).

Ten years later, the profile of a mouse’s brain was created, identifying 7,792 proteins (Wang et al., 2006), ranging in abundance from tens of copies to hundreds of thousands of copies.

MS-Based Neuroproteomics Workflow

In a typical neuroproteomics experiment, proteins from the brain or spinal cord tissue are extracted as a mixture of proteins. Depending on the experiment, obtaining the proteins can be done with tissue homogenization, cellular fractionation, or affinity fractionation. Then the complex mixture is further separated to reduce its complexity.

Three of the common tools for protein separation are two-dimensional electrophoresis (2DE), one-dimensional electrophoresis (1DE), and two-dimensional LC. In the more common bottom-up proteomics, after subjecting a sample to one of the three methods mentioned above, the proteins are digested by an enzyme prior to analysis by MS. After protein separation and digestion, the resulting peptide mixture is further resolved by nanoflow liquid chromatography (nanoLC) based on the peptides’ hydrophobicity prior to introduction into the mass spectrometer by nano-electrospray ionization. Many TBI proteomic biomarker studies have relied on the bottom-up approach. Putative protein biomarker candidates were identified in a rat CCI model using 1D-SDS-PAGE prior to bottom-up proteomic analysis (Will Haskins). An improved two-dimensional platform employing a protein pre-fractionation step by cation-anion exchange and 1D-SDS-PAGE prior to bottom-up proteomic analysis was used in subsequent TBI biomarker studies from our group (Kobeissy et al., 2006; Ottens et al., 2007). Kochanek’s group was the first to use 2D-PAGE in a TBI biomarker study (Jenkins et al., 2012). Siman et al. (2004) performed MALDI-MS following 2D-PAGE of proteins released from a TBI cell culture model to identify acute TBI protein biomarkers. 2D-PAGE and mass spectrometric analysis have been implemented in oxidative stress TBI biomarker studies (Opii et al., 2007). An alternative to the above approach is shotgun proteomics (Wolters et al., 2001; McDonald and Yates, 2002, 2003; Wu and MacCoss, 2002). The complex mixture in shotgun analyses is directly digested without prior separation or fractionation. Variations of this method exist, but all shotgun proteomics begins with a mixture of proteins. For example, a complete protein digest without prior separation can be separated by LC and then analyzed by MS in real time.
A shotgun proteomic approach based on nanoLC in conjunction with matrix-assisted laser desorption/ionization time of flight tandem MS (MALDI-TOF MS/MS) was utilized to quantitatively analyze the protein content of consecutive ventricular CSF samples of severe TBI patients (Hanrieder et al., 2009). Recently, our group has applied shotgun proteomics to profile the neuronal-glial biomarkers released into conditioned media collected from MTX-, NMDA-, and STS-treated cell cultures (Guingab-Cagmat et al., 2012).

One application of MS is in the identification of intact proteins (i.e., without enzyme digestion), referred to as the top-down approach. In the context of proteomics, top-down is an emerging technology but is more difficult to implement than the more widely used bottom-up approach. Top-down has the advantage of preserving the forms of proteins present in vivo by measuring them intact, rather than measuring peptides produced from them by proteolysis. This approach is particularly useful in the characterization of post-translational modifications, which may be challenging to analyze with enzymatic digestion. But this kind of analysis requires expensive instrumentation. Most laboratories do not have the luxury of owning, or having access to, a Fourier Transform Ion Cyclotron Resonance Mass Spectrometer (FT-ICR-MS) (Marshall et al., 1998; Shi et al., 1998) or the relatively less expensive Orbitrap mass spectrometer (Thermo’s Orbitrap Elite). The main advantage of these instruments is that they are highly sensitive and capable of ultra-high resolution. The downside, however, is that maintenance of an FT-ICR-MS is very expensive since it requires cooling a very strong magnet, on top of the cost of the instrument itself. The Orbitrap mass analyzer traps ions using an electrostatic field instead of a magnet. Its cost and maintenance are relatively lower, but it is still an expensive machine. Possibly for these reasons, top-down proteomics has yet to be accepted and widely implemented in TBI studies.

Presently, technologies that focus on identifying less abundant proteins are gaining traction. These methods are usually based on MS, and they require an additional step prior to injection into the mass spectrometer. Broadly, the steps prior to MS can be categorized into chemical modifications and direct enrichments. An example of chemical modification is affinity tagging. A popular tagging method is ICAT, or isotope coded affinity tags. It has been used to quantify and identify plasma biomarkers of TBI. These kinds of experiments can identify several candidate proteins, from tens to hundreds. Protein biomarkers in serum of pediatric patients with severe TBI were identified by ICAT-LC-MS/MS (Haqqani et al., 2007a,b). Another approach is isobaric tags for relative and absolute quantitation (iTRAQ). An example is the study by Crawford et al. (2012) on the identification of protein markers of TBI outcome. Here, a CCI mouse model was used to identify plasma biomarkers specific to mild or severe TBI at 24 h, 1 month, or 3 months post-injury. In addition, they used apolipoprotein E 3 and 4 transgenic mice, which demonstrate relatively favorable and unfavorable outcomes, respectively, following TBI, to identify proteins that are significantly modulated in response to the TBI*APOE genotype interaction and thus represent potential prognostic biomarkers. iTRAQ has also been applied to identify serum biomarkers and to demonstrate their potential for predicting elevated intracranial pressure in TBI patients (Hergenroeder et al., 2008).

Direct enrichment entails some separation prior to MS analysis. These separation strategies usually apply chromatography, SDS-PAGE, or antibodies. For example, cation-exchange chromatography, SDS-PAGE, and then LC were performed on a rat CCI model to identify putative protein biomarkers 48 h post-TBI (Kobeissy et al., 2006; Ottens et al., 2007). The results included 59 differential protein components, of which 21 decreased and 38 increased in abundance after TBI. Proteins with decreased abundance included collapsin response mediator protein 2 (CRMP-2), glyceraldehyde-3-phosphate dehydrogenase, microtubule-associated proteins MAP2A/2B, and hexokinase. Conversely, C-reactive protein, transferrin, and breakdown products of CRMP-2, synaptotagmin, and alphaII-spectrin were found to be elevated after TBI.

Bioinformatics in Identifying Proteins

Proteomics experiments to identify proteins are tedious. It is akin to breaking a huge and complicated puzzle and then putting the pieces together again. With our puzzle analogy, manual integration of the pieces (smaller peptides) is impossible to complete without the help of computers.

The need for algorithms to identify proteins married bioinformatics to proteomics. Once the developers of algorithms were on board, they needed to know some of the rules of protein science. One of these concerns sample digestion. If trypsin is used in the digestion, this enzyme is known to cleave proteins only after lysine or arginine (on the C-terminal side), as long as the next amino acid in the sequence is not proline (Fraser and Powell, 1950). This and many other rules have to be grasped by software developers.
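The cleavage rule above can be sketched in a few lines of Python. This is a minimal illustration of the simplified rule as stated (cleave after K or R unless the next residue is P); the function name `tryptic_digest`, the `missed_cleavages` parameter, and the toy sequence are assumptions made for this example and are not taken from any particular search engine.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """In silico trypsin digest: cleave after K or R unless followed by P."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder

    # Optionally join neighbouring fragments to model missed cleavages.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


# Toy sequence (hypothetical): the R at position 6 is followed by P, so it is not cleaved.
print(tryptic_digest("AAKGGRPLLKCC"))  # ['AAK', 'GGRPLLK', 'CC']
```

Real enzymes do not cleave perfectly, which is why search tools usually let the user allow one or two missed cleavages; the `missed_cleavages` argument mimics that option in a simplified way.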

Software developers must also understand how mass spectra are generated. Historically, MS of digested proteins was performed predominantly by matrix-assisted laser desorption/ionization time of flight, or MALDI-TOF. MALDI (Karas and Hillenkamp, 1988) is a kind of ionization that is regarded as “soft,” enabling large biomolecules to be ionized and carried to the mass analyzer. The ionization requires two things: the energy from the laser and the matrix. Although the mechanism of MALDI is still in question, it is believed that the matrix absorbs the energy from the laser and imparts it to the analyte, thereby ionizing the sample.

Once the experimental spectra or mass lists are produced, these are matched against a protein database. A second set of spectra, theoretical ones, is derived from the sequences in the database. Theoretical and experimental results are then compared and scored, to give confidence in the identified proteins (Maggio and Ramnarayan, 2001; Colinge and Bennett, 2007; Matthiesen, 2007; Webb-Robertson and Cannon, 2007).

The above method, performed in MALDI-TOF, is commonly referred to as peptide mass fingerprinting (PMF). PMF’s requirement is that a single spectrum should contain the peptides of the protein. The introduction of LC and ESI however removed the single spectrum requirement. With LC experiments, identification of peptides became more challenging.
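As a rough sketch of the PMF idea described above, the following Python example digests each database protein in silico, computes theoretical singly protonated peptide masses, and counts how many observed masses fall within a tolerance. The residue masses are standard monoisotopic values; the protein names, toy sequences, the 0.2 Da tolerance, and the use of a plain match count as the score are assumptions for illustration. Published PMF tools use more elaborate probabilistic scoring.

```python
# Monoisotopic residue masses in Da (I is isobaric with L).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276


def digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Theoretical mass of the singly protonated peptide, [M+H]+."""
    return sum(MONO[r] for r in peptide) + WATER + PROTON


def pmf_hits(observed_masses, protein, tol_da=0.2):
    """Count observed masses explained by the protein's theoretical peptides."""
    theoretical = [mh_plus(p) for p in digest(protein)]
    return sum(any(abs(obs - theo) <= tol_da for theo in theoretical)
               for obs in observed_masses)


def rank_proteins(observed_masses, database, tol_da=0.2):
    scores = {name: pmf_hits(observed_masses, seq, tol_da)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical two-entry database; the "observed" masses are simulated from toy_A.
database = {"toy_A": "AAKGGRPLLKCCDE", "toy_B": "GGKAAAARSSSSKTT"}
observed = [mh_plus(p) + 0.05 for p in digest(database["toy_A"])]
ranking = rank_proteins(observed, database)
print(ranking)  # toy_A ranks first with 3 matched masses
```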

In a typical LC-MS/MS analysis, one can predefine the number of the most intense peaks to be selected for dissociation. For example, in our laboratory, we subject the 10 most intense peptide signals to tandem MS (MS/MS) fragmentation (data dependent scanning). Every second, the MS analyzes the sample and produces a full MS scan of ∼20,000 intact peptides. Based on this full MS scan, the mass spectrometer, following the user’s settings, then selects and fragments up to 10 distinct peptides, producing a set of MS/MS spectra.

During the selection of peptides, however, the same peptide can be selected more than once. To avoid this problem, a dynamic exclusion strategy is usually implemented. For example, if a peptide has already been selected three times over a span of 18 s, that peptide is placed in the exclusion list for the next 25 s. The cycle of producing a full MS scan and subjecting the 10 most intense peptides to MS/MS is repeated until the chromatography is done.
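The data dependent cycle with dynamic exclusion can be sketched as a small simulation. The defaults mirror the numbers quoted above (top 10 peaks, three selections within 18 s, 25 s exclusion); the function name `run_dda`, the scan representation as `(time, [(m/z, intensity), ...])`, and the matching of precursors by m/z rounded to two decimals are simplifying assumptions, not the logic of any instrument's control software.

```python
from collections import defaultdict


def run_dda(scans, top_n=10, repeat_count=3, repeat_window=18.0, exclusion_time=25.0):
    """Simulate top-N precursor selection with dynamic exclusion.

    scans: list of (time_s, [(mz, intensity), ...]) full MS scans.
    Returns a list of (time_s, [selected mz, ...]).
    """
    history = defaultdict(list)  # precursor key -> recent selection times
    excluded_until = {}          # precursor key -> time when exclusion ends
    selected = []
    for time_s, peaks in scans:
        picks = []
        # Walk the peaks from most to least intense.
        for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
            if len(picks) == top_n:
                break
            key = round(mz, 2)  # assumption: same precursor if m/z agrees to 0.01
            if time_s < excluded_until.get(key, float("-inf")):
                continue  # still on the exclusion list
            picks.append(mz)
            recent = [t for t in history[key] if time_s - t <= repeat_window]
            recent.append(time_s)
            if len(recent) >= repeat_count:
                excluded_until[key] = time_s + exclusion_time
                recent = []
            history[key] = recent
        selected.append((time_s, picks))
    return selected


# Hypothetical run: two co-eluting precursors, one full scan per second, top_n=1.
scans = [(float(t), [(500.25, 1000.0), (600.30, 500.0)]) for t in range(30)]
picks = dict(run_dda(scans, top_n=1))
print(picks[0.0], picks[3.0], picks[6.0], picks[27.0])
```

In this toy run the more intense precursor is fragmented three times, is then excluded for 25 s, and the instrument moves on to the weaker precursor, which is the behavior dynamic exclusion is meant to produce.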

Tandem MS (MS/MS) provides an additional degree of information in identifying proteins. One can see that in a single analysis, a large number of MS/MS spectra are produced. Assigning the peptide sequences responsible for the generation of the observed fragments is challenging. Since the fragmentation process in MS/MS follows some rules, rules that software developers exploit, it is now possible to identify proteins that are subjected to tandem MS.

Collision of an inert gas with the selected peptide ions (as in collision-induced dissociation, or CID) breaks them apart into smaller fragments. This happens inside the trapping cell of the MS. The breaking of peptides follows a certain type of fragmentation pattern (most researchers follow the nomenclature introduced by Roepstorff and Fohlman, 1984). It is widely known that peptides in the gas phase can break into a set of ions (“b, y, and a” type ions) (Bencsath and Field, 1988; Polfer et al., 2005; Liu and Schey, 2008; Chen et al., 2009; Paizs and Mann, 2012).
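To make the b, y, and a ion series concrete, the sketch below computes singly charged fragment masses for a peptide from standard monoisotopic residue masses. The function name `fragment_ions` and the example peptide are assumptions for illustration; multiply charged fragments and modifications are deliberately left out.

```python
# Monoisotopic residue masses in Da (I is isobaric with L).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON, CO = 18.010565, 1.007276, 27.994915


def fragment_ions(peptide):
    """Singly charged b, y, and a ion m/z values for a peptide sequence."""
    masses = [MONO[r] for r in peptide]
    n = len(masses)
    # b_i: the first i residues plus a proton (N-terminal fragment).
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    # y_i: the last i residues plus water plus a proton (C-terminal fragment).
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    # a_i: the corresponding b ion minus carbon monoxide.
    a = [m - CO for m in b]
    return {"b": b, "y": y, "a": a}


ions = fragment_ions("AAK")  # hypothetical tripeptide
for series, values in ions.items():
    print(series, [round(v, 4) for v in values])
```

A useful sanity check is that complementary fragments add up to the precursor: for a peptide of length n, b_i + y_(n-i) equals the neutral peptide mass plus two protons.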

Even though the rules of producing specific ions are clear, problems still exist. Complicating factors include additional peaks resulting from neutral losses of water (b-H₂O, y-H₂O) or ammonia (b-NH₃), peaks from contaminating peptides or small molecules, and missing peaks. Some peaks can be shifted due to amino acid modifications. And as with any other analytical signal, the presence of noise further complicates the spectrum interpretation. These factors hinder the assignment of a peptide sequence to each spectrum.
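One way to picture these complications is to annotate an observed peak list against the theoretical b and y ions of a candidate peptide, including water-loss and ammonia-loss variants, and to flag whatever is left over. The sketch below does this for singly charged fragments only; the function names, the 0.02 Da tolerance, and the example peaks are assumptions for illustration, and ammonia loss is applied to y ions as well as b ions for simplicity.

```python
# Monoisotopic residue masses in Da (I is isobaric with L).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276
NEUTRAL_LOSSES = {"": 0.0, "-H2O": 18.010565, "-NH3": 17.026549}


def theoretical_peaks(peptide):
    """Map labels such as 'b2' or 'y1-H2O' to singly charged m/z values."""
    masses = [MONO[r] for r in peptide]
    labels = {}
    for i in range(1, len(masses)):
        b = sum(masses[:i]) + PROTON
        y = sum(masses[-i:]) + WATER + PROTON
        for tag, loss in NEUTRAL_LOSSES.items():
            labels[f"b{i}{tag}"] = b - loss
            labels[f"y{i}{tag}"] = y - loss
    return labels


def annotate(observed_mz, peptide, tol=0.02):
    """Label each observed peak, or mark it 'unassigned' (noise, contaminant, ...)."""
    theo = theoretical_peaks(peptide)
    result = {}
    for mz in observed_mz:
        hits = [name for name, t in theo.items() if abs(mz - t) <= tol]
        result[mz] = hits or ["unassigned"]
    return result


# Hypothetical peak list for the tripeptide AAK: y1, y1 minus water, and a stray peak.
annotation = annotate([147.1128, 129.1022, 300.0], "AAK")
print(annotation)
```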

Software for Peptide and Protein Identification

The process of protein identification benefited from the maturation of two technologies: computer hardware and database software. Protein database searching has become a powerful approach to address the challenge of protein identification. Currently, numerous bioinformatics software packages for computational peptide identification from MS/MS data are available on the market (Xu and Ma, 2006).

The first computer program to use a database search was Sequest. Acquired by Thermo Scientific and commercially available through Proteome Discoverer (Thermo, San Jose, CA, USA. www.thermo.com), the development of this software can be traced back to Yates et al. (1995) in the early 1990s at the University of Washington. The scoring function in this package is heuristic in nature, and it was considered to be the first really useful bioinformatics technique in the field of proteomics. The software integrates correlational analysis between data dependent mass spectral scans and a FASTA protein database. Sequest searches and identifies peptides and the corresponding modifications that the user specifically queries. Using these peptide identifications, one can make inferences about the proteins in the sample.

In Sequest, the first step is the extraction of tandem mass spectra from the raw file. Theoretical candidate sequences from the digested proteins in the database are listed. Within a tolerance defined by the end user, the algorithm determines which candidates match the molecular weight of the experimental peptide. The candidate’s b and y ions are compared to the experimental spectrum, and the comparison is scored as the preliminary score (Sp). The candidate sequences are sorted by Sp in descending order. Sequest uses two scoring functions, so after this initial ranking, the top-ranked peptides are taken from the list. A second function rescores these hits by computing a cross correlation, taking into account the height and mass position of the peaks. The candidates are then re-sorted in descending order. After taking into account possible random matches, the scores in this final list are the Sequest cross-correlation scores (Xcorr). These top hits are reported back and stored in the search files (.msf). In addition to Xcorr, users can export several other parameters such as Sp or DeltaCn. DeltaCn measures how good the Xcorr is relative to the next best match. Overall, Xcorr is a robust measure of how accurate the match was between theoretical and experimental peaks.
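The overall flow (filter candidates by precursor mass, score each against the spectrum with a cross correlation, rank, and compare the top two hits) can be illustrated with a heavily simplified sketch. This is not Sequest's implementation: the unit-width binning, the binary peak intensities, the omission of the Sp pre-filter, and all function names are assumptions made for this example. The score is computed as the zero-offset correlation minus the mean correlation over offsets of up to 75 bins, and DeltaCn as the normalized gap between the best and second-best scores.

```python
# Monoisotopic residue masses in Da (I is isobaric with L).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276


def neutral_mass(peptide):
    return sum(MONO[r] for r in peptide) + WATER


def by_ions(peptide):
    m = [MONO[r] for r in peptide]
    b = [sum(m[:i]) + PROTON for i in range(1, len(m))]
    y = [sum(m[-i:]) + WATER + PROTON for i in range(1, len(m))]
    return b + y


def binned(mzs, size=2000):
    """Binary spectrum vector with 1 Da bins (a deliberate simplification)."""
    vec = [0.0] * size
    for mz in mzs:
        if 0 <= int(mz) < size:
            vec[int(mz)] = 1.0
    return vec


def xcorr_like(observed, theoretical, max_shift=75):
    """Correlation at zero offset minus the mean correlation at other offsets."""
    def dot(shift):
        return sum(o * theoretical[i + shift]
                   for i, o in enumerate(observed)
                   if o and 0 <= i + shift < len(theoretical))
    background = sum(dot(s) for s in range(-max_shift, max_shift + 1) if s) / (2 * max_shift)
    return dot(0) - background


def search(observed_mzs, precursor_mass, candidates, tol_da=0.5):
    obs = binned(observed_mzs)
    scored = [(pep, xcorr_like(obs, binned(by_ions(pep))))
              for pep in candidates
              if abs(neutral_mass(pep) - precursor_mass) <= tol_da]
    scored.sort(key=lambda hit: hit[1], reverse=True)
    delta_cn = None
    if len(scored) > 1 and scored[0][1] > 0:
        delta_cn = (scored[0][1] - scored[1][1]) / scored[0][1]
    return scored, delta_cn


# Hypothetical search: the "experimental" spectrum is simulated from the peptide AAK.
hits, delta_cn = search(by_ions("AAK"), neutral_mass("AAK"), ["AKA", "GGK", "KAA", "AAK"])
print(hits[0][0], round(delta_cn, 2))
```

In this toy case GGK is rejected by the precursor mass filter, the sequence isomers AKA and KAA pass the filter but match fewer fragment bins, and the gap between the first and second hit is what a DeltaCn-style value summarizes.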

Several algorithms came after Sequest. MASCOT (Perkins et al., 1999) and X! Tandem also became popular search engines. Owned by Matrix Science, MASCOT is commercially available, although its scoring algorithm has never been patented or published. MASCOT scoring is probability based: the quality of a match is reported as an ion score with an associated P-value. X! Tandem, on the other hand, is an open source tool; its hyperscore and E-value are two of the parameters it calculates. These search engines approach the problem differently and use different algorithms.

In some instances, however, a protein database is not yet available. This can happen when analyzing an organism whose genome sequence is incomplete or unavailable. Similarly, if one is interested in identifying novel isoforms of a protein, a suitable database often does not exist. A popular approach to this problem is de novo sequencing (Shevchenko et al., 2001). Spectrum identification in de novo analysis uses a database of candidate peptides consisting of all possible linear amino acid sequences (Xu and Ma, 2006). This method can also be used to search for peptide homologs and modifications. In the early days of de novo algorithm development, researchers attempted to reconstruct peptide sequences by enumerating all amino acid combinations, but this exhaustive approach proved impractical. Software for de novo sequencing has since reached the market. De novo algorithms usually filter the experimental mass list to remove noisy peaks. PEAKS (Ma et al., 2003) and PepNovo (Frank and Pevzner, 2005) are examples of software that facilitate fast de novo peptide identification. A hybrid between de novo sequencing and protein database searching is known as the tag-based approach. Sequence tagging uses de novo analysis to identify subpeptides, or sequence tags, hypothesized to occur in the sequence. In this kind of experiment, information is usually extracted from a database that contains the tags (Mann and Wilm, 1994).
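
The idea behind reading a sequence tag can be illustrated with a toy example: the mass gap between neighboring fragment peaks of the same ion series corresponds to one amino acid residue. The sketch below is not the algorithm used by PEAKS or PepNovo. It assumes a noise-filtered, singly charged ladder from a single ion series, uses only a small subset of residue masses, and the function name `read_tag` is hypothetical.

```python
# Subset of standard monoisotopic residue masses (daltons), for illustration only.
# "L" also stands for isoleucine, which has the same mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "L": 113.08406, "K": 128.09496, "E": 129.04259,
}


def read_tag(ladder_mz, tolerance_da=0.01):
    """Translate gaps between consecutive ladder peaks into residues.

    A gap that matches no residue, or more than one, is reported as '?'.
    """
    peaks = sorted(ladder_mz)
    tag = []
    for lighter, heavier in zip(peaks, peaks[1:]):
        gap = heavier - lighter
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(mass - gap) <= tolerance_da]
        tag.append(matches[0] if len(matches) == 1 else "?")
    return "".join(tag)


# Hypothetical ladder whose gaps correspond to A, L (or I), and E.
tag = read_tag([200.0, 271.03711, 384.12117, 513.16376])
```

In a tag-based search, a short tag such as this one would then be used to narrow the database down to peptides that contain it.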

Because de novo sequencing results closely resemble the output of a search against a known protein database, de novo sequencing is often used to validate the accuracy of database-derived protein identifications (Shadforth et al., 2005). Validating the accuracy of results is an issue that both end users and software developers must tackle, and reviewers for leading proteomics journals have pushed for it to be addressed. This issue is discussed next.

False Discovery Rate

The false discovery rate (FDR) estimates the proportion of false positives among the identified proteins. It provides a statistically meaningful estimate of the uncertainty in protein identification and is a useful validation, for example, in large data sets of brain proteins. Most proteomics journals require the FDR to be reported. The FDR is usually measured with a decoy database. Decoy databases for FDR calculations were pioneered by Gygi and co-workers; the decoys consist of a randomized or scrambled sequence database (Elias and Gygi, 2010). The parameters used in the regular search are also applied to the decoy database search. Matches in the decoy database are not expected to be significant, so the number of matches found in a decoy search is a good estimate of the number of false matches, and hence of the FDR, in the regular forward sequence database search.

There are two ways to implement a decoy database search, and users generally prefer one over the other. The preferred method is the concatenated approach, in which the decoy and non-decoy databases are joined into a single database and searched together.

The other method is a more conservative approach. The MS/MS data are searched separately against the non-decoy and decoy databases, and the number of matches for each database is counted.
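
Both decoy strategies reduce to counting matches above a score threshold. The sketch below assumes one common convention, in which the FDR is estimated as the number of accepted decoy matches divided by the number of accepted target matches. Other conventions exist, for example doubling the decoy count and dividing by all accepted matches in a concatenated search. The function names and scores are hypothetical.

```python
def separate_search_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR from two independent searches (target and decoy)."""
    targets = sum(score >= threshold for score in target_scores)
    decoys = sum(score >= threshold for score in decoy_scores)
    return decoys / targets if targets else 0.0


def concatenated_search_fdr(matches, threshold):
    """Estimate FDR from one search of the combined target + decoy database.

    `matches` is a list of (score, is_decoy) pairs, one per spectrum.
    """
    accepted = [is_decoy for score, is_decoy in matches if score >= threshold]
    decoys = sum(accepted)
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0


# Hypothetical scores: three target matches and one decoy match pass the threshold.
fdr_separate = separate_search_fdr([5.0, 4.0, 3.0, 2.0], [3.5, 1.0], threshold=3.0)
fdr_concatenated = concatenated_search_fdr(
    [(5.0, False), (4.0, False), (3.5, True), (3.0, False), (2.0, False), (1.0, True)],
    threshold=3.0,
)
```

In practice the threshold is usually adjusted until the estimated FDR falls below a chosen level, and only the matches above that threshold are reported.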

Systems Biology

A database search and the subsequent identification of proteins usually generate a huge library of information. The next step is to determine the functions of the identified proteins and the connections among them. Rather than focusing on individual molecular components, systems biology seeks to understand the dynamics that govern protein networks, the functional sets of proteins that regulate cellular decisions related to TBI. From the perspectives of drug discovery and diagnostics, systems biology gives important and practical clues about the pathways relevant to brain injury and the effects that drugs might have on them. It therefore enhances the entire process of biomarker and therapeutic drug discovery, development, and commercialization. Recently, protein biomarkers of TBI induced in a penetrating ballistic-like brain injury (PBBI) model were identified by proteomics followed by systems biology analysis (Boutté et al., 2012). These proteins are ubiquitin carboxyl-terminal hydrolase isozyme L1, tyrosine hydroxylase, and syntaxin-6. Semi-quantitative western blotting analysis showed that these proteins were elevated at 72 h post-injury compared with controls. It should not be a surprise that ubiquitin carboxy-terminal hydrolase L1 protein (UCHL1) is already being evaluated in clinical trials as a biomarker.

These connections, or networks of connections, are depicted using nodes and links. A node can be a biomolecule such as a protein or DNA. The links between nodes represent biochemical interactions, or they can highlight relationships between nodes, such as the strength of predicted binding or of physical interactions. Theories from systems science and statistical mechanics, in conjunction with graph theory, can be applied to glean insights about the network. Mapping the connections among these proteins is the driving force of pathway-based biomarker discovery and diagnosis. In TBI in particular, proteins that are upregulated after the injury are sought and identified as possible diagnostic biomarkers. Numerous scientific publications containing networks of cellular pathways are scattered throughout the archives, and the available data are growing fast. Historically, the repositories of large scale sequencing projects mostly held nucleic acid and amino acid sequences, but they have since expanded to cover other kinds of information, such as protein interactions. Lately, the number of databases that store protein data has been steadily increasing. For example, the Database of Interacting Proteins can be queried for known protein-protein interactions, or PPI (Xenarios et al., 2001).
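
The node-and-link representation described above maps directly onto a simple data structure. The sketch below is illustrative only: the protein labels and link weights are hypothetical placeholders, not measured interactions, and ranking nodes by their number of links (degree) is just one of many graph-theoretical measures that could be applied.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected network from (node_a, node_b, weight) links.

    The weight could stand for, e.g., the strength of a predicted interaction.
    """
    network = defaultdict(dict)
    for node_a, node_b, weight in interactions:
        network[node_a][node_b] = weight
        network[node_b][node_a] = weight
    return network


def degree_ranking(network):
    """Rank nodes by number of links, most connected first (ties by name)."""
    return sorted(network, key=lambda node: (-len(network[node]), node))


# Hypothetical links between four placeholder proteins A to D.
links = [("A", "B", 0.9), ("A", "C", 0.4), ("A", "D", 0.7), ("B", "C", 0.5)]
network = build_network(links)
hubs = degree_ranking(network)
```

Highly connected nodes are natural starting points when looking for proteins that sit at the center of an injury-related pathway.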

The nuts and bolts of the bioinformatics software that systems biology has integrated are geared toward people with a strong background in computer science and statistics. Since we are end users of this technology, we focus on software that we are familiar with and have been using. Readers are directed to other sources for in-depth reviews of systems biology. Three commercially available pathway analysis software packages are Pathway Studio (Ariadne Genomics, Rockville, MD, USA), Metacore (Thomson Reuters, New York City, NY, USA), and Ingenuity (Ingenuity Systems, Redwood City, CA, USA). These tools enable the identification of relationships among proteins, small molecules, cell processes, and diseases. Pathway analysis provides information on what is known to interact with the proteins identified in the sample, as well as on the association of these proteins with cellular processes.

Biomarker Applications

Clinically validated biomarkers are needed for the accurate diagnosis of mild TBI. This type of TBI is particularly hard to measure accurately, and the situation is made more challenging by patients who sometimes hide their symptoms. There is no gold standard yet for diagnosing mild TBI (Shenton et al., 2012), not even conventional assessment through neuroimaging techniques (Niogi and Mukherjee, 2010). The lack of a consensus definition of mild TBI further complicates the matter (Ruff and Jurica, 1999; Arciniegas and Silver, 2001), and accurate diagnosis remains a challenge in post-injury management. The Veteran’s Administration Clinical Practice Guideline released a working document on criteria to diagnose mild TBI. These diagnostic criteria include an initial Glasgow Coma Scale (GCS) score of 13–15; loss of consciousness lasting less than 30 min; post-traumatic amnesia lasting up to 24 h after the injury; and alteration of consciousness (Management of Concussion/mTBI Working Group, 2009). Other factors may complicate the application of this guideline. In addition to patients trying to hide the true extent of the injury, proper diagnosis is confounded by alcohol ingestion, polytrauma, sedatives, pain killers, and drugs of abuse.

A biomarker that is measurable in the blood would be useful in situations such as polytrauma. It has been suggested that, instead of a single biomarker, a panel of biomarkers could be helpful. Mondello et al. (2012a) have explored the ratio of GFAP and UCHL1 concentrations to assess patients with severe TBI.

Another type of injury for which blood biomarkers are needed is diffuse axonal injury (Inglese et al., 2005). The microstructural axonal damage in this kind of injury is believed to be difficult to detect by neuroimaging techniques such as computed tomography and conventional MRI.

Drug discovery is one of the fields that will greatly benefit from a signature marker for TBI. New therapeutic development traditionally has an extremely high attrition rate: more than 90% of drugs that advance to Phase I clinical trials fail. Some argue that such extreme loss can be overcome by guiding all new therapeutic development and clinical trials with a disease-relevant diagnostic test. Discovery of translational biomarkers (from animal studies to clinical trials) might help to finally deliver the long sought after clinical trial success. “Theranostics represents the convergence between Therapeutics and diagnostics” (Bissonnette and Bergeron, 2006; Hooper, 2006). It has been viewed as the parallel use of a new therapy and diagnostic tests for a human disease or disorder so as to facilitate drug development and clinical trials and to achieve optimal clinical outcomes in a population of patients. Importantly, in recognizing the emerging role of the theranostic approach, the FDA has recently drafted a Drug-Diagnostic Co-Development Concept Paper (Hinman et al., 2006) with the goal of setting guidelines for prospective co-development of a drug or biological therapy (drugs) and a device test in a scientifically robust and efficient way.

One example of a theranostic approach to drug development is a novel biomarker-guided approach in our laboratory that combines calpain-generated acute brain injury-tracking biomarkers with potent and selective calpain inhibitor drug candidates to fast-track and improve the chances of successful drug development for CNS injury. During brain injury, neural proteins or their breakdown products generated by calpains (μ-calpain and m-calpain) are released into the extracellular environment and eventually reach the CSF in relatively high concentrations (Wang et al., 2005). In due time the proteins reach the blood stream, either via the compromised blood brain barrier (BBB) or via filtration of the CSF. Clearance and half-life of the biomarkers contribute to the final concentration that can be measured in the blood. The CSF volume of an adult human (125–150 mL) is about 30- to 40-fold less than the blood volume (4.5–5 L), which explains why the brain biomarker concentration is significantly higher in CSF samples than in blood samples and makes the former valuable for drug development. Enabled by recent technological advances in proteomics, novel brain injury biomarkers that have elevated levels in biofluids such as CSF or blood after TBI have been discovered.
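
The 30- to 40-fold figure follows directly from the volumes quoted above, as the short calculation below shows. It is only a back-of-the-envelope check: it treats each compartment as a simple volume and ignores clearance, half-life, and the efficiency of transfer across the BBB. The function name is hypothetical.

```python
def fold_difference(blood_volume_ml, csf_volume_ml):
    """How many times larger the blood volume is than the CSF volume."""
    return blood_volume_ml / csf_volume_ml


# Volumes quoted in the text: blood 4.5-5 L, CSF 125-150 mL.
lower_bound = fold_difference(4500, 150)  # smallest blood volume, largest CSF volume
upper_bound = fold_difference(5000, 125)  # largest blood volume, smallest CSF volume
```

Under these simplifying assumptions, the same amount of a biomarker would be roughly 30 to 40 times more dilute in blood than in CSF.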

Possible Biomarkers for Traumatic Brain Injury

Despite the efforts in brain injury research to discover and develop disease-tracking markers, there are currently no clinically validated biomarkers to diagnose TBI. The search continues, however, and several candidate TBI biomarkers are in the clinical validation pipeline. Extensive studies are being pursued to move these protein biomarkers to clinical validation. The proteomic techniques described above have been employed in the discovery of candidate biomarkers of TBI. Kobeissy et al. identified 59 differentially expressed proteins 48 h post TBI using a controlled cortical impact (CCI) rat model. Proteins that were decreased in abundance included CRMP-2, glyceraldehyde-3-phosphate dehydrogenase, microtubule-associated proteins MAP2A/2B, and hexokinase (Kobeissy et al., 2006). Upregulated proteins included C-reactive protein, transferrin, and breakdown products of CRMP-2, synaptotagmin, and alphaII-spectrin. Western blotting analysis confirmed the differential changes in these proteins. This study provided insight into the mechanism of TBI and generated candidate biomarkers that can aid in the evaluation of the severity and progression of injury as well as in the development of possible therapies. A review by Kobeissy et al. (2008) highlighted the need to strengthen the role of systems biology and its application to neuroproteomics, given its integral role in establishing a comprehensive understanding of specific brain disorders and of brain function in general. The use of a systems biology-based approach to drug discovery and development for TBI, based on advances in genomics, proteomics, bioinformatic tools, and systems biology software, has been shown (Zhang et al., 2010). Recently, Boutté et al. (2012) conducted a proteomic analysis and brain-specific systems biology analysis in a rodent model of PBBI. In their study, a combination of two-dimensional gel electrophoresis and MS was used to screen for biomarkers in a rat model of PBBI.
Brain-specific systems biology analysis of brain tissue identified 321 upregulated and 65 downregulated proteins 24 h post PBBI compared to sham controls. In their gene ontology analysis, the majority of upregulated proteins were cytoskeletal (10.5%), nucleic acid binding (9.3%), or kinases (8.9%). Most proteins were involved in protein metabolism (22.7%), signal transduction (20.4%), and development (9.6%). Pathway analysis indicated that these proteins were involved in neurite outgrowth and cell differentiation. Further confirmation of these proteins was conducted using semi-quantitative Western blotting. Among the proteins that showed a consistent increase in brain tissue and CSF at several time points post PBBI were UCHL1, tyrosine hydroxylase, and syntaxin-6. Antibody-based platforms, namely antibody microarrays (AbMA) and reverse capture protein microarrays (RCPM), which complement the classical methods based on 2D gel electrophoresis and mass spectrometry (2DGE/MS), have been proposed for the discovery of potential biomarkers for blast neurotrauma (Agoston et al., 2009). Kwon et al. (2011) combined behavioral, proteomic, and histological studies to investigate stress and blast-induced TBI. In this study, exposure to repeated stress alone produced a transient increase in anxiety but no significant memory impairment or cellular and molecular changes. In contrast, repeated stress combined with blast resulted in lasting behavioral, molecular, and cellular abnormalities characterized by memory impairment, neuronal and glial cell loss, inflammation, and gliosis.

Listed below are examples of the most studied candidate protein biomarkers for TBI. These represent potential biomarkers of TBI that have shown high sensitivity and specificity in independent studies. UCHL1, SBDPs, and neuron-specific enolase (NSE) are presented as examples of neuronal and axonal protein biomarkers. For glial-specific markers, GFAP and S100beta are discussed below.

Ubiquitin Carboxy-Terminal Hydrolase L1 Protein

Ubiquitin carboxy-terminal hydrolase L1 protein is a cysteine protease that is predominantly expressed in neurons, although it is also expressed in small amounts in neuroendocrine cells. This enzyme is relatively small, around 25 kDa, and comprises ∼2% of the total soluble protein in the brain. It is also known as neuronal-specific protein gene product 9.5. A known function of UCHL1 is to hydrolyze the C-terminal bond of ubiquitin or unfolded polypeptides (Setsuie and Wada, 2007).

Several publications have indicated that UCHL1 can be a biomarker for TBI. Recently, the biokinetic parameters of UCHL1 were measured in a cohort of severely injured TBI patients (Brophy et al., 2011). A more recent study (Mondello et al., 2012b) demonstrated that UCHL1 can be used as a biomarker for severely injured TBI patients. Compared with controls, the serum UCHL1 levels of TBI patients were significantly elevated in measurements taken after the acute phase and then over a week.

Spectrin Breakdown Products

AlphaII-spectrin is primarily found in neurons and is concentrated in axons and presynaptic terminals (Riederer et al., 1986). Upon activation in TBI, calpain cleaves the protein into breakdown products (SBDPs) with molecular weights of 150 kDa (SBDP150) and 145 kDa (SBDP145), and caspase-3 cleaves it into a 120-kDa product (SBDP120). Calpain and caspase-3 are major executioners of necrotic and apoptotic cell death, respectively, during ischemia or TBI (Ringger et al., 2004; Pineda et al., 2007; Mondello et al., 2010). SBDPs concurrently indicate calpain and caspase-3 proteolysis of alphaII-spectrin, providing crucial information on the underlying cell death mechanisms. In CSF, distinct temporal release patterns of SBDP145 and SBDP120 were observed, reflecting different temporal characteristics of protease activation (Mondello et al., 2010). Elevated levels of SBDPs in CSF from adults with severe TBI have been reported, along with significant relationships with severity of injury and outcome (Pineda et al., 2007). Increased CSF SBDP levels were found to be significantly associated with mortality in patients with severe TBI. The temporal profile of SBDPs in non-survivors was also found to be different from that of survivors (Mondello et al., 2010). Taken together, these findings suggest that SBDPs may provide crucial information not only on severity of brain injury, but also on underlying pathophysiological mechanisms associated with necrotic and apoptotic cell death.

Neuron-Specific Enolase

Neuron-specific enolase is a glycolytic pathway enzyme that is highly expressed in neuronal cytoplasm. NSE has been shown to have the sensitivity and specificity to detect neuronal cell death (Selakovic et al., 2005). In addition, studies have examined CSF and serum NSE levels in adults with severe TBI and their relationship with severity of injury and clinical outcome. Increased CSF and serum levels of NSE have been reported after TBI. NSE concentrations were also associated with severity of injury, CT scan findings, and outcome (Ross et al., 1996; Herrmann et al., 2000; Selakovic et al., 2005).

Glial Fibrillary Acidic Protein

Of the numerous candidate biomarkers for TBI, glial fibrillary acidic protein (GFAP) holds the most promise. One of the strengths of GFAP as a biomarker for TBI is that this protein is not found outside the central nervous system (Galea et al., 1995). First reported in 1971 (Eng et al., 1971), GFAP is found only in the astroglial cytoskeleton. GFAP is an intermediate filament protein that forms networks that support the astroglial cells. Damage to the astroglial cells is followed by upregulation of GFAP (astrogliosis): during injury, astroglial cells react by producing more GFAP. Evidence shows that serum GFAP is elevated in several types of brain damage, including TBI (Pelinka et al., 2004a,b; Nylén et al., 2006).

What makes GFAP specific to brain trauma is that, even if the body is subjected to multiple forms of trauma, GFAP levels do not rise without brain injury (Pelinka et al., 2004b; Vos et al., 2004). Thus, GFAP as a biomarker is a specific indicator of injury to the glia. There is also a high likelihood that GFAP can predict death or unfavorable outcomes (Vos et al., 2010; Zurek and Fedora, 2012). According to the proceedings of the military mild TBI diagnostic workshop (2010), validation studies in humans are already on-going (Marion et al., 2011).

S100β

S100β is mainly found in astroglia and Schwann cells (Donato, 1986; Donato et al., 1986a,b), and is one of the most well-known biomarkers of brain damage. The concentration of S100β is known to increase in the CSF and serum after injury, making this protein a potential biomarker for TBI (Townend et al., 2006). This protein is not influenced by hemolysis and has a biological half-life of 2 h. S100β belongs to a family of low molecular weight (9–13 kDa) calcium-binding S100 proteins and is involved in signal transduction (Heizmann et al., 2002). Several studies have examined the value of this marker, demonstrating correlation with injury and outcome (Pelinka et al., 2003a; Berger et al., 2005; Kleindienst et al., 2007; Egea-Guerrero et al., 2012). However, several limitations have been found. First, S100β is not specific to the brain; it is also found in non-nervous cells such as adipocytes, epidermal cells, chondrocytes, melanoma cells, and Langerhans cells (Zimmer et al., 1995). The presence of this protein outside the central nervous system is compounded by the problem that general trauma without brain injury can increase its levels (Rothoerl and Woertgen, 2001). Second, S100β rises after hemorrhagic shock, with its concentration correlating with shock severity (Pelinka et al., 2003a,b,c). Because of this, it seems that S100β cannot be used as a single biomarker for TBI. One study looked at the ratio of GFAP to S100β (Pelinka et al., 2004a), instead of at S100β alone, and used this ratio to determine brain damage and prognosis. In another study, S100β seemed to be a useful indicator of patients with intracranial lesions (Egea-Guerrero et al., 2012).

Another limitation in using S100β as a biomarker for TBI is its relatively short serum half-life (Jackson et al., 2000). The obvious countermeasure to this problem is to measure the protein right after injury; however, most mild TBI victims are not evaluated as soon as the injury occurs.
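
The practical consequence of a short half-life can be illustrated with a simple decay calculation. The sketch below assumes first-order (exponential) elimination with the 2 h half-life quoted above and no further release of the protein after the peak. Both are simplifying assumptions, and the function name is hypothetical.

```python
def fraction_remaining(hours_since_peak, half_life_hours=2.0):
    """Fraction of the peak concentration left after a delay, assuming
    simple exponential decay with a fixed half-life."""
    return 0.5 ** (hours_since_peak / half_life_hours)


# Delay of 6 h between peak concentration and blood sampling.
after_six_hours = fraction_remaining(6.0)
```

Under these assumptions, a patient sampled 6 h after the peak would retain only one eighth (12.5%) of the peak serum concentration, which illustrates why delayed evaluation limits the usefulness of the marker.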

Conclusion

Proteomics, with the advancement in MS along with the bioinformatics software, has opened opportunities to interrogate protein dynamics and provide insights into the biochemistry of TBI. Over the past years, proteomics has led to the discovery of many candidate biomarkers and is becoming the method-of-choice for preliminary candidate marker selection. However, identification of candidate biomarkers using this approach is proving to be only the initial step in the development of a useful biomarker. Systems biology coupled to data mining strategies has been applied to harness these large data sets into organized and interlinked databases that can be queried to identify non-redundant brain injury pathways. The pathways can be exploited to determine the utilities of these proteins as diagnostic biomarkers and/or therapeutic targets.

This review provides an overview of the integration of proteomics, bioinformatics, and systems biology in TBI biomarker discovery. At present, proteomic biomarker discovery experiments have generated a long list of TBI biomarker candidates. Clearly, the next step is translating a robust biomarker or panel of biomarkers to clinical use. Currently, sensitive and specific immunoassays are being developed to validate a number of TBI biomarkers in clinical samples. However, the high cost of assay development and the limited availability of antibodies create a bottleneck in the clinical validation pipeline for the long list of discovered potential biomarkers. Targeted proteomics is a growing trend in the proteomics community. Multiple reaction monitoring (MRM), a mass spectrometry-based measurement, is a promising technique that could revolutionize biomarker validation. Current technologies are still evolving to address fundamental problems in identifying low-abundance protein biomarkers, as in the case of mild TBI. The trend toward lower costs, highly sensitive instruments (Orbitrap), and better electronic hardware will most likely increase the number of targeted proteomics experiments in the future.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Agoston, D. V., Gyorgy, A., Eidelman, O., and Pollard, H. B. (2009). Proteomic biomarkers for blast neurotrauma: targeting cerebral edema, inflammation, and neuronal death cascades. J. Neurotrauma 26, 901–911. doi:10.1089/neu.2008.0724

Alexander, M. P. (1995). Mild traumatic brain injury: pathophysiology, natural history, and clinical management. Neurology 45, 1253–1260. doi:10.1212/WNL.45.7.1253