Characterization and Relative Quantitation of Wheat, Rye, and Barley Gluten Protein Types by Liquid Chromatography–Tandem Mass Spectrometry

The consumption of wheat, rye, and barley may cause adverse reactions to wheat such as celiac disease, non-celiac gluten/wheat sensitivity, or wheat allergy. The storage proteins (gluten) are known as major triggers, but other functional protein groups such as α-amylase/trypsin-inhibitors or enzymes may also be harmful for people suffering from adverse reactions to wheat. Gluten is widely used as a collective term for the complex protein mixture of wheat, rye or barley and can be subdivided into the following gluten protein types (GPTs): α-gliadins, γ-gliadins, ω5-gliadins, ω1,2-gliadins, high- and low-molecular-weight glutenin subunits of wheat, ω-secalins, high-molecular-weight secalins, γ-75k-secalins and γ-40k-secalins of rye, and C-hordeins, γ-hordeins, B-hordeins, and D-hordeins of barley. GPTs isolated from the flours are useful as reference materials for clinical studies, diagnostics or food analyses and to elucidate disease mechanisms. A combined strategy of protein separation according to solubility followed by preparative reversed-phase high-performance liquid chromatography was employed to purify the GPTs according to hydrophobicity. Due to the heterogeneity of gluten proteins and their partly polymeric nature, it is a challenge to obtain highly purified GPTs with only one protein group. Therefore, it is essential to characterize and identify the proteins and their proportions in each GPT. In this study, the complexity of gluten from wheat, rye, and barley was demonstrated by identification of the individual proteins employing an undirected proteomics strategy involving liquid chromatography–tandem mass spectrometry of tryptic and chymotryptic hydrolysates of the GPTs. Different protein groups were obtained and the relative composition of the GPTs was revealed. Multiple reaction monitoring liquid chromatography–tandem mass spectrometry was used for the relative quantitation of the most abundant gluten proteins.
These analyses also allowed the identification of known wheat allergens and celiac disease-active peptides. Combined with functional assays, these findings may shed light on the mechanisms of gluten/wheat-related disorders and may be useful to characterize reference materials for analytical or diagnostic assays more precisely.

These characteristic features of the GPTs are known to contribute to the CD-immunoreactivity of wheat, rye, and barley, because most CD-active peptides are derived from these repetitive units. For example, the T-cell epitopes QGYYPTSPQ (DQ8.5-glut-H1), QQPQQPFPQ (DQ2.5-glia-γ4c), or QQPQQPFPQ (DQ8-glia-γ1a) contain typical repetitive units. Besides CD, a wide range of wheat, rye, and barley proteins are potential allergens or triggers of innate immunity in NCGS. The recently published reference sequence RefSeq v1.0 of the hexaploid common wheat genome (International Wheat Genome Sequencing Consortium (IWGSC), 2018) provides further insights as the first reference to which known immunoreactive gluten and non-gluten proteins can be annotated (Juhasz et al., 2018).
Numerous studies have demonstrated the complexity of gluten as a mixture of closely related, but distinct proteins (Arentz-Hansen et al., 2000; Dupont et al., 2011; Colgrave et al., 2013; Schalk et al., 2017). Their similarity poses major difficulties in clearly separating gluten into well-defined gluten protein fractions, GPTs and especially individual gluten proteins (Mamone et al., 2009; Ellis et al., 2011; Lagrain et al., 2013). One strategy is to combine separation according to solubility (Osborne fractionation) with subsequent fractionation according to polarity by preparative RP-HPLC. However, the ultraviolet signal at a specific retention time during preparative RP-HPLC does not provide any further information on the identity of the proteins being collected. Considering the highly variable immunoreactivities of wheat, rye and barley proteins, it is essential to know the exact composition of the GPT isolates, especially when trying to gain further insights into pathogenic cascades of CD, NCGS, and wheat allergies (Vader et al., 2002; Matsuo et al., 2005; Scherf et al., 2019). For example, wheat ATIs were only identified as triggers of innate immunity via the toll-like receptor 4 in NCGS, because they were co-purified within the ω-gliadin fraction (Junker et al., 2012). Therefore, it is crucial to identify the individual proteins within each GPT isolate and undertake relative quantitation of the highly abundant proteins by liquid chromatography–tandem mass spectrometry (LC-MS/MS).
Frontiers in Plant Science | www.frontiersin.org December 2019 | Volume 10 | Article 1530
In the current fundamental study, LC-MS/MS analysis was applied to all isolated GPTs of wheat, rye, and barley to precisely determine the identities of the proteins in each isolate as well as their relative abundances to provide a detailed assessment of the molecular composition. A special focus was placed on the identification of known CD-immunoreactive and allergenic peptides and proteins.

Grain Samples
Grains of wheat [cultivar (cv.) Akteur, harvest year 2011, I.G. Pflanzenzucht, Munich, Germany], rye (cv. Visello, harvest year 2013, KWS Lochow, Bergen, Germany), and barley (cv. Marthe, harvest year 2009, Nordsaat Saatzucht, Langenstein, Germany) grown in Germany were milled into white flour using a Quadrumat Junior mill (Brabender, Duisburg, Germany). Subsequently, the flours were sieved to a particle size of 200 µm and allowed to rest for 2 weeks. The choice of these cultivars was based on production shares in Germany for conventional farming to ensure that these cultivars were of economic relevance and, therefore, deemed to be representative for each grain.

Analysis of Moisture and Crude Protein Contents
The determination of moisture and crude protein (CP) contents (conversion factor N × 5.7) was carried out according to International Association for Cereal Science and Technology Standards 110/1 and 167.
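The conversion factor above amounts to a single multiplication. A minimal sketch in Python follows; the function name and the example nitrogen value are illustrative assumptions and are not taken from this study.

```python
def crude_protein_percent(nitrogen_percent: float, factor: float = 5.7) -> float:
    """Convert a nitrogen content (% of sample mass) into crude protein (%)."""
    return nitrogen_percent * factor


# Hypothetical nitrogen content of 2.0%, chosen only for illustration.
example_cp = crude_protein_percent(2.0)
print(f"Crude protein: {example_cp:.1f}%")
```

The factor is exposed as a parameter because other commodities use other conversion factors; 5.7 is the value stated in the text.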
For preparative RP-HPLC, the wheat, rye, and barley prolamin fractions (200 mg) were dissolved in 10 ml ethanol/water and the glutelin fractions (1,000 mg) in 10 ml of the glutelin extraction solution. The solutions were filtered (0.45 μm) and separated on a Jasco HPLC (Jasco, Gross-Umstadt, Germany). The fractions were collected according to their retention times over several runs, pooled and lyophilized as described previously. The isolated GPTs were again stored at -20°C until use. Long-term experience with storage of the Prolamin Working Group-gliadin reference material (Van Eckert et al., 2006) in our laboratory since its isolation in the early 2000s indicates that protein isolates are stable for several years or even decades when kept frozen at -20°C or, ideally, at -80°C.

Enzymatic Cleavage of GPTs
The GPT hydrolysates were prepared as reported in Colgrave et al. (2016a; 2016b). Briefly, each GPT (n = 3) was dissolved in 50 mmol/l Ambic buffer at a concentration of 2 mg/ml and applied to a 10 kDa molecular weight cutoff filter (Millipore, Australia). The GPT solutions were washed with washing solution (2 × 100 µl; 8 mol/l urea; 100 mmol/l Tris-HCl; pH 8.5) and the filters were centrifuged. For reduction, DTT solution (10 mmol/l) was added; the filters were incubated for 40 min at room temperature and then centrifuged. For cysteine alkylation, 100 µl of IAM solution (25 mmol/l; in 8 mol/l urea; 100 mmol/l Tris-HCl) was added and the solution was incubated at room temperature in the dark for 20 min. The filters were centrifuged and washing solution was added (2 × 100 µl). To exchange the buffer, Ambic buffer (2 × 200 µl) was added and the filters were centrifuged. The 10 kDa filters were transferred to fresh centrifuge tubes, the digestion enzyme (trypsin or chymotrypsin, respectively: 200 µl; 250 µg/ml in 50 mmol/l Ambic; 1 mmol/l CaCl2; enzyme/substrate ratio of 1/4 (w/w)) was added, and the mixture was incubated overnight at 37°C. The filtrates with the enzymatically cleaved peptides were collected by centrifugation, the filters were washed again with 200 µl of Ambic, and the filtrates and the washing solution were combined separately for each replicate and lyophilized. For LC-MS/MS analysis, the peptides were resuspended in 100 µl 1% FA.
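The enzyme/substrate ratio stated above can be checked with simple arithmetic. The sketch below is illustrative only: the enzyme volume and concentration come from the text, whereas the volume of GPT solution loaded onto the filter is not stated and is an assumption here (100 µl, chosen because it reproduces the stated 1/4 ratio).

```python
def mass_ug(volume_ul: float, conc_ug_per_ml: float) -> float:
    """Mass in µg contained in a volume (µl) at a given concentration (µg/ml)."""
    return volume_ul / 1000.0 * conc_ug_per_ml


# From the text: 200 µl of enzyme solution at 250 µg/ml.
enzyme_ug = mass_ug(200.0, 250.0)

# ASSUMPTION (not stated in the text): 100 µl of the 2 mg/ml GPT solution was loaded.
substrate_ug = mass_ug(100.0, 2000.0)

ratio = enzyme_ug / substrate_ug
print(f"enzyme: {enzyme_ug:.0f} µg, substrate: {substrate_ug:.0f} µg, ratio: {ratio:.2f}")
```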
Undirected LC-MS/MS Analysis
Aliquots (5 µl) of each GPT replicate were pooled for analysis. The LC-MS/MS analysis was performed on an Ekspert nanoLC415 (Eksigent, Dublin, CA, United States) directly coupled to a TripleTOF 6600 MS (SCIEX, Redwood City, CA, United States) with the following parameters: trap column: ChromXP C18 (3 μm, 12 nm, 10 × 0.3 mm); flow rate: 10 μl/min solvent A; 5 min; column: ChromXP C18 (3 μm, 12 nm, 150 mm × 0.3 mm); flow rate: 5 μl/min; solvents: (A) 5% DMSO, 0.1% FA, 94.9% water; (B) 5% DMSO, 0.1% FA, 90% acetonitrile, 4.9% water; linear gradient from 3 to 25% solvent B over 68 min, followed by a second linear step from 25 to 35% solvent B over 5 min, followed by a third linear step from 35 to 80% B over 2 min; a 3 min hold at 80% B; return to 3% B over 1 min; 8 min of re-equilibration; injection volume: 2 µl. DMSO was added as it enhances ionization and increases the signal-to-noise ratio (Hahne et al., 2013). The eluent from the HPLC was directly coupled to the DuoSpray source of the TripleTOF 6600 MS. The MS settings were as follows: ion spray voltage: 5,500 V; curtain gas: 138 kPa (20 psi); ion source gas 1 and 2 (GS1 and GS2): 103 and 138 kPa (15 and 20 psi); heated interface temperature: 100°C. The MS was operated in the information-dependent acquisition (IDA) mode. The IDA method consisted of a high-resolution time-of-flight-MS survey scan followed by 30 MS/MS scans, each with an accumulation time of 40 ms. The mass-to-charge (m/z) range of the acquisition of the MS1 spectra in positive ion mode was 400-1,250 with a 0.25 s accumulation time. MS2 spectra were acquired on precursor ions that exceeded 150 counts/s with charge states 2+ to 5+ and over the mass range of m/z 100-1,500 using the manufacturer's rolling collision energy based on the size and charge of the precursor ion and a collision energy spread of 5 V for optimum peptide fragmentation.
Analysis was carried out with dynamic ion exclusion of precursor ions with a 15 s interval after one occurrence and a mass tolerance of 100 ppm, and peaks within 6 Da of the precursor mass were excluded.
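The gradient program and the IDA settings described above imply a total run time and a nominal acquisition cycle time. The following Python sketch simply adds up the values given in the text; the variable names are illustrative, and the cycle time ignores any instrument overhead between scans, which is an assumption.

```python
# Gradient steps as (description, duration in minutes), taken from the text.
GRADIENT_STEPS = [
    ("linear 3 to 25% B", 68),
    ("linear 25 to 35% B", 5),
    ("linear 35 to 80% B", 2),
    ("hold at 80% B", 3),
    ("return to 3% B", 1),
    ("re-equilibration", 8),
]

# Total analytical run time (the 5 min trap-column loading step is not included).
total_run_min = sum(duration for _, duration in GRADIENT_STEPS)

# Nominal IDA cycle: one survey scan plus 30 MS/MS scans.
SURVEY_SCAN_S = 0.25
N_MSMS_SCANS = 30
MSMS_SCAN_S = 0.040
ida_cycle_s = SURVEY_SCAN_S + N_MSMS_SCANS * MSMS_SCAN_S

print(f"Run time: {total_run_min} min, nominal IDA cycle: {ida_cycle_s:.2f} s")
```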

Data Analysis for Protein Identification
For protein identification, the SCIEX .wiff raw files were directly used as input in the ProteinPilot 5.0 software (SCIEX) with the Paragon algorithm (Shilov et al., 2007). The raw data were searched against a database comprising UniProtKB-Poaceae proteins (https://www.uniprot.org; version 2018/02) appended with cRAP (http://www.thegpm.org/crap/), the common repository of adventitious proteins (1,601,923 sequences). The settings used were: IAM as the alkylating agent; trypsin, chymotrypsin, or no enzyme as the cleavage enzyme. ProteinPilot automatically considers enzyme cleavage specificity rules and all UniMod modifications, including, e.g., oxidation of methionine and deamidation of asparagine and glutamine, and uses a probability-based approach that considers sample treatment conditions. A 1% global false discovery rate (FDR) was applied for the protein identifications. The detected proteins were classified according to Dupont et al. (2011) into the following groups: gluten proteins, ATIs, globulins, β-amylase, other enzymes, farinins, serpins, grain softness proteins and puroindolines (GSPs+PINs), avenin-like proteins, other inhibitors, uncharacterized proteins (name of entries in the database UniProtKB) and others. The group "others" contains all identified proteins that could not be assigned to any of the aforementioned groups. All proteins identified as "uncharacterized" and "predicted" were manually reviewed using the basic local alignment search tool (BLAST) (Altschul et al., 1990) on the UniProtKB webpage with the target database UniProtKB reference proteomes plus SwissProt (parameters: identity >70%, except for hits with names of a group or from the subfamily Pooideae).
Due to the challenge of having different terms and often uncurated and incomplete protein sequences in the UniProtKB Poaceae database, the protein names for gluten proteins were summarized in the group "gluten proteins", which comprises gliadins, glutelins, glutenins and prolamins for wheat, secalins, glutelins, glutenins and prolamins for rye and hordeins, glutelins, glutenins and prolamins for barley. The detected proteins were sorted relative to each other by the rank assigned to each protein by the Paragon algorithm in ProteinPilot. The proportion of each group was calculated as the number of identified proteins per group multiplied by the number of distinct peptides (>95% confidence level) by which these proteins were identified, so that the peptide count acts as a weighting factor for the rank of each protein relative to all other proteins.
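The weighting described above can be read in more than one way. The sketch below shows one possible reading, in which every identified protein contributes its number of distinct peptides (>95% confidence) to its group and the group totals are expressed as percentages. The function name, data layout and example numbers are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def group_proportions(proteins):
    """Return the percentage share of each protein group.

    proteins: iterable of (group_name, n_distinct_peptides) tuples, one per
    identified protein. Each protein is weighted by its peptide count.
    """
    totals = defaultdict(int)
    for group, n_peptides in proteins:
        totals[group] += n_peptides
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {group: 100.0 * n / grand_total for group, n in totals.items()}


# Hypothetical identifications for a single GPT (illustration only).
example = [
    ("gluten proteins", 40),
    ("gluten proteins", 22),
    ("ATIs", 10),
    ("globulins", 8),
    ("others", 20),
]
proportions = group_proportions(example)
for group, share in sorted(proportions.items(), key=lambda kv: -kv[1]):
    print(f"{group}: {share:.0f}%")
```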

Preparation of the Multiple Reaction Monitoring Methods Using Skyline
Within each GPT, the identified proteins were selected according to the following parameters: belonging to the family Poaceae, the subfamily Pooideae and to gluten; 1% global FDR; confidence score > 99% and unused score > 2.0. The manually curated FASTA file list and the results of the undirected LC-MS/MS experiments were imported into Skyline (version 4.2.0.19072). Multiple reaction monitoring (MRM) transitions were determined for each predicted peptide, with precursor ion (Q1) m/z values in the range 50-1,500 and charge states 2+ and 3+, and fragment ion (Q3) m/z values taken from the data collected in the undirected LC-MS/MS experiments (Colgrave et al., 2012). Up to six transitions were used in the preliminary analyses; the MRM transitions were then refined and the top four per peptide were selected for use in the final method. In the subsequent experiments, scheduled MRM transitions were used for analysis in triplicate.
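To illustrate the precursor (Q1) criteria named above, the following Python sketch computes monoisotopic precursor m/z values for a peptide at charge states 2+ and 3+ and keeps those inside the m/z window of 50-1,500. It is not the Skyline implementation; the function names are hypothetical, and treating every cysteine as carbamidomethylated (after IAM alkylation) is an assumption of this sketch.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
CARBAMIDOMETHYL = 57.021464  # assumed fixed modification on Cys


def precursor_mz(peptide: str, charge: int) -> float:
    """Monoisotopic m/z of a peptide at the given positive charge state."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    mass += CARBAMIDOMETHYL * peptide.count("C")
    return (mass + charge * PROTON) / charge


def candidate_precursors(peptide, charges=(2, 3), mz_range=(50.0, 1500.0)):
    """Return {charge: m/z} for charge states whose m/z lies inside mz_range."""
    low, high = mz_range
    result = {}
    for z in charges:
        mz = precursor_mz(peptide, z)
        if low <= mz <= high:
            result[z] = mz
    return result


# Example with the epitope motif QQPQQPFPQ mentioned in the introduction.
precursors = candidate_precursors("QQPQQPFPQ")
for z, mz in precursors.items():
    print(f"{z}+: m/z {mz:.3f}")
```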

Multiple Reaction Monitoring Mass Spectrometry for Relative Protein Quantitation
Scheduled MRM experiments were used for quantitation of the reduced and alkylated tryptic and chymotryptic peptides of each GPT, each in triplicate. The LC-MS/MS analysis was performed on a UHPLC system (Shimadzu Nexera, Sydney, Australia) directly coupled to a QTRAP 6500 mass spectrometer (SCIEX). The cycle time was set to 0.3 s, and the MRM transitions were scheduled to be monitored within a 60 s window around their expected retention time (± 30 s) (Colgrave et al., 2017a).
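The scheduling logic described above can be sketched as follows. The ± 30 s window and the 0.3 s cycle time come from the text; the function names, the example schedule and the dwell-time estimate (cycle time divided by the number of concurrent transitions, ignoring pause times between transitions) are assumptions made for illustration.

```python
def is_scheduled(time_s: float, expected_rt_s: float, half_window_s: float = 30.0) -> bool:
    """True if a transition is monitored at time_s given its expected retention time."""
    return abs(time_s - expected_rt_s) <= half_window_s


def concurrent_transitions(time_s, schedule, half_window_s=30.0):
    """Count transitions monitored at time_s.

    schedule: list of (expected_rt_s, n_transitions) tuples, one per peptide.
    """
    return sum(n for rt, n in schedule if is_scheduled(time_s, rt, half_window_s))


def approx_dwell_ms(cycle_time_s: float, n_concurrent: int) -> float:
    """Rough dwell time per transition, ignoring inter-transition pauses."""
    return 1000.0 * cycle_time_s / n_concurrent


# Hypothetical schedule: three peptides with four transitions each.
schedule = [(600.0, 4), (615.0, 4), (700.0, 4)]
n_active = concurrent_transitions(620.0, schedule)
dwell = approx_dwell_ms(0.3, n_active)
print(f"{n_active} transitions active, about {dwell:.1f} ms dwell each")
```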

Relative Protein Quantitation
The peaks were integrated using Skyline. The relative quantitation of the proteins within each GPT was performed using the "best flyer methodology" (Ludwig et al., 2012), in which the peak areas of four transitions of one peptide (average of three replicates) were summed. One peptide was used to represent one protein and the peak area of each peptide was assigned to the respective protein. The datasets from the tryptic and chymotryptic digests were combined by removing the duplicate protein with the lower value. Then, the areas of all proteins from the same category according to their UniProtKB accession were summed. The calculations were done in Microsoft Excel and the graphs were prepared in Origin (version 2018b (9.55), OriginLab, Northampton, MA, USA).
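The calculation steps above were carried out in Excel; the Python sketch below only mirrors the described logic (sum four transitions, average over replicates, keep the higher value when a protein occurs in both digests, sum per category). All function names, accessions and peak areas are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def peptide_area(replicates):
    """Sum the transition peak areas per replicate, then average over replicates."""
    return mean(sum(transitions) for transitions in replicates)


def combine_digests(tryptic, chymotryptic):
    """Merge two {protein: area} dicts, keeping the higher value for duplicates."""
    combined = dict(tryptic)
    for protein, area in chymotryptic.items():
        combined[protein] = max(area, combined.get(protein, area))
    return combined


def sum_by_category(protein_areas, category_of):
    """Sum protein areas per category; category_of maps protein to category."""
    totals = defaultdict(float)
    for protein, area in protein_areas.items():
        totals[category_of[protein]] += area
    return dict(totals)


# Hypothetical data: three replicates with four transitions each for PROT_A.
tryptic = {
    "PROT_A": peptide_area([[10, 20, 30, 40], [12, 18, 30, 40], [8, 22, 30, 40]]),
    "PROT_B": 50.0,
}
chymotryptic = {"PROT_A": 80.0, "PROT_C": 25.0}
category_of = {"PROT_A": "LMW-GS", "PROT_B": "LMW-GS", "PROT_C": "HMW-GS"}

combined = combine_digests(tryptic, chymotryptic)
by_category = sum_by_category(combined, category_of)
print(by_category)
```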

General Characterization of Gluten Protein Types
The moisture contents of the flours were 14.59 ± 0.01% for wheat, 11.42 ± 0.01% for rye and 12.09 ± 0.06% for barley. The contents of CP, albumin/globulin, prolamin, and glutenin fractions in the flours are given in Table S1. Table S2 lists the CP contents of the GPTs isolated from wheat, rye and barley flours and the proportions of each GPT within total gluten. The Osborne fraction values are based on flour weight; the proportions of GPTs are based on total gluten content (Lexhaller et al., 2016; Lexhaller et al., 2017). The results corresponded well to those reported previously (Gellrich et al., 2003; Kerpes et al., 2016; Schalk et al., 2017).

Identification of Protein Groups in the Gluten Protein Types
The Osborne fractions (prolamins and glutelins) extracted from the flours were separated into the GPTs by preparative RP-HPLC. These purified GPTs were reduced, alkylated and subjected to tryptic (T) and chymotryptic (C) hydrolysis, respectively. The GPT hydrolysates were analyzed by LC-MS/MS to identify the complete suite of proteins present in each GPT. Proteins with identical sequences were counted once. For each GPT, the suite of proteins identified after tryptic digest (Table S3) and after chymotryptic digest (Table S4) was recorded. All proteins originally identified as "uncharacterized" or "predicted" were manually searched again using the BLAST tool available from the UniProtKB webpage. Based on the data of the undirected LC-MS/MS experiments, Figure 1 shows the qualitative composition and proportion of the proteins in each GPT.

Barley
The C-hordein-GPT consisted mainly of 62% gluten proteins, 10% ATIs and 7% GSPs+PINs. The γ-hordein-GPT was composed of over 92% gluten proteins and 4% ATIs and the residual groups amounted only to 4% altogether. The compositions of B- and D-hordein-GPTs were similar, but the B-hordein-GPT had a greater diversity of enzymes (15% in total) and contained 11% uncharacterized proteins. In the D-hordein-GPT (Figure 1C) high proportions of other proteins (24%) were present.

Identification of Single Proteins in the Gluten Protein Types
Tables S3 and S4 list all identified proteins with their UniProtKB accession number, name, organism, rank, score, sequence coverage and number of identified peptides. As an overview of the qualitative data, the three highest-ranked proteins identified in the tryptic (Table 2) and in the chymotryptic (Table 3) hydrolysates of each GPT are summarized. The rank of each specified protein is relative to all identified proteins in the fraction; contaminant proteins, such as the proteases used and/or keratins from sample preparation, were excluded.

Wheat
The high-scoring proteins detected in the tryptic hydrolysates of the α-gliadin-GPT and the γ-gliadin-GPT represented gluten proteins, except one α-amylase-inhibitor (Table 2). The top-ranked proteins often did not match those of the corresponding protein type, whereas the matching proteins appeared at lower ranks, e.g., γ-gliadins (D0ES80; H8Y0P9) at ranks five and seven in the γ-gliadin-GPT with similar scores and peptide numbers. The chymotryptic hydrolysates (Table 3) showed similar compositions. The tryptic hydrolysate of the ω5-gliadin-GPT contained mainly HMW-GS proteins, but an ω-gliadin (A0A0B5J8A9) was identified based on eight peptides at rank 12. Surprisingly, no ω-gliadin was identified in the chymotryptic hydrolysate of the ω5-gliadin-GPT. The tryptic hydrolysate of the ω1,2-gliadin-GPT was composed of different types of proteins representing the two main groups of this GPT (Figure 1A). The chymotryptic hydrolysate contained an ω-gliadin protein (A0A060N0S6) at rank 1 with by far the highest score and the most identified peptides (89). In the tryptic and chymotryptic hydrolysates of the HMW-GS-GPT the highest ranked proteins were HMW-GS. The high-scoring proteins in the tryptic LMW-GS-GPT were the 12S seed storage globulin (M7ZK46), which belongs to the cupin super-family with nutrient reservoir activity (Dunwell, 1998), and one LMW-GS, which was identified with the highest number of peptides. These proteins represent the main group, gluten proteins, and the second main group in this GPT, the globulins (Figure 1A). Globulins are known to polymerize via interchain disulfide bonds and may thus appear in the high-molecular-weight group (Vensel et al., 2014).
and a HMW-GS, which represent the two main groups of the ω-secalin-GPT in Figure 1B. Only two proteins passing the 1% FDR threshold were identified in the chymotryptic hydrolysate of the ω-secalin-GPT (Table 3). In the tryptic and chymotryptic hydrolysates of the HMW-secalin-GPT, the highest ranked proteins were a HMW-secalin (Q93WF0; rank 2) and a wheat HMW-GS protein (W6AW92; rank 1), which is, however, very similar to the HMW-secalin protein D3XQB8 (95.8% identity).
The tryptic hydrolysate of the γ-75k-secalin-GPT consisted mainly of the 75k γ-secalin protein E5KZQ2. The high-scoring proteins represent the three main groups in the γ-75k-secalin-GPT (Figure 1B). Another 75k γ-secalin protein (E5KZQ6) was also identified with a high number of peptides, but a lower score. In the chymotryptic hydrolysate, the protein identified with the most peptides (49) was the 75k γ-secalin E5KZQ1 at rank 3. In the case of the γ-40k-secalin-GPT, only one γ-prolamin protein was identified in the tryptic hydrolysate at rank 3. A sucrose synthase and an uncharacterized protein (W5AHI2) ranked first and second, respectively. The BLAST search identified an actin-2 protein (M8ASF1) with 100% identity to this uncharacterized protein. Uncharacterized proteins represented one of the largest groups in the γ-40k-secalin-GPT (Figure 1B), probably due to missing reference protein sequences. The chymotryptic hydrolysate showed a similar proportion with a formate dehydrogenase and two uncharacterized proteins as the three high-scoring proteins.

RNA-binding protein h 11.63 10
a The rank of the specified protein is relative to all other proteins in the list of detected proteins; b Unused ProtScore, defined as a measure of the protein confidence for a detected protein, calculated from the peptide confidence for peptides from spectra that are not already completely "used" by higher scoring winning proteins, thus reflecting the amount of total, unique peptide evidence related to a given protein; c after BLAST search (identified as uncharacterized protein: R7W8L3); d after BLAST search (identified as uncharacterized protein: W5AHI2); e after BLAST search (identified as predicted protein: F2CR90); f after BLAST search (identified as predicted protein: F2E9N0); g after BLAST search (identified as predicted protein: F2DZW3); h after BLAST search (identified as predicted protein: F2CR90).

Barley
The high-scoring proteins detected in the tryptic hydrolysate of the C-hordein-GPT (Table 2) corresponded to the three main groups of this GPT, the gluten proteins, the group of others and the group of GSPs+PINs (Figure 1C). A C-hordein (Q40055) was identified at rank 23. An uncharacterized protein of Hordeum vulgare subsp. vulgare (A0A287EIM7) sharing 99.0% sequence identity with the C-hordein (P06472) was present in the chymotryptic hydrolysate of the C-hordein-GPT (Table 3). Two B-hordeins and the previously reported γ3-hordein (P80198) (Colgrave et al., 2012) were detected with a high number of peptides in the tryptic hydrolysate of the γ-hordein-GPT. Only two uncharacterized proteins from Hordeum vulgare subsp. vulgare were identified in the chymotryptic γ-hordein-GPT hydrolysate. The highest ranked protein was identified as a B1-hordein (P06470) with an identity of 94.6% after the BLAST search. The tryptic and chymotryptic hydrolysates of the B-hordein-GPT contained the B3-hordein I6TMW4 with 102 peptides and the two other B-hordeins with a high peptide number, B1-hordein (P06470) and B hordein (Q40026). D-hordein (I6TRS8, 209 peptides detected) was the highest ranking protein in the tryptic hydrolysate of the D-hordein-GPT. The D-hordein (I6SW34, 99 peptides) and an uncharacterized protein (A0A287EEX5, 2 peptides), which was identified as a C-hordein (P02864) with 50% identity, were identified in the chymotryptic hydrolysate. Moreover, D-hordeins were detected in all other hordein GPTs with high sequence coverage. The best three protein hits of each GPT are summarized in Tables 2 and 3, according to their ranking of identification. The total numbers of gluten proteins identified using either trypsin or chymotrypsin are presented in Table 4.
The numbers of identified proteins were 2- to 10-fold higher in all GPT hydrolysates using the so-called gold standard proteolytic enzyme trypsin as compared to chymotrypsin. The numbers of identified gluten proteins were 2- to 8-fold higher in the tryptic hydrolysates, except for HMW-GS and LMW-GS. Chymotrypsin revealed as many gluten proteins as trypsin for HMW-GS, and more gluten proteins were identified in the chymotryptic hydrolysate of LMW-GS than with trypsin. The total numbers of identified proteins ranged from 24 for the γ-hordeins up to 317 for the γ-40k-secalins in the tryptic hydrolysates and from 4 (ω5-gliadins) to 58 (γ-40k-secalins) in the chymotryptic hydrolysates. The ratio of the numbers of all identified proteins to the numbers of identified gluten proteins ranged from 2 for α-gliadins up to 29 for γ-40k-secalins in the tryptic hydrolysates and from 1 for α-gliadins, ω5-gliadins and ω1,2-gliadins to 19 for γ-40k-secalins in the chymotryptic hydrolysates. It should be noted that 18 gluten proteins, but no GPT-specific proteins, were identified (73 proteins in total) in the tryptic digest of the ω1,2-gliadin-GPT. In contrast, only seven gluten proteins were identified in the chymotryptic hydrolysate, but three of them were ω-gliadin proteins. The same findings were observed for the LMW-GS, for which 22 LMW-GS proteins of 27 gluten proteins were identified in the chymotryptic hydrolysate, but only 2 LMW-GS proteins within 20 gluten proteins in the tryptic hydrolysate. For the hordeins, the data show that the enrichment is more specific and that the trypsin data for these GPTs are misleading, because in the chymotryptic hydrolysates fewer gluten proteins were identified, but more of them corresponded to their appropriate GPT. For the other GPTs, more GPT-specific proteins were identified in the tryptic than in the chymotryptic hydrolysates.

a The rank of the specified protein is relative to all other proteins in the list of detected proteins; b Unused ProtScore, defined as a measure of the protein confidence for a detected protein, calculated from the peptide confidence for peptides from spectra that are not already completely "used" by higher scoring winning proteins, thus reflecting the amount of total, unique peptide evidence related to a given protein; c after BLAST search (identified as uncharacterized protein: T1LG74); d after BLAST search (identified as uncharacterized protein: A0A287EIM7); e after BLAST search (identified as uncharacterized protein: A0A287EFG2); f after BLAST search (identified as uncharacterized protein: A0A287EEX5).

Identification of Immunoreactive Proteins
Various gluten and non-gluten proteins of wheat, rye and barley have been identified as triggers of adverse reactions. The proteomic characterization of the GPTs also provided an insight into the presence of immunoreactive proteins. All identified proteins of the GPTs were searched for the UniProtKB accession based on the allergen code of the World Health Organization/International Union of Immunological Societies and for the name of the immunoreactive proteins. The identified allergens with their allergen code, molecular weight and identification parameters are shown in Table 5. Some of the allergens were identified only in one GPT with a small number of peptides (profilin in the LMW-GS-GPT or serpin in the γ-40k-secalin-GPT), but especially ATIs and gluten proteins were very abundant and present in more than one GPT. However, it should be noted that most of the allergens were enriched in one GPT. The WDEIA allergen Tri a 19 "ω5-gliadin" was identified only in the appropriate GPT. Besides the exemplary allergens shown, many identified proteins contained peptides with known CD-active sequences. Immunoreactive peptides carrying known, non-deamidated peptide-binding motifs of gluten-specific T-cells are shown in Table 6. CD-active peptides were identified in all wheat GPTs, except ω5-gliadins. The list of T-cell epitopes according to Sollid et al. (2012) contains 31 entries that are reduced to 21 different motifs after reversal of deamidation and removal of duplicates. One of these motifs is specific to oats, which were not studied, leaving 20 possible motifs. Of these, five epitopes were not identified (DQ2.5-glia-α3, DQ2.5-glia-γ4a, DQ2.5-glia-γ4b, DQ2.5-glia-γ4d, DQ8-glia-α1), but 15 motifs were detected, especially in the ω1,2-gliadin-, LMW-GS-, and HMW-GS-GPTs.
The findings were comparable for the rye GPTs, where similar numbers of peptides were identified in the ω- and HMW-secalin-GPTs as in the γ-75k-secalin-GPT, with the exception of the γ-40k-secalin-GPT with just two epitopes. In the γ-, B-, and D-hordein-GPTs just one peptide-binding motif was detected, but six different peptides were identified in the C-hordein-GPT. The DQ2.5-glia-γ4c peptide-binding motif QQPQQPFPQ

a Molecular weight according to UniProtKB accession; b T, tryptic digest, C, chymotryptic digest; c Unused ProtScore, defined as a measure of the protein confidence for a detected protein, calculated from the peptide confidence for peptides from spectra that are not already completely "used" by higher scoring winning proteins, thus reflecting the amount of total, unique peptide evidence related to a given protein; d 96% identity to Q9FS79 Triticum aestivum.
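Searching identified sequences for non-deamidated peptide-binding motifs, as described in this section, amounts to an exact substring search. The Python sketch below uses two motifs quoted in the introduction; the function name and the toy sequence are hypothetical, and the full motif list of Sollid et al. (2012) is not reproduced here.

```python
# Two native (non-deamidated) motifs quoted in the text; not the complete list.
MOTIFS = {
    "DQ8.5-glut-H1": "QGYYPTSPQ",
    "DQ2.5-glia-γ4c": "QQPQQPFPQ",
}


def find_motifs(sequence: str, motifs=MOTIFS):
    """Return {motif_name: [0-based start positions]} for motifs found in sequence.

    Overlapping occurrences are reported, which matters for repetitive sequences.
    """
    hits = {}
    for name, motif in motifs.items():
        last_start = len(sequence) - len(motif)
        positions = [i for i in range(last_start + 1) if sequence.startswith(motif, i)]
        if positions:
            hits[name] = positions
    return hits


# Toy sequence (hypothetical, not a real protein) with two overlapping repeats.
toy_sequence = "MKTQQPQQPFPQQPQQPFPQAA"
hits = find_motifs(toy_sequence)
print(hits)
```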

Relative Quantitation of Proteins Within Gluten Protein Types
The tryptic and chymotryptic GPT hydrolysates were then subjected to relative quantitation to monitor the relative abundance of the peptides. Only peptides of gluten-derived proteins were selected for the MRM analysis. According to the "best-flyer method" of Ludwig et al. (2012), the peak areas of the four most intense transitions of the best flying peptide per protein (TopPep1/TopTra4) were summed. The TopPep1/TopTra4 model was selected because only one peptide was detected for many gluten proteins in the undirected LC-MS/MS experiments and because this model is reported to be as reasonable and robust as the alternatives. The peak areas cannot be compared between peptides, because the MS response depends on the amino acid sequence, but the peak areas of the same peptide may be compared between the GPTs. The peak areas of the peptides were summed according to their categories (Figure 2). To estimate the enrichment of each category in every GPT and to ease data comparison, the peak area of each category in a GPT was converted to a percentage of the summed peak area of that category over all GPTs of the same grain.
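The percentage conversion described above can be sketched in a few lines of Python. The function name, GPT labels and peak areas below are hypothetical; the only element taken from the text is that each category is normalized to its own total over all GPTs of one grain.

```python
def category_percentages(areas):
    """Convert peak areas to percentages of each category's total across GPTs.

    areas: {gpt_name: {category: summed_peak_area}} for the GPTs of one grain.
    Returns the same nesting with values in percent.
    """
    totals = {}
    for categories in areas.values():
        for category, area in categories.items():
            totals[category] = totals.get(category, 0.0) + area
    return {
        gpt: {
            category: 100.0 * area / totals[category]
            for category, area in categories.items()
            if totals[category] > 0
        }
        for gpt, categories in areas.items()
    }


# Hypothetical peak areas (arbitrary units) for one category in three GPTs.
example_areas = {
    "GPT_1": {"category_X": 21.0},
    "GPT_2": {"category_X": 20.0},
    "GPT_3": {"category_X": 9.0},
}
percentages = category_percentages(example_areas)
print(percentages)
```

Because each category is normalized separately, the percentages of one category sum to 100% across GPTs, whereas the values within a single GPT do not.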

Wheat
For the wheat GPTs, the single proteins were grouped according to their UniProtKB names into the categories LMW-GS, α-, γ-, and ω-gliadins, HMW-GS and avenin-like proteins. LMW-GS constituted the main proportion in the appropriate LMW-GS-GPT, but they were also enriched in the α- and γ-gliadin-GPTs and were present in the other wheat GPTs (Figure 2A). Conversely, a large share of α-gliadins was detected in the α-gliadin-GPT (≈42% of total α-gliadins) and the HMW-GS-GPT (≈40% of total α-gliadins). The percentages always refer to 100% of total protein type summed over all wheat, rye or barley GPTs, respectively, e.g., to 100% of total α-gliadins summed over all wheat GPTs. Smaller proportions of α-gliadins were detected in the ω1,2-gliadin-, γ-gliadin-, and LMW-GS-GPTs. The γ-gliadins were detected in almost all GPTs, except the ω-gliadin-GPTs, but were noticeably enriched in the γ-gliadin-GPT (≈66% of total γ-gliadins). The ω-gliadins were present almost only in the ω1,2-gliadin-GPT (≈76% of total ω-gliadins). HMW-GS accounted for a small proportion in each wheat GPT, but the HMW-GS-GPT had the highest proportion of HMW-GS (≈77% of total HMW-GS), as expected. The ω5-gliadin-GPT showed low proportions of the analyzed proteins of HMW-GS, LMW-GS and ω-gliadins. The avenin-like proteins were present in small amounts in almost all wheat GPTs, except the ω5-gliadin-GPT. The technical variation was assessed by examining the mean (combining GPTs of wheat) coefficient of variation (CV) for each peptide, with an overall average of 13% for the cleavage with trypsin and 12% for the cleavage with chymotrypsin.

Barley
The barley GPTs were grouped into the following categories: D-hordeins, B-hordeins, γ3-hordeins, C-hordeins, avenin-like proteins, and HMW-GS from Triticum aestivum and a similar tribe (C) in the family Poaceae. In comparison with the other barley GPTs, the C-hordein-GPT contained the highest amount of C-hordeins (≈96% of total C-hordeins) and a high proportion of D-hordeins. The D-hordeins were also detected in the B-hordein-GPT, but the largest share was found in their corresponding GPT (≈90% of total D-hordeins). The B- and γ-hordein-GPTs were mainly composed of B-hordeins, with the B-hordein-GPT showing noticeably higher proportions of the B-hordeins (≈77% of total B-hordeins) and also of proteins of the other groups analyzed (Figure 2C). The γ-hordein-GPT showed a clear enrichment of the B-hordeins. The average CV was 9% for the tryptic cleavage of the barley GPTs and 10% for the chymotryptic cleavage.

DISCUSSION
In this study, we provided novel insights into the complexity of gluten from wheat, rye, and barley by identification of the individual proteins and relative quantitation of the most abundant gluten proteins in the GPTs. A preparative strategy was used to isolate the GPTs from wheat, rye and barley flours according to solubility and hydrophobicity. The LC-MS/MS experiments confirmed an enrichment of the expected gluten proteins in their corresponding GPTs in most cases. The application of high-resolution MS allowed a much more detailed and accurate insight into the composition of the isolated GPTs compared to our earlier low-resolution MS analyses. The data of the undirected LC-MS/MS experiments showed the qualitative composition of the GPTs according to the number of peptides identified and gave a first estimate of the total composition of each GPT. All GPTs contained gluten proteins other than those expected from the known RP-HPLC retention times, as well as ATIs, enzymes or uncharacterized proteins. These findings underline the incomplete separation of prolamins and glutelins according to solubility and show that even the separation by preparative RP-HPLC is not clear-cut enough to separate individual GPTs without co-purifying other components, such as ATIs (Junker et al., 2012). The undirected LC-MS/MS experiments revealed that gluten proteins constituted the highest proportion in the wheat GPTs, followed by ATIs as the second largest group, which were present especially in the ω5- and ω1,2-gliadin-GPTs. The MRM data showed that the group of gluten proteins had different compositions of α-, γ-, ω-gliadins, LMW-GS, and HMW-GS, mostly enriched in their corresponding GPTs. However, we found that the LMW-GS were detected in all wheat GPTs. Recently, the presence of LMW-GS in the gliadin fraction has been reported as well (Boukid et al., 2019).
Due to their polymeric nature (Shewry, 2019), their similarity to α-gliadins in molecular weight and also to γ-gliadins in RP-HPLC retention times, it may not be possible to achieve a clear-cut separation between those GPTs. Thus, small proportions of LMW-GS were contained in all wheat GPTs.
The ω- and HMW-secalin-GPTs showed high proportions of gluten proteins in the undirected LC-MS/MS analysis. The subsequent MRM analyses revealed that the gluten protein fractions were highly enriched with the expected protein types. As described in previous studies, HMW-secalins were detected with notably high proportions in the other rye GPTs. In the case of the ω-secalin-GPT, this may be due to the reduction of the disulfide bonds of the HMW-secalins, which then co-eluted in the ω-secalin-GPT (Gellrich et al., 2003). When fractionating rye gluten proteins, we observed that the separation according to solubility is even less complete than in wheat. This led to a greater co-mingling of the individual GPTs even after preparative RP-HPLC. The detection of LMW-GS and avenin-like proteins besides the main group of γ-75k-secalins in this GPT may give another hint at the similarity of those GPTs due to the close genetic relationship of rye and wheat (Kasarda et al., 1983). There was no reliable reference sequence available for the γ-40k-secalins (June 2019), but the group named γ-prolamins was only detected in the γ-40k-secalin-GPT. Although the molecular weight (UniProtKB database) of the γ-prolamins detected was somewhat too low compared to the generally known mass range for γ-40k-secalins, the assignment to this GPT would be possible based on amino acid sequence, organism and similarity to other rye proteins. This showed the incompleteness of the rye protein entries in the UniProtKB database, because these γ-prolamins were very similar to previously identified ones.
The same separation issue as for the rye GPTs appeared for the barley GPTs. As stated by Schalk et al. (2017), γ/B-hordeins from the prolamin fraction contained the monomeric γ-hordeins and partly the disulfide-bound B-hordeins. The B/γ-hordeins prepared from the glutelin fraction showed the opposite case, with a majority of oligomeric or polymeric B-hordeins. Similar results were obtained in this study, except that the γ-hordeins were detected with similar proportions in all barley GPTs. The same applied to the D-hordeins, which were clearly enriched in the D-hordein-GPT, but also identified in noticeably high amounts in the other GPTs. This may also be traced back to the customized separation technique. The identification of hordeins again revealed the challenge posed by incomplete or unannotated protein entries in the database (Colgrave et al., 2013). In particular, the number of entries for barley and rye was low and many proteins were matched as uncharacterized proteins. Reliable protein reference sequences, especially for Hordeum sp. and Secale sp., are urgently needed, because the proteomics results are likely to be affected by the drastically different number of protein sequences available.
One limitation of the current study is that the results are based on the analysis of GPTs isolated from one single cultivar of each grain grown in one year. Although the cultivars were chosen carefully to select representative samples, genetic and environmental factors and their interaction are known to influence the proteome composition of cereals (Hajas et al., 2018; Juhasz et al., 2018; Malalgoda et al., 2018; Geisslitz et al., 2019). The results obtained here thus only provide one snapshot and are expected to change depending on the flour sample. The overall procedure from milling to collecting sufficient amounts of GPTs after preparative RP-HPLC is rather time-consuming as well as cost- and labor-intensive, so that it is impossible to do this for more than a very limited number of samples. This is why the current study first focused on determining the efficiency of fractionation of the various GPTs, prior to studying the variability arising from different factors.
This study also revealed that trypsin is preferred for the identification experiments for almost all GPTs, except for ω1,2-gliadins and LMW-GS, which were better characterized using the chymotryptic hydrolysate to increase sequence coverage. This may be in part due to the fact that ω1,2-gliadins are more resistant to trypsin and have fewer K/R residues (trypsin cleavage sites), so these will be under-represented compared to "other" proteins that have more K/R residues and hence more tryptic peptides, such as HMW-GS (Alves et al., 2018). However, for the identification of specific gluten proteins, chymotrypsin yielded more results, because the enrichment was shown to be more specific and the trypsin data for some GPTs might be misleading. In general, gluten contains few lysine and arginine residues, but it seems that trypsin was still mostly superior to chymotrypsin due to its cleavage specificity, efficiency and delivery of peptides with favorable chromatographic and MS properties in terms of ionization and fragmentation, as has been reported before (Colgrave et al., 2017b). Most peptides were tryptic, but some were also generated from nonspecific cleavage sites. We also observed that the identified proteins and their ranks change depending on the cleavage enzyme used. Due to a number of confounding factors, it is hard to assess which enzyme is more representative of the truth, which is why the results of both approaches were combined in Figure 2. Further experiments using additional enzymes with different cleavage specificities would be necessary to investigate this in more detail. The undirected LC-MS/MS analysis of the chymotryptic hydrolysates seemed to be more suitable for the detection of peptides with CD-active epitopes, because significantly more of these peptides were identified than after tryptic hydrolysis.
It is known that peptides containing CD-active epitopes are typically resistant to cleavage by trypsin and may therefore be identified in low amounts (Shan et al., 2005). In total, 15 out of 20 different CD-active epitopes were detected. Of the five that were not detected, two (DQ2.5-glia-γ4a, DQ2.5-glia-γ4d) were not present in either historical or modern spring wheat cultivars (Malalgoda et al., 2018).
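The argument that a scarcity of K/R residues limits the number of tryptic peptides can be illustrated with a minimal in silico count of potential cleavage sites. The sketch below rests on simplified textbook rules that are assumptions of this example: trypsin cleaves C-terminal to K or R, chymotrypsin C-terminal to F, Y, W or L, and neither cleaves before proline. Real enzyme specificity is more nuanced, the sequence is an invented glutamine- and proline-rich toy string rather than a real gluten protein, and no such calculation was part of the study.

```python
TRYPSIN_RESIDUES = "KR"         # simplified rule: cleavage C-terminal to K or R
CHYMOTRYPSIN_RESIDUES = "FYWL"  # simplified rule: cleavage C-terminal to F, Y, W or L

def count_cleavage_sites(sequence, residues, block_before_proline=True):
    """Count positions after which an enzyme could cleave under the simplified rules.

    The C-terminal residue is skipped because cleaving after it yields no
    additional peptide. If block_before_proline is True, a site followed by
    proline is not counted.
    """
    count = 0
    for i, aa in enumerate(sequence[:-1]):
        if aa not in residues:
            continue
        if block_before_proline and sequence[i + 1] == "P":
            continue
        count += 1
    return count

# Invented toy sequence, for illustration only (not a database entry).
toy_sequence = "QQPFPQQPQQFYPQQLQQWRQ"
tryptic_sites = count_cleavage_sites(toy_sequence, TRYPSIN_RESIDUES)
chymotryptic_sites = count_cleavage_sites(toy_sequence, CHYMOTRYPSIN_RESIDUES)
```

For this toy sequence the count yields one potential tryptic site and three potential chymotryptic sites, which mirrors the qualitative point that sequences poor in K/R give few tryptic peptides and may be better covered by chymotrypsin.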
To conclude, the combination of discovery proteomics and relative quantitation of gluten proteins provided novel insights into the relative amounts of the individual proteins in purified GPTs. These well-defined materials are suitable for a wide range of applications and have already been used as reference materials to quantitate gluten from wheat, rye and barley using targeted LC-MS/MS (Schalk et al., 2018a; Schalk et al., 2018b), as stimulatory agents for epitope mapping (Röckendorf et al., 2017) and for recognition profiling of monoclonal antibodies. Further potential uses include a variety of functional assays to study mechanisms of immune activation. Our findings raise awareness of the challenges of obtaining "pure" GPTs for analytical purposes and clinical studies on disease mechanisms. Especially when applying gluten or gluten fractions in studies on pathomechanisms of, e.g., CD, NCGS, or WDEIA, it is essential to know which proteins are present in the fractions of interest to establish relationships between structure, functionality and bioactivity.

DATA AVAILABILITY STATEMENT
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) with the dataset identifier PXD016065 and are publicly available on Panorama Public (https://panoramaweb.org/nOlizr.url).

AUTHOR CONTRIBUTIONS
BL planned and performed the experiments, analyzed the data, designed the figures and wrote the original draft. MC provided access to the LC-MS/MS instruments, contributed to proteomics data analysis and study design. KS was responsible for study conceptualization, contributed to funding acquisition and editing of the manuscript. All authors reviewed and edited the manuscript and approved the final version.