Degradation of Diatom Protein in Seawater: A Peptide-Level View

Peptides and proteins were identified during a controlled laboratory degradation of the marine diatom Thalassiosira weissflogii by a surface seawater microbiome. Samples from each time point were processed both with and without the protease trypsin, allowing a partial differentiation between peptides produced naturally by microbial enzymatic degradation and peptides produced by laboratory digestion of intact protein. Over the 12-day degradation experiment, 31% of the particulate organic carbon was depleted, and there was no preferential degradation of the overall protein pool. However, peptides produced by microbial breakdown differed distinctly from those produced by laboratory digestion in cellular location, secondary structure, and modifications. During the initial period of rapid algal decay and bacterial growth, intracellular components from the cytoplasm were consumed first, resulting in the accumulation of membrane-associated proteins and peptides in the detrital pool. The enrichment of membrane protein material was accompanied by an increase in the importance of α-helix motifs. Methylated arginine, a post-translational modification common in cell senescence, was found in high amounts within the microbially produced detrital peptide pool, suggesting a link between in-cell modification and resistance to immediate degradation. Another modification, asparagine deamidation, also accumulated within the detrital peptides. Protein taxonomies showed that the bacterial community decomposing the algal material was rich in Proteobacteria, and protein annotations indicated abundant transport of solubilized carbohydrates and small peptides across membranes. At this early stage of diagenesis, no changes in bulk amino acids (total hydrolyzable amino acids, THAA) were observed, yet a proteomic approach allowed us to observe selective changes in diatom protein preservation by using amino acid sequences to infer subcellular location, secondary structures, and post-translational modifications (PTMs).


INTRODUCTION
The application of "omics" to marine systems has accelerated in recent decades and has contributed to a deeper understanding of the roles microorganisms play in global biogeochemical cycles (Kleiner, 2019; Saito et al., 2019). Meanwhile, geochemical techniques have shown that some of the largest pools of carbon in the ocean are the detrital remains of these organisms, in both dissolved (Aluwihare et al., 1997; Benner and Kaiser, 2003; Jiao et al., 2010) and particulate organic matter (Wakeham et al., 1997; Hedges et al., 2001; Lee et al., 2004). Although this area remains largely unexplored, analytical advances are increasing the potential for "omic" approaches to illuminate carbon cycling in marine detritus (Nunn et al., 2010; Moore et al., 2012, 2014). Much of the identifiable detrital organic matter in the ocean is proteinaceous (Mayer et al., 1988; Wakeham et al., 1997; Lee et al., 2004). This composition may be unsurprising given the abundance of proteins in living organisms: diatoms, one of the major groups of primary producers in the ocean, are approximately 25% protein by dry weight (Olofsson et al., 2019), and photoautotrophic bacteria such as Prochlorococcus can be as much as 70% protein (Finkel et al., 2016). Determining the origins and fates of protein in marine systems is therefore critical to a predictive understanding of carbon transport and storage in a changing ocean.
Proteins are often assumed to be labile and readily degraded (Nunn et al., 2003; Pantoja et al., 2009). Yet some proteins have been shown to survive long exposure in seawater (Keil and Kirchman, 1994; Dong et al., 2010) and have been identified in the detrital dissolved organic matter pool (Tanoue et al., 1995; Suzuki et al., 1997; Yamada and Tanoue, 2003), in coastal and open-ocean marine sediments (Mayer et al., 1986, 1988; Estes et al., 2019), and in sediment pore water (Schmidt et al., 2011; Abdulla et al., 2018; Fejjar et al., 2021). Algal proteinaceous material has been shown to survive microbial degradation in 4,000-year-old sediments (Knicker et al., 1996). Thus, the geochemical evidence implies that while many proteins or their constituents may indeed be labile, some component of proteinaceous material has the potential to be preserved in the geologic record. However, the reasons that certain proteins or their constituents are preserved while others are rapidly degraded remain largely elusive.
The aquatic geochemical literature on protein and amino acid degradation is extensive. Heterotrophic bacteria first secrete free or tethered endoproteases that break large proteins into smaller peptides (Hollibaugh and Azam, 1983; Hoppe et al., 2002; Obayashi and Suzuki, 2008). Exopeptidases then cleave individual amino acids from the termini, as shown via mass spectral detection of marine detrital protein (Nunn et al., 2003; Roth and Harvey, 2006), and peptides smaller than approximately 600 Da can be taken up by heterotrophic bacteria and some phytoplankton (Weiss et al., 1991; Mulholland and Lee, 2009). Encapsulation and/or sorption interactions with organic matter and inorganic mineral surfaces may limit bacterial enzyme access to (and degradation of) proteins and proteinaceous compounds (Mayer, 1994; Nagata and Kirchman, 1996; Hedges et al., 2001; Knicker et al., 2001).
With the application of metaproteomic tools to marine detrital systems, peptides sourced from algal organelles and membranes have emerged as more resistant to rapid degradation in both environmental and laboratory studies (Yamada and Tanoue, 2003; Nunn et al., 2010; Moore et al., 2014). Certain secondary structures, such as α-helices and random coils, appear to be retained within detritus while other motifs are degraded (Nunn et al., 2010), and peptides containing modified amino acids have been shown to accumulate in detrital pools of marine organic matter (Yamada and Tanoue, 2003; Liu et al., 2010; Abdulla et al., 2018; Duffy et al., in press).
Previous studies applying metaproteomic techniques to marine detritus have generally added trypsin or other proteolytic enzymes to break proteins into smaller constituent peptides (Nunn et al., 2010; Moore et al., 2012, 2014). This proteolytic approach is typical of "bottom-up" liquid chromatography-mass spectrometry (LC-MS) proteomics experiments, which analytically require small, ionizable peptides for detection rather than bulky and complex intact proteins (Laskay et al., 2013; Saito et al., 2019). However, this analytical requirement may obscure intriguing dynamics of natural protein breakdown and digestion in the environment. In the present study, we complement existing protein degradation studies by additionally searching for small peptides produced solely by microbial degradation during the experiment. This naturally produced peptide pool represents a combination of endogenous small peptides and peptides that have been liberated from larger proteins but are not yet fully degraded. We leverage the amino acid sequences obtained from metaproteomic techniques to interrogate the subcellular origins, secondary structures, modifications, and amino acid compositions of the degrading algal proteinaceous material over time.

MATERIALS AND METHODS
The experimental workflow can be divided into four phases: the algal degradation experiment itself; amino acid and peptide extraction; peptide identification via a combination of database searching and de novo peptide sequencing; and data analysis.

Algal Degradation
A culture of the centric diatom Thalassiosira weissflogii was grown to approximately 1 × 10⁶ cells/mL, concentrated by centrifugation, and frozen at −80 °C to render the algal culture non-viable. Aliquots were thawed and resuspended to 2 g/L (dry weight) in 1 µm-filtered, UV-sterilized seawater from the Damariscotta Estuary, Gulf of Maine. Unsterilized seawater filtered to 1 µm was used as an inoculum to induce bacterial decomposition of the algal detritus, with 1 mL added to 15 L of algal suspension. At each time point (1, 2, 5, and 12 days), 100 mL of the degradation slurry was vacuum filtered onto 25 mm diameter, 0.3 µm pore size glass fiber membranes (Sterlitech GF75) at 4 °C and stored frozen prior to proteomic extraction and analysis. Further experimental details about the degradation can be found in Adams et al. (2019).
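As a quick check on the scale of the experiment, the nominal amounts in this paragraph can be worked through in a few lines. This is a back-of-the-envelope sketch that assumes the 2 g/L concentration applies at the start of the incubation and ignores mass lost to degradation; the constant and function names are illustrative only.

```python
# Nominal values stated in the Algal Degradation methods.
ALGAL_DRY_WEIGHT_G_PER_L = 2.0  # resuspended algal detritus, g dry weight per L
SAMPLE_VOLUME_ML = 100.0        # slurry volume filtered at each time point
INOCULUM_ML = 1.0               # unsterilized, 1 um-filtered seawater inoculum
SUSPENSION_L = 15.0             # volume of algal suspension receiving the inoculum


def dry_mass_per_sample_mg(conc_g_per_l, volume_ml):
    """Nominal dry mass (mg) collected on one filter, ignoring degradation losses."""
    return conc_g_per_l * volume_ml  # g/L is numerically equal to mg/mL


def inoculum_dilution(inoculum_ml, suspension_l):
    """Fold dilution of the seawater inoculum in the algal suspension."""
    return suspension_l * 1000.0 / inoculum_ml


print(dry_mass_per_sample_mg(ALGAL_DRY_WEIGHT_G_PER_L, SAMPLE_VOLUME_ML))
print(inoculum_dilution(INOCULUM_ML, SUSPENSION_L))
```

Under these assumptions, each 100 mL sample initially carried about 200 mg (dry weight) of algal material, and the microbial inoculum was diluted about 15,000-fold.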

Total Hydrolyzable Amino Acids
Approximately 30 mg of material scraped from a filter was hydrolyzed in 6 N HCl as described in Cowie and Hedges (1992). Amino acids were derivatized as in Gray et al. (2017) using the AccQ Tag Ultra derivatization kit from Waters (Milford, Massachusetts). Amino acids were separated and quantified via LC-MS using a full scan method in positive ion mode, with a scan range of 100-600 m/z and a resolution of 60,000, on a Thermo Scientific Q Exactive HF Orbitrap.

Proteomic Sample Extraction
Peptides were extracted with and without the serine protease trypsin, which specifically cleaves on the carboxyl (C-terminal) side of lysine and arginine, except when either residue is followed by proline. Peptides created by trypsin digestion are usually between 700 and 1,500 Da (6-14 amino acids), an ideal range for mass spectrometric detection (Laskay et al., 2013). Approximately 100 mg of material was scraped from a filter into detergent-free 50 mM ammonium bicarbonate lysis buffer with 5 mM ethylenediaminetetraacetic acid (EDTA). The chilled suspension was lysed via three cycles each of mechanical disruption with silica beads (50% 100 µm diameter and 50% 400 µm), freeze-thawing in a methanol-dry ice bath, and 30 s in a high-power water bath sonicator. The supernatant was removed after centrifugation at 4,800 rpm, and the protein concentration was determined with a modified Lowry assay using reagents from Bio-Rad (Hercules, California). Extracted protein was subjected to reduction of disulfides with dithiothreitol and carbamidomethylation with iodoacetamide. One half (300 µL) of the protein extract was subjected to in-solution digestion with trypsin (Promega Gold) overnight at room temperature at a trypsin-to-protein ratio of 1:25 (1 µg trypsin per 25 µg total protein) in 50 mM ammonium bicarbonate buffer with 5 mM EDTA. The remaining 300 µL of protein extract was not treated with trypsin. Both trypsin and no-trypsin treatments were desalted using NestGroup macro-spin C18 columns (Southborough, Massachusetts) and resuspended in 5% acetonitrile with 0.1% formic acid and Waters Hi3 E. coli peptide standard mixture (100 fmol/L).
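The trypsin cleavage rule described above can be illustrated with a minimal in silico digestion. This is a sketch, not the digestion routine used inside Peaks Studio: the function name is hypothetical, the default 6-14 residue length filter mirrors the typical tryptic peptide range quoted above, and the `missed_cleavages` argument corresponds to the "maximum 2 missed cleavages" allowed in the tryptic searches.

```python
def tryptic_digest(sequence, missed_cleavages=0, min_len=6, max_len=14):
    """In silico trypsin digestion (hypothetical helper).

    Cleaves on the C-terminal side of K or R unless the next residue is P,
    then returns every peptide spanning up to `missed_cleavages` internal
    cleavage sites whose length lies within [min_len, max_len].
    """
    seq = sequence.upper()
    # Positions where a new peptide begins (just after a cleavable K/R).
    sites = [i + 1 for i in range(len(seq) - 1)
             if seq[i] in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for start in range(len(bounds) - 1):
        # A peptide may run past up to `missed_cleavages` cleavage sites.
        last = min(start + 1 + missed_cleavages, len(bounds) - 1)
        for end in range(start + 1, last + 1):
            pep = seq[bounds[start]:bounds[end]]
            if min_len <= len(pep) <= max_len:
                peptides.append(pep)
    return peptides


# The K-P bond is protected from cleavage; the R-G bond is cleaved.
print(tryptic_digest("ACDKPEFRGHK", min_len=1, max_len=50))
print(tryptic_digest("ACDKPEFRGHK", missed_cleavages=1, min_len=1, max_len=50))
```

In the example, the lysine followed by proline is skipped, so the only cleavage is after arginine; allowing one missed cleavage adds the full-length sequence as a candidate peptide.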
Reverse-phase liquid chromatography-high resolution mass spectrometry (LC-HRMS) analysis was performed in duplicate with a Thermo Scientific (Waltham, Massachusetts) EASY-nanoLC system coupled to a Thermo Orbitrap Fusion Tribrid HRMS. Peptides were separated on a home-packed analytical column consisting of a 37 cm long, 75 µm i.d. fused-silica capillary packed with C18 particles (Magic C18AQ, 100 Å, 5 µm; Michrom) coupled to a 4 cm long, 100 µm i.d. precolumn (Magic C18AQ, 200 Å, 5 µm; Michrom). Solvent A (100% LC/MS grade water with 0.1% formic acid) and solvent B (100% LC/MS grade acetonitrile with 0.1% formic acid) were used to elute peptides over a 120-min gradient from 5 to 35% B. All analyses were carried out in positive mode at a nanospray ionization (NSI) spray voltage of 2 kV. Data-dependent acquisition (DDA) of the top 10 ions was carried out with higher-energy collisional dissociation (HCD) fragmentation for one injection and electron transfer dissociation (ETD) fragmentation for the duplicate injection. The MS1 (parent peptide ion) scan range was 400-2,000 m/z.

Proteomic Data Analysis
Peptides were identified from the raw mass spectra using a combination of database-driven searching and database-independent de novo sequencing. De novo peptide sequencing, in which the amino acid sequence of a peptide is determined solely from its mass spectrum without comparison to a reference database (Allmer, 2011), was advantageous in this study because we lacked a paired metagenome to thoroughly characterize the microbial community from the seawater inoculum (Muth et al., 2015, 2018; O'Bryon et al., 2020). Combining database searching with de novo sequencing is termed de novo-directed proteomics, and was performed using PeaksDB within Peaks Studio (v8.5; Bioinformatics Solutions, Waterloo, Canada; Zhang et al., 2012). The de novo-directed approach has been shown to significantly improve sensitivity and accuracy compared with conventional database search techniques (Zhang et al., 2012).
For database searches, we used a reference protein database of 84,000 sequence entries predicted from transcriptomes of eight T. weissflogii strains in the Marine Microbial Eukaryote Transcriptome Sequencing Project (NCBI BioProject PRJNA248394; Keeling et al., 2014). To help identify the seawater microbes degrading the algal detritus, we added two Gulf of Maine surface seawater metagenomes (Yooseph et al., 2007) from the Global Ocean Sampling (GOS) expedition to the reference database. Additionally, we searched against a database of common mass spectral contaminants (Mellacheruvu et al., 2013). Search parameters for both database searching and de novo sequencing included a maximum of 8 modifications per peptide, 15 ppm peptide mass tolerance, and 0.5 Da fragment mass tolerance. For the trypsin-digested fractions, we performed both tryptic (maximum 2 missed cleavages) and non-enzymatic searches; in the latter, all possible peptides up to 60 residues were considered. For the fractions not treated with trypsin, only non-enzymatic searches were performed. Results from technical replicates and fragmentation strategies were combined.
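The 15 ppm precursor tolerance is relative, so the absolute window widens with peptide mass, whereas the 0.5 Da fragment tolerance is absolute. The sketch below shows the standard ppm calculation; the function names are illustrative, and 1,114 Da (the average trypsin-digested peptide mass reported in the results) is used only as an example.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass, theoretical_mass, tol_ppm=15.0):
    """True if the observed precursor mass is within +/- tol_ppm of the theoretical mass."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm


def tolerance_window_da(mass, tol_ppm=15.0):
    """Half-width (Da) of the ppm tolerance window at a given mass."""
    return mass * tol_ppm / 1e6


print(round(tolerance_window_da(1114.0), 5))  # about 0.0167 Da
print(within_tolerance(1000.010, 1000.0))     # 10 ppm error, accepted
print(within_tolerance(1000.020, 1000.0))     # 20 ppm error, rejected
```

At a typical peptide mass, the 15 ppm precursor tolerance therefore corresponds to a window of only a few hundredths of a dalton.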
Peptide identification confidence is calculated differently for database search-identified and de novo sequenced peptides. For de novo identifications, an amino acid-level confidence score is calculated from the mass deviation of spectral features and expressed as a percentage. We accepted de novo peptides with >80% average residue local confidence (ALC) and no single amino acid score <50%. For database searches, the false discovery rate (FDR) was set to <1.0% using a reversed-database target-decoy strategy (i.e., searching against reversed reference protein sequences; Elias and Gygi, 2010). De novo sequencing was also incorporated into the database searches, as PeaksDB first compares de novo sequences to the reference database to find approximate matches and reduce the search space. Agreement between de novo sequences and database search hits is also used, in part, to generate peptide-level confidence scores derived from the p-value indicating the statistical significance of the peptide-spectrum match (the −10 lgP score). The threshold was a −10 lgP score >20, which corresponds to a p-value of 0.01, i.e., a 1% probability that a match of this quality would arise by chance (Zhang et al., 2012).
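The three acceptance criteria in this paragraph can be sketched as follows. This is an illustration under stated assumptions, not Peaks Studio's internal implementation: the function names are hypothetical, residue confidences are assumed to be given in percent, and the FDR estimate uses the simple decoys/targets ratio of a target-decoy search, ignoring score ties.

```python
def minus10lgp_to_p(score):
    """Convert a -10*log10(p) score back to a p-value (20 -> 0.01)."""
    return 10 ** (-score / 10)


def accept_de_novo(residue_scores, min_mean=80.0, min_residue=50.0):
    """De novo filter: mean residue confidence above min_mean and
    no individual residue below min_residue (scores in percent)."""
    mean = sum(residue_scores) / len(residue_scores)
    return mean > min_mean and min(residue_scores) >= min_residue


def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """psms: iterable of (score, is_decoy) peptide-spectrum matches.

    Walk from the best to the worst score and return the lowest score at
    which the estimated FDR (decoys / targets) is still <= max_fdr,
    or None if no cutoff satisfies it.
    """
    cutoff = None
    targets = decoys = 0
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


print(minus10lgp_to_p(20))
print(accept_de_novo([90, 85, 95, 70]))
print(score_cutoff_at_fdr([(50, False), (40, False), (30, True), (20, False)]))
```

In the toy example, the single decoy at score 30 pushes the estimated FDR above 1%, so only matches scoring 40 or higher would be retained.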
Protein identification is notoriously difficult in samples containing many different organisms because some peptides are shared among proteins from multiple taxa. We therefore required a minimum of one unique peptide per protein identification. Matching of proteins from peptides found only by de novo sequencing is detailed below.
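The one-unique-peptide rule can be expressed as a simple filter over a peptide-to-protein map. The sketch below is hypothetical (names and toy identifiers are placeholders) and omits the protein-grouping logic that real search engines apply to shared peptides.

```python
from collections import defaultdict


def proteins_with_unique_peptides(peptide_to_proteins, min_unique=1):
    """Return proteins supported by at least `min_unique` peptides that map
    to that protein and to no other protein in the database."""
    unique_counts = defaultdict(int)
    for proteins in peptide_to_proteins.values():
        if len(proteins) == 1:
            (protein,) = proteins
            unique_counts[protein] += 1
    return {p for p, n in unique_counts.items() if n >= min_unique}


# Toy map: pep2 is shared between protA and protB, so protB has no unique evidence.
mapping = {"pep1": {"protA"}, "pep2": {"protA", "protB"}, "pep3": {"protC"}}
print(sorted(proteins_with_unique_peptides(mapping)))
```

Here protB is not reported because its only supporting peptide is shared with protA.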

Searching for Amino Acid Modifications
Since amino acids are frequently modified after translation, either for cell-specific purposes or during degradation, a computationally efficient method is needed to search for the myriad possible post-translational modifications (PTMs).
We used the open modification search tool PeaksPTM (Han et al., 2011), with parameters set to tryptic or non-enzymatic constraint, 2 or fewer missed cleavages, 15 ppm peptide mass tolerance, 0.2 Da fragment mass tolerance, minimum A-score > 200 (a measure of modification-site localization confidence), fixed carbamidomethylation of cysteine, and variable oxidation of methionine. Based on the PeaksPTM results, the most frequently occurring modifications were used to evaluate whether adding more PTMs to the overall searches altered the rate of false discovery. That is, we sought the optimal balance between searching for PTMs and avoiding a vast search space that decreases sensitivity (Noble, 2015; Timmins-Schiffman et al., 2017). A series of PeaksDB searches and Peaks de novo sequencing runs of the combined data set (using the same search parameters as above) with increasing numbers of variable modifications was performed to find the optimal set of PTMs to include (i.e., the set yielding the most peptide identifications at <1% FDR). Based on these PTM ramping results, 10 variable PTMs were selected for the final searches. In addition to methionine oxidation, they were: deamidation of asparagine, methylation of arginine, oxidation of tyrosine, methylation of lysine, oxidation of lysine, oxidation of arginine, oxidation of proline, acetylation of lysine, and glutamine cyclization (the conversion of N-terminal glutamine to pyroglutamate).
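Each modification in the final search list changes a peptide's mass by a fixed amount, which is how search engines detect it. The sketch below uses standard monoisotopic residue masses and Unimod-style mass deltas for the modifications named above; the function and dictionary names are illustrative, and the code does not check that a modification is chemically allowed on the residue it is assigned to.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per linear peptide (free N and C termini)

# Monoisotopic mass shifts (Da) for the modifications used in the searches.
MOD_DELTA = {
    "Carbamidomethyl": 57.021464,  # fixed, Cys
    "Oxidation": 15.994915,        # Met, Tyr, Lys, Arg, Pro
    "Deamidated": 0.984016,        # Asn
    "Methyl": 14.015650,           # Arg, Lys
    "Acetyl": 42.010565,           # Lys
    "Gln->pyro-Glu": -17.026549,   # N-terminal Gln
}


def peptide_mass(seq, mods=None):
    """Neutral monoisotopic peptide mass.

    Every Cys carries the fixed carbamidomethylation; `mods` maps a 0-based
    residue index to a variable modification name (not validated per residue).
    """
    mods = mods or {}
    mass = WATER + sum(RESIDUE_MASS[aa] for aa in seq)
    mass += MOD_DELTA["Carbamidomethyl"] * seq.count("C")
    mass += sum(MOD_DELTA[name] for name in mods.values())
    return mass


print(round(peptide_mass("ANR"), 5))
print(round(peptide_mass("ANR", {1: "Deamidated"}) - peptide_mass("ANR"), 6))
```

Every added variable modification multiplies the number of candidate masses per peptide, which is why the PTM ramping described above was used to cap the list.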

Mapping de novo Peptides to Proteins
The de novo-directed PeaksDB workflow used here outputs both peptides matched to the database and de novo peptides (sequences only, without protein assignments). To identify peptides from diatom detritus or bacterial proteins not found by database searching, the de novo peptides were aligned to the reference database (T. weissflogii transcriptomes with GOS Gulf of Maine metagenomes) using PepExplorer (Leprevost et al., 2014). PepExplorer accounts for common de novo sequencing errors and limitations (such as leucine/isoleucine equivalence or other combinations of amino acids having the same mass) and identifies regions of local similarity between sequences. We also aligned the sequences against a reversed version of the database to estimate a false discovery rate, which was kept <1%. Alignments were accepted at a 95% identity cutoff, and protein identification required at least one unique peptide alignment.
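To make the 95% identity cutoff concrete, the sketch below scores a de novo peptide against a protein by ungapped sliding-window identity, treating isobaric leucine and isoleucine as identical. This is a deliberately simplified stand-in, not PepExplorer's alignment algorithm: it ignores gaps and other mass-equivalent residue combinations, and the function names are hypothetical.

```python
def normalize(seq):
    """Treat isobaric leucine/isoleucine as identical (I -> L)."""
    return seq.upper().replace("I", "L")


def best_ungapped_identity(peptide, protein):
    """Highest fraction of identical residues between the peptide and any
    equal-length window of the protein (no gaps allowed)."""
    pep, prot = normalize(peptide), normalize(protein)
    best = 0.0
    for start in range(len(prot) - len(pep) + 1):
        window = prot[start:start + len(pep)]
        matches = sum(a == b for a, b in zip(pep, window))
        best = max(best, matches / len(pep))
    return best


def accept_alignment(peptide, protein, cutoff=0.95):
    return best_ungapped_identity(peptide, protein) >= cutoff


# A de novo "L" matches a database "I" at the same position.
print(best_ungapped_identity("PEPTLDE", "MKPEPTIDEKR"))
```

Under this simplified scoring, a single mismatch drops a peptide shorter than 20 residues below 95% identity, so most tryptic-length peptides would effectively need an exact (I/L-tolerant) match.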
Many de novo peptides did not match the reference database, which was not unexpected given the small number of heterotrophic bacterial sequences in the database and the more open-ocean character of the GOS sampling locations (Yooseph et al., 2007) compared with the estuarine seawater inoculum. To overcome this limitation, we additionally mapped the de novo peptides to proteins in the entire UniProt Knowledgebase (The UniProt Consortium, 2018), which contains over 200 million sequences from thousands of taxa. The mapping was performed using the UniPept lowest common ancestor tool (Mesuere et al., 2016), which is built specifically for metaproteomic data and determines the taxonomic origin of each peptide to the lowest possible phylogenetic rank (because some peptides are highly conserved, they may be assigned only to "Bacteria" or even to "Organism" rather than to a genus or species). Given the lack of genes or transcripts from which to build an ideal reference database, the UniPept output provides the best view of the bacteria present in the experiment.
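The lowest common ancestor (LCA) idea behind this taxonomic assignment can be sketched in a few lines: a peptide found in proteins from several taxa is assigned to the deepest rank shared by all of their lineages. This is a simplified illustration, not UniPept's implementation (which handles missing and invalid ranks); the lineages below are shortened, hypothetical examples.

```python
def lowest_common_ancestor(lineages):
    """lineages: root-to-leaf lists of taxon names, one per matching protein.
    Returns the deepest taxon shared by every lineage, or "root" if none."""
    shared = None
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            shared = ranks[0]
        else:
            break
    return shared if shared is not None else "root"


# Simplified lineages for illustration only.
gamma = ["cellular organisms", "Bacteria", "Proteobacteria", "Gammaproteobacteria"]
alpha = ["cellular organisms", "Bacteria", "Proteobacteria", "Alphaproteobacteria"]
diatom = ["cellular organisms", "Eukaryota", "Bacillariophyta"]

print(lowest_common_ancestor([gamma, alpha]))   # shared down to the phylum
print(lowest_common_ancestor([gamma, diatom]))  # conserved across domains
```

A peptide conserved between bacteria and diatoms therefore resolves only to a very broad rank, which is why some peptides are assigned only to "Bacteria" or "Organism."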

Gene Ontology Terms and Secondary Structures
To obtain gene ontology (GO) term annotations, the peptide sequences were aligned to the UniProt protein database using UniPept's metaproteomic functional analysis tool (Gurdeep Singh et al., 2019), which builds on the lowest common ancestor peptide search described above. The GO categories were condensed from the broader set to eliminate redundancy using the REViGO tool (http://revigo.irb.hr/) and manual curation.
Protein secondary structures for all samples were estimated using the Proteus2 algorithm (Montgomerie et al., 2008) for the combined set of proteins identified by PeaksDB database searching and by de novo sequencing with PepExplorer database mapping. The output assigns each amino acid to its most likely structure class: random coil, α-helix, β-strand, membrane α-helix, or membrane β-strand.
Spectral files, databases, and peptide identification (pepXML) files have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the data set identifier PXD027843.

General Characteristics of the Degradation
As previously described (Adams et al., 2019), algal biomass decreased by a factor of 2.4 during the 12-day incubation, while bacterial abundance increased by a factor of 5. Total particulate carbon decreased by 31% over the 12 days (Figure 1). Overall, the algal detritus was partly remineralized via respiration and partly converted to bacterial biomass, but mostly remained as necromass (Adams et al., 2019). The degradation experiment started with a small inoculum of bacteria added to a detrital slurry that contained some algal proteases present at the time of diatom cell harvesting. The algal detritus also likely contained small amounts of bacteria (or bacterial necromass) from the diatom culture, which was not confirmed to be axenic. The living bacteria then responded to the detritus and began to grow exponentially (Figure 1), releasing enzymes to break down and take up the diatom organic matter. The ratio of enzymatically hydrolyzable amino acids (a biomimetic measurement of protein bioavailability; see Mayer et al., 1995) to organic carbon content remained constant throughout the experiment (Adams et al., 2019), indicating that there was no selective remineralization or preservation of proteins relative to other organic material at this early stage of degradation. However, by looking closely at peptides, we show that numerous compositional changes occurred in the protein pool during the 12-day period.
Our interpretation of protein dynamics during the degradation experiment relies on the distinction between two pools of peptides: those produced in situ during the experiment (not treated with trypsin; hereafter "naturally digested peptides") and those produced artificially by laboratory digestion (hereafter "trypsin-digested peptides"). We hypothesize that the naturally digested peptides represent a mixture of proteinaceous material being accessed by the heterotrophic bacteria and peptides that have been released from larger proteins. The trypsin-digested peptides should represent all digestible proteinaceous material present at each time point, including material that is not being actively degraded (e.g., material that has so far resisted degradation). Differences between these two diatom protein-derived pools provide information about the relative labilities of the substrate proteins, as well as about the strategies bacteria use to degrade the diatom necromass.
Both the trypsin-digested and naturally digested peptide pools should also contain peptides from the bacteria growing in the experiment. Based on cell counts and estimated carbon per diatom and bacterial cell, there was about 150 times more algal protein than bacterial protein at the day 0 time point, decreasing to about 15 times more after 12 days (Adams et al., 2019). These bacterial peptides can be used to determine the types of bacteria present during the experiment and the cellular functions they perform (e.g., Bergauer et al., 2017; Mikan et al., 2020). In the following section, we examine the peptide results with the goal of better understanding the degradative processes occurring over the course of the experiment and their implications for detrital protein cycling in marine systems.

Algal Peptide Identification and Characteristics
The naturally digested and trypsin-digested peptides are methodologically distinct yet analytically overlapping pools. The naturally digested peptides are also present in the background of the trypsin-digested fraction, since the incubation preceded proteomic sample extraction and laboratory digestion. The MS1 detection window is 400-2,000 m/z. Peptides produced by trypsin digestion are typically 6-14 amino acids in length; in these data, the average trypsin-digested peptide length was 10.3 amino acids and the average peptide (MS1) mass was 1,114 Da. By contrast, peptides in the naturally digested fractions had an average length of 10.1 amino acids and an average mass of 1,082 Da. Polypeptides created by microbial degradation, if large enough, would likely contain sites available for trypsin cleavage. Thus, the two pools overlap further because some larger naturally digested peptides are further digested by trypsin in the trypsin-digested fraction.
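Whether a peptide falls inside the 400-2,000 m/z window depends on its charge state as well as its mass. The sketch below applies the standard relation m/z = (M + z × 1.007276)/z for protonated ions; it assumes the reported 1,114 Da average is a neutral monoisotopic mass, and the function names are illustrative.

```python
PROTON = 1.007276  # proton mass, Da


def mz(neutral_mass, charge):
    """m/z of the [M + zH]^z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def observable_charges(neutral_mass, charges=range(1, 6), scan=(400.0, 2000.0)):
    """Charge states whose m/z falls inside the MS1 scan range."""
    lo, hi = scan
    return [z for z in charges if lo <= mz(neutral_mass, z) <= hi]


for z in (1, 2, 3):
    print(z, round(mz(1114.0, z), 3))
print(observable_charges(1114.0))
```

For a peptide of about 1,114 Da, the singly and doubly charged ions (about 1,115 and 558 m/z) fall inside the window, while the triply charged ion (about 372 m/z) falls below it.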
The numbers of identified diatom peptides and proteins peaked at day 2, during the initial phase of exponential bacterial growth, and then decreased to their lowest values at day 12 (Table 1). We sorted the algal peptides by their gene ontology cellular component terms (GO terms; Figure 2; see also Riffle et al., 2017; Mikan et al., 2020). The data were aggregated into eight major cellular compartments, and the three numerically dominant groups were chloroplast, cell membrane, and cytoplasmic proteins (Figure 2). For simplicity, proteins associated with the membranes of membrane-bound organelles were assigned to that organelle; for example, thylakoid membrane GO terms were classified under "Chloroplast." The trypsin-digested day 0 fraction had a cellular component GO term distribution consistent with that of living diatom cells: alongside the algal GO terms from the degradation experiment, we plot those from peptides obtained from two Thalassiosira pseudonana proteomes published by Dyhrman et al. (2012) (Figure 2).
The naturally digested peptides contained far fewer GO term identifications after day 2, and by day 12 the only identifications in that fraction were from chloroplast and cell membrane peptides. This indicates that peptides and proteins from other cellular components were effectively degraded during the initial phase of exponential bacterial growth (Figure 1) and were no longer being actively degraded at day 12 because they were no longer present. Consistent with this membrane-recalcitrance hypothesis, the number of identifications of trypsin-digested peptides sourced from larger proteins also decreased for most cellular compartments other than membranes and the chloroplast (Figure 2). However, some trypsin-digested peptides from mitochondrial and cytoplasmic components were still detected at day 12, suggesting that cellular location alone does not necessarily determine whether a protein or peptide will survive the initial stages of environmental degradation. The number of trypsin-digested peptides of cytoplasmic origin also dropped precipitously after day 2, while chloroplast and membrane peptides remained abundantly detectable throughout the 12-day experiment (Figure 2). This pattern indicates a shift in the detrital protein pool from one that mimics an intact living cell to one dominated by membrane-associated proteins. These findings are consistent with Laursen et al. (1996), who physically separated subcellular fractions of phytoplankton cells and found membrane proteins to be the more refractory fraction of phytoplankton protein. They reported higher proteolysis rate constants for the cytoplasmic fraction (>1.2 h⁻¹) than for the membrane fraction (0.2-1 h⁻¹), which correlated negatively with the ratio of chlorophyll to pheophytin (Laursen et al., 1996).
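To put the Laursen et al. (1996) rate constants in more intuitive terms, they can be converted to half-lives. This sketch assumes simple first-order decay, C(t) = C₀·exp(−kt), so that t½ = ln 2 / k; the function names are illustrative.

```python
import math


def half_life_h(k_per_h):
    """Half-life (h) for first-order decay with rate constant k (h^-1)."""
    return math.log(2) / k_per_h


def fraction_remaining(k_per_h, t_h):
    """Fraction of substrate left after t hours of first-order decay."""
    return math.exp(-k_per_h * t_h)


for label, k in [("cytoplasmic (lower bound)", 1.2),
                 ("membrane (upper bound)", 1.0),
                 ("membrane (lower bound)", 0.2)]:
    print(label, round(half_life_h(k), 2), "h",
          round(fraction_remaining(k, 2.0), 3), "left after 2 h")
```

Under this assumption, cytoplasmic protein with k ≥ 1.2 h⁻¹ has a half-life of at most about 35 minutes (about 9% left after 2 h), whereas membrane protein at k = 0.2 h⁻¹ has a half-life of about 3.5 h (about 67% left after 2 h).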
Studies of the nutritional value of algae to zooplankton and other animals also suggest preferential consumption of soluble protein. In an evaluation of elemental uptake from diatom detritus, Reinfelder and Fisher (1991) showed that the metal assimilation efficiencies of marine copepods were directly related to the cytoplasmic content of diatoms, indicating that the copepods obtained nearly all of their nutrition from the diatom cytoplasm rather than from other cellular constituents. Our experiment did not involve zooplankton grazing, so these results cannot easily be extended to those settings, since larger organisms' gut enzymes and mechanical maceration strategies were not part of our experimental system. However, the selective consumption of cytoplasmic protein components does suggest that the solubility or structure of membrane proteins influences their differential consumption by bacteria.

Preserved Protein Motifs
To further interrogate the proteinaceous material that survived degradation, five classes of peptide secondary structure were evaluated: random coil, α-helices, β-strands, transmembrane α-helices, and transmembrane β-strands. Closely packed α-helices form strong hydrogen bonds between weakly polar (Ser, Thr, Cys) and strongly polar (Asn, Gln, Glu, Asp, His, Arg, Lys) amino acid residues of neighboring helices (Liu et al., 2004). Helices are commonly found in transmembrane proteins (Sakai and Tsukihara, 1998; Stevens and Arkin, 2000), and their ability to bend can accommodate hydrophobic mismatch with the lipid bilayer (Yeagle et al., 2007). In contrast, β-strands lie relatively flat and have been hypothesized, in a marine context, to adhere to mineral surfaces, ultimately aiding their protection and resulting in enhanced preservation (Shamblin et al., 1998; Oleschuk et al., 2000; Ovesen et al., 2011). "Random coil" is not a true secondary structure, but rather an aggregate term for short sequences lacking helix or sheet character (Smith et al., 1996). Benchmarking of the Proteus2 machine-learning algorithm used here showed per-residue prediction accuracies of 87-91% for transmembrane α-helices, 86% for transmembrane β-strands, and 88% for non-membrane secondary structures (Montgomerie et al., 2008).

FIGURE 2 | … Table 1. GO terms shown were condensed from a broader set to eliminate redundancy for ease of visualization using REViGO (http://revigo.irb.hr/) and further manually organized into broad categories.
We note a progressive change in secondary structure distribution as the degradation proceeded, with membrane α-helices becoming more important (Figure 3). At the same time, β-strands became a less common motif. This trend is consistent with the GO term evidence that membrane proteins are preferentially retained in the system, and suggests that their tightly wrapped, difficult-to-denature secondary structure could aid their preservation. Random coils are the most common motif, and their relative importance does not change over time, indicating either that they are neither particularly resistant nor particularly susceptible to degradation in this experiment, or that the coil category is too broad to capture any selective processes. These results are generally consistent with those of Nunn et al. (2010) from a similar degradation experiment with a marine diatom. They identified 23 and 4 diatom peptides after 10 and 23 days of incubation with a seawater microbiome, respectively; of the 4 diatom peptides surviving at the end of the experiment, 2 had transmembrane domains (Nunn et al., 2010).
To further investigate the hypothesis that secondary structure is related to enhanced preservation of membrane protein components, we also compared the predicted secondary structure of algal chloroplast/integral-component-of-membrane peptides with that of algal cytoplasmic peptides identified at each time point of the degradation (Figure 3). Chloroplast and membrane peptides, which made up the bulk of diatom peptide identifications on day 12 (from 167 individual proteins across both trypsin-digested and naturally digested fractions), were 50.7% coil, 36.3% α-helix, 3.02% β-strand, 9.93% transmembrane α-helix, and 0% membrane β-strand. In comparison, the cytoplasmic peptides identified in the initial day 0 proteome (from 98 proteins across both fractions) were enriched in β-strands and depleted in α-helices and transmembrane α-helices, with 48.3% coil, 23.8% α-helix, the remainder β-strand, 0% transmembrane α-helix, and 0% transmembrane β-strand. Supporting the hypothesis that α-helices are linked to preservation, the few cytoplasmic peptides surviving on day 12 (3 proteins total, all in the trypsin-digested fraction) had roughly 10 percentage points more α-helix character than the initial day 0 cytoplasmic peptides, at 40.6% coil, 33.1% α-helix, 26.3% β-strand, 0% transmembrane α-helix, and 0% transmembrane β-strand (Figure 3). These findings are consistent with the notion that hydrophobic interactions are important in preserving membrane-associated proteins and peptides during early diagenesis (Nguyen and Harvey, 2001; Nunn et al., 2010). Together, peptide subcellular localization annotations and secondary structure analyses connect previous observations of selective preservation of membrane-associated proteins in bacteria (Nagata et al., 1998; Kaiser and Benner, 2008; Jiao et al., 2010) and phytoplankton (Laursen et al., 1996; Nguyen and Harvey, 2001; Wolfe et al., 2006) to molecular characteristics such as three-dimensional shape and hydrophobicity.
FIGURE 3 | (A,B) Secondary structure predictions of algal peptides identified in trypsin-digested and naturally digested fractions at four points during the 12-day degradation experiment. Secondary structure motifs (coil, α-helix, β-strand, membrane α-helix, membrane β-strand) were predicted from full protein sequences using Proteus2 (Montgomerie et al., 2008), and the relative contribution of each motif was determined by the identifying peptide's relative peptide area abundance. (C,D) Secondary structure predictions of only algal chloroplast and membrane peptides and algal cytoplasmic peptides identified at each time point of the degradation. For this comparison, proteins identified in the trypsin-digested and naturally digested fractions were combined and adjusted by relative peptide area abundance.
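The abundance weighting described for Figure 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each identified protein already carries the fraction of its residues predicted to adopt each motif (for example, exported from a Proteus2 prediction of the full sequence), and that each protein is weighted by the relative area of its identifying peptide. The input format, the motif labels, and the function name `weighted_motif_composition` are hypothetical.

```python
from collections import defaultdict

# Motif labels mirror the five Figure 3 categories; the names are illustrative.
MOTIFS = ("coil", "alpha_helix", "beta_strand", "tm_alpha_helix", "tm_beta_strand")


def weighted_motif_composition(proteins):
    """Percent contribution of each secondary-structure motif, area-weighted.

    proteins: iterable of (motif_fractions, peptide_area) pairs, where
    motif_fractions maps a motif label to the fraction of the parent
    protein's residues predicted to adopt that motif (hypothetical format).
    """
    weighted = defaultdict(float)
    total_area = 0.0
    for motif_fractions, area in proteins:
        total_area += area
        for motif in MOTIFS:
            # Missing motifs count as zero for that protein.
            weighted[motif] += motif_fractions.get(motif, 0.0) * area
    if total_area <= 0:
        raise ValueError("total peptide area must be positive")
    return {motif: 100.0 * weighted[motif] / total_area for motif in MOTIFS}


# Hypothetical example: two proteins, the first with 3x the peptide area.
example = weighted_motif_composition([
    ({"coil": 0.5, "alpha_helix": 0.3, "beta_strand": 0.2}, 3.0),
    ({"coil": 0.4, "alpha_helix": 0.6}, 1.0),
])
print(example)
```

Weighting by peptide area rather than by protein count means a few highly abundant proteins can dominate a time point's motif profile, which is the intended behavior for a label-free abundance comparison.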
While subcellular location and secondary structure may be significant factors that allow certain proteins to resist the early stages of degradation, two other factors have been reported in the literature: an abundance of post-translationally modified amino acids in degradation-resistant material, and the enrichment of certain amino acids, such as glycine, in the recalcitrant proteinaceous pool (as measured by total hydrolyzable amino acid analysis).

Post-translational Modifications of Amino Acids
Protein PTMs such as oxidation, phosphorylation, and methylation play critical roles in a diverse range of biological processes, such as signaling, protein activity and transport, and regulation of gene expression (Shen et al., 2008; Cain et al., 2014).
PTMs are also associated with in vivo protein recycling and cell senescence (Cain et al., 2014; Dhillon and Denu, 2017). For the purposes of this study, the "PTM" umbrella also includes modifications to amino acids that could occur in detritus, including those resulting from protein consumption or abiotic transformations.
In the marine environment, PTMs have been linked to degradation and early diagenesis. For example, the non-protein amino acid beta-alanine accumulates in the hydrolyzable fraction of marine sedimentary organic matter via modification of aspartic acid (Cowie and Hedges, 1994). Modification of the nitrogen-containing side chains of glutamine, asparagine, and arginine can lead to the accumulation of peptides containing deaminated side chains within anoxic marine sediment pore waters (Abdulla et al., 2018). We recently observed deamidated peptides in sinking POM from the eastern tropical North Pacific oxygen deficient zone (Duffy et al., in press), where Van Mooy et al. (2002) hypothesized that amino acids could be selectively deaminated to provide reduced nitrogen to fuel chemoautotrophic processes. In laboratory experiments, Keil and Kirchman (1992) showed that methylated peptides were accessed less efficiently by bacteria than non-methylated peptides. Glycan modifications contribute to protein stability, a mechanism that has been exploited to extend the shelf life of protein- and peptide-based pharmaceuticals (Zhou and Qiu, 2019). Indeed, Keil and Kirchman (1993) showed that glycosylated ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) was degraded 100 times more slowly than its unmodified counterpart. Altogether, the marine literature suggests that PTMs are highly relevant to protein degradation, as byproducts of protein degradation processes and/or as potential factors in degradation resistance.
In the present study, we first performed a non-discriminatory "open" search of thousands of possible PTMs in the UniMod database using PeaksPTM (see "Materials and Methods"). We then restricted the PTM set used in the final database and de novo peptide searches to the 10 mass modifications that occurred most frequently in the open-search results. Our goal was to identify patterns in PTM distribution to learn more about their roles in degradation and preservation. To that end, we sought to distinguish between PTMs present in algal protein prior to degradation and PTMs associated with the degradative process. We compared the PTMs observed in the naturally digested peptide pool, which should represent proteins being actively degraded by bacteria, against those in the trypsin-digested peptides. This comparison was done for algal protein only; no bacterial proteins were included in this differential analysis of PTMs (Figure 4).
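The top-10 selection step amounts to a frequency count over the open-search output. The sketch below assumes that output has already been parsed into one list of modification labels per peptide-spectrum match; that parsed format, the example labels, and the function name `top_modifications` are hypothetical, and PeaksPTM's own export format is not reproduced here.

```python
from collections import Counter


def top_modifications(open_search_hits, n=10):
    """Return the n most frequent modification labels in open-search results.

    open_search_hits: iterable of per-match lists of modification labels
    (hypothetical parsed format, e.g. ["Oxidation (M)", "Deamidation (N)"]).
    Ties keep first-appearance order, which is how Counter.most_common
    orders elements with equal counts.
    """
    counts = Counter()
    for mods in open_search_hits:
        counts.update(mods)  # count every occurrence, not just unique peptides
    return [label for label, _ in counts.most_common(n)]


# Hypothetical parsed open-search matches.
open_hits = [
    ["Oxidation (M)"],
    ["Oxidation (M)", "Deamidation (N)"],
    ["Methylation (R)"],
    ["Oxidation (M)", "Methylation (R)"],
]
print(top_modifications(open_hits, n=2))
```

Counting every occurrence favors modifications that recur within heavily sampled peptides; counting each modification once per unique peptide would be an equally defensible alternative.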
Oxidation PTMs were generally enriched in the trypsin-digested component, implying that these modifications occurred within the cell prior to heterotrophic attack. They generally did not increase in relative abundance over time in the trypsin-digested peptide pool, suggesting that oxidations do not meaningfully affect the lability of proteinaceous material (Figure 4). Notably, chloroplast protein-derived peptides became increasingly dominant among identifications during the experiment (Figure 2). The algal culture was exposed to light before the degradation experiment, which was conducted in the dark. As an oxygenic process, photosynthesis produces reactive oxygen species and radicals that can damage cells, and amino acid oxidations are frequent PTMs in photosynthesis-associated proteins (Aro et al., 2005; Galetskiy et al., 2011). Similarly, lysine acetylation of chloroplast proteins has been demonstrated in plants (Lehtimäki et al., 2015) and in diatoms (Chen X.-H. et al., 2018), though this PTM also frequently occurs elsewhere in the cell. The abundance of thylakoid membrane and chloroplast-associated proteins that accumulate due to preferential preservation (Figure 2) likely explains why oxidations and lysine acetylations are the dominant PTMs of the trypsin-digested peptides (Figure 4).
In contrast, two PTMs were strongly associated with the presumably more detrital naturally digested peptides: asparagine deamidation and arginine methylation (Figure 4). There are several possible explanations for the observation that asparagines and arginines are more often modified in detrital proteins. These modifications could have occurred within the living cell and marked proteins that were readily accessible to bacteria, which would explain their greater abundance in the naturally digested peptide pool. Alternatively, these PTMs might have been created during the degradation process and could accumulate because, once formed, the modified peptides are degraded more slowly than their unmodified counterparts. This latter explanation would account for the accumulation of deamidated peptides in both sediment pore waters (Abdulla et al., 2018) and sinking particulate matter (Duffy et al., in press). Thus, we hypothesize that deamidation occurs during degradation, though continued research is warranted.
Unlike deamidation, methylation most likely occurs within the living cell, where it is a common PTM used to regulate numerous cellular functions (Ghesquière et al., 2011). Methylated peptides have been shown to be inefficiently assimilated and degraded by heterotrophic bacteria (Keil and Kirchman, 1992), suggesting that PTMs produced within living cells may also help determine the rate of protein degradation.
Our open PTM searches did not yield high levels of protein glycosylation, a very broad classification of modification that entails a sugar-amino acid bond. Although glycosylation is an extremely common enzymatic PTM in both eukaryotes and prokaryotes, the non-enzymatic form of the reaction (sometimes called "glycation," or the Maillard reaction) has been posited as a mechanism of environmental protein preservation via geopolymer formation (Collins et al., 1992; Burdige, 2007). Glycosylation has been shown to decrease microbial protein degradation rates in incubations (Keil and Kirchman, 1993) and has been found in high abundance in seawater (Yamada and Tanoue, 2003). Peptides with glycosylations are difficult to ionize and detect under the mass spectral conditions we used here (Alley et al., 2013), and evaluating them more accurately would require different proteomics preparation techniques (e.g., as performed by Moore et al., 2014). Given the findings here and in the literature, more targeted work is needed to understand the effect that PTMs have during the early stages of protein degradation.
We examined the PTM distribution across peptides from different subcellular compartments (Supplementary Figure 1) and found that (1) most modified peptides came from cell membrane and cytoplasm proteins, and (2) the relative distribution of PTMs across subcellular compartments did not change significantly over the 12-day degradation, even as the overall distribution of peptides among subcellular locations shifted to become more membrane- and chloroplast-dominated (Figure 2). This result suggests that mechanisms other than modification status may be more important to the overall preferential degradation pattern in this system, though the enrichment of certain PTMs in the trypsin-digested or naturally digested peptides (Figure 4) points to subtler preferential degradation patterns among modified peptides.
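One simple way to express the fraction-level PTM enrichment discussed above is a log2 ratio of each PTM's share of modified-peptide area in the naturally digested versus the trypsin-digested fraction. The sketch below assumes that metric; it is an illustration, not necessarily the statistic plotted in Figure 4. The dictionary inputs, the pseudocount, and the function name `ptm_log2_enrichment` are hypothetical.

```python
import math


def ptm_log2_enrichment(natural, trypsin, pseudocount=1e-6):
    """log2 ratio of each PTM's share of modified-peptide area between fractions.

    natural, trypsin: dicts mapping a PTM label to the summed area of modified
    algal peptides carrying it in the naturally digested and trypsin-digested
    fractions. Positive values: relatively enriched in the naturally digested
    (detrital) pool; negative values: enriched in the trypsin-digested pool.
    """
    nat_total = sum(natural.values())
    tryp_total = sum(trypsin.values())
    if nat_total <= 0 or tryp_total <= 0:
        raise ValueError("both fractions need a positive total area")
    enrichment = {}
    for ptm in sorted(set(natural) | set(trypsin)):
        # The pseudocount avoids log(0) when a PTM is absent from one fraction.
        nat_share = natural.get(ptm, 0.0) / nat_total + pseudocount
        tryp_share = trypsin.get(ptm, 0.0) / tryp_total + pseudocount
        enrichment[ptm] = math.log2(nat_share / tryp_share)
    return enrichment


# Hypothetical areas, not values from this study.
scores = ptm_log2_enrichment(
    {"Deamidation (N)": 30.0, "Oxidation (M)": 10.0},
    {"Deamidation (N)": 10.0, "Oxidation (M)": 30.0},
)
print(scores)
```

Normalizing to each fraction's own total makes the comparison about relative composition, so a fraction with more peptide signal overall does not automatically look "enriched" in every PTM.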

Amino Acid Compositions
One of the deepest sets of literature related to protein degradation in marine systems is that of the "total hydrolyzable amino acid" pool (Wakeham et al., 1997;Dauwe and Middelburg, 1998;Dauwe et al., 1999;Lee et al., 2004). THAA analyses show clear trends during long-term carbon degradation and preservation including an accumulation of the amino acids glycine, serine and threonine (Dauwe and Middelburg, 1998), and the creation of the non-protein amino acids beta-alanine and gamma-aminobutyric acid from aspartic and glutamic acids (Cowie and Hedges, 1994).
Despite the widespread use of degradation indices derived from THAA analyses, it has been difficult to reconcile changes in bulk amino acid compositions with known protein amino acid compositions, especially during the very early stages of degradation (Keil et al., 2000). We compared the amino acid composition of the peptides identified during the degradation experiment to the independently measured THAA pool. To facilitate this comparison, we plotted mole fractions of amino acids in the THAA against those in the identified algal peptides, combining the naturally and trypsin-digested amino acid compositions adjusted for relative peptide abundance (Figure 5). The near 1:1 agreement between the two approaches indicates two things: (a) protein amino acid compositions measured using the newer "omic" approach can be effectively integrated into the large body of literature based on THAA analysis, which will become useful as peptidomic approaches are applied to samples further along the degradation pathway (e.g., sediment samples); and (b) while the omic approach identifies specific ways in which protein undergoes degradation in the ocean, the early stages remain remarkably "non-selective" at the bulk molecular level (Hedges et al., 2001).
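A minimal sketch of the peptide-side calculation behind this comparison is shown below, assuming plain one-letter peptide sequences (modification annotations already stripped) paired with relative peptide areas. Because acid hydrolysis converts asparagine to aspartic acid and glutamine to glutamic acid, the sketch pools N with D and Q with E before comparing with THAA; whether the original analysis pooled residues this way is an assumption, as are the `exclude` option for residues a given THAA method does not quantify and the function names.

```python
from collections import defaultdict

# Acid hydrolysis converts Asn -> Asp and Gln -> Glu, so THAA data report
# them combined; peptide residues are mapped the same way (assumed convention).
HYDROLYSIS_POOL = {"N": "D", "Q": "E"}


def weighted_mole_fractions(peptides, pool=HYDROLYSIS_POOL, exclude=frozenset()):
    """Abundance-weighted amino acid mole fractions for identified peptides.

    peptides: iterable of (sequence, area) pairs with plain one-letter sequences.
    exclude: residues not quantified by the THAA method (e.g. {"W", "C"}).
    """
    counts = defaultdict(float)
    for sequence, area in peptides:
        for residue in sequence:
            residue = pool.get(residue, residue)
            if residue not in exclude:
                counts[residue] += area  # each residue weighted by peptide area
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues counted")
    return {aa: n / total for aa, n in counts.items()}


def mean_abs_deviation(thaa, peptide):
    """Mean |THAA - peptide| mole fraction over shared residues (0 = on the 1:1 line)."""
    shared = sorted(set(thaa) & set(peptide))
    if not shared:
        raise ValueError("no residues in common")
    return sum(abs(thaa[aa] - peptide[aa]) for aa in shared) / len(shared)


# Hypothetical peptides and areas.
fractions = weighted_mole_fractions([("GAN", 2.0), ("GQ", 1.0)])
print(fractions)
```

A single summary such as the mean absolute deviation complements, rather than replaces, the per-residue scatter against the 1:1 line.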

Bacterial Community
The microbial inoculum for the degradation experiment was sourced from the Damariscotta River Estuary, whose marine waters come from the Gulf of Maine. Heterotrophic bacteria grew exponentially through the middle stages of the experiment (Figure 1). A peptide-based lowest common ancestor analysis was performed using UniPept (Gurdeep Singh et al., 2019) to assign bacterial taxa to the lowest phylogenetic level possible. The taxonomic hits were then adjusted to account for peptide spectral abundance and aggregated at the class level (Figure 6A). There is precedent for using label-free metaproteomic data for microbial biomass determinations (Kleiner et al., 2017), and here we use the approach to learn about the community of microbes degrading the algal detritus.
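The class-level aggregation can be sketched as a weighted tally over lowest-common-ancestor assignments. This sketch assumes each peptide's LCA lineage has already been retrieved (for example, from UniPept output) and reshaped into a simple rank-to-taxon dictionary; that format, the bin labels, the 0.5% pooling step, and the function names are hypothetical illustrations rather than the authors' code.

```python
from collections import defaultdict

UNASSIGNED = "Bacteria (no class-level assignment)"


def class_level_profile(peptide_hits):
    """Percent of summed bacterial peptide abundance assigned to each class.

    peptide_hits: iterable of (lineage, abundance) pairs, where lineage maps a
    rank name to the taxon of the peptide's lowest common ancestor
    (hypothetical format). Bacterial peptides whose LCA lies above class
    level are pooled under UNASSIGNED; non-bacterial peptides are skipped.
    """
    totals = defaultdict(float)
    for lineage, abundance in peptide_hits:
        if lineage.get("superkingdom") != "Bacteria":
            continue
        totals[lineage.get("class") or UNASSIGNED] += abundance
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {taxon: 100.0 * value / grand_total for taxon, value in totals.items()}


def pool_low_abundance(profile, threshold=0.5, label="Other classes (<0.5%)"):
    """Merge classes below `threshold` percent into one bin, as in a stacked bar."""
    pooled = defaultdict(float)
    for taxon, pct in profile.items():
        keep = pct >= threshold or taxon == UNASSIGNED
        pooled[taxon if keep else label] += pct
    return dict(pooled)


# Hypothetical LCA assignments with spectral abundances.
lca_hits = [
    ({"superkingdom": "Bacteria", "class": "Gammaproteobacteria"}, 600.0),
    ({"superkingdom": "Bacteria", "phylum": "Proteobacteria"}, 300.0),
    ({"superkingdom": "Bacteria", "class": "Alphaproteobacteria"}, 98.0),
    ({"superkingdom": "Bacteria", "class": "Flavobacteriia"}, 2.0),
    ({"superkingdom": "Eukaryota", "class": "Bacillariophyceae"}, 500.0),
]
profile = class_level_profile(lca_hits)
print(pool_low_abundance(profile))
```

Keeping the "no class-level assignment" bin separate from the low-abundance bin matters here, because the text treats growth in each as a distinct signal.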
The initial community composition in this study overlaps taxonomically with that reported by a pyrosequencing survey of planktonic microbes at three stations in the Gulf of Maine (Li et al., 2011), although no estuarine sites were included in that work. We found that the microbial community at the initial time point was dominated by Gammaproteobacteria (∼60% of the peptides), with notable contributions from Cytophagia (∼5%), Bacteroidia (∼2%), and Alphaproteobacteria (∼2%) (Figure 6A). While the Li et al. (2011) survey found clades of SAR11 to dominate surface seawater in the open waters of the Gulf of Maine, some Gammaproteobacteria are symbionts of diatoms (Amin et al., 2012). Because we did not sample the seawater itself for peptidomics, only the filtered diatom degradation slurry, the dominance of a class that includes diatom symbionts is plausible. During the 12-day experiment, the microbial community composition changed minimally, with two exceptions. First, there was a 10-fold increase in peptides that were bacterial but could not be uniquely identified at the class level or below, from 2.2% of peptides on day 0 to 22% by day 12 (Figure 6A). Second, the contribution of low-abundance classes (<0.5% of all peptides) increased from 0.53% on day 0 to 5.4% on day 12 (Figure 6A). Both increases, in non-specific bacterial peptides and in low-abundance class peptides, are potential indications that the diversity of the bacterial community was increasing during the experiment.
FIGURE 5 | Mole fractions of individual amino acids from all four degradation time points as derived from total acid hydrolyzable amino acid analysis (THAA, x-axis) and tandem MS/MS-based proteomics (Peptide, y-axis). The dashed 1:1 line represents perfect agreement between approaches. The peptide amino acid compositions plotted here are derived from both trypsin-digested and naturally digested proteomics fractions. Label-free peptide quantitation was determined by the relative peptide area abundance.
The increase in non-specific bacterial peptides can be explained in two ways. Most likely, it reflects a true increase in diversity and richness, with related bacteria producing a broader array of shared, conserved proteins. Alternatively, bacterial necromass may have accumulated, generating degraded bacterial peptides that are no longer very taxonomically specific. Bacterial necromass in this case could derive from organisms in the seawater inoculum or from bacteria present during growth of the diatom culture. For instance, Gammaproteobacteria are known to thrive within the phycosphere of diatoms (Amin et al., 2012), and their peptide abundance peaked at day 2, when the active bacteria were growing exponentially. It is difficult to discern which of these hypotheses is correct, and further work is needed to evaluate the processing of bacterial detritus by other bacteria. Nonetheless, the overall taxonomic changes observed in the bacterial peptidome during the 12 days were minimal.
Unlike the somewhat ambiguous taxonomic information within the bacterial peptidome, biological process GO terms associated with the bacterial peptides provide a clearer view of how the bacterial community responded to the algal detritus (Figure 6B). The most commonly detected GO terms are associated with transmembrane transport, carbohydrate metabolism, and DNA replication and transcription (Figure 6B). These are the terms most strongly associated with bacterial growth (Mikan et al., 2020), suggesting that most of the bacterial peptides detected derive from living bacteria. This further supports the hypothesis that the Gammaproteobacteria in the samples are mostly living rather than detrital.
Our bacterial functional data are generally consistent with recent work by Mikan et al. (2020), who used metaproteomic tools and a GO term-based functional analysis to evaluate the heterotrophic bacterial response to a pulse of detrital organic matter in two Arctic microbiomes during 10-day shipboard incubations. In that study, the bacterial community increased protein synthesis, carbohydrate degradation, and cellular redox processes while simultaneously decreasing C1 metabolism (Mikan et al., 2020). Throughout our experiment, we observed steady levels of transmembrane transport and protein metabolism terms. Carbohydrate metabolic process GO terms peaked in the first 2 days of the degradation, and biosynthesis GO terms increased by day 12 (Figure 6B). Mikan et al. (2020) suggested that the bacterial community shifted its intracellular carbon acquisition strategies before large shifts occurred in the taxonomic structure of the community. Without a paired metagenome or metatranscriptome with which to perform proteomic database searches, our bacterial peptide and GO term data are less complete, but they show the same general trends, lending further support to the notion of Mikan et al. (2020) that functional composition and redundancy, not taxonomy, may be the most relevant factors when evaluating how effectively organic matter is or will be processed by bacteria in the ocean.
FIGURE 6 | … (B) The percentage of the total number of bacterial peptide biological process gene ontology (GO) terms. GO terms shown were condensed from a broader set to eliminate redundancy for ease of visualization using REViGO (available at http://revigo.irb.hr/) and further manually organized into broader categories.
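The per-category GO term percentages described for Figure 6B reduce to a simple tally at each time point. The sketch below assumes a list of GO biological-process annotations for the bacterial peptides at one time point and a hand-built mapping from condensed terms to broad categories; the mapping, the example terms, and the function name `go_category_percentages` are hypothetical, and the REViGO redundancy-reduction step itself is not reproduced.

```python
from collections import Counter


def go_category_percentages(go_terms, category_of, default="other"):
    """Percent of all GO term annotations falling into each broad category.

    go_terms: GO biological-process terms, one entry per bacterial-peptide
    annotation at a single time point.
    category_of: hypothetical hand-built mapping from a (condensed) GO term
    to a broad category; unmapped terms are counted under `default`.
    """
    counts = Counter(category_of.get(term, default) for term in go_terms)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: 100.0 * n / total for cat, n in counts.items()}


# Hypothetical category mapping and annotations for one time point.
categories = {
    "transmembrane transport": "transport",
    "carbohydrate metabolic process": "carbohydrate metabolism",
}
day0 = [
    "transmembrane transport",
    "carbohydrate metabolic process",
    "transmembrane transport",
    "DNA replication",
]
day0_percentages = go_category_percentages(day0, categories)
print(day0_percentages)
```

Because the denominator is the total number of GO annotations rather than the number of peptides, a peptide annotated with several terms contributes to several categories.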

CONCLUSION AND FUTURE DIRECTIONS
In this study we evaluated proteinaceous material in a marine system both with and without the use of trypsin as an extraction and identification tool. To our knowledge, this is the first such attempt to disentangle proteins that are being degraded by microbes from those that are resistant to degradation. We show that proteins from specific cellular locations are preferentially preserved during the initial stages of degradation. As has been hypothesized and demonstrated for bacterial membrane proteins (Nagata et al., 1998; Kaiser and Benner, 2008; Jiao et al., 2010) and algal membrane proteins (Laursen et al., 1996; Nguyen and Harvey, 2001; Wolfe et al., 2006), we demonstrate that proteins associated with diatom chloroplasts and membranes resist initial degradation better than those without such association. The many membrane-associated and few cytoplasmic peptides that resist degradation are also relatively enriched in α-helices and depleted in β-strands, consistent with the cellular location data, since α-helices are enriched in membrane proteins. However, the extent to which α-helices themselves confer degradation resistance remains to be evaluated more thoroughly, as (1) α-helices are not exclusive to membrane proteins, and (2) other factors could drive membrane protein survival over time, causing the α-helix-rich motifs of membrane proteins to become enriched in detrital material.
The novel application of proteomics without the use of trypsin also allowed us to evaluate how PTMs relate to protein degradation. We found that the oxidation and acetylation PTMs observed likely originated within the living cell, with only asparagine deamidation and arginine methylation predominantly associated with the degraded peptide pool. We hypothesize that PTMs affect the bioavailability of peptides during early diagenesis, but again, more work is needed to evaluate the extent to which PTMs provide protection. In all, we add to earlier evidence of selective protein degradation mechanisms enabled by proteomic techniques (Dong et al., 2010; Nunn et al., 2010; Moore et al., 2012, 2014; Bridoux et al., 2015). Continued advancements in metaproteomic instrumentation and computational capabilities have great potential to improve our understanding of protein degradation and preservation dynamics in the ocean.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ebi.ac.uk/pride/archive/PXD027843.

AUTHOR CONTRIBUTIONS
CA conducted algal degradation experiments. LM designed and directed algal degradation experiments. KH and JN contributed to proteomics sample preparation. MD conducted all proteomics data analyses. RK directed proteomics experiments. MD wrote the manuscript with input from all co-authors. All authors contributed to the article and approved the submitted version.

FUNDING
Funding for this research came from the National Science Foundation (OCE 1542240 to LM and RK) and the NSF Graduate Research Fellowship Program (DGE-1762114 to MD).