Surface Sediment Samples From Early Age of Seafloor Exploration Can Provide a Late 19th Century Baseline of the Marine Environment

Ocean-floor sediment samples collected up to 150 years ago represent an important historical archive for benchmarking global changes in the seafloor environment, such as species’ range shifts and pollution trends. Such benchmarking requires that the historical sediment samples represent the state of the environment at, or shortly before, the time of collection. However, early oceanographic expeditions sampled the ocean floor using devices like the sounding tube or a dredge, which potentially disturbed the sediment surface and recovered a mix of Holocene (surface) and deeper, Pleistocene sediments. Here we use climate-sensitive microfossils as a fast biometric method to assess whether historical seafloor samples contain a mixture of modern and glacial sediments. Our assessment is based on comparing the composition of planktonic foraminifera (PF) assemblages in historical samples with Holocene and Last Glacial Maximum (LGM) global reference datasets. We show that eight of the nine historical samples contain PF assemblages more similar to Holocene than to LGM PF assemblages, but the comparisons are only significant where local temporal species turnover (from the LGM to the Holocene) is high. Analysing temporal turnover globally, we show that upwelling and temperate regions had the greatest species turnover; these are the areas where our methodology would be most diagnostic. Our results suggest that sediment samples from historical collections can provide a baseline of the state of marine ecosystems in the late nineteenth century, and thus be used to assess trends in global ocean change.


INTRODUCTION
Late nineteenth and early twentieth century oceanographic expeditions set out to explore the vast and then largely unknown deep ocean. The voyage of HMS Challenger is a notable example. As she sailed around the globe between 1872 and 1876, researchers mapped the shape of the ocean basins for the first time and described over 4,500 new species of marine life (Manten, 1972). These early expeditions are of great historical significance, as they mark the beginning of modern oceanography and stimulated further ocean exploration (Wüst, 1964).
From a scientific perspective, the observations and material acquired by these historical expeditions have great potential for global change research (Johnson et al., 2011; Lister and Group, 2011), as they provide a pre-1900 baseline of the marine environment (e.g., Roemmich et al., 2012; Gleckler et al., 2016). Yet historical seafloor sediment samples remain largely underutilized, because early seafloor sampling techniques collected surficial sediments with instruments like the sounding tube, the dredge or even the anchor (Thomson and Murray, 1891). All of these instruments can penetrate below the surface and disturb the top layer of the sediment. As a result, such historical sediment samples might contain surface (Holocene) sediments mixed with deeper, glacial material (Hayward and Kawagata, 2005), potentially hindering their use as a baseline of the modern marine environment. Coring techniques provide a more accurate sediment chronology (e.g., Röhl et al., 2000); however, historical samples represent the seafloor environment as much as 50 years before the earliest core samples were collected (Wüst, 1964), and thus contain no objects deposited after 1900. These uncontaminated historical samples can be useful for chemical analyses of the seafloor (e.g., pollution trends; Dekov et al., 2010), single-specimen analysis (e.g., Reichart et al., 2003; Wit et al., 2010) and investigations of species range shifts and invasions over the past century (e.g., Hoeksema et al., 2011). Therefore, it is important to assess the degree to which historical sediment samples represent Holocene sediments rather than a mixture of Holocene and Pleistocene material.
One way to assess the degree of glacial mixing in the historical material would be to determine its absolute age by radiocarbon dating, or to use proxies for glacial material (e.g., Mg/Ca, oxygen isotopes). However, where the extent of mixing is small, the exponential decay of radiocarbon can make the resulting age ambiguous, and isotopic analysis would require enough specimens to correctly represent the extent of glacial mixing. Here we propose a complementary method that uses planktonic foraminifera assemblage composition as a climate-sensitive fingerprint of sediment age. Planktonic foraminifera (PF) are single-celled zooplankton that produce calcium carbonate shells and, upon death, accumulate in great numbers on the ocean floor (Hemleben et al., 1989). PF assemblage composition is sensitive to sea-surface temperature (Morey et al., 2005; Fenton et al., 2016), and its change between glacial and interglacial times has been used to determine the magnitude of glacial ocean cooling (MARGO Project Members, 2009).
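To illustrate why a small glacial admixture is hard to detect with radiocarbon, the sketch below mixes modern and 21,000-year-old carbonate. It assumes conventional radiocarbon ages with the Libby mean-life of 8033 years, ignores calibration and marine reservoir effects, and treats the Holocene end member as having an age of zero; the function names are hypothetical. Because mixing is linear in 14C activity rather than in age, a sample with 10% glacial shells returns an apparent age under 1,000 years, well inside the Holocene, rather than the 2,100 years a weighted mean of ages would suggest.

```python
import math

# Libby mean-life (years) used to convert fraction modern into conventional 14C age.
LIBBY_MEAN_LIFE = 8033.0


def fraction_modern(age_yr: float) -> float:
    """Fraction of modern 14C remaining after age_yr years (conventional ages, no calibration)."""
    return math.exp(-age_yr / LIBBY_MEAN_LIFE)


def apparent_age(glacial_fraction: float, holocene_age: float = 0.0,
                 glacial_age: float = 21_000.0) -> float:
    """Apparent 14C age of a two-component carbonate mixture.

    Assumption: both components carry the same carbon per unit mass, so the
    mixture's fraction modern is the mass-weighted mean of the two end members.
    The resulting age is NOT the mass-weighted mean of the two ages.
    """
    f_mix = ((1.0 - glacial_fraction) * fraction_modern(holocene_age)
             + glacial_fraction * fraction_modern(glacial_age))
    return -LIBBY_MEAN_LIFE * math.log(f_mix)


for f in (0.0, 0.05, 0.10, 0.25):
    weighted_mean_age = f * 21_000.0
    print(f"{f:4.0%} glacial: apparent age {apparent_age(f):7.0f} yr "
          f"(weighted mean of ages {weighted_mean_age:7.0f} yr)")
```

Even a quarter of glacial material yields an apparent age far younger than the weighted mean, which is why a low degree of mixing can pass unnoticed in a single bulk radiocarbon date.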
In this brief report, we make use of the temperature sensitivity of PF and compare the composition of their assemblages in nine historical (> 100 years old) samples against reference PF assemblages from the Holocene (Siccha and Kucera, 2017) and the Last Glacial Maximum (Kucera et al., 2005a). We test whether it is possible to recover the extent of glacial mixing in historical seafloor sediment samples using PF assemblage composition. This new biometric method contributes to a more multidisciplinary approach to dating historical sediments.

Historical Samples
Historical samples were retrieved from the Ocean-Bottom Deposits (OBD) Collection held by The Natural History Museum in London. The OBD Collection holds about 40,000 historical samples from all the world's oceans (https://doi.org/10.5519/0096416), including most of the sediment samples collected by HMS Challenger and British Royal Navy survey ships (Kempe and Buckley, 1987). The OBD samples are kept sealed in their original glass jars and tubes and are usually dry as a result of long (over 100 years) storage. We selected nine samples collected between 1874 and 1905, chosen to cover different oceans, latitudes and historical marine expeditions (Table 1). Half of the material available in each OBD container was split into two equal parts, leaving an archive sample and a sample to be processed. Sample processing consisted of weighing, wet washing over a 63 µm sieve and drying in a 60 °C oven. The residues were then dry sieved over a 150 µm sieve, and the coarser fraction was split with a microsplitter as many times as needed to produce a representative aliquot containing around 300 PF shells (see Al-Sabouni et al., 2007). All PF specimens in each of the nine final splits were picked, glued to a micropaleontology slide and identified to species level under a stereomicroscope, resulting in a total of 2,611 individuals belonging to 31 species (Table 1, Table S1).
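Each pass through a microsplitter ideally halves the sample, so the number of passes needed to reach an aliquot of about 300 shells follows from a base-2 logarithm. The sketch below assumes perfect 1:1 splits and a rough prior estimate of the total number of shells in the >150 µm fraction; both helper names are hypothetical, and in practice splitting is repeated while inspecting the residue rather than computed in advance.

```python
import math


def halvings_needed(estimated_total: float, target: int = 300) -> int:
    """Number of ideal 1:1 splits bringing the expected aliquot size closest to target."""
    if estimated_total <= target:
        return 0
    k = math.log2(estimated_total / target)
    lower, upper = math.floor(k), math.ceil(k)
    # Choose whichever whole number of splits lands nearer the target count.
    if abs(estimated_total / 2**lower - target) <= abs(estimated_total / 2**upper - target):
        return lower
    return upper


def scale_up(aliquot_count: int, splits: int) -> int:
    """Estimate the total shell count of the fraction from a fully picked aliquot."""
    return aliquot_count * 2**splits


splits = halvings_needed(5000)
print(splits, scale_up(290, splits))
```

Because a representative aliquot preserves species proportions, the relative abundances used in the comparisons do not depend on the number of splits; `scale_up` only matters when absolute shell counts are needed.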

Holocene and Last Glacial Maximum Data
We tested whether the composition of PF assemblages in the historical samples is more similar to assemblages of the Holocene (last 11,700 years; Walker et al., 2008) or of the Last Glacial Maximum (LGM, 21,000 years ago; MARGO Project Members, 2009). The Holocene census dataset (i.e., marine surface sediment samples taken by coring methods after 1945) was recently curated and published as the ForCenS dataset, comprising 4,205 assemblage counts from unique sites (Siccha and Kucera, 2017). Three LGM datasets from the MARGO project (Barrows and Juggins, 2004; Kucera et al., 2005a,b,c) were merged following the taxonomic standardization of Siccha and Kucera (2017). The merged LGM dataset includes 1,165 counts from 389 unique sites. Local estimates of open-ocean sedimentation rates are also available for 156 samples in the LGM datasets and were used to interpret the results.
The assemblage composition of each of the nine historical samples was then compared to the samples from the geographically nearest site in the Holocene and LGM datasets (Figure S1A). Distances between sites were calculated on the World Geodetic System 1984 ellipsoid (WGS84; Hijmans, 2015). We then compiled annual mean and standard deviation values of sea surface temperature (SST) from the World Ocean Atlas 2013 (WOA13, 0 m depth; Locarnini et al., 2013) for each of the 27 sites (historical, Holocene and LGM) to evaluate whether neighboring sites have similar SST ranges. Most neighboring sites had similar SST values, except those of the two southernmost samples (Figure S1B). For M.8780, the nearest Holocene sample was replaced by the fourth-nearest neighbor, which lies 66 km farther away but is more similar in SST (Figure S1B). The mean distance between our nine sites and their nearest neighbors was 253 km in the Holocene data and 415 km in the LGM data.
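A nearest-neighbor search of this kind can be sketched in Python. The study computed distances on the WGS84 ellipsoid (Hijmans, 2015); this sketch instead uses the simpler haversine formula on a sphere with the WGS84 mean radius (6,371.0088 km), which typically differs from ellipsoidal distances by well under 1%. The SST filter with a 2 °C tolerance is a hypothetical rule for illustration, since the substitution made for M.8780 was a case-by-case decision, and all function names are hypothetical.

```python
import math

# Mean Earth radius of the WGS84 ellipsoid, in metres (spherical approximation).
EARTH_RADIUS_M = 6_371_008.8


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a))) / 1000.0


def ranked_neighbors(site, references):
    """Return (distance_km, ref_id) pairs sorted from nearest to farthest.

    site: (lat, lon); references: {ref_id: (lat, lon)}.
    """
    lat, lon = site
    return sorted((haversine_km(lat, lon, rlat, rlon), rid)
                  for rid, (rlat, rlon) in references.items())


def nearest_with_similar_sst(site, site_sst, references, ref_sst, max_dsst=2.0):
    """Hypothetical rule: nearest reference whose annual mean SST is within max_dsst °C."""
    for dist, rid in ranked_neighbors(site, references):
        if abs(ref_sst[rid] - site_sst) <= max_dsst:
            return rid, dist
    return None


refs = {"a": (0.0, 1.0), "b": (0.0, 2.0)}
print(ranked_neighbors((0.0, 0.0), refs))
print(nearest_with_similar_sst((0.0, 0.0), 15.0, refs, {"a": 10.0, "b": 15.5}))
```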

Compositional Similarity
Assemblage similarity was expressed using the Morisita-Horn index (Morisita, 1959; Horn, 1966), an abundance-based overlap measure that preserves the essential properties of similarity measures (Jost et al., 2011). The Morisita-Horn index quantifies compositional similarity by pairwise comparison of the relative abundance of each species and is robust to undersampling (i.e., to the occurrence of rare species) (Jost et al., 2011). The index was calculated with the R package SpadeR (version 0.1.1; Chao et al., 2016), using the under-sampling bias correction (Chao et al., 2006) and bootstrap confidence intervals based on 100 replicates. For each of the nine historical assemblages, we calculated the Morisita-Horn index three times: between (1) the historical and neighboring Holocene assemblages, (2) the historical and neighboring LGM assemblages and (3) the neighboring Holocene and LGM assemblages. The third comparison gives a baseline index value of how much PF assemblage composition has changed locally since the LGM. If historical samples are representative of surface sediments, the similarity index between historical and Holocene samples should be higher than between historical and LGM samples, with non-overlapping confidence intervals. However, if historical samples are a mixture of Holocene and LGM material, the confidence intervals of the historical-Holocene and historical-LGM comparisons overlap. Confidence intervals might also overlap if the baseline index value between the Holocene and LGM reference samples is high. A high baseline value means that the local Holocene and LGM assemblages are similar, and thus there is less statistical power to detect whether a historical sediment is a surface or a mixed-glacial sample. To understand where our biometric methodology would be most diagnostic, we compiled a world map of species turnover since the LGM by calculating the Morisita-Horn index between each LGM sample and its nearest Holocene neighbor.
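For relative abundances p_i and q_i of species i in two assemblages, the classical Morisita-Horn index is C = 2 Σ p_i q_i / (Σ p_i² + Σ q_i²), which equals 1 for identical proportions and 0 when no species are shared. The Python sketch below computes this uncorrected form and a 95% percentile bootstrap interval from multinomial resampling of each count vector. It is a simplified stand-in, not a reimplementation of SpadeR: it omits the Chao et al. (2006) under-sampling correction and SpadeR's own bootstrap procedure, and the function names are hypothetical. Count vectors must list the same species in the same order, with zeros for absent species.

```python
import random
import statistics
from collections import Counter


def morisita_horn(x, y):
    """Classical (uncorrected) Morisita-Horn similarity between two count vectors."""
    n, m = sum(x), sum(y)
    p = [xi / n for xi in x]
    q = [yi / m for yi in y]
    num = 2.0 * sum(pi * qi for pi, qi in zip(p, q))
    den = sum(pi * pi for pi in p) + sum(qi * qi for qi in q)
    return num / den


def resample(counts, rng):
    """Multinomial resample of an assemblage using its observed proportions."""
    total = sum(counts)
    draws = Counter(rng.choices(range(len(counts)), weights=counts, k=total))
    return [draws[i] for i in range(len(counts))]


def morisita_horn_ci(x, y, replicates=100, seed=1):
    """Point estimate plus a 95% percentile bootstrap interval."""
    rng = random.Random(seed)
    boot = [morisita_horn(resample(x, rng), resample(y, rng))
            for _ in range(replicates)]
    cuts = statistics.quantiles(boot, n=40)  # 39 cut points in 2.5% steps
    return morisita_horn(x, y), cuts[0], cuts[-1]


# Toy counts for three species (hypothetical, not data from this study).
print(morisita_horn_ci([50, 30, 20], [20, 30, 50]))
```

Applying `morisita_horn_ci` to the historical-Holocene, historical-LGM and Holocene-LGM pairs yields the three intervals whose overlap the interpretation above relies on.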
For this map, compositional similarity indices were averaged per site. Distances between the 389 LGM sites and their nearest Holocene sites, calculated on the WGS84 ellipsoid, averaged 52 km (median 1.5 km).
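Per-site averaging, and the gap between the mean (52 km) and median (1.5 km) neighbor distances, can be illustrated with a few lines of Python. The values below are hypothetical toy numbers, not data from this study; they only show how a handful of distant pairs pulls the mean far above the median, which here means that at least half of the LGM sites have a Holocene sample within 1.5 km.

```python
import statistics
from collections import defaultdict


def average_per_site(records):
    """Average similarity values for sites with several LGM counts.

    records: iterable of (site_id, similarity) pairs.
    """
    by_site = defaultdict(list)
    for site_id, value in records:
        by_site[site_id].append(value)
    return {site: statistics.fmean(vals) for site, vals in by_site.items()}


# Hypothetical toy values, not data from this study.
records = [("core_A", 0.91), ("core_A", 0.87), ("core_B", 0.42), ("core_C", 0.78)]
site_means = average_per_site(records)

# Hypothetical neighbor distances (km): a right-skewed distribution in which
# one distant pair dominates the mean but barely moves the median.
distances_km = [0.0, 0.0, 1.5, 2.0, 250.0]
print(site_means)
print(statistics.fmean(distances_km), statistics.median(distances_km))
```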

RESULTS AND DISCUSSION
In eight of the nine samples, compositional similarity was higher between the historical and Holocene assemblages than between the historical and LGM assemblages (Figure 1A). However, non-overlapping confidence intervals occurred in only two samples (M.25 and M.5246), indicating a Holocene age for these historical sediments. These two samples also showed the lowest similarity in assemblage composition between neighboring LGM and Holocene assemblages (Figure 1A, gray dots), meaning that PF assemblage composition has changed more since the LGM at these two locations. The main differences in species composition were the high relative abundance of Globigerina bulloides in the LGM neighbor of M.25 and of Globoconella inflata in the LGM neighbor of M.5246 (Tables S3, S5).
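The interpretation rule set out under Compositional Similarity, applied sample by sample in this section, can be written as a small decision function. This is a hypothetical encoding of that reading, not code from the study: intervals are compared by simple overlap, and the returned labels are illustrative.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    estimate: float
    lower: float
    upper: float


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two confidence intervals share any values."""
    return a.lower <= b.upper and b.lower <= a.upper


def classify(hist_hol: Interval, hist_lgm: Interval, hol_lgm: Interval) -> str:
    """Hypothetical encoding of the overlap-based interpretation rule."""
    # Clearly closer to the Holocene reference: surface (Holocene) sediment.
    if hist_hol.estimate > hist_lgm.estimate and not overlaps(hist_hol, hist_lgm):
        return "Holocene"
    # Closer to the LGM reference than to the Holocene one.
    if hist_lgm.estimate > hist_hol.estimate:
        return "possible mixing (closer to LGM)"
    # All three comparisons overlap: local turnover too low to discriminate.
    if overlaps(hol_lgm, hist_hol) and overlaps(hol_lgm, hist_lgm):
        return "undiagnostic (low local turnover)"
    return "possible mixing"


# Toy intervals (hypothetical, not values from Figure 1A).
print(classify(Interval(0.95, 0.9, 1.0), Interval(0.5, 0.4, 0.6), Interval(0.45, 0.35, 0.55)))
```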
The similarity index between historical and Holocene samples was always above 0.75 and reached maximum similarity in five samples (Figure 1A). In six samples, the confidence intervals of all three comparisons overlapped, and all similarity indices were above 0.75. Because LGM and Holocene PF assemblages were highly similar at these six sites, our biometric test is less diagnostic there. The historical sample M.8780 showed no overlap between the historical-Holocene and Holocene-LGM comparisons, but the historical-LGM comparison overlapped with both, suggesting either that this sample contains a mix of Holocene and glacial material or that SST differences among its neighboring samples prevent an appropriate compositional comparison (Figure S1B). M.8780 contained more Neogloboquadrina pachyderma than both its Holocene and LGM neighbors, and G. bulloides abundances above 20%, similar to its LGM neighbor (Table S3). In contrast, differences in SST among the neighboring samples of M.192 (Figure S1B) did not seem to influence their compositional similarities (Figure 1A). Finally, the historical sample M.7487 was the only one that showed higher similarity with LGM than with Holocene assemblages, suggesting sediment mixing. Higher relative abundances of Trilobatus sacculifer, N. incompta and Globigerinita glutinata were responsible for this pattern (Table S5).

FIGURE 1 | (A) Compositional similarity (Morisita-Horn index) between planktonic foraminiferal assemblages from historical sediment samples and assemblages from surface sediments (Holocene, brown triangles) and from the Last Glacial Maximum (LGM, green squares), and between Holocene and LGM assemblages (gray dots, baseline value of local temporal turnover). A value of 0 means that the two assemblages share no species; 1 means that the same species were present in both samples in statistically indistinguishable proportions. Lines represent confidence intervals based on 100 bootstrap replicates. The x-axis shows the historical sample number. (B) Black triangles: historical samples (nine in total). Colored dots: LGM samples from 389 sites worldwide. Colors represent the Morisita-Horn similarity index between each LGM sample and its neighboring Holocene sample (i.e., temporal turnover). Red to orange dots indicate low similarity (i.e., high species turnover), whereas blue dots indicate similar Holocene and LGM planktonic foraminifera assemblages.
The global comparison between the Holocene and LGM reference datasets shows the magnitude of PF assemblage turnover since the LGM (i.e., temporal beta-diversity; Figure 1B). In general, upwelling sites (eastern boundary currents and equatorial regions) and temperate sites had the greatest species turnover, so our methodology would be most diagnostic in these settings. Open-ocean sedimentation rates, available for 156 sites, averaged 6.8 centimeters per thousand years (cm/ky). At this average rate, historical sampling devices would have had to penetrate 142.8 cm (6.8 cm/ky × 21 ky) into the sediment to contaminate the surface seafloor sample with glacial material. However, comparing sedimentation rates with the compositional similarity between LGM samples and their Holocene neighbors reveals that the greatest temporal turnover in species composition occurred at sites with lower sedimentation rates (Figure S2), where sediment mixing during historical sampling would be most likely. Furthermore, six of the LGM neighbors of our nine historical samples had local estimates of sedimentation rate, which varied from 1.0 to 3.4 cm/ky (mean 1.8 cm/ky; Tables S3-S5). Thus, for our historical samples, depths of only 21 to 71.4 cm into the sediment would already have reached glacial-age material. Historical ocean-floor sampling methods potentially disturbed the surface and recovered a mix of Holocene and deeper sediments, especially at sites with lower sedimentation rates. Our biometric method would be most useful at such low-sedimentation sites, which also show the greatest temporal turnover in species composition (Figure S2).
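The depth arithmetic in this paragraph assumes a constant sedimentation rate since the LGM, with no compaction or bioturbation; under those assumptions the depth of 21,000-year-old sediment is simply the rate multiplied by 21 ky. The sketch below reproduces the 142.8 cm, 21 cm and 71.4 cm figures and adds the depth implied by the 1.8 cm/ky mean of the six local estimates; the function name is hypothetical.

```python
GLACIAL_AGE_KY = 21.0  # age of the LGM, in thousands of years


def depth_to_lgm_cm(sed_rate_cm_per_ky: float, age_ky: float = GLACIAL_AGE_KY) -> float:
    """Depth (cm) of sediment of a given age, assuming a constant rate and no compaction."""
    return sed_rate_cm_per_ky * age_ky


for label, rate in [("global open-ocean mean", 6.8),
                    ("slowest local estimate", 1.0),
                    ("mean of six neighbors", 1.8),
                    ("fastest local estimate", 3.4)]:
    print(f"{label:>24}: {rate:.1f} cm/ky -> LGM at ~{depth_to_lgm_cm(rate):.1f} cm")
```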

CONCLUSION
Our results indicate that historical ocean-floor sediment samples (collected more than 100 years ago) can represent surface (Holocene) sediments, despite the use of technology not designed to recover undisturbed sediments. We show that the temporal turnover in species occurrence since the LGM varies in space, making our biometric method particularly suitable for upwelling and temperate areas with low sedimentation rates. The new method allows a non-destructive preliminary assessment of glacial contamination of historical samples. Independent proxy records (e.g., Mg/Ca) and/or radiocarbon dating would be valuable to validate our technique. Ideally, the biometric approach would complement chemical techniques for dating historical sediments, yielding more robust results through multiple independent lines of evidence. Our results highlight the scientific potential of historical seafloor sediment collections (Table S2). As human activities increasingly modify the marine environment, these historical collections contain important information on the pre-1900 state of marine ecosystems.

DATA AVAILABILITY STATEMENT
The data and the R code used to produce this analysis are available from the NHM Data Portal: https://doi.org/10.5519/0001936.