How to Deal With Multi-Proxy Data for Paleoenvironmental Reconstructions: Applications to a Holocene Lake Sediment Record From the Tian Shan, Central Asia

Multi-proxy investigations on geological archives provide valuable information about environmental variations in the past. As opposed to single-proxy studies, the combination of several proxies can reveal more detailed information and strengthen subsequent paleoenvironmental reconstructions. However, there is still no consensus about how to deal with resulting highly dimensional datasets in a statistical manner. In many cases, the interpretations of multi-proxy datasets rely on visually matching several proxy records, which can lead to incorrect or insufficient interpretations. Here we report an innovative approach that combines the novel dimension reduction technique Uniform Manifold Approximation and Projection (UMAP) and the time series analysis R package asdetect to identify and characterize Holocene environmental phases and phase boundaries in a sediment core from Lake Chatyr Kol, southern Kyrgyzstan. Despite the fact that the Holocene climate evolution of Central Asia has been intensively studied during the last decades, knowledge about regional climate development during the Holocene and the underlying mechanisms is still relatively scarce. We particularly focus on phase transitions and differentiate between event-based shifts as opposed to gradual phase transitions. For this study, long-chain alkenones were used as a paleotemperature proxy and variations in long-chain alkyl diol distributions were ascribed to relative changes of algal input. The compound-specific stable hydrogen isotope compositions (δD) of individual n-alkanes were utilized as paleohydrological proxies, with the δD of mid-chain n-alkanes reflecting changes in the δD of the lake water and the δD of long-chain n-alkanes recording the δD of the meteoric water. We show the potential of modern analysis tools for data-driven paleoenvironmental reconstructions and advocate for their more frequent implementation in multi-proxy studies.

Multi-proxy investigations on geological archives provide valuable information about environmental variations in the past. As opposed to single-proxy studies, the combination of several proxies can reveal more detailed information and strengthen subsequent paleoenvironmental reconstructions. However, there is still no consensus about how to deal with resulting highly dimensional datasets in a statistical manner. In many cases, the interpretations of multi-proxy datasets rely on visually matching several proxy records, which can lead to incorrect or insufficient interpretations. Here we report an innovative approach that combines the novel dimension reduction technique Uniform Manifold Approximation and Projection (UMAP) and the time series analysis R package asdetect to identify and characterize Holocene environmental phases and phase boundaries in a sediment core from Lake Chatyr Kol, southern Kyrgyzstan. Despite the fact that the Holocene climate evolution of Central Asia has been intensively studied during the last decades, knowledge about regional climate development during the Holocene and the underlying mechanisms is still relatively scarce. We particularly focus on phase transitions and differentiate between event-based shifts as opposed to gradual phase transitions. For this study, long-chain alkenones were used as a paleotemperature proxy and variations in long-chain alkyl diol distributions were ascribed to relative changes of algal input. The compound-specific stable hydrogen isotope compositions (δD) of individual n-alkanes were utilized as paleohydrological proxies, with the δD of mid-chain n-alkanes reflecting changes in the δD of the lake water and the δD of long-chain n-alkanes recording the δD of the meteoric water. We show the potential of modern analysis tools for data-driven paleoenvironmental reconstructions and advocate for their more frequent implementation in multi-proxy studies.

INTRODUCTION
Paleoenvironmental reconstructions are essential to elucidate the Earth's natural environmental variability in the past. This knowledge enables projections for future climate changes by providing possible analogs for future scenarios (Aichner et al., 2015). Proxy analyses provide a unique tool, often quantitative, to gain insights into past climate conditions since instrumental records mostly cover only the last ∼170 years (Guillot et al., 2015). In particular molecular organic geochemical proxies, socalled biomarkers that have been preserved in natural archives have been extensively used in the field of paleoclimatology and paleoenvironment, especially over the last two decades (Eglinton and Eglinton, 2008;Wang et al., 2016). Biomarkers derive from distinct biotic sources and their abundance and composition is diagnostic for environmental parameters of their formation (Meyers, 2003). Although they typically constitute only <5% of bulk organic matter, biomarkers are widely deployed tools for reconstructing past environmental conditions (Aichner et al., 2010;Feng et al., 2015). Combining the analyses of multiple biomarkers has proven to be more robust as opposed to single-proxy analyses because complementary information and thus a broader perspective can be obtained (Birks and Birks, 2006). However, despite the recent methodological advances, the subsequent data-evaluation and interpretation of biomarker analyses are frequently carried out rather subjectively ("by eye") with little statistical or computational techniques. Consequently, this approach can be misleading and does not exploit the full potential of these complex datasets. Yet, as the number of large datasets increases and the temporal resolution of the data becomes higher, the need for a concordant approach to summarize patterns in complex multi-proxy datasets becomes even greater. Even though techniques for reducing the dimensionality of multi-proxy data already exist and may be suitable for paleoenvironmental datasets, there are still reservations about their applications. Traditionally, the most commonly used method for dimensionality reduction has been Principal Component Analysis (PCA). PCA is a multivariate statistical analysis that utilizes matrix factorization for projecting high-dimensional data on a lower dimensional subspace comprized of the so-called principal components, while retaining the maximum variability present in the original variables (Ringnér, 2008;Abdi and Williams, 2010;Naik, 2018). Thus, by creating new variables, which are linear combinations of the original variables, PCA allows comparing similarities and potential interdependencies between the original variables (Abdi and Williams, 2010). In the context of paleoenvironmental reconstructions, the described function of PCA could be problematic for two reasons: First, as the variance in multiproxy data spans across scales and PCA primarily seeks major variability components, small-scale variance may be disregarded even though it might still contain important environmental information. And second, paleoenvironmental data are often nonlinear, especially around phase transitions. In many paleoenvironmental studies, the identification of phases and their boundaries is one of the major objectives, yet PCA may be unable to represent the variability at these critical points in time adequately. Therefore, a novel nonlinear dimension reduction technique for visualization has recently been proposed. This technique, Uniform Manifold Approximation and Projection (UMAP), is generally valued for being both locally and globally optimized . This means that while it is a neighbor-embedding method that seeks to resolve the local structure within clusters of data points, it can also define the distance between these clusters in a lowdimensional space in such a way that the original global data structure is optimally represented . Since its introduction, UMAP has received considerable attention in a variety of fields (Becht et al., 2018;Diaz-Papkovich et al., 2019;Smets et al., 2019), yet has so far not received full attention in paleoenvironmental reconstructions. In a previous study we showed the effectiveness of applying UMAP to biomolecular data obtained from a lake sediment core, allowing to clearly divide the record into distinct phases in a data-driven manner (Schroeter et al., 2020). Nevertheless, it is unclear whether phase boundaries ascertained by UMAP reflect true abrupt shifts in time series, often classified as tipping points (Lenton et al., 2008;Lenton, 2011;Scheffer et al., 2012), or if these changes can also occur gradually within a determined phase. The identification of tipping points is of particular interest since they can shift a system from one state to another, but the objective identification of them in a time series remains challenging (Marwan et al., 2018). Recently, Boulton and Lenton (2019) introduced a statistical method, implemented in R, for detecting abrupt shifts in time series, which is called asdetect. As opposed to other methods of anomalous change detection that search for significant changes in mean values, asdetect differs by looking for significant changes of gradients within the series (Boulton and Lenton, 2019). However, as asdetect was originally designed for single-proxy investigations, the authors emphasize the need for novel or extended statistical analyses that can be applied to multi-proxy datasets.
Here we combine the results of PCA, UMAP and asdetect as an innovative statistical approach to define and characterize phase boundaries in a more robust approach than previously applied in paleoenvironmental studies. We utilized a multiproxy dataset of terrestrial and aquatic biomarkers obtained from a sediment core from Lake Chatyr Kol, Kyrgyzstan, Central Asia, reflecting hydrological, and environmental changes of the lake and its catchment area, respectively. Located in Central Asia, the Tian Shan is one of the largest mountain ranges in the world, playing an important role in determining the hydrological and climatological regime of northern Central Asia. Owing to the lake's position at the intercept between the extent of the mid-latitude Westerlies, the Siberian Anticyclone and partially also the Asian monsoon system (Aizen et al., 2001;Cheng et al., 2012), the Tian Shan represents a key location to study paleoenvironmental changes at high resolution. Although different types of natural archives, such as speleothems Wolff et al., 2017), tree-rings (Esper et al., 2002), loess sequences (Machalett et al., 2008) and lake sediments (Ricketts et al., 2001;Beer et al., 2007;Lauterbach et al., 2014;Mathis et al., 2014;Schwarz et al., 2017) have been analyzed during the last two decades to understand climatic and environmental fluctuations in Central Asia, knowledge about the regional climate development during the Holocene and the underlying mechanisms is still relatively scarce.
The multi-proxy assessment in this study includes the analysis of long-chain alkenones (LCAs) and the respective alkenone unsaturation index U K 37 as a temperature proxy (Brassell et al., 1986). By analyzing long-chain alkyl diols (LCDs) as an indicator for the presence of algae (Rampen et al., 2012), we connect the intra-lake bio-productivity to variable environmental conditions. We further investigated n-alkanes and their stable hydrogen isotope composition (δD) to assess the lake water budget as well as to evaluate contributions of aquatic and terrestrial sources of organic matter. By applying PCA, UMAP, and asdetect to this multi-proxy dataset, we evaluate the potential of these data analysis approaches for multi-proxy datasets, particularly regarding their ability to differentiate between event-based phase transitions and more gradual shifts. Based on our results, we emphasize the necessity of utilizing data-driven and statistical methods for paleoenvironmental and paleoclimatological studies.

STUDY AREA
Lake Chatyr Kol is an endorheic lake located at 3545 m above sea level (a.s.l.) in the southern Kyrgyz Tian Shan (40 • 37 N, 75 • 18 E) (Figure 1). Extending ∼12 km in width (NW-SE) and ∼23 km in length (SW-NE), it encompasses a surface area of ∼159 km 2 , thus being the third largest lake in Kyrgyzstan. The maximum water depth is 20 m. To the north and south, the basin (catchment area 1084 km 2 ) is limited by the At Bashy Range (∼4600 m) and the Torugart Range (>4800 m), respectively. The extensive pasture-covered plains surrounding the lake predominantly consist of eroded Quaternary material from the mountain ranges. Besides several small tributaries and rivers, the Kekagyr River is the largest and only permanent inflow. During summer, the water balance is predominantly influenced by glacier surface melting and summer storm precipitation. The current climatic regime in the Kyrgyz Tian Shan is primarily determined by the interaction between the Siberian anticyclonic circulation and the mid-latitude Westerlies (Aizen et al., 1997;Lauterbach et al., 2014), whose moisture likely originates in the North Atlantic, the Mediterranean and Caspian Sea (Aizen et al., 2001(Aizen et al., , 2006Cheng et al., 2016). Mean annual precipitation amounts to ∼300 mm year −1 in this region (Koppes et al., 2008). The high mountain ranges surrounding Lake Chatyr Kol largely prevent the transport of moisture, resulting in dry conditions and reduced winter precipitation, especially in January and February (Aizen et al., 1995(Aizen et al., , 2001. In summer, convection increases and strengthens unstable atmospheric stratification, leading to a summer maximum of precipitation brought by cold and moist westerly air masses (Aizen et al., 1995(Aizen et al., , 2001Bershaw and Lechler, 2019). Mean annual air temperatures in this region are variable; ranging from −0.34 • C in Naryn (∼103 km northeast of Lake Chatyr Kol, 2045 m a.s.l.) (Ilyasov et al., 2013) to −7.2 • C in Ak Say (∼30 km east of Lake Chatyr Kol, 3135 m a.s.l.) (Giese et al., 2007), with highest temperatures occurring in July and August. These cold and arid conditions also favor the preservation of permafrost soils (Shnitnikov et al., 1978) and thermokarst formations are widely distributed in this region (Abuduwaili et al., 2019). Vegetation in the region is sparse and characterized by desert and semi-desert vegetation with dominance of montane grassland and no trees (Taft et al., 2011). Since the lake is not inhabited by fish, a considerable number of the amphipod species Gammarus alius sp. nov., which is mostly confined to continental freshwater/brackish habitats, have colonized aquatic plants up to a depth of 50 cm (Sidorov, 2012).

Coring and Chronology
Several vertically overlapping piston cores of 3 m length were retrieved in summer 2012 from a coring site at ∼20 m water depth in the southwestern part of Lake Chatyr Kol (40 • 36.37 N, 75 • 14.02 E) by using a 60 mm UWITEC piston corer. In addition, seven gravity cores were collected in autumn 2017 by using a 90 mm UWITEC gravity corer with hammer weight (SC17_1-7). By identification and comparison of distinct macroscopic marker layers, the individual sediment cores could be correlated and merged into a continuous 623.5cm-long composite profile . By seasonal deposition, the sediments are almost continuously annually laminated (varved) except for the upper 63 cm, allowing the establishment of a floating varve chronology ("Chatvd19") through microscopic varve counting below 63 cm composite depth (Figure 2). The formation of varves at Lake Chatyr Kol is highly linked to the seasonality of ice coverage during winter and annual variations in temperature and precipitation . A detailed description of the sediments and the age model can be found in Kalanke et al. (2020) and Schroeter et al. (2020). Briefly, microfacies analysis was performed by examining petrographic thin sections, which were prepared at the GFZ German Research Center for Geosciences, Potsdam, Germany, following the method described by Brauer and Casanova (2001). Analysis included counting of individual varves, varve thickness measurements, varve type identification and varve boundary assessment. A total of 11,259 varves were counted twice and interpolated and the mean deviation (∼5%) between the two countings was implemented as the counting uncertainty. Based on the age model, the core has a basal age of 11,619 ± 603 a BP. The floating varve chronology is further supported by accelerator mass spectrometry (AMS) 14 C measurements (Poznań Radiocarbon Laboratory, Poznań, Poland) of two wood remains at 380.5 and 528.0 cm composite depth, dating to 6140 ± 137 cal. a BP (Poz-63307) and 9988 ± 203 cal. a BP (Poz-54302), respectively. Calibrated 14 C ages (cal. a BP) were calculated in OxCal 4.3 (Bronk Ramsey, 2009) using IntCal13 (Reimer et al., 2013). Gamma spectrometric analysis of 210 Pb and 137 Cs on continuous 0.5-cm-thick sediment slices of gravity core SC17_7 was performed at the GFZ German Research Center for Geosciences, Potsdam, Germany in order to complement the chronology in the uppermost, non-varved 63 cm of the composite profile . Activity concentrations of 210 Pb  were incorporated both in a constant initial concentration (CIC) model (Robbins, 1978) and in a constant rate of supply (CRS) model (Appleby and Oldfield, 1978). The onset of enhanced 137 Cs concentrations, originating from global nuclear weapon tests (Pennington et al., 1973;Kudo et al., 1998;Wright et al., 1999), was dated to 1945 by both CIC and CRS model assumptions. Consequently, AD 1945 was chosen to constrain the age-depth relationship of the upper homogenous sediments . All ages of Lake Chatyr Kol sediment record mentioned in this study refer to the described age model .

Analysis of Lipid Biomarkers
In total, 89 bulk sediment samples of 1 cm thickness were taken from the composite profile at a mean interval of ∼5 cm.

Analysis of Sedimentary n-Alkanes
Quantification and identification of n-alkanes were performed on a gas chromatograph with flame ionization detector (GC-FID, Agilent 7890B Gas Chromatograph), based on retention time and peak area comparison with an external n-alkanes standard mixture (nC 15 to nC 33 ). Separation of the hydrocarbon fraction was achieved on an Ultra two column (50 m length, 0.32 mm inner diameter, 0.52 µm film thickness, Agilent Technologies, Santa Clara, United States) at a constant helium carrier gas flow of 2 mL min −1 . The PTV injector in splitless mode started at 45 • C for 0.1 min, then heating up to 300 • C with a ramp of 14.5 • C s −1 (held for 3 min). The GC oven temperature program increased from 140 • C (held for 1 min) to 310 at 4 • C min −1 (held for 15 min) and finally to 325 • C with a ramp of 30 • C min −1 . The final temperature of 325 • C was held for 3 min.
The average chain length (ACL) (Poynter and Eglinton, 1990) was calculated according to the following equation:

Analysis of δD of Sedimentary n-Alkanes
δD of the n-alkanes was measured using a coupled GC-IRMS system (GC: HP 7890, Agilent Technologies, Palo Alto United States; IRMS: Delta V Plus, Thermo Fisher Scientific, Bremen, Germany) equipped with a DB1 ms column (length 30 m, 0.25 mm inner diameter, 0.25 µm film thickness; Agilent Technologies, Palo Alto, United States). Column flow was constant at 1.3 ml min −1 and the injector was operated at 280 • C in splitless mode. The oven program started at 110 • C (held for 1 min), heated to 320 • C (held for 8 min) at a rate of 5 • C min −1 and finally to 350 • C (held for 3 min) at 30 • C min −1 . Injection volume was 2 µL and each sample was measured in triplicate followed by a measurement of an external n-alkane standard mixture (nC 15 to nC 33 ). The H 3 + factor was determined daily to confirm stable ion source conditions and was about 6.8 ± 0.4 over the course of analysis. δD values are reported as per mille relative to VSMOW using the standard δ notation: where, R = the ratio of deuterium to hydrogen ( 2 H/ 1 H) and VSMOW = Vienna Standard Mean Ocean Water. The Evaporation-to-Inflow ratio (E/I) was calculated based on the stable isotope approach by Gat and Levy (1978): with h = relative humidity [0.44 for Lake Chatyr Kol, obtained for the Kashi station 1971-2011 AD, NOAA IGRA (Durre et al., 2006)], δD = stable hydrogen isotope values of lake water (δDnC 23 ) and inflow water (δDnC 29 ) and δD • = limiting isotopic enrichment for a desiccating water body.
An average Lake Surface Temperature of 9 • C was selected for the calculations. A more detailed explanation of the calculation is provided in Günther et al. (2016).

Analysis of Sedimentary Long-Chain Alkenones
Prior to analysis, ketone fractions containing LCAs were saponified in 0.5 M potassium hydroxide in a MeOH/water solution (95:5, v:v) at 60 • C for 12 h (Plancq et al., 2018) to remove alkenoates that could interfere with the identification of LCAs. LCAs were recovered by liquid-liquid extraction using 3 × 1 mL hexane and subsequently analyzed using GC-FID. The analysis was performed at the University of Glasgow, United Kingdom, on an Agilent 7890B GC System equipped with an Agilent VF-200 ms capillary column (60 m length, 0.25 mm inner diameter, 0.10 µm film thickness) (Longo et al., 2013) using hydrogen as carrier gas at a column flow rate of 36 cm s −1 . The oven temperature was programmed from 50 • C (held for 1 min) to 255 • C at 20 • C min −1 , then to 300 • C at 3 • C min −1 and finally to 320 • C at 10 • C min −1 followed by 10 min hold time. Peak identification was achieved by gas chromatography-mass spectrometry (GC-MS) using an Agilent 7890B GC coupled with a 5977A GC-EI mass spectrometer and comparing peak retention times and mass spectral data with published data (de Leeuw et al., 1980;Marlowe et al., 1984). The GC conditions were the same as for GC-FID analyses. The MS conditions were as follows: ion energy 70 eV; mass range 40-600 m/z. LCA concentrations were assigned using hexatriacontane (nC 36 alkane) as an internal standard. The alkenone unsaturation index was calculated as follows: (C 37:2 − C 37:4 ) (C 37:4 + C 37:3 + C 37:2 ) (Brassell et al., 1986) A potential alternative alkenone unsaturation index is the U K 37 index, which excludes contributions of the tetra-unsaturated nC 37 methyl alkenone and is frequently utilized in marine settings (e.g. Prahl and Wakeham, 1987;Sikes and Volkman, 1993;Müller et al., 1998;Conte et al., 2006). In lacustrine environments, several studies found a significant relationship of the U K 37 index and lake water temperature but no correlation between the U K 37 index and lake water temperature Nakamura et al., 2014;Zhao et al., 2017). We therefore chose the U K 37 index over the U K 37 index in this study.

Analysis of Sedimentary Long-Chain Alkyl Diols
Polar fractions were analyzed for their LCDs concentration at the Royal Netherlands Institute for Sea Research (NIOZ), Texel, Netherlands. Samples were evaporated to dryness under nitrogen and silylated with a mixture of 10 µL N,O-bis(trimethylsilyl)trifluoroacetamide (BSTFA) and 10 µL pyridine at 60 • C for 30 min. Squalane was added as an internal standard. Samples were dissolved in 80 µL ethyl acetate and analyzed by GC-MS using an Agilent 7890B GC equipped with a Agilent CP Sil-5 fused silica capillary column (25 m length, 0.32 mm inner diameter, 0.12 µm film thickness), coupled to an Agilent 5977A MSD mass spectrometer. Instrumental conditions were the same as described by de Bar et al. (2019): the temperature program started at 70 • C, increased to 130 • C at a rate of 20 • C min −1 , heated to 320 • C at 4 • C min −1 and was held at 320 • C for 25 min. Flow rate was 2 mL min −1 . The quadrupole was held at 150 • C and the electron impact ionization energy of the MS source was 70 eV. Fractional abundances were obtained using SIM of m/z = 313.3 (C 28 1,13-diol, C 30 1,15-diol, C 34 1,19diol) and 341.3 (C 30 1,13-diol, C 32 1,15-diol, C 34 1,17-diol, C 36 1,19-diol) ions (Versteegh et al., 1997;Rampen et al., 2012).

Analysis of Diatoms
Subsamples for diatom analyses (∼0.1 g) were taken in 1-32 cm intervals, depending on diatom preservation. If the valves were well preserved, a sample was analyzed at least every 8 cm. Diatom sample preparation followed Kalbe and Werner (1974). The diatom concentration was established by adding known quantities of microspheres (Battarbee and Kneen, 1982). If possible, at least 400 valves were counted in each sample at 1000× magnification and differential interference contrast (DIC) using a Leica DM 5000B microscope. Diatom species were identified based on Lange-Bertalot and Moser (1994)

Linear Dimension Reduction by PCA
A PCA was generated using the function "prcomp" in R 3.6 (R Core Team, 2019). The settings for "prcomp" were: scale = TRUE, center = TRUE. The function was supplied with concentrations of the lipid biomarkers and winter solar insolation data at 40 • N over the course of the Holocene (Berger and Loutre, 1991). Data visualization was performed using the R package factoextra 1.0.6. (Kassambara and Mundt, 2019) with the function "fviz_pca_var" (repel = TRUE). A scree plot showing the cumulative variance explained by each principal component can be found in Supplementary Figure S1.

Nonlinear Dimension Reduction by UMAP
UMAP was performed using a R 3.6 implementation as package umap 0.2.3.1 (Konopka, 2019; R Core Team, 2019). The UMAP algorithm was supplied with the concentrations of n-alkanes, LCAs, LCDs, the E/I data, the U K 37 index and the ratio of planktonic and periphytic diatoms. If not specified differently, default options were adopted. To compute the UMAP layout, we used a Manhattan distance metric and fixed the starting conditions to random state = 3 for future reproducibility. A density-based spatial clustering was applied on the resulting layout using the function "hdbscan" within the R package dbscan 1.1-5 and a minimum number of six points per cluster (minPts = 6) (Hahsler et al., 2019). Visualizations of the results were created in the R package ggplot2 3.2.1 (Wickham, 2016).
We also compared the results of UMAP with those of the more commonly used dimensionality reduction technique t-Distributed Stochastic Neighbor Embedding (t-SNE) (Supplementary Figure 2). Both dimension reduction techniques captured the same overall variability pattern. However, UMAP performed better on our dataset as it preserved more of the global structure than t-SNE, resulting in a more distinct clustering using the same settings (Figure 6 and Supplementary Figure S2). Consequently, lake phases were separated more clearly in the UMAP embedding. We therefore implemented the results of UMAP in our further discussion.

Abrupt Shift Detection
Time series analysis was performed using the package asdetect 0.1.0 in R 3.6 (Boulton and Lenton, 2019; R Core Team, 2019). Asdetect was installed using GitHub. The package asdetect was run for each biomarker individually using the default time steps and window lengths fixed between 3 (lowwl) and 1/6 (highwl) of the respective time series. Points in the time series that contribute to significant gradients have a value of 1 added or subtracted to a detection time series, which is then divided by the window length l, resulting in a function-specific detection value.

Biomarker Results
Sedimentary n-Alkanes n-alkanes are aliphatic hydrocarbons that are relatively resistant to degradation (Meyers, 2003). Generally, their distribution and the ACL are used to distinguish different origins of organic matter sources. Long-chain n-alkanes (including nC 27 , nC 29 , nC 31 ) with a strong odd-over-even predominance are common components of vascular plant epicuticular waxes (Eglinton and Hamilton, 1967). Therefore, an increase in their concentration would entail higher terrestrial organic productivity (Meyers, 2003). In contrast, short-chain n-alkanes (nC 15 to nC 19 ) are synthesized by photosynthetic bacteria and algae (Cranwell et al., 1987;Meyers and Ishiwatari, 1993), whereas submerged and floating aquatic plants produce higher amounts of the mid-chain n-alkanes nC 21 to nC 25 (Ficken et al., 2000). Consequently, these compounds reflect lacustrine productivity levels (Meyers, 2003).
The sediments of Lake Chatyr Kol contained mainly nC 17 and nC 23 -nC 31 n-alkanes with a strong odd-over-even predominance and a changing distribution pattern within the sediment record (Figure 3). This suggests that the source of n-alkanes primarily derived from allochthonous material (e.g., terrestrial plants). Total n-alkane concentrations ranged from 1.1 to 172.4 µg g −1 dry weight (d.w.) (median: 19.7 µg g −1 d.w., Figure 4). The ACL varied between 17.9 and 29.3, with a median of 25.4. A k-means cluster analysis revealed that the Lake Chatyr Kol sediment record consists of three clusters based on the n-alkane distribution (Figure 3). Sediment samples combined in cluster 1 (n = 13) are characterized by a bimodal pattern of nC 23 /nC 25 and nC 29 /nC 31 (ACL = 26.5), thus reflecting a higher contribution from aquatic macrophytes. Cluster 2 is the most abundant cluster (n = 55), containing primarily long-chain n-alkanes nC 27 to nC 31 (ACL = 27). Consequently, cluster 2 indicates predominantly terrestrial vegetation input.  δD of Sedimentary n-Alkanes δD values of n-alkanes are valuable for paleohydrological reconstructions since they mainly record the δD of the source water . The δD of the terrestrial n-alkane nC 29 thereby reflects the δD signature of the meteoric water (precipitation) modified by evapotranspiration, whereas the δD of the aquatic n-alkane nC 23 displays the isotope signal of the ambient lake water, modified by evaporation Guenther et al., 2013). Based on the isotopic differences, the E/I can be applied as a parameter for hydroclimatic balances of a lake system (Mügler et al., 2008). Higher E/I values would indicate higher evaporation relative to input water, therefore reflecting semi-arid or arid environmental conditions. By contrast, humid conditions are characterized by lower E/I values, when inflow water exceeds evaporation (Mügler et al., 2008).
δD of nC 23 varied between −56 at ∼1450 a BP and −165 at ∼7200 a BP (Supplementary Figure S3). δD of nC 29 ranged from −195 at ∼5900 a BP to −100 at ∼11,500 a BP, exhibiting smaller fluctuations than the δD of the aquatic nC 23 (Supplementary Figure S3). A significant correlation between δDnC 23 and δDnC 17 throughout the entire sediment record (Pearson correlation coefficient r = 0.48, p < 0.01, Supplementary Figures S3, S4) reinforced the potential of δDnC 23 to record changes in the lake water isotopic composition. Reconstructed E/I varied from 0.4 between ∼11,000 and 5000 a BP, pointing to humid conditions, to 0.8 between ∼5000 and 1200 a BP, indicating more arid conditions (Figure 4). The youngest part of the record (∼1000 -30 a BP) was characterized by an E/I of 0.3, representing more humid conditions (Günther et al., 2016).

Sedimentary Long-Chain Alkenones
LCAs are nC 35 -nC 42 aliphatic unsaturated ketones that are biosynthesized by haptophyte algae from the Isochrysidales order (Theroux et al., 2010). Previous investigations have shown that the occurrence of LCAs in lakes are strongly associated with salinity and stratification (Song et al., 2016;Plancq et al., 2018). Furthermore, the bloom of lacustrine alkenone producers is favored by cold water temperatures and enhanced nutrient availability . The alkenone unsaturation index U K 37 , which is based on the correlation of the degree of LCA unsaturation with the temperature of water, is increasingly used to reconstruct surface temperatures in lacustrine settings (Zink et al., 2001;Chu et al., 2005;Toney et al., 2010;Sun et al., 2012). Since LCAs are absent in surface sediments and we cannot exclude multiple haptophyte species inhabiting the lake, which would require DNA analysis, we utilized the U K 37 index as an indicator for temperature but do not report absolute temperature values.
The concentrations of nC 37 and nC 38 in the Lake Chatyr Kol sediment record exhibited large variations between 0.3 and 1571 µg g −1 d.w. with a mean of 245 µg g −1 d.w. (Figure 4). Total LCA concentrations included concentrations of the tetra-, tri-, di-unsaturated nC 37 methyl alkenones and nC 38 ethyl alkenones, as well as tri-unsaturated isomers of nC 37 and nC 38 alkenones. Values for U K 37 ranged from −0.873 to −0.057 (median −0.38, Figure 4).

Sedimentary Long-Chain Alkyl Diols
LCDs are a group of lipids that occur widespread in marine environments and consist mainly of saturated C 28 and C 30 1,13-diols, C 28 and C 30 1,14-diols, and C 30 and C 32 1,15-diols (Volkman et al., 1992(Volkman et al., , 1999Sinninghe Damsté et al., 2003;Rampen et al., 2014b). These compounds are likely produced by phytoplankton and also appear in freshwater environments (Shimokawara et al., 2010;Castañeda et al., 2011;Romero-Viana et al., 2012;Rampen et al., 2014a;Villanueva et al., 2014;Lattaud et al., 2017Lattaud et al., , 2018. A major biological source of LCDs in freshwater environments might be the Eustigmatophyceae class of algae, as previous studies have shown a link between the distribution of LCDs and eustigmatophytes (Shimokawara et al., 2010;Rampen et al., 2014a;Villanueva et al., 2014;Lattaud et al., 2018). However, the sources of LCDs as well as environmental controls on the distribution and abundance of LCDs are still poorly known (Rampen et al., 2012(Rampen et al., , 2014a. In marine settings, sea surface temperatures play an important role in determining abundance and distribution of LCDs (Rampen et al., 2012).

Planktonic and Periphytic Diatoms
Diatoms are frequently utilized as indicators of paleoenvironmental changes (Pérez et al., 2013). Based on the different adaptive strategies, diatoms can be divided into periphytic and planktonic diatoms. Periphytic diatoms colonize and attach to substrates, whereas planktonic diatoms are adapted to floating (Dunck et al., 2013).
In our sediment core, due to temporal changes in valve preservation during the period examined, the diatom concentration strongly varies between 0.0 and 5171.3 × 10 5 valves g −1 dry weight (Supplementary Figure S5). Zonation of diatom stratification was based on UMAP clustering (see chapter "Lake phases separation") and additionally visual inspection. Poor diatom preservation up to complete valve dissolution started from about 370 a BP.
Phase I (∼11,000 -10,200 a BP) and Phase II (∼10,200 -8050 a BP) are characterized by dominance of planktonic diatoms, mainly Cyclotella choctawhatcheeana Prasad. Periphytic diatoms, e.g., Cocconeis placentula Ehrenberg, are present with varying frequencies. At the beginning of Phase IIIa (∼8050 a BP), periphytic Nitzschia lacuum Lange-Bertalot occurs with higher abundance. Otherwise, the plankton dominance (C. choctawhatcheeana) increases to almost 100% in Phase IIIa. From about 5000 a BP (Phase IIIb) more periphytic diatoms that indicate saline or electrolyte-rich water, such as Achnanthes brevipes var. brevipes Agardh and Navicymbula pusilla (Grunow) Krammer 2003, appear. With the onset of Phase IV (∼2600 -780 a BP), the diatom assemblages clearly show changes in species composition, planktonic diatoms almost completely disappeared and periphytic species were dominant. From 780 a BP (Phase V) only freshwater periphytic diatoms, mainly C. placentula, occurred in Chatyr Kol. The few valves obtained initially represent periphytic and freshwater-preferring species. In the recent sediments, again planktonic species dominate.

Interdependencies Between Biomarkers
The PCA of our biomarker dataset revealed interdependencies between the various biomarker groups and parameters (Figure 5). The winter solar insolation was likely a controlling parameter of the E/I in our sediment record, which in turn affected the concentration of LCAs. Similarly, based on various lake energy and water-balance models, Li and Morrill (2010) found a close relationship between increasing winter solar insolation and increasing lake evaporation throughout the Holocene for monsoonal Asia. Higher E/I values (more arid conditions) would ultimately lead to a higher lake salinity, which likely promoted higher LCA concentrations in our record. Indeed, water salinity has been proposed as one of the determining factors for the occurrence of LCAs in lacustrine environments (Pearson et al., 2008;Toney et al., 2010;Liu et al., 2011;Song et al., 2016). Commonly, LCAs are barely present in both freshwater (0-3 g L −1 ) and hypersaline (>100 g L −1 ) lakes, but LCAs are typically found in intermediate salinity freshwater (2.41-44.43 g L −1 ) settings (Zink et al., 2001;Chu et al., 2005;Liu et al., 2011;Plancq et al., 2018). We therefore deduce a salinity-dependent occurrence of LCAs in Lake Chatyr Kol since LCAs are absent in the lake at the current salinity level of 1.18 g L −1 (measured in July 2018).
Conversely, the concentration of both LCDs and n-alkanes seemed to vary with the U K 37 index (Figure 5). Higher U K 37 values, suggesting higher temperatures, likely promoted higher occurrences of LCDs as well as n-alkanes. The temperature control on the distribution of LCDs is well observed in marine environments although the sources of the compounds are not clear (Rampen et al., 2012). In contrast, the biological sources of LCDs are likely eustigmatophyte algae (Rampen et al., 2014a;Villanueva et al., 2014). Our results suggest that the productivity of eustigmatophyte algae may have been stimulated by warmer temperatures. Similar to the occurrence of LCDs, the contribution of n-alkanes seemed to be linked to temperature (Figure 5). Since n-alkanes are dominated by their long-chain homologues FIGURE 5 | Principal Component Analysis (PCA) visualization of lipid biomarker composition and January insolation at 40 • N (Berger and Loutre, 1991) over the Holocene. Accordingly, the winter solar insolation was likely a controlling parameter of the Evaporation-to-Inflow ratio E/I, which in turn affected the presence of LCAs. Conversely, the concentration of both LCDs and n-alkanes seemed to be controlled by temperature (U K 37 index).
Frontiers in Earth Science | www.frontiersin.org FIGURE 6 | Lake phases separation by UMAP. The core was clustered into 4 separate groupings (A), which when resolved temporally yield five distinct lake phases (B). Note that one data point could not be assigned to a grouping (gray point).
in our record, an increase in their concentration would imply an enhanced deposition of allochthonous material. This can be attributed to either higher erosion and/or growth of terrestrial vegetation favored by a warmer climate. Consequently, this could have led to an input of nutrients stimulating eustigmatophyte growth.

Lake Phases Separation
We ran UMAP on the combined dataset of n-alkanes, LCAs, LCDs, E/I, U K 37 and the ratio of planktonic and periphytic diatoms, containing in total 581 independent data points. A density-based clustering of the resulting UMAP layout separated the sediment core into five distinct phases (I-V), with Phase I: ∼11,000 -10,200 a BP, Phase II: ∼10,200 -8050 a BP, Phase III: ∼8050 -2600 a BP, Phase IV: ∼2600 -780 a BP, and Phase V: ∼780 -present. Only one data point between Phase I and II remained unclustered (Figure 6).
Interestingly, Phase I (∼11,000 -10,200 a BP) and Phase V (∼780 a BP -present) were assigned to the same cluster (Figure 6), indicating similar environmental conditions. While UMAP clearly identified five temporally separated lake phases, the nature of the transitions between these phases needs further discussion.

Abrupt Shift Detection
To characterize the transitions at the previously identified phase boundaries, we ran asdetect on each proxy separately. Since we had already identified the phase boundaries, we did not apply a threshold to asdetect, but compared the individual results in a combined approach (Figure 7, Supplementary FIGURE 7 | Shift detection results by asdetect. The algorithm was performed on every proxy individually and subsequently stacked. Black lines indicate Phase boundaries defined by UMAP (see Figure 6). Figure S6). We hypothesized that phase transitions could show two major characteristics. On the one hand a tipping-point type of transition in which an event affects most of biomarker signals simultaneously. On the other hand a gradual phase transition characterized by a consecutive temporal succession of biomarker shifts.
During the Early Holocene, the phase boundaries at ∼10,200 and ∼8050 a BP were characterized by overlapping shift indicators of multiple proxies with their respective maxima/minima at the previously defined phase boundaries (Figure 7). Our data indicate that Early Holocene phase boundaries were likely event-based and could be classified as potential tipping points. In contrast, the phase boundaries in the Late Holocene at ∼2600 and 780 a BP showed less pronounced overlaps of biomarker shifts (Figure 7). Instead, a continuous succession of proxy changes suggests more gradual phase transitions during this period.

Holocene Lake Development
Based on the biomarker results and both lake phases and phase boundary identification, a reconstruction of the lake development over the course of the Holocene was conducted. In accordance with the phase separations, the lake development is discussed in separate sections.
Phase I (∼11,000 -10,200 a BP) Phase I comprises the oldest part of the sediment record and is characterized by relatively low concentrations of all biomarkers (Figures 4, 8). Relatively low U K 37 values and less negative δD values of nC 29 (−139 ± 23 ) likely reflect cold and arid climate conditions, respectively. Since aquatic plants were most probably not very abundant due to generally low in-lake productivity, concentrations of nC 23 are periodically below the limit necessary for reliable δD measurements. Less negative δD values of nC 23 at ∼11,000 to 10,900 a BP likely indicate a dry environment.
Dominant n-alkane clusters are the terrestrial cluster 2 and the microbial cluster 3 with a dominance of cluster 2 from ∼10,800 to 10,300 a BP. The change at ∼10,900 a BP from cluster 3 to cluster 2 indicates enhanced terrestrial input to the lake system. Concentrations of long-chain n-alkanes were, however, still relatively low. Nevertheless, an increase in the LCD concentration between ∼10,800 and ∼10,300 a BP indicates a slightly increasing biological productivity (Figure 4).

Phase II (∼10,200 -8050 a BP)
The transition from Phase I to Phase II was likely abrupt. Simultaneous shifts in LCDs, n-alkanes and E/I characterize the transition to Phase II as a potentially event-based tipping point (Figure 7). Rising temperatures as indicated by a sharp increase of U K 37 , most probably triggered by high summer solar insolation, promoted enhanced terrestrial input and algal productivity as seen in increasing n-alkane and the LCA amounts, respectively. A shift to a dominance of cluster 3, enhanced LCA production and the highest concentrations of LCDs, reflect high productivity of microorganisms due to more favorable lake conditions (Figure 4). The changes in the concentration and distribution of the biomarkers potentially recorded the abrupt crossing of a lake-internal system threshold as a local response to the effects of the Holocene Climate Optimum at ∼10,200 a BP (Figure 4). Low concentrations of nC 23 and thus low abundance of aquatic plants most likely reflect a higher lake level and thus a generally deeper lake as in the modern system aquatic plants occur primarily in the shallow water zone . A higher lake level can equally be inferred from the stronger dominance of planktonic over periphytic diatoms in comparison to Phase I (Figures 4, 8). Favorable lake conditions at this time interval were also inferred by Kalanke et al. (2020), who linked the dominating calcite precipitation with enhanced biological activity of e.g., Chrysophyceae and Characeae. Higher temperatures likely led to an isotopic enrichment of the Westerlies-derived precipitation, causing less negative δD values of nC 23 (δDnC 23 = −132 ± 0.3 ). E/I decreased at ∼10,200 a BP from 0.7 to 0.5, indicating higher moisture availability (Figure 4).

Phase III (∼8050 -2600 a BP)
A further potential event-based tipping point, identified by simultaneous shifts in LCDs, n-alkanes, E/I and U K 37 as well as in the planktonic/periphytic diatom ratio marks the onset of Phase III (Figures 7, 8). Warm and relatively stable environmental conditions during Phase II were interrupted by a significant drop of U K 37 at ∼8050 a BP indicating a shift to colder temperatures (Figure 4). Planktonic diatoms, for instance, were not present at ∼8050 a BP, whereas the dominance of the periphytic diatom Nitzschia lacuum suggests a cooling effect (Supplementary Figure S5). The absence of planktonic diatoms could indicate a strong shortening of the mixing phase, which is the main growth period of planktonic diatoms probably reflected by colder winters (Dreßler et al., 2011). Taking dating uncertainty into account, this cold interval possibly coincides with the North Atlantic 8.2 ka BP event (Alley and Agustsdottir, 2005;Kobashi et al., 2007). Another possibility which would explain the time lag is the hydrogeomorphology of high mountain lakes. The water balance of Lake Chatyr Kol is predominantly regulated by glacier surface melting, which in turn is influenced by air temperatures, solar radiation and precipitation (e.g., Oerlemans, 2005;Pan et al., 2012). However, oftentimes glaciers respond to climate change with a certain time lag (Jóhannesson et al., 1989;Pan et al., 2012), which would ultimately result in a response time lag of the lake. Cold and dry climate conditions during this period were also postulated for the Tibetan Plateau, mostly inferred from pollen data, such as for Lake Qinghai (∼2100 km northeast of Lake Chatyr Kol) at 8.2 cal. ka BP (Shen et al., 2005), Lake Zigetang Co (∼1700 km southeast of Lake Chatyr Kol) at 8700 -8300 cal. a BP (Herzschuh et al., 2006) and the high mountains of the northeastern Tibetan Plateau (∼2,300 km northeast of Lake Chatyr Kol) at 8300 -8000 cal. a BP (Miao et al., 2014). Furthermore, the event was observed in stalagmites across China (Dykoski et al., 2005;Wu et al., 2012;Liu et al., 2013). The overlapping timing of the reconstructed dry event ∼8200 years ago in Central Asia, that lasted ∼150 years and the 8.2 ka BP cold event of ∼160 years 8200 years ago observed in Greenland ice cores (Thomas et al., 2007), indicates an atmospheric teleconnection between North Atlantic temperature changes and precipitation in Central Asia. It is suggested that a weakening of the Atlantic Meridional Overturning Circulation affected the North Atlantic, which conversely affected the Asian Monsoon and may be related to changing mean latitudinal position of the Intertropical Convergence Zone (ITCZ) (Hu et al., 2008;Cheng et al., 2009).
A likewise decrease in temperature at Lake Chatyr Kol was also proposed by Kalanke et al. (2020) at ∼8040 a BP based on a shift in the varve microfacies. Phase III also marks the decline of the dominance of microorganisms as seen by a strong decrease in the LCD concentration as well as by a shift to a prevalence of the terrestrial n-alkane cluster 2. Contrary to the decline of microorganisms, concentrations of LCAs increased during Phase III relative to Phase II. Increases in LCA concentrations have been proposed as a stress response to cooler temperatures .
At ∼6300 -5800 a BP, δD values of nC 29 show the strongest negative values (−191 ± 4 ), possibly associated with high amounts of precipitation and higher lake levels. This would be in agreement with the dating of an exposed lake terrace, which shows a mid-Holocene age of 5786 ± 122 cal. a BP . A dominance of planktonic diatoms (C. choctawhatcheeana) and almost complete disappearance of periphytic diatoms, also indicates higher lake levels around this time. Likewise, more negative δDnC 29 values at Lake Son Kol (Central Tian Shan of Kyrgyzstan) suggest predominantly humid summers between 6000 and 4950 cal. a BP (Lauterbach et al., 2014;Schwarz et al., 2017).
Throughout Phase III, however, E/I values increased, indicating increasing dryness likely with a lower lake level. The increase of periphytic diatoms, such as the saline or electrolyte-rich water indicating species A. brevipes var. brevipes and N. pusilla (Hofmann et al., 2011), also suggests lower water levels (Supplementary Figure S5). Lowest U K 37 values were observed at ∼3900 a BP, which lasted until ∼3600 a BP (Figure 4). Furthermore, Mischke et al. (2010) observed a shift to colder conditions around 3500 cal. a BP for Lake Karakul, NE Tajikistan (∼220 km southwest of Lake Chatyr Kol) by multi-proxy analyses. An alkenone-based study from Lake Balikun, located in the eastern Tian Shan (∼1500 km northeast of Lake Chatyr Kol), showed a similar major cooling between 4800 and 3800 cal. a BP (Zhao et al., 2017). Additionally, a period of dry summers between 4950 and 3900 cal. a BP has been inferred for Lake Son Kol (∼120 km north of Lake Chatyr Kol) (Lauterbach et al., 2014). Cold-arid conditions between 4500 and 3500 cal. a BP have been globally documented and identified as the 4.2 ka BP event (Booth et al., 2005;Staubwasser and Weiss, 2006;Walker et al., 2012). The causes and forcing mechanisms behind the 4.2 ka BP event are still not well understood, but potential explanations include, amongst others, a southward shift of the ITCZ (Mayewski et al., 2004), cooler conditions in the North Atlantic (Bond et al., 1997) and a strengthening of the El Niño Southern Oscillation (Walker et al., 2012;Toth and Aronson, 2019). Recently, Perşoiu et al. (2019) hypothesized that the cold and arid conditions, which prevailed on the Eurasian landmass during the 4.2 ka BP event, resulted from a strengthening and expansion of the Siberian High, which hindered the transport of moisture by blocking the Westerlies. Consequently, this could have caused dry and cold conditions in our study area. Interestingly, both UMAP and asdetect do not characterize this event as a phase boundary nor as an abrupt shift, indicating that the 4.2 ka BP event did not lead to a lake system change in our study area. We therefore hypothesize that an event-based phase transition only occurs after an internal system threshold is exceeded.

Phase IV (∼2600 -780 a BP)
The transition to Phase IV was initiated by a shift in only a minority of the proxies, which, however, was followed by a consecutive temporal succession of biomarker shifts (Figure 7). At the transition, δDnC 23 is more negative, resulting in lower E/I values, pointing to a short-term humid phase (Figure 4). Additionally, the percentage of planktonic diatoms recorded a significant decline. Subsequently, strongly increased E/I values at ∼2300 and ∼1500 a BP characterize arid and warm periods, which led to high evaporation (δDnC 23 = −93 ± 21 ) and promoted higher concentrations of LCAs. Over the course of Phase IV, solely periphytic and primarily halophilic diatom species occurred, similarly pointing to lower lake levels and higher salinity, most likely because of higher evaporation (Figure 8). The prevailing warm and dry environmental conditions at ∼2300 and ∼1500 -1000 a BP coincide with the Roman Warm Period (RWP) and the Medieval Climate Anomaly (MCA), respectively. Similarly, Aichner et al. (2015) found warm and dry episodes at 2500 -1900 and 1500 -600 cal. a BP at Lake Karakuli, located at the westernmost edge of Xinjiang Province, China (∼240 km south of Lake Chatyr Kol). Warm conditions during the MCA are also well recorded in tree rings from Western Central Asia, including the Southern Tian Shan (Esper et al., 2002), in pollen data from the Kashgar oasis (∼150 km south of Lake Chatyr Kol) on the western margin of the Tarim Basin (Zhao et al., 2012) and for Lake Bangong Co (∼900 km southeast of Lake Chatyr Kol) (Gasse et al., 1996). During the RWP and the MCA, the lake level was likely lower, which is supported by higher abundances of periphytic diatoms and aquatic plant remains .
Decreased values for both U K 37 and E/I between ∼1030 and ∼780 a BP imply cold and wet conditions, which is also documented in a number of other paleoclimate records in Central Asia (e.g., Esper et al., 2002;Yang et al., 2009;Chen et al., 2010;Zhao et al., 2012;Aichner et al., 2015) and likely reflects the Little Ice Age (LIA). The predominance of wetter conditions during this interval is in agreement with the presence of exposed paleo-lake deposits that date to 1097 ± 166 cal. a BP and 906 ± 160 cal. a BP .

Phase V (∼780 a BP -Present)
The transition to Phase V was again initiated by a singular proxy shift followed by a lagged response of the remaining proxies (Figure 7). As inferred by UMAP, biomarker profiles during Phase V were comparable to those during Phase I (Figure 6).
The onset of Phase V is reflected by enhanced aquatic productivity, particularly by a shift from the terrestrial n-alkane cluster 2 to the aquatic n-alkane cluster 1 as well as by an increase in LCD concentrations (Figure 4). The shift in the n-alkane composition points to a dominance of submerged macrophytes, whereas, higher amounts of LCDs can be linked to higher U K 37 values, i.e., increased temperature. In addition, macrophytes dominance is also evidenced by the occurrence of the almost exclusively periphytic C. placentula (Hofmann et al., 2011).
Throughout Phase V, δDnC 29 is more negative and an average E/I of 0.3 points toward wetter conditions and higher precipitation in a warmer climate. These conditions likely discriminated against the occurrence of LCAs since their concentration decreased to a minimum (Figure 4). The reconstructed predominantly wet conditions during Phase V are in good agreement with the presence of paleo-lake deposits that date to 530 ± 204 cal. a BP .
At ∼284 a BP the concentrations of LCDs and n-alkanes declined sharply, which could be attributed to decreased U K 37 values, thus cooler temperatures. Similarly, the concentrations of LCAs remained low, likely induced by a lower E/I, indicating wetter conditions and fresher lake conditions. The inferred predominance of wetter conditions during this interval is in good agreement with the return of planktonic diatoms, which point to rising lake levels (Figures 4, 8).

CONCLUSION
The present study highlights the applicability of numerical methods for summarizing patterns in complex multi-proxy datasets. By reducing their dimensionality using PCA and UMAP, we increased their interpretability with minimal information loss. PCA allowed for elucidating proxy interdependencies while UMAP and asdetect enabled the objective identification and characterization of lake phases and corresponding phase boundaries, respectively. Over the course of the Holocene, we identified five temporally separated lake phases. In the Early Holocene, phase transitions were likely induced by supraregional events such as the onset of the Holocene Climate Optimum and the temperature decrease associated with the 8.2 ka BP event. Conversely, during the Late Holocene, phase transitions appeared to have been rather gradual, induced by rather local environmental changes within the catchment area of the lake. Whether event based or gradual, phase transitions at our study site occurred when system-internal thresholds were exceeded as a result of diverse environmental changes.
As one of the rare paleoenvironmental studies integrating data-driven methods, we were able to elucidate the high complexity of lake phase transitions and interdependencies of proxies. We advocate for the use of data-driven approaches to decipher multi-proxy paleoenvironmental records in future studies as an alternative approach to common visual inspection.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
GG designed the research idea. SL participated in field work. AS contributed diatom data. JK generated the age chronology. NS carried out laboratory analysis and the data analysis with support from JT and SS. NS prepared the manuscript. All authors discussed the results and improved the manuscript.