Estimation of physiological exercise thresholds based on dynamical correlation properties of heart rate variability

Aerobic and anaerobic thresholds of the three-zone exercise model are often used to evaluate the exercise intensity and optimize the training load. Conventionally, these thresholds are derived from the respiratory gas exchange or blood lactate concentration measurements. Here, we introduce and validate a computational method based on the RR interval (RRI) dynamics of the heart rate (HR) measurement, which enables a simple, yet reasonably accurate estimation of both metabolic thresholds. The method utilizes a newly developed dynamical detrended fluctuation analysis (DDFA) to assess the real-time changes in the dynamical correlations of the RR intervals during exercise. The training intensity is shown to be in direct correspondence with the time- and scale-dependent changes in the DDFA scaling exponent. These changes are further used in the definition of an individual measure to estimate the aerobic and anaerobic threshold. The results for 15 volunteers who participated in a cyclo-ergometer test are compared to the benchmark lactate thresholds, as well as to the ventilatory threshods and alternative HR-based estimates based on the maximal HR and the conventional detrended fluctuation analysis (DFA). Our method provides the best overall agreement with the lactate thresholds and provides a promising, cost-effective alternative to conventional protocols, which could be easily integrated in wearable devices. However, detailed statistical analysis reveals the particular strengths and weaknessess of each method with respect to the agreement and consistency with the thresholds—thus underlining the need for further studies with more data.


Introduction
Increasing availability and popularity of health technology including wearable devices such as wrist monitors, rings and smart clothing brings significant possibilities to analyze the physiological signals in everyday life.There are multiple different measures to evaluate the effects of physical exercise from the collected data alone.One of the applications from the exercise data is to optimize the training load, and determine the different physiological zones during the exercise.Currently, there are several measures to determine different physiological changes during exercise including the conventional measurements of oxygen 10.3389/fphys.2023.1299104consumption (Foster, 1983) and lactate concentration of the blood (Faude et al., 2009), as well as estimates based on the heart rate (HR) and HR variability (HRV) measurements (Cottin et al., 2006;van Campen et al., 2020) with varying reliability and usefulness.
In a three-zone model of the exercise (Stöggl and Sperlich, 2019), the training zones are separated by thresholds, which can be either determined through the ventilatory exchange (VT 1 , VT 2 ) (Beaver et al., 1986), or the lactate concentration (LT 1 , LT 2 ) (Binder et al., 2008).The first (TH 1 ) and second thresholds (TH 2 ) are known as the aerobic and anaerobic thresholds, respectively.There are multiple different ways to determine the lactate thresholds from the lactate concentration (Newell et al., 2007;Jamnick et al., 2018;Jamnick et al., 2020), which brings uncertainty in the analysis.Furthermore, these invasive measurements are costly and time-consuming.On the other hand, the non-invasive measurements of the ventilatory exchange require a strict set of quality-control criteria (Gaskill et al., 2001;Cannon et al., 2009) and furthermore requires a specialized test environment with trained personnel to perform the measurements during the exercise.
Most wearable devices use the relative HR compared to the estimated maximal HR to determine the training zones.With this method the aerobic threshold is usually estimated to be around 60%-70% of the maximal HR (HR max ) and the anaerobic threshold around 85%-90% of the HR max , respectively.However, this estimation has a few shortcomings.First, the HR max has significant individual variability, so the simple age-dependent models are not accurate or universal (Shookster et al., 2020).Secondly, even when using the actual measured individual HR max the percentage of the HR max is also highly individual.So this model only works for population averages but fails to accurately detect individual thresholds.This opens a huge need and market for more accurate detection methods.
To date, there is no golden standard in the determination of the metabolic thresholds.There is variation in the preference, depending also on the exercise protocols and their availability on the market.The Finnish standard for threshold determination is based on the lactate values (Keskinen et al., 2018), and single case studies (Jones, 1998;Tjelta, 2019) support the use of the lactate concentration to determine the thresholds.However, there are publications and standards (Hoffman, 1999;Blain et al., 2005) which use the ventilatory thresholds as a baseline measure for the thresholds.Often, these conventional methods give quantitatively different results (Yeh et al., 1983;Şekir et al., 2002), and they are also dependent on subjective visual analysis included in the interpretation of the results, e.g., in the fitting procedures of linear trends.At present, there are new computerized methods available to define the training zones (Kim et al., 2021;Zignoli, 2023), but a simple, cost-effective and accurate method is still to be found.
HRV is a physiological measure, which captures the variation of the cardiac interbeat intervals (IBI) (Goldberger et al., 2002).The physiological responses of the exercise during both rest and exercise can be analyzed with HRV (Gronwald and Hoos, 2020).There are many different HRV metrics, often divided into time-domain, frequency-domain and non-linear measures (Shaffer and Ginsberg, 2017).There are several studies examining the determination of the physiological thresholds using the frequency-domain methods (Di Michele et al., 2012;Ramos-Campo et al., 2017).Regarding non-linear HRV methods, it was recently suggested that the short-term scaling exponent α 1 of detrended fluctuation analysis (Peng et al., 1995) (DFA), i.e., a measure for the characteristics of the RR interval (RRI) correlations for the time scale of 4-16 consecutive RR intervals, provides a simple yet accurate approach to determine the ventilation thresholds (Rogers et al., 2021a;Rogers et al., 2021b).
In this study, we utilize a modified version of the DFA, i.e., dynamical detrended fluctuation analysis (Molkkari et al., 2020) (DDFA) to derive an approximation for both aerobic and anaerobic thresholds.DDFA allows the monitoring of the scaling exponent and thus the characteristics of the RR interval correlations as functions of both scale s and time t.In particular, it was shown by some of the present authors that under running α(t, s) gradually decreases from small up to higher scales, and high-intensity exercise can be characterized by anticorrelations of the RR intervals, especially for small scales (Molkkari et al., 2020).The DDFA results can be further sorted according to the HR to examine α(HR, s).Here, this quantity was used to derive a simple yet effective measure for the aerobic and anaerobic threshold.The method was tested with HR data obtained from 15 subjects during a cyclo-ergometer test and compared to simultaneous lactate and ventilation measurements.Overall, our method yields the smallest overall difference from the LTs, thus overperforming other HR-based estimations.The VTs, instead, give fairly consistent results, e.g., relatively small variations across the samples with, however, systematic and significant overestimation of both LTs.
Our further statistical analysis shows that the introduced method is relatively consistent over the HR range of the subjectspecific thresholds, even though the number of subjects (15) limits the conclusions.In summary, our method provides a promising tool to determine the physiological zones and the corresponding thresholds during training.We further discuss the potential and applicability of the method in wearable technologies.

Participants
The participants of the study are 15 healthy volunteers (aged 22-44), who performed an exercise test at Kauppi Sports Coaching Ltd.The participants filled a form stating their legal gender, age, medical risk factors, exercise background and training goals.The individual information and maximal HR, VO2, power and the RR interval filtering (described below) percentages are shown in Table 1.The participants were instructed to only eat lightly and not to consume caffeine for a few hours or alcohol for a few days before coming to the test.The exercise backgrounds of the subjects were taken into account when choosing the power levels for the test.The participants gave a written consent for the study.The approval of the study was given by the Tampere University Hospital Ethics Committee, and the principles of the Declaration of Helsinki were followed.

Test protocol and RR interval measurement
Before the exercise test weight, height and body composition of the participants were measured.The test was performed with a Monark LC4 cyclo-ergometer, and the test started after a 5-min warm up period.Based on the training background of the participant, the test was commenced with a power of 40-120 W and every 3 min the power was increased incrementally by 20-30 W. Individually chosen power levels were chosen to best suit each participant's fitness level.The choice is based on Keskinen et al. (2018), where the change of one level is derived from the body weight of the subject.As suggested by American College of Sports Medicine (2013) et al., the maximal power level is estimated as the eight increase of the power level, and the starting level is therefore determined with (Starting level = Estimated level(W) − 7 × change of one level(W)).The test was done until exhaustion, when the subjects could not keep the required power level.During the measurement, the RR intervals were measured with a Polar H10 heart rate sensor, which has been shown to be highly reliable for RRI determination, especially during intense physical exercise (Gilgen-Ammann et al., 2019;Schaffarczyk et al., 2022).We considered the RR intervals only during the exercise, so the analysis does not cover the warm-up period.Polar H10 was connected to the Polar M430 watch for data storage.After the test, the RRI data was transferred to the computer for analysis, which was performed with Python.

Preprocessing of RR intervals
First, we ignored the RRI values outside of the data range of 200-2,000 ms, or correspondingly 30-300 BPM in the HR.Then automatic filtering was performed by ignoring values that deviated by more than 10% from windowed median filter with kernel size of 7 RRIs.Data filtering in HRV measurements is a commonly known problem (dos Santos et al., 2013) and it is challenging to develop a robust and universal filtering algorithm.We did not have the access to the full ECG recordings and there is no certainty about the nature of the outliers.Therefore, we visually examined the samples and removed the beats which were statistically outliers in the current context not to cause problems in the following analysis.A few apparently incorrect missing beats or extra R peaks not detected by the automatic algorithm were removed by visual inspection.
The filtered percentages of the RRI data are shown in Table 1.Generally, the data was of excellent quality and little to none removal of the intervals was required for the majority of the samples.However, a significant amount (15.0%) of the RRIs of subject 1 were removed.Despite this removal, the filtered data of subject 1 was also used in the analysis with relatively little effect on the results.

Lactate and ventilation measurements
To measure blood lactate concentrations, fingertip blood samples were taken at the end of each 3-min exercise load.These samples were analyzed with EKF Biosen C-line Clinic device.In the definition of the thresholds through lactate measurements we followed the standard protocol used in Finland, refined over the years (Keskinen et al., 2018).The data points from the lactate measurements (mmol/L) are plotted as a function of time and visual analysis is performed.LT 1 is defined as 0.3 mmol/L above the lowest lactate level (by linear interpolation between two lactate measurements, if necessary).Then the remaining data points are split into two linear fits: the first fit is made between LT 1 and the following data point, and the second fit is made through the final points where the lactate concentration increases by more than 0.8 mmol/L during the exercise step.LT 2 is defined as the crossing point of the two slopes (Keskinen et al., 2018).
The ventilatory exchange was measured with Cosmed Quark CPET respiratory gas analyzer during the exercise.The ventilatory thresholds were defined by combining multiple automatic methods and taking the average, as described in Kim et al. (2021).VT 1 is the average of the V-slope and excess carbon dioxide methods, whereas VT 2 is the average of the V-slope and excess minute ventilation methods (Kim et al., 2021).
Furthermore, we converted the thresholds from time to HR by taking a corresponding HR value from the measurement results.During the measurement, the HR was averaged with a 30-s window, so the HR distribution was stable and thus easily convertible.

Maximal HR method
We calculated the thresholds based on maximal HR analysis, where the first threshold (HR max T 1 ) was set at 70% of the HR max , and the second threshold (HR max T 2 ) was set at 85% of the HR max (McPartland et al., 2010).These values are often used by different manufacturers of the wearable devices, in which physiological threshold estimates are built in.However, some manufacturers use different values and even determine the exercise zones in a different way compared to that of Stöggl and Sperlich (2019).Nevertheless, we resort to the most common values given above, but point out that there is significant individual variation in the percentage of HR max for both thresholds depending on, e.g., the training background and age.This is a subject of assessment also in this work (see Section 3.2).

DFA α 1 method
We compared our threshold estimation method with that of Rogers et al. (2021a); Rogers et al. (2021b) by calculating the conventional DFA α 1 every 5 seconds in 2-min segments as a function of the average HR within the segments.Smoothness priors detrending (Tarvainen et al., 2002) was applied to the RRI time series (smoothing parameter λ = 500), but other preprocessing was conducted according to Section 2.3 instead of the Kubios software as described in Rogers et al. (2021a); Rogers et al. (2021b).The algorithm relied on linear regression on the HR versus α 1 in the region of rapid near-linear decline of α 1 to determine the aerobic and anaerobic thresholds wherein the regression line crossed α 1 = 0.75 and α 1 = 0.5, respectively.However, the definition of the regression region was left subjective in Rogers et al. (2021a); Rogers et al. (2021b).Therefore, we adopted the following algorithmic approach to support systematic studies: 1) Initially select all the data points that fall within the interval 0.5 ≤ α 1 (HR) ≤ 0.75.2) If there are multiple disjoint regions (as a function of HR) where this condition is fulfilled, connect the regions by also selecting the intervening data points if there are at most n of them.Here we chose n = 4. 3) If there are still multiple disjoint regions, choose the one with the most data points.Perform linear regression over this region and record the coefficient of determination.4) Expand the region one data point at a time from either end, and choose the one region that results in the largest coefficient of determination for the linear regression.
The performance of the algorithm was visually inspected to ascertain that it produced sensible results.We point out that this method was originally developed to assess the VTs (instead of LTs).This is taken into account in the interpretation of the results below.

Dynamical DFA method
Our computational method introduced below to estimate the metabolic thresholds from the RR intervals is based on DDFA Molkkari et al. (2020) with recent improvements (Molkkari and Räsänen, 2023).We utilize the second-order DDFA, where a secondorder polynomial is fitted in the conventional DFA fluctuation function instead of a linear fit.The DDFA scaling exponents are calculated as follows.
1) Perform dynamic segmentation for each scale s, where the segment length l = 5s.2) Compute the second-order DFA fluctuation function in maximally overlapping windows for each segment at scales s − 1, s, s + 1.The fluctuation function is thus denoted as F t (s − 1), F t (s) and F t (s + 1), for the respective scales.3) In each segment, compute the dynamic scaling exponent α(t, s) by the finite difference approximation , where h − = log(s) -log(s − 1) and h + = log(s + 1) -log(s) are the logarithmic backward and forward differences.
The metabolic thresholds are derived from the resulting DDFA scaling exponents, where the scaling exponents are aggregated as a function of HR.The procedure consists of the following steps.
1) Calculate the second-order DDFA (DDFA-2) for the RRI time series with 20 logarithmically spaced integer scales between 5-64 RRIs.The range of scales corresponds to the to the joint scales of DFA α 1 and α 2 . 10.3389/fphys.2023.1299104 2) Calculate the scaling exponents α(HR, s) as a function of HR and scale s, where HR is the average of each segment.Sort the HR values by binning them to the nearest integer HR and assign α(HR, s) values to the corresponding bins.For each scale s, take the mean value of the scaling exponents within the bins yielding a distribution of α(HR bin , s). 3) Calculate an individual baseline to reduce the variability of different physiological starting conditions.The baseline is determined from the mean value of α(HR bin , s) for the 25 bins with the lowest HR for each scale s.The baseline values of each scale are subtracted from α(HR bin , s) over the whole measurement.4) Calculate the mean value of α(HR bin , s) over each scale s of a HR bin.Smoothen the resulting α(HR bin ) curve with a mean filter with a kernel size of 10 HR bins to prefer the trend over small local fluctuations obtaining the mean smoothed scaling exponent α (HR bin ).
The first DDFA derived threshold (DDFAT 1 ) is the point, where α (HR bin ) distribution drops below the baseline.The determination of this point is derived through the following calculation.First find the points where α (HR bin ) distribution crosses the baseline.If the intersection is not stable, i.e., α (HR bin ) keeps fluctuating around the baseline, move to the next one until a stable intersection is found.We find that sufficient stability can be found when α (HR bin ) remains negative at least 10 consecutive binned HR values.5) Similarly, select the second DDFA derived threshold (DDFAT 2 ) in a stable intersection where α (HR bin ) equals −0.5.

Statistical analysis
We computed Pearson correlation coefficients for the different threshold determination methods against the LTs.As the correlations were expected to be positive, we considered the corresponding one-sided alternative hypothesis against the null hypothesis of no correlation.Because of the small sample size, we computed the p-values with a random permutation test.
In addition to correlations, we assessed agreement between the threshold estimates and LTs by performing an analysis akin to Bland and Altman (1999).Assuming normality of the differences, the 95% limits of agreement (LoA) were computed as the mean difference ±1.96 standard deviations.
The uniformity of the differences throughout the range of measurements was assessed by linear regression.The test statistic was the absolute value of the regression slope, which was compared against the regression slopes of Monte Carlo-sampled uncorrelated normally distributed differences with the observed standard deviation.We calculated the p-values for the null hypothesis that the absolute value of the slope is drawn from normally distributed uncorrelated differences with the same standard deviation as the real differences versus the one-sided alternative hypothesis that the absolute value of the slope is greater than what is expected for those normally distributed uncorrelated differences.Furthermore, we calculated the p-values for the Shapiro-Wilk normality test, where again the null hypothesis is that the differences are drawn from the normal distribution.
For all the statistics, 95% confidence intervals were computed with bias-corrected and accelerated bootstrapping (Efron, 1987).In all the methods that relied on random (re)sampling, the sampling was performed 10 4 times.

Demonstration of threshold estimation
As described above, our method for the threshold estimation utilizes the dynamical RRI correlations through the DDFA scaling exponent α(HR, s). Figure 1 illustrates our method for subject 3 during the exercise.The colors in the figure correspond to the values of the second-order DDFA scaling exponent.In Figure 1A the scaling exponent is shown as functions of HR (x-axis) and scale ( yaxis), whereas in Figure 1B it is given as functions of time and scale.The transition from higher to lower α values during the exercise, or with increasing HR, is evident in colors than turn from red to blue across an increasing range of scales.The solid line in Figure 1A shows the mean α (HR bin ) used in the determination of DDFAT 1 and DDFAT 2 as described above.The corresponding estimates for the thresholds are shown as cyan vertical dashed lines, and they can be compared with the benchmark LT values shown in black vertical dashed lines.In this particular example DDFAT thresholds underestimate LT 1 and LT 2 by only one and three BPM, respectively.
Next we consider the behavior of the RRIs during the exercise over all the subjects.These results can be qualitatively compared with the distribution of the LT values computed from the lactate measurements.Figure 2 shows an aggregate plot of α(HR, s) for all the subjects (upper panel) together with the LTs (lower panel) as a function of HR.At low HR-corresponding here to the beginning of the exercise test-the scaling exponent value is relatively high.The behavior is in accordance with the DFA results previously obtained for a healthy heart at rest Goldberger et al. (2002).At higher HR the scaling exponent decreases, first at scales of 10-20 RRIs, and then through an extending range of scales with increasing HR.At high HR above about 160 BPM, the scaling exponent decreases well below 0.5, indicating anticorrelated behavior in line with the previous findings (Molkkari et al., 2020).
On the average, LT 1 qualitatively corresponds with the point where the scaling exponent begins to decrease below the baseline around 10 RRIs.On the other hand, LT 2 is approximately located in the area where the scaling exponent decreases below 0.5 from the baseline.The value is selected as the mean scaling exponents over the scales at rest corresponds to a value of around 1.0, thus a reduction of 0.5 corresponds to an arise of anticorrelated behaviour in the RRI correlations, which is associated with fatigue (Molkkari et al., 2020).This average behavior is in accordance with the threshold determination method described in the previous section.We point out, however, that as there is a large individual variance in the HRs corresponding to the respective LTs as demonstrated in the box plots of Figure 2. Quantitative comparison between the individuals, including different threshold estimation methods, is presented in the following section.

Comparison of the methods
Table 2 shows the subject-specific HRs for both the first (aerobic) and second (anaerobic) thresholds computed with all the methods.The mean values and standard deviations are shown on the last two rows, respectively.As the first observation, the HRs of the LTs are significantly lower than those of VTs, the mean differences being 20 and 12 BPM for the first and second thresholds, respectively.This is in line with previous findings Yeh et al. (1983), although recent studies in treadmill tests have showed rather good agreement between LTs and VTs (Neves et al., 2022).Secondly, Table 2 allows the examination of different thresholds for the same subject, which is not visible on the visualizations.For example, for subject 5, LTs and DDFATs are close to each other and VTs and DFAα 1 Ts are close to each other, but HR max Ts are heavily underestimated.On the other hand, for subject 13 all the other thresholds are relatively close to each other, but once again VTs and DFAα 1 Ts have significantly higher values for both thresholds.
The differences of the methods from both LT 1 and LT 2 (in BPM) are visualized in Figure 3A Most of the methods display relatively large variations with distinctive outliers for both thresholds.Some correlation between the differences with respect to LT 1 and LT 2 , especially for VT and DFAα 1 T is also visible.The summed absolute differences to illustrate the overall performance with respect to both LTs are shown in Figure 3B.The DFAα 1 T results agree well with VTs and thus systematically exceed the LT thresholds, leading to relatively high distances from the baseline in Figure 3B.On the other hand, HR max T shows reasonable overall agreement, but include a few distinctive outliers.DDFAT shows relatively good agreement with the LT values with the smallest summed deviation in Figure 3B.
In addition to mere comparison between the HR estimates for the thresholds, it is worthwhile to examine the correlations between the estimates and the LTs and their statistical significance.Table 3 shows the Pearson correlation coefficients r with their bootstrapped 95% confidence interval lower bounds (LB) and the p-values for Pearson correlation against LTs (p corr ).The VTs correlate the most with the LTs with the best statistical significance, whereas all the other methods show low to moderate correlation.We point out, however, that with the present data size correlation analysis is considerably affected by individual outliers present in the data.
Next, we utilize Bland-Altman plots to analyze the consistency in the comparisons between the methods in further detail.Figure 4 shows the differences from LT 1 Figure 4A and LT 2 Figure 4B as a function of HR with several statistical measures.In particular, the dark gray lines show the mean differences μ diff from the LTs listed also in Table 3, and the ends in the lines indicate their 95% confidence intervals calculated with bootstrapping.Correspondingly, the light grey lines show the 95% limits of agreement and their 95% confidence intervals.In addition, the figure shows the linear fits to data as a function of HR in colored lines together with their pointwise 95% confidence intervals shown in shaded areas.The statistical measure for the linearity is assessed by considering a null hypothesis of zero slope, and the corresponding p-values (p slope ) are shown in Table 3.Here high p-values indicate a small degree of linearity.In that case the slope is consistent with normally distributed differences, implying a low chance of a trend.Table 3 shows also the p-values for the null hypothesis that the differences are normally distributed with Shapiro-Wilk normality test (p norm ).The deviations from normality appear to be mainly due to few outliers in an already limited set of studied individuals, instead of skewness or other properties of the distributions.This underlines the fact that the analysis of the consistency of the results is cumbersome with the available amount of data.
Several observations can be made from Figure 4. First, the systematic overestimation of both LT 1 and LT 2 by VT 1 and VT 2 , respectively, is clearly visible, even though the correlation between LT and VT values is relatively high as noted above, i.e., VT provides consistent results in this respect.In addition, VT 1 shows the clearest linear trend with p slope = 0.15 in comparison with the other methods.HR max T 1 shows also a linear trend, although its uncertainty is large, mainly due to a few distinctive outliers.The overall performance of HR max T is relatively good, especially for LT 2 .However, the applicability of the estimate depends of the percentages of HR max selected for the thresholds (here 70% and 85%), and these percentages depend on individual fitness levels.
Figure 4 also shows that-as expected and noted also above-DFAα 1 T values are generally in line with VTs but the mean difference from the LTs is relatively large, especially for the first threshold.In contrast with VTs, however, DFAα 1 T values do not show linear bias.Finally, DDFAT shows reasonable agreement with LTs for both thresholds without linear bias; the values of p slope are the highest in DDFAT of all the methods.

Discussion
This study explores the possibilities to determine the physiological exercise thresholds through dynamical correlation properties of RRIs captured by DDFA.We implemented a method to quantify the exercise intensity by performing the DDFA of the HR time series and calculating the thresholds from the scaling exponents α(HR, s) for incremental cyclo-ergometer exercises.In accordance with previous studies (Molkkari et al., 2020), it is found that physical exercise leads to a decrease in the scaling exponents down to the regime of anticorrelations between the RRIs.These anticorrelations may reach relatively scales comprising dozens of RRIs.A combination of the scaling exponents for scales 5-64 RRIs yields a model to predict the exercise thresholds.The overall agreement with both LTs is good, and the method does not yield notable linear bias with respect to the HR.However, more data including different testing protocols are needed for further validation.If needed, the proposed model can be modified in a straightforward way to account for, e.g., different testing protocols.
Our HR-based method is directly applicable in all wearable technologies measuring the HR.As demonstrated in Figure 1, the thresholds are computed from the RRIs dynamically, which opens the possibility of calculating the thresholds in real time with a constant data flow from the measurement device, yielding a comprehensive image of the whole exercise.Hence, we find considerable potential in the implementation of our method to the wearable devices that are intensively becoming more popular in the consumer market.Such commercial development is already foreseen (Molkkari and Räsänen, 2023).
Of other HR-based methods, HR max T compares the present HR to the expected or measured maximum HR and estimates the thresholds according to pre-determined fractions (percentages) HR/HR max for both thresholds, respectively.Thus, the validity of the method strongly depends on the assessment of HR max and especially on the validity of the aforementioned percentage for the individual.The method can be accurate for a uniform set of studied subjects with similar training background.This is reflected in our study,

FIGURE 4
Bland-Altman plots of the differences to lactate for each method for (A) threshold 1 (B) threshold 2. The solid lines correspond to the mean (dark) and 95% limits of agreement (light) of the distributions.
where HR max T gives relatively good results for both thresholds, but-on the other hand-the cohort consists of subjects with high fitness levels compared to the general population.In practice, the parameters may need to be adjusted for different study groups Tanaka et al. (2001).We have implemented automization for the definition of the regression region to DFAα 1 T method.We also want to point out that the consideration of individual baselines in the method could improve the results, which should be considered in the future studies with DFAα 1 T.

Limitations
There are some considerable limitations in the present study.Firstly, even though the results are fairly consistent, the study population is relatively small (N = 15), decreasing the statistical power of the analysis.Hence, as mentioned above, further studies with larger cohorts with different testing protocols are needed.In particular, it is worth studying how accurately our method works for other endurance sports such as running, swimming or rowing, or for longer exercises, where the HR is elevated and then decreased again due to breaks (e.g., due to lactate measurements) or lowered training intensity in interval training.Furthermore, as the present study focused on healthy individuals with high fitness levels, it is important to study how the physiological thresholds hold in more diverse populations with different ages, conditions and medications such as beta-blockers (BBs), which are known to alter the HRV (Niemelä et al., 1994;Pukkila et al., 2023).
Secondly, the determination of the LTs can be done in several different ways, and there is no universal golden standard of the protocol (Jamnick et al., 2020).Here, the lactate thresholds were determined by automated algorithm according to a Finnish standard procedure with visual inspection to ascertain sensible results.In practice, the values are usually manually corrected by physician to confirm the training zones, which is prone to subjective interpretation.In this regard, it can be debated whether VTs-often determined by an average of multiple methods-provide a more reliable benchmark than LTs.Here we resorted to LTs as the benchmark, but included also VTs in the comparison.
Finally, even though the present study contains high-quality RR intervals measured using Polar H10 sensors with up to 1,000 Hz sampling rate (Schaffarczyk et al., 2022), it remains to be studied whether wearable devices without an external HR sensor can provide accurate data for our analysis.Another limitation in this study is the lack of the ECG recording, which complicates the evaluation of the nature of the outliers.Since the ECG recording is not available, we have resorted to visual examination of the samples and we removed the beats which were statistically outliers in the current context.Previously, it has been shown that sampling rates as low as 100 Hz are applicable for some HRV measures (Kwon et al., 2018).In controlled laboratory testing with standardized test procedure, some of the popular smartwatches employing photoplethysmography (PPG) have been shown to measure HRV with sufficient accuracy in rest (Shumate et al., 2021;Nuuttila et al., 2022).However, there are several sources of noise in PPG measurements including, e.g., individual variations in the body temperature and composition, gender and skin tone, as well as external factors such as light and applied pressure of the device (Fine et al., 2021).Hence, in order to estimate the physiological thresholds with PPG, significant attention has to be paid to the signal quality and preprocessing.To conclude, further studies to evaluate the DDFA-based thresholds with PPG measurements are required.

Conclusion
Knowledge of aerobic and anaerobic thresholds are of great importance not only for professional athletes but also for wellnessoriented consumers to optimize their training.There is a need to develop simple yet accurate estimates for the thresholds without a need for tedious respiratory gas exchange and blood lactate measurements.
Here we have introduced a computational method that estimates the both lactate thresholds (LTs) by utilizing the RR intervals during a HR measurement in an exercise setting.The training intensity corresponds to the time-and scale-dependent changes in the scaling exponent of the dynamical detrended fluctuation analysis (DDFA).This information was used to define individualized estimates for the thresholds (DDFAT 1 and DDFAT 2 ).The performance of the method was compared against LTs and ventilation thresholds (VTs), as well as to two other HRbased estimates obtained for 15 participants in a incremental cyclo-ergometer test.
Our DDFAT method was found to yield a reasonable agreement with the LTs.The combined agreement with both LT 1 and LT 2 was the best of all the tested methods.The simple estimate based on HR max was found to yield relatively good results as well, but this is possibly due to the uniform fitness profile of the subjects, for which the optimized percentages of HR max are suitable.
The VTs were found to exceed LTs by ≳10 BPM in a systematic fashion.A similar effect was found with the DFAα 1 method, which was originally constructed to assess VTs and performed well in that regard.Our statistical analysis showed that DDFATs have no linear bias in terms of the HR and thus appear as promising estimates for different fitness conditions.
In summary, DDFA provides a simple, cost-efficient, and accurate HR-based method to assess the athletic metabolic thresholds, and it could be easily integrated in wearable devices.However, it is still to be tested if the findings are applicable to larger populations or to other sports and alternative test protocols.After performing more extensive studies, we believe that the method can be successfully integrated into modern wearable devices such as smartwatches to be utilized in everyday use.
FIGURE 1Illustration of the determination of the thresholds DDFAT 1 and DDFAT 2 (cyan vertical lines) compared to the lactate thresholds (black vertical lines) for subject 3 during the exercise.The results are plotted as a function of binned HR in (A) and as a function of time in (B).

FIGURE 2
FIGURE 2 Aggregate plot of the DDFA scaling exponents for all 15 subjects as a function of the heart rate.The mean values of the aerobic and anaerobic lactate thresholds are shown as vertical dashed lines, and the box plots show the distributions of the threshold values.

TABLE 2 Subject-specific heart rates (in BPM) for the first (aerobic) and second (anaerobic) thresholds obtained by the different methods, along with the population mean and standard deviation.
Comparison of differences between LT 1 (x-axis) and LT 2 ( y-axis) for each subject with different threshold estimation methods.(B) Box plot of the summed absolute differences from both LT 1 and LT 2 .