Front. Physiol., 21 May 2021
Sec. Exercise Physiology

Retrospective Analysis of Training and Its Response in Marathon Finishers Based on Fitness App Data

  • 1Machine Learning and Data Analytics Lab, Department Artificial Intelligence in Biomedical Engineering, University of Erlangen-Nürnberg (FAU), Erlangen, Germany
  • 2Adidas AG, Future Sport Science, Herzogenaurach, Germany
  • 3Adidas AG, Technology & Innovation, Herzogenaurach, Germany
  • 4Runtastic GmbH, Pasching, Austria
  • 5Institute for Applied Public Health and Exercise Medicine, Furtwangen University (HFU), Furtwangen, Germany

Objective: Finishing a marathon requires preparation for a 42.2 km run. Current literature describes which training characteristics are related to marathon performance. However, which training is most effective in terms of performance improvement remains unclear.

Methods: We conducted a retrospective analysis of training responses during a 16-week training period prior to a completed marathon. The analysis was performed on unsupervised fitness app data (Runtastic) from 6,771 marathon finishers. Differences in training volume and intensity between three response groups and three marathon performance groups were analyzed. Training response was quantified by the improvement of the velocity of 10 km runs Δv10 between the first and last 4 weeks of the training period. Response and marathon performance groups were classified by the 33.3rd and 66.6th percentile of Δv10 and the marathon performance time, respectively.

Results: Subjects allocated in the faster marathon performance group showed systematically higher training volume and higher shares of training at low intensities. Only subjects in the moderate and high response group increased their training velocity continuously along the 16 weeks of training.

Conclusion: We demonstrate that a combination of maximized training volumes at low intensities, a continuous increase in average running speed up to the targeted marathon velocity, and high intensity runs ≤ 5 % of the overall training volume was accompanied by an improved 10 km performance, which likely benefited the marathon performance as well. The study at hand shows that unsupervised workouts recorded with fitness apps can be a valuable data source for future studies in sport science.

1. Introduction

Finishing a marathon is a fascinating goal, especially for recreational runners. In recent years, more and more people have pursued this dream, as indicated by the rising number of marathon participants (Knechtle et al., 2018; Vitti et al., 2020). The motivations for taking on this huge effort are manifold. They can be personal (goal achievement), social (respect of peers), physical (losing weight), or psychological (becoming less anxious) in nature (Zach et al., 2017). Independent of the motives behind the decision to participate in a marathon, all those runners are united in the task of preparing well by getting their bodies in shape to run 42.2 km.

Marathon preparation techniques have been under scientific investigation for decades. Many researchers have evaluated long distance runners' training load by analyzing their training strategies with respect to volume and intensity. A high training volume has been proven to positively influence marathon performance (Hagan et al., 1987; Gordon et al., 2017). Especially recreational runners with lower training volumes can potentially increase their performance by increasing the amount of training. This was underlined by the results of Roecker et al. (1998) and Tanda (2011), who found training volume to be one of the key predictors for marathon performance in recreational runners.

With regard to training intensity, various overviews outline advantages when training intensity is distributed in a polarized, i.e., non-uniform manner (Seiler and Tønnessen, 2009; Hydren and Cohen, 2015; Zinner, 2016; Rosenblat et al., 2019). Such concepts suggest spending certain proportions of the total training time within a low intensity (LIT) zone, a high intensity (HIT) zone, and optionally a threshold zone. In practice, training zones are defined either from a cardiopulmonary exercise test at defined percentages of the maximal oxygen uptake (V'O2max), at intensities related to ventilatory or lactate thresholds (Meyer et al., 2005), at percentages of maximum heart rate (Seiler and Tønnessen, 2009), or at percentages of target marathon velocity (Billat et al., 2001; Kenneally et al., 2018).

Overall, a significant body of research provides evidence that certain physiological factors and training characteristics are systematically related to marathon performance. However, it has yet to be shown which training characteristics are the most effective in terms of an actual fitness improvement to positively influence an individual's marathon performance. In order to demonstrate whether certain training characteristics lead to higher fitness improvements, the natural variability of the individuals' responses to training has to be considered (Bouchard and Rankinen, 2001; Ross et al., 2019).

Current findings mainly result from studies with defined, recruited, and instructed cohorts. Such supervised investigations suffer from low participant numbers. In contrast, longitudinal and unsupervised activity data from large populations recorded in a natural habitat might enable sport scientists to derive more generalizable conclusions. Nowadays, millions of runners with different fitness levels track their training progress by uploading recorded data from portable sensors onto platforms like Garmin, Strava, Runtastic, etc. The challenge in working with this kind of data lies in its unsupervised nature. The data are unlabeled, which means that values of ground truth and contextual subject information for specific research questions are missing. Besides, the accuracy of the portable sensors used to acquire the data is unknown. For these reasons, Hicks et al. (2019) postulated that a plausibility check of the data from portable sensors is an integral step prior to their analysis. Different publications have already shown the potential of portable sensor data from fitness apps to further improve performance prediction (Altini and Amft, 2018; Berndsen et al., 2020; Emig and Peltonen, 2020), to accurately determine the critical speed of runners and to set up pacing strategies (Smyth and Muniz-Pumares, 2020), and also to individualize training plans for marathon preparation (Feely et al., 2020).

Longitudinal investigations of physical activities before a marathon appear to be a promising approach to further improve the applicability, impact, and efficiency of marathon training plans. To the best of our knowledge, there is no research which evaluated systematic differences in marathon training characteristics in relation to its response based on longitudinal data from a large unsupervised study cohort. Thus, we contribute to the state of the art in the following way:

1. We retrospectively analyze the response to training using data from portable sensors. We assess response by comparing runs of the same distance with comparable heart rates as proposed by Boullosa et al. (2020).

2. Based on the magnitude of the response, we define different response groups and analyze corresponding differences in total training volume and training intensity distribution within a 16-week training period prior to a completed marathon.

3. For each response group, we further analyze corresponding differences in total training volume and training intensity distribution between different marathon performance groups within a 16-week training period prior to a completed marathon.

2. Methods

2.1. Data Set

After extensive filtering (explained below) we evaluated the marathon training of 6,771 runners. We used data recorded by portable sensors such as smartphones, smartwatches, or heart rate chest straps from anonymized users of the Runtastic fitness app for the evaluation. The subjects were chosen based on the following criteria:

• one workout between 2017 and 2019 with a total distance between 41 and 43 km

• at least 16 workouts in 16 weeks leading up to the marathon

• GPS and heart rate data for each workout

We defined a range around the exact marathon distance of 42.2 km in order to include marathons of slightly different distance and to account for inaccuracies of the GPS devices used to track the marathon. Apart from distance, no additional requirements like profile or location were set for the marathon workout. The threshold of 16 workouts in the 16 weeks leading up to the marathon was empirically chosen to ensure a minimum amount of data for evaluation. The data set included 5,288 male subjects (78.1%), 1,250 female subjects (18.5%), and 233 subjects of unknown sex (3.4%). The subjects' mean age was 38.5 ± 9.7 years. Body weight and height were not taken into account, because they were not available for all subjects. The GPS data (latitude, longitude) and heart rate data were sampled with different sampling rates. However, data streams of each workout were synchronized by global timestamps (UTC). GPS data were anonymized by adding a random offset to the data stream. The study is in accordance with the Declaration of Helsinki; the local ethics committee raised no objection to its conduct because of the anonymized nature of the data.

2.2. Data Processing

2.2.1. Extracting Overall Subject Features

For normalization purposes in later processing stages, we extracted the average marathon performance velocity vmp and the maximum training heart rate hrmax for each subject. The average marathon performance velocity vmp was determined from the marathon performance time Tmp and the recorded distance of the marathon workout (between 41 and 43 km). The maximum training heart rate hrmax was determined to be the median of the five highest recorded heart rates over the whole training process. We chose this approach to cope with short-term outliers in the heart rate recordings.
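The two subject-level features can be illustrated with a minimal sketch. It assumes that the "five highest recorded heart rates" are taken over all heart rate samples pooled across the training period (the text does not state whether samples or per-workout maxima were used), and the function names are hypothetical.

```python
from statistics import median


def max_training_heart_rate(heart_rates_bpm, top_n=5):
    """Median of the top_n highest samples, which damps isolated sensor spikes."""
    if len(heart_rates_bpm) < top_n:
        raise ValueError("need at least top_n heart rate samples")
    highest = sorted(heart_rates_bpm, reverse=True)[:top_n]
    return median(highest)


def marathon_velocity(distance_m, duration_s):
    """Average marathon velocity v_mp in m/s from recorded distance and time T_mp."""
    return distance_m / duration_s


# Assumed example data: the single 240 bpm sample mimics a short-term outlier.
samples = [150, 162, 171, 188, 189, 190, 191, 240]
hr_max = max_training_heart_rate(samples)   # median of 240, 191, 190, 189, 188
v_mp = marathon_velocity(42_195, 4 * 3600)  # a 4 h marathon
```

Taking the median rather than the maximum means that up to two spurious samples among the five highest values cannot shift the estimate beyond the plausible recordings.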

2.2.2. Feature Extraction of Individual Workouts

We computed a set of features for each of the W workouts leading up to a subject's marathon. The first feature obtained from the i-th workout (i ∈ {1, 2, 3, …W}) was the training duration Ti. Ti was computed by subtracting the first from the last UTC timestamps of the GPS data. If the workout duration was longer than 90 min, we saved an indicator IT90, i, which was used further on to evaluate how many long workouts were performed:

$$
I_{T90,i} = \begin{cases} 0 & \text{if } T_i < 90 \text{ minutes} \\ 1 & \text{if } T_i \geq 90 \text{ minutes} \end{cases} \tag{1}
$$

For all other GPS-based features, we computed the distance and velocity over time from the GPS data. We used the great circle distance implementation of the Python package (GeoPy, 2020) to compute the distance between two consecutive GPS recordings. This resulted in a data stream of distances between two consecutive GPS-samples over time di[n]. This data stream was used to compute the total distance of the i-th workout Di by computing the sum over all samples. Similar to IT90, i (Equation 1), we computed an indicator ID15, i for workouts with distances longer than 15 km.
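As an illustration of these GPS-based features, the following sketch computes Ti, Di, and both indicators for one workout. It is not the authors' implementation: a hand-written haversine formula stands in for the GeoPy great circle distance so that the example has no dependencies, the Earth radius constant is an assumption, and the use of ≥ for both thresholds is an assumption as well.

```python
import math

EARTH_RADIUS_M = 6_371_009.0  # mean Earth radius in metres (assumed constant)


def great_circle_m(p, q):
    """Haversine distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def workout_features(timestamps_s, points):
    """Duration T, distance D, per-sample distances d[n], and both indicators."""
    duration_s = timestamps_s[-1] - timestamps_s[0]
    step_m = [great_circle_m(a, b) for a, b in zip(points, points[1:])]
    distance_m = sum(step_m)
    return {
        "T": duration_s,
        "D": distance_m,
        "d": step_m,
        "I_T90": int(duration_s >= 90 * 60),   # long workout by duration
        "I_D15": int(distance_m >= 15_000),    # long workout by distance
    }


# Three samples along the equator, 0.001 degrees of longitude apart.
feats = workout_features([0, 30, 60], [(0.0, 0.0), (0.0, 0.001), (0.0, 0.002)])
```

Summing sample-to-sample distances is sensitive to GPS jitter, which is one reason a plausibility check of the resulting totals is useful.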

In order to assess training progress, we extracted the best velocity v10,i for a 10 km segment within each workout (if Di ≥ 10 km). For the respective 10 km segment, we also computed the average heart rate hr̄10,i during the time interval.
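One way to find such a segment is a sliding window over the cumulative distance, sketched below. This is an assumption about the procedure, since the text does not describe the search itself: for every end sample the shortest window that still covers 10 km is evaluated, and the fastest of these windows is kept. The function name and the returned index pair are hypothetical; the indices could be used to average the heart rate over the same interval.

```python
def best_segment(timestamps_s, cum_dist_m, segment_m=10_000.0):
    """Return (velocity in m/s, start index, end index) of the fastest window
    covering at least segment_m, or None if the workout is too short."""
    best = None
    start = 0
    for end in range(1, len(cum_dist_m)):
        # Move the left edge forward while the window still covers segment_m.
        while start + 1 < end and cum_dist_m[end] - cum_dist_m[start + 1] >= segment_m:
            start += 1
        covered = cum_dist_m[end] - cum_dist_m[start]
        elapsed = timestamps_s[end] - timestamps_s[start]
        if covered >= segment_m and elapsed > 0:
            velocity = covered / elapsed
            if best is None or velocity > best[0]:
                best = (velocity, start, end)
    return best


# Assumed example: 4 km easy, 12 km at 3 m/s, 4 km easy, one sample every 100 s.
speeds = [2.0] * 20 + [3.0] * 40 + [2.0] * 20
t, d = [0.0], [0.0]
for s in speeds:
    t.append(t[-1] + 100.0)
    d.append(d[-1] + s * 100.0)
result = best_segment(t, d)
```

Because samples are discrete, the window is usually slightly longer than 10 km; the velocity is therefore computed from the distance actually covered rather than from a nominal 10 km.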

After dividing di[n] by the corresponding duration between two consecutive GPS timestamps Δtigps[n], we obtained a velocity data stream vi[n]. We used this data stream to compute a distribution Ti[Ṽ] which describes the duration a subject spent in a defined velocity bin Ṽ during the i-th workout. To be able to define comparable velocity bins across all subjects, we normalized the velocity data stream vi[n] by the subject's marathon performance velocity vmp:

$$
\tilde{v}_i[n] = \frac{v_i[n]}{v_{mp}} \tag{2}
$$

The velocity bins for the distribution Ti[Ṽ] were defined from 0.54 · vmp to 1.8 · vmp with a bin width of 0.02. Thus, we computed the duration distribution function in the following manner:

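$$
T_i[\tilde{V}_x] = \sum_{n \in \tilde{V}_x} \Delta t_i^{gps}[n] \quad \text{with} \quad n \in \begin{cases} \tilde{V}_0 & \text{if } \tilde{v}_i[n] \leq 0.54 \\ \tilde{V}_1 & \text{if } 0.54 < \tilde{v}_i[n] \leq 0.56 \\ \;\vdots & \\ \tilde{V}_{64} & \text{if } 1.78 < \tilde{v}_i[n] \leq 1.80 \\ \tilde{V}_{65} & \text{if } 1.80 < \tilde{v}_i[n] \end{cases} \tag{3}
$$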

For simplicity of notation, we remove the bin indicator x from the relative velocity bin and denote the duration distribution for different velocity bins Ṽ as Ti[Ṽ] in the following.

The same procedure was performed for the heart rate data hri[m]. This data stream was normalized by the subject's maximum training heart rate hrmax. The heart rate bins were defined from 0.5 to 1 · hrmax with a fixed bin width of 0.02. This procedure resulted in the duration distribution for the heart rate Ti[HR~].
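The binning step can be sketched as a duration-weighted histogram with one underflow and one overflow bin. The sketch assumes half-open bins of the form (lower, upper], as in Equation 3, and works for velocity as well as heart rate by changing the bounds. List positions in the sketch are an implementation detail and are not meant to reproduce the bin labels of Equation 3; the function name is hypothetical.

```python
import math


def duration_distribution(values, dts, lo=0.54, hi=1.80, width=0.02):
    """Sum the sample durations dts per bin of the normalized values.

    Position 0 collects values <= lo, the last position collects values > hi,
    and the positions in between are regular bins (lower, upper] of the given width.
    """
    n_inner = round((hi - lo) / width)
    hist = [0.0] * (n_inner + 2)
    for value, dt in zip(values, dts):
        if value <= lo:
            idx = 0
        elif value > hi:
            idx = n_inner + 1
        else:
            # Small tolerance so values on a bin edge fall into the lower bin.
            idx = min(n_inner, math.ceil((value - lo) / width - 1e-9))
        hist[idx] += dt
    return hist


# Normalized velocities (v / v_mp) with the duration of each sample in seconds.
hist = duration_distribution([0.50, 0.55, 0.56, 1.00, 1.90], [1, 2, 3, 4, 5])
```

For heart rate, the same function could be called with `lo=0.5, hi=1.0`, matching the bounds stated above.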

2.2.3. Grouping of Workout Features in Time Frames of 4 Weeks

In order to evaluate the training progress over time, we defined training blocks of 4 weeks similar to Berndsen et al. (2020) and computed aggregated features for those training blocks. The partition of the blocks was defined based on the marathon date. Equation (4) defines the rules by which the i-th workout was assigned to training block tb:

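$$
i \in \begin{cases} tb_1 & \text{if } 16 \text{ weeks} \geq t_{marathon}[0] - t_i[0] > 12 \text{ weeks} \\ tb_2 & \text{if } 12 \text{ weeks} \geq t_{marathon}[0] - t_i[0] > 8 \text{ weeks} \\ tb_3 & \text{if } 8 \text{ weeks} \geq t_{marathon}[0] - t_i[0] > 4 \text{ weeks} \\ tb_4 & \text{if } 4 \text{ weeks} \geq t_{marathon}[0] - t_i[0] > 0 \text{ weeks} \end{cases} \quad \text{for } i \in \{1, 2, 3, \ldots, W\} \tag{4}
$$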

In this equation, tmarathon[0] describes the first UTC timestamp of the marathon workout. For the y-th training block the total training time Ttby, the total training distance Dtby, the number of workouts longer than 90 min IT90, tby and further than 15 km ID15, tby could be computed by summing the values of the workouts within the training block.

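$$
\begin{aligned}
T_{tb_y} &= \sum_{i \in tb_y} T_i \\
D_{tb_y} &= \sum_{i \in tb_y} D_i \\
I_{T90,tb_y} &= \sum_{i \in tb_y} I_{T90,i} \\
I_{D15,tb_y} &= \sum_{i \in tb_y} I_{D15,i}
\end{aligned} \tag{5}
$$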
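A minimal sketch of the block assignment and the per-block sums follows. The treatment of the block boundaries (a workout exactly 12 weeks before the marathon counts toward tb2) is an assumption, as are the dictionary keys and function names. The best 10 km velocity per block is tracked as a running maximum.

```python
WEEK_S = 7 * 24 * 3600  # seconds per week

# (upper, lower) bounds in weeks before the marathon for tb1..tb4
BLOCK_BOUNDS = [(16, 12), (12, 8), (8, 4), (4, 0)]


def training_block(workout_start_s, marathon_start_s):
    """Return 1..4 for tb1..tb4, or None outside the 16 weeks before the marathon."""
    weeks_before = (marathon_start_s - workout_start_s) / WEEK_S
    for block, (upper, lower) in enumerate(BLOCK_BOUNDS, start=1):
        if lower < weeks_before <= upper:
            return block
    return None


def aggregate_blocks(workouts, marathon_start_s):
    """Sum duration, distance, and long-workout indicators per training block."""
    blocks = {y: {"T": 0.0, "D": 0.0, "I_T90": 0, "I_D15": 0, "v10": None}
              for y in range(1, 5)}
    for w in workouts:
        y = training_block(w["start_s"], marathon_start_s)
        if y is None:
            continue  # the marathon itself or a workout outside the window
        b = blocks[y]
        b["T"] += w["T"]
        b["D"] += w["D"]
        b["I_T90"] += int(w["T"] >= 90 * 60)
        b["I_D15"] += int(w["D"] >= 15_000)
        if w.get("v10") is not None:
            b["v10"] = w["v10"] if b["v10"] is None else max(b["v10"], w["v10"])
    return blocks
```

Workouts shorter than 10 km carry no `v10` value and therefore leave the block maximum unchanged.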

The best 10 km velocity v10,tby for training block y was chosen from all v10,i of workouts in tby:

$$
v_{10,tb_y} = \max_{i \in tb_y} v_{10,i} \tag{6}
$$

The duration distribution curves for velocity Ti[Ṽ] and heart rate Ti[HR~] were combined for the different training blocks and converted to probability distributions Ptby[X = Ṽ] and cumulative distributions Ftby[X = Ṽ] (Figure A1).

For the y-th training block, the duration distribution curve Ttby[Ṽ] was computed by summing the duration within the velocity bin Ṽ of all the workouts belonging to the training block:

$$
T_{tb_y}[\tilde{V}] = \sum_{i \in tb_y} T_i[\tilde{V}] \quad \forall \; \tilde{V}. \tag{7}
$$

From Ttby[Ṽ] we computed a probability distribution Ptby[X = Ṽ] by dividing the time spent in the velocity bins by the total training time in tby:

$$
P_{tb_y}[X = \tilde{V}] = \frac{T_{tb_y}[\tilde{V}]}{\sum_{\tilde{V}} T_{tb_y}[\tilde{V}]}. \tag{8}
$$

The cumulative distribution Ftby[X = Ṽ] can be computed from the probability distribution by

$$
F_{tb_y}[X = \tilde{V}] = \sum_{p = 0.54}^{\tilde{V}} P_{tb_y}[X = p]. \tag{9}
$$

The same procedure was applied to the duration distribution of the heart rate to obtain the probability distribution Ptby[X=HR~] and cumulative distribution Ftby[X=HR~].

Using the probability distribution functions, we computed the normalized mean training velocity v¯tby and the normalized mean heart rate hr¯tby for the y-th training block:

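$$
\begin{aligned}
\bar{v}_{tb_y} &= \sum_{\tilde{V}} \tilde{V} \cdot P_{tb_y}[X = \tilde{V}] \\
\overline{hr}_{tb_y} &= \sum_{\widetilde{HR}} \widetilde{HR} \cdot P_{tb_y}[X = \widetilde{HR}].
\end{aligned} \tag{10}
$$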

We also used the distribution function of the velocity to compute the share of the workout time the subjects spent in different intensity zones. Similar to Kenneally et al. (2018) and Billat et al. (2001), we defined the zones based on the marathon velocity. The LIT zone was defined by velocities below vmp and the HIT zone by velocities above 1.2 · vmp (Figure A1). Using the cumulative distribution functions Ftby[X = Ṽ], the share of time spent in the intensity zone for tby was computed as

$$
\begin{aligned}
LIT_{tb_y} &= F_{tb_y}[X = 1] \\
threshold_{tb_y} &= F_{tb_y}[X = 1.2] - F_{tb_y}[X = 1] \\
HIT_{tb_y} &= 1 - F_{tb_y}[X = 1.2].
\end{aligned} \tag{11}
$$

All the computations for the training block analysis were also applied to all W workouts leading up to the marathon in order to obtain each subject's overall training statistics.
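The chain from a block's duration distribution to its probability distribution, mean normalized velocity, and zone shares can be sketched as follows. The sketch assumes that each bin is represented by its upper edge, so that the cumulative value at 1.0 includes every bin up to vmp; the function name and the example numbers are hypothetical.

```python
def intensity_shares(bin_values, durations, lit_max=1.0, hit_min=1.2):
    """Mean normalized velocity and LIT/threshold/HIT shares of one training block.

    bin_values: representative normalized velocity of each bin (assumed upper edge)
    durations:  time spent in each bin, summed over the workouts of the block
    """
    total = sum(durations)
    if total <= 0:
        raise ValueError("no training time recorded in this block")
    probs = [t / total for t in durations]                   # probability distribution
    mean_v = sum(v * p for v, p in zip(bin_values, probs))   # normalized mean velocity
    eps = 1e-9                                               # tolerance for bin edges
    f_lit = sum(p for v, p in zip(bin_values, probs) if v <= lit_max + eps)
    f_hit = sum(p for v, p in zip(bin_values, probs) if v <= hit_min + eps)
    return {
        "mean": mean_v,
        "LIT": f_lit,
        "threshold": f_hit - f_lit,
        "HIT": 1.0 - f_hit,
    }


# Assumed example: 50 minutes of training spread over four velocity bins.
shares = intensity_shares([0.8, 1.0, 1.1, 1.3], [600, 1200, 900, 300])
```

By construction the three shares sum to one, so a block with more threshold or HIT time necessarily shows a smaller LIT share.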

2.2.4. Filtering Data Set

An interquartile range (IQR) filter was applied to exclude all subjects for whom any of the parameters Tmp, D, T, or hrmax lay more than 1.5·IQR below the lower quartile or above the upper quartile.
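A sketch of such a filter is given below. It assumes that the bounds for each parameter are computed once on the unfiltered population and that quartiles are obtained by linear interpolation (the `inclusive` method of Python's `statistics.quantiles`); neither detail is stated in the text, and the dictionary keys are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Lower and upper acceptance bounds: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_filter(subjects, keys=("T_mp", "D", "T", "hr_max"), k=1.5):
    """Keep only subjects whose value lies within the bounds for every key."""
    bounds = {key: iqr_bounds([s[key] for s in subjects], k) for key in keys}
    return [s for s in subjects
            if all(bounds[key][0] <= s[key] <= bounds[key][1] for key in keys)]


# Assumed example with a single parameter and one obvious outlier.
subjects = [{"T_mp": float(x)} for x in list(range(1, 10)) + [100]]
kept = iqr_filter(subjects, keys=("T_mp",))
```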

In order to create valid response groups, we also excluded all subjects who did not achieve a minimum average heart rate of 0.8 · hrmax for the best 10 km runs in tb1 and tb4. 0.8 · hrmax was chosen to ensure sufficient cardiopulmonary effort for an individual best 10 km performance as well as a sufficient availability of data.

2.2.5. Categorizing Subjects in Response and Marathon Performance Groups

Conventional metrics to assess performance improvement (i.e., V'O2max or lactate thresholds) were not available for the unsupervised data set. Therefore, we used the improvement of the 10 km velocity Δv10 from tb1 to tb4 as a surrogate to evaluate the response of subjects to training throughout the 16 weeks before the marathon.

$$
\Delta v_{10} = v_{10,tb_4} - v_{10,tb_1} \tag{12}
$$

A positive value for Δv10 indicates an improvement and in turn a positive response to training and vice versa. The filter for the average heart rate stated in the data filtering section assured that those assessment runs were performed with a minimum cardiopulmonary effort. Despite the absence of conventional metrics to assess performance improvement we believe that Δv10 is a plausible surrogate since it should reasonably reflect an improvement in endurance capacity (Roecker, 2008). Also, research has shown that the velocity of 10 km races highly correlates to marathon performances (Karp, 2007; Tanda, 2011).

Δv10 was used to categorize the subjects into three groups: high response, moderate response, and low response. The borders separating the three response groups were computed at the 33.3rd and 66.6th percentile of Δv10. Because the improvement decreased for subjects with higher initial v10,tb1, we computed these percentiles separately within ten v10,tb1 velocity groups (Figure 1). The categorization of the subjects into the response groups was thus based on the distribution within the velocity group and not on the absolute value of Δv10. We chose this approach to ensure equally sized response groups across different performance levels.
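The grouping can be sketched as a two-stage quantile split: subjects are first assigned to one of ten velocity groups by the deciles of v10,tb1, and terciles of Δv10 are then computed within each velocity group. The quantile interpolation method, the handling of ties, and the function name are assumptions.

```python
from statistics import quantiles


def response_groups(v10_tb1, v10_tb4, n_velocity_groups=10):
    """Label each subject as low, moderate, or high responder within the
    velocity group defined by the quantiles of the initial 10 km velocity."""
    delta = [late - early for early, late in zip(v10_tb1, v10_tb4)]
    edges = quantiles(v10_tb1, n=n_velocity_groups, method="inclusive")
    # Velocity group index = number of quantile edges below the initial velocity.
    group_of = [sum(v > e for e in edges) for v in v10_tb1]
    labels = [None] * len(delta)
    for g in range(n_velocity_groups):
        members = [i for i, gi in enumerate(group_of) if gi == g]
        if len(members) < 3:
            continue  # too few subjects to form three response groups
        low_cut, high_cut = quantiles([delta[i] for i in members], n=3,
                                      method="inclusive")
        for i in members:
            if delta[i] <= low_cut:
                labels[i] = "low"
            elif delta[i] <= high_cut:
                labels[i] = "moderate"
            else:
                labels[i] = "high"
    return delta, labels
```

Because the cut points are relative to each velocity group, a subject with a small positive Δv10 can still be labeled as a low responder if most peers with a similar starting velocity improved more.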


Figure 1. Visualization of the training response Δv10 across ten velocity groups. Each dot represents one subject. The response categories are color coded. The vertical black lines are located at the decile values of v10,tb1. The horizontal gray line indicates the zero line, where subjects showed neither a positive nor negative improvement. Due to the statistical approach in the response group definition, which assured equally sized groups, the low and moderate response group also included subjects with negative Δv10.

Independent of the response group, all subjects were also categorized in three equally sized groups based on their marathon performance times using the 33.3rd and 66.6th percentile. For our data set, the 33.3rd and 66.6th percentiles referred to marathon performance times of 3 h 44' and 4 h 14', respectively. Based on those values we assigned each subject to a fast, medium and slow marathon performance group.

2.3. Evaluation

The evaluation consisted of three parts. Firstly, we demonstrate plausibility of the data set by reproducing known distributions and trends from literature as recommended by Hicks et al. (2019) for large unsupervised data sets. Plausibility was analyzed by plotting histograms for marathon performance times Tmp, maximum training heart rate hrmax, training improvement Δv10 and a regression plot relating marathon average performance velocity vmp to the best 10 km velocity v10.

Secondly, mean training velocity and mean heart rate throughout the training process were analyzed to evaluate Δv10 as a reasonable surrogate to measure training response for each response group. Plausibility was assumed when normalized mean velocity Δv¯ between tb1 and tb4 increases systematically across response groups without observing a difference in normalized mean heart rate Δhr¯.

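$$
\begin{aligned}
\Delta\bar{v} &= \bar{v}_{tb_4} - \bar{v}_{tb_1} \\
\Delta\overline{hr} &= \overline{hr}_{tb_4} - \overline{hr}_{tb_1}
\end{aligned} \tag{13}
$$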

Differences in Δv¯ and Δhr¯ between response groups were analyzed using a one-way analysis of variance (ANOVA).
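To make the reported effect size concrete, the following dependency-free sketch computes η2 for a one-way design as the ratio of the between-group sum of squares to the total sum of squares. For a one-way ANOVA this coincides with partial η2. It is a hand-written illustration rather than the statistical package used in this work, and the function name and example values are hypothetical.

```python
def eta_squared(groups):
    """Effect size of a one-way ANOVA: SS_between / (SS_between + SS_within)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in g)
    return ss_between / (ss_between + ss_within)


# Assumed example: three groups whose means differ by one unit each.
effect = eta_squared([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

In the example, the between-group and within-group sums of squares are both 6, so half of the total variance is attributable to group membership.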

Lastly, means and standard deviations were derived for training parameters describing the training volume. These parameters are total distance D, total training duration T, total number of workouts W and number of workouts longer than 90 min IT90 or 15 km ID15 for the complete training period of 16 weeks. Additionally, means and standard deviations were derived for the training intensity parameters describing the share of time in the LIT, threshold, and HIT zone. Finally, the performance indicators relative mean velocity v¯, best 10 km velocity v10 and relative mean heart rate hr¯ were calculated.

Differences in the training characteristics between the response and marathon performance time groups were analyzed as follows: We computed a two-way ANOVA with the training parameter as the dependent variable and the response and marathon groups as the independent variables. We excluded W, IT90, and ID15 from the ANOVAs, because the values of those training parameters were not continuous. For the intensity parameters LIT, threshold, and HIT, we analyzed differences in the training process over time by computing repeated-measures ANOVAs for the three training zones over the four training blocks. For all ANOVAs, we report partial η2 effect sizes if the p-values were below the significance level of α = 0.05. All statistical tests in this work were conducted using the Python package Pingouin (Vallat, 2018).

3. Results

3.1. Plausibility of the Data Set

Figure 2 depicts the results for the plausibility of the data set. The marathon performance times ranged from 2.5 to 6 h. We noticed distinct peaks at the full and half hour marks (Figure 2A). The histogram of the maximum training heart rate shows normally distributed values between 160 and 220 bpm (Figure 2B). A high correlation (Pearson's r = 0.77) was found between marathon average velocity and the overall best average 10 km velocity detected within the 16 weeks leading up to the marathon (Figure 2C). Lastly, a sorted distribution of Δv10 is presented in Figure 2D. Values of Δv10 ranged between −1 and 2 m/s, indicating a negative or no improvement in less than a third of the population.


Figure 2. Validation of data set. (A) Distribution of marathon performance time Tmp. (B) Distribution of maximum training heart rate hrmax. (C) Visualization of the correlation between best 10 km velocity v10 in the complete training period and marathon performance velocity vmp. The blue dots indicate the individual subjects, the green line the linear regression function. (D) Adaptive potential for improvement of best 10 km velocity Δv10.

3.2. Evaluation of Response Groups

Figure 3 depicts the verification of the response group definition. Subjects in the high response group, who showed the highest improvements in Δv10, also showed the highest improvement in Δv¯, while slightly decreasing their mean heart rate. We found a large effect size for the differences in Δv¯ (η2 = 0.136) and a small effect size for the differences in Δhr¯ (η2 = 0.001) between the response groups.


Figure 3. Visualization of the difference between tb4 and tb1 of (A) the normalized mean velocity and (B) the normalized mean heart rate for all subjects in the three response groups. The velocity values were normalized by the marathon performance velocity vmp and the heart rate by the maximum training heart rate hrmax.

3.3. Evaluation of Training Characteristics

Table 1 lists the mean values and standard deviations of the training parameters for subjects in the different response and marathon performance groups over all 16 weeks before the marathon. Besides, the effect sizes of the two-way ANOVA (response group, marathon performance group) for the main effects are reported in case of statistical significance (α < 0.05). We did not report effect sizes for the interaction effects, because they were not statistically significant. The results show small effect sizes for the differences between the response groups and higher effect sizes for the differences between the marathon performance groups. Our approach to categorizing subjects into response groups and marathon performance groups yielded a higher number of subjects with a fast marathon performance time in the high response group and in contrast a higher number of subjects with a slow marathon performance time in the low response group.


Table 1. Mean and standard deviation of the training parameters for the 16 week training process.

Figure 4 depicts the share of time spent in the three intensity zones during the four training blocks for the subjects in the different response and marathon performance groups. It shows differences in the intensity distributions between slow, medium, and fast runners. We observe an increasing share of training time in the LIT zone from the slow to the fast marathon group. Within the marathon performance group, the overall amount of time spent in the individual zones remains constant.


Figure 4. Visualization of the share of time spent in the intensity zones for (A) slow marathon performances, (B) medium marathon performances, and (C) the fast marathon performances. For each marathon performance group, we provide three plots showing the share of time spent in the three zones for the low response, moderate response, and high response group. The individual boxes in each plot visualize the IQR within the training blocks. The black horizontal lines within the boxes indicate the median. The whiskers extend to 1.5·IQR. We computed repeated measure ANOVAs for each response and marathon performance category for each intensity zone. The asterisks indicate the effect size of the results: *0.01 ≤ η2 <0.05, **0.05 ≤ η2 <0.12, ***η2≥0.12.

However, differences in time spent in the intensity zones between the four training blocks were found. Especially subjects allocated in the high response group decreased the time in the LIT zone throughout the training process, while increasing the share of time in the threshold and HIT zone. This is underlined by the results of the repeated-measures ANOVA for each combination of response and marathon time category in each zone over the training blocks. In Figure 4, the effect sizes of the statistical tests are indicated by asterisks. Subjects allocated in the high response group revealed the highest effect sizes for differences of time spent in the three intensity zones between the four training blocks. Differences in training volume parameters between the four training blocks were also analyzed but did not show any significant differences between the three response groups (Figure A2).

4. Discussion

In this study, we performed a large-scale retrospective data analysis of runners' training in the 16 weeks leading up to a marathon. The aim of the analysis was to evaluate differences in training characteristics between different response and marathon performance groups. The data used for the analysis originated from users of the Runtastic fitness app who used portable sensors to track their training progress. From the initial data set of 14,773 marathon finishers only 6,771 subjects remained after applying filters to improve data quality. In particular, the filter ensuring that the subjects performed the 10 km effort in tb1 and tb4 with an average heart rate > 0.8 · hrmax reduced the number of subjects by 6,845. We believe that this drastic reduction of more than 50 % was necessary to ensure a conclusive analysis.

4.1. Plausibility of the Data Set

By reproducing known values and trends from literature as suggested by Hicks et al. (2019), we could verify that our data set can be used for the analysis of differences in training leading up to a marathon. The distribution of marathon performance times is similar to the one presented by Allen et al. (2017), including the peaks at the full and half hour marks. Thus, even though the data query only required a workout between 41 and 43 km, the marathon performance times indicate that the workouts were actual marathon races. This assumption is supported by the fact that 98.6% of the marathon workouts were performed on the weekend. The distribution of the maximum training heart rate hrmax shows realistic results similar to data observed by others (Roecker et al., 2002; Sarzynski et al., 2013), who determined maximum heart rates using laboratory exercise tests. Thus, we believe that the maximum training heart rate hrmax also reflects the actual maximum heart rate well.

Strong correlations between average marathon velocity and average 10 km velocity have been reported by others (Karp, 2007; Tanda, 2011) and are verified by our data. The sorted values for Δv10 show a heterogeneity in response to training. In comparison to the findings from Bouchard and Rankinen (2001), the portion of the population who showed a negative or no improvement in our investigation was higher. We believe that the higher portion was due to the unsupervised nature of the data as well as the low threshold of > 0.8 · hrmax we set to verify the best 10 km performances. However, increasing the threshold of hrmax to elevate the cardiopulmonary effort for the best 10 km velocities did not change the proportion of training responses.

In comparison to supervised studies from Gordon et al. (2017) and Hagan et al. (1987), we observed lower weekly mean values in number of workouts, total training duration, and total distance. However, reduced mean values in training volume have also been shown in other unsupervised investigations (Leyk et al., 2009; Smyth and Muniz-Pumares, 2020). Lower training volumes might be caused by the heterogeneous nature of the larger data set itself.

4.2. Evaluation of Response Groups

We introduced an approach to assess physical fitness based on the best 10 km velocity v10 that was accompanied by a heart rate > 0.8 · hrmax. We classified three equally large response groups based on observed changes in the average 10 km velocity in tb1 and tb4. The idea of frequently monitoring typical training sessions to evaluate the response to training has already been proposed by Boullosa et al. (2020) and appears very practical. This is especially the case when data from recreational runners are analyzed, where laboratory fitness assessments are usually not part of the individual training routine. The 10 km velocity was chosen due to its high correlation to the marathon average velocity (Karp, 2007; Tanda, 2011). Therefore, we assume that an improvement of v10 should also positively influence the marathon performance velocity vmp.

In addition, a systematic increase in mean normalized running velocity was found when comparing the three response groups from low to high response, while no systematic differences in mean normalized heart rate were present. This provides further evidence that in general Δv10 likely reflects an improved physical fitness, even though the cause for the improvement may vary between individuals (e.g., improvement due to following a specific training structure with fast runs at the end of the 16-week training period). Ultimately, the fact that more subjects with a fast marathon performance time were allocated to the high response group supports our approach of classifying the three response groups based on Δv10.

4.3. Evaluation of Training Characteristics

The evaluation of training characteristics between marathon performance groups revealed differences with medium to large effect sizes. The mean values of all parameters describing the training volume (D, T, W, IT90, ID15) are systematically higher for the faster marathon performance time group. Similar relationships were also reported elsewhere (Hagan et al., 1987; Tanda, 2011; Gordon et al., 2017). In accordance with others, our results also indicate that polarized training with maximized volumes below the targeted marathon velocity in the LIT zone is associated with better marathon performances (Seiler and Tønnessen, 2009). While slow marathon performance times were associated with the largest shares of training time in the threshold zone, fast marathon finishers spent on average more than 60% of their training time in the LIT zone below their average marathon velocity. The larger shares in the threshold zone for the medium and slow marathon groups might be due to the fact that recreational runners cannot control intensity well and tend to run too fast even when following prescribed training plans (Foster et al., 2001).

The mean training parameters in Table 1 showed no differences between the response groups (all η2 < 0.012). This implies that a high training volume does not, in general, influence the response to training. This should be of interest to novice runners, who are at higher risk of injury from excessive training loads (Buist et al., 2010; Videbæk et al., 2015). Nevertheless, the response groups differed regarding the shares of time spent in the three intensity zones throughout the four consecutive training blocks. Independent of the marathon performance time, we observed large effect sizes for a decreasing duration in the LIT zone across the four training blocks for subjects in the high response group. While this observation is, of course, partly a result of our definition of the response groups, the analysis demonstrates that those subjects who started to train at very low velocities and continuously increased their training velocity up to the actual marathon velocity throughout the 16 weeks responded to the highest extent, leading up to at least an average (<4 h 14′) or even a fast marathon time (<3 h 44′).
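
For readers unfamiliar with the effect size reported above, the following sketch computes η2 for a one-way comparison of groups as the ratio of the between-group sum of squares to the total sum of squares. It uses plain NumPy and is not the original analysis code; the function name `eta_squared` and the sample values are hypothetical.

```python
import numpy as np

def eta_squared(groups):
    """Eta squared for a one-way design: SS_between / SS_total."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    # Between-group variation: group size times squared distance of the
    # group mean from the grand mean, summed over all groups
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Total variation of all observations around the grand mean
    ss_total = ((all_values - grand_mean) ** 2).sum()
    return float(ss_between / ss_total)

# Hypothetical weekly training distances (km) of three response groups
eta2 = eta_squared([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

A value close to zero, as reported for the response groups, means that group membership explains almost none of the variance of the respective training parameter.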

4.4. Limitations

Despite all the filters applied to improve data quality, a study with unsupervised data from fitness apps cannot be as controlled as a supervised study. For our investigation, we cannot guarantee that all subjects logged and uploaded all physical activities that could have influenced their 10 km or marathon performance. Contextual information affecting the performance of runners, such as humidity and temperature during a workout or an injury of a runner, was not available. The results are also influenced by the varying accuracy of the different portable sensors that recreational runners use to track their workouts. Running velocity was not adjusted to the elevation profile of the running route, which neglects the impact of inclines and declines on training load. Additionally, phenomena like “hitting the wall” during a marathon (Buman et al., 2008) were not controlled for, which might cause subjects to be classified in a worse marathon performance category despite a good training process. We acknowledge that these limitations might affect the results of individuals in our analysis. However, we believe that the number of those individuals is low compared to the overall number of subjects and that the effects of most of the limitations are equally distributed over the response and marathon performance groups. Thus, differences between or within groups should not be affected. Nevertheless, a detailed analysis of the influence of those limiting factors on the response to training and the marathon performance should be conducted in future work.

5. Conclusion

In this work, we retrospectively analyzed 16 weeks of training for 6,771 marathon finishers. By reproducing known trends and values from the literature, we showed that unsupervised data recorded by portable sensors are suitable for performing such an analysis. Our analysis demonstrated that a combination of maximized training volume at velocities below an individual's marathon velocity, a continuous increase in average running velocity over the complete training period up to the final average marathon velocity, and high velocity runs (> 1.2 · vmp) accounting for no more than 5% of the overall training volume was associated with a higher Δv10, which likely benefited the marathon performance as well. We also demonstrated that a high training volume does not generally influence the response to training.

The large variances in both the training characteristics and the corresponding responses indicate that the most effective training plan for an individual has yet to be developed. However, coaches and athletes also have to acknowledge that, even with the best and most effective training plan, the potential to improve performance is limited and partially genetically determined.

This study also showed that data recorded by portable sensors and stored on various fitness platforms are an extremely valuable source for retrospectively investigating different training regimes in large samples. Especially for longitudinal investigations, the limitation of small sample sizes can be overcome. This might enable sport scientists and training physiologists to draw more generalizable conclusions in the future.

Data Availability Statement

The data set originated from the Runtastic data base. We agreed to not publish the raw data, but only aggregated results. Requests to access the aggregated results should be directed to

Ethics Statement

The studies involving human participants were reviewed and approved by Ethics committee FAU Erlangen-Nürnberg. Written informed consent from the participants' legal guardian/next of kin was not required to participate in this study in accordance with the national legislation and the institutional requirements.

Author Contributions

MZ designed the study, implemented the methodology, interpreted the results, and wrote the manuscript. CH interpreted the results and wrote and reviewed the manuscript. BD designed the study and reviewed the manuscript. SD exported and anonymized the data set and reviewed the manuscript. KR interpreted the results and reviewed the manuscript. BE designed the study, interpreted the results, and reviewed the manuscript. All authors have read and approved the final version of the manuscript and agree with the order of presentation of the authors.

Conflict of Interest

CH and BD were employed by adidas AG. SD was employed by Runtastic GmbH.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


MZ gratefully acknowledges the support of the Association of German Engineers (VDI/VDE) within the Connected Movement research project. BE gratefully acknowledges the support of the German Research Foundation (DFG) within the framework of the Heisenberg professorship program (grant number ES 434/8-1).

References

Allen, E. J., Dechow, P. M., Pope, D. G., and Wu, G. (2017). Reference-dependent preferences: evidence from marathon runners. Manage. Sci. 63, 1657–1672. doi: 10.1287/mnsc.2015.2417

Altini, M., and Amft, O. (2018). “Estimating running performance combining non-invasive physiological measurements and training patterns in free-living,” in 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (Honolulu, HI), 2845–2848. doi: 10.1109/EMBC.2018.8512924

Berndsen, J., Smyth, B., and Lawlor, A. (2020). “Mining marathon training data to generate useful user profiles,” in International Workshop on Machine Learning and Data Mining for Sports Analytics (Cham: Springer), 113–125. doi: 10.1007/978-3-030-64912-8_10

Billat, V. L., Demarle, A., Slawinski, J., Paiva, M., and Koralsztein, J.-P. (2001). Physical and training characteristics of top-class marathon runners. Med. Sci. Sports Exerc. 33, 2089–2097. doi: 10.1097/00005768-200112000-00018

Bouchard, C., and Rankinen, T. (2001). Individual differences in response to regular physical activity. Med. Sci. Sports Exerc. 33, 446–451. doi: 10.1097/00005768-200106001-00013

Boullosa, D., Esteve-Lanao, J., Casado, A., Peyré-Tartaruga, L. A., Gomes da Rosa, R., and Del Coso, J. (2020). Factors affecting training and physical performance in recreational endurance runners. Sports 8:35. doi: 10.3390/sports8030035

Buist, I., Bredeweg, S. W., Bessem, B., Van Mechelen, W., Lemmink, K. A., and Diercks, R. L. (2010). Incidence and risk factors of running-related injuries during preparation for a 4-mile recreational running event. Br. J. Sports Med. 44, 598–604. doi: 10.1136/bjsm.2007.044677

Buman, M. P., Omli, J. W., Giacobbi, P. R. Jr, and Brewer, B. W. (2008). Experiences and coping responses of “hitting the wall” for recreational marathon runners. J. Appl. Sport Psychol. 20, 282–300. doi: 10.1080/10413200802078267

Emig, T., and Peltonen, J. (2020). Human running performance from real-world big data. Nat. Commun. 11, 1–9. doi: 10.1038/s41467-020-18737-6

Feely, C., Caulfield, B., Lawlor, A., and Smyth, B. (2020). “Providing explainable race-time predictions and training plan recommendations to marathon runners,” in Fourteenth ACM Conference on Recommender Systems, 539–544. doi: 10.1145/3383313.3412220

Foster, J. P., Carl, H., Kara, M., Esten, P. L., and Brice, G. (2001). Differences in perceptions of training by coaches and athletes. South Afr. J. Sports Med. 8, 3–7.

GeoPy (2020). GeoPy (version 1.22.0). Available online at: (accessed February 27, 2021).

Gordon, D., Wightman, S., Basevitch, I., Johnstone, J., Espejo-Sanchez, C., Beckford, C., et al. (2017). Physiological and training characteristics of recreational marathon runners. Open Access J. Sports Med. 8:231. doi: 10.2147/OAJSM.S141657

Hagan, R., Upton, S., Duncan, J., and Gettman, L. (1987). Marathon performance in relation to maximal aerobic power and training indices in female distance runners. Br. J. Sports Med. 21, 3–7. doi: 10.1136/bjsm.21.1.3

Hicks, J. L., Althoff, T., Kuhar, P., Bostjancic, B., King, A. C., Leskovec, J., et al. (2019). Best practices for analyzing large-scale health data from wearables and smartphone apps. NPJ Digit. Med. 2:45. doi: 10.1038/s41746-019-0121-1

Hydren, J. R., and Cohen, B. S. (2015). Current scientific evidence for a polarized cardiovascular endurance training model. J. Strength Condit. Res. 29, 3523–3530. doi: 10.1519/JSC.0000000000001197

Karp, J. R. (2007). Training characteristics of qualifiers for the U.S. Olympic marathon trials. Int. J. Sports Physiol. Perform. 2, 72–92. doi: 10.1123/ijspp.2.1.72

Kenneally, M., Casado, A., and Santos-Concejero, J. (2018). The effect of periodization and training intensity distribution on middle-and long-distance running performance: a systematic review. Int. J. Sports Physiol. Perform. 13, 1114–1121. doi: 10.1123/ijspp.2017-0327

Knechtle, B., Di Gangi, S., Rüst, C. A., Rosemann, T., and Nikolaidis, P. T. (2018). Men's participation and performance in the Boston marathon from 1897 to 2017. Int. J. Sports Med. 39, 1018–1027. doi: 10.1055/a-0660-0061

Leyk, D., Erley, O., Gorges, W., Ridder, D., Rüther, T., Wunderlich, M., et al. (2009). Performance, training and lifestyle parameters of marathon runners aged 20–80 years: results of the pace-study. Int. J. Sports Med. 30, 360–365. doi: 10.1055/s-0028-1105935

Meyer, T., Lucia, A., and Earnest, C. (2005). A conceptual framework for performance diagnosis and training prescription from submaximal gas exchange parameters-theory and application. Int. J. Sports Med. 26, 1–11. doi: 10.1055/s-2004-830514

Roecker, K. (2008). Streit um des Kaisers Bart: welche Laktatschwelle ist die beste? Deut. Zeitsch. Sportmed. 59:303. Available online at:

Roecker, K., Niess, A. M., Horstmann, T., Striegel, H., Mayer, F., and Dickhuth, H.-H. (2002). Heart rate prescriptions from performance and anthropometrical characteristics. Med. Sci. Sports Exerc. 34, 881–887. doi: 10.1097/00005768-200205000-00024

Roecker, K., Schotte, O., Niess, A. M., Horstmann, T., and Dickhuth, H.-H. (1998). Predicting competition performance in long-distance running by means of a treadmill test. Med. Sci. Sports Exerc. 30, 1552–1557. doi: 10.1097/00005768-199810000-00014

Rosenblat, M. A., Perrotta, A. S., and Vicenzino, B. (2019). Polarized vs. threshold training intensity distribution on endurance sport performance: a systematic review and meta-analysis of randomized controlled trials. J. Strength Condition. Res. 33, 3491–3500. doi: 10.1519/JSC.0000000000002618

Ross, R., Goodpaster, B. H., Koch, L. G., Sarzynski, M. A., Kohrt, W. M., Johannsen, N. M., et al. (2019). Precision exercise medicine: understanding exercise response variability. Br. J. Sports Med. 53, 1141–1153. doi: 10.1136/bjsports-2018-100328

Sarzynski, M., Rankinen, T., Earnest, C., Leon, A., Rao, D., Skinner, J., et al. (2013). Measured maximal heart rates compared to commonly used age-based prediction equations in the heritage family study. Am. J. Hum. Biol. 25, 695–701. doi: 10.1002/ajhb.22431

Seiler, S., and Tønnessen, E. (2009). Intervals, thresholds, and long slow distance: the role of intensity and duration in endurance training. Sportscience 13, 32–53. Available online at:

Smyth, B., and Muniz-Pumares, D. (2020). Calculation of critical speed from raw training data in recreational marathon runners. Med. Sci. Sports Exerc. 52, 2637–2645. doi: 10.1249/MSS.0000000000002412

Tanda, G. (2011). Prediction of marathon performance time on the basis of training indices. J. Hum. Sport Exerc. 6, 511–520. doi: 10.4100/jhse.2011.63.05

Vallat, R. (2018). Pingouin: statistics in python. J. Open Source Softw. 3:1026. doi: 10.21105/joss.01026

Videbæk, S., Bueno, A. M., Nielsen, R. O., and Rasmussen, S. (2015). Incidence of running-related injuries per 1000 h of running in different types of runners: a systematic review and meta-analysis. Sports Med. 45, 1017–1026. doi: 10.1007/s40279-015-0333-8

Vitti, A., Nikolaidis, P. T., Villiger, E., Onywera, V., and Knechtle, B. (2020). The “New York City marathon”: participation and performance trends of 1.2 m runners during half-century. Res. Sports Med. 28, 121–137. doi: 10.1080/15438627.2019.1586705

Zach, S., Xia, Y., Zeev, A., Arnon, M., Choresh, N., and Tenenbaum, G. (2017). Motivation dimensions for running a marathon: a new model emerging from the motivation of marathon scale (MOMS). J. Sport Health Sci. 6, 302–310. doi: 10.1016/j.jshs.2015.10.003

Zinner, C. (2016). “Training aspects of marathon running,” in Marathon Running: Physiology, Psychology, Nutrition and Training Aspects, eds C. Zinner, and B. Sperlich (Cham: Springer), 153–171. doi: 10.1007/978-3-319-29728-6_8



Figure A1. Exemplary visualization of (A) duration distribution curve Ptb1[X = Ṽ] and (B) cumulative duration distribution curve Ftb1[X = Ṽ] for the normalized velocity Ṽ for training block 1. The red lines indicate the barriers for the intensity zones defined based on the marathon performance velocity vmp.


Figure A2. Visualization of training parameters over training blocks. (A) Workout duration T, (B) workout distance D, (C) number of workouts W.

Keywords: marathon training, big data, wearables, training response, exercise physiology

Citation: Zrenner M, Heyde C, Duemler B, Dykman S, Roecker K and Eskofier BM (2021) Retrospective Analysis of Training and Its Response in Marathon Finishers Based on Fitness App Data. Front. Physiol. 12:669884. doi: 10.3389/fphys.2021.669884

Received: 19 February 2021; Accepted: 12 April 2021;
Published: 21 May 2021.

Edited by:

Pantelis Theodoros Nikolaidis, University of West Attica, Greece

Reviewed by:

Ivan Cuk, Singidunum University, Serbia
Aonghus Lawlor, University College Dublin, Ireland
Caio Victor Sousa, Northeastern University, United States

Copyright © 2021 Zrenner, Heyde, Duemler, Dykman, Roecker and Eskofier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Markus Zrenner,

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.