Anthropometric Clusters of Competitive Cyclists and Their Sprint and Endurance Performance

Do athletes specialize toward sports disciplines that are well aligned with their anthropometry? Novel machine-learning algorithms now enable scientists to cluster athletes based on their individual anthropometry while integrating multiple anthropometric dimensions, which may provide new perspectives on anthropometry-dependent sports specialization. We aimed to identify clusters of competitive cyclists based on their individual anthropometry using multiple anthropometric measures, and to evaluate whether athletes with a similar anthropometry also competed in the same cycling discipline. Additionally, we assessed differences in sprint and endurance performance between the anthropometric clusters. Twenty-four nationally and internationally competitive male cyclists were included from sprint, pursuit, and road disciplines. Anthropometry was measured and k-means clustering was performed to divide cyclists into three anthropometric subgroups. Sprint performance (Wingate 1-s peak power, squat-jump mean power) and endurance performance (mean power during a 15 km time trial, V˙O2peak) were obtained. K-means clustering assigned sprinters to a mesomorphic cluster (endo-, meso-, and ectomorphy were 2.8, 5.0, and 2.4; n = 6). Pursuit and road cyclists were distributed over a short meso-ectomorphic cluster (1.6, 3.8, and 3.9; n = 9) and tall meso-ectomorphic cluster (1.5, 3.6, and 4.0; n = 9), the former consisting of significantly lighter, shorter, and smaller cyclists (p < 0.05). The mesomorphic cluster demonstrated higher sprint performance (p < 0.05), whereas the meso-ectomorphic clusters established higher endurance performance (p < 0.001). Overall, endurance performance was associated with lean ectomorph cyclists with small girths and small frontal area (p < 0.05), and sprint performance related to cyclists with larger skinfolds, larger girths, and low frontal area per body mass (p < 0.05). 
Clustering optimization revealed a mesomorphic cluster of sprinters with high sprint performance and short and tall meso-ectomorphic clusters of pursuit and road cyclists with high endurance performance. Anthropometry-dependent specialization was partially confirmed, as the clustering algorithm distinguished short and tall endurance-type cyclists (matching the anthropometry of all-terrain and flat-terrain road cyclists) rather than pursuit and road cyclists. Machine-learning algorithms therefore provide new insights into how athletes match their sports discipline with their individual anthropometry.

INTRODUCTION
The athlete's physique is important for success in many sports (Norton et al., 1996). Even though there are many determinants that contribute to the performance of athletes, most sports require a specific range in body size and shape to compete at the top level (Norton and Olds, 2001). Consequently, athletes tend to specialize toward sports disciplines that are well aligned with their anthropometry (Foley et al., 1989). Physical comparisons of athletic champions support this anthropometry-dependent specialization, revealing large anthropometric differences between sports disciplines and a much more similar physique within sports disciplines, especially at higher levels of competition (Carter, 1970). It should be noted, however, that anthropometric measures are commonly reported for groups of a specific sports discipline (Carter, 1970; Norton and Olds, 2001), focusing on group averages and standard deviations (Norton and Olds, 2001) or distributions of a single anthropometric measure within these groups (Carter, 1970). What remains to be elucidated is whether grouping of athletes based on similarities in their individual anthropometry using multiple anthropometric dimensions, and subsequently evaluating their sports discipline, will provide new insights into anthropometry-dependent specialization.
In cycling, for example, athletes specialize into the disciplines sprint, pursuit, uphill, time trial, flat-terrain, and all-terrain, each demonstrating distinct anthropometric characteristics (Foley et al., 1989; Padilla et al., 1999; Lucía et al., 2000; Mujika and Padilla, 2001; Menaspà et al., 2012). For instance, road climbers pursue a low body mass to enhance their uphill performance, as body mass increases the resistance from gravity (Mujika and Padilla, 2001). Flat-terrain cyclists reduce their frontal area per body mass to improve performance during flat stages, minimizing relative energy costs to aerodynamic resistance (Mujika and Padilla, 2001). The diversity in body shapes is represented by the somatotypes, describing a predisposition toward specific forms of physical activities (Gabriel and Zierath, 2017). Mesomorph body shapes are beneficial for strength and speed activities, endomorphy contributes to strength and maximal force, whereas ectomorphy is advantageous for endurance and uphill performance (Gabriel and Zierath, 2017). Accordingly, sprint-type cyclists were found to have high mesomorphy, whereas endurance-type cyclists demonstrated higher ectomorphy and lower mesomorphy (Foley et al., 1989). Also in cycling, these anthropometric measures are commonly reported in averages and standard deviations for predefined groups of a specific sports specialization (e.g., Foley et al., 1989; Padilla et al., 1999; Lucía et al., 2000; Mujika and Padilla, 2001; Menaspà et al., 2012). However, these predefined groups may still include individual athletes with a dissimilar anthropometry, which would affect the group's average anthropometry and confound the assessment of anthropometry-dependent sports specialization.
Alternatively, one could identify subgroups of athletes solely based on their individual anthropometry, and independent of their predefined sports discipline. Uncovered groups of athletes with similar anthropometry and subsequent evaluation of their actual sports disciplines will reveal whether there is an unbiased interdependence between anthropometry and sports discipline. Over the last decade, artificial intelligence has been introduced into sports science, providing new opportunities for data analytics in sports. As part of artificial intelligence, machine-learning techniques now enable us to identify subgroups of athletes with similar anthropometry, using an integrative approach with multiple anthropometric dimensions. Unsupervised machine-learning techniques, like k-means clustering, help researchers to discover "hidden" patterns in their data and to use these patterns to classify athletes such that athletes within one subgroup are anthropometrically similar to each other, but different from athletes in another subgroup. With the implementation of such data science techniques, it is now possible to provide a new and unbiased perspective on anthropometry-dependent sports specialization. To our knowledge, it is currently unknown whether the athletes in an anthropometric cluster that is identified by similarities in individual anthropometry using multiple anthropometric measures will also compete in the same sports discipline, which would confirm anthropometry-dependent sports specialization.
In addition to the athlete's sports specialization, the athlete's physical performance will help to provide a more detailed comprehension of anthropometry-dependent sports specialization. Differences in sprint and endurance performance are of interest, as it has been highlighted that performance and physiological parameters should be interpreted in the context of the athlete's individual anthropometry (Mujika and Padilla, 2001). The relationships between anthropometric measures and athletic performance have been assessed in various sports (Chaouachi et al., 2009; Knechtle et al., 2011; Brocherie et al., 2014). Endurance performance was negatively related to sum of skinfolds in male triathletes (Knechtle et al., 2011); however, others found no relationship between anthropometric measures and track cycling performance within subgroups of cyclists (McLean and Parker, 1989). What remains to be elucidated is how anthropometry relates to both sprint and endurance performance in one heterogeneous group of competitive sprint, pursuit, and road cyclists. Anthropometric clustering using unsupervised machine learning is expected to provide a new perspective on the interrelationships between anthropometry, sports specialization, and athletic performance.
The aim of this study was to identify clusters based on individual anthropometry of sprint, pursuit, and road cyclists using multiple anthropometric measures, and to evaluate whether athletes with a similar anthropometry also competed in the same cycling discipline. Additionally, we aimed to assess differences in the anthropometric clusters' sprint and endurance performance. Moreover, relationships between anthropometric characteristics and both sprint and endurance performance were assessed in all cyclists. We hypothesized that clustering based on anthropometry will reveal separate subgroups for sprint, pursuit, and road cyclists, confirming anthropometry-dependent specialization in cycling.

Subjects
Twenty-four male cyclists from sprint, pursuit, and road disciplines volunteered to participate in this study. Cyclists competed at the national, international, or Olympic level. Prior to participation, subjects were familiarized with the experimental procedures and subjects provided written informed consent. The study was approved by the medical ethics committee of the VU medical center, Amsterdam, Netherlands (NL49060.029.14) and conducted according to the principles of the Declaration of Helsinki.

Design
In this observational study, subjects visited the lab three times. During the first visit, anthropometry was measured and subjects performed a maximal incremental exercise test. The second visit consisted of a vertical squat-jump test and a 15-km cycling time trial. In the third visit, subjects performed a 30-s Wingate test. Subjects were instructed to avoid strenuous exercise and alcohol consumption during the 24 h before each visit and to consume no caffeine or food during the last 3 h before each test. Cycle ergometer handlebar and saddle height were adjusted individually and subjects used their own clipless pedals.

Anthropometry
Measurements of body mass, stature, skinfolds, girths, and breadths were obtained by the same investigator in accordance with the International Society for the Advancement of Kinanthropometry (ISAK) level 1 protocol (Marfell-Jones et al., 2006). All measurements were taken on the right side of the subject's body. Skinfolds were obtained with a Harpenden skinfold caliper (Baty International, West Sussex, United Kingdom). Breadths were measured with a Cescorf sliding bone caliper, after applying appropriate pressure to minimize the influence of soft tissue. Measures were obtained in duplicate and mean values were used, or in triplicate using median values [i.e., if the first and second measure differed >5% for skinfolds or >1% for other anthropometric measures (Marfell-Jones et al., 2006)]. Somatotypes were determined according to the Heath-Carter model (Heath and Carter, 1967). Body surface area was determined from weight and height according to Du Bois and du Bois (1916), body fat percentage was derived from the sum of four skinfolds (Durnin and Womersley, 1974), and percentage skeletal muscle mass was estimated using an anthropometric regression model (Lee et al., 2000).
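Two of the derived measures described above can be illustrated with a short worked example. The sketch below is written in Python (the study itself used R) and assumes the widely published Du Bois formula for body surface area and the commonly used equation form of the Heath-Carter ectomorphy rating. The function names are hypothetical, and the text does not state that the study used exactly these equations.

```python
def du_bois_bsa(mass_kg, height_cm):
    """Body surface area (m^2) from body mass (kg) and stature (cm),
    using the Du Bois and Du Bois (1916) formula."""
    return 0.007184 * mass_kg ** 0.425 * height_cm ** 0.725


def ectomorphy(mass_kg, height_cm):
    """Ectomorphy rating from the height-weight ratio (HWR).

    Assumption: the commonly used equation form of the Heath-Carter
    method; the study may have applied the rating differently.
    """
    hwr = height_cm / mass_kg ** (1.0 / 3.0)  # stature / cube root of mass
    if hwr >= 40.75:
        return 0.732 * hwr - 28.58
    if hwr > 38.25:
        return 0.463 * hwr - 17.63
    return 0.1  # minimum rating for very low HWR


if __name__ == "__main__":
    # Hypothetical cyclist: 70 kg, 180 cm
    print(round(du_bois_bsa(70, 180), 2), round(ectomorphy(70, 180), 1))
```

Endomorphy and mesomorphy need skinfold, breadth, and girth inputs and are omitted here to keep the sketch short.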

Sprint Performance
Sprint performance was assessed by the 1-s peak power output (POpeak) during a 30-s Wingate test on a bicycle ergometer (Monark 894 E Peak Bike, Monark Exercise AB, Vansbro, Sweden), as described elsewhere (Van der Zwaard et al., 2018). The test was preceded by a 10-min warm-up (brake weight 1.5 kg) with three short accelerations. Workload was set at 10% body mass and was automatically applied to the flywheel after two revolutions. Subjects were instructed to remain seated and received strong verbal encouragement throughout the test.
Cyclists also performed a vertical squat-jump test. Subjects were instructed to bend their knees to a 90° knee angle and hold this position for 3 s before push-off. Jumps were performed without arm swing, with hands placed above the hips. Cyclists performed four jumps, with 2-min rest in-between consecutive jumps. A fifth jump was performed if the fourth jump was >5% higher than the previous jumps. An inertial measurement unit (MPU-9150, ±16.0 g, 500 Hz, Invensense, San Jose, CA, United States) was firmly secured to the lower back, and was used to calculate average jump power during push-off. Vertically directed acceleration was multiplied by body mass to derive vertical force, which was multiplied by vertical velocity (i.e., integrated acceleration) to obtain the vertical power production. Subsequently, power production was averaged over the entire push-off phase, from the initial increase in vertical acceleration until takeoff. To ensure that analyzed jumps were actual squat jumps, the jumps with a countermovement were excluded. The highest squat jump was used for analysis.
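The push-off power calculation described above can be sketched as follows. This is a minimal Python illustration, not the authors' processing code. It assumes that the accelerometer signal includes gravity (reading about 9.81 m/s² at rest), so that mass times measured acceleration approximates the vertical force, and that velocity is obtained by integrating the acceleration net of gravity from a standstill. The function name, the trapezoidal integration, and the gravity handling are all assumptions.

```python
G = 9.81  # gravitational acceleration (m/s^2)


def mean_pushoff_power(accel, mass_kg, fs_hz):
    """Mean vertical power (W) over the push-off phase.

    accel   : vertical accelerometer samples (m/s^2) from the start of
              push-off until takeoff; ASSUMED to include gravity
    mass_kg : body mass (kg)
    fs_hz   : sampling frequency (Hz), e.g. 500
    """
    dt = 1.0 / fs_hz
    velocity = 0.0            # subject holds the squat still before push-off
    prev_net = accel[0] - G   # net acceleration of the body
    powers = []
    for i, a in enumerate(accel):
        net = a - G
        if i > 0:
            # trapezoidal integration of net acceleration gives velocity
            velocity += 0.5 * (prev_net + net) * dt
        prev_net = net
        force = mass_kg * a   # vertical force from the measured acceleration
        powers.append(force * velocity)
    return sum(powers) / len(powers)


if __name__ == "__main__":
    # Synthetic push-off: constant 2 g reading for 0.2 s at 500 Hz, 80-kg subject
    samples = [2 * G] * 101
    print(round(mean_pushoff_power(samples, 80.0, 500) / 80.0, 1), "W/kg")
```

Dividing the result by body mass gives the relative jump power used for comparisons between cyclists.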

Endurance Performance
Endurance performance was obtained from a 15-km time trial on an electronically braked bicycle ergometer (VU-MTO, Amsterdam, Netherlands), as described previously (Van der Zwaard et al., 2018). Gear ratio could be altered during the time trial. Mean power output was determined from torque and cadence measurements, sampled at 100 Hz and averaged over the duration of the time trial (POTT).
Subjects also performed a maximal incremental exercise test to obtain peak oxygen uptake (V˙O2peak), as described elsewhere (Van der Zwaard et al., 2016). V˙O2 was recorded breath-by-breath using open circuit spirometry (Cosmed Quark CPET, Cosmed S.R.L., Rome, Italy). Before every test, the volume transducer and gas analyzer were calibrated according to the manufacturer's instructions. V˙O2 data were filtered for extreme values and V˙O2peak was defined as the highest average 30-s V˙O2 value.
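One way to obtain the highest 30-s average from breath-by-breath data is a moving time window, sketched below in Python. The text does not specify whether a moving window or fixed bins were used, nor how extreme values were filtered, so the moving window is an assumption and the filtering step is left out. The function name is hypothetical.

```python
def vo2_peak(times_s, vo2, window_s=30.0):
    """Highest mean V'O2 over any window of `window_s` seconds.

    times_s : breath timestamps in seconds, ascending
    vo2     : V'O2 value of each breath (same length as times_s)
    Returns None if less than one full window of data is available.
    """
    best = None
    start = 0      # index of the first breath inside the current window
    total = 0.0    # running sum of V'O2 inside the window
    for end, (t, v) in enumerate(zip(times_s, vo2)):
        total += v
        # drop breaths that fall outside the window (t - window_s, t]
        while times_s[start] <= t - window_s:
            total -= vo2[start]
            start += 1
        # only evaluate windows that span a full window_s of recording
        if t - times_s[0] >= window_s:
            mean = total / (end - start + 1)
            if best is None or mean > best:
                best = mean
    return best


if __name__ == "__main__":
    # Synthetic test: one breath per second, plateau in the final 30 s
    times = list(range(90))
    values = [3000.0 if t < 60 else 4000.0 for t in times]
    print(vo2_peak(times, values))
```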

Unsupervised Machine Learning
K-means clustering is a popular unsupervised machine-learning algorithm that divides a data set into subgroups based on patterns in the data. Here, we performed k-means clustering with the Hartigan-Wong algorithm (Hartigan and Wong, 1979) and divided cyclists into subgroups based on anthropometric measures of body shape (meso-, ecto-, and endomorphy), body size (height, weight, and body surface area), and body composition (sum of eight skinfolds, body fat percentage, and skeletal muscle mass percentage). Optimization was performed for maximal compactness of clusters by minimizing the total within-cluster variation over all k clusters (Eq. 1). Initially, the algorithm provides a random cluster center for all k clusters. Then, observations are assigned to the nearest cluster center based on the shortest Euclidean distance, and after all data points have been assigned, the cluster centers are recalculated. The "cluster assignment" and "cluster center update" steps are iterated until the cluster assignment stops changing or the maximum number of iterations is reached.
Total within-cluster variation is minimized by minimizing the sum of squared errors in Euclidean distance between individual data points and cluster centers:

$$\text{total within-cluster variation} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2 \qquad (1)$$

where x_i is the individual data point belonging to cluster C_k, µ_k is the center of cluster C_k, ||x_i − µ_k|| is the Euclidean distance between the individual data point and cluster center, and K is the total number of clusters, which must be specified before clustering.
K-means clustering was performed using the stats package in R. Before clustering, anthropometric measures were standardized into Z-scores, removing differences in measurement scales between variables. Using this input data, the appropriate number of clusters was determined by the Elbow Criterion, the Bayesian Information Criterion from the mclust package (Scrucca et al., 2016), and cluster validity criteria from the NbClust package (Charrad et al., 2014), and was found to be three clusters. The maximum number of iterations was set at 50 (though clusters were obtained within three iterations). Moreover, optimization was performed using 25 random starting partitions as initial cluster centers to enhance cluster stability.
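The clustering procedure can be sketched in a few lines of Python. This is an illustration only: the study used the Hartigan-Wong algorithm from the R stats package, whereas the sketch below uses the simpler assignment/update iteration (Lloyd's algorithm) described in the text. It keeps the Z-score standardization, the 25 random starts, and the 50-iteration cap. All function names are hypothetical, and the sketch assumes that no variable is constant (which would give a zero standard deviation).

```python
import random


def zscores(rows):
    """Standardize each column to mean 0 and SD 1 (sample SD)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / (len(c) - 1)) ** 0.5
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(data, k, max_iter, rng):
    """One k-means run from random initial centers (Lloyd's iteration)."""
    centers = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # cluster assignment step: nearest center for every observation
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in data]
        if new_labels == labels:   # assignments stopped changing
            break
        labels = new_labels
        # cluster center update step: mean of the assigned observations
        for j in range(k):
            members = [p for p, l in zip(data, labels) if l == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    wss = sum(sq_dist(p, centers[l]) for p, l in zip(data, labels))
    return labels, wss


def kmeans(data, k=3, n_starts=25, max_iter=50, seed=0):
    """Best of n_starts runs, judged by total within-cluster sum of squares."""
    rng = random.Random(seed)
    runs = (kmeans_once(data, k, max_iter, rng) for _ in range(n_starts))
    return min(runs, key=lambda run: run[1])
```

A call such as `labels, wss = kmeans(zscores(measurements), k=3)` would return one cluster label per athlete, where `measurements` is a hypothetical list holding one row of anthropometric values per cyclist.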

Statistical Analysis
All data are presented as individual values or as mean ± SD. All performance measures were expressed relative to the body mass of cyclists. One-way ANOVA tests or non-parametric Kruskal-Wallis tests were used to detect group-differences between anthropometric clusters, and least significant difference post hoc tests or Mann-Whitney tests were used to localize differences. Pearson or Spearman correlations were used to assess relationships between anthropometry and physical performance. Differences were considered statistically significant if p < 0.05. Tendencies were reported if p < 0.10.

Anthropometric Clusters
K-means clustering divided cyclists into three anthropometric clusters based on individual differences in body shape, size, and composition (Figure 1 and Table 1). All sprint cyclists were allocated to a mesomorphic cluster (endo-, meso-, and ectomorphy were 2.8, 5.0, and 2.4, respectively; n = 6). Pursuit and road cyclists were distributed over a short meso-ectomorphic cluster (1.6, 3.8, and 3.9; n = 9), and tall meso-ectomorphic cluster (1.5, 3.6, and 4.0; n = 9). The somatochart of these subgroups is displayed in Figure 2. The mesomorphic cluster consisted of heavier cyclists with larger girths, but who were not as lean as cyclists of other clusters. These sprinters also had a lower frontal area per body mass. The short meso-ectomorphic cluster included cyclists that were significantly lighter, shorter, and smaller compared to cyclists in the tall meso-ectomorphic cluster, demonstrating lower thigh and shank lengths, smaller femur breadths, and smaller girths, but a higher percentage skeletal muscle mass. Pursuit and road cyclists were not allocated to different clusters, but were evenly distributed over the short and tall meso-ectomorphic clusters.

Sprint and Endurance Performance of Clusters
Physical performance of the anthropometric clusters is presented in Figure 3. The mesomorphic cluster showed a higher sprint performance compared to the short and tall meso-ectomorphic clusters (POpeak: p = 0.023 and p = 0.022, respectively; POjump: p = 0.001 and p < 0.001) and lower endurance performance (POTT: p < 0.001 and p < 0.001; V˙O2peak: p < 0.001 and p < 0.001). Compared to the tall subgroup, the short meso-ectomorphic cluster demonstrated similar values for POpeak (p = 0.987) and POTT (p = 0.211), but a higher POjump (p = 0.033) and a tendency for a higher V˙O2peak (p = 0.056). In sum, the mesomorphic cluster showed a higher sprint performance, whereas the meso-ectomorphic groups demonstrated a better endurance performance.

FIGURE 1 | Cluster plot with a two-dimensional representation of the three anthropometric clusters. Clusters are displayed in the two most important dimensions, which represent a combination of the anthropometric characteristics and were obtained after dimension reduction of our higher-dimensional data set [i.e., dimensions explaining 85% of the variation in our data set; for more details, see Pison et al. (1999)]. Individual values, cluster centers, and spanning ellipses of clusters are presented for the short meso-ectomorph cluster (1, circles), the tall meso-ectomorph cluster (2, triangles), and mesomorph cluster (3, pluses).

Table note: Values are mean ± SD. * significantly different from mesomorphic cluster, p < 0.05. # significantly different from tall meso-ectomorphic cluster, p < 0.05. * and # indicate tendencies, p < 0.10. BMI, body mass index; BSA, body surface area; FA, frontal area.

High POTT and V˙O2peak were both associated with lean cyclists with small girths, a small frontal area, high ectomorphy, and low meso- and endomorphy. High POpeak and POjump related to cyclists with larger skinfolds, larger girths, and a low frontal area and body surface area per body mass, whereas high jumping performance also related to less lean cyclists with a high meso- and endomorphy and low ectomorphy. Thus, anthropometric characteristics of body size, shape, and composition were significantly related to sprint and endurance performance in a group of sprint, pursuit, and road cyclists.

DISCUSSION
This study shows how k-means clustering divided sprint, pursuit, and road cyclists into three distinct anthropometric clusters with differing physical performance. The mesomorphic cluster included all sprinters and demonstrated a higher sprint performance, whereas the short and tall meso-ectomorphic clusters of pursuit and road cyclists presented higher endurance performance. Anthropometric measures were also significantly related to performance. A high endurance performance was associated with a lean ectomorph physique with small girths and a small frontal area, whereas a high sprint performance related to cyclists with larger skinfolds, larger girths, and a low frontal area per body mass.

Anthropometry-Dependent Specialization
Currently, anthropometric characteristics are commonly reported for predefined groups of athletes of a specific sports specialization. However, it is unknown whether a machine-learning approach (grouping athletes based on individual anthropometry using multiple anthropometric dimensions, independent of sports specialization) will reveal clusters of athletes that have a similar anthropometry and compete in the same sports discipline. Using unsupervised machine learning, we uncovered three clusters based on the athletes' anthropometric characteristics. The mesomorphic cluster included all sprinters with a favorable somatotype for strength and speed performance, similar to that of elite [endo-, meso-, and ectomorphy: 2.5, 5.2, and 2.4 (White et al., 1982; McLean and Parker, 1989)] and Olympic track sprinters [1.8, 5.2, and 2.4 (Garay et al., 1974)].
The body size profile of our sprinters was comparable to that of Olympic track sprinters (Craig and Norton, 2001). Nonetheless, our sprinters were not as lean as elite track sprinters, illustrated by their higher sum of skinfolds and endomorphy (Garay et al., 1974; Foley et al., 1989), which may hamper cycling performance due to increased energetic costs to acceleration, rolling friction, and aerodynamic resistance. Thus, all cyclists of the mesomorphic cluster competed in track sprint disciplines and had a similar body size and shape to that of elite track sprinters. The short and tall meso-ectomorphic clusters included pursuit and road cyclists, with somatotypes that favored endurance performance. These results confirm the trend for a higher ectomorphy and lower mesomorphy in more endurance-type cyclists (Garay et al., 1974; Foley et al., 1989; McLean and Parker, 1989). Cyclists in both clusters had a relatively low body fat percentage (∼9%), comparable to that of professional road cyclists (Mujika and Padilla, 2001). This is beneficial for successful performance, as body fat adds to body mass but not to power-producing capabilities (Craig and Norton, 2001). The meso-ectomorphic clusters mainly differed in body size; cyclists in the short cluster were significantly smaller, shorter, and lighter. These cyclists were not necessarily very short (∼180 cm), but shorter than average Dutch males, who are the world's tallest people (Stulp et al., 2015). Smaller cyclists minimize the influence of aerodynamic resistance, giving them a competitive edge on most terrains, specifically during uphill climbing (Padilla et al., 1999; Lucía et al., 2000). Larger cyclists, however, minimize the energy costs to aerodynamic friction per body mass, giving them an advantage on level roads (Mujika and Padilla, 2001).
Interestingly, the body size of the short cluster was remarkably similar to that of all-terrain road cyclists and the tall cluster matched the body size of flat-terrain road cyclists (Padilla et al., 1999).
Anthropometric clustering showed that all sprinters were allocated to one cluster, whereas pursuit and road cyclists were not assigned to separate clusters. Our findings demonstrate that it is difficult to distinguish pursuit and road cyclists based on their individual anthropometry, which corresponds to previous literature reporting similar anthropometric characteristics for pursuit and road cyclists (Garay et al., 1974; Foley et al., 1989). Nonetheless, the short and tall endurance-type clusters did match the anthropometry of two other cycling specializations, that of all-terrain and flat-terrain road cyclists. Therefore, our clustering results did (partially) confirm the existence of anthropometry-dependent specialization.

Physical Performance
To gain more insight into how physical performance differs between groups of athletes with similar individual anthropometry, we also assessed the sprint and endurance performance of each cluster. To our knowledge, actual differences in sprint and endurance performance between anthropometric clusters have not yet been assessed. According to current literature (Gabriel and Zierath, 2017), anthropometry of our mesomorphic cluster was beneficial for strength and speed performance, whereas anthropometry of the meso-ectomorphic clusters favored endurance performance. We now show that performance differences between anthropometric clusters are in line with their anthropometric pre-dispositions, confirming higher sprint performance in the mesomorphic cluster and higher endurance performance in both meso-ectomorphic clusters (Figure 3).

FIGURE 3 | Group differences in endurance performance (left) and sprint performance (right) for the three anthropometric clusters. Time trial performance (A) and V˙O2peak (C) were considered as measures of endurance performance; Wingate peak power (B) and squat jump mean power (D) were taken as measures of sprint performance. Data are presented as mean ± SD. * is significantly different from the mesomorphic cluster (p < 0.05), # is significantly different from the tall meso-ectomorphic cluster (p < 0.05). # indicates a tendency for V˙O2peak (p = 0.056). POTT, mean power during a 15-km time trial; V˙O2peak, peak oxygen uptake; POpeak, Wingate peak power; POjump, squat-jump mean power.
The two endurance-type clusters revealed small, but unforeseen performance differences. V˙O2peak was ∼5 mL kg−1 higher in the short cluster (p = 0.056), whereas POTT was similar between both clusters. These findings were particularly consistent with performance differences between all-terrain and flat-terrain cyclists (Padilla et al., 1999) and may relate to body size differences. Previous literature showed that smaller cyclists had ∼12.5% higher V˙O2peak and ∼11% higher body surface-to-mass ratios compared to larger cyclists, but similar V˙O2 values at submaximal intensities (Swain et al., 1987). Also in our study, V˙O2peak and BSA-to-mass ratios were proportional and strongly related (r = 0.82), possibly due to the influence of surface area-to-mass ratio on cardiovascular variables (Mitchell et al., 1992). Therefore, it is likely that the higher V˙O2peak in the short cluster was explained by their higher BSA-to-mass ratio. For sprint performance, POpeak was similar, but POjump was higher in the short cluster. The former result was expected, as percentage lean body mass was comparable between clusters and as similar relative peak power values (W/kg) have been reported for subjects with a different body mass but comparable proportion of lean body mass (Maciejczyk et al., 2015). Conversely, in line with isometric downscaling (Bobbert, 2013), the short cluster was expected to produce less, not more, power per body mass during jumping push-off. Smaller animals produce lower power per body mass than larger animals, as they jump with higher accelerations due to their shorter body segments, which hampers build-up of active state and lets muscles operate at unfavorably high velocities (Bobbert, 2013). Nonetheless, our smaller cyclists did not show this and may have compensated for this disadvantage by their larger proportion of muscle mass. In brief, the small performance differences between meso-ectomorphic clusters were likely explained by differences in body size and/or composition.

Table note: All performance measures were expressed relative to the cyclist's body mass. * significant correlation, p < 0.05. ** significant correlation, p < 0.01. * indicates tendencies, p < 0.10. BMI, body mass index; BSA, body surface area; FA, frontal area; POTT, mean power during a 15-km time trial; V˙O2peak, peak oxygen uptake; POpeak, Wingate peak power; POjump, squat-jump mean power.

Relationships between anthropometry and performance revealed that high endurance performance was associated with a lean ectomorph physique with small girths and a small frontal area. Lean body composition facilitates prolonged and efficient power production, as illustrated in triathletes (Knechtle et al., 2011). Ectomorph-shaped athletes with small girths are also assumed to have long and slender muscles. Such muscles are metabolically more efficient, as they avoid the negative effect of a large muscle physiological cross-sectional area on oxygen consumption during endurance performance (Van der Zwaard et al., 2018). The high sprint performance related to cyclists with larger skinfolds, larger girths, and a low frontal area per body mass. Mesomorph athletes with larger girths are assumed to have hypertrophied muscles. Such muscles generally have a large physiological cross-sectional area, induced by muscle-fiber hypertrophy, which contributes to high sprint performance (Van der Zwaard et al., 2018). The relationship with skinfolds was more surprising, but likely due to a suboptimal body composition of our sprinters. The higher body fat percentage may explain why peak power per body mass was lower in our cyclists than in elite track sprinters, even though their absolute sprint performance was the same (Dorel et al., 2005).
Taken together, our results show that sprint and endurance performance correspond to the clusters' anthropometric predispositions and highlight the value of interpreting physical performance in light of the athlete's individual anthropometry.
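The body-size argument made above for the short cluster (a higher BSA-to-mass ratio in smaller cyclists) can be illustrated with a minimal sketch. It assumes the Du Bois equation for body surface area, which is an assumption of this example and not necessarily the estimate used in the study, and the two cyclists' heights and masses are hypothetical values chosen only for illustration.

```python
def bsa_du_bois(mass_kg: float, height_cm: float) -> float:
    """Body surface area (m^2) from the Du Bois equation (assumed here)."""
    return 0.007184 * mass_kg ** 0.425 * height_cm ** 0.725


def bsa_to_mass(mass_kg: float, height_cm: float) -> float:
    """BSA-to-mass ratio in m^2 per kg."""
    return bsa_du_bois(mass_kg, height_cm) / mass_kg


# Hypothetical cyclists, not study data.
short_cyclist = {"mass_kg": 65.0, "height_cm": 175.0}
tall_cyclist = {"mass_kg": 80.0, "height_cm": 190.0}

ratio_short = bsa_to_mass(**short_cyclist)
ratio_tall = bsa_to_mass(**tall_cyclist)

# The smaller cyclist has less surface area in absolute terms,
# but more surface area per kilogram of body mass.
print(f"short: {ratio_short:.4f} m^2/kg, tall: {ratio_tall:.4f} m^2/kg")
```

Because surface area grows more slowly than mass as body size increases, the ratio falls for larger cyclists, which is the direction of the difference discussed above.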

Data Science
Using unsupervised machine learning, we were able to distinguish three subgroups with a distinct anthropometry, which were formed independently of the athlete's cycling specialization. Unsupervised machine-learning techniques use unlabeled data (i.e., data without defined categories or groups) to learn and identify common relationships within the data. Clustering algorithms use these commonalities to divide the data into meaningful subgroups based on similarities in individual subject characteristics (e.g., anthropometry). Supervised machine-learning techniques may also be used to classify athletes, but these require labeled data with pre-defined subgroups (e.g., sports specialization). Therefore, unsupervised clustering algorithms are preferred here, as these divide athletes into subgroups solely based on anthropometry and independently of the athlete's sports specialization.
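Because the discipline labels are withheld from the clustering step, they can afterwards be compared with the cluster assignments. The sketch below shows one way to do this with a simple cross-tabulation. The cluster assignments and discipline labels are hypothetical toy values, not the study data, and `crosstab` is a helper name introduced only for this illustration.

```python
from collections import Counter

# Hypothetical cluster assignments and discipline labels (not study data).
# The labels play no role in forming the clusters; they are only used afterwards.
clusters = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
disciplines = ["sprint", "sprint", "sprint", "pursuit", "road",
               "road", "pursuit", "road", "pursuit", "road"]


def crosstab(rows, cols):
    """Count how often each (row, col) pair occurs, as a nested dict."""
    counts = Counter(zip(rows, cols))
    row_keys = sorted(set(rows))
    col_keys = sorted(set(cols))
    return {r: {c: counts[(r, c)] for c in col_keys} for r in row_keys}


table = crosstab(clusters, disciplines)
for cluster_id, row in table.items():
    print(cluster_id, row)
```

A cluster whose row is concentrated in one column corresponds to a single discipline, whereas a row spread over several columns indicates that athletes from different disciplines share a similar anthropometry.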
When performing k-means clustering, several assumptions and considerations should be taken into account. K-means clustering assumes that clusters are spherical (circular and clearly separated) and of similar size. Both assumptions were met in this study. As for considerations, firstly, features should be standardized to Z-scores during pre-processing, as no single feature is more important than another. Secondly, all anthropometric dimensions should have the same number of variables to guarantee an equal contribution of dimensions to the formation of subgroups (i.e., three features each for body size, shape, and composition). Nonetheless, the same clusters were obtained when clustering without sum of skinfolds and BSA. Thirdly, clustering algorithms require researchers to specify the number of clusters in advance. Note that this could affect cluster validity, and therefore careful determination of the optimal number of clusters using validity criteria is warranted (Charrad et al., 2014; Scrucca et al., 2016). Lastly, for cluster stability, it is recommended to repeat the clustering procedure several times with different randomly chosen initial cluster centers (e.g., 25 starting partitions per trial). While fulfilling these considerations, we tested cluster stability by repeating the k-means algorithm for 1000 subsequent trials. Results presented the same clusters in every trial (obtained within three iterations), confirming stable anthropometric clusters in the present study. When these considerations are taken into account, novel machine-learning clustering algorithms enable grouping of athletes based on their individual anthropometry using an integrative approach of multiple anthropometric dimensions, which provides new perspectives on anthropometry-dependent sports specialization.
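The considerations above (Z-score standardization, several random starting partitions per trial, and a stability check over repeated trials) can be sketched as follows. This is a minimal pure-Python illustration, not the analysis code used in the study: the data are synthetic, the three features and their values are hypothetical, and the function names are introduced here for illustration only.

```python
import random
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature (column) to mean 0 and SD 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(data, k, rng, max_iter=100):
    """One k-means run (Lloyd's algorithm) from randomly chosen initial centers."""
    centers = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in data]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [mean(c) for c in zip(*members)]
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
    return labels, wss


def kmeans(data, k, n_starts=25, seed=0):
    """Best of n_starts runs, judged by within-cluster sum of squares."""
    rng = random.Random(seed)
    runs = (kmeans_once(data, k, rng) for _ in range(n_starts))
    return min(runs, key=lambda run: run[1])[0]


def partition(labels):
    """Clustering as a set of index sets, independent of label numbering."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())


# Synthetic data: 24 hypothetical athletes in three groups (n = 6, 9, 9),
# with made-up features (height in cm, body mass in kg, sum of skinfolds in mm).
data_rng = random.Random(42)
raw = []
for center, n in [((178, 85, 60), 6), ((176, 66, 40), 9), ((190, 78, 42), 9)]:
    raw += [[data_rng.gauss(c, 1.0) for c in center] for _ in range(n)]

z = zscore_columns(raw)

# Stability check: repeat the multi-start procedure with different seeds
# (20 trials here to keep the example fast) and compare the partitions.
trials = [partition(kmeans(z, 3, seed=s)) for s in range(20)]
stable = len(set(trials)) == 1
print("same clusters in every trial:", stable)
```

Comparing partitions as sets of index sets avoids a common pitfall: k-means numbers its clusters arbitrarily, so two runs can find the same grouping under different label numbers.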

Practical Applications
Data science provides scientists with new tools for data analytics in sports. Here, we show that unsupervised machine learning divides cyclists into three anthropometric clusters with distinct differences in body size, shape, and composition, and that the sprint and endurance performance of these clusters matches their anthropometric predispositions. Clustering may help athletes and coaches to discover how well athletes' individual anthropometry matches their sports discipline. Future studies may also perform anthropometric clustering with a larger sample of cyclists competing in all cycling specializations.

CONCLUSION
In this study, we show that unsupervised machine learning enables clustering of athletes based on their individual anthropometry using an integrative approach of multiple anthropometric dimensions. K-means clustering revealed a mesomorphic cluster of sprinters with high sprint performance and short and tall meso-ectomorphic clusters of pursuit and road cyclists with high endurance performance. Our clustering results confirmed anthropometry-dependent specialization for sprint- and endurance-type cyclists, whereas within the endurance-type cyclists the clusters distinguished between short and tall cyclists (matching the anthropometry of all-terrain and flat-terrain road cyclists) rather than between pursuit and road cyclists. Machine-learning algorithms therefore provide new insights into how athletes match their sports discipline with their individual anthropometry.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Medical Ethics Committee of the VU Medical Center, Amsterdam, Netherlands (NL49060.029.14). The participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
SZ, CR, RJ, and JK conceived and designed the work, acquired, analyzed, and interpreted the data, and drafted and revised the manuscript.

FUNDING
This work was supported by the Foundation for Technical Sciences (STW) of the Netherlands Organization for Scientific Research (NWO) under grant 12891. Open access publication was provided by the Vrije Universiteit Amsterdam, Netherlands.