AUTHOR=Garcia-Atutxa Igor, Dudagoitia Barrio Ekaitz, Villanueva-Flores Francisca TITLE=Technical classification of professional cycling stages using unsupervised learning: implications for performance variability JOURNAL=Frontiers in Sports and Active Living VOLUME=Volume 7 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/sports-and-active-living/articles/10.3389/fspor.2025.1661456 DOI=10.3389/fspor.2025.1661456 ISSN=2624-9367 ABSTRACT=Introduction: In professional cycling, the technical characteristics of race stages significantly influence group dynamics and performance variability among competitors. However, stage classifications have traditionally been subjective, lacking a robust empirical foundation. This study aimed to develop an objective, technical classification of professional cycling stages using unsupervised learning (KMeans) and to analyze how these categories relate to collective performance variability, measured by the coefficient of variation (CV) of finish times. Methods: Technical data and official results from 439 international race stages conducted between 2017 and 2023 were analyzed. The technical variables included distance, total vertical gain, average relative elevation, and percentages of paved and unpaved surfaces. Results: Cluster validation via bootstrap analysis demonstrated high stability (mean silhouette index = 0.62 ± 0.03), confirming six clearly distinct technical stage groups. Stages characterized by higher relative elevation and greater proportions of unpaved surfaces exhibited higher performance variability (higher CV), whereas less technically demanding stages showed lower variability. Relative elevation emerged as the strongest predictor of CV (β = 0.42, p < 0.001), followed by unpaved percentage (β = 0.23, p < 0.01), distance (β = 0.18, p < 0.05), and vertical gain (β = 0.11, p < 0.05).
Across 2017–2023, a broadly downward pattern in CV was observed, although a pooled linear-trend test with cluster fixed effects did not reach statistical significance (p = 0.315). Discussion: The lack of physiological data and possible confounding from unmeasured stage and team factors (e.g., weather, stage order, team tactics) limit causal inference. This empirical typology provides a valuable quantitative tool to optimize competitive strategies, plan targeted training based on stage type, and prevent cumulative fatigue and performance-related injuries in high-performance cycling. Future research incorporating direct physiological data is recommended to further explore the relationship between external and internal load in professional cycling.
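
The variability measure named in the abstract, the coefficient of variation (CV) of finish times, can be illustrated with a minimal sketch. The abstract does not state whether the sample or population standard deviation was used, so the sample form is assumed here. The function name `coefficient_of_variation` and the finish times are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(finish_times_s):
    """Return the CV (in percent) of finish times given in seconds.

    Assumption: CV = sample standard deviation / mean * 100.
    The abstract does not specify which standard deviation was used.
    """
    if len(finish_times_s) < 2:
        raise ValueError("need at least two finish times")
    avg = mean(finish_times_s)
    if avg <= 0:
        raise ValueError("mean finish time must be positive")
    return stdev(finish_times_s) / avg * 100.0


# Hypothetical finish times (seconds) for two made-up stages.
flat_stage = [14400, 14400, 14405, 14410, 14420]      # riders finish close together
mountain_stage = [18000, 18120, 18400, 18900, 19500]  # riders finish spread apart

cv_flat = coefficient_of_variation(flat_stage)
cv_mountain = coefficient_of_variation(mountain_stage)
print(f"flat: {cv_flat:.3f}%  mountain: {cv_mountain:.3f}%")
```

Because CV divides the spread by the mean, it allows stages of different durations to be compared on a common scale, which is presumably why it suits a comparison across stage types.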